draft OETF #2: Universal Interchange Format

Solra Bizna · January 4, 2017

THIS IS A DRAFT. It may change before becoming "official". Please feel free to suggest breaking changes.

Abstract

This document provides a binary interchange format, intended primarily to support generic component IO.

Rationale

OpenComputers' component bus is designed for high-level languages. It sends and receives groups of dynamically typed values. It is intended to be user-friendly and self-discoverable, and it has largely achieved this goal. However, with low-level architectures, there is no obvious, straightforward way to represent these values. This document aims to provide a standard representation, freeing individual architects from having to devise their own representations, and minimizing unnecessary differences between architectures.

Every value that can be sent over an OpenComputers bus can be represented as described in this document, and (barring length restrictions) vice versa.

Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

All signed integers are two's-complement.

Concepts

Tag: Gives the type of a subsequent Value.
Value: Data whose structure and meaning depend on its type.
Tagged Value: A Tag, followed by a Value of the indicated type.
Producer: A program or process that generates data in this format.
Consumer: A program or process that consumes data in this format.
Packed mode: A representation designed to occupy very little space.
Unpacked mode: A representation designed to be easy to manipulate on 32-bit architectures.

Tags

A type tag denotes the type of a subsequent Value. In Packed mode, the tag is a 16-bit signed integer. In Unpacked mode, it is a 32-bit signed integer, aligned to a 4-byte boundary.
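
As a rough illustration of the two tag widths, here is a minimal Python sketch. The function name `encode_tag` is hypothetical, and the little-endian byte order is an assumption made only for this example, since this draft does not yet specify endianness. The sketch does not handle alignment; in Unpacked mode the caller would still need to place the tag on a 4-byte boundary.

```python
import struct

def encode_tag(tag: int, unpacked: bool = False) -> bytes:
    """Encode a tag as a signed 16-bit (Packed) or 32-bit (Unpacked) integer.

    ASSUMPTION: little-endian byte order. The draft leaves endianness
    unspecified, so this choice is for illustration only.
    """
    # "<h" = little-endian int16, "<i" = little-endian int32.
    # struct.pack raises struct.error if the tag does not fit.
    return struct.pack("<i" if unpacked else "<h", tag)
```

Under that assumption, the End tag (-1) occupies two 0xFF bytes in Packed mode and four in Unpacked mode.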

Values

String (UIFTAG_STRING = 0x0000–0x3FFF = 0–16383)

A UTF-8 code sequence. The tag provides the length, in bytes, of the sequence. Producers MUST NOT generate invalid code sequences, including "modified UTF-8" conventions such as overlong (non-zero) encodings of NUL and UTF-8 encoded surrogate pairs. Producers MUST NOT arbitrarily prefix strings with a spurious U+FEFF BYTE ORDER MARK. Consumer handling of invalid code sequences is undefined.

If a Consumer encounters a String where a Byte Array is expected, the Consumer MAY incur a round-trip conversion to its native string type. This may mean that the bytes the Consumer actually sees differ from the original bytes where invalid code sequences occur.

Consumers MUST handle NUL bytes in a String in an appropriate manner. Consumers MUST NOT assume that Strings are NUL-terminated; they are not.

In Unpacked mode, additional zero bytes MUST be added to the end of the String, so that a subsequent Tag will be aligned to a 4-byte boundary.
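
The String rules above could be sketched in Python roughly as follows. The name `encode_string` is hypothetical, little-endian byte order is assumed because the draft does not specify one, and the padding calculation assumes the String's tag already starts on a 4-byte boundary.

```python
import struct

MAX_LENGTH = 0x3FFF  # largest length a String tag can carry

def encode_string(text: str, unpacked: bool = False) -> bytes:
    """Encode a String as a length tag followed by UTF-8 bytes.

    ASSUMPTION: little-endian tags; the draft does not specify endianness.
    """
    # Python's strict UTF-8 encoder rejects lone surrogates and encodes
    # U+0000 as a single zero byte, so no "modified UTF-8" is produced.
    data = text.encode("utf-8")
    if len(data) > MAX_LENGTH:
        raise ValueError("String too long for a String tag")
    out = struct.pack("<i" if unpacked else "<h", len(data)) + data
    if unpacked:
        # Pad with zero bytes so the next tag lands on a 4-byte boundary.
        out += b"\x00" * (-len(out) % 4)
    return out
```

Note that the padding bytes are not counted in the tag, and an embedded NUL is carried like any other byte.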

Byte Array (UIFTAG_BYTE_ARRAY = 0x4000–0x7FFF = 16384–32767)

An arbitrary sequence of bytes. The tag, minus 16384, provides the length in bytes of the sequence.

If a Consumer encounters a Byte Array where a String is expected, the Consumer MUST interpret the Byte Array as if it were a String of the given length.

In Unpacked mode, additional zero bytes MUST be added to the end of the Byte Array, so that a subsequent Tag will be aligned to a 4-byte boundary.
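
A Byte Array differs from a String mainly in the tag offset. The following Python sketch shows one way the tag arithmetic might look; `encode_byte_array` and `byte_array_length` are hypothetical names, and little-endian byte order is again an assumption rather than something this draft states.

```python
import struct

BYTE_ARRAY_BASE = 0x4000  # 16384, the first Byte Array tag
MAX_LENGTH = 0x3FFF       # 0x7FFF - 0x4000

def encode_byte_array(data: bytes, unpacked: bool = False) -> bytes:
    """Encode a Byte Array: tag = 16384 + length, then the raw bytes.

    ASSUMPTION: little-endian tags; the draft does not specify endianness.
    """
    if len(data) > MAX_LENGTH:
        raise ValueError("Byte Array too long for a Byte Array tag")
    fmt = "<i" if unpacked else "<h"
    out = struct.pack(fmt, BYTE_ARRAY_BASE + len(data)) + data
    if unpacked:
        # Zero padding so that a following tag is 4-byte aligned.
        out += b"\x00" * (-len(out) % 4)
    return out

def byte_array_length(tag: int) -> int:
    """Recover the payload length from a Byte Array tag."""
    if not BYTE_ARRAY_BASE <= tag <= 0x7FFF:
        raise ValueError("not a Byte Array tag")
    return tag - BYTE_ARRAY_BASE
```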

End (UIFTAG_END = 0x...FFFF = -1)

A special tag signifying the end of an Array or Compound.

Null (UIFTAG_NULL = 0x...FFFE = -2)

Absence of a value. Equivalent to null and nil in various programming languages.

(Note: there is no tag -3.)

Double (UIFTAG_DOUBLE = 0x...FFFC = -4)

A 64-bit IEEE 754 floating point value. Consumers that encounter a Double where an Integer is expected MAY fail. Producers that are producing a Double which has an exact Integer representation SHOULD produce that Integer instead.

Integer (UIFTAG_INTEGER = 0x...FFFB = -5)

A 32-bit signed integer. Consumers that encounter an Integer where a Double is expected MUST convert the Integer to a Double.
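
The interaction between Double and Integer can be sketched from the Producer's side. In this hypothetical Python helper (`encode_number` is not a name from the draft), a number is emitted as an Integer whenever it is integral and fits in 32 bits, and as a Double otherwise. Packed mode and little-endian byte order are assumptions, and keeping negative zero as a Double is a judgment call that this draft does not address.

```python
import math
import struct

UIFTAG_DOUBLE = -4
UIFTAG_INTEGER = -5
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def encode_number(value) -> bytes:
    """Encode a number in Packed mode, preferring Integer over Double.

    ASSUMPTION: little-endian byte order (unspecified by the draft).
    """
    value = float(value)
    # -0.0 is integral but would lose its sign as an Integer, so it is
    # kept as a Double here. The draft is silent on this case.
    is_negative_zero = value == 0.0 and math.copysign(1.0, value) < 0
    if (value.is_integer()  # False for NaN and infinities
            and INT32_MIN <= value <= INT32_MAX
            and not is_negative_zero):
        return struct.pack("<hi", UIFTAG_INTEGER, int(value))
    return struct.pack("<hd", UIFTAG_DOUBLE, value)
```

A Consumer expecting a Double would then convert any Integer it receives back to a Double, as required above.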

Array (UIFTAG_ARRAY = 0x...FFFA = -6)

A series of Tagged Values, in a particular order, terminated by an End.
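
To illustrate how the End tag delimits an Array, here is a small Python sketch covering only Strings, Nulls, and nested Arrays. The names `encode_value` and `_tag` are hypothetical, and Packed mode with little-endian tags is an assumption.

```python
import struct

UIFTAG_END = -1
UIFTAG_NULL = -2
UIFTAG_ARRAY = -6

def _tag(tag: int) -> bytes:
    # ASSUMPTION: Packed mode, little-endian (unspecified by the draft).
    return struct.pack("<h", tag)

def encode_value(value) -> bytes:
    """Encode None, str, or a list of such values (Packed mode)."""
    if value is None:
        return _tag(UIFTAG_NULL)
    if isinstance(value, str):
        data = value.encode("utf-8")
        if len(data) > 0x3FFF:
            raise ValueError("String too long")
        return _tag(len(data)) + data
    if isinstance(value, list):
        # Elements are written in order, then a single End closes the Array.
        body = b"".join(encode_value(item) for item in value)
        return _tag(UIFTAG_ARRAY) + body + _tag(UIFTAG_END)
    raise TypeError(f"unsupported type: {type(value).__name__}")
```

Because each nested Array carries its own End, a Consumer can parse the structure recursively without knowing the element count in advance.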

Compound (UIFTAG_COMPOUND = 0x...FFF9 = -7)

A series of pairs of Tagged Values. The order of the pairs is not significant. Each pair consists of a Key and a Value, in that order. A Key may be any type except a Byte Array, a Null, an Array, or a Compound. A Value may be any type. The list is terminated by an End. If a Consumer encounters an End as the second element of a pair, the result is undefined.
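
A Compound could be produced along the following lines. This Python sketch handles only String, Integer, True, and False, all of which are permitted as Keys, so a disallowed Key such as Null is rejected as a side effect. The names `encode_scalar` and `encode_compound` are hypothetical, and Packed mode with little-endian byte order is an assumption.

```python
import struct

UIFTAG_END = -1
UIFTAG_INTEGER = -5
UIFTAG_COMPOUND = -7
UIFTAG_TRUE = -9
UIFTAG_FALSE = -10

def _tag(tag: int) -> bytes:
    # ASSUMPTION: Packed mode, little-endian (unspecified by the draft).
    return struct.pack("<h", tag)

def encode_scalar(value) -> bytes:
    """Encode a str, bool, or 32-bit int; every such type is a legal Key."""
    # bool is checked before int because bool is a subclass of int.
    if value is True:
        return _tag(UIFTAG_TRUE)
    if value is False:
        return _tag(UIFTAG_FALSE)
    if isinstance(value, int):
        # struct.error is raised if the value does not fit in 32 bits.
        return struct.pack("<hi", UIFTAG_INTEGER, value)
    if isinstance(value, str):
        data = value.encode("utf-8")
        if len(data) > 0x3FFF:
            raise ValueError("String too long")
        return _tag(len(data)) + data
    raise TypeError(f"unsupported type: {type(value).__name__}")

def encode_compound(mapping: dict) -> bytes:
    """Encode a dict as Key/Value pairs terminated by a single End."""
    out = _tag(UIFTAG_COMPOUND)
    for key, value in mapping.items():
        # Key first, then Value; pair order itself is not significant.
        out += encode_scalar(key) + encode_scalar(value)
    return out + _tag(UIFTAG_END)
```

Since pairs are always written whole, this Producer never emits an End in the Value position, which is the case the text above leaves undefined.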

UUID (UIFTAG_UUID = 0x...FFF8 = -8)

A 128-bit RFC 4122 UUID. Regardless of endianness, the bytes are in display order. Consumers that encounter a UUID where a String is expected MUST convert the UUID to its canonical string representation, in lowercase. Producers and Consumers alike should take note that a random sequence of bytes is not necessarily a valid UUID.
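
In Python, the standard `uuid` module already exposes a UUID's bytes in display order and produces the lowercase canonical string, so the rules above might be sketched as follows. The function names are hypothetical, and the little-endian tag in Packed mode is an assumption. Note that `uuid.UUID(bytes=...)` accepts any 16 bytes and does not check the variant or version fields.

```python
import struct
import uuid

UIFTAG_UUID = -8

def encode_uuid(value: uuid.UUID) -> bytes:
    """Encode a UUID in Packed mode.

    ASSUMPTION: the tag is little-endian (unspecified by the draft).
    The 16 payload bytes are in display order regardless of endianness.
    """
    # UUID.bytes is the big-endian 16-byte form, which matches the
    # order of the hex digits in the printed representation.
    return struct.pack("<h", UIFTAG_UUID) + value.bytes

def uuid_to_string(raw: bytes) -> str:
    """Convert 16 display-order bytes to the lowercase canonical string.

    This performs no RFC 4122 validity check on the bytes.
    """
    return str(uuid.UUID(bytes=raw))
```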

True (UIFTAG_TRUE = 0x...FFF7 = -9)

A boolean true value.

False (UIFTAG_FALSE = 0x...FFF6 = -10)

A boolean false value.
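
The tag assignments in this section can be summarized as a lookup from a decoded tag number to a type name. The Python sketch below is one hypothetical way a Consumer might classify tags (`classify_tag` is not a name from the draft); it operates on the signed integer value, so it applies to both Packed and Unpacked mode once the tag has been read.

```python
# Negative tags defined by this draft. There is no tag -3.
FIXED_TAGS = {
    -1: "End",
    -2: "Null",
    -4: "Double",
    -5: "Integer",
    -6: "Array",
    -7: "Compound",
    -8: "UUID",
    -9: "True",
    -10: "False",
}

def classify_tag(tag: int) -> str:
    """Return the type name for a decoded (signed) tag value."""
    if 0x0000 <= tag <= 0x3FFF:
        return "String"      # the tag itself is the length
    if 0x4000 <= tag <= 0x7FFF:
        return "Byte Array"  # length is tag - 16384
    if tag in FIXED_TAGS:
        return FIXED_TAGS[tag]
    raise ValueError(f"unassigned tag: {tag}")
```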

TODO

This document is incomplete. Still to be written: recommendations on endianness and packing, useful common optimizations.

tim4242 · January 8, 2017

Should the String and Byte Array types be treated as atomic data types or as composites like Compounds and Arrays?

Solra Bizna · January 14, 2017

On 1/8/2017 at 8:22 AM, tim4242 said:

Should the String and Byte Array types be treated as atomic data types or as composites like Compounds and Arrays?

They should be atomic, because they do not contain other Values. (If I have understood the question correctly.)
