draft OETF #1: Cross-Architecture Booting (draft)

Solra Bizna · December 2, 2016

THIS IS A DRAFT. It may change before becoming "official". Please feel free to suggest breaking changes.

Abstract

This document provides guidelines for dealing with EEPROMs and locating architecture-specific boot code.

Rationale

OpenComputers supports a wide variety of architectures. Even more so than in the real world, OpenComputers architectures can differ dramatically from one another. Some architectures run programs in a particular high-level language directly, while others simulate real or fictitious low-level ISAs. Some architectures natively deal with data in 8-bit units, while others have built-in advanced string handling and vector processing capabilities.

In contrast with this variety, OpenComputers components have a standard interface. An EEPROM containing boot code can easily end up being used on the "wrong" architecture, to say nothing of boot disks.

This standard aims to solve that problem by providing architecture-aware guidelines for dealing with EEPROMs, and procedures for locating boot code on filesystem-based and sector-based boot media. It aims to be simple to implement on the widest possible variety of architectures.

Conventions

Unless otherwise specified, all references to text imply 7-bit ASCII codes. Behavior on encountering bytes with the high bit set is undefined.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Architecture Identifier

Each architecture must provide a unique, preferably meaningful identifier which is specific to that architecture. This is the Architecture Identifier, or AID. The AID SHOULD be the same as the Architecture.Name annotation for the architecture, which is usually the same as the name used in the tooltip of the CPU.

An AID MUST contain no bytes other than the following characters: digits, upper and lowercase letters, periods (.), dashes (-), underscores (_), slashes (/), spaces. In addition, an AID MUST NOT begin with a space, end with a space, or contain a run of two or more spaces. An AID SHOULD begin with a capital letter, if for no other reason than to make boot code easier to tell apart from other files and directories in listings.

A new AID SHOULD NOT contain spaces, unless it is required for compatibility.
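
The character rules above can be checked mechanically. The following is a minimal Python sketch, not part of the standard; the function names is_valid_aid and aid_warnings are hypothetical, and rejecting an empty AID is an assumption (the text does not address it explicitly).

```python
import re

# Bytes permitted in an AID: ASCII digits and letters, plus . - _ / and space.
_AID_ALLOWED = re.compile(rb"[0-9A-Za-z._/ -]+")


def is_valid_aid(aid: bytes) -> bool:
    """Return True if `aid` satisfies the MUST-level rules for an AID."""
    if not _AID_ALLOWED.fullmatch(aid):
        return False  # empty (assumed invalid) or contains a forbidden byte
    if aid.startswith(b" ") or aid.endswith(b" "):
        return False  # MUST NOT begin or end with a space
    if b"  " in aid:
        return False  # MUST NOT contain a run of two or more spaces
    return True


def aid_warnings(aid: bytes) -> list:
    """Return notes for SHOULD-level recommendations that `aid` ignores."""
    notes = []
    if not aid[:1].isupper():
        notes.append("does not begin with a capital letter")
    if b" " in aid:
        notes.append("contains spaces (acceptable only for compatibility)")
    return notes
```

Under this sketch, "Lua 5.2" and "OC-ARM" are both valid; "Lua 5.2" additionally produces one warning, because it contains a space.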

An architecture with multiple variants that are mutually incompatible SHOULD use a different AID for each variant. Fallback schemes, such as one where multiple AIDs are tried, are architecture-specific and outside the scope of this document.

CAB-aware EEPROM images ("CABE images")

This section applies to architectures, EEPROM flashing utilities, and any hardcoded boot code (analogous to machine.lua in the "standard" Lua architecture).

A CAB-aware EEPROM image (a "CABE image") MUST begin with "--[", followed by zero or more "=", followed by "[CABE:". This is the prefix string. A CAB-compliant architecture MUST be prepared to deal with any number of "=" between 0 and 7, and a CABE image SHOULD NOT use more than 7.

The prefix string is followed by a single AID. This denotes the "intended architecture" for this CABE image. This is followed by one of two things:

A colon (:), in which case the "main body" consists of the bytes following the colon, up until a valid suffix string is encountered.
A suffix string, in which case the "main body" consists of the entire remainder of the image, and MUST be valid Lua 5.2 AND Lua 5.3 code.

A "suffix string" consists of an ASCII "]", followed by the same number of "=" as were in the prefix string, followed by another "]". In order to be a valid suffix string, it MUST contain precisely the same number of "=" as the prefix string contains. In particular, an otherwise valid suffix string that contains more "=" than the prefix string MUST be treated exactly the same as any other non-suffix-string byte sequence.

The "main body" contains the actual EEPROM data for that architecture. The interpretation of this data is architecture-defined.

If a CABE image contains data after a valid suffix string, that data MUST be Lua code which is cross-compatible between the Lua 5.2 and 5.3 architectures provided by OpenComputers. If the intended architecture of the CABE image is not one of those Lua architectures, this code SHOULD consist solely of an informative error call.

Any deviation whatsoever from this standard results in a non-CABE image. All handling of non-CABE images is architecture-specific, but a non-CABE image SHOULD be treated however the "main body" of a valid CABE image would be treated.
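
The parsing rules in this section can be sketched as a small Python function. This is an illustration under stated assumptions, not a reference implementation: the names CabeImage and parse_cabe are hypothetical, an image in the colon form that never reaches a valid suffix string is assumed to be a non-CABE image, and no limit on the number of "=" is enforced.

```python
import re
from typing import NamedTuple, Optional

# An AID is one or more "words" separated by single spaces, which rules out
# leading spaces, trailing spaces, and runs of spaces.
_WORD = rb"[0-9A-Za-z._/-]+"
_PREFIX = re.compile(rb"--\[(=*)\[CABE:(" + _WORD + rb"(?: " + _WORD + rb")*)")


class CabeImage(NamedTuple):
    aid: bytes        # intended architecture
    main_body: bytes  # architecture-defined EEPROM data
    trailer: bytes    # data after the suffix string (colon form only)


def parse_cabe(image: bytes) -> Optional[CabeImage]:
    """Return the parsed image, or None if `image` is a non-CABE image."""
    m = _PREFIX.match(image)
    if m is None:
        return None
    # The suffix must carry exactly as many "=" as the prefix did.
    suffix = b"]" + m.group(1) + b"]"
    rest = image[m.end():]
    if rest.startswith(suffix):
        # Suffix form: the main body is the entire remainder of the image.
        return CabeImage(m.group(2), rest[len(suffix):], b"")
    if rest.startswith(b":"):
        # Colon form: the main body runs up to the first valid suffix string.
        end = rest.find(suffix, 1)
        if end == -1:
            return None  # assumption: no suffix means the image is non-CABE
        return CabeImage(m.group(2), rest[1:end], rest[end + len(suffix):])
    return None
```

A plain substring search is enough to honor the rule about mismatched "=" counts, because a candidate such as "]==]" never contains "]=]" as a substring.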

The preferred file extension for CABE images is ".cabe". The preferred MIME type is "application/x-cabe-image", though "application/octet-stream" is acceptable. CABE images MUST NOT be distributed using any "text/*" MIME type, as doing so will almost certainly corrupt binary CABE images.

Here is an example CABE image targeting a fictional architecture:

--[[CABE:HyperTalk:
ask "What is your name?"
answer "Hello," && it & "!"
]]
error"HyperTalk architecture required"

And here is one targeting a built-in Lua architecture:

--[[CABE:Lua 5.2]]
for n=1,5 do
  computer.beep(2000, 0.1)
end

EEPROM-based transparent architecture switching

A future OpenComputers release may add support for transparent architecture switching, through additional NBT data for EEPROMs. It is expected that this will consist of a single AID identifying the architecture the EEPROM is designed for. This section applies in the event of such support becoming available. To provide consistency to users, architectures SHOULD NOT attempt to implement any form of automatic architecture switching themselves.

An EEPROM flashing utility SHOULD attempt to parse all images as CABE images. If successful, it SHOULD tag the EEPROM appropriately and burn only the "main body". Otherwise, it SHOULD remove any existing architecture tag from the EEPROM and burn the entire image.

An architecture booting an EEPROM with a valid architecture tag SHOULD NOT also attempt to parse it as a CABE image. An architecture booting an EEPROM with no valid architecture tag SHOULD attempt to parse it as a CABE image, and MUST NOT affix an architecture tag itself.

Boot code

This section applies to EEPROMs intended as first-stage bootloaders, as well as programs intended to be booted by such EEPROMs.

Bootloaders SHOULD, if the boot EEPROM contains a boot device UUID, attempt to boot from that device first. A boot EEPROM contains a boot device UUID if eeprom.getData() consists entirely of an ASCII UUID, or if it begins with an ASCII UUID followed by a null byte.
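
The UUID test described above might look like the following Python sketch. It assumes that "ASCII UUID" means the usual 36-character hyphenated hexadecimal form and that either letter case is acceptable; the document does not spell out either point. The name boot_device_uuid is hypothetical, and data stands in for the value returned by eeprom.getData().

```python
import re
from typing import Optional

# Assumed form: 8-4-4-4-12 hexadecimal digits separated by dashes.
_UUID = (rb"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
         rb"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}")
# The UUID must be the whole data area, or be followed by a null byte.
_BOOT_DATA = re.compile(rb"(" + _UUID + rb")(?:\x00|\Z)")


def boot_device_uuid(data: bytes) -> Optional[str]:
    """Return the boot device UUID stored in `data`, or None if absent."""
    m = _BOOT_DATA.match(data)
    return m.group(1).decode("ascii") if m else None
```

Anything after the null byte is ignored here, which leaves the rest of the data area free for other uses.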

Managed mode (filesystems)

Bootloader behavior on managed filesystems:

If "/<AID>" exists and is a directory, boot "/<AID>/boot".
If "/<AID>" exists and is a file, boot "/<AID>".
Any further cases are architecture-specific.

If one of the above conditions is met, but its booting attempt fails, the booting process MUST NOT continue automatically. For instance, if "/<AID>" is a directory but booting "/<AID>/boot" fails, the bootloader MUST either fail with an error, or prompt a user for further action.

Exactly what "booting" entails is architecture-specific. On a Lua architecture, it consists of loading a file as Lua code and then executing it. On low-level architectures, it might consist of loading a file's contents to a fixed RAM address and jumping into it. Architectures SHOULD provide a standard way for the first-stage bootloader to tell the booted code the UUID of the filesystem it was loaded from.

Example: Consider a boot on the OC-ARM architecture. The bootloader checks if "/OC-ARM" exists. It does exist, and is a directory. The bootloader then attempts to boot "/OC-ARM/boot". The attempt fails, because "/OC-ARM/boot" is not valid. The bootloader crashes the machine with an error message explaining the problem.
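
The managed-mode lookup can be sketched in Python as follows. The exists, is_directory, and boot callbacks are hypothetical stand-ins for whatever filesystem and loader interface an architecture provides, and BootError is an illustrative name; none of them come from OpenComputers itself.

```python
from typing import Callable, Optional


class BootError(Exception):
    """Raised when a selected boot file fails to boot."""


def managed_boot_path(aid: str,
                      exists: Callable[[str], bool],
                      is_directory: Callable[[str], bool]) -> Optional[str]:
    """Return the path to boot for `aid`, or None if neither rule applies."""
    root = "/" + aid
    if not exists(root):
        return None  # further cases are architecture-specific
    if is_directory(root):
        return root + "/boot"
    return root


def boot_managed(aid: str,
                 exists: Callable[[str], bool],
                 is_directory: Callable[[str], bool],
                 boot: Callable[[str], None]) -> bool:
    """Boot from a managed filesystem; return False if no rule matched."""
    path = managed_boot_path(aid, exists, is_directory)
    if path is None:
        return False
    try:
        boot(path)
    except Exception as exc:
        # A matched rule whose boot attempt fails must stop the process
        # rather than fall through to another candidate.
        raise BootError("failed to boot " + path) from exc
    return True
```

Raising instead of returning False on a failed attempt is one way to keep a caller from silently moving on to the next filesystem.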

Unmanaged mode (drives)

A CAB-compliant bootable disk begins with a boot sector. This boot sector MUST be the first or second sector of the drive. If both the first and second sectors contain a valid boot sector, only the first one will be used. A boot sector begins with the ASCII string "CAB", followed by zero or more text boot records. This list of text records is terminated with an exclamation mark. If this exclamation mark is followed by the particular byte sequence {0x00, 0x1A, 0xCA, 0xBD} (null byte, CP/M end-of-file marker, two-byte magic number), then it is followed by zero or more binary boot records, terminated by a null byte.

Boot records MUST NOT extend past the end of the boot sector. Architectures MAY specify that boot records for that architecture must be text or must be binary, and MAY specify that binary boot records must be a particular endianness and/or must be sector-aligned. Bootloaders for architectures that do not specify that boot records must be text or must be binary MUST support both.

A text record matches ":<AID>=<offset>+<length>". <AID> is the AID for which the code is intended. <offset> is a decimal number, giving the byte offset at which to begin reading, OR "s" followed by a decimal number, giving the sector number at which to begin reading. <length> is a decimal number giving the number of bytes to read.
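
As an illustration, the text portion of a boot sector could be parsed along these lines in Python. TextRecord and parse_text_records are hypothetical names, and returning None for a malformed record list is an assumption; the document does not say how such sectors should be handled.

```python
import re
from typing import List, NamedTuple, Optional

_WORD = rb"[0-9A-Za-z._/-]+"
# ":<AID>=<offset>+<length>", where <offset> may carry an "s" prefix.
_RECORD = re.compile(
    rb":(" + _WORD + rb"(?: " + _WORD + rb")*)=(s?)([0-9]+)\+([0-9]+)")


class TextRecord(NamedTuple):
    aid: str
    start: int             # byte offset, or sector number if start_is_sector
    start_is_sector: bool
    length: int            # number of bytes to read


def parse_text_records(sector: bytes) -> Optional[List[TextRecord]]:
    """Parse the text records of a boot sector; None if it is not valid."""
    if not sector.startswith(b"CAB"):
        return None
    pos = 3
    records = []
    while True:
        if sector[pos:pos + 1] == b"!":
            return records  # the exclamation mark terminates the list
        m = _RECORD.match(sector, pos)
        if m is None:
            return None  # assumption: malformed or unterminated list
        records.append(TextRecord(m.group(1).decode("ascii"),
                                  int(m.group(3)),
                                  m.group(2) == b"s",
                                  int(m.group(4))))
        pos = m.end()
```

A sector consisting of just "CAB!" yields an empty list rather than None, which keeps "valid but not bootable" distinct from "not a boot sector".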

A binary record is described by the following C99 structure:

struct {
  uint8_t record_length;
  uint8_t flags;
  uint16_t load_start;
  uint32_t load_length;
  char aid[];
};

record_length is the number that must be added to the offset of this record to skip it. It MUST be equal to 8 + AID length + 1.

flags is a bitfield. The following flags are defined:

0x40: If set, load_start is a sector number. If clear, load_start is a byte offset.
0x80: If set, load_start and load_length are little-endian. If clear, load_start and load_length are in network byte order (big-endian).

load_start is either a sector number or a byte offset to begin the loading process at. load_length is the number of bytes to read. aid is the AID of the intended architecture, and MUST be null-terminated.
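
A decoder for this layout might be sketched in Python as below. It expects the bytes that directly follow the "!" of the text records, starting with the four-byte magic sequence. The names are hypothetical, and the sketch assumes an AID is at least one byte long, so the smallest acceptable record_length is 10.

```python
import struct
from typing import List, NamedTuple, Optional

MAGIC = b"\x00\x1a\xca\xbd"
FLAG_SECTOR = 0x40         # load_start is a sector number
FLAG_LITTLE_ENDIAN = 0x80  # load_start and load_length are little-endian


class BinaryRecord(NamedTuple):
    aid: str
    start: int
    start_is_sector: bool
    length: int


def parse_binary_records(data: bytes) -> Optional[List[BinaryRecord]]:
    """Decode binary boot records; None if the data is malformed."""
    if not data.startswith(MAGIC):
        return None
    pos = len(MAGIC)
    records = []
    while pos < len(data):
        record_length = data[pos]
        if record_length == 0:
            return records  # a null byte terminates the list
        # 8 header bytes + at least 1 AID byte (assumed) + null terminator.
        if record_length < 10 or pos + record_length > len(data):
            return None
        flags = data[pos + 1]
        fmt = "<HI" if flags & FLAG_LITTLE_ENDIAN else ">HI"
        start, length = struct.unpack_from(fmt, data, pos + 2)
        aid_field = data[pos + 8:pos + record_length]
        if not aid_field.endswith(b"\x00") or b"\x00" in aid_field[:-1]:
            return None  # record_length disagrees with the AID's length
        records.append(BinaryRecord(aid_field[:-1].decode("ascii"), start,
                                    bool(flags & FLAG_SECTOR), length))
        pos += record_length
    return None  # ran out of bytes before the terminating null
```

The explicit "<" and ">" prefixes make struct use standard sizes with no padding, matching the unpadded layout of the structure.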

As with managed mode, exactly what is done with the loaded data is architecture-specific. Bootloaders that only support binary records should consider a sector to be a valid boot sector if it begins with "CAB", and locate the end of the text boot records without parsing them by searching for the first "!". Bootloaders that only support text records need not consider any bytes past the first null byte. A boot sector that contains no records is valid, and MUST prevent any attempt to read possible subsequent boot sectors.

Example 1:

CAB:Lua 5.2=s3+17:Lua 5.3=s3+17:HyperTalk=384+5100!
(followed directly by binary data:)
001A CABD (valid binary records follow)
0F (the length of the first record)
C0 (flags: little-endian, load_start is a sector number)
0900 (start at sector number 9)
0000 0100 (load 65536 bytes)
5342 3635 3032 00 (null-terminated string "SB6502")
00 (no more binary records)

This drive contains valid boot code for Lua 5.2, Lua 5.3, HyperTalk, and SB6502. Lua 5.2 and 5.3 both use the same boot code, which is 17 bytes long and starts at the beginning of sector number 3. The HyperTalk boot code is 5100 bytes long and starts 384 bytes into the disk, which, when using 256-byte sectors, is halfway through the second sector. The SB6502 boot code is 65536 bytes long and starts at the beginning of sector number 9.

Example 2:

CAB!

This is a valid boot sector, but contains no boot records. This is the safest way to mark a drive as non-bootable.

Edited March 19, 2017 by Solra Bizna
Remove padding from binary boot records.

Solra Bizna · March 19, 2017

I edited it to remove the 4-byte padding from the binary boot records. My reasoning is that, on IO port heavy 8-bit architectures (such as OCMOS), tracking the current position and skipping a specified number of bytes is clumsier than just reading until the NUL; whereas on alignment-sensitive architectures, it's not terribly difficult to do an unaligned read "by hand".

Wattana · May 28, 2022

Hey! I would like to chime in, since I am developing a custom OS on an architecture that uses the CAB standard. Since some BIOSes, weirdly enough, do not support a byte-offset load start and only offer a sector load start, wouldn't it be better to put everything into the first sector, similar to FAT or MBR?

bioscreeper · October 26, 2023

Can OpenComputers really run multiple architectures natively?
