Adorable-Catgirl

OETF #15 - Universal Archive Format (URF)


This is just a draft; feel free to suggest whatever.

URF v1.1

Abstract

This document describes the URF, intended to make mass file exchange easier.

Rationale

There are many competing methods of exchanging large numbers of files, and many are incomplete, such as TAR implementations, or proprietary, such as NeoPAKv1. With this in mind, a standard format will make file exchange less error-prone.

Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

All strings are encoded with UTF-8, prepended with an ALI corresponding to the length of the string in bytes.

Character specifications such as NULL and DC1 are part of US ASCII, unless otherwise specified.

Concepts

  • Signature: Data at the beginning of the File, marking the File as a URF format archive, and specifying the version.
  • File Table: Data structure containing all file information.
  • Entry: Any data sub-structure in the File Table.
  • Entry Specifier byte: A byte describing how to decode the data contained in an Entry.
  • Object: A filesystem structure, i.e. a "file" or a "directory".
  • Attributes: Data describing the size, offset, parent Object, and Object ID of an Object.
  • Extended Attributes: Data describing non-core attributes such as Permissions, Owner, Security, etc.
  • Producer: A program or process that generates data in this format.
  • Consumer: A program or process that consumes data in this format.

Arbitrary length integers

Arbitrary length integers (ALIs) MAY exceed 64 bits of precision. ALIs are little endian: the first byte carries the least significant bits. For each byte read, take the value of its first seven bits, shift it left by 7 times the number of bytes already read, and add it to the accumulated value. Repeat until a byte whose 8th bit is 0 has been read.
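The decoding rule above can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes the "first seven bits" are the low seven bits of each byte and the "8th bit" is the most significant bit (0x80). The function names encode_ali and decode_ali are hypothetical.

```python
def encode_ali(value: int) -> bytes:
    """Encode a non-negative integer, seven bits per byte, low group first."""
    if value < 0:
        raise ValueError("ALIs are unsigned")
    out = bytearray()
    while True:
        group = value & 0x7F          # low seven bits of what is left
        value >>= 7
        if value:
            out.append(group | 0x80)  # 8th bit set: more bytes follow
        else:
            out.append(group)         # 8th bit clear: this is the last byte
            return bytes(out)


def decode_ali(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode an ALI starting at pos; return (value, position after the ALI)."""
    value = 0
    count = 0                         # number of bytes already read
    while True:
        byte = data[pos]              # raises IndexError on truncated input
        pos += 1
        value |= (byte & 0x7F) << (7 * count)
        count += 1
        if not (byte & 0x80):
            return value, pos
```

Python integers are unbounded, so values wider than 64 bits round-trip without special handling; in other languages a Consumer would need a big-integer type or an explicit size limit.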

Signature

The Signature MUST begin with URF (US-ASCII) followed by a DC1 character. These four bytes read as the unsigned 32-bit integer 1431455249 (0x55524611) in big-endian byte order, or 289821269 (0x11465255) in little-endian byte order. The next two bytes MUST be the version number, the first being the major version, the second being the minor version. The next two bytes MUST be DC2 followed by a NULL character.
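As a rough sketch, the eight Signature bytes could be built and checked like this. The names MAGIC, TRAILER, build_signature and parse_signature are hypothetical, and the sketch assumes the version bytes are plain unsigned bytes.

```python
import struct

MAGIC = b"URF\x11"      # "URF" in US-ASCII followed by DC1 (0x11)
TRAILER = b"\x12\x00"   # DC2 (0x12) followed by NULL (0x00)

# The four magic bytes equal 1431455249 when read as a big-endian integer.
assert struct.unpack(">I", MAGIC)[0] == 1431455249


def build_signature(major: int, minor: int) -> bytes:
    """Return the 8-byte Signature for the given version."""
    return MAGIC + bytes([major, minor]) + TRAILER


def parse_signature(data: bytes) -> tuple[int, int]:
    """Validate the Signature and return (major, minor)."""
    if len(data) < 8 or data[:4] != MAGIC or data[6:8] != TRAILER:
        raise ValueError("not a URF archive")
    return data[4], data[5]
```

For example, build_signature(1, 1) yields the bytes 55 52 46 11 01 01 12 00.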

File Table

The File Table MUST start after the Signature. It MUST contain all Entries, and MUST end with an EOH Entry (see Entry type: EOH).

Entry

An Entry MUST start with an Entry Specifier byte. The Entry Specifier byte MUST be followed by an ALI specifying the length of the data contained. The data contained SHOULD be skipped if the Entry Specifier byte allows it and the Entry type is not understood.
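A minimal sketch of reading one Entry, assuming the length ALI counts the bytes of data that follow it (the text does not state the unit) and that ALIs use the low seven bits of each byte with the 8th bit as a continuation flag. The function name read_entry is hypothetical.

```python
def read_entry(data: bytes, pos: int) -> tuple[int, bytes, int]:
    """Read one Entry at pos; return (specifier, payload, next position)."""
    specifier = data[pos]
    pos += 1

    # Decode the length ALI that follows the Entry Specifier byte.
    length = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        length |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):
            break

    payload = data[pos:pos + length]
    if len(payload) != length:
        raise ValueError("truncated entry")
    return specifier, payload, pos + length
```

Because the length is known before the payload is interpreted, a Consumer can skip an Entry it does not understand by simply continuing from the returned position.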

Entry Specifier byte

An Entry Specifier byte MUST have the 7th bit set to 1. If the Consumer comes across an Entry with the 7th bit not set to 1, the Consumer MUST stop reading the file and raise a fatal error. The 6th bit specifies whether the Entry is critical: the Entry is critical if the 6th bit is set to 0. If the Entry is critical and not understood, the Consumer should raise a fatal error; if the Entry is non-critical, the Consumer SHOULD skip the Entry and continue reading the file. If the 8th bit is set to 1, the Entry is a non-standard extension. Vendors MAY use this range for vendor-specific data.
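The bit tests can be sketched as below. The text does not say how bits are numbered, so this sketch ASSUMES they are counted from 1 starting at the least significant bit (7th bit = 0x40, 6th bit = 0x20, 8th bit = 0x80). Under that reading the uppercase specifiers F, D and Z are critical and the lowercase x is non-critical. The function name classify_specifier is hypothetical.

```python
def classify_specifier(byte: int) -> dict:
    """Classify an Entry Specifier byte (bit numbering is an assumption)."""
    if not (byte & 0x40):                    # 7th bit MUST be 1
        raise ValueError(f"invalid Entry Specifier byte: {byte:#04x}")
    return {
        "critical": not (byte & 0x20),       # 6th bit 0 means critical
        "extension": bool(byte & 0x80),      # 8th bit 1 means non-standard
    }
```

A Consumer would call this before deciding whether an unknown Entry is a fatal error or can be skipped.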

Filesystem Structure

Any Object MUST have a Parent ID and an Object ID. Object ID 0 is reserved for the root directory.
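To illustrate how Parent IDs and Object IDs link Objects into a tree, here is a sketch that rebuilds full paths from a flat mapping. The input layout (Object ID mapped to a name and Parent ID pair), the "/" separator used for display, and the function name build_paths are all assumptions made for this example.

```python
def build_paths(objects: dict[int, tuple[str, int]]) -> dict[int, str]:
    """Map each Object ID to a display path; Object ID 0 is the root."""
    paths: dict[int, str] = {}

    def resolve(object_id: int, seen: tuple = ()) -> str:
        if object_id == 0:
            return ""                         # root directory
        if object_id in paths:
            return paths[object_id]
        if object_id in seen:
            raise ValueError("cycle in Parent IDs")
        name, parent_id = objects[object_id]  # KeyError if the parent is missing
        path = resolve(parent_id, seen + (object_id,)) + "/" + name
        paths[object_id] = path
        return path

    for object_id in objects:
        resolve(object_id)
    return paths
```

Resolving through the parent chain means Entries do not have to appear in any particular order in the File Table for this to work.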

File naming conventions

Object names MUST NOT contain any type of slash. Object names must also follow the 8.3 format, though the full name MAY be specified with Extended Attributes. Full names given in Extended Attributes MUST NOT contain any type of slash either.
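A Producer might check short names along these lines. This sketch assumes "any type of slash" means "/" and "\", and that 8.3 means a base of 1 to 8 characters with an optional extension of 1 to 3 characters; the text does not define the allowed character set, so none is enforced. The function name is_valid_short_name is hypothetical.

```python
def is_valid_short_name(name: str) -> bool:
    """Check a name against the slash rule and an assumed 8.3 layout."""
    if "/" in name or "\\" in name:
        return False
    base, dot, ext = name.partition(".")
    if not 1 <= len(base) <= 8:
        return False
    if dot and not 1 <= len(ext) <= 3:
        return False
    return "." not in ext          # only one dot is allowed
```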

File offsets

File offsets are relative to the end of the File Table.

Entry type: File

The Entry Specifier byte MUST be F, and the data contained MUST be the name as a string, followed by the file offset and the file size, each represented as an ALI, then followed by the Object ID, then the Parent ID.
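A sketch of how a Producer might serialize a File Entry. Note the ASSUMPTION: the text does not say how the Object ID and Parent ID are encoded, so this example writes them as ALIs. It also assumes the Entry length ALI counts the bytes of data contained. The names ali and file_entry are hypothetical.

```python
def ali(value: int) -> bytes:
    """Encode a non-negative integer as an ALI (low seven bits first)."""
    out = bytearray()
    while True:
        group, value = value & 0x7F, value >> 7
        out.append(group | (0x80 if value else 0))
        if not value:
            return bytes(out)


def file_entry(name: str, offset: int, size: int,
               object_id: int, parent_id: int) -> bytes:
    """Build a File Entry: specifier F, length ALI, then the data."""
    raw_name = name.encode("utf-8")
    payload = (
        ali(len(raw_name)) + raw_name   # string: byte length, then UTF-8
        + ali(offset)
        + ali(size)
        + ali(object_id)                # ASSUMPTION: IDs encoded as ALIs
        + ali(parent_id)
    )
    return b"F" + ali(len(payload)) + payload
```

For instance, file_entry("A.TXT", 0, 300, 1, 0) produces an 11-byte payload preceded by F and the length byte 0x0B.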

Entry type: Directory

The Entry Specifier byte MUST be D, and the data contained MUST be the name as a string, followed by the Object ID, then Parent ID.

Entry type: Extended Attributes

The Entry Specifier byte MUST be x, and the data contained MUST be the Object ID of the Object the Entry is describing, followed by a four-byte Attribute name and the value.

Currently recognized attributes (names in US-ASCII) include:

  • NAME: The long name of the Object (String)
  • PERM: The POSIX-compatible permissions of the Object (Unsigned 16-bit integer)
  • W32P: Win32-compatible permissions of the Object, which override POSIX permissions (Unsigned 8-bit Integer, Read-Only [0x01], Hidden [0x02], System [0x04])
  • OTIM: Creation time of the Object (Unsigned 64-bit Integer)
  • MTIM: Modification time of the Object (Unsigned 64-bit Integer)
  • CTIM: Metadata update time of the Object (Unsigned 64-bit Integer)
  • ATIM: Access time of the Object (Unsigned 64-bit Integer)
  • SCOS: Source OS of the Object (String)
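As a small illustration of the W32P attribute listed above, its single byte can be modelled as a flag set. The class name W32P and the member names are chosen for this example only.

```python
import enum


class W32P(enum.IntFlag):
    """Win32-compatible permission bits carried in the W32P attribute."""
    READ_ONLY = 0x01
    HIDDEN = 0x02
    SYSTEM = 0x04


def describe_w32p(value: int) -> list[str]:
    """Return the names of the flags set in a W32P byte."""
    flags = W32P(value)
    return [flag.name for flag in W32P if flag in flags]
```

A value of 0x05, for example, decodes to READ_ONLY and SYSTEM.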

Entry type: EOH

The Entry Specifier byte MUST be Z, and the data contained MUST be the offset required to reach the end of the file.

Compressed URF naming convention

In an environment with long names, the file extension SHOULD be urf followed by a period (.) and the compression method (e.g. gz, lzma, xz, deflate). In an environment with 8.3 names, the file extension MUST be one of the following:

  • UMA for LZMA
  • UXZ for XZ
  • UGZ for Gzip
  • UL4 for LZ4
  • UB2 for BZip2
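The naming rule can be sketched as a lookup. The method keys "lz4" and "bz2" are assumptions (the text only lists gz, lzma, xz and deflate as long-name examples), and the function name archive_name is hypothetical.

```python
# 8.3 extensions from the list above, keyed by an assumed method name.
SHORT_EXTENSIONS = {
    "lzma": "UMA",
    "xz": "UXZ",
    "gz": "UGZ",
    "lz4": "UL4",
    "bz2": "UB2",
}


def archive_name(stem: str, method: str, long_names: bool = True) -> str:
    """Return a file name for a compressed URF archive."""
    if long_names:
        return f"{stem}.urf.{method}"
    # Raises KeyError for methods with no 8.3 extension (e.g. deflate).
    return f"{stem}.{SHORT_EXTENSIONS[method]}"
```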

TODO

Document is incomplete. Document should outline how to build the Filesystem structure, etc. Document should be checked for clarity and rewritten if needed.


For the signature, is the current version of this document 1.0 (1 as major and 0 as minor), or is it something else? It's not written in the document. Also, the document says all strings are encoded in UTF-8 and must be prefixed with their length. Does that apply to the "URF" part of the signature?

 

There are also a lot of oddities in the entries. You say that if the entry is a file, the specifier byte must be "F". That's fine, because the ASCII byte F has its 6th and 7th bits equal to 1. However, if the entry is a directory, you say the specifier byte must be "D", but in ASCII D equals 68 in decimal, and in binary its 7th bit is 0. Also, you say an entry should contain an Object ID and a Parent ID, but where? Is the Parent ID first or last? Is it written as a 32-bit unsigned integer? Is it an ALI? You don't say, which makes it hard to write a standard implementation!

P.S.: Even though the document is imprecise, I managed to make an implementation of it. The first URF file is born :)
