Adorable-Catgirl Posted January 14, 2019

This is just a draft; feel free to suggest whatever.

URF v1.1

Abstract

This document describes URF, a format intended to make mass file exchange easier.

Rationale

There are many competing methods of exchanging large numbers of files, and many are incomplete, such as TAR implementations, or proprietary, such as NeoPAKv1. With this in mind, a standard format will make file exchange less error prone.

Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

All strings are encoded with UTF-8 and prefixed with an ALI giving the length of the string in bytes.

Character specifications such as NULL and DC1 are part of US-ASCII, unless otherwise specified.

Concepts

Signature: Data at the beginning of the File, marking the File as a URF format archive and specifying the version.
File Table: Data structure containing all file information.
Entry: Any data sub-structure in the File Table.
Entry Specifier byte: A byte describing how to decode the data contained in an Entry.
Object: A filesystem structure, i.e. a "file" or "directory".
Attributes: Data describing the size, offset, parent Object, and Object ID of an Object.
Extended Attributes: Data describing non-core attributes such as Permissions, Owner, Security, etc.
Producer: A program or process that generates data in this format.
Consumer: A program or process that consumes data in this format.

Arbitrary length integers

Arbitrary length integers (ALIs) MAY be over 64 bits in precision. ALIs are little endian. For each byte, take the value of the first seven bits, shift it left by 7 times the number of bytes read so far, and add it to the value read so far; repeat until a byte whose 8th bit is 0 has been read.
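The ALI rule above can be sketched in Python as follows. This is only an illustration under a few assumptions that the draft does not state: "first seven bits" is taken to mean the low seven bits of each byte, ALIs are treated as unsigned, and the helper names (encode_ali, decode_ali, encode_string, decode_string) are made up for this example.

```python
def encode_ali(value: int) -> bytes:
    """Encode a non-negative integer as an ALI, lowest 7-bit group first."""
    if value < 0:
        raise ValueError("this sketch treats ALIs as unsigned")
    out = bytearray()
    while True:
        group = value & 0x7F          # low seven bits of what is left
        value >>= 7
        if value:
            out.append(group | 0x80)  # 8th bit set: more bytes follow
        else:
            out.append(group)         # 8th bit clear: this is the last byte
            return bytes(out)


def decode_ali(data: bytes, pos: int = 0):
    """Decode an ALI at data[pos]; return (value, position after the ALI)."""
    value = 0
    count = 0                         # number of bytes read so far
    while True:
        if pos >= len(data):
            raise ValueError("truncated ALI")
        byte = data[pos]
        pos += 1
        value += (byte & 0x7F) << (7 * count)
        count += 1
        if not (byte & 0x80):         # 8th bit is 0: stop
            return value, pos


def encode_string(text: str) -> bytes:
    """UTF-8 bytes prefixed with their length (in bytes) as an ALI."""
    raw = text.encode("utf-8")
    return encode_ali(len(raw)) + raw


def decode_string(data: bytes, pos: int = 0):
    """Decode a length-prefixed string; return (text, position after it)."""
    length, pos = decode_ali(data, pos)
    if pos + length > len(data):
        raise ValueError("truncated string")
    return data[pos:pos + length].decode("utf-8"), pos + length
```

With this reading, 300 encodes to the two bytes 0xAC 0x02, and Python's unbounded integers mean values wider than 64 bits round-trip without special handling.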
Signature

The Signature MUST be URF (US-ASCII) followed by a DC1 character, or an unsigned 32-bit big endian integer equal to 1431455249. The next two bytes MUST be the version number, the first being the major version, the second being the minor version. The next two bytes MUST be DC2 followed by a NULL character.

File Table

The File Table MUST start after the Signature. It MUST contain all Entries, and MUST end with an EOH Entry. See EOH Entry.

Entry

An Entry MUST start with an Entry Specifier byte. The Entry Specifier byte MUST be followed by an ALI specifying the length. Data contained SHOULD be skipped if the Entry Specifier byte allows and the Entry Type is not understood.

Entry Specifier byte

An Entry Specifier byte MUST have the 7th bit set to 1. If the Consumer comes across an entry with the 7th bit not set to 1, the Consumer MUST stop reading the file and raise a fatal error. The 6th bit specifies whether the entry is critical. The entry is critical if the 6th bit is set to 0. If the entry is critical and not understood, the Consumer should raise a fatal error; if the entry is non-critical, the Consumer SHOULD skip the entry and continue reading the file. If the 8th bit is set to 1, the Entry is a non-standard extension. Vendors MAY use this range for vendor-specific data.

Filesystem Structure

Any Object MUST have a Parent ID and an Object ID. Object ID 0 is reserved for the root directory.

File naming conventions

Object names MUST NOT contain any type of slash. Object names must also be of the 8.3 format, though the full name MAY be specified with Extended Attributes. Full names in Attributes MUST NOT contain any type of slash.

File offsets

File offsets are relative to the end of the File Table.

Entry Type: File

The Entry Specifier byte MUST be F, and the data contained MUST be the name as a string, followed by the file offset and file size, each represented as an ALI, then followed by the Object ID, then the Parent ID.
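As a rough illustration of the Signature and Entry Specifier byte rules above, here is a small Python sketch. It rests on assumptions the draft leaves open: bits are counted from 1 starting at the least significant bit (so the 6th, 7th and 8th bits are the masks 0x20, 0x40 and 0x80), and the function names are invented for this example. For reference, the four bytes U, R, F, DC1 equal 1431455249 when read as a big-endian 32-bit integer.

```python
SIGNATURE_MAGIC = b"URF\x11"   # "URF" in US-ASCII followed by DC1 (0x11)
SIGNATURE_TAIL = b"\x12\x00"   # DC2 (0x12) followed by NULL (0x00)


def build_signature(major: int, minor: int) -> bytes:
    """Magic, then one byte each for major and minor version, then DC2 NULL."""
    return SIGNATURE_MAGIC + bytes([major, minor]) + SIGNATURE_TAIL


def parse_signature(data: bytes):
    """Check the 8-byte Signature and return (major, minor)."""
    if len(data) < 8:
        raise ValueError("too short to hold a URF Signature")
    if data[:4] != SIGNATURE_MAGIC:
        raise ValueError("not a URF archive")
    if data[6:8] != SIGNATURE_TAIL:
        raise ValueError("Signature is not terminated by DC2 NULL")
    return data[4], data[5]


def classify_specifier(byte: int) -> dict:
    """Interpret an Entry Specifier byte (bit numbering is an assumption)."""
    if not (byte & 0x40):                  # 7th bit MUST be 1
        raise ValueError("invalid Entry Specifier byte: 7th bit is not set")
    return {
        "critical": not (byte & 0x20),     # 6th bit 0 means critical
        "extension": bool(byte & 0x80),    # 8th bit 1 means non-standard
    }
```

Under this bit numbering the uppercase specifiers F, D and Z come out as critical, while the lowercase x (Extended Attributes) comes out as non-critical, so a Consumer that does not understand it can skip it.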
Entry Type: Directory

The Entry Specifier byte MUST be D, and the data contained MUST be the name as a string, followed by the Object ID, then the Parent ID.

Entry Type: Extended Attributes

The Entry Specifier byte MUST be x, and the data contained MUST be the Object ID of the Entry that this Entry describes, followed by a four-byte Attribute and the value. Currently recognized attributes, with names in US-ASCII:

NAME: The long name of the Object (String)
PERM: The POSIX-compatible permissions of the Object (Unsigned 16-bit Integer)
W32P: Win32-compatible permissions of the Object, which override POSIX permissions (Unsigned 8-bit Integer; Read-Only [0x01], Hidden [0x02], System [0x04])
OTIM: Creation time of the Object (Unsigned 64-bit Integer)
MTIM: Modification time of the Object (Unsigned 64-bit Integer)
CTIM: Metadata update time of the Object (Unsigned 64-bit Integer)
ATIM: Access time of the Object (Unsigned 64-bit Integer)
SCOS: Source OS of the Object (String)

Entry Type: EOH

The Entry Specifier byte MUST be Z, and the data contained MUST be the offset required to reach the end of the file.

Compressed URF naming convention

In an environment with long names, the file extension SHOULD be urf followed by a period (.) and the compression method (e.g. gz, lzma, xz, deflate). In an environment with 8.3 names, the file extension MUST be one of the following:

UMA for LZMA
UXZ for XZ
UGZ for Gzip
UL4 for LZ4
UB2 for BZip2

TODO

Document is incomplete.
Document should outline how to build the Filesystem structure, etc.
Document should be checked for clarity and rewritten if needed.
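For the compressed naming convention in the post above, a minimal Python sketch might look like this. The function name, its parameters, and the lowercase method keys "lz4" and "bz2" are assumptions made for illustration; the draft only names the 8.3 extensions and gives gz, lzma, xz and deflate as example long-name suffixes.

```python
# 8.3 extensions listed in the draft, keyed by an assumed method name.
SHORT_EXTENSIONS = {
    "lzma": "UMA",
    "xz": "UXZ",
    "gz": "UGZ",
    "lz4": "UL4",   # key name is an assumption
    "bz2": "UB2",   # key name is an assumption
}


def compressed_name(stem: str, method: str, long_names: bool = True) -> str:
    """Build a file name for a compressed URF archive."""
    if long_names:
        # Long-name environments: "urf", a period, then the method.
        return f"{stem}.urf.{method}"
    try:
        return f"{stem}.{SHORT_EXTENSIONS[method]}"
    except KeyError:
        # The draft lists no 8.3 extension for this method (e.g. deflate).
        raise ValueError(f"no 8.3 extension defined for {method!r}") from None
```

For example, compressed_name("backup", "gz") gives "backup.urf.gz", while compressed_name("BACKUP", "xz", long_names=False) gives "BACKUP.UXZ".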
Zen1th Posted April 12, 2019

For the signature, is the current version of this document 1.0 (1 as major and 0 as minor), or is it something else? It's not written in the document. Also, the document says all strings are encoded in UTF-8 and must be prefixed with their length; does that apply to the "URF" part of the signature?

There are also a lot of oddities in the entries. You say that if the entry is a file, the specifier byte must be "F". That's OK, because the ASCII F byte has its 6th and 7th bits equal to 1. However, if the entry is a directory, you say the specifier byte must be "D", but in ASCII D equals 68 in decimal, and in binary the 7th bit is 0. Also, you say an entry should contain an Object ID and a Parent ID, but where? Is the Parent ID first or last? And is it written as a 32-bit unsigned integer? Is it an ALI? You don't say, which makes it hard to write a standard implementation!

P.S.: Even though the document is imprecise, I managed to make an implementation of it. The first URF file is born.
Adorable-Catgirl Posted May 29, 2019

Sorry about that, haha. Feel free to help revise it. I just haven't worked on any OC stuff in a while.