From 1470230b165fc5919512ad93ee420aa360283079 Mon Sep 17 00:00:00 2001
From: Stephen Seo
Date: Tue, 24 Sep 2024 18:43:47 +0900
Subject: [PATCH] Create file format for format version 1

This is in preparation of improving compression by concatenating files
together before compressing them to reduce the per-file overhead.

Discussed in https://git.seodisparate.com/stephenseo/SimpleArchiver/issues/18
---
 file_format.md | 102 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 102 insertions(+)

diff --git a/file_format.md b/file_format.md
index fb9c210..2f4831e 100644
--- a/file_format.md
+++ b/file_format.md
@@ -76,3 +76,105 @@ Following the file-count bytes, the following bytes are added for each file:
         1. 8 bytes 64-bit unsigned integer "size of filename in this archive
            file" in big-endian.
         2. X bytes file data (length defined by previous value).
+
+## Format Version 1
+
+File extension is "*.simplearchive" but this isn't really checked.
+
+First 18 bytes of file will be (in ascii):
+
+    SIMPLE_ARCHIVE_VER
+
+Next 2 bytes is a 16-bit unsigned integer "version" in big-endian. It will be:
+
+    0x00 0x01
+
+Next 4 bytes are bit-flags.
+
+1. The first byte
+    1. The first bit is set if de/compressor is set for this archive.
+
+The remaining unused flags in the previous bit-flags bytes are reserved for
+future revisions and are currently ignored.
+
+If the previous "de/compressor is set" flag is enabled, then the next section is
+added:
+
+1. 2 bytes is 16-bit unsigned integer "compressor cmd+args" in big-endian. This
+   does not include the NULL at the end of the string.
+2. X bytes of "compressor cmd+args" (length defined by previous value). Is a
+   NULL-terminated string.
+3. 2 bytes is 16-bit unsigned integer "decompressor cmd+args" in big-endian.
+   This does not include the NULL at the end of the string.
+4. X bytes of "decompressor cmd+args" (length defined by previous value). Is a
+   NULL-terminated string.
+
+The next 4 bytes is a 32-bit unsigned integer "link count" in big-endian which
+will indicate the number of symbolic links in this archive.
+
+Following the link-count bytes, the following bytes are added for each symlink:
+
+1. 2 bytes bit-flags:
+    1. The first byte.
+        1. The first bit is UNSET if relative links are preferred, and is SET if
+           absolute links are preferred.
+    2. The second byte.
+        1. Currently unused.
+2. 2 bytes is 16-bit unsigned integer "link target absolute path" in
+   big-endian. This does not include the NULL at the end of the string.
+3. X bytes of link-target-absolute-path (length defined by previous value).
+   Is a NULL-terminated string. If the previous "size" value is 0, then
+   this entry does not exist and should be skipped.
+4. 2 bytes is 16-bit unsigned integer "link target relative path" in
+   big-endian. This does not include the NULL at the end of the string.
+5. X bytes of link-target-relative-path (length defined by previous value).
+   Is a NULL-terminated string. If the previous "size" value is 0, then
+   this entry does not exist and should be skipped.
+
+After the symlink related data, the next 4 bytes is a 32-bit unsigned integer
+"chunk count" in big-endian which will indicate the number of chunks in this
+archive.
+
+Following the chunk-count bytes, the following bytes are added for each chunk:
+
+1. 2 bytes that are a 16-bit unsigned integer "file count" in big-endian.
+
+The following bytes are added for each file within each chunk:
+
+1. 2 bytes that are a 16-bit unsigned integer "filename length" in big-endian.
+   This does not include the NULL at the end of the string.
+2. X bytes of filename (length defined by previous value). Is a NULL-terminated
+   string.
+3. 4 bytes bit-flags.
+    1. The first byte.
+        1. The first bit is "user read permission".
+        2. The second bit is "user write permission".
+        3. The third bit is "user execute permission".
+        4. The fourth bit is "group read permission".
+        5. The fifth bit is "group write permission".
+        6. The sixth bit is "group execute permission".
+        7. The seventh bit is "other read permission".
+        8. The eighth bit is "other write permission".
+    2. The second byte.
+        1. The first bit is "other execute permission".
+    3. The third byte.
+        1. Currently unused.
+    4. The fourth byte.
+        1. Currently unused.
+4. Two 4-byte unsigned integers in big-endian for UID and GID.
+    1. A 32-bit unsigned integer in big endian that specifies the UID of the
+       file. Note that during extraction, if the user is not root, then this
+       value will be ignored.
+    2. A 32-bit unsigned integer in big endian that specifies the GID of the
+       file. Note that during extraction, if the user is not root, then this
+       value will be ignored.
+5. A 64-bit unsigned integer in big endian for the "size of file".
+
+After the files' metadata are the current chunk's data:
+
+1. A 64-bit unsigned integer in big endian for the "size of chunk".
+2. X bytes of data for the current chunk of the previously specified size. If
+   not using de/compressor, this section is the previously mentioned files
+   concatenated with each other. If using de/compressor, this section is the
+   previously mentioned files concatenated and compressed into a single blob of
+   data.
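
Review note: to sanity-check the version 1 layout described in this patch, here is a minimal Python sketch of a reader for the archive header and the per-file permission flags. It is not taken from the SimpleArchiver sources. The function names (`read_exact`, `read_cstring`, `parse_header`, `decode_permissions`) are hypothetical, and two points the text leaves open are treated as assumptions: that "the first bit" means the least significant bit of the byte, and that the terminating NULL is stored immediately after the counted string bytes.

```python
import io
import struct

MAGIC = b"SIMPLE_ARCHIVE_VER"  # the 18 ASCII bytes that open the file


def read_exact(stream, n):
    """Read exactly n bytes or raise ValueError on a short read."""
    data = stream.read(n)
    if len(data) != n:
        raise ValueError(f"expected {n} bytes, got {len(data)}")
    return data


def read_cstring(stream):
    """Read a 16-bit big-endian length followed by the string bytes.

    ASSUMPTION: the length excludes the terminating NULL, and the NULL
    byte is stored right after the counted bytes.
    """
    (length,) = struct.unpack(">H", read_exact(stream, 2))
    raw = read_exact(stream, length + 1)
    if raw[-1] != 0:
        raise ValueError("string is not NULL-terminated")
    return raw[:-1]  # returned as bytes; no text encoding is assumed


def parse_header(stream):
    """Parse magic, version, flags, and the optional de/compressor commands."""
    if read_exact(stream, 18) != MAGIC:
        raise ValueError("missing SIMPLE_ARCHIVE_VER magic")
    (version,) = struct.unpack(">H", read_exact(stream, 2))
    if version != 1:
        raise ValueError(f"unsupported version {version}")
    flags = read_exact(stream, 4)
    # ASSUMPTION: "first bit" is the least significant bit of the first byte.
    has_compressor = bool(flags[0] & 0x01)
    header = {"version": version, "compressor": None, "decompressor": None}
    if has_compressor:
        header["compressor"] = read_cstring(stream)
        header["decompressor"] = read_cstring(stream)
    return header


def decode_permissions(flags):
    """Render the 4 per-file flag bytes as an rwxrwxrwx-style string.

    Bits 1-8 of the first byte hold user r/w/x, group r/w/x, other r/w;
    the first bit of the second byte holds other x (same LSB assumption).
    """
    bits = [(flags[0] >> i) & 1 for i in range(8)] + [flags[1] & 1]
    return "".join(c if b else "-" for c, b in zip("rwxrwxrwx", bits))
```

For example, feeding `parse_header` an `io.BytesIO` built from the magic, `0x00 0x01`, a flags field with the first bit set, and two length-prefixed command strings yields both commands as bytes. If the real implementation numbers bits from the most significant end, only the two masks marked as assumptions would need to change.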