Mega Drive simple filesystem design

Discussions on more advanced topics such as monolithic vs micro-kernels, transactional memory models, and paging vs segmentation should go here. Use this forum to expand and improve the wiki!
Post Reply
User avatar
Sik
Member
Member
Posts: 251
Joined: Wed Aug 17, 2016 4:55 am

Mega Drive simple filesystem design

Post by Sik »

Just gonna be writing down some thoughts here. Don't be surprised if this post feels a bit unfocused.

Have been considering giving another try at an operating system for the Mega Drive (albeit not with discrete windows =P). Not really anything worth showing yet, but been thinking about how I'd do the filesystems. There would be two partitions for now: a ROM partition (which contains the programs and such) and a SRAM partition (for all writable files, at least for now). Still writing the driver for the ROM partition, which probably won't even have clusters since it's random access and I don't have to worry about fragmentation when nothing can be moved around.

Focusing on the format for directories right now. Each file entry would be 16 bytes (yes, I know, FAT spends 32 bytes despite originally being made for smaller partitions, but whatever). It only has the bare minimum needed, so no creation date or stuff like that (not like I have a RTC at hand either). It would look as follows:
  • 8 bytes: name
  • 2 bytes: extension
  • 3 bytes: length
  • 3 bytes: offset
File name and extension won't be encoded as ASCII but rather as a much more restricted charset that has up to 40 codepoints. Then I can cram three characters into two bytes (char1 × 40² + char2 × 40 + char3, for a total of 64000 combinations). This means the actual filenames are 12.3 (and yes, those three extra characters are a huge breathing room compared to 8.3 in DOS). The charset currently looks as follows (codepoint 0 is a space, needed for padding):

Code: Select all

  A B C D E F G
H I J K L M N O
P Q R S T U V W
X Y Z 0 1 2 3 4
5 6 7 8 9
(there's room for three more symbols, later I'll see what to do with them)

One obvious issue is that there's no way to tell apart a file from a subdirectory. Easy: subdirectories don't have an extension, and their extension will be an invalid value (i.e. larger than 63999). I may use invalid extension values in other ways too, seeing as I have 1536 of them.

Length and offset can address up to 16M only, but shouldn't be an issue seeing as the relevant partitions are 2MB anyway. What they address will be probably different for each of them, with the ROM I can just use raw addresses, while with the SRAM partition I may have to resort to some sort of clusters to allow fragmentation (not a big deal since seeking times are a non-issue, so it'd be actually desirable in this case).
Techel
Member
Member
Posts: 215
Joined: Fri Jan 30, 2015 4:57 pm
Location: Germany
Contact:

Re: Mega Drive simple filesystem design

Post by Techel »

I wonder why splitting the filename into a name and extension at all.
alexfru
Member
Member
Posts: 1112
Joined: Tue Mar 04, 2014 5:27 am

Re: Mega Drive simple filesystem design

Post by alexfru »

Having to use a very long division to extract a file name from a record like that smells like a pretty bad design. Better use 6 bits per character and simple bit manipulations and you can have a 13-char name and a room for A-Z, a-z, 0-9, and an underscore. Out of the 10 bytes for the name/extension you have left 2 spare bits, one of which you can use to mark a directory. The other bit you may use to indicate that there's no dot and the extension is just a continuation of the name.
User avatar
Sik
Member
Member
Posts: 251
Joined: Wed Aug 17, 2016 4:55 am

Re: Mega Drive simple filesystem design

Post by Sik »

Division is not that bad seeing as it's done only once in a long while (and the 68000 does a division in like ~140 cycles).

Honestly in hindsight though now I don't know why I'm even bothering, the whole idea of cramming three characters into two bytes came from when I was limited to 64KB (or worse, 32KB) since that's pretty much the only thing anything would give you, but now I have an emulator that emulates 2MB of SRAM (and on your own cartridge on real hardware you can have as much as you want) so there's probably no point in keeping records that tiny. Back to the design board I guess.

Even then some restriction may still be nice even if just to make things easier on the system. But maybe that's dumb, if I ever add support for SD cards I'll have to add support for 256 bytes long filenames anyway I guess =P
User avatar
Schol-R-LEA
Member
Member
Posts: 1925
Joined: Fri Oct 27, 2006 9:42 am
Location: Athens, GA, USA

Re: Mega Drive simple filesystem design

Post by Schol-R-LEA »

Three things: first, I assume that when you said that FAT uses 32 bytes per entry, you were talking about the file records in the directory table, not the FAT entries, but it wasn't entirely clear.

There were several versions of FAT, most of which were named for the size of the file allocation table entries: for example, the original FAT8 (which was used in a product called 'Standalone Disk BASIC-86' before MS-DOS was developed, for use with single-sided 5 1/4" disks) used only 8 bits, FAT12 (used for floppy disks in MS-DOS) used 12, etc. However, these related to the blocks that held the file contents, while the information about the files themselves (name, attributes, etc.) were in a separate table. I don't know offhand how large the available space will be on a MegaDrive's secondary storage, but I presume that the smaller maximum file offset (24 bits -> 16,777,215 sectors/clusters) is reasonable for your hardware.

Second, as has already been mentioned, having a hardcoded, fixed-length extension was a bad idea for DOS, and it's still a bad idea (actually, I would argue that file extensions are a bad idea in general). Even if you are enforcing file types, using the extension is a bad way to do it; on the one hand, for a system like this one you probably won't have 17576 file types (26^3, the number of permutations of three case-insensitive letters from the Latin alphabet), so it would be too many bits to represent fixed types, while at the same time, it is too few to really give a useful file extension for some things (e.g., HTML, which infamously lead to people using the '.htm' extension for years). If you are going to have typed files, use a one-byte type marker for the file entries and have a separate table mapping to known file types, that way you can either skip the extensions altogether, or have the file system add them automagically.

Third, if you are going to that much trouble in setting up compressed filenames, just bite the bullet and use a bitwise Huffman encoding with a fixed table seeded on the letter frequencies for your native language (e.g., ETAOINSHRDLU...0123456789-_ for English). It wouldn't be consistent in the filename length (assuming 26 single-case letters + 10 digits + underscore + hyphen + space + period = 40 code points for a maximum required bit width of 6, with an 8-byte field width, it would vary between 18 and 64 characters), but it could be compressed and extracted with just bitwise operations and so would be reasonably fast.
Rev. First Speaker Schol-R-LEA;2 LCF ELF JAM POEE KoR KCO PPWMTF
Ordo OS Project
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.
Post Reply