A very simple file system design

Discussions on more advanced topics such as monolithic vs micro-kernels, transactional memory models, and paging vs segmentation should go here. Use this forum to expand and improve the wiki!
Post Reply
User avatar
Candy
Member
Member
Posts: 3882
Joined: Tue Oct 17, 2006 11:33 pm
Location: Eindhoven

A very simple file system design

Post by Candy »

Hi all,

I'm back to the OS dev direction again. I'm reworking my entire OS code, loader code and bootup code and ended up with loading files from a filesystem, where I decided to make a simple filesystem design that basically does the same thing as FAT - ie, simple file storage with no complicated structures - but seen from a 2014 perspective.

My current design:

Code: Select all

struct extent {
  uint64_t startblock;
  uint64_t length;
};

struct inode {
  uint256_t filehash;
  extent storage;
  uint64_t filesize;
  time_t creation;
  time_t modification;
  time_t access;
  uint32_t flags;
  uint32_t pad[12];
};

enum {
  INO_SYSTEM =0x0001,
  INO_MODIFIED = 0x0002,
  INO_READONLY = 0x0004,
  INO_HASHSTALE = 0x0008,
  INO_EXTENT_INDIRECTION = 0x0030,
  INO_DIRECTORY = 0x0040
};

struct direntry {
  uint64_t inodeno;
  uint32_t filenamelength;
  char[] name; // null-terminated, length includes terminator
};

struct directory {
  uint32_t formatid = 1;
  uint32_t entrycount;
  direntry first;
};
Basic idea is, a partition is split into 4k blocks starting from the exact start. Everything occupies blocks. Every thing on the disk is stored as files too; including the inode table itself. The disk contains (currently thought out) 6 files that are pretty much always present; the first three in fixed locations:

0. Boot block, 8 blocks (32KB) from the exact start (block 0).
1. Partition header (undefined currently), describing the partition where necessary. May be dropped if not useful. 4KB at block 8.
2. Inode table. Contains description for itself, so it is as long as it needs to be. Always contiguous though.
3. Root directory, lists file names.
4. Free block list, contains all extents that are not a part of an existing file.
5. Bad block list, contains all blocks that logically exist, but should not be used for another reason.

To read a file,

1. Read inode table first block.
2. Read root directory fully, find file you want.
3. Read inode for subdir, find file you want (repeat as necessary)
4. Read inode for file, read file.

To boot,

1. Load boot block fully from boot sector (if your platform requires)
2. Load files as desired, load any other things as desired.


Any ideas? Just wanted to reflect on this idea for a bit to see if it could be simplified or if there are major issues for writing.
User avatar
AndrewAPrice
Member
Member
Posts: 2300
Joined: Mon Jun 05, 2006 11:00 pm
Location: USA (and Australia)

Re: A very simple file system design

Post by AndrewAPrice »

It's unclear how you handle chaining multiple blocks together for large files (maybe I'm overlooking something implied?)
My OS is Perception.
User avatar
bluemoon
Member
Member
Posts: 1761
Joined: Wed Dec 01, 2010 3:41 am
Location: Hong Kong

Re: A very simple file system design

Post by bluemoon »

You may want to take a look into File_Systems, especially SFS.

Anything aim for simplicity but more complex than a "file list" and "file content" defeat the purpose of being simple, and not worth inventing.
User avatar
Candy
Member
Member
Posts: 3882
Joined: Tue Oct 17, 2006 11:33 pm
Location: Eindhoven

Re: A very simple file system design

Post by Candy »

MessiahAndrw wrote:It's unclear how you handle chaining multiple blocks together for large files (maybe I'm overlooking something implied?)
No, I just did a terrible job of explaining. There is a bit mask in the flags, INO_EXTENT_INDIRECTION, which is either 0, 1, 2 or 3. If it's zero then the extent maps the storage area for the file directly. If it is 1, then the extent contains extends, which contain the file. If it is 2, then the extent contains extents that contain extents, that contain the file area. Similarly for 3.

The idea is, if you have a sequentially stored file you just store "starts at X, length Y". If you have a fragmented file, you have a list of such fragments, to which a single extent points. Two or three should only be necessary for very sparse files or for very fragmented file systems. You are always guaranteed to be able to store files up to (4k / 16b = 256) ^ 3 * 4k = 64 GB, assuming enough free space and a lot of fragmentation. The idea is similar to ext2, except with extents.
bluemoon wrote:You may want to take a look into File_Systems, especially SFS.
I helped design it and I know that it is not suitable as boot partition, as the wiki itself also claims. It does not support directories or fragmentation and both of those are IMO required for a serious file system that's not just used for transport - which SFS was intended for.
Anything aim for simplicity but more complex than a "file list" and "file content" defeat the purpose of being simple, and not worth inventing.
Make it as simple as it can be, but no simpler. I want it to support fragmentation (so you don't need to defragment to use it at all) and I want it to support directories, so you do not end up with everything in a single folder.
User avatar
bluemoon
Member
Member
Posts: 1761
Joined: Wed Dec 01, 2010 3:41 am
Location: Hong Kong

Re: A very simple file system design

Post by bluemoon »

Candy wrote:Make it as simple as it can be, but no simpler. I want it to support fragmentation (so you don't need to defragment to use it at all) and I want it to support directories, so you do not end up with everything in a single folder.
I just misunderstood what you meant simple, I though it's simple for developer - which tends to lack features; and what you wanted is simple but with some certain non-trivial features.
IMO This put it onto same level as FAT or EXT, it is still simple to some extend, and is fine.

Back to the topic, you may want to add "meta block" (maybe many different type of meta block) to describe directory and files, including inode list, tags and other things.
User avatar
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: A very simple file system design

Post by Brendan »

Hi,
Candy wrote:Any ideas? Just wanted to reflect on this idea for a bit to see if it could be simplified or if there are major issues for writing.
Just a random idea....

One of the things I'd be very tempted to do is to split it into 2 halves. The lower half would be responsible for managing storage space, and would provide an interface for storing, resizing and retrieving "arrays of bytes" where each array of bytes has a unique (integer) identifier. The upper half would provide the normal file system interface (for creating, reading and deleting files and directories; including handling all meta-data, etc); and would rely on the lower half to take care of managing the storage space.

The idea here is that you could have several different lower halves (e.g. one optimised for RAM disks, one for SSD, one for USB flash, one for rotating disks, some that do encryption and/or compression, some that are experimental and some that are stable, etc) and several different upper halves (some that use journaling and some that don't, some with different meta-data, some that support for checkpointing and rollbacks, some with "file content search" indexing, some that are experimental and some that are stable, etc).

The end-user would be able to mix & match - choose any lower half and use it with any upper half, to get a combination that's customised for their hardware and whatever they're using that file system for.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
embryo

Re: A very simple file system design

Post by embryo »

Candy wrote:I want it to support fragmentation (so you don't need to defragment to use it at all) and I want it to support directories, so you do not end up with everything in a single folder.
So, you want to create "a bit better FAT"? Does it worth the time spent? It can be interesting for personal OS, like a puzzle is interesting to solve, but not for many people except the author. However, I am not in a position to discourage your search for better world.
willedwards
Member
Member
Posts: 96
Joined: Sat Mar 15, 2014 3:49 pm

Re: A very simple file system design

Post by willedwards »

What is the filehash in the inode and what's it used for?
User avatar
Combuster
Member
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance
Contact:

Re: A very simple file system design

Post by Combuster »

Well, indirection count is perhaps one way of solving it, but it puts a hard limit on the number of extents (and if I got the math right, you can run out with a >128MB file on a 512-byte block medium), and doesn't prevent you from doing a inorder traversal of the entire tree to be able to seek to a point near the end. Implementing a linked list by having the first/last extent of an indirect block function as an indirection saves you from an algorithmic problem.



That leaves two half-related queries from my end. Why not just Ext2 (with the 64-bit extensions since you appear to be wanting that) since that saves you a fair share of tool-related issues, and why not/what happened to *fs?
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]
User avatar
Candy
Member
Member
Posts: 3882
Joined: Tue Oct 17, 2006 11:33 pm
Location: Eindhoven

Re: A very simple file system design

Post by Candy »

willedwards wrote:What is the filehash in the inode and what's it used for?
That was to allow files to be stored once, with a copy-on-write flag on it to save space. It is in the wrong place though, that should be a file system service built on top of this, so I removed it (and the timestamps) in my local update. Will post it tonight when I have a bit more free hand position.
Combuster wrote:Well, indirection count is perhaps one way of solving it, but it puts a hard limit on the number of extents (and if I got the math right, you can run out with a >128MB file on a 512-byte block medium), and doesn't prevent you from doing a inorder traversal of the entire tree to be able to seek to a point near the end. Implementing a linked list by having the first/last extent of an indirect block function as an indirection saves you from an algorithmic problem.
Blocks are always 4K, making the minimum supported file size 64GB. You have to do an in-order traversal of the full tree. Note that this is a mechanism mostly there for making it work in the worst case, not for use in common or optimal case. Optimally, files are always in one chunk so there is no indirection, typically files will be in one or (up to) 4-5 chunks, making a single indirection enough and the caching or inorder traversal moot. On the worst-case fragmented filesystem it's not going to be fast anyway - I've implemented a worst-case test for an employer once and it crashed both our proprietary system (causing a 30 second clip that should've been written in real time to take 3 hours to write) and the Windows machine used for reading it on later (which just flat out refused & crashed).

I want the setup to be simple to understand and it to be fast in typical and optimal cases, as well as it not breaking in worst case. Optimal case is identical to Brendan's SimpleFS and Microsoft ExFAT, typical case is possible (compared to SFS) and faster (compared to ExFAT).
That leaves two half-related queries from my end. Why not just Ext2 (with the 64-bit extensions since you appear to be wanting that) since that saves you a fair share of tool-related issues
Lot of legacy. I'm not so much afraid of having to make tools, but I am afraid of needless complexity of the tools I have to make because of legacy. One of the main reasons for this is to have a really simple file system format, that does not hide any filesystem bits behind weird access, that allows for a simple implementation above all and that is not curled up in legacy like Ext4/3/2/1, or ExFAT/FAT32/FAT16/FAT12.
, and why not/what happened to *fs?
Same thing, unnamed as of yet as I'm not sure I want to take it in the same direction as *FS. Thing is, that name was used a while ago and has a certain meaning that this will not have, so I didn't want to tag it with the name, but you may as well consider it a further evolution of what *FS was.
User avatar
Candy
Member
Member
Posts: 3882
Joined: Tue Oct 17, 2006 11:33 pm
Location: Eindhoven

Re: A very simple file system design

Post by Candy »

So... I decided to start work on the FS driver, the boot code and an MKFS tool. I found out that you can as a regular user mount & unmount FUSE file systems, which means that if you have a working FUSE driver and an MKFS that makes a blank FS, you're set. So I made them.

Because the design is so simple (intentionally) the mkfs tool is 110 lines of code total. It creates a filesystem of desired size (either a file or a device can be targeted) and will fill those with the FS-required files. The root directory contains them under their own names.

A hexdump after MKFS shows:

Code: Select all

00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00009000  24 00 00 00 01 00 00 00  00 80 00 00 00 00 00 00  |$...............|
00009010  00 00 00 00 00 00 00 00  08 00 00 00 00 00 00 00  |................|
00009020  24 00 00 00 01 00 00 00  00 10 00 00 00 00 00 00  |$...............|
00009030  08 00 00 00 00 00 00 00  01 00 00 00 00 00 00 00  |................|
00009040  24 00 00 00 01 00 00 00  00 80 00 00 00 00 00 00  |$...............|
00009050  09 00 00 00 00 00 00 00  08 00 00 00 00 00 00 00  |................|
00009060  44 00 00 00 06 00 00 00  00 10 00 00 00 00 00 00  |D...............|
00009070  11 00 00 00 00 00 00 00  01 00 00 00 00 00 00 00  |................|
00009080  24 00 00 00 01 00 00 00  00 e0 fe 00 00 00 00 00  |$...............|
00009090  12 00 00 00 00 00 00 00  ee 0f 00 00 00 00 00 00  |................|
000090a0  24 00 00 00 01 00 00 00  00 00 00 00 00 00 00 00  |$...............|
000090b0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00011000  01 00 00 00 06 00 00 00  00 00 00 00 0b 00 24 62  |..............$b|
00011010  6f 6f 74 62 6c 6f 63 6b  00 01 00 00 00 08 00 24  |ootblock.......$|
00011020  68 65 61 64 65 72 00 02  00 00 00 08 00 24 69 6e  |header.......$in|
00011030  6f 64 65 73 00 03 00 00  00 0f 00 24 72 6f 6f 74  |odes.......$root|
00011040  64 69 72 65 63 74 6f 72  79 00 04 00 00 00 06 00  |directory.......|
00011050  24 66 72 65 65 00 05 00  00 00 0b 00 24 62 61 64  |$free.......$bad|
00011060  62 6c 6f 63 6b 73 00 00  00 00 00 00 00 00 00 00  |blocks..........|
00011070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
01000000
And listing from Linux shows

Code: Select all

-rwxr-xr-x 1 pebi pebi 16777216 Jun 15 13:35 testfs.starfs
pebi@box:~/projects/atlantisos/mnt$ ls -l
total 24
-rw-r--r-- 1 root root        0 Jan  1  1970 $badblocks
-rw-r--r-- 1 root root    32768 Jan  1  1970 $bootblock
-rw-r--r-- 1 root root 16703488 Jan  1  1970 $free
-rw-r--r-- 1 root root     4096 Jan  1  1970 $header
-rw-r--r-- 1 root root    32768 Jan  1  1970 $inodes
drw-r--r-- 6 root root     4096 Jan  1  1970 $rootdirectory
pebi@box:~/projects/atlantisos/mnt$ 
Next thing is to add write support so I can write to it from the FUSE driver too. You can already recurse into $rootdirectory and get to the same dir again, so that looks to be working. Not the intended use of course, but it is a logical consequence.

For others following along, the updated structures are as follows:

Code: Select all

#define INVALID_INODE 0xFFFFFFFFFFFFFFFFULL
#define INO_EXTENT_INDIRECTION 0x00000003
#define INO_SYSTEM 0x00000004
#define INO_COPY_ON_WRITE 0x00000008
#define INO_MODIFIED 0x00000010
#define INO_READONLY 0x00000020
#define INO_DIRECTORY 0x00000040

struct extent {
  uint64_t startblock;
  uint64_t length;
};

struct inode {
  uint32_t flags;
  uint32_t linkcount;
  uint64_t filesize;
  struct extent ext;
} *inodetab;

struct dirformat {
  uint32_t formatid;
  uint32_t entrycount;
};

struct dirent {
  uint32_t inodeno;
  uint16_t filenamelength;
  char *filename;
} __attribute__((__packed__));
Directly grabbed from the source code.

Any ideas? I'll see if I will open source the tooling; the design is open in any case.
jnc100
Member
Member
Posts: 775
Joined: Mon Apr 09, 2007 12:10 pm
Location: London, UK
Contact:

Re: A very simple file system design

Post by jnc100 »

User-defined properties as name/value pairs? e.g. to support metadata on files or ownership/ACL information. You wouldn't have to add support yet, but maybe just reserve another entry in the dirent structure which points to a separate inode containing properties for the file/directory in question (if it has any).

Regards,
John.
User avatar
Candy
Member
Member
Posts: 3882
Joined: Tue Oct 17, 2006 11:33 pm
Location: Eindhoven

Re: A very simple file system design

Post by Candy »

Sounds good. Will add that.
Post Reply