Using FUSE as a VFS

rdos · Post by **rdos** » Wed Mar 17, 2021 5:02 am

The idea of using FUSE to get access to a lot of filesystems without much effort, particularly ext and NTFS, is compelling. I wonder how well this would work in practice.

For example, the fatfuse that is part of Ubuntu (it no longer seems to be available as an independent project) has a lot of problems. It can only be used by mounting a file in the local filesystem, and then normal file operations are used instead of the read/write sector interface that would be expected if real hardware were used. It's not supported by Cygwin (in fact, no FUSE filesystem is part of Cygwin), so testing it would require a real Linux box. Another problem is that it is not re-entrant, and the code relies on a higher-level interface where paths are parsed internally. I think the low-level interface must be used with FUSE, but it's up to the implementer to decide which interface to use. That limits the number of usable FUSE filesystems.

Another problem is that there is no ext4 FUSE driver with write support, and no Reiser or other complex filesystem. There is an NTFS FUSE implementation, but one might wonder whether it can be reused, or whether it must be more or less rewritten from scratch to provide a reasonable, re-entrant interface.

So, the question is: does FUSE actually add much usefulness?

Velko · Post by **Velko** » Wed Mar 17, 2021 7:16 am

The FUSE implementations of the filesystems you mention probably don't get much love, because Linux has native kernel-mode implementations of those filesystems. I would imagine that they are more like proof-of-concept or research projects.

FUSE adds value with different kind of filesystems, such as SshFs or WebDavFs. Implementing those in-kernel would not be practical.

bzt · Post by **bzt** » Wed Mar 17, 2021 10:04 pm

rdos wrote:The idea of using FUSE to get access to a lot of filesystems without much effort, particularly ext and NTFS, is compelling. I wonder how well this would work in practice.

I wouldn't say "without much effort", because you'll have to port a Linux kernel driver to your OS. But yes, you do that once and you get access to many, many file systems. I wrote about it on the wiki, but that page is more focused on how to implement a FUSE driver than on how to port it. In a nutshell, it has 3 components:
1. a kernel driver: you must implement the syscalls
2. a user-space library: you can port libfuse, or you could implement one with the same API that uses your own syscalls; that's a possibility too
3. a user-space application that handles the file system (if you did steps 1. and 2. properly, this should compile on your OS without probs)
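To make the three components a bit more concrete, here is a minimal sketch of the hand-off between them: the kernel driver (1) turns VFS calls into request messages, and the user-space library (2) decodes each request and calls the callbacks of the filesystem application (3). The message layout, the opcodes and the `fs_dispatch`/`fs_ops` names are hypothetical and deliberately much simpler than the real FUSE wire protocol; the tiny read-only "hello" filesystem only exists to have something to dispatch to.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical opcodes; the real FUSE protocol defines its own. */
enum fs_opcode { FS_LOOKUP = 1, FS_READ = 2 };

struct fs_request {              /* what the kernel driver (1) would queue */
    uint32_t opcode;
    uint64_t ino;                /* object the request refers to */
    uint64_t offset;             /* FS_READ: byte offset */
    uint32_t size;               /* FS_READ: bytes wanted */
    char     name[256];          /* FS_LOOKUP: one path component */
};

struct fs_reply {
    int32_t  error;              /* 0 or a negative errno value */
    uint64_t ino;                /* FS_LOOKUP result */
    uint32_t len;                /* FS_READ: bytes placed in data */
    char     data[4096];
};

/* Callbacks supplied by the filesystem application (3). */
struct fs_ops {
    int (*lookup)(uint64_t parent, const char *name, uint64_t *ino_out);
    int (*read)(uint64_t ino, uint64_t off, char *buf, uint32_t size);
};

/* The library (2): decode one request and call the matching callback. */
void fs_dispatch(const struct fs_ops *ops, const struct fs_request *req,
                 struct fs_reply *rep)
{
    assert(ops && req && rep);
    memset(rep, 0, sizeof *rep);
    switch (req->opcode) {
    case FS_LOOKUP:
        rep->error = ops->lookup(req->ino, req->name, &rep->ino);
        break;
    case FS_READ: {
        uint32_t want = req->size;
        if (want > sizeof rep->data)
            want = (uint32_t)sizeof rep->data;   /* clamp to reply buffer */
        int n = ops->read(req->ino, req->offset, rep->data, want);
        if (n < 0)
            rep->error = n;
        else
            rep->len = (uint32_t)n;
        break;
    }
    default:
        rep->error = -ENOSYS;                    /* unknown opcode */
    }
}

/* Example filesystem: inode 1 is the root, inode 2 is the file "hello". */
static const char hello_data[] = "Hello from user space\n";

static int hello_lookup(uint64_t parent, const char *name, uint64_t *ino_out)
{
    if (parent == 1 && strcmp(name, "hello") == 0) {
        *ino_out = 2;
        return 0;
    }
    return -ENOENT;
}

static int hello_read(uint64_t ino, uint64_t off, char *buf, uint32_t size)
{
    uint64_t len = sizeof hello_data - 1;
    if (ino != 2)
        return -ENOENT;
    if (off >= len)
        return 0;                                /* reading past EOF */
    if (size > len - off)
        size = (uint32_t)(len - off);
    memcpy(buf, hello_data + off, size);
    return (int)size;
}

const struct fs_ops hello_ops = { hello_lookup, hello_read };
```

A real library would loop, reading requests from the kernel channel and writing replies back; only the decode-and-dispatch step is shown here.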

rdos wrote:For example, the fatfuse that is part of Ubuntu (it no longer seems to be available as an independent project) has a lot of problems.

There are many other fat-fuse implementations: this or this etc.

rdos wrote:It actually can only be used by mounting a file in the local filesystem, and then normal file operations are used instead of a read/write sector interface that would be expected if real hardware was used.

Not really; the whole idea behind UNIX's "everything is a file" philosophy is that you can access block devices as files. So while the fuse application is using open/read/write/close syscalls, if it uses them on a block device, the kernel will actually translate those into sector reads/writes. Note: the same works on Win32; there you have to open the block device \\.\PhysicalDriveX (X being the disk number), and then you can use ReadFile and WriteFile.
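From the fuse driver's side, this means one small helper covers both real disks and disk images. A sketch assuming POSIX `pread()`; `dev_open`, `dev_read_sectors` and the fixed 512-byte `SECTOR_SIZE` are illustrative names, not part of libfuse. The kernel decides whether `fd` is a device node (and turns the request into sector I/O) or a regular image file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define SECTOR_SIZE 512u   /* assumed logical sector size */

/* Open a block device node (e.g. /dev/sdb1 on Linux) or an image file. */
int dev_open(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        perror(path);
    return fd;
}

/* Read `count` sectors starting at `lba`. Identical code for a device
   node and a regular image file: only the kernel sees the difference.
   Returns the number of whole sectors read, or -1 on error. */
int dev_read_sectors(int fd, uint64_t lba, void *buf, uint32_t count)
{
    size_t want = (size_t)count * SECTOR_SIZE;
    size_t done = 0;
    assert(buf != NULL);
    while (done < want) {
        ssize_t n = pread(fd, (char *)buf + done, want - done,
                          (off_t)(lba * SECTOR_SIZE + done));
        if (n < 0)
            return -1;            /* errno set by pread() */
        if (n == 0)
            break;                /* end of device or image */
        done += (size_t)n;
    }
    return (int)(done / SECTOR_SIZE);
}
```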

rdos wrote:It's not supported by cygwin, and actually no fuse flesystem is part of cygwin, and so testing it would require a real Linux box.

Try out some other drivers, maybe one will compile out-of-the-box.

rdos wrote:Another problem is that it is not re-entrant and the code relies on a higher level interface where paths are parsed internally.

Not sure why you would need re-entrancy, but that higher-level interface is the one that libfuse provides, and in each and every fuse driver it is the driver's job to parse the path internally (at least the portion that's inside the image).

rdos wrote:I think the low-level interface must be used with fuse, but it's up to the implementer to decide which interface to use. That limits the number of possible fuse filesystems.

Not sure what you mean by low-level here. There's a simpler interface with fuse_main, and another one with fuse_loop. Both should be provided by the library; is this what you mean?

rdos wrote:Another problem is that there is no ext4 fuse that has write operations, and no Reiser or other complex filesystem.

How about this one (read-write ext2/3/4)? I've never used ReiserFS, so I don't know about that.

rdos wrote:There is an NTFS FUSE implementation, but one might wonder whether it can be reused

Most certainly. NTFS-3G is the best, and it is absolutely portable.

rdos wrote:So, the question is: does FUSE actually add much usefulness?

I think a lot, it eases the portability significantly. But you have to find the drivers yourself (or write them).

Cheers,
bzt

rdos · Post by **rdos** » Thu Mar 18, 2021 2:09 am

bzt wrote:
rdos wrote:The idea of using FUSE to get access to a lot of filesystems without much effort, particularly ext and NTFS, is compelling. I wonder how well this would work in practice.
I wouldn't say "without much effort", because you'll have to port a Linux kernel driver to your OS. But yes, you do that once and you get access to many, many file systems. I wrote about it on the wiki, but that page is more focused on how to implement a FUSE driver than on how to port it. In a nutshell, it has 3 components:
1. a kernel driver: you must implement the syscalls
2. a user-space library: you can port libfuse, or you could implement one with the same API that uses your own syscalls; that's a possibility too
3. a user-space application that handles the file system (if you did steps 1. and 2. properly, this should compile on your OS without probs)

I'll limit myself to only providing a compatible interface for the low-level functions. Those will allow me to put synchronization primitives on path parsing, modifying metadata and file data. The high-level functions parse complete paths within the filesystem implementation, and unless the writers thought about implementing synchronization, supporting the high-level interface would mean only one thread could execute within the file system at a time, a severe limitation that would affect performance.
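One way to get this kind of concurrency on top of an inode-based interface is lock striping: hash each inode number onto a fixed table of mutexes, so operations on different inodes usually proceed in parallel while operations on the same inode are serialized. This is a sketch assuming POSIX threads; `inode_lock`, `inode_lock_pair` and the 64-entry table are illustrative and not part of libfuse.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define LOCK_STRIPES 64u   /* power of two, so masking replaces modulo */

static pthread_mutex_t stripes[LOCK_STRIPES];

void inode_locks_init(void)
{
    for (unsigned i = 0; i < LOCK_STRIPES; i++) {
        int rc = pthread_mutex_init(&stripes[i], NULL);
        assert(rc == 0);
        (void)rc;                  /* unused when NDEBUG is defined */
    }
}

/* Map an inode number to a stripe. The xorshift-multiply step (the first
   half of MurmurHash3's fmix64) spreads consecutive inode numbers. */
unsigned inode_stripe(uint64_t ino)
{
    ino ^= ino >> 33;
    ino *= 0xff51afd7ed558ccdULL;
    ino ^= ino >> 33;
    return (unsigned)(ino & (LOCK_STRIPES - 1));
}

void inode_lock(uint64_t ino)   { pthread_mutex_lock(&stripes[inode_stripe(ino)]); }
void inode_unlock(uint64_t ino) { pthread_mutex_unlock(&stripes[inode_stripe(ino)]); }

/* rename() and similar touch two inodes: take the stripes in ascending
   order, and only once if both share a stripe, to avoid deadlock. */
void inode_lock_pair(uint64_t a, uint64_t b)
{
    unsigned sa = inode_stripe(a), sb = inode_stripe(b);
    if (sa > sb) { unsigned t = sa; sa = sb; sb = t; }
    pthread_mutex_lock(&stripes[sa]);
    if (sb != sa)
        pthread_mutex_lock(&stripes[sb]);
}

void inode_unlock_pair(uint64_t a, uint64_t b)
{
    unsigned sa = inode_stripe(a), sb = inode_stripe(b);
    pthread_mutex_unlock(&stripes[sa]);
    if (sb != sa)
        pthread_mutex_unlock(&stripes[sb]);
}
```

A low-level `lookup` handler would take `inode_lock(parent)` while scanning that one directory, instead of one lock around the whole filesystem.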

bzt wrote:
rdos wrote:For example, the fatfuse that is part of Ubuntu (it no longer seems to be available as an independent project) has a lot of problems.
There are many other fat-fuse implementations: this or this etc.

I'll only look at them for ideas. I know FAT well enough to do the driver myself.

bzt wrote: Not really; the whole idea behind UNIX's "everything is a file" philosophy is that you can access block devices as files. So while the fuse application is using open/read/write/close syscalls, if it uses them on a block device, the kernel will actually translate those into sector reads/writes. Note: the same works on Win32; there you have to open the block device \\.\PhysicalDriveX (X being the disk number), and then you can use ReadFile and WriteFile.

It's still an extremely bad idea. They could just as well have defined two callbacks within fuse that read and write sector data. As it is now, I'll have to do this myself in the drivers I want to use, since I won't support doing this through the C file-handle functions. The whole idea that the file system should itself go through the file system is quite bizarre.

Besides, real file systems live inside partitioning schemes, so every access the file system makes must be offset by the start sector of the partition. With a read/write sector interface instead of direct file manipulation, the fuse layer could apply that offset and bounds-check every access.
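The two-callback design proposed here, with the partition start applied and range-checked in exactly one place, could look roughly like this. `sector_dev`, `partition`, `part_read`/`part_write` and the in-memory `ramdisk` backend are hypothetical; the ramdisk merely stands in for an AHCI or USB driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512u

/* A sector device: two callbacks plus an opaque context pointer. */
struct sector_dev {
    int (*read)(void *ctx, uint64_t lba, void *buf, uint32_t count);
    int (*write)(void *ctx, uint64_t lba, const void *buf, uint32_t count);
    void *ctx;
};

/* A partition view: every LBA is biased by `start` and must stay
   inside [0, length) relative to the partition. */
struct partition {
    struct sector_dev *disk;
    uint64_t start, length;          /* both in sectors */
};

static int part_check(const struct partition *p, uint64_t lba, uint32_t count)
{
    if (lba >= p->length || count > p->length - lba)
        return -EINVAL;              /* would leave the partition */
    return 0;
}

int part_read(struct partition *p, uint64_t lba, void *buf, uint32_t count)
{
    int rc = part_check(p, lba, count);
    return rc ? rc : p->disk->read(p->disk->ctx, p->start + lba, buf, count);
}

int part_write(struct partition *p, uint64_t lba, const void *buf, uint32_t count)
{
    int rc = part_check(p, lba, count);
    return rc ? rc : p->disk->write(p->disk->ctx, p->start + lba, buf, count);
}

/* In-memory backend standing in for a real disk driver. */
struct ramdisk { unsigned char *data; uint64_t sectors; };

static int ram_read(void *ctx, uint64_t lba, void *buf, uint32_t count)
{
    struct ramdisk *rd = ctx;
    assert(lba + count <= rd->sectors);
    memcpy(buf, rd->data + lba * SECTOR_SIZE, (size_t)count * SECTOR_SIZE);
    return (int)count;
}

static int ram_write(void *ctx, uint64_t lba, const void *buf, uint32_t count)
{
    struct ramdisk *rd = ctx;
    assert(lba + count <= rd->sectors);
    memcpy(rd->data + lba * SECTOR_SIZE, buf, (size_t)count * SECTOR_SIZE);
    return (int)count;
}

struct sector_dev ramdisk_dev(struct ramdisk *rd)
{
    struct sector_dev d = { ram_read, ram_write, rd };
    return d;
}
```

A partition starting at sector 0x50012 would simply be `struct partition p = { &dev, 0x50012, length }`, and the filesystem code keeps addressing its own sector 0.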

bzt wrote:
rdos wrote:Another problem is that it is not re-entrant and the code relies on a higher level interface where paths are parsed internally.
Not sure why you would need re-entrancy, but that higher-level interface is the one that libfuse provides, and in each and every fuse driver it is the driver's job to parse the path internally (at least the portion that's inside the image).

By using the inode (low-level) interface, I can write the general caching code myself and make sure it is reentrant. As long as the filesystem supports this interface, it won't need internal synchronization. Besides, they should also have defined a mutex interface in fuse.h so that non-POSIX systems could provide the synchronization primitives themselves.
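A host-supplied synchronization interface of the kind suggested here might be a small table of function pointers that filesystem code calls instead of pthreads directly. Nothing like this exists in libfuse's `fuse.h`; `fs_mutex_ops`, the two backends and `fs_volume` are hypothetical, with a pthread backend for POSIX hosts and a no-op backend for a single-threaded environment.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical mutex interface filled in by the host OS. */
struct fs_mutex_ops {
    void *(*create)(void);
    void  (*lock)(void *m);
    void  (*unlock)(void *m);
    void  (*destroy)(void *m);
};

/* Backend 1: POSIX threads. */
static void *px_create(void)
{
    pthread_mutex_t *m = malloc(sizeof *m);
    if (m != NULL && pthread_mutex_init(m, NULL) != 0) {
        free(m);
        m = NULL;
    }
    return m;
}
static void px_lock(void *m)
{
    int rc = pthread_mutex_lock(m);
    assert(rc == 0);
    (void)rc;
}
static void px_unlock(void *m)  { pthread_mutex_unlock(m); }
static void px_destroy(void *m) { pthread_mutex_destroy(m); free(m); }

const struct fs_mutex_ops posix_mutex_ops = {
    px_create, px_lock, px_unlock, px_destroy
};

/* Backend 2: no-ops, e.g. for a single-threaded boot-time server. */
static char nop_token;
static void *nop_create(void)    { return &nop_token; }
static void  nop_lock(void *m)    { (void)m; }
static void  nop_unlock(void *m)  { (void)m; }
static void  nop_destroy(void *m) { (void)m; }

const struct fs_mutex_ops single_thread_mutex_ops = {
    nop_create, nop_lock, nop_unlock, nop_destroy
};

/* Filesystem code only sees the table, never the OS primitive. */
struct fs_volume {
    const struct fs_mutex_ops *mx;
    void *meta_lock;
    unsigned long generation;       /* stand-in for shared metadata */
};

int fs_volume_init(struct fs_volume *v, const struct fs_mutex_ops *mx)
{
    v->mx = mx;
    v->meta_lock = mx->create();
    v->generation = 0;
    return v->meta_lock != NULL ? 0 : -1;
}

void fs_volume_bump(struct fs_volume *v)
{
    v->mx->lock(v->meta_lock);
    v->generation++;
    v->mx->unlock(v->meta_lock);
}
```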

bzt wrote:
rdos wrote:Another problem is that there is no ext4 fuse that has write operations, and no Reiser or other complex filesystem.
How about this one (read-write ext2/3/4)? I've never used ReiserFS, so I don't know about that.

OK, so the ext2 fuse driver supports ext2 through ext4 with read/write. Excellent.

rdos · Post by **rdos** » Thu Mar 18, 2021 2:42 am

There are other issues with basing your real filesystem exclusively on fuse. For one, you obviously cannot load the filesystem driver as an ordinary application from the filesystem. It must be part of the OS image loaded by BIOS or EFI. I've now fixed this issue by adding a new device type, "server", that contains an ordinary application. The PE loader can now load it without using the filesystem, by copying data from the OS image to the local address space. It's even possible to attach the normal application debugger to the file system application, so it can be debugged like a normal application.

I'll start by doing a new USB disc driver that eventually will use Fuse. This seems like a good environment since the disc can be mounted & unmounted at any time by inserting & removing the USB drive.

bzt · Post by **bzt** » Thu Mar 18, 2021 2:49 am

rdos wrote:I'll limit myself to only providing a compatible interface for the low-level functions. Those will allow me to put synchronization primitives on path parsing, modifying metadata and file data. The high-level functions parse complete paths within the filesystem implementation, and unless the writers thought about implementing synchronization, supporting the high-level interface would mean only one thread could execute within the file system at a time, a severe limitation that would affect performance.

If you used the file abstraction, none of this would be needed.

rdos wrote:It's still an extremely bad idea. They could just as well defined two callbacks within fuse that read and write sector data.

Then how would a fuse driver use a disk image? And it would have to implement synchronization and caching itself. My opinion is, it's much better if we put all of that in a separate layer, then there's no need for complex code in the fuse drivers, they can focus on interpreting the file system structures. D.R.Y.
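bzt's argument is that caching belongs in one shared layer underneath all drivers. A minimal sketch of such a layer, assuming a direct-mapped cache in front of a hypothetical `read_block` callback (which could be backed by a device, an image file or a network connection); a real kernel cache would add LRU replacement, write-back and locking, all omitted here. `pattern_read_block` is only a stand-in backend for trying it out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE  512u
#define CACHE_SLOTS 8u

typedef int (*read_block_fn)(void *ctx, uint64_t blk, void *buf);

struct cache_slot {
    int valid;
    uint64_t blk;
    unsigned char data[BLOCK_SIZE];
};

struct block_cache {
    read_block_fn read_block;       /* backend below the cache */
    void *ctx;
    unsigned long hits, misses;
    struct cache_slot slot[CACHE_SLOTS];
};

void cache_init(struct block_cache *c, read_block_fn fn, void *ctx)
{
    memset(c, 0, sizeof *c);
    c->read_block = fn;
    c->ctx = ctx;
}

/* Direct-mapped: block n can only live in slot n % CACHE_SLOTS,
   so a miss simply replaces whatever that slot held. */
int cache_read(struct block_cache *c, uint64_t blk, void *out)
{
    assert(c && out);
    struct cache_slot *s = &c->slot[blk % CACHE_SLOTS];
    if (!s->valid || s->blk != blk) {
        int rc = c->read_block(c->ctx, blk, s->data);
        if (rc < 0) {
            s->valid = 0;            /* slot contents are now unreliable */
            return rc;
        }
        s->valid = 1;
        s->blk = blk;
        c->misses++;
    } else {
        c->hits++;
    }
    memcpy(out, s->data, BLOCK_SIZE);
    return 0;
}

/* Stand-in backend: fills a block with the low byte of its number and
   counts how often it is called (ctx points to an unsigned long). */
int pattern_read_block(void *ctx, uint64_t blk, void *buf)
{
    unsigned long *calls = ctx;
    (*calls)++;
    memset(buf, (int)(blk & 0xFF), BLOCK_SIZE);
    return 0;
}
```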

rdos wrote:As it is now, I'll have to do this myself on the drivers I want to use since I won't support doing this through the C file handle function. That whole idea that the file system actually should use the file system is quite bizarre.

But you should. It's not bizarre, it's a good way of removing complexity, and you don't have to implement anything in the drivers. The open/read/write/close syscalls don't necessarily use the VFS, in case of a device file, they can translate the operation into sector operations directly. Think of it this way: the sector read and write functionality uses exactly the same API as the file system API, but that doesn't make them go through the VFS (so it isn't "file system actually should use the file system").

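Code:

ssize_t read(int fd, void *buf, size_t size)
{
  vfs_context *ctx = get_context_for_fd(fd);    /* per-descriptor state */
  if (is_block_device(ctx)) {
    /* device file: map the byte range to sectors, bypassing the VFS;
       read_sector() is assumed to return the number of sectors read */
    int n = read_sector(ctx->blkdev, ctx->offset / ctx->blocksize,
                        buf, size / ctx->blocksize);
    if (n < 0)
      return n;                                 /* propagate the error */
    ctx->offset += (off_t)n * ctx->blocksize;   /* advance the file position */
    return (ssize_t)n * ctx->blocksize;         /* bytes, not sectors */
  }
  /* regular file, pipe, socket: the owning driver handles it */
  return ctx->read(ctx, buf, size);
}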

I think it's perfectly fine to expect fuse drivers to read only at sector-aligned offsets, with buffer sizes that are a multiple of the sector size. So, for example, valid and invalid calls look like this:

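Code:

// valid: offset and size are multiples of the 512-byte sector size
lseek(fd, 1024, SEEK_SET);
read(fd, buf, 512);
// invalid: neither offset nor size is sector-aligned
lseek(fd, 100, SEEK_SET);
read(fd, buf, 123);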
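The alignment rule above can be enforced in one place on the kernel's `read()` path before any sector I/O is issued. A sketch assuming a fixed 512-byte sector; `check_sector_io()` is a hypothetical helper that rejects the invalid case and converts the valid case into a sector range.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SECTOR_SIZE 512u

/* Return 0 if (offset, size) covers whole sectors, -EINVAL otherwise.
   On success, *lba and *count receive the sector range to transfer. */
int check_sector_io(uint64_t offset, uint64_t size, uint64_t *lba, uint64_t *count)
{
    assert(lba != 0 && count != 0);
    if (offset % SECTOR_SIZE != 0 || size % SECTOR_SIZE != 0)
        return -EINVAL;
    *lba = offset / SECTOR_SIZE;
    *count = size / SECTOR_SIZE;
    return 0;
}
```

Rejecting with `EINVAL` is the simplest policy; the alternative is for the kernel to do a read-modify-write on partial sectors.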

rdos wrote:By using the inode (low-level) interface I can write the general caching code myself, and make sure it is reentrant.

I still don't understand. Not all filesystems have i-nodes; you shouldn't implement caching in every single fuse driver, and I still don't see why you want to be re-entrant. I can see the benefit of making the file abstraction re-entrant, but not the fuse library. (By file abstraction re-entrancy I mean that you are using a file descriptor, so it could be a descriptor for a block device directly, or it could be a regular file's descriptor if that's a disk image which in turn resides on a block device. The fuse driver doesn't have to know or care. Plus, using file descriptors allows implementing networking filesystems too, via sockets.)

rdos wrote:As long as the filesystem supports using this interface, it won't need internal synchronization. Besides, they should also have defined a mutex interface in fuse.h so that non-Posix systems could have provided the synchronization primitives themselves.

Here you lost me again. Not all filesystems need synchronization. As a matter of fact, most filesystems are written to be entirely lock-free, either by using some journaling method or by implementing soft updates.

Cheers,
bzt

bzt · Post by **bzt** » Thu Mar 18, 2021 2:50 am

rdos wrote:There are other issues with basing your real filesystem exclusively on fuse. For one, you obviously cannot load the filesystem driver as an ordinary application from the filesystem.

I see no reason why you can't. On boot, all microkernels face this issue, fuse or not. They use an initrd to overcome it. The fuse driver could be in the initrd.
A monolithic kernel could use fuse as well, it's just an API, nobody said you can't implement libfuse in libk.

Cheers,
bzt

rdos · Post by **rdos** » Thu Mar 18, 2021 3:10 am

bzt wrote:
rdos wrote:I'll limit myself to only providing a compatible interface for the low-level functions. Those will allow me to put synchronization primitives on path parsing, modifying metadata and file data. The high-level functions parse complete paths within the filesystem implementation, and unless the writers thought about implementing synchronization, supporting the high-level interface would mean only one thread could execute within the file system at a time, a severe limitation that would affect performance.
If you used the file abstraction, none of this would be needed.

I don't understand. The high-level interface basically is the normal file API. You put in paths and get file handles. You write files and update metadata. What if you have soft links that point across partitions? Those obviously won't work. What if you write to two different files at the same time, and the cluster chain (FAT) gets corrupted because the writers have no internal synchronization? Fatfuse obviously doesn't have this. This leads to a need to put one big semaphore on the whole library!

Actually, using the high-level interface severely limits the usability.
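The FAT corruption scenario mentioned above comes down to cluster allocation: two writers scanning the table can both find the same free entry and link it into two different chains. A minimal sketch of a lock that covers only the find-and-claim step, assuming an in-memory FAT32-style table and POSIX threads; `struct fat` and `fat_alloc_cluster` are illustrative, not code from fatfuse or any other driver.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define FAT_FREE 0x00000000u
#define FAT_EOC  0x0FFFFFFFu    /* FAT32 end-of-chain marker */

struct fat {
    uint32_t *table;            /* in-memory copy of the allocation table */
    uint32_t nclusters;         /* data clusters are numbered 2..nclusters-1 */
    pthread_mutex_t lock;       /* protects table */
};

/* Find a free cluster, mark it end-of-chain and, if prev >= 2, append it
   to prev's chain. Scan and claim form one critical section: if two
   threads could interleave between them, both would get the same cluster.
   Returns 0 when the volume is full. */
uint32_t fat_alloc_cluster(struct fat *f, uint32_t prev)
{
    uint32_t found = 0;
    assert(f != NULL);
    pthread_mutex_lock(&f->lock);
    for (uint32_t c = 2; c < f->nclusters; c++) {
        if (f->table[c] == FAT_FREE) {
            f->table[c] = FAT_EOC;
            if (prev >= 2)
                f->table[prev] = c;     /* link into the existing chain */
            found = c;
            break;
        }
    }
    pthread_mutex_unlock(&f->lock);
    return found;
}
```

Holding a lock only around this step, rather than one semaphore around the whole library, leaves data-cluster I/O on different files free to run in parallel.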

bzt wrote: Then how would a fuse driver use a disk image?

Easy. You open the file in your application and let read/write sector use the file handle. It will be up to whoever builds the application to decide what read/write sector does. Besides, you failed to explain how you would handle a partition that resides at sector 0x50012 on a physical disc. You cannot use the physical device handle, as then you would write at sector 0.

bzt wrote: And it would have to implement synchronization and caching itself. My opinion is, it's much better if we put all of that in a separate layer, then there's no need for complex code in the fuse drivers, they can focus on interpreting the file system structures. D.R.Y.

Agreed, but then the fuse drivers should work with inodes, not full paths & handles. It's actually a lot simpler to do a driver based on inodes compared to parsing full paths.

bzt wrote: I still don't understand. Not all filesystems have i-nodes; you shouldn't implement caching in every single fuse driver, and I still don't see why you want to be re-entrant.

The i-node concept is a bit problematic. It should be named "handle" or something. It's just a 64-bit number for referencing something in the file system, and I think every possible implementation can support this.

bzt wrote: I can see the benefit of making the file abstraction re-entrant, but not the fuse library. (By file abstraction re-entrancy I mean that you are using a file descriptor, so it could be a descriptor for a block device directly, or it could be a regular file's descriptor if that's a disk image which in turn resides on a block device. The fuse driver doesn't have to know or care. Plus, using file descriptors allows implementing networking filesystems too, via sockets.)

In my design, you can link read/write sector to anything. I don't like the "everything is a file" concept, and I certainly will not support it in this context.

rdos · Post by **rdos** » Thu Mar 18, 2021 3:21 am

bzt wrote:
rdos wrote:There are other issues with basing your real filesystem exclusively on fuse. For one, you obviously cannot load the filesystem driver as an ordinary application from the filesystem.
I see no reason why you can't. On boot, all microkernels face this issue, fuse or not. They use an initrd to overcome it. The fuse driver could be in the initrd.
A monolithic kernel could use fuse as well, it's just an API, nobody said you can't implement libfuse in libk.

Cheers,
bzt

For most uses, I don't see a need to manually mount things. This should be an automatic process. When you boot, the disc drivers will read the partition tables and start the appropriate file system drivers. When a USB drive is attached, the USB disk driver would load the partition tables and start the appropriate file system drivers. When it is unplugged, the drivers are stopped and unloaded.

It's a bit of Unix legacy that stuff must be explicitly mounted. A bit outdated too.

Sure, I can see how a user might want to mount a file within the file system and try out his own file system, but that's more of an exception.

Besides, I don't think I want to run the fuse drivers in kernel mode when I boot and user mode when I do stuff manually. Both should operate the same way.

bzt · Post by **bzt** » Thu Mar 18, 2021 5:46 am

rdos wrote:I don't understand. The high-level interface basically is the normal file API. You put in paths and get file handles. You write files and update metadata. What if you have soft links that point across partitions? Those obviously won't work. What if you write to two different files at the same time, and the cluster chain (FAT) gets corrupted because the writers have no internal synchronization? Fatfuse obviously doesn't have this. This leads to a need to put one big semaphore on the whole library!

Not necessarily. Most kernels can solve this just fine. You see, if you use the file abstraction for block devices, then it's pretty easy to create a FIFO queue between the device writes and the actual sector writes. By checking the queue on reads, you can make sure there'll be no sync issues without locking (or you could simply block the reader until the write queue is empty).
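The queue described here could look like this: writes are appended to a FIFO that a device thread drains, and the read path first checks for the newest pending write to the same sector, so it never returns stale data. This is a single-threaded sketch of the data structure only, assuming 512-byte sectors and a 16-entry ring; the `wq_*` names are hypothetical, and in a real kernel the ring itself would still need a lock or a lock-free design once producers and the draining thread run concurrently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512u
#define QUEUE_LEN   16u     /* power of two, so unsigned wrap-around is safe */

struct pending_write {
    uint64_t lba;
    unsigned char data[SECTOR_SIZE];
};

struct write_queue {
    struct pending_write item[QUEUE_LEN];
    unsigned head, tail;    /* head: next to drain, tail: next free slot */
};

unsigned wq_count(const struct write_queue *q) { return q->tail - q->head; }

/* Queue a sector write; returns -1 if the queue is full. */
int wq_push(struct write_queue *q, uint64_t lba, const void *data)
{
    assert(q && data);
    if (wq_count(q) == QUEUE_LEN)
        return -1;
    struct pending_write *w = &q->item[q->tail % QUEUE_LEN];
    w->lba = lba;
    memcpy(w->data, data, SECTOR_SIZE);
    q->tail++;
    return 0;
}

/* Read path: search newest to oldest, so the latest pending write wins.
   Returns 1 if served from the queue, 0 if the caller must read the device. */
int wq_lookup(const struct write_queue *q, uint64_t lba, void *out)
{
    for (unsigned i = q->tail; i != q->head; i--) {
        const struct pending_write *w = &q->item[(i - 1) % QUEUE_LEN];
        if (w->lba == lba) {
            memcpy(out, w->data, SECTOR_SIZE);
            return 1;
        }
    }
    return 0;
}

/* Drain side: hand the oldest write to the device. The returned pointer
   stays valid only until the slot is reused by a later wq_push(). */
const struct pending_write *wq_pop(struct write_queue *q)
{
    if (wq_count(q) == 0)
        return NULL;
    return &q->item[q->head++ % QUEUE_LEN];
}
```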

rdos wrote:Actually, using the high-level interface severely limits the usability.

Let's just agree to disagree.

rdos wrote:Easy. You open the file in your application and let read/write sector use the file handle.

Now how would that be any different to the POSIX file abstraction, API-wise? There you open the file and you use read/write on the file handle (which then are translated into sector read/writes in the kernel if the handle is for a block device, but that's transparent to you).

rdos wrote:It will be up to whoever builds the application to decide what read/write sector does. Besides, you failed to explain how you would handle a partition that resides at sector 0x50012 on a physical disc. You cannot use the physical device handle, as then you would write at sector 0.

That's what partition devices are for. The kernel can take care of sub-sector writes, or of differing logical and physical sector sizes for that matter. (For example, a fuse driver could simply use 4096-byte blocks and let the kernel handle whether the actual storage really has 4096-byte blocks or just old-fashioned 512-byte blocks.)

Maybe take a look at the Minix source; it's a lot simpler than the Linux source. In Minix, there's a special function, read_write(), which is responsible for translating read() and write() calls into a series of block reads/writes (if the handle is sector based, like a pipe or a block device). The code that converts offsets and sizes into a series of fixed-size "chunks" starts at line 128. The actual code that handles sub-sector reads and writes is at line 265, in function rw_chunk().
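The chunking idea can be shown in isolation: split an arbitrary byte range into per-block pieces, and read-modify-write any piece that covers only part of a block. This is a from-scratch sketch, not the Minix code; `write_bytes`, `struct blkdev`, the 512-byte `BLK` and the in-memory backend are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK 512u    /* assumed device block size */

struct blkdev {
    int (*read_block)(void *ctx, uint64_t blk, unsigned char *buf);
    int (*write_block)(void *ctx, uint64_t blk, const unsigned char *buf);
    void *ctx;
};

/* Write `len` bytes at byte offset `off`, one block-sized chunk at a time.
   Partial blocks are read first so the bytes around the chunk survive. */
int write_bytes(struct blkdev *d, uint64_t off, const void *src, size_t len)
{
    const unsigned char *p = src;
    unsigned char tmp[BLK];
    assert(d && (src || len == 0));
    while (len > 0) {
        uint64_t blk = off / BLK;
        size_t in_blk = (size_t)(off % BLK);   /* start inside this block */
        size_t chunk = BLK - in_blk;           /* room left in this block */
        if (chunk > len)
            chunk = len;
        if (chunk < BLK && d->read_block(d->ctx, blk, tmp) < 0)
            return -1;                         /* read-modify-write failed */
        memcpy(tmp + in_blk, p, chunk);
        if (d->write_block(d->ctx, blk, tmp) < 0)
            return -1;
        off += chunk;
        p += chunk;
        len -= chunk;
    }
    return 0;
}

/* In-memory backend for trying it out. */
static unsigned char mem[8 * BLK];

static int mem_read(void *ctx, uint64_t blk, unsigned char *buf)
{
    (void)ctx;
    memcpy(buf, mem + blk * BLK, BLK);
    return 0;
}

static int mem_write(void *ctx, uint64_t blk, const unsigned char *buf)
{
    (void)ctx;
    memcpy(mem + blk * BLK, buf, BLK);
    return 0;
}

struct blkdev mem_dev = { mem_read, mem_write, NULL };
```

A matching `read_bytes` would use the same loop, reading each block into `tmp` and copying out only the requested slice.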

rdos wrote:Agreed, but then the fuse drivers should work with inodes, not full paths & handles.

You still don't get it: not all filesystems have i-nodes. What about rarfs, for example? There are no i-nodes in a rar archive, only paths. And just for the record, handles aren't mandatory for fuse; they're just what most drivers use because of their simplicity. You could use IO ports to talk directly to the ATA controller, for example, and as long as you have the privilege to do that, fuse wouldn't care.

rdos wrote:The i-node concept is a bit problematic. It should be named "handle" or something. It's just a 64-bit number for referencing something in the file system, and I think every possible implementation can support this.

The problem is, those are entirely and completely different concepts. You simply can't replace one with another. A file system either uses i-nodes or not. For example there are no i-nodes in FAT, yet you can get a file descriptor, a FILE *f handle and a Win32 FileHandle for a path on FAT. I-node isn't just a 64-bit number, and it is totally unrelated to the id that the VFS uses to identify opened files (that can be an FCB struct like in CP/M, or just a wrapper around file descriptors like most FILE* implementation these days, see fileno()).

rdos wrote:In my design, you can link read-write sector to anything. I don't like the "everything is a file concept", and I certainly will not support it in this context.

Not every device that might contain a file system supports sectors. See sshfs for example. Or NFS. Or the aforementioned rarfs. They don't work on sector based block devices, rather on byte streams (where seek exists) or on network streams (where there's no seek and a message has arbitrary length).

Cheers,
bzt

rdos · Post by **rdos** » Thu Mar 18, 2021 6:29 am

bzt wrote:Not necessarily. Most kernels can solve this just fine. You see, if you use the file abstraction for block devices, then it's pretty easy to create a FIFO queue between the device writes and the actual sector writes. By checking the queue on reads, you can make sure there'll be no sync issues without locking (or you could simply block the reader until the write queue is empty).

Nothing can solve this if the implementation isn't reentrant, other than only allowing one operation at a time. I think that is basic stuff. Sure, you can create a queue of commands, but that basically is like putting a semaphore on the whole fuse module. I'm not implementing this with queues. I'll pass requests directly from the user mode application to one or more server threads.

Besides, fuse does have a multithreaded mode, but it obviously depends on a reentrant implementation.

bzt wrote:
rdos wrote:Easy. You open the file in your application and let read/write sector use the file handle.
Now how would that be any different to the POSIX file abstraction, API-wise? There you open the file and you use read/write on the file handle (which then are translated into sector read/writes in the kernel if the handle is for a block device, but that's transparent to you).

Because it allows me to connect the read-write operations to anything I like, not just stuff that can be connected to file handles. Additionally, by using this interface you will always need to copy stuff between the kernel and a user mode buffer. I plan to integrate Fuse with the device buffering scheme, and so if we have an AHCI driver, it will get the physical address of the buffer, put it in the memory-based schedule, and when it is done, it will be mapped to the file system driver in the linear address space. No need for any copy.

bzt wrote: You still don't get it: not all filesystems have i-nodes. What about rarfs, for example? There are no i-nodes in a rar archive, only paths. And just for the record, handles aren't mandatory for fuse; they're just what most drivers use because of their simplicity. You could use IO ports to talk directly to the ATA controller, for example, and as long as you have the privilege to do that, fuse wouldn't care.

In rar, the i-node would be the file position of the object.

bzt wrote: For example there are no i-nodes in FAT, yet you can get a file descriptor, a FILE *f handle and a Win32 FileHandle for a path on FAT. I-node isn't just a 64-bit number, and it is totally unrelated to the id that the VFS uses to identify opened files (that can be an FCB struct like in CP/M, or just a wrapper around file descriptors like most FILE* implementation these days, see fileno()).

Wrong. I-nodes are cluster numbers in FAT.

Korona · Post by **Korona** » Thu Mar 18, 2021 6:33 am

UNIX inodes are not arbitrary handles. The inode identifies the file data + metadata and not its directory entry. An inode can exist even if there is no path to it (if you delete the directory entry while the file is still open), and several paths can lead to the same inode.

rdos · Post by **rdos** » Thu Mar 18, 2021 6:38 am

Korona wrote:UNIX inodes are not arbitrary handles. The inode identifies the file data + metadata and not its directory entry. An inode can exist even if there is no path to it (if you delete the directory entry while the file is still open), and several paths can lead to the same inode.

It's the same thing as cluster numbers in FAT. Both files and directories can be identified by their starting cluster number. The i-node value is returned by the low-level "open" function to the Fuse framework, and so it is up to the filesystem implementation to decide what these numbers actually mean.

thewrongchristian · Post by **thewrongchristian** » Thu Mar 18, 2021 8:29 am

rdos wrote:
bzt wrote: For example there are no i-nodes in FAT, yet you can get a file descriptor, a FILE *f handle and a Win32 FileHandle for a path on FAT. I-node isn't just a 64-bit number, and it is totally unrelated to the id that the VFS uses to identify opened files (that can be an FCB struct like in CP/M, or just a wrapper around file descriptors like most FILE* implementation these days, see fileno()).
Wrong. I-nodes are cluster numbers in FAT.

You can certainly use the cluster number as an i-node number, in that it provides an identifying handle for that file.

But I wouldn't call it an i-node. It doesn't contain what an i-node in a UNIX like OS would contain. FAT has a combined directory entry/i-node structure, meaning there is a 1 to 1 mapping between a directory entry and a file. No hard links are possible by linking two directory entries to a single FAT cluster chain, for example (well, you probably could for a RO file system, but I don't think it'd survive a chkdsk run.)

I too am looking at FUSE or FUSE like functionality to provide my filesystems, with an initrd bootstrap strategy to load the root FS driver via FUSE(alike), so I'm certainly interested in this thread and how it pans out.

My reservations about FUSE itself revolve around:

1. Identifying files. The FUSE operations tend to identify files by name. While that's good for things like archive filesystems, which tend to use file names in the archive to locate files, it doesn't really mirror VFS operations, which resolve path names to vnodes one pathname component at a time.
2. Related to the above, I want my FUSE interface to be stateless, so that failed filesystem drivers can be restarted. NFS achieves this by identifying files using handles. For FAT, I'd be more inclined to use the directory entry address. I'd have to check, but I'm not sure this handle goes back and forth over the protocol/API in FUSE.
3. Buffer sharing. I think the FUSE protocol sends buffer contents over the FUSE channel, rather than using some shared memory mechanism. This has obvious performance implications with all the copying required. But I'd have to confirm this.
4. Licensing. I've not decided what license to use for my kernel, as and when I "release" it. If I go the FUSE route, that will pretty much tie me to the GPL if I reuse any of the kernel-side FUSE code.
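For FAT, the stateless handle idea above could be made concrete by packing the location of the directory entry into 64 bits, so a restarted driver can decode a handle it has never seen before. The bit layout and the `fat_handle_*` names below are illustrative assumptions. Unlike a file's first cluster, this also works for empty files, whose directory entries record start cluster 0.

```c
#include <assert.h>
#include <stdint.h>

#define FAT_DIRENT_SIZE 32u   /* every FAT directory entry is 32 bytes */

/* Hypothetical handle layout:
   bits 63..32  first cluster of the directory holding the entry
                (0 = the fixed root directory of FAT12/16)
   bits 31..0   index of the 32-byte entry within that directory */
static inline uint64_t fat_handle_make(uint32_t dir_cluster, uint32_t entry_index)
{
    return ((uint64_t)dir_cluster << 32) | entry_index;
}

static inline uint32_t fat_handle_dir(uint64_t h)   { return (uint32_t)(h >> 32); }
static inline uint32_t fat_handle_entry(uint64_t h) { return (uint32_t)h; }

/* Byte offset of the entry within the directory's data; the driver maps
   it to a sector by walking the directory's cluster chain. */
static inline uint64_t fat_handle_byte_offset(uint64_t h)
{
    return (uint64_t)fat_handle_entry(h) * FAT_DIRENT_SIZE;
}
```

The trade-off is that a rename moves the directory entry and therefore changes the handle, which a stateless protocol has to account for.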

NetBSD PUFFS seems to be a better match for my requirements, mapping more closely to the VFS than FUSE appears to, plus it also has a FUSE compatibility wrapper, so it might be the best of both worlds. I'm not yet at the point of investigating the details, though, and I'm not sure whether PUFFS has the same issue as FUSE in how buffer contents are read/written over the PUFFS channel.

rdos · Post by **rdos** » Thu Mar 18, 2021 2:33 pm

thewrongchristian wrote: NetBSD PUFFS seems to be a better match for my requirements, mapping more closely to the VFS than FUSE appears to, plus it also has a FUSE compatibility wrapper, so it might be the best of both worlds. I'm not yet at the point of investigating the details, though, and I'm not sure whether PUFFS has the same issue as FUSE in how buffer contents are read/written over the PUFFS channel.

I think I will use the same concept as PUFFS. I'll create my own interface and then provide a FUSE layer on top of it. Then I will take some important filesystems like FAT, NTFS and ext and port them to my own user-mode interface. This would allow zero-copy implementations of file IO for important filesystems.

OSDev.org
