Ways to implement partitions in drives

Discussions on more advanced topics such as monolithic vs micro-kernels, transactional memory models, and paging vs segmentation should go here. Use this forum to expand and improve the wiki!
Post Reply
j4cobgarby
Member
Member
Posts: 64
Joined: Fri Jan 26, 2018 11:43 am

Ways to implement partitions in drives

Post by j4cobgarby »

I'm implementing the filesystem in my operating system at the moment, and have been wondering what various ways people implement partitions within drives.

My filesystem implementation at the moment basically consists of several tables:

- A table of different types of physical drive, where each type contains function pointers for reading and writing sectors in that type of drive.
- A table of "filesystem descriptor" structs, which contain function pointers for manipulating a filesystem.
- A table of drives. Each drive has an index into the drive type table, and an index into the filesystem type table. Each drive also has a void pointer to store arbitrary metadata about that specific drive.

Now, I want the system to be able to handle partitioned drives, but I'm unsure of the best way to do this. What I'm thinking is that, for example if I have a physical drive called A, and it has two partitions which I'll call B and C, then I will have an entry in my drive table for A, and also one for B and one for C. For each drive I'll store a value in its metadata to represent the first sector to read from (so, and offset to add when reading or writing sectors), and for A that would be 0, to represent the whole drive. Then for each partition it would be mostly the same metadata except that the offset would be the beginning of the partition, and the sector count would be the amount of sectors in the partition.

I think this would work, but I'm interested to see how other people would go about doing this. Any ideas?
User avatar
bzt
Member
Member
Posts: 1584
Joined: Thu Oct 13, 2016 4:55 pm
Contact:

Re: Ways to implement partitions in drives

Post by bzt »

Yep, that's a good way to do it. I can't think of a simpler way, that's what all the other OSes do. (Linux creates special device files where you have "table of drives", calling the entries "/dev/sda", "/dev/sda1", "/dev/sda2" etc. instead of "A", "B", "C" etc. and in reality all they differ is just the starting sector and the size, they use the same driver under the hood. Exactly the same for Minix too)

Cheers,
bzt
rdos
Member
Member
Posts: 3297
Joined: Wed Oct 01, 2008 1:55 pm

Re: Ways to implement partitions in drives

Post by rdos »

j4cobgarby wrote:I'm implementing the filesystem in my operating system at the moment, and have been wondering what various ways people implement partitions within drives.

My filesystem implementation at the moment basically consists of several tables:

- A table of different types of physical drive, where each type contains function pointers for reading and writing sectors in that type of drive.
- A table of "filesystem descriptor" structs, which contain function pointers for manipulating a filesystem.
- A table of drives. Each drive has an index into the drive type table, and an index into the filesystem type table. Each drive also has a void pointer to store arbitrary metadata about that specific drive.

Now, I want the system to be able to handle partitioned drives, but I'm unsure of the best way to do this. What I'm thinking is that, for example if I have a physical drive called A, and it has two partitions which I'll call B and C, then I will have an entry in my drive table for A, and also one for B and one for C. For each drive I'll store a value in its metadata to represent the first sector to read from (so, and offset to add when reading or writing sectors), and for A that would be 0, to represent the whole drive. Then for each partition it would be mostly the same metadata except that the offset would be the beginning of the partition, and the sector count would be the amount of sectors in the partition.

I think this would work, but I'm interested to see how other people would go about doing this. Any ideas?
You should separate discs and drives. The disc interface has functions to read/write sector data. A drive should be connected to a range of sectors within a disc and contain functions related to filesystems.
rdos
Member
Member
Posts: 3297
Joined: Wed Oct 01, 2008 1:55 pm

Re: Ways to implement partitions in drives

Post by rdos »

bzt wrote:Yep, that's a good way to do it. I can't think of a simpler way, that's what all the other OSes do. (Linux creates special device files where you have "table of drives", calling the entries "/dev/sda", "/dev/sda1", "/dev/sda2" etc. instead of "A", "B", "C" etc. and in reality all they differ is just the starting sector and the size, they use the same driver under the hood. Exactly the same for Minix too)

Cheers,
bzt
To have disc & drive interfaces as devices in the filesystem is a horrible idea.

Besides, DOS/Windows only uses drive letters for drives, not discs. For obvious reasons since discs don't have a filesystem, that's only part of drives.

So, in this example you would have c: and d:, not a:, b: and c:. This is because a: and b: is reserved for floppies, and hard drives always start with c:. In my design, I use b: for the EFI system partition, partly because you are unlikely to have two floppies with an UEFI BIOS, and partly because I want a fixed drive for the EFI system partition.
j4cobgarby
Member
Member
Posts: 64
Joined: Fri Jan 26, 2018 11:43 am

Re: Ways to implement partitions in drives

Post by j4cobgarby »

rdos wrote:
bzt wrote:Yep, that's a good way to do it. I can't think of a simpler way, that's what all the other OSes do. (Linux creates special device files where you have "table of drives", calling the entries "/dev/sda", "/dev/sda1", "/dev/sda2" etc. instead of "A", "B", "C" etc. and in reality all they differ is just the starting sector and the size, they use the same driver under the hood. Exactly the same for Minix too)

Cheers,
bzt
To have disc & drive interfaces as devices in the filesystem is a horrible idea.

Besides, DOS/Windows only uses drive letters for drives, not discs. For obvious reasons since discs don't have a filesystem, that's only part of drives.

So, in this example you would have c: and d:, not a:, b: and c:. This is because a: and b: is reserved for floppies, and hard drives always start with c:. In my design, I use b: for the EFI system partition, partly because you are unlikely to have two floppies with an UEFI BIOS, and partly because I want a fixed drive for the EFI system partition.
It does seem to make sense to separate discs and drives, but why would it be such a bad idea to have both discs and drives both devices?
rdos
Member
Member
Posts: 3297
Joined: Wed Oct 01, 2008 1:55 pm

Re: Ways to implement partitions in drives

Post by rdos »

j4cobgarby wrote: It does seem to make sense to separate discs and drives, but why would it be such a bad idea to have both discs and drives both devices?
I'm just allergic to the "everything is a file" concept. :-)

I have internal APIs for discs and so I don't need the filesystem to open a file to be able to access disc data. It's much cleaner & easier to use the disc buffering API that discs export to read disc data.
j4cobgarby
Member
Member
Posts: 64
Joined: Fri Jan 26, 2018 11:43 am

Re: Ways to implement partitions in drives

Post by j4cobgarby »

rdos wrote:
j4cobgarby wrote: It does seem to make sense to separate discs and drives, but why would it be such a bad idea to have both discs and drives both devices?
I'm just allergic to the "everything is a file" concept. :-)

I have internal APIs for discs and so I don't need the filesystem to open a file to be able to access disc data. It's much cleaner & easier to use the disc buffering API that discs export to read disc data.
Fair enough haha

I should've mentioned that in my operating system I'm not exactly doing that "everything is a file" concept, the drives will be accessed sort of like in windows, except instead of letters identifying drives it'll be numbers, so for instance a path could look like "0:/my/directory/file.txt"
User avatar
AndrewAPrice
Member
Member
Posts: 2300
Joined: Mon Jun 05, 2006 11:00 pm
Location: USA (and Australia)

Re: Ways to implement partitions in drives

Post by AndrewAPrice »

j4cobgarby wrote:I should've mentioned that in my operating system I'm not exactly doing that "everything is a file" concept, the drives will be accessed sort of like in windows, except instead of letters identifying drives it'll be numbers, so for instance a path could look like "0:/my/directory/file.txt"
I'm doing something similar, except giving it a name. e.g.
/Optical 1/
/Hard Drive 1/
/Hard Drive 2/
/USB 1/

My launcher will scan for a subdirectory '/Applications/' on each device. So, for example, '/Optical 1/Applications/Calculator/Calculator.app' will show up.

I was thinking of adding special mount paths, e.g. /Application/ that will be the running program's files - e.g. so '/Application/Equals Symbol.png' would be shorthand for '/Optical 1/Applications/Calculator/Assets/Equals Symbol.png' without it caring what drive it's running on. Likewise, /Documents/ could be shorthand for '/Hard Drive 1/Documents/Andrew/'.
My OS is Perception.
thewrongchristian
Member
Member
Posts: 426
Joined: Tue Apr 03, 2018 2:44 am

Re: Ways to implement partitions in drives

Post by thewrongchristian »

rdos wrote:
j4cobgarby wrote:I'm implementing the filesystem in my operating system at the moment, and have been wondering what various ways people implement partitions within drives.

My filesystem implementation at the moment basically consists of several tables:

- A table of different types of physical drive, where each type contains function pointers for reading and writing sectors in that type of drive.
- A table of "filesystem descriptor" structs, which contain function pointers for manipulating a filesystem.
- A table of drives. Each drive has an index into the drive type table, and an index into the filesystem type table. Each drive also has a void pointer to store arbitrary metadata about that specific drive.

Now, I want the system to be able to handle partitioned drives, but I'm unsure of the best way to do this. What I'm thinking is that, for example if I have a physical drive called A, and it has two partitions which I'll call B and C, then I will have an entry in my drive table for A, and also one for B and one for C. For each drive I'll store a value in its metadata to represent the first sector to read from (so, and offset to add when reading or writing sectors), and for A that would be 0, to represent the whole drive. Then for each partition it would be mostly the same metadata except that the offset would be the beginning of the partition, and the sector count would be the amount of sectors in the partition.

I think this would work, but I'm interested to see how other people would go about doing this. Any ideas?
You should separate discs and drives. The disc interface has functions to read/write sector data. A drive should be connected to a range of sectors within a disc and contain functions related to filesystems.
Why should a disc (presumably by which I think you mean a physical drive) and drives (by which I think you mean a logical drive that contain a file system) be treated any differently to each other?

They both provide a block based view of sector based storage. Filesystems read and write to either in essentially in the same way, and whole discs can be used to store a single file system, and a file system can span more than a single disc (for example, RAID).

You're putting up artificial API barriers where none really exist. In fact, what you're advocating is more along the lines of the raw/buffered device model in traditional UNIX, where /dev/sd0 might be the buffered block interface to the first disc that can be used by file systems, whereas /dev/rsd0 is an unbuffered character device, which talks more directly to the underlying disc and is used by whatever the hell used to require unbuffered access to the disc (fsck?) I never understood that (false IMO) dichotomy either.

My 2c on the original question (theoretical, unimplemented as of yet.)

I intend to use a device manager in my kernel, which defines a simple device interface/properties for a device:

- Device address
- Device driver, which will be filled in once a driver claims ownership
- Device name
- Device type (disc, bus, fb etc.)
- Functions to abstractly enumerate the device contents.

The meat will be the enumeration. When used, the device will do whatever device enumeration the driver supports. For each enumerated sub-device, create a new device instance of the above, and broadcast its arrival in the kernel.

So, I'll have a bootstrap device type, which for a PC, might be an ACPI root device. Enumerating that will result in devices being created for each device described in the device tree in ACPI. That tree will likely include an (S)ATA disc, and when that is created, partition device drivers will probe it to see whether it contains an appropriate partition table (MBR, GPT), and once the appropriate partition driver has attached, the device is enumerated, and the partition table is read and a device for each partition created.

The same device interface will be used for whatever type of device is being enumerated. For a PCI bus, it'll enumerate PCI devices. For a USB controller, it'll enumerate USB devices connected to the root hub. For a USB hub, it'll do the same as for the root hub, but just under this USB hub.

Drivers will register interest in new device attachments using device attributes that can be matched. For a PCI driver, we'll have probes for PCI device class/subclass/function or PCI vendor information. For USB devices, we'll have similar USB device information available. For SATA block devices, we just mark devices attached to a SATA controller as a block device. For USB storage devices, the USB UMS driver will enumerate block devices contained in the USB device.

This all sounds a bit like BSD auto configuration. I'll have to look into the details to see how closely the above models the BSD model.

This recursive device discovery using enumeration allows partition drivers to simply probe newly attached block devices, and create the relevant partitions on the fly, which can themselves also contain partitions (and more devices). Once we get to the level of devices with file systems on it, the file system driver can attach, and read the file system parameters (such as disk label/UUID, status) and make the corresponding device manager file system entries to make the newly attached device mountable via whatever identification mechanism required (mount by device address, or UUID, or volume label) the user level code decides.
rdos
Member
Member
Posts: 3297
Joined: Wed Oct 01, 2008 1:55 pm

Re: Ways to implement partitions in drives

Post by rdos »

thewrongchristian wrote: Why should a disc (presumably by which I think you mean a physical drive) and drives (by which I think you mean a logical drive that contain a file system) be treated any differently to each other?
It's a bit like why would a class that handles discs and raw sectors be mixed up with a partition on the disc that contains file system data. Also, drives could be new mount-points in the filesystem, a Fuse driver or even a file in the filesystem that contains raw data.
thewrongchristian wrote: They both provide a block based view of sector based storage. Filesystems read and write to either in essentially in the same way, and whole discs can be used to store a single file system, and a file system can span more than a single disc (for example, RAID).
Not so. A disc is a physical device that contains sectors with data. A drive is something that has files & directories and it is not accessed as raw sector data, rather with file & directory operations.
thewrongchristian wrote: You're putting up artificial API barriers where none really exist. In fact, what you're advocating is more along the lines of the raw/buffered device model in traditional UNIX, where /dev/sd0 might be the buffered block interface to the first disc that can be used by file systems, whereas /dev/rsd0 is an unbuffered character device, which talks more directly to the underlying disc and is used by whatever the hell used to require unbuffered access to the disc (fsck?) I never understood that (false IMO) dichotomy either.
I have no /dev/x in my concept. A disc is a memory object that contains read & write pointers to access it's raw sector data. A drive is a memory object that has VFS pointers that allows you to open files, create directories and other similar filesystem operations. A drive does not have read/write pointers to read or write sector data.
thewrongchristian wrote: My 2c on the original question (theoretical, unimplemented as of yet.)

I intend to use a device manager in my kernel, which defines a simple device interface/properties for a device:

- Device address
- Device driver, which will be filled in once a driver claims ownership
- Device name
- Device type (disc, bus, fb etc.)
- Functions to abstractly enumerate the device contents.

The meat will be the enumeration. When used, the device will do whatever device enumeration the driver supports. For each enumerated sub-device, create a new device instance of the above, and broadcast its arrival in the kernel.
That's certainly not how my model works. I have no device manager in my system. I have a vfs device (in the new design, and a drive device in the old), that export a function so discs can register themselves. They basically do this by allocating the disc structure, fills in the function pointers for read/write and then sends this to the vfs device that keeps track of which discs are present. When it has registered itself, a server thread is started in the vfs device that first starts an initialization thread and then waits for disc requests. The initialization thread reads the partitions from the disc and starts filesystem servers for known filesystems. The read operation is queued in the disc cache, the server thread is signalled which then does the read from the device.

The is no coordination in this model. Every loaded disc device that discovers some disc will do the register process itself which starts the server thread and the filesystem threads.
thewrongchristian wrote: So, I'll have a bootstrap device type, which for a PC, might be an ACPI root device. Enumerating that will result in devices being created for each device described in the device tree in ACPI. That tree will likely include an (S)ATA disc, and when that is created, partition device drivers will probe it to see whether it contains an appropriate partition table (MBR, GPT), and once the appropriate partition driver has attached, the device is enumerated, and the partition table is read and a device for each partition created.
Doesn't work for USB discs. They are not part of ACPI, only the USB PCI devices are. USB discs servers are started with an USB attach that determines it's a disc, which results in creating the disc object and the server threads.

For other examples, the AHCI device driver will ask PCI for specific device types and then create the disc object and the server threads. The IDE device driver probes for legacy disc hardware, but also asks PCI for specific device types. The floppy driver will exclusively probe for legacy hardware.
thewrongchristian
Member
Member
Posts: 426
Joined: Tue Apr 03, 2018 2:44 am

Re: Ways to implement partitions in drives

Post by thewrongchristian »

rdos wrote:
thewrongchristian wrote: They both provide a block based view of sector based storage. Filesystems read and write to either in essentially in the same way, and whole discs can be used to store a single file system, and a file system can span more than a single disc (for example, RAID).
Not so. A disc is a physical device that contains sectors with data. A drive is something that has files & directories and it is not accessed as raw sector data, rather with file & directory operations.
Right, so when you say "drive", I'd equate that to a mounted filesystem, and the API for that is the VFS API?
rdos wrote:
thewrongchristian wrote: I intend to use a device manager in my kernel, which defines a simple device interface/properties for a device:

- Device address
- Device driver, which will be filled in once a driver claims ownership
- Device name
- Device type (disc, bus, fb etc.)
- Functions to abstractly enumerate the device contents.

The meat will be the enumeration. When used, the device will do whatever device enumeration the driver supports. For each enumerated sub-device, create a new device instance of the above, and broadcast its arrival in the kernel.
That's certainly not how my model works. I have no device manager in my system. I have a vfs device (in the new design, and a drive device in the old), that export a function so discs can register themselves. They basically do this by allocating the disc structure, fills in the function pointers for read/write and then sends this to the vfs device that keeps track of which discs are present. When it has registered itself, a server thread is started in the vfs device that first starts an initialization thread and then waits for disc requests. The initialization thread reads the partitions from the disc and starts filesystem servers for known filesystems. The read operation is queued in the disc cache, the server thread is signalled which then does the read from the device.

The is no coordination in this model. Every loaded disc device that discovers some disc will do the register process itself which starts the server thread and the filesystem threads.
Is this managed in the kernel, or at the user process level?

How do you, for example, prevent the kernel grabbing a newly inserted USB storage device and automatically mounting the contained file systems, when all you want to do is format or write a new raw image?

Do you have a mount/umount equivalent?
rdos wrote:
thewrongchristian wrote: So, I'll have a bootstrap device type, which for a PC, might be an ACPI root device. Enumerating that will result in devices being created for each device described in the device tree in ACPI. That tree will likely include an (S)ATA disc, and when that is created, partition device drivers will probe it to see whether it contains an appropriate partition table (MBR, GPT), and once the appropriate partition driver has attached, the device is enumerated, and the partition table is read and a device for each partition created.
Doesn't work for USB discs. They are not part of ACPI, only the USB PCI devices are. USB discs servers are started with an USB attach that determines it's a disc, which results in creating the disc object and the server threads.

For other examples, the AHCI device driver will ask PCI for specific device types and then create the disc object and the server threads. The IDE device driver probes for legacy disc hardware, but also asks PCI for specific device types. The floppy driver will exclusively probe for legacy hardware.
In this example above, ACPI just provides some abstract root device to start enumeration, but it could just as easily be the device tree, or hard-coded memory mapped devices.

The beauty of an abstract device interface is it doesn't matter what the device type is. If enumeration makes sense, it'll produce new devices whether you're enumerating a PCI bus, an AHCI HCI, a USB HCI, or a partition table. They'll all spit out abstract devices. If enumeration doesn't make sense, such as for a leaf node device, then those devices just become available for use directly. For a USB storage device, the leaf device nodes will be block buffered device(s), the same as IDE storage or SCSI storage. The consumers of block buffered devices (probably a file system) won't care about the device topology.

A PCI USB device knows how to enable the USB HCI, and how to enumerate the root hub, so it has all it needs to do its side of the enumeration and discover attached USB devices, make them addressable and creating new abstract device nodes for further consumption. The USB storage driver will be alerted to a new USB storage device being attached and produce a block device driver on enumeration, which can be picked up and enumerated by a partition driver, thus producing more block devices, one for each partition, until we have leaf node block devices.

The overhead is per-device driver enumeration code that must match some abstract interface. But you'll need that code anyway if a device supports enumeration, so the only actual overhead is the resulting abstract device tree.
rdos
Member
Member
Posts: 3297
Joined: Wed Oct 01, 2008 1:55 pm

Re: Ways to implement partitions in drives

Post by rdos »

thewrongchristian wrote: Right, so when you say "drive", I'd equate that to a mounted filesystem, and the API for that is the VFS API?
Yes.
thewrongchristian wrote: Is this managed in the kernel, or at the user process level?
Which devices that a system supports are decided when the OS image is built by including them in the configuration file. When the OS image is booted, every device has an "Init" point, where it will announce its user-level and kernel-level interface (by exporting APIs). At this point, it can also request server threads to be created, hook initialization points (like PCI initialization), or register discs or other device types. For an USB device, a device will hook the attach & detach process so it can determine if it relates to the device-type it supports. If it does, it will initialize it and in the case of a disc, register the disc with the VFS device, which starts the partition discovery process.

So, it's actually not managed anywhere. It's the device itself that is responsible for registering itself.
thewrongchristian wrote: How do you, for example, prevent the kernel grabbing a newly inserted USB storage device and automatically mounting the contained file systems, when all you want to do is format or write a new raw image?
I don't have to. Since a while back I can recreate partitions and start and stop filesystems dynamically. This is initiated with a disc class that can be bundled with an application or done through the command shell. So, I cannot stop a filesystem from being mounted (other than not including the relevant filesystem driver), but if it is reformatted it will be remounted with the new contents. I can even change from MBR to GPT on the fly now without needing a boot disc.
thewrongchristian wrote: Do you have a mount/umount equivalent?
I can still mount a file as a filesystem and create new mount-points from user-mode (at least should be able to if it's not broken).
thewrongchristian wrote: In this example above, ACPI just provides some abstract root device to start enumeration, but it could just as easily be the device tree, or hard-coded memory mapped devices.

The beauty of an abstract device interface is it doesn't matter what the device type is. If enumeration makes sense, it'll produce new devices whether you're enumerating a PCI bus, an AHCI HCI, a USB HCI, or a partition table. They'll all spit out abstract devices. If enumeration doesn't make sense, such as for a leaf node device, then those devices just become available for use directly. For a USB storage device, the leaf device nodes will be block buffered device(s), the same as IDE storage or SCSI storage. The consumers of block buffered devices (probably a file system) won't care about the device topology.

A PCI USB device knows how to enable the USB HCI, and how to enumerate the root hub, so it has all it needs to do its side of the enumeration and discover attached USB devices, make them addressable and creating new abstract device nodes for further consumption. The USB storage driver will be alerted to a new USB storage device being attached and produce a block device driver on enumeration, which can be picked up and enumerated by a partition driver, thus producing more block devices, one for each partition, until we have leaf node block devices.

The overhead is per-device driver enumeration code that must match some abstract interface. But you'll need that code anyway if a device supports enumeration, so the only actual overhead is the resulting abstract device tree.
I have device-trees for informational purposes. For instance, I can list nodes in ACPI and I can list all PCI devices from the command shell. I can list all USB devices and their configuration descriptors. I can list all discs, their partitions, including start positions and size, filesystem and current drive letter if it is mounted. I can also list all the connection points of HD audio to see the connections and possibly debug problems. I can list all properties of USB HID devices. I can list all processes, which images they use, and which threads are connected to them. Threads must have names in my design, which isn't supported by Posix. I can even show the current user-level call-stack per thread, and the state of the thread. However, I don't use any of this information in the kernel. It's solely for informational purposes and to debug problems.
Post Reply