What generates operating system files/folders, and when?

j4cobgarby · Post by **j4cobgarby** » Tue Oct 29, 2019 9:31 am

This is probably a really simple question, but I'm just beginning to think about hard drive access in my operating system and then realised that I actually don't know what bit of software creates the OS filesystem (for example /bin, /etc, etc.. on linux), and when it's generated.

Does the kernel, when booting up, check that all of these folders/files exist, and if they don't, it creates them? This doesn't seem quite right.

So let's say I'm booting up a computer from my kernel, which is loaded onto a CD-ROM. Hmm.. As I write this, it's all starting to make sense, so, is this correct?:

- The kernel is loaded into RAM by the bootloader.
- The kernel then can use hard drive reading procedures to talk to the CD-ROM or the hard disk. It would have to interpret the hard disk as the same filesystem format as the bootloader interpreted it as, when loading the kernel into RAM.
- The kernel can then access the various folders and files which are in the disk which the kernel was loaded from.

Please tell me if this is correct

In this case, though, how does the bootloader know what is code, to be loaded into RAM, and what is data (such as the /bin and /etc folders), not to be loaded into RAM? In other words, when the bootloader (say, GRUB) is loading the kernel from the hard drive into RAM (by searching for the 0x1BADB002 magic number), how does it know when it has got to the end of the kernel program it wants to load and reached the beginning of the data?

Sorry that this question was kind of all over the place, I hope it makes some amount of sense.

iansjack · Post by **iansjack** » Tue Oct 29, 2019 9:41 am

The installation program creates the directory structure when you first install the operating system. This is apart from some "files" that are created when the system boots (e.g. those in /proc). These don't necessarily exist as physical files on the hard disk but are memory-based or created on the fly.
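As a rough sketch of what "the installation program creates the directory structure" can look like, the Python below creates a skeleton tree under a target root. The directory list and the name `install_skeleton` are made up for illustration; a real installer copies a far larger tree from its installation medium.

```python
import os

# Hypothetical skeleton, chosen for illustration only.
SKELETON_DIRS = ["bin", "etc", "home", "usr/bin", "var/log"]

def install_skeleton(target_root):
    """Create the skeleton directories under target_root; return their paths."""
    created = []
    for rel in SKELETON_DIRS:
        path = os.path.join(target_root, *rel.split("/"))
        os.makedirs(path, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created
```

The point of the sketch is only that this work happens once, at install time, and not in the kernel on every boot.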

Of course, as you use the operating system other files will be created/deleted.

~ · Post by ~ » Tue Oct 29, 2019 10:12 am

Execution logs from emulation (for example, to know which standard or 32-bit addresses -probably special/hardware-, which I/O ports, in which order, your BIOS firmware or native drivers call) so that for example you can derive a native driver from calling INT 10H video BIOS for each mode it is capable of.

Also, a file that will write a single char every time you press anything on your keyboard.

I have a 512-byte log file that has no fragmentation, that I just find with the partition information with PIO ATA READ SECTOR(S) (in the second IDE partition). I just type in the screen and if I run the command "dsklon", a function that accesses disk for that file gets enabled.

I could add code to that function to log the ports accessed by the emulated BIOS for being able to execute that sequence in protected mode without the BIOS, only by the log of memory and I/O ports as if it was an array of values to write and their addresses.

nullplan · Post by **nullplan** » Tue Oct 29, 2019 12:28 pm

Leaving what ~ said off to the side for the moment (does anyone here understand him anymore?), the kernel image is a file in a file system. The bootloader loads that file into memory. It is important to distinguish between the kernel image file and the kernel image in memory. How the kernel gets into memory is mostly immaterial. PXE boot for instance loads it via TFTP from a server. The kernel can then boot from whatever medium is desired, maybe even with NFS as root.

For boot loaders there are mostly two approaches: either, like GRUB or syslinux, you parse the file system structure at run time (even though each of syslinux, isolinux, and extlinux only understands a single FS), or, like LILO, you save a list of blocks somewhere. The LILO installer asks the OS where the kernel image is on disk and writes the block list into the bootloader itself, so that the image can be loaded using the BIOS.
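The block-list approach can be sketched in a few lines of Python. This is a toy model under stated assumptions: the "disk" is just a bytes object, the sector size is 512, and `make_block_map` / `load_from_block_map` are hypothetical names, not LILO's actual on-disk format.

```python
SECTOR_SIZE = 512  # assumed sector size for this toy disk

def make_block_map(file_sectors, file_size):
    """What an install-time tool records: sector numbers plus byte length."""
    return {"sectors": list(file_sectors), "size": file_size}

def load_from_block_map(disk, block_map):
    """What the loader does at boot: read the listed sectors, in order."""
    data = bytearray()
    for lba in block_map["sectors"]:
        start = lba * SECTOR_SIZE
        data += disk[start:start + SECTOR_SIZE]
    # The last sector is usually only partly used, so trim to the real size.
    return bytes(data[:block_map["size"]])
```

Note that the loader never learns which file system the sectors belong to. The price is that the map goes stale whenever the kernel file moves on disk, which is why the installer has to be re-run after a kernel update.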

Fundamentally, the boot loader is a pretty simple program that can only ever load predetermined files into memory. If that file happens to contain a kernel, then that's a happy coincidence, made possible by the intelligence of the configuration.

zaval · Post by **zaval** » Tue Oct 29, 2019 5:19 pm

nullplan wrote:Fundamentally, the boot loader is a pretty simple program that can only ever load predetermined files into memory. If that file happens to contain a kernel, then that's a happy coincidence, made possible by the intelligence of the configuration.

on fossilized systems stuck in the '70s it is, but it's not the only possible approach. let's not confuse people by making them think one's beloved approach is the only one, when it's not even a good one. an OS loader can be far more sophisticated and dynamic in what it does, and there are different ways of configuring and influencing it. for example, user input during boot can direct it to a special boot flow, say a "safe boot" or "last known good" option. or there are flags it should take into account and pass to the kernel: PAE, or a 3+1 vs. 2+2 address space split on 32-bit systems. another way it's not so simple is the set of functionality it has to provide: it might support a bunch of file systems, read-only, to be more robust in accessing storage. also, how many and which files it loads may be fully configurable rather than hardcoded. say, it loads a "registry" of some kind and looks there for the list of modules to load: kernel, boot drivers. this set is configured outside of the loader, at OS runtime, by the OS and driver installation procedures. and so on. switching virtual memory on, some CPU configuration (TLBs, caches, branch predictor) and building the initial mappings are also its responsibility. not so "simple".

j4cobgarby wrote:and then realised that I actually don't know what bit of software creates the OS filesystem

as iansjack said, the installation process creates this during installation. I'll add that it copies it from the installation source medium (your "live" CD, e.g.), which serves as a template and is generated during the build or packaging process, when you create your OS installation package. you put the relevant pieces there (here, a modular OS is far superior to a monolithic one). they contain different sets of files, not necessarily all needed for a particular installation; while the installation process is running, it chooses what's needed and what the user confirmed he/she wants to install. this way, you create an OS installation instance on a target medium.

j4cobgarby wrote:Does the kernel, when booting up, check that all of these folders/files exist, and if they don't, it creates them? This doesn't seem quite right.

no, it doesn't; it doesn't need to. it looks in some predefined places where persistent configuration is stored, the registry in Windows for example. paths to other places can be specified there (dynamically and configurably), so they don't need to be hardcoded. basically, your OS holds its persistent state information in some form of internal database, and that's enough. depending on the development level of the OS, it could be a well thought out database, or just a messy set of text files with a different syntax in each, whose only common trait is extreme unfriendliness to both the OS and humans. xD

j4cobgarby wrote: - The kernel then can use hard drive reading procedures to talk to the CD-ROM or the hard disk. It would have to interpret the hard disk as the same filesystem format as the bootloader interpreted it as, when loading the kernel into RAM.

yes, but with full support of the FS and other entities. the OS loader only needs read-only support: it works in a restricted environment (interrupts masked, MMU possibly off, no virtual memory, etc.), so it doesn't need to support all features of the FS, which may be numerous, just enough read support to be able to read files from it. the same goes for the registry (the OS's internal configuration database): the OS loader needs to understand it only well enough to extract the info on what to do, whereas the OS needs full support: write, journalling (logging), etc.

j4cobgarby wrote: - The kernel can then access the various folders and files which are in the disk which the kernel was loaded from.

obviously.

j4cobgarby wrote:In this case, though, how does the bootloader know what is code, to be loaded into RAM, and what is data (such as the /bin and /etc folders), not to be loaded into RAM? In other words, when the bootloader (say, GRUB) is loading the kernel from the hard drive into RAM (by searching for the 0x1BADB002 magic number), how does it know when it has got to the end of the kernel program it wants to load and reached the beginning of the data?

Sorry that this question was kind of all over the place, I hope it makes some amount of sense.

generally speaking, it knows what to load and what not to by following some protocol/convention. grub does this too, but it's generic, so you need to adapt to it, whereas an OS-specific loader is more fine-grained in its knowledge of the kernel's needs: it follows the conventions and satisfies the demands of the particular OS. say our loader knows that it needs to read the SYSTEM registry hive file (a piece of the aforementioned OS config database). so it has an understanding of a set of supported FSs. it asks the firmware to read the drive at the block (sector) level. then it identifies the FS and, if it belongs to the supported set, proceeds to parse its internal metadata, asking the FW to read more sectors. finally it searches for the SYSTEM hive file and, if found, loads it. then it parses the hive's structure and finds the list of drivers in the boot group, the drivers it needs to load for the OS to start (this avoids a cyclic dependency). with this approach, even the kernel itself may be specified there. so it knows the full path of every module it needs to load, and it loads them. it may also make some in-RAM modifications to them, such as base relocations and zero-filling of uninitialized variables, if that is its responsibility. this is how it knows.
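For the GRUB case from the question, the convention is Multiboot. A Multiboot 1 header sits inside the kernel file itself, 4-byte aligned within the first 8192 bytes, and starts with the magic 0x1BADB002 followed by a flags word and a checksum that makes the three words sum to zero modulo 2^32. The magic therefore marks a header inside an already located file; it does not mark where the kernel ends on the disk. A minimal sketch of the search, with `find_multiboot_header` as a hypothetical helper name:

```python
import struct

MULTIBOOT_MAGIC = 0x1BADB002
SEARCH_LIMIT = 8192  # the header must lie within the first 8 KiB of the image

def find_multiboot_header(image):
    """Return (offset, flags) of a valid Multiboot 1 header, or None."""
    limit = min(len(image), SEARCH_LIMIT)
    # The header is 4-byte aligned; its three mandatory fields take 12 bytes.
    for off in range(0, limit - 11, 4):
        magic, flags, checksum = struct.unpack_from("<III", image, off)
        if magic == MULTIBOOT_MAGIC and (magic + flags + checksum) & 0xFFFFFFFF == 0:
            return off, flags
    return None
```

The checksum test matters because the magic value alone could occur by chance in code or data.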

bzt · Post by **bzt** » Wed Oct 30, 2019 7:16 am

j4cobgarby wrote:Does the kernel, when booting up, check that all of these folders/files exist, and if they don't, it creates them? This doesn't seem quite right.

Actually that's a quite common approach. The Linux kernel for example creates a filesystem in RAM on boot, and fills it with the entries found in the initrd, which is usually a cpio archive and is loaded by the boot loader. For loading the kernel and the initrd the boot loader does not have to interpret file systems. For example in Xv6, the kernel simply starts at sector 2, outside of the partitions. The boot loader is unaware of what's inside an initrd, so it does not need to know its filesystem either. For the loader, both the kernel and the initrd are just unorganized chunks of bits, with the only exception that the loader has to find the entry point of the kernel, but that's all (this is usually done by reading one pointer from the beginning of the file, be it ELF, PE, or a.out. There is no need to investigate the kernel's executable any deeper; the Xv6 loader does, however, and copies the segments from the kernel out to their specified addresses).
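To make "fills it with the entries found in the initrd" concrete, here is a minimal sketch that walks a cpio archive in the "newc" format (110-byte ASCII header with magic `070701`, thirteen 8-digit hex fields, then the NUL-terminated name and the data, each padded to 4 bytes) and builds an in-memory tree. It assumes an uncompressed archive and ignores hard links and device nodes; `parse_newc` and `make_entry` are hypothetical names, and this is not how the kernel's own unpacker is written.

```python
def align4(n):
    """Round n up to the next multiple of 4."""
    return (n + 3) & ~3

def make_entry(name, mode, data=b""):
    """Build one newc entry; only here so the sketch can be tried out."""
    name_b = name.encode() + b"\0"
    fields = [0, mode, 0, 0, 1, 0, len(data), 0, 0, 0, 0, len(name_b), 0]
    out = b"070701" + b"".join(b"%08X" % f for f in fields) + name_b
    out += b"\0" * (-len(out) % 4)   # pad header + name to 4 bytes
    out += data
    out += b"\0" * (-len(out) % 4)   # pad data to 4 bytes
    return out

def parse_newc(archive):
    """Return {name: (mode, data)} for every entry before the trailer."""
    entries = {}
    pos = 0
    while pos + 110 <= len(archive):
        header = archive[pos:pos + 110]
        if header[:6] != b"070701":
            raise ValueError("not a newc header at offset %d" % pos)
        fields = [int(header[6 + 8 * i:14 + 8 * i], 16) for i in range(13)]
        mode, filesize, namesize = fields[1], fields[6], fields[11]
        name_start = pos + 110
        name = archive[name_start:name_start + namesize - 1].decode()
        if name == "TRAILER!!!":     # conventional end-of-archive marker
            break
        data_start = align4(name_start + namesize)
        entries[name] = (mode, archive[data_start:data_start + filesize])
        pos = align4(data_start + filesize)
    return entries
```

Nothing here checks for or creates a fixed set of directories: the tree is whatever the archive contains.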

j4cobgarby wrote: - The kernel can then access the various folders and files which are in the disk which the kernel was loaded from.

They are not on the disk, but rather in memory in a so-called VFS structure. Only later, when drivers can access sectors on disks, does the appropriate FS driver interpret those sectors as a filesystem and attach them to the VFS. Here there's a difference between monolithic and micro-kernel architectures, because the former can boot without an initialized VFS (and may mount the root filesystem directly from disk), whereas for a micro-kernel a boot-time VFS is a must, as the disk driver file has to be loaded and it can't come from the disk (obviously). So there's an extra abstraction layer here, which in the early boot phase can only be eliminated for monolithic kernels (but they usually don't do that, see Linux).
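A toy model of that layering, under heavy simplification: each "filesystem" is a plain dict of relative path to contents, and the VFS is a mount table that resolves a path through the longest matching mount point. `ToyVFS` is a made-up class for illustration, not the structure of any real kernel.

```python
class ToyVFS:
    """Mount table mapping mount points to dict-based 'filesystems'."""

    def __init__(self):
        self.mounts = {}  # mount point -> {relative path: contents}

    def mount(self, point, fs):
        self.mounts[point.rstrip("/") or "/"] = fs

    def read(self, path):
        # The longest matching mount point wins, so later mounts can
        # cover part of the boot-time tree.
        for point in sorted(self.mounts, key=len, reverse=True):
            prefix = point if point.endswith("/") else point + "/"
            if path.startswith(prefix):
                rel = path[len(prefix):]
                if rel in self.mounts[point]:
                    return self.mounts[point][rel]
                break
        raise FileNotFoundError(path)
```

At boot only the RAM-backed tree is mounted at `/`, which is enough to find a disk driver; a disk-backed filesystem can be attached afterwards without callers of `read` noticing the difference.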

Cheers,
bzt

ajxs · Post by **ajxs** » Fri Nov 08, 2019 6:54 pm

j4cobgarby wrote:This is probably a really simple question, but I'm just beginning to think about hard drive access in my operating system and then realised that I actually don't know what bit of software creates the OS filesystem (for example /bin, /etc, etc.. on linux), and when it's generated.

Speaking very generally, the necessary file-system structure for running an OS would typically be created during its installation process.

j4cobgarby wrote: Does the kernel, when booting up, check that all of these folders/files exist, and if they don't, it creates them? This doesn't seem quite right.

The operating system has to assume that certain files needed for operation (fonts, GUI static content, etc.) are present in the file-system; if they're not present then loading the OS can't continue in a reasonable fashion. If they aren't created at this stage, booting will just fail. This is speaking very generally though, it depends on what kind of OS we're describing.

j4cobgarby wrote: - The kernel is loaded into RAM by the bootloader.
- The kernel then can use hard drive reading procedures to talk to the CD-ROM or the hard disk. It would have to interpret the hard disk as the same filesystem format as the bootloader interpreted it as, when loading the kernel into RAM.
- The kernel can then access the various folders and files which are in the disk which the kernel was loaded from.

Close, but not 100% correct. Again, very generally speaking, the kernel is loaded into RAM by the bootloader. The kernel can then load its own drivers to talk to the CD-ROM, HDD and other hardware. From there the kernel isn't limited in how it interprets the contents of physical media. Broadly speaking, in order for disk media to be bootable it needs to have a specific filesystem partition loaded at a certain physical location to be interpreted by the bootloader. The rest of the physical media can be formatted in any manner you like, as long as the OS can load drivers to read it in a meaningful way.
I like to think of the bootloader and kernel as being different programs entirely. I know that this simple mental model does not accurately describe all real-world cases, but it aids in understanding academic examples like this.

j4cobgarby wrote: In this case, though, how does the bootloader know what is code, to be loaded into RAM, and what is data (such as the /bin and /etc folders), not to be loaded into RAM? In other words, when the bootloader (say, GRUB) is loading the kernel from the hard drive into RAM (by searching for the 0x1BADB002 magic number), how does it know when it has got to the end of the kernel program it wants to load and reached the beginning of the data?

Continuing the example I've been using, the 'data' you've described above (the /bin and /etc folders) will not be part of the kernel executable at all.
Think of the bootloader as an application that runs at boot time to bootstrap the kernel. It reads the kernel executable from the boot partition and loads it into RAM, as you noted in your question. It does this by interpreting the format of the kernel executable and loading it according to the standards of that format. One common executable format is ELF, which specifies a clear way to demarcate executable code sections from data. GRUB contains code for loading an ELF executable into memory; there's a page in the wiki with more information on how an ELF executable is loaded, and a wealth of information elsewhere too.
Once the kernel is loaded it can decide for itself, by loading media drivers, how it loads the rest of those files and folders. They'll typically be stored somewhere on your installation ISO.
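To illustrate how an ELF file tells a loader what to copy and where, here is a hedged sketch for little-endian ELF64 images only. It reads the file header, walks the program headers, and "loads" each `PT_LOAD` segment into a dict standing in for physical memory, zero-filling the part of the segment that has no bytes in the file (typically `.bss`). The function names and the dict-as-memory model are assumptions made for the example; this is not GRUB's code.

```python
import struct

PT_LOAD = 1
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # ELF64 file header, 64 bytes
PHDR = struct.Struct("<IIQQQQQQ")          # ELF64 program header, 56 bytes

def loadable_segments(image):
    """Return (entry, [(offset, vaddr, filesz, memsz), ...]) for PT_LOAD."""
    (ident, _type, _machine, _version, entry, phoff, _shoff, _flags,
     _ehsize, phentsize, phnum, _shentsize, _shnum, _shstrndx) = EHDR.unpack_from(image, 0)
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("expected a little-endian ELF64 image")
    segments = []
    for i in range(phnum):
        p_type, _pflags, offset, vaddr, _paddr, filesz, memsz, _align = \
            PHDR.unpack_from(image, phoff + i * phentsize)
        if p_type == PT_LOAD:
            segments.append((offset, vaddr, filesz, memsz))
    return entry, segments

def load(image):
    """Copy each loadable segment into a toy memory keyed by address."""
    entry, segments = loadable_segments(image)
    memory = {}
    for offset, vaddr, filesz, memsz in segments:
        tail = bytes(max(0, memsz - filesz))  # zero-fill beyond file contents
        memory[vaddr] = image[offset:offset + filesz] + tail
    return entry, memory
```

The loader never has to guess where the kernel ends: every segment states its own file offset and size, and the entry point comes from the header.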

I hope that helps!

eekee · Post by **eekee** » Sun Dec 22, 2019 6:40 pm

bzt wrote:
j4cobgarby wrote:Does the kernel, when booting up, check that all of these folders/files exist, and if they don't, it creates them? This doesn't seem quite right.
Actually that's a quite common approach. The Linux kernel for example creates a filesystem in RAM on boot, and fills it with the entries found in the initrd, which is usually a cpio archive and is loaded by the boot loader.

Uh, unpacking the initrd doesn't have anything to do with the files on disk. Linux' initrd is a temporary root fs later replaced by the real one. The directories in the real root are made by the installer or by crazy people like me who manually make root filesystems. (I've only done it 2 or 3 times in my life.)

The files and directories in the various filesystems can be anything at all so long as they match the paths coded into kernel and userspace programs. It's not the kernel's job to check them, just use them. The Linux kernel doesn't expect much: only /sbin/init on a physical disk if an "init=" option isn't given, /linuxrc on an initrd (if that hasn't changed in the last 15 years), and, if module auto-loading is enabled, /lib/modules/$(uname -r)/*tree*of*modules* if I remember right. It doesn't care where its device nodes are so long as they're not on a filesystem mounted "nodev". Userspace wants a lot more, but there are a bunch of ways to change which directories programs look in.
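The "just use them" behaviour can be sketched as a short lookup. The post only names /sbin/init and the "init=" option; the further fallbacks in the list below (/etc/init, /bin/init, /bin/sh) reflect what mainline Linux is understood to try and should be treated as an assumption here, as should the helper name `choose_init`. What the kernel does when nothing is found is left out.

```python
# Fallback order assumed for illustration; only /sbin/init is from the post.
DEFAULT_INIT_PATHS = ["/sbin/init", "/etc/init", "/bin/init", "/bin/sh"]

def choose_init(path_exists, requested=None):
    """Pick the first program to run; path_exists is a callable(path) -> bool."""
    if requested is not None:
        # An explicit init= request is used as given, with no fallback.
        return requested if path_exists(requested) else None
    for path in DEFAULT_INIT_PATHS:
        if path_exists(path):
            return path
    return None
```

For example, with a root filesystem containing only /bin/sh, `choose_init` returns "/bin/sh": nothing is created or repaired along the way.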

A couple of years ago I put together a very minimal Linux system with a userspace which was largely non-standard. I didn't have an initrd because I didn't want or see the need for one. Only had /lib/modules because I couldn't be bothered to reconfigure the kernel. (Linux is way too big now.) I made /bin, symlinked /sbin to it, and made other directories as I found necessary programs needed them. I can't remember all the details, but / had maybe half the entries of a typical Linux system and yet it ran fine. And all this was without even recompiling anything. If you recompile & reconfigure, you could put all the files under /ooga/booga if you wanted to. (This is basically what systems such as mingw & msys do.) Closed-source software wouldn't be able to find anything in that case, but anything else can be configured or patched to find it.

The only technical problem I had with it was Linux' refusal to remount / read-only, but it's always been ridiculously difficult about that. (Plan 9 has no problem with it. In true Unix fashion, it assumes the sysadmin knows what he's doing. Linux thinks it knows better, or maybe its VFS is just that broken.) I was told it was because I didn't use an initrd, but if that makes any sense at all, I just think the no-initrd code has bit-rotted. It's a very Linux-specific issue.

Solar · Post by **Solar** » Sun Dec 22, 2019 7:10 pm

zaval wrote:
nullplan wrote:Fundamentally, the boot loader is a pretty simple program that can only ever load predetermined files into memory. If that file happens to contain a kernel, then that's a happy coincidence, made possible by the intelligence of the configuration.
on fossilized systems stuck in the '70s it is, but it's not the only possible approach. let's not confuse people by making them think one's beloved approach is the only one, when it's not even a good one.

On BIOS systems, the boot loader (out of the MBR) is a pretty simple program (due to size restrictions) that can only ever load predetermined files into memory. These can be a kernel, or a more sophisticated boot manager which then allows the more elaborate selections to be made, file systems be understood and all those things.

But zaval's basic assertion is correct.

nullplan · Post by **nullplan** » Mon Dec 23, 2019 12:33 am

Solar wrote:But zaval's basic assertion is correct.

Oh, come on. zaval not being capable of making a point without making an @$$ of himself is by now sort of expected, but I thought at least you might notice why I wrote that. The OP was confused about something extremely basic in the boot process. Of course I was not going to list all the things boot loaders and managers can do, that would only confuse the OP. Once they got how that all basically worked, then they could learn more about the fine details. You don't teach people about integral equations on the first day of Calculus 101.

Plus, zaval did nothing to refute my point. First, user input: Nice one, but that is for a boot manager, not a boot loader. Second, flexibility: OK, so the determination of which file to load happens pretty late, namely during the boot process, but the boot loader still fundamentally loads the boot file(s) from the boot medium (which can be floppy, hard disk, CD, or even network, maybe even a ROM) into memory and then starts it somehow. All the other stuff is fluff. What I just listed is the basic function of a boot loader. It is what they do. What they all do. From the MBR bucket chain approach (BIOS loads the MBR, which loads the PBR which loads the boot file, which loads...) via U-Boot and GRUB all the way to the UEFI loaders, putting things into memory and starting them somehow is their job. And if starting the kernel entails switching virtual memory on, then that is one part of their job. In my case, the boot loader must enable virtual memory, since the kernel runs in 64-bit mode, which only exists with virtual memory (in fact, turning paging on is the last step of setting up long mode).

The point I was getting at, is that there is this hard cut between the bootloader phase and the kernel phase. The kernel, after being loaded from somewhere, does not know where it came from. Not innately. It has to be told, somehow. Or maybe not, maybe it is told something else deliberately. You can load a kernel from one medium and then tell it that the rootfs is on another. It's called a bootdisk and used to be a quite popular approach for dealing with deficient BIOSes or limited data media. And it is basically how PXE boot works (kernel from TFTP, root usually from NFS).
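One common way of "telling" the kernel is a command line passed along by the boot loader. The sketch below splits such a line into parameters; `parse_cmdline` is a hypothetical helper, and real kernels handle quoting and repeated options that are ignored here.

```python
def parse_cmdline(cmdline):
    """Split a kernel-style command line into {key: value}.

    Bare flags such as "ro" map to True. Quoting is not handled.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")  # split at the first "=" only
        params[key] = value if sep else True
    return params
```

With a line such as `root=/dev/nfs nfsroot=192.168.0.1:/srv/root ip=dhcp ro` (the address and path are placeholders), the kernel is told to take its root from NFS regardless of where the kernel image itself was loaded from.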

bzt · Post by **bzt** » Mon Dec 23, 2019 4:29 am

eekee wrote:Uh, unpacking the initrd doesn't have anything to do with the files on disk. Linux' initrd is a temporary root fs later replaced by the real one.

Perfectly correct, however the question was "What generates operating system files/folders, and when?". For Linux with initrd, that temporary root fs is the system folder, generated during boot, so my post does answer that question. (And it is just one possible answer, I'd like to add.)

eekee wrote:The files and directories in the various filesystems can be anything at all so long as they match the paths coded into kernel and userspace programs. It's not the kernel's job to check them, just use them.

Not entirely true, let's just say it depends on the operating system's design. All you wrote about Linux is true, but Linux is not the only way to implement an OS.

eekee wrote:(Plan 9 has no problem with it. In true Unix fashion, it assumes the sysadmin knows what he's doing. Linux thinks it knows better, or maybe its VFS is just that broken.) I was told it was because I didn't use an initrd, but if that makes any sense at all, I just think the no-initrd code has bit-rotted. It's a very Linux-specific issue.

I can totally relate. I also quite often run into issues with things that should work on a POSIX system out of the box (and which, by the way, do run on the BSDs and on Solaris), but not under Linux. I also keep a failsafe Linux at hand, because after a kernel update there's always something that goes wrong with the devices, and I can't boot. (One time it was mkinitcpio that "forgot" to include important modules, but the most recent update simply randomizes the disks' order. Why, oh why, when the hardware config doesn't change at all? Things like this always happen with Linux, even when they are not rewriting the entire /dev code. There have been at least 3 complete /dev reimplementations since I started using Linux, if I recall correctly, but it could have been 4.)

Cheers,
bzt

eekee · Post by **eekee** » Mon Dec 23, 2019 6:01 am

bzt wrote:For Linux with initrd, that temporary root fs is the system folder, generated during boot, so my post does answer that question

Ah, this just looks like a misunderstanding. The question can be paraphrased as, "Does the kernel check for certain folders, and then make them if they're not found?" With initrd, Linux doesn't check anything, it just retrieves a pre-made tree from cpio. From my point of view, it's no different from finding a pre-made tree on disk.

bzt wrote:
eekee wrote:The files and directories in the various filesystems can be anything at all so long as they match the paths coded into kernel and userspace programs. It's not the kernel's job to check them, just use them.
Not entirely true, let's just say it depends on the operating system's design. All you wrote about Linux is true, but Linux is not the only way to implement an OS.

I concede my bias affected my statement. I much prefer code which only checks what it actually uses, and which doesn't use many different things.

bzt wrote:
eekee wrote:(Plan 9 has no problem with it. In true Unix fashion, it assumes the sysadmin knows what he's doing. Linux thinks it knows better, or maybe its VFS is just that broken.) I was told it was because I didn't use an initrd, but if that makes any sense at all, I just think the no-initrd code has bit-rotted. It's a very Linux-specific issue.
I can totally relate. I also quite often run into issues with things that should work on a POSIX system out of the box (and which, by the way, do run on the BSDs and on Solaris), but not under Linux. I also keep a failsafe Linux at hand, because after a kernel update there's always something that goes wrong with the devices, and I can't boot. (One time it was mkinitcpio that "forgot" to include important modules, but the most recent update simply randomizes the disks' order. Why, oh why, when the hardware config doesn't change at all? Things like this always happen with Linux, even when they are not rewriting the entire /dev code. There have been at least 3 complete /dev reimplementations since I started using Linux, if I recall correctly, but it could have been 4.)

Ouch! I see. I remember previous Linux device order problems back in the '00s. For example, on the Sharp Zaurus models with an internal disk drive, if a CF card is plugged in at boot, the CF card will be hda and the internal disk hdb or hdc. It wasn't just the Zaurus, any device could have this problem, and it came on suddenly with Linux 2.4 or 2.6. It wasn't literally random, but varied with module loading order. That might be the problem now, actually. If module load order has been randomized, that will randomize device number assignment. (And WHYYYYY do they still even have exposed sequential numbers?!?!?! lol) The root of the problem is Torvalds' policy of "no policy in the kernel." I don't like heavy policy, but a little policy in the right place can save tons of trouble.

bzt wrote:Cheers,
bzt

Cheers!

bzt · Post by **bzt** » Mon Dec 23, 2019 9:45 am

eekee wrote:if a CF card is plugged in at boot, the CF card will be hda and the internal disk hdb or hdc.

That's the thing, my hardware configuration does not change. I don't plug in or disconnect anything. I do not change any firmware configuration either.

eekee wrote:varied with module loading order. That might be the problem now, actually.

Nope, both disks are connected to the same SATA controller, therefore both should be using exactly the same driver. Also my initrd image does not change, so in theory readdir() should return the modules in the same order on every boot.

eekee wrote:The root of the problem is Torvalds' policy of "no policy in the kernel." I don't like heavy policy, but a little policy in the right place can save tons of trouble.

Well said! However, I don't recall Torvalds ever saying "no policy"; I more recall fights where he refused to merge code because, as he put it, "this is utterly garbage". I even recall that he once said "someone is lying, not telling the truth about this patch" to the Intel developers. If I remember correctly they have even removed Linus for a while because of this. But I don't follow the drama around Linux development closely, just the headlines. Regardless, I absolutely agree that without a consistently kept concept no software can be useful to end users (it could still be a marketing success sold in millions of copies, which unfortunately has nothing to do with the software's actual quality).

Cheers,
bzt

Solar · Post by **Solar** » Mon Dec 23, 2019 5:17 pm

nullplan wrote:
Solar wrote:But zaval's basic assertion is correct.
Oh, come on. zaval not being capable of making a point without making an @$$ of himself is by now sort of expected, but I thought at least you might notice why I wrote that.

Ah... erm...

I meant YOUR basic assertion ("Fundamentally...") was correct. I was a bit miffed at zaval's tone, that's why I made that comment in the first place. Sorry for getting the names mixed up.

klange · Post by **klange** » Mon Dec 23, 2019 7:16 pm

bzt wrote:If I remember correctly they have even removed Linus for a while because of this.

Linus removed himself for a while because his own daughter said he was being too much of an arse, but it had very little to do with technical or management decisions, just his phrasing and abrasiveness in emails.

This thread has taken a rather different direction from the original question.

OSDev.org
