Fundamentally, the boot loader is a pretty simple program that can only ever load predetermined files into memory. If that file happens to contain a kernel, then that's a happy coincidence, made possible by the intelligence of the configuration.
On fossilized systems stuck in the '70s it is, but that's not the only possible approach. Let's not confuse people into thinking that one's beloved approach is the only one, when it's not even a good one. An OS loader can be far more sophisticated and dynamic in what it does, and there are different ways of configuring and influencing it. For example, user input during boot can direct it to a special boot flow, say a "safe boot" or "last known good" option. Or there are flags it should take into account and pass on to the kernel: PAE, or a 3+1 vs. 2+2 address space split on 32-bit systems. Another thing that makes it not so simple is the set of functionality it has to have: it might support a bunch of file systems, read-only, to be more robust in accessing storage. Also, how many and which files it loads may be fully configurable rather than hardcoded. Say it loads a "registry" of some kind and looks there for the list of modules to load: the kernel, the boot drivers. That set is configured outside the loader, at OS runtime, by the OS and driver installation procedures. And so on. Switching virtual memory on, some CPU configuration (TLBs, caches, branch predictor) and building the initial mappings are also its responsibility. Not so "simple".
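To make the "flags and boot flows" point a bit more concrete, here is a minimal sketch of a loader turning a boot option string into a plan. The switch names (`/PAE`, `/USERVA=`, `/SAFEBOOT`, `/LASTKNOWNGOOD`) and the `BootPlan` structure are made up for illustration, not taken from any real loader.

```python
from dataclasses import dataclass, field


@dataclass
class BootPlan:
    flow: str = "normal"      # which boot flow the loader follows
    pae: bool = False         # loader sets up PAE paging before entering the kernel
    user_va_gb: int = 2       # user share of the 4 GB space: 2 (2+2) or 3 (3+1)
    kernel_args: list = field(default_factory=list)  # handed over to the kernel


def plan_boot(option_string: str) -> BootPlan:
    """Parse hypothetical boot switches such as '/PAE /USERVA=3 /SAFEBOOT'."""
    plan = BootPlan()
    for token in option_string.split():
        name, _, value = token.lstrip("/").partition("=")
        name = name.upper()
        if name == "PAE":
            plan.pae = True
        elif name == "USERVA":
            gigabytes = int(value)
            if gigabytes not in (2, 3):
                raise ValueError("user VA split must be 2 or 3 GB")
            plan.user_va_gb = gigabytes
        elif name == "SAFEBOOT":
            plan.flow = "safe"
        elif name == "LASTKNOWNGOOD":
            plan.flow = "last-known-good"
        # Every switch, known to the loader or not, is also passed to the kernel.
        plan.kernel_args.append(token)
    return plan
```

Something like `plan_boot("/PAE /USERVA=3 /SAFEBOOT")` would then select the safe boot flow, with PAE and a 3+1 split, while still forwarding the raw switches to the kernel.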
and then realised that I actually don't know what bit of software creates the OS filesystem
As iansjack said, the installation process creates it during installation. I'll add that it copies it from the installation source medium (your "live" CD, for example), which serves as a template. That template is generated during the build or packaging process, when you create your OS installation package. You put the relevant pieces there (here a modular OS is far superior to a monolithic one). They contain different sets of files, not necessarily all needed for a particular installation; when the installation process runs, it picks what's needed and what the user confirmed they want to install. This way you create an OS installation instance on the target medium.
Does the kernel, when booting up, check that all of these folders/files exist, and if they don't, it creates them? This doesn't seem quite right.
No, it doesn't, and it doesn't need to. It looks in some predefined places where persistent configuration is stored, the registry in Windows for example. Paths to other places can be specified there (dynamically and configurably), so they don't need to be hardcoded. Basically, your OS holds its persistent state in some form of internal database, and that's enough. Depending on the development level of the OS, it could be a well thought out database, or just a messy set of text files with a different syntax in each, whose only common trait is extreme unfriendliness to both the OS and humans. xD
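A toy illustration of "only the database location is well known, everything else is looked up in it". The layout of `CONFIG_DB` and the `start`/`image` keys are hypothetical; a real OS would have its own on-disk format and naming.

```python
# Hypothetical in-memory view of the OS configuration database.
CONFIG_DB = {
    "paths": {"drivers": "/system/drivers", "logs": "/var/log"},
    "services": {
        "disk":  {"start": "boot",   "image": "disk.sys"},
        "fs":    {"start": "boot",   "image": "fs.sys"},
        "sound": {"start": "demand", "image": "sound.sys"},
    },
}


def boot_modules(db: dict) -> list:
    """Return full paths of the modules marked to start at boot, sorted by service name."""
    base = db["paths"]["drivers"]  # looked up in the database, not hardcoded
    return [
        f"{base}/{service['image']}"
        for _, service in sorted(db["services"].items())
        if service["start"] == "boot"
    ]
```

Moving the drivers directory, or adding a boot driver, is then a change to the database made by an installer at runtime, with no change to the code that reads it.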
- The kernel then can use hard drive reading procedures to talk to the CD-ROM or the hard disk. It would have to interpret the hard disk as the same filesystem format as the bootloader interpreted it as, when loading the kernel into RAM.
Yes, but with full support for the FS and other entities. The OS loader only needs read-only support: it works in a restricted environment (interrupts masked, MMU possibly off, no virtual memory, etc.) and doesn't need to support all features of the FS, which may be numerous, just enough to read files from it. The same goes for the registry (the OS's internal configuration database): the loader needs to understand it only well enough to pull out the information on what to do, whereas the OS needs full support: writing, journalling (logging), etc.
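As a rough sketch of how little the loader side needs, here is the first step of read-only FS support: recognizing which file system is on the volume using nothing but sector reads. The `read_sectors` callback stands in for whatever the firmware offers and is an assumption, as is the 512-byte sector size; the signatures checked are the usual on-disk ones (the NTFS OEM ID, the FAT32 type string plus boot signature, the ext2/3/4 superblock magic).

```python
import struct
from typing import Callable, Optional

SECTOR_SIZE = 512  # assumed sector size for this sketch

# Hypothetical firmware service: (first_lba, sector_count) -> raw bytes
ReadSectors = Callable[[int, int], bytes]


def identify_fs(read_sectors: ReadSectors) -> Optional[str]:
    """Return 'ntfs', 'fat32', 'ext' or None for an unsupported volume."""
    boot = read_sectors(0, 1)
    # NTFS: OEM ID at byte offset 3 of the boot sector.
    if boot[3:11] == b"NTFS    ":
        return "ntfs"
    # FAT32: boot signature 0x55 0xAA plus the file system type string at offset 82.
    if boot[510:512] == b"\x55\xaa" and boot[82:90] == b"FAT32   ":
        return "fat32"
    # ext2/3/4: superblock starts at byte 1024 (LBA 2), magic 0xEF53 at offset 56.
    superblock = read_sectors(2, 2)
    (magic,) = struct.unpack_from("<H", superblock, 56)
    if magic == 0xEF53:
        return "ext"
    return None
```

Everything after this point (walking directories, reading file extents) is still read-only code; allocation, journalling and repair stay in the OS's full driver.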
- The kernel can then access the various folders and files which are in the disk which the kernel was loaded from.
Obviously.
In this case, though, how does the bootloader know what is code, and to load into ram, and what is data (such as the /bin, and /etc folders), so to not load those into ram? In other words, when the bootloader (say, Grub), is loading the kernel from the hard drive into ram (by searching for the 0xbadb002 magic number), how does it know when it's got to the end of the kernel program which it wants to load to ram, and into the beginning of data?
Sorry that this question was kind of all over the place, I hope it makes some amount of sense.
Generally speaking, it knows what to load and what not to by following some protocol or convention. The kernel is a separate file, and where it ends is described by the image's own headers, not found by scanning the disk. GRUB does this too, but its protocol is generic, so you need to adapt to it, whereas an OS-specific loader is more fine-grained in its knowledge of the kernel's needs: it follows the conventions and satisfies the demands of the particular OS. Say our loader knows that it needs to read the SYSTEM registry hive file (a piece of the aforementioned OS config database). So it has an understanding of a set of supported FSs. It asks the firmware to read the drive at the block (sector) level. Then it identifies the FS and, if it belongs to the supported set, proceeds to parse its internal metadata, asking the FW to read more sectors. Finally it searches for the SYSTEM hive file and, if found, loads it. Then it parses the hive's structure and finds the list of drivers in the boot group: the drivers it has to load itself for the OS to start, to avoid a cyclic dependency. With this approach even the kernel itself can be specified there. So it knows the full path of every module it needs to load, and it loads them. It then also makes some in-RAM modifications to them, such as base relocations and zero filling of uninitialized variables, if that is its responsibility (it could be). This is how it knows.
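The last step (loading a module, then relocating it and zero filling its uninitialized data) can be sketched like this. The `TOY1` image format below is invented purely for the example; real formats such as ELF or PE carry the same kind of information in their own headers. The point is that the header tells the loader how many bytes come from the file and how much memory the module needs, so nothing is guessed.

```python
import struct

# Hypothetical header: magic, preferred_base, file_size, mem_size, reloc_count
HEADER = struct.Struct("<4sIIII")


def load_image(blob: bytes, actual_base: int) -> bytearray:
    """Load a TOY1 image as if it were placed at actual_base; return its memory."""
    magic, preferred_base, file_size, mem_size, reloc_count = HEADER.unpack_from(blob, 0)
    if magic != b"TOY1":
        raise ValueError("not a TOY1 image")
    if mem_size < file_size:
        raise ValueError("mem_size smaller than file_size")

    relocs_at = HEADER.size
    payload_at = relocs_at + 4 * reloc_count
    relocs = struct.unpack_from(f"<{reloc_count}I", blob, relocs_at)
    payload = blob[payload_at:payload_at + file_size]
    if len(payload) != file_size:
        raise ValueError("truncated image")

    # bytearray(n) is zero filled, so the tail past file_size is the zeroed
    # area for uninitialized variables (BSS).
    memory = bytearray(mem_size)
    memory[:file_size] = payload

    # Base relocation: every listed 32-bit slot holds an absolute address that
    # assumed preferred_base; shift it by the difference to the real base.
    delta = actual_base - preferred_base
    for offset in relocs:
        (value,) = struct.unpack_from("<I", memory, offset)
        struct.pack_into("<I", memory, offset, (value + delta) & 0xFFFFFFFF)
    return memory
```

Code versus data versus "the rest of the disk" is never inferred here: `file_size` bounds what is copied from the file, `mem_size` bounds what the module occupies in RAM, and the relocation table lists exactly which words to patch.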