Links and advice for new members
Posted: Wed Jul 24, 2019 1:39 pm
This is a brief collection of links and (hopefully) useful advice for new members of the message board, as well as a reference for those who are looking for help.
The first thing I want to say is this: if you aren't already using version control for all software projects you are working on, drop everything and start to do that now. Set up a VCS such as Git, Subversion, Mercurial, Bazaar, or what have you - which you use is less important than the fact that you need to use it. Similarly, setting up your repos on an offsite host such as Gitlab, Github, Sourceforge, CloudForge, or BitBucket should be the very first thing you do whenever you start a new project, no matter how large or small it is.
If nothing else, it makes it easy to share your code with us on the forum, as you can just post a link, rather than pasting oodles and oodles of code into a post.
Once you have that out of the way (if you didn't already), you can start to consider the OS specific issues.
If you haven't already, I would strongly advise you to read the introductory material in the wiki:
After this, go through the material on the practical aspects of
running an OS-dev project:
This brings you to your first big decision: which platform, or platforms, to target. Commonly options include:
You further need to choose the compiler, assembler, linker, build tool, and support utilities to use - what is called the 'toolchain' for your OS. For most platforms, there aren't many to choose from, and the obvious choice would be GCC and the Binutils toolchain due to their ubiquity. However, on the Intel x86 platform, it isn't as simple, as there are several other toolchains which are in widespread use for it, the most notable being the Microsoft one - a very familiar one to Windows programmers, but one which presents problems in OSDev. The biggest issue with Visual Studio, and with proprietary toolchains in general, is that using it rules out the possibility of your OS being "self-hosting" - that is to say, being able to develop your OS in the OS itself, something most OSdevs do want to eventually be able to do. The fact that Porting GCC to your OS is feasible, whereas porting proprietary x86 toolchains isn't, is a big factor in the use Binutils and GCC, as it their deep connection to Linux and other Unix derivatives.
It is crucial to set up a separate cross-development environment targeting the bare hardware you are writing for, even if it is the same as the host hardware, as their are subtle (and not so subtle) differences in the code produced, how libraries are or aren't accessed, the final executable file, and so on. The wiki page Why do I need a Cross Compiler? explains this in greater detail. For GCC, you should read the pages on setting up a OS Specific Toolchain, and if you are using GCC a GCC Cross-Compiler as well. There is a page for setting up a LLVM Cross-Compiler as well, if youo want to use that toolchain instead, but it is currently in need of updating.
Regardless of the high-level language you use for OS dev (if any), you will still need to use assembly language, which means choosing an assembler. If you are using Binutils and GCC, the obvious choice would be GAS, but for x86 especially, there are other assemblers which many OSdevs prefer, such as Netwide Assembler (NASM) and Flat Assembler (FASM).
The important thing here is that assembly language syntax varies more among the x86 assemblers than it does for most other platforms, with the biggest difference being that between the Intel syntax used in the majority of x86 assemblers, and the AT&T syntax used in GAS. You can see an overview of the differences on the somewhat misnamed wiki page Opcode syntax. While it is possible to coax GAS to use the Intel syntax using the .intel_syntax noprefix directive, the opposite is generally not true for Intel-based assemblers such as NASM, and even with that directive, GAS is still quite different from other x86 assemblers in other regards.
It is still important to understand that the various Intel syntax assemblers - NASM, FASM, and YASM among others - have differences in how they handle indexing, in the directives they use, and in their support for features such as macros and defining data structures. While most of these follow the general syntax of Microsoft Assembler (MASM), they all diverge from it in various ways.
Once you know which platform you are targeting, and the toolchain you want to use, you need to understand them. You should read up on the core technologies for the platform. Assuming that you are targeting the PC architecture, this would include:
You need to consider what kind of File System to use. Common ones used when starting out in OS dev include: We generally don't recommend designing your own, but as with boot loaders, it is a possibility as well.
While this is a lot of reading, it simply reflects the due diligence that any OS-devver needs to go through in order to get anywhere. OS development, even as a simple project, is not amenable to the Stack Overflow cut-and-paste model of software development; you really need to understand a fair amount of the concepts and principles before writing any code, and the examples given in tutorials and forum posts generally are exactly that. Copying an existing code snippet without at least a basic idea of what it is doing simply won't do. While learning itself is an iterative process - you learn one thing, try it out, see what worked and what didn't, read some more, etc. - in this case a basic foundation is needed at the start. Without a solid understanding of at least some of the core ideas before starting, you simply can't get very far in OS dev.
Hopefully, this won't scare you off; it isn't nearly as bad as it sounds. It just takes a lot of patience and a bit of effort, a little at a time.
You might also want to peek at the various OS developer archtypes, such as Lino Commando, James T. Klik, and Alta Lang, for both amusement and to see how different people approach OS-Dev (and how other OS-Devs poke fun at them for it). Just steer clear of becoming a Dr. Duct von Tape, for your own sake and that of the others here, please.
The first thing I want to say is this: if you aren't already using version control for all software projects you are working on, drop everything and start to do that now. Set up a VCS such as Git, Subversion, Mercurial, Bazaar, or what have you - which you use is less important than the fact that you need to use it. Similarly, setting up your repos on an offsite host such as Gitlab, Github, Sourceforge, CloudForge, or BitBucket should be the very first thing you do whenever you start a new project, no matter how large or small it is.
If nothing else, it makes it easy to share your code with us on the forum, as you can just post a link, rather than pasting oodles and oodles of code into a post.
Once you have that out of the way (if you didn't already), you can start to consider the OS specific issues.
If you haven't already, I would strongly advise you to read the introductory material in the wiki:
After this, go through the material on the practical aspects of
running an OS-dev project:
- What order should I make things in
- Code Management
- How kernel, compiler, and C library work together
- Using Programming Languages other than C
- Assembly Language
- OS Specific Toolchain
This brings you to your first big decision: which platform, or platforms, to target. Commonly options include:
- x86 - the CPU architecture of the stock PC desktops and laptops, and the system which remains the 'default' for OS dev on this group. However, it is notoriously quirky, especially regarding Memory Segmentation, and the sharp divisions between 16-bit Real Mode, 16-bit and 32-bit Protected Modes, and 64-bit Long Mode. Despite this, it is generally the go-to architecture for high-performance systems, business machines, and gaming, largely thanks to the immense amount of effort and talent applied by Intel and AMD into continuously pushing it further and further. It also has a great advantage in being paired with the highly standardized PC system architecture, which is implemented by a large number of different motherboard designers (whereas different ARM, MIPS, and RISC-V single-board computers can have radically different boot-up, memory, bus, and peripheral systems).
- ARM - a RISC architecture widely used on mobile devices and for 'Internet of Things' and 'Maker' equipment, including the popular Raspberry Pi, Beagleboard, and Rock64 single board computers. The overwhelming majority of smartphones and tablets, including those by Apple and Samsung, use ARM processors, though most of these are locked down in various ways which would make them difficult to impossible to target for a third-party OS dev. While it is generally seen as easier to work with that x86, most notably in the much less severe differences in between the 32-bit and 64-bit modes and the lack of memory segmentation, the wiki and other resources don't cover it nearly as well (though this is changing over time as it becomes more commonly targeted). Because the architecture lends itself to low-power, low-heat applications, most of the implementations are less powerful than the top x86 implementations, with designs focusing on its use in areas where that advantage is greatest, though recently Amazon has developed a high-performance implementation which they claim is comparable to Intel's in terms of CPU speed.
- MIPS, another RISC design which is slightly older than ARM. It is one of the first RISC design to come out, being part of the reason the idea caught on, and is even simpler than ARM in terms of programming, though a bit tedious when it comes to assembly programming. While it was widely used in workstations and game consoles in the 1990s, it has declined significantly due to mismanagement by the owners of the design, and at the present is mostly seen in devices such as routers. There are a handful of System on Chip single-board computers that use it, such as the Creator Board and the Onion Omega2, and manufacturers in both China and Russia have licensed the ISA with the idea of breaking their dependence on Intel. The ISA was made open-source in 2018, which may lead to more widespread use. Finding good information on the instruction set is easy, as it is widely used in courses on assembly language and computer architecture and there are several emulators that run MIPS code, but finding usable information on the actual hardware systems using it is often difficult at best.
- RISC-V is an up and coming open source hardware ISA, closely related to MIPS in overall design, but so far is Not Ready For Prime Time. It is developing rapidly, however, and a handful of maker-grade SBCs using it have come on the market in 2018 and 2019, most notably the HiFive1. Also, the architecture is being adopted for a number of device-internal uses by companies such as Western Digital and nVidia, though not in a way that would be exposed to 3rd-party programmers for the most part. While it is poised to have a significant role in the future and is worth considering for a forward-looking project, it is too early to say what that impact will be and the situation is rapidly evolving.
You further need to choose the compiler, assembler, linker, build tool, and support utilities to use - what is called the 'toolchain' for your OS. For most platforms, there aren't many to choose from, and the obvious choice would be GCC and the Binutils toolchain due to their ubiquity. However, on the Intel x86 platform, it isn't as simple, as there are several other toolchains which are in widespread use for it, the most notable being the Microsoft one - a very familiar one to Windows programmers, but one which presents problems in OSDev. The biggest issue with Visual Studio, and with proprietary toolchains in general, is that using it rules out the possibility of your OS being "self-hosting" - that is to say, being able to develop your OS in the OS itself, something most OSdevs do want to eventually be able to do. The fact that Porting GCC to your OS is feasible, whereas porting proprietary x86 toolchains isn't, is a big factor in the use Binutils and GCC, as it their deep connection to Linux and other Unix derivatives.
It is crucial to set up a separate cross-development environment targeting the bare hardware you are writing for, even if it is the same as the host hardware, as their are subtle (and not so subtle) differences in the code produced, how libraries are or aren't accessed, the final executable file, and so on. The wiki page Why do I need a Cross Compiler? explains this in greater detail. For GCC, you should read the pages on setting up a OS Specific Toolchain, and if you are using GCC a GCC Cross-Compiler as well. There is a page for setting up a LLVM Cross-Compiler as well, if youo want to use that toolchain instead, but it is currently in need of updating.
Regardless of the high-level language you use for OS dev (if any), you will still need to use assembly language, which means choosing an assembler. If you are using Binutils and GCC, the obvious choice would be GAS, but for x86 especially, there are other assemblers which many OSdevs prefer, such as Netwide Assembler (NASM) and Flat Assembler (FASM).
The important thing here is that assembly language syntax varies more among the x86 assemblers than it does for most other platforms, with the biggest difference being that between the Intel syntax used in the majority of x86 assemblers, and the AT&T syntax used in GAS. You can see an overview of the differences on the somewhat misnamed wiki page Opcode syntax. While it is possible to coax GAS to use the Intel syntax using the .intel_syntax noprefix directive, the opposite is generally not true for Intel-based assemblers such as NASM, and even with that directive, GAS is still quite different from other x86 assemblers in other regards.
It is still important to understand that the various Intel syntax assemblers - NASM, FASM, and YASM among others - have differences in how they handle indexing, in the directives they use, and in their support for features such as macros and defining data structures. While most of these follow the general syntax of Microsoft Assembler (MASM), they all diverge from it in various ways.
Once you know which platform you are targeting, and the toolchain you want to use, you need to understand them. You should read up on the core technologies for the platform. Assuming that you are targeting the PC architecture, this would include:
- Real Mode, especially the section on memory addressing, and Segmentation
- Protected mode
- Long Mode
- Learning 80x86 Assembly, especially real mode assembly
- Memory Map, Detecting Memory and A20 Line
- Interrupts
- BIOS, and Boot Sequence
You need to consider what kind of File System to use. Common ones used when starting out in OS dev include: We generally don't recommend designing your own, but as with boot loaders, it is a possibility as well.
While this is a lot of reading, it simply reflects the due diligence that any OS-devver needs to go through in order to get anywhere. OS development, even as a simple project, is not amenable to the Stack Overflow cut-and-paste model of software development; you really need to understand a fair amount of the concepts and principles before writing any code, and the examples given in tutorials and forum posts generally are exactly that. Copying an existing code snippet without at least a basic idea of what it is doing simply won't do. While learning itself is an iterative process - you learn one thing, try it out, see what worked and what didn't, read some more, etc. - in this case a basic foundation is needed at the start. Without a solid understanding of at least some of the core ideas before starting, you simply can't get very far in OS dev.
Hopefully, this won't scare you off; it isn't nearly as bad as it sounds. It just takes a lot of patience and a bit of effort, a little at a time.
You might also want to peek at the various OS developer archtypes, such as Lino Commando, James T. Klik, and Alta Lang, for both amusement and to see how different people approach OS-Dev (and how other OS-Devs poke fun at them for it). Just steer clear of becoming a Dr. Duct von Tape, for your own sake and that of the others here, please.