Pitfalls to avoid when designing your OS?

Discussions on more advanced topics such as monolithic vs micro-kernels, transactional memory models, and paging vs segmentation should go here. Use this forum to expand and improve the wiki!
FallenAvatar
Member
Posts: 283
Joined: Mon Jan 03, 2011 6:58 pm

Re: Pitfalls to avoid when designing your OS?

Post by FallenAvatar »

He meant unplugging it so that a write to your USB drive (which might be emulated as a hard drive) with an off-by-one error wouldn't hit your main hard drive and corrupt Windows.

- Monk
iansjack
Member
Posts: 4703
Joined: Sat Mar 31, 2012 3:07 am
Location: Chichester, UK

Re: Pitfalls to avoid when designing your OS?

Post by iansjack »

Octocontrabass wrote:
zehawk wrote:In the meantime, am I OK to use my dev PC as a testbed then? I guess people are giving conflicting answers: some say definitely not and others say there's no harm in doing it.
If you unplug the hard drive(s) before testing, you have nothing to worry about. It's kind of a pain to do that, though.

For the most part, I avoid testing on my dev PC simply because unplugging the drives is more work than putting the floppy disk in another machine. (That, and my most recent progress has been geared more towards older hardware anyway...)
That's a good solution to any potential problems. But you wouldn't want to do that on a long-term basis or you might indeed end up physically harming the computer.

I think the OP has misunderstood my concerns about the potential problems. I am assuming that with an OS of any reasonable complexity you will want to write to a storage device. You might think "I'm only going to write to my USB drive, so my main drive won't be affected". Fine - if you get it right. But one small error can mean that you write to the wrong part of the wrong drive and corrupt your operating system.

If you are prepared for that eventuality and can easily clone a working image of the disk back then it might be less of a problem. But the "I'm only writing to my USB drive" is not a valid argument. I think it is highly unlikely that you will be able to communicate with USB devices any time soon. (I'm assuming here that you are not restricting yourself to a real-mode MS-DOS clone with everything done via BIOS calls. If that's all you are aiming at then most problems disappear.) By far the easiest mass storage device to access, certainly for your early attempts, is going to be an IDE/ATA hard drive; and that - I assume - is what you have your OS stored on. So the potential to corrupt your OS shouldn't be ignored.
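The "one small error" above really can be as small as a single index. A minimal sketch in C of how an off-by-one reaches the wrong disk (the drive numbers, disk_write() and save_test_data() are illustrative names, not a real driver API):

```c
#include <stdint.h>

#define DRIVE_INTERNAL 0  /* the disk the host OS lives on */
#define DRIVE_USB      1  /* the disk we meant to write */

static int last_drive = -1;  /* records which drive a write actually hit */

/* Stub standing in for a real driver entry point. */
static int disk_write(int drive, uint64_t lba, const void *buf, uint32_t sectors)
{
    (void)lba; (void)buf; (void)sectors;
    last_drive = drive;
    return 0;
}

static void save_test_data(const void *buf, uint32_t sectors)
{
    int drives[] = { DRIVE_INTERNAL, DRIVE_USB };
    int usb = 1;
    /* Off-by-one: drives[usb - 1] selects drive 0, so sector 0 of the
     * internal disk (the MBR) gets clobbered instead of the USB stick. */
    disk_write(drives[usb - 1], 0, buf, sectors);
}
```

Nothing in the hardware stops this; the write to LBA 0 of the wrong drive succeeds silently, and the corruption is only discovered at the next boot.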

I think the idea of running on real hardware is one that often seems to attract newcomers, but it has everything going against it. I think we've covered most of the reasons for that already. A VM is by far the most useful testbed for a fledgling OS. And with a VM you don't have to rely on a very particular hardware configuration if you want someone else to run your OS on their machines. So I would suggest, in order of preference:

1. Use a VM on your current PC. Forget about real hardware until you have advanced a little more.

2. Use a computer that has nothing of value on it as your testbed.

3. Use your main computer, but unplug your hard drive every time you run your tests.

4. Use your main computer, don't unplug your hard drive, and be prepared to restore your hard disk from a cloned image or from good backups.

5. Use your main computer, don't unplug your hard drive, cross your fingers, and be prepared to recreate everything on your computer from scratch.

6. Take up cat herding instead. It's much safer and easier than OS development.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: Pitfalls to avoid when designing your OS?

Post by Brendan »

Hi,

The sad fact is that a lot of bugs depend on a specific scenario - e.g. an area in the memory map that isn't aligned on a nice 4 KiB boundary, a timing problem where it blows up on fast machines, a quirk in a specific CPU that reports the wrong thing, assuming a feature is supported and crashing when it's not, getting TLB invalidation "slightly wrong", etc.

For an example; a few months ago I rewrote my PXE boot loader. It worked fine on VirtualBox (initially) and Qemu and about half of the real computers I tested it on. It crashed on some computers due to a bug in the implementation of PXE that comes with (some?) Realtek network cards (where "TFTP OPEN" fails if the requested max. packet size is too large and it won't just use a smaller packet size like the specifications say it should). I implemented a quick work-around for that bug (using a small 1024-byte packet size). That's when I uncovered a bug in the implementation of PXE that comes with VirtualBox, which ignores the requested packet size and just gives you the max. packet size (and if you've only got a 1024-byte buffer and request a max. packet size of 1024 bytes, it'll silently use ~1500 byte packets and overflow your buffer). I worked around that bug (auto-detecting the largest message size from the MTU reported by "GET UNDI INFORMATION" and using that). I was still getting erratic behaviour (sometimes working and sometimes not) on 2 of my computers, with annoying/bizarre symptoms. It turned out I was allocating some RAM and filling it with zeros, but my code had a bug and filled the wrong area with zeros - the trash left in RAM caused the erratic behaviour. It worked on most computers (because RAM happened to be left zeroed - pure luck). I fixed that; then tested on more computers and found a bug in an NSC Geode CPU (which doesn't handle calls/returns with override prefixes correctly). I worked around that bug too. If I only tested on virtual machines I would've found none of these problems, and my PXE boot loader would probably only work on 80% of computers.
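The defensive pattern behind Brendan's two PXE work-arounds can be sketched in a few lines of C: clamp the requested packet size to what the MTU and your buffer allow, and never trust the firmware to have honoured it. All names and the 28-byte overhead figure here are illustrative assumptions, not a real PXE API:

```c
#include <stddef.h>
#include <string.h>

#define TFTP_BUF_SIZE 1472  /* our receive buffer */

/* Pick the largest safe packet size: the smaller of what the NIC's MTU
 * allows (minus assumed IP+UDP+TFTP header overhead) and what our buffer
 * can hold. Falls back to the TFTP default of 512 for absurd MTUs. */
static size_t safe_packet_size(size_t undi_mtu)
{
    size_t max_payload = undi_mtu > 28 ? undi_mtu - 28 : 512;
    return max_payload < TFTP_BUF_SIZE ? max_payload : TFTP_BUF_SIZE;
}

/* Copy a received packet into buf, refusing anything larger than the
 * buffer even if the firmware promised to respect our requested size
 * (guards against the VirtualBox-style bug described above). */
static int accept_packet(unsigned char *buf, const unsigned char *pkt, size_t len)
{
    if (len > TFTP_BUF_SIZE)
        return -1;  /* firmware bug: oversized packet, drop it */
    memcpy(buf, pkt, len);
    return 0;
}
```

The point is not these exact numbers but the habit: validate every value the firmware hands back, because on real hardware some implementation will get it wrong.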

Of course a PXE boot loader is a relatively simple thing. An OS consists of many pieces - some are simple and some more complicated. If you have 10 pieces and each piece only works on 80% of computers, then the chance of all pieces working at the same time would be "0.8**10 = 0.1" (the OS would only work on 10% of computers). Basically; as the OS grows, the number of "worked because I was lucky" bugs also grows; and the day you try the OS on something you've never tested it on is the day you find out that your OS is an unstable mess.

The only way to mitigate the problem is to test on as many different (real and virtual) computers as you can get your hands on. If you only test on virtual machines, then you're limited to about 6 different (virtual) computers; which is nowhere near enough to be confident that your OS actually works.

There's no sane reason to be worried about testing on real hardware - use a little common sense and you'll be fine. If you're worried your code will corrupt data on your hard disk, then build a version of the OS where your hard disk driver only allows reads so you can still test most of the hard disk driver (and all of the rest of the OS).
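One way to realise the "reads only" build suggested here is to compile the write path out of the disk driver unless a flag is defined at build time. A minimal sketch, assuming a conventional ATA driver interface (ata_read_sectors, ata_write_sectors and DANGEROUS_WRITES are illustrative names, not from the post):

```c
#include <stdint.h>

int ata_read_sectors(uint64_t lba, uint32_t count, void *buf);  /* normal read path */

#ifdef DANGEROUS_WRITES
int ata_write_sectors(uint64_t lba, uint32_t count, const void *buf);
#else
/* Safe build: every write attempt fails loudly instead of touching the
 * disk, so the filesystem, caches, and the rest of the OS can still be
 * exercised on real hardware without risking the host's data. */
static inline int ata_write_sectors(uint64_t lba, uint32_t count,
                                    const void *buf)
{
    (void)lba; (void)count; (void)buf;
    return -1;  /* writes disabled in this build */
}
#endif
```

Because the callers are unchanged, the same source tree produces both the safe testing build and the full build, with only a compiler flag differing.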


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
zehawk
Member
Posts: 35
Joined: Thu Jun 11, 2015 7:26 am
Libera.chat IRC: zehawk

Re: Pitfalls to avoid when designing your OS?

Post by zehawk »

Brendan wrote:Hi,

The sad fact is that a lot of bugs depend on a specific scenario - e.g. an area in the memory map that isn't aligned on a nice 4 KiB boundary, a timing problem where it blows up on fast machines, a quirk in a specific CPU that reports the wrong thing, assuming a feature is supported and crashing when it's not, getting TLB invalidation "slightly wrong", etc.

For an example; a few months ago I rewrote my PXE boot loader. It worked fine on VirtualBox (initially) and Qemu and about half of the real computers I tested it on. It crashed on some computers due to a bug in the implementation of PXE that comes with (some?) Realtek network cards (where "TFTP OPEN" fails if the requested max. packet size is too large and it won't just use a smaller packet size like the specifications say it should). I implemented a quick work-around for that bug (using a small 1024-byte packet size). That's when I uncovered a bug in the implementation of PXE that comes with VirtualBox, which ignores the requested packet size and just gives you the max. packet size (and if you've only got a 1024-byte buffer and request a max. packet size of 1024 bytes, it'll silently use ~1500 byte packets and overflow your buffer). I worked around that bug (auto-detecting the largest message size from the MTU reported by "GET UNDI INFORMATION" and using that). I was still getting erratic behaviour (sometimes working and sometimes not) on 2 of my computers, with annoying/bizarre symptoms. It turned out I was allocating some RAM and filling it with zeros, but my code had a bug and filled the wrong area with zeros - the trash left in RAM caused the erratic behaviour. It worked on most computers (because RAM happened to be left zeroed - pure luck). I fixed that; then tested on more computers and found a bug in an NSC Geode CPU (which doesn't handle calls/returns with override prefixes correctly). I worked around that bug too. If I only tested on virtual machines I would've found none of these problems, and my PXE boot loader would probably only work on 80% of computers.
This. EXACTLY this. That's the reason why writing an OS and testing on hardware is almost vital in my opinion, and why it's the direction I want to go. As I mentioned earlier, VB recognizes AMP. My real computer, believe it or not, does not.

I understand the value of using a VM, I really do. I am going to use it for debugging and general testing. But you do need to test on real hardware BEFORE your system gets too complicated. What would happen if I designed a mini-OS just to find my bootloader doesn't work on my computer due to BIOS bugs, or because it relies on deprecated features that something like a VM still supports? Sure, I could test on Bochs too, but even that only goes so far.
zehawk
Member
Posts: 35
Joined: Thu Jun 11, 2015 7:26 am
Libera.chat IRC: zehawk

Re: Pitfalls to avoid when designing your OS?

Post by zehawk »

iansjack wrote:I think the OP has misunderstood my concerns about the potential problems. I am assuming that with an OS of any reasonable complexity you will want to write to a storage device. You might think "I'm only going to write to my USB drive, so my main drive won't be affected". Fine - if you get it right. But one small error can mean that you write to the wrong part of the wrong drive and corrupt your operating system.

If you are prepared for that eventuality and can easily clone a working image of the disk back then it might be less of a problem. But the "I'm only writing to my USB drive" is not a valid argument. I think it is highly unlikely that you will be able to communicate with USB devices any time soon. (I'm assuming here that you are not restricting yourself to a real-mode MS-DOS clone with everything done via BIOS calls. If that's all you are aiming at then most problems disappear.) By far the easiest mass storage device to access, certainly for your early attempts, is going to be an IDE/ATA hard drive; and that - I assume - is what you have your OS stored on. So the potential to corrupt your OS shouldn't be ignored.

I think the idea of running on real hardware is one that often seems to attract newcomers, but it has everything going against it. I think we've covered most of the reasons for that already. A VM is by far the most useful testbed for a fledgling OS. And with a VM you don't have to rely on a very particular hardware configuration if you want someone else to run your OS on their machines. So I would suggest, in order of preference:

1. Use a VM on your current PC. Forget about real hardware until you have advanced a little more.

2. Use a computer that has nothing of value on it as your testbed.

3. Use your main computer, but unplug your hard drive every time you run your tests.

4. Use your main computer, don't unplug your hard drive, and be prepared to restore your hard disk from a cloned image or from good backups.

5. Use your main computer, don't unplug your hard drive, cross your fingers, and be prepared to recreate everything on your computer from scratch.

6. Take up cat herding instead. It's much safer and easier than OS development.
Very valid points. I definitely do not want to overwrite my Windows OS, and I can definitely see how unplugging my hard drive each time is a bad idea. I think at this point, I'll just have to practice making a read-only OS before making a read-write system, and just deal with it.


Another option I'm actually considering is an external hard drive, storing Windows on that one. That way it's more convenient to unplug and run, and I have a much smaller risk of overwriting Windows. (As well as a smaller risk of damaging components every time I try to unplug the internal hard drive.) I guess if I went that route, though, I would have basically decided that my dev machine is also my testbed. I'll just have to go out and get a real testbed.