Hi all, newbie here!
I've just started some hobby OS development, and have always had a keen interest in how the low-level stuff works. So it's nice to see there's a wealth of information now available as to how to create a basic kernel, and the tools (in my case, gcc/g++, GRUB, vmware...) exist to create and boot a kernel with less pain than used to be involved.
So, I've got my "Hello World" kernel, it's happily loading at 0xC0000000 with paging enabled, a simple GDT loaded (0 = null, 1 = code, 2 = data. Both the last two span 4 GB), the IDT set up and filled with 32 ISRs and able to handle interrupts...
What now?
Well, I've re-written what I have so far in a few different ways, finally settling on C++ with some ASM to bring it all to life. I'm keen on object-oriented coding so am aiming to make everything object-oriented.
The more trivial objects are intended to be compiled inline, thus stored in headers. This includes things like "Port", which is a simple wrapper around port I/O inline ASM calls, that takes a port number in its constructor. So it ought to be fairly speedy (no slower than calling a C function from another module).
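A minimal sketch of what such a "Port" wrapper might look like (the class and method names here are illustrative, not from the post; the inline asm is x86-only, and actually calling read/write requires ring 0 or I/O privilege):

```cpp
#include <cstdint>

// Simple wrapper around port I/O: the constructor takes the port number,
// and read/write compile down to single inb/outb instructions, so it
// should be no slower than calling a plain C function.
class Port {
public:
    explicit Port(uint16_t number) : number_(number) {}

    uint16_t number() const { return number_; }

    // Write one byte to the port (x86-specific, privileged).
    void write(uint8_t value) const {
#if defined(__i386__) || defined(__x86_64__)
        asm volatile("outb %0, %1" : : "a"(value), "Nd"(number_));
#else
        (void)value;  // port I/O only exists on x86
#endif
    }

    // Read one byte from the port.
    uint8_t read() const {
        uint8_t value = 0;
#if defined(__i386__) || defined(__x86_64__)
        asm volatile("inb %1, %0" : "=a"(value) : "Nd"(number_));
#endif
        return value;
    }

private:
    uint16_t number_;
};
```

Usage would then be e.g. `Port com1(0x3F8); com1.write(byte);` - with everything inlined, the object is just a 16-bit port number.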
This idea can also be extended to the PICs, which could be wrapped in a class that takes the base address of the PIC and allows you to program it. Of course, there'd be 2 of these - to cater for the master and slave.
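A sketch of the PIC idea, assuming the standard 8259A initialisation sequence (ICW1..ICW4); to keep the hardware part separate, this version just computes the four bytes a remap would write to the command/data ports (names are illustrative):

```cpp
#include <array>
#include <cstdint>

// Wrapper for one 8259 PIC, constructed from its base I/O address
// (0x20 for the master, 0xA0 for the slave).
class Pic {
public:
    Pic(uint16_t base, bool isMaster) : base_(base), master_(isMaster) {}

    uint16_t commandPort() const { return base_; }
    uint16_t dataPort() const { return base_ + 1; }

    // The ICW1..ICW4 bytes for remapping this PIC to the given vector
    // offset; a real kernel would write [0] to the command port and
    // [1]..[3] to the data port.
    std::array<uint8_t, 4> icwSequence(uint8_t vectorOffset) const {
        return {
            0x11,                                         // ICW1: init, expect ICW4
            vectorOffset,                                 // ICW2: vector offset
            static_cast<uint8_t>(master_ ? 0x04 : 0x02),  // ICW3: cascade wiring
            0x01                                          // ICW4: 8086 mode
        };
    }

private:
    uint16_t base_;
    bool master_;
};
```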
(Though I gave up trying to make GDT, IDT and even the CPU and memory as objects... They work better as namespaces!!).
Anyway, what I'm curious about mainly at this stage is what the different CPU rings/privilege levels can be used for? My understanding is you can have the most critical code in ring 0, and then for rings 1 and 2 have things like drivers and GUI, and then in ring 3 standard apps.
A lot of operating systems seem to be designed to be portable - presumably by creating a kernel tailored to a particular architecture, you a) can take advantage of non-portable instructions and features and b) don't have to write so many drivers?
Also I'm fairly interested in this microkernel stuff, although I hear that these do not offer as great performance as a monolithic kernel. What's the situation with this?
My main reason for liking the idea of microkernels is the modular aspect. Though my understanding so far is you need some kind of mechanism for communicating between the modules. Could this be implemented as a set of "stubs"? Whereby an application calls on another module, and the request then gets automatically piped through to the appropriate place and called as if the app was directly calling it...
Or, I guess each module could be treated as an object, and have instances created... In the case of streams, each instance might have some kind of identifier (network socket, file descriptor, etc.) associated with it, some operations/actions (receive/send) and some memory shared between the "driver" and the application.
I'm guessing that sharing memory between 2 processes would involve some tweaking of the GDT? I don't know... Enlighten me
As for process scheduling... Is it possible/wise to adjust the PIT so that the tick-rate adjusts based on the priority of a process? Obviously I'd need to compensate for this for any kind of internal counters that are based on time, and it may be a stupid idea... Alternatively, can real-time tick rate adjustment be done?
Finally, what sort of issues need I be aware of in the case of SMP and APIC? My understanding is that recent systems have something like 32 IRQs... And would there be interrupts occurring all over the place on both processors?
Forgive me for being totally clueless with regard to the possibilities, I'm just curious as to what can be achieved...
Objects, rings/privileges, modules, IPC, etc.
Re: Objects, rings/privileges, modules, IPC, etc.
Silver Blade wrote: Hi all, newbie here!

Hi - good to see another UK-based OS devver here!
Silver Blade wrote: Anyway, what I'm curious about mainly at this stage is what the different CPU rings/privilege levels can be used for? My understanding is you can have the most critical code in ring 0, and then for rings 1 and 2 have things like drivers and GUI, and then in ring 3 standard apps.

'Rings' are a segmentation-based protection idea. You will find that nowadays, most people only bother with rings 0 and 3. In fact, the page-level memory protection system only has two privileges - 'user' (which allows execution and read/write from ring 3) and 'supervisor'.
Silver Blade wrote: Also I'm fairly interested in this microkernel stuff, although I hear that these do not offer as great performance as a monolithic kernel. What's the situation with this?

Because a microkernel has all its 'servers' as different processes, a task switch must occur every time you want a server to do something. On a monolithic kernel, with the kernel in the higher half, no such switch is required. This makes IPC faster, at the risk of a lower protection level.
Silver Blade wrote: My main reason for liking the idea of microkernels is the modular aspect.

Absolutely no reason why you can't have a modular monolithic kernel (like Linux) - just load modules into the same address space as the kernel.
Silver Blade wrote: I'm guessing that sharing memory between 2 processes would involve some tweaking of the GDT? I don't know... Enlighten me

I wouldn't advise this - use paging instead and keep the same 4GB segments you are currently using. Segmentation is not implemented to the same extent in 64-bit processors and will not be extended in the future, so stick with a flat memory model.
The key to paging is that you construct virtual memory in such a way that each process can only see the memory that belongs to it.
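On x86 that per-process view is enforced per page-table entry; as a minimal sketch, here is how a 32-bit entry could be built, where bit 2 (user/supervisor) is exactly the two-level 'user' vs 'supervisor' protection mentioned above (the helper name is illustrative):

```cpp
#include <cstdint>

// Flag bits of a 32-bit x86 page-table entry.
constexpr uint32_t PTE_PRESENT  = 1u << 0;  // mapping is valid
constexpr uint32_t PTE_WRITABLE = 1u << 1;  // read/write allowed
constexpr uint32_t PTE_USER     = 1u << 2;  // accessible from ring 3

// Combine a 4 KiB-aligned physical frame address with flag bits.
constexpr uint32_t makePte(uint32_t physFrame, uint32_t flags) {
    return (physFrame & 0xFFFFF000u) | (flags & 0xFFFu);
}
```

A kernel-only mapping simply leaves PTE_USER clear, and the CPU then faults on any ring-3 access to that page.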
Silver Blade wrote: Forgive me for being totally clueless with regard to the possibilities, I'm just curious as to what can be achieved...

We all have to start somewhere. Sorry for the somewhat sketchy response - I'll have a more in-depth look if I get a chance later!
Cheers,
Adam
Re: Objects, rings/privileges, modules, IPC, etc.
Silver Blade wrote: Anyway, what I'm curious about mainly at this stage is what the different CPU rings/privilege levels can be used for? My understanding is you can have the most critical code in ring 0, and then for rings 1 and 2 have things like drivers and GUI, and then in ring 3 standard apps.

Ring 3 usually runs in virtual memory, typically with no special privileges (user-level applications). If you choose to base your highly privileged kernel/microkernel (running in ring 0) in physical memory, rather than virtual memory -- then you have a choice to make regarding device drivers. Are they going to run in physical or virtual memory, and just how privileged are you going to make them? And if they run in virtual memory, how are they going to access physical-memory-mapped spaces, etc.? If you want device drivers NOT to run in virtual memory, and NOT to be highly privileged -- then you need to stick them in some other ring than 0 or 3. But as AJ says, if you are running everything in virtual memory, then this ring stuff makes no real difference.
Silver Blade wrote: presumably by creating a kernel tailored to a particular architecture, you a) can take advantage of non-portable instructions and features and b) don't have to write so many drivers?

I agree, but a lot of the programmers here would consider such a thing to be heresy. Many seem to consider it vital to code for the broadest-possible installed base. Narrowing the OS to a particular architecture makes it easier to debug, too.
Silver Blade wrote: I'm guessing that sharing memory between 2 processes would involve some tweaking of the GDT? I don't know... Enlighten me

Like AJ said -- not the GDT. It's the virtual memory Page Directory Tables that need to be tweaked, usually.
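As a toy model of that tweak (flat arrays standing in for real page tables, names illustrative): sharing memory is just pointing entries in two processes' tables at the same physical frame. Real code would also walk the page directory and invalidate TLB entries.

```cpp
#include <array>
#include <cstdint>

constexpr uint32_t PRESENT = 1u << 0, WRITABLE = 1u << 1, USER = 1u << 2;

// One flat 1024-entry page table standing in for a process's mappings.
using PageTable = std::array<uint32_t, 1024>;

// Point the entry for virtAddr at the given 4 KiB physical frame.
void mapPage(PageTable& pt, uint32_t virtAddr, uint32_t physFrame, uint32_t flags) {
    pt[(virtAddr >> 12) & 0x3FF] = (physFrame & 0xFFFFF000u) | flags;
}
```

Two processes can even see the shared frame at different virtual addresses, or with different permissions (e.g. one side read-only).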
Silver Blade wrote: As for process scheduling... Is it possible/wise to adjust the PIT so that the tick-rate adjusts based on the priority of a process?

Interesting idea, and for higher priorities, it could probably work -- but it sounds unwise. Generally, you cannot reset the PIT in the middle of a countdown between interrupts. So your first and last countdowns would not be the length you want, if threads were relinquishing their timeslices early. Also, as I understand it, you can run into problems if you are trying to reset the countdown start value at the exact moment the PIT is rolling over. If you only do it once, during boot, this is clearly not a problem. But you are suggesting doing it several million times a second. Also, resetting the PIT involves doing 2 OUT opcodes to I/O ports, as I recall -- and that is a quite slow thing to do ... especially several million times a second.
Combuster
Re: Objects, rings/privileges, modules, IPC, etc.
bewing wrote: Interesting idea, and for higher-priorities, it could probably work -- but it sounds unwise. Generally, you cannot reset the PIT in the middle of a countdown between interrupts. So your first and last countdowns would not be the length you want, if threads were relinquishing their timeslices early. Also, as I understand it, you can run into problems if you are trying to reset the countdown start value at the exact moment the PIT is rolling over. If you only do it once, during boot, this is clearly not a problem. But you are suggesting doing it several million times a second. Also, resetting the PIT involves doing 2 OUT opcodes to IO ports, as I recall -- and that is a quite slow thing to do ... especially several million times a second.

I don't expect anybody to schedule a million times a second. That means there are 1000 cycles for each thread, of which half would be spent on scheduler stuff, without even considering OUTs.
Adjusting the PIT works well if the timeslices are significantly different in size; if you try to do the same with a fixed timer, you get many timer ticks without schedules, each of which requires an EOI. If the timeslice is at 4 ticks, the number of OUTs breaks even (EOI + 3x PIT vs 4x EOI). And then you haven't yet considered the time spent in the rest of the interrupt handler.
Essentially you improve the worst case and lessen the best case (if the slice is given up, you get relatively more overhead - but it also means that there isn't much else for the processor to do anyway).
If you want to fix the race, program the PIT live to the maximum number of cycles and then call the scheduler (which will reprogram it again to the desired value). If the PIT happens to run over before that, the contraction point of the yield will end up in the next timeslice, which could just as well happen without reprogramming the PIT.

Now consider doing it with the local APIC timer, which does *not* need any slow OUTs.
Re: Objects, rings/privileges, modules, IPC, etc.
Hi,
Silver Blade wrote: As for process scheduling... Is it possible/wise to adjust the PIT so that the tick-rate adjusts based on the priority of a process?

It is possible (my previous kernel did it).
bewing wrote: Generally, you cannot reset the PIT in the middle of a countdown between interrupts.

In "one-shot" mode (or "mode zero, interrupt on terminal count") you can.
bewing wrote: Also, resetting the PIT involves doing 2 OUT opcodes to IO ports, as I recall -- and that is a quite slow thing to do ... especially several million times a second.

If you set the PIT to "low byte only" or "high byte only" you can change the reload value with one I/O port write. With "low byte only" it ranges from 838 ns to 0.215 ms (which the chipset probably won't handle). With "high byte only" it ranges from 0.215 ms to 55 ms (which is what I used).
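As a sketch of the "high byte only" trick (assuming the standard 1193182 Hz PIT input clock; the function name is illustrative): writing only the high byte sets the count to byte * 256 ticks, so one OUT buys a timeslice anywhere from 0.215 ms to roughly 55 ms, with 256-tick (~214.6 us) granularity.

```cpp
#include <cstdint>

constexpr uint32_t PIT_HZ = 1193182;  // standard PIT input clock

// High byte to write for a timeslice of the given length, rounded to
// the nearest 256-tick unit and clamped to the valid 1..255 range.
constexpr uint8_t pitHighByte(uint32_t microseconds) {
    uint64_t ticks = static_cast<uint64_t>(microseconds) * PIT_HZ / 1000000;
    uint64_t units = (ticks + 128) / 256;  // round to 256-tick units
    if (units < 1) units = 1;
    if (units > 255) units = 255;
    return static_cast<uint8_t>(units);
}
```

For example, a 10 ms slice is 11931 ticks, which rounds to 47 high-byte units (47 * 256 = 12032 ticks, about 10.08 ms).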
Back-to-back I/O port accesses are slow because the CPU needs to wait for the first to finish before the second begins, but (AFAIK) the CPU doesn't need to wait for an I/O port write to finish before executing normal code.

When the timer IRQ occurs you do one I/O port write to send the EOI to the PIC. To reprogram the timer you do one I/O port write. Therefore there are no back-to-back I/O port accesses and the CPU doesn't need to wait for them to complete.
Combuster wrote: Adjusting the PIT works well if the timeslices are significantly different in size and thus if you want to do the same with a fixed timer you get many timer ticks without schedules. Each of which requires an EOI. If the timeslice is at 4 ticks, the amount of OUTs break even. (EOI + 3x PIT vs 4x EOI) And then you haven't yet considered the time you spent in the rest of the interrupt handler.

The "best case" is where only one task can be run, where no timer IRQ is needed at all (there's no reason to end the time slice after N ms because there are no other tasks to switch to). This is especially true if the only task that can run happens to be the idle thread, as "no timer IRQ at all" means no need to wake the CPU from a sleep state to service the IRQ (a huge improvement for power management, even if the sleep state is just a simple "hlt" instruction).
If a task consumes its entire time slice, you break even when the slice is 2 ticks long (1 * EOI + 1 * PIT vs. 2 * EOI).
If a task doesn't consume its entire time slice and blocks after 1 tick, then you break even (1 * PIT vs. 1 * EOI). If it blocks after 2 or more ticks, reprogramming the PIT has less overhead (1 * PIT vs. N * EOI). If it blocks after less than 1 tick, reprogramming the PIT has more overhead (1 * PIT vs. 0 * EOI).
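That bookkeeping can be written out as a toy count of I/O writes per timeslice (counts only - it deliberately ignores the cost of the rest of the interrupt handler):

```cpp
// I/O writes for one timeslice. With one-shot reprogramming, a slice
// costs one PIT reload, plus one EOI if the ending timer IRQ actually
// fires; with a fixed-rate timer, it costs one EOI per elapsed tick.
constexpr int ioWrites(bool oneShot, bool usedFullSlice, int ticksElapsed) {
    if (oneShot)
        return usedFullSlice ? 2 : 1;  // EOI + PIT reload, or just the reload
    return ticksElapsed;               // one EOI per fixed-rate tick
}
```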
Reprogramming the PIT also means far more accurate control over time slice lengths. With "fixed frequency" you'd probably use a 1 KHz timer frequency (or slower) and end up with "N * 1 ms" precision and +/- 1 ms accuracy (due to quantization). With "one-shot, high byte only" you get "N * 0.215 ms" precision with +/- 838 ns accuracy (or about 4 times more precision and 1193 times more accuracy).
For the local APIC timer the accuracy/precision is much better: "N * 40 ns" precision with +/- 40 ns accuracy for a slow 25 MHz front-side bus, and "N ns" precision with +/- 1 ns accuracy for a fast 1 GHz front-side bus, assuming you use "divide by 1" prescaling.
Combuster wrote: If you want to fix the race, program the pit live to the maximum number of cycles and then call the scheduler (which will reprogram it again to the desired value) if the PIT happens to run over before that, the contraction point of the yield will end up in the next timeslice, which can just as well happen without reprogramming the PIT

To avoid the race condition, it's better to use an "ignore the timer IRQ" flag. You'd set the flag and call the scheduler, and the scheduler would (with interrupts enabled) reprogram the timer count and then clear the flag. The timer IRQ handler checks the flag and only ends the time slice if the flag is clear.
For the local APIC timer you get the same race condition (the same "ignore the timer IRQ" flag method works).
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Sounds good - that also got me thinking that, instead of reprogramming the timer on each interrupt, I could probably have a "time to live": set it when giving a thread the opportunity to run, decrement it on each timer IRQ, and when it hits zero, it's time for something else to be scheduled.
I guess that may be how it's done at present?
Of course, this kinda goes backward from the idea of reprogramming the timer on each interrupt.
But then surely it's a decision between:
a) Frequent timer IRQ calls (plus whatever overhead this entails)
b) Less frequent calls, slightly more I/O overhead
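The "time to live" counter idea above can be sketched in a few lines (a toy model with no real interrupts; names are illustrative):

```cpp
// Per-thread tick budget for a fixed-rate timer.
struct Thread {
    int ticksLeft;  // set when the thread is given the CPU
};

// Called from the timer IRQ handler; returns true when the budget is
// exhausted and something else should be scheduled.
bool onTimerTick(Thread& current) {
    return --current.ticksLeft <= 0;
}
```

The timer keeps its fixed rate; only the per-thread budget changes with priority.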
Hi,
bewing wrote: Plus a bit more difficulty keeping the system clock updated properly. You would probably need to be more proactive in keeping the system clock synced to the RealTime Clock.

I used the CMOS/RTC periodic IRQ to keep track of the system clock, and the PIT or local APIC timer/s to control the scheduler's time slice lengths (i.e. completely separate timers for completely different purposes).
To use the PIT or local APIC timer/s in "one-shot" mode for scheduling and keeping track of the system clock you'd need to read the timer's "remaining count" and calculate time elapsed when a task blocks, and also read the timer's "remaining count" when something wants to know the current system clock time. It's a little messy, and would probably suffer from inaccuracy/drift.
For the next version of my OS I want more flexibility with timers. For example, a better idea would be to use RDTSC instead of the CMOS/RTC to keep track of the OS's time, but this only works in some situations (RDTSC isn't necessarily "fixed frequency"). I'd also like to support HPET and use that to keep track of the OS's time if RDTSC isn't usable (and then only use CMOS/RTC as a last resort).
RDTSC is by far the most interesting method of keeping track of real time, as it doesn't use any IRQs and is extremely precise...
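A sketch of the TSC-based timekeeping idea: sample the counter and convert a cycle delta to nanoseconds using a calibrated frequency. The 3 GHz figure here is an assumed example value; real code must calibrate it at boot (and verify the TSC really is fixed-frequency).

```cpp
#include <cstdint>

constexpr uint64_t TSC_HZ = 3000000000ULL;  // assumed 3 GHz, illustrative only

// Convert a TSC cycle delta to nanoseconds. Note: the multiply overflows
// for deltas beyond a few seconds at this clock; real code would split
// the conversion or use 128-bit arithmetic.
constexpr uint64_t tscDeltaToNs(uint64_t startTsc, uint64_t endTsc) {
    return (endTsc - startTsc) * 1000000000ULL / TSC_HZ;
}
```

The samples themselves would come from the RDTSC instruction (e.g. `__rdtsc()` with GCC/Clang on x86); no IRQ is involved at all.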
Cheers,
Brendan