Hi all, newbie here!
I've just started some hobby OS development, and have always had a keen interest in how the low-level stuff works. So it's nice to see there's a wealth of information now available as to how to create a basic kernel, and the tools (in my case, gcc/g++, GRUB, vmware...) exist to create and boot a kernel with less pain than used to be involved.
So, I've got my "Hello World" kernel, it's happily loading at 0xC0000000 with paging enabled, a simple GDT loaded (0 = null, 1 = code, 2 = data. Both the last two span 4 GB), the IDT set up and filled with 32 ISRs and able to handle interrupts...
What now?
Well, I've re-written what I have so far in a few different ways, finally settling on C++ with some ASM to bring it all to life. I'm keen on object-oriented coding so am aiming to make everything object-oriented.
The more trivial objects are intended to be compiled inline, thus stored in headers. This includes things like "Port", which is a simple wrapper around port I/O inline ASM calls, that takes a port number in its constructor. So it ought to be fairly speedy (no slower than calling a C function from another module).
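A minimal sketch of what such a "Port" wrapper might look like (the class and method names here are illustrative, not from the post; the inline asm is x86-only, and actually calling read/write requires ring 0 or I/O privilege):

```cpp
#include <cstdint>

// Simple wrapper around port I/O: the constructor takes the port number,
// and read/write compile down to single inb/outb instructions, so it
// should be no slower than calling a plain C function.
class Port {
public:
    explicit Port(uint16_t number) : number_(number) {}

    uint16_t number() const { return number_; }

    // Write one byte to the port (x86-specific, privileged).
    void write(uint8_t value) const {
#if defined(__i386__) || defined(__x86_64__)
        asm volatile("outb %0, %1" : : "a"(value), "Nd"(number_));
#else
        (void)value;  // port I/O only exists on x86
#endif
    }

    // Read one byte from the port.
    uint8_t read() const {
        uint8_t value = 0;
#if defined(__i386__) || defined(__x86_64__)
        asm volatile("inb %1, %0" : "=a"(value) : "Nd"(number_));
#endif
        return value;
    }

private:
    uint16_t number_;
};
```

Usage would then be e.g. `Port com1(0x3F8); com1.write(byte);` - with everything inlined, the object is just a 16-bit port number.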
This idea can also be extended to the PICs, which could be wrapped in a class that takes the base address of the PIC and allows you to program it. Of course, there'd be 2 of these - to cater for the master and slave.
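A sketch of the PIC idea, assuming the standard 8259A initialisation sequence (ICW1..ICW4); to keep the hardware part separate, this version just computes the four bytes a remap would write to the command/data ports (names are illustrative):

```cpp
#include <array>
#include <cstdint>

// Wrapper for one 8259 PIC, constructed from its base I/O address
// (0x20 for the master, 0xA0 for the slave).
class Pic {
public:
    Pic(uint16_t base, bool isMaster) : base_(base), master_(isMaster) {}

    uint16_t commandPort() const { return base_; }
    uint16_t dataPort() const { return base_ + 1; }

    // The ICW1..ICW4 bytes for remapping this PIC to the given vector
    // offset; a real kernel would write [0] to the command port and
    // [1]..[3] to the data port.
    std::array<uint8_t, 4> icwSequence(uint8_t vectorOffset) const {
        return {
            0x11,                                         // ICW1: init, expect ICW4
            vectorOffset,                                 // ICW2: vector offset
            static_cast<uint8_t>(master_ ? 0x04 : 0x02),  // ICW3: cascade wiring
            0x01                                          // ICW4: 8086 mode
        };
    }

private:
    uint16_t base_;
    bool master_;
};
```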
(Though I gave up trying to make GDT, IDT and even the CPU and memory as objects... They work better as namespaces!!).
Anyway, what I'm curious about mainly at this stage is what the different CPU rings/privilege levels can be used for? My understanding is you can have the most critical code in ring 0, and then for rings 1 and 2 have things like drivers and GUI, and then in ring 3 standard apps.
A lot of operating systems seem to be designed to be portable - presumably by creating a kernel tailored to a particular architecture, you a) can take advantage of non-portable instructions and features and b) don't have to write so many drivers?
Also I'm fairly interested in this microkernel stuff, although I hear that these do not offer as great performance as a monolithic kernel. What's the situation with this?
My main reason for liking the idea of microkernels is the modular aspect. Though my understanding so far is you need some kind of mechanism for communicating between the modules. Could this be implemented as a set of "stubs"? Whereby an application calls on another module, and the request then gets automatically piped through to the appropriate place and called as if the app was directly calling it...
Or, I guess each module could be treated as an object, and have instances created... In the case of streams, each instance might have some kind of identifier (network socket, file descriptor, etc.) associated with it, some operations/actions (receive/send) and some memory shared between the "driver" and the application.
I'm guessing that sharing memory between 2 processes would involve some tweaking of the GDT? I don't know... Enlighten me
As for process scheduling... Is it possible/wise to adjust the PIT so that the tick-rate adjusts based on the priority of a process? Obviously I'd need to compensate for this for any kind of internal counters that are based on time, and it may be a stupid idea... Alternatively, can real-time tick rate adjustment be done?
Finally, what sort of issues need I be aware of in the case of SMP and APIC? My understanding is that recent systems have something like 32 IRQs... And would there be interrupts occurring all over the place on both processors?
Forgive me for being totally clueless with regard to the possibilities, I'm just curious as to what can be achieved...
Objects, rings/privileges, modules, IPC, etc.
Re: Objects, rings/privileges, modules, IPC, etc.
Silver Blade wrote: Hi all, newbie here!

Hi - good to see another UK-based OS devver here!
Silver Blade wrote: Anyway, what I'm curious about mainly at this stage is what the different CPU rings/privilege levels can be used for? My understanding is you can have the most critical code in ring 0, and then for rings 1 and 2 have things like drivers and GUI, and then in ring 3 standard apps.

'Rings' are a segmentation-based protection idea. You will find that nowadays, most people only bother with rings 0 and 3. In fact, the page-level memory protection system only has two privileges - 'user' (which allows execution and read/write from ring 3) and 'supervisor'.
Silver Blade wrote: Also I'm fairly interested in this microkernel stuff, although I hear that these do not offer as great performance as a monolithic kernel. What's the situation with this?

Because a microkernel has all its 'servers' as different processes, a task switch must occur every time you want a server to do something. On a monolithic kernel, with the kernel in the higher half, no such switch is required. This makes IPC faster, at the risk of a lower protection level.
Silver Blade wrote: My main reason for liking the idea of microkernels is the modular aspect.

Absolutely no reason why you can't have a modular monolithic kernel (like Linux) - just load modules into the same address space as the kernel.
Silver Blade wrote: I'm guessing that sharing memory between 2 processes would involve some tweaking of the GDT? I don't know... Enlighten me

I wouldn't advise this - use paging instead and keep the same 4GB segments you are currently using. Segmentation is not implemented to the same extent in 64-bit processors and will not be extended in the future, so stick with a flat memory model.
The key to paging is that you construct virtual memory in such a way that each process can only see the memory that belongs to it.
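On x86 that per-process view is enforced per page-table entry; as a minimal sketch, here is how a 32-bit entry could be built, where bit 2 (user/supervisor) is exactly the two-level 'user' vs 'supervisor' protection mentioned above (the helper name is illustrative):

```cpp
#include <cstdint>

// Flag bits of a 32-bit x86 page-table entry.
constexpr uint32_t PTE_PRESENT  = 1u << 0;  // mapping is valid
constexpr uint32_t PTE_WRITABLE = 1u << 1;  // read/write allowed
constexpr uint32_t PTE_USER     = 1u << 2;  // accessible from ring 3

// Combine a 4 KiB-aligned physical frame address with flag bits.
constexpr uint32_t makePte(uint32_t physFrame, uint32_t flags) {
    return (physFrame & 0xFFFFF000u) | (flags & 0xFFFu);
}
```

A kernel-only mapping simply leaves PTE_USER clear, and the CPU then faults on any ring-3 access to that page.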
Silver Blade wrote: Forgive me for being totally clueless with regard to the possibilities, I'm just curious as to what can be achieved...

We all have to start somewhere. Sorry for the somewhat sketchy response - I'll have a more in-depth look if I get a chance later!
Cheers,
Adam
Re: Objects, rings/privileges, modules, IPC, etc.
Silver Blade wrote: Anyway, what I'm curious about mainly at this stage is what the different CPU rings/privilege levels can be used for? My understanding is you can have the most critical code in ring 0, and then for rings 1 and 2 have things like drivers and GUI, and then in ring 3 standard apps.

Ring 3 usually runs in virtual memory, typically with no special privileges (user-level applications). If you choose to base your highly privileged kernel/microkernel (running in ring 0) in physical memory, rather than virtual memory -- then you have a choice to make regarding device drivers. Are they going to run in physical or virtual memory, and just how privileged are you going to make them? And if they run in virtual memory, how are they going to access physical-memory-mapped spaces, etc.? If you want device drivers NOT to run in virtual memory, and NOT to be highly privileged -- then you need to stick them in some other ring than 0 or 3. But as AJ says, if you are running everything in virtual memory, then this ring stuff makes no real difference.
Silver Blade wrote: presumably by creating a kernel tailored to a particular architecture, you a) can take advantage of non-portable instructions and features and b) don't have to write so many drivers?

I agree, but a lot of the programmers here would consider such a thing to be heresy. Many seem to consider it vital to code for the broadest-possible installed base. Narrowing the OS to a particular architecture makes it easier to debug, too.
Silver Blade wrote: I'm guessing that sharing memory between 2 processes would involve some tweaking of the GDT? I don't know... Enlighten me

Like AJ said -- not the GDT. It's the virtual memory Page Directory Tables that need to be tweaked, usually.
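As a toy model of that tweak (flat arrays standing in for real page tables, names illustrative): sharing memory is just pointing entries in two processes' tables at the same physical frame. Real code would also walk the page directory and invalidate TLB entries.

```cpp
#include <array>
#include <cstdint>

constexpr uint32_t PRESENT = 1u << 0, WRITABLE = 1u << 1, USER = 1u << 2;

// One flat 1024-entry page table standing in for a process's mappings.
using PageTable = std::array<uint32_t, 1024>;

// Point the entry for virtAddr at the given 4 KiB physical frame.
void mapPage(PageTable& pt, uint32_t virtAddr, uint32_t physFrame, uint32_t flags) {
    pt[(virtAddr >> 12) & 0x3FF] = (physFrame & 0xFFFFF000u) | flags;
}
```

Two processes can even see the shared frame at different virtual addresses, or with different permissions (e.g. one side read-only).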
Silver Blade wrote: As for process scheduling... Is it possible/wise to adjust the PIT so that the tick-rate adjusts based on the priority of a process?

Interesting idea, and for higher priorities, it could probably work -- but it sounds unwise. Generally, you cannot reset the PIT in the middle of a countdown between interrupts. So your first and last countdowns would not be the length you want, if threads were relinquishing their timeslices early. Also, as I understand it, you can run into problems if you are trying to reset the countdown start value at the exact moment the PIT is rolling over. If you only do it once, during boot, this is clearly not a problem. But you are suggesting doing it several million times a second. Also, resetting the PIT involves doing 2 OUT opcodes to I/O ports, as I recall -- and that is a quite slow thing to do ... especially several million times a second.
Combuster
Re: Objects, rings/privileges, modules, IPC, etc.
bewing wrote: Interesting idea, and for higher-priorities, it could probably work -- but it sounds unwise. Generally, you cannot reset the PIT in the middle of a countdown between interrupts. So your first and last countdowns would not be the length you want, if threads were relinquishing their timeslices early. Also, as I understand it, you can run into problems if you are trying to reset the countdown start value at the exact moment the PIT is rolling over. If you only do it once, during boot, this is clearly not a problem. But you are suggesting doing it several million times a second. Also, resetting the PIT involves doing 2 OUT opcodes to IO ports, as I recall -- and that is a quite slow thing to do ... especially several million times a second.

I don't expect anybody to schedule a million times a second. That means there are 1000 cycles for each thread, of which half would be spent on scheduler stuff, without even considering OUTs.
Adjusting the PIT works well if the timeslices are significantly different in size; if you try to do the same with a fixed timer, you get many timer ticks without schedules, each of which requires an EOI. If the timeslice is at 4 ticks, the number of OUTs breaks even (EOI + 3x PIT vs 4x EOI). And then you haven't yet considered the time spent in the rest of the interrupt handler.
Essentially you improve the worst case and lessen the best case (if the slice is given up, you get relatively more overhead - but it also means that there isn't much else for the processor to do anyway).
If you want to fix the race, program the PIT live to the maximum number of cycles and then call the scheduler (which will reprogram it again to the desired value). If the PIT happens to run over before that, the contraction point of the yield will end up in the next timeslice, which could just as well happen without reprogramming the PIT.

Now consider doing it with the local APIC timer, which does *not* need any slow OUTs.
Re: Objects, rings/privileges, modules, IPC, etc.
Hi,
Silver Blade wrote: As for process scheduling... Is it possible/wise to adjust the PIT so that the tick-rate adjusts based on the priority of a process?

It is possible (my previous kernel did it).
bewing wrote: Generally, you cannot reset the PIT in the middle of a countdown between interrupts.

In "one-shot" mode (or "mode zero, interrupt on terminal count") you can.
bewing wrote: Also, resetting the PIT involves doing 2 OUT opcodes to IO ports, as I recall -- and that is a quite slow thing to do ... especially several million times a second.

If you set the PIT to "low byte only" or "high byte only" you can change the reload value with one I/O port write. With "low byte only" it ranges from 838 ns to 0.215 ms (which the chipset probably won't handle). With "high byte only" it ranges from 0.215 ms to 55 ms (which is what I used).
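As a sketch of the "high byte only" trick (assuming the standard 1193182 Hz PIT input clock; the function name is illustrative): writing only the high byte sets the count to byte * 256 ticks, so one OUT buys a timeslice anywhere from 0.215 ms to roughly 55 ms, with 256-tick (~214.6 us) granularity.

```cpp
#include <cstdint>

constexpr uint32_t PIT_HZ = 1193182;  // standard PIT input clock

// High byte to write for a timeslice of the given length, rounded to
// the nearest 256-tick unit and clamped to the valid 1..255 range.
constexpr uint8_t pitHighByte(uint32_t microseconds) {
    uint64_t ticks = static_cast<uint64_t>(microseconds) * PIT_HZ / 1000000;
    uint64_t units = (ticks + 128) / 256;  // round to 256-tick units
    if (units < 1) units = 1;
    if (units > 255) units = 255;
    return static_cast<uint8_t>(units);
}
```

For example, a 10 ms slice is 11931 ticks, which rounds to 47 high-byte units (47 * 256 = 12032 ticks, about 10.08 ms).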
Back-to-back I/O port accesses are slow because the CPU needs to wait for the first to finish before the second begins, but (AFAIK) the CPU doesn't need to wait for an I/O port write to finish before executing normal code.

When the timer IRQ occurs you do one I/O port write to send the EOI to the PIC. To reprogram the timer you do one I/O port write. Therefore there are no back-to-back I/O port accesses and the CPU doesn't need to wait for them to complete.
Combuster wrote: Adjusting the PIT works well if the timeslices are significantly different in size and thus if you want to do the same with a fixed timer you get many timer ticks without schedules. Each of which requires an EOI. If the timeslice is at 4 ticks, the amount of OUTs break even. (EOI + 3x PIT vs 4x EOI) And then you haven't yet considered the time you spent in the rest of the interrupt handler.

The "best case" is where only one task can be run, where no timer IRQ is needed at all (there's no reason to end the time slice after N ms because there are no other tasks to switch to). This is especially true if the only task that can run happens to be the idle thread, as "no timer IRQ at all" means no need to wake the CPU from a sleep state to service the IRQ (a huge improvement for power management, even if the sleep state is just a simple "hlt" instruction).
If a task consumes its entire time slice, you break even when the slice is 2 ticks long (1 * EOI + 1 * PIT vs. 2 * EOI).
If a task doesn't consume its entire time slice and blocks after 1 tick, then you break even (1 * PIT vs. 1 * EOI). If it blocks after 2 or more ticks, reprogramming the PIT has less overhead (1 * PIT vs. N * EOI). If it blocks after less than 1 tick, reprogramming the PIT has more overhead (1 * PIT vs. 0 * EOI).
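That bookkeeping can be written out as a toy count of I/O writes per timeslice (counts only - it deliberately ignores the cost of the rest of the interrupt handler):

```cpp
// I/O writes for one timeslice. With one-shot reprogramming, a slice
// costs one PIT reload, plus one EOI if the ending timer IRQ actually
// fires; with a fixed-rate timer, it costs one EOI per elapsed tick.
constexpr int ioWrites(bool oneShot, bool usedFullSlice, int ticksElapsed) {
    if (oneShot)
        return usedFullSlice ? 2 : 1;  // EOI + PIT reload, or just the reload
    return ticksElapsed;               // one EOI per fixed-rate tick
}
```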
Reprogramming the PIT also means far more accurate control over time slice lengths. With "fixed frequency" you'd probably use a 1 KHz timer frequency (or slower) and end up with "N * 1 ms" precision and +/- 1 ms accuracy (due to quantization). With "one-shot, high byte only" you get "N * 0.215 ms" precision with +/- 838 ns accuracy (or about 4 times more precision and 1193 times more accuracy).
For the local APIC timer the accuracy/precision is much better: "N * 40 ns" precision with +/- 40 ns accuracy for a slow 25 MHz front-side bus, and "N ns" precision with +/- 1 ns accuracy for a fast 1 GHz front-side bus, assuming you use "divide by 1" prescaling.
Combuster wrote: If you want to fix the race, program the pit live to the maximum number of cycles and then call the scheduler (which will reprogram it again to the desired value) if the PIT happens to run over before that, the contraction point of the yield will end up in the next timeslice, which can just as well happen without reprogramming the PIT

To avoid the race condition, it's better to use an "ignore the timer IRQ" flag. You'd set the flag and call the scheduler, and the scheduler would (with interrupts enabled) reprogram the timer count and then clear the flag. The timer IRQ handler checks the flag and only ends the time slice if the flag is clear.
For the local APIC timer you get the same race condition (the same "ignore the timer IRQ" flag method works).
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Sounds good - that also got me thinking that, instead of reprogramming the timer on each interrupt, I could probably have a "time to live": set it when giving a thread the opportunity to run, decrement it on each timer IRQ, and when it hits zero, it's time for something else to be scheduled.
I guess that may be how it's done at present?
Of course, this kinda goes backward from the idea of reprogramming the timer on each interrupt.
But then surely it's a decision between:
a) Frequent timer IRQ calls (plus whatever overhead this entails)
b) Less frequent calls, slightly more I/O overhead
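The "time to live" counter idea above can be sketched in a few lines (a toy model with no real interrupts; names are illustrative):

```cpp
// Per-thread tick budget for a fixed-rate timer.
struct Thread {
    int ticksLeft;  // set when the thread is given the CPU
};

// Called from the timer IRQ handler; returns true when the budget is
// exhausted and something else should be scheduled.
bool onTimerTick(Thread& current) {
    return --current.ticksLeft <= 0;
}
```

The timer keeps its fixed rate; only the per-thread budget changes with priority.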
Hi,
bewing wrote: Plus a bit more difficulty keeping the system clock updated properly. You would probably need to be more proactive in keeping the system clock synced to the RealTime Clock.

I used the CMOS/RTC periodic IRQ to keep track of the system clock, and the PIT or local APIC timer/s to control the scheduler's time slice lengths (i.e. completely separate timers for completely different purposes).
To use the PIT or local APIC timer/s in "one-shot" mode for scheduling and keeping track of the system clock you'd need to read the timer's "remaining count" and calculate time elapsed when a task blocks, and also read the timer's "remaining count" when something wants to know the current system clock time. It's a little messy, and would probably suffer from inaccuracy/drift.
For the next version of my OS I want more flexibility with timers. For example, a better idea would be to use RDTSC instead of the CMOS/RTC to keep track of the OS's time, but this only works in some situations (RDTSC isn't necessarily "fixed frequency"). I'd also like to support HPET and use that to keep track of the OS's time if RDTSC isn't usable (and then only use CMOS/RTC as a last resort).
RDTSC is by far the most interesting method of keeping track of real time, as it doesn't use any IRQs and is extremely precise...
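A sketch of the TSC-based timekeeping idea: sample the counter and convert a cycle delta to nanoseconds using a calibrated frequency. The 3 GHz figure here is an assumed example value; real code must calibrate it at boot (and verify the TSC really is fixed-frequency).

```cpp
#include <cstdint>

constexpr uint64_t TSC_HZ = 3000000000ULL;  // assumed 3 GHz, illustrative only

// Convert a TSC cycle delta to nanoseconds. Note: the multiply overflows
// for deltas beyond a few seconds at this clock; real code would split
// the conversion or use 128-bit arithmetic.
constexpr uint64_t tscDeltaToNs(uint64_t startTsc, uint64_t endTsc) {
    return (endTsc - startTsc) * 1000000000ULL / TSC_HZ;
}
```

The samples themselves would come from the RDTSC instruction (e.g. `__rdtsc()` with GCC/Clang on x86); no IRQ is involved at all.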
Cheers,
Brendan