One kernel stack per CPU (again...?!)

bluemoon · Post by **bluemoon** » Fri Apr 20, 2012 11:16 am

Seeing you guys choose such design choice, I too deciding to pick one model to replace the current dummy multitasking model(one kstack per cpu, easy to do preempt directly).

Please correct me if I'm wrong, my concern are implement kernel threads and handle IRQ and preemption:

For kernel thread I can:
1. (enter kernel) syscall or interrupt trigger, control transfer from application to kernel, CR3 unchanged yet.
2. kernel does the handling (syscall or IRQ)
3. (leave kernel) scheduler pick a thread from { kernel threads, return to same user thread, diff thread of same user process, thread from other process }
4. if kernel thread -> transfer to it
5. if same process, same thread -> return to it
6. if same process, diff thread -> switch TCB
7. if diff process -> switch process

The idea is to add control flow to "leave kernel"

To handle IRQ
I plan to have a high priority kernel thread that is blocking on a message queue, when IRQ happen, the handler post a message to the queue and wake that thread (by marking such thread "the next one"? see below for preemption). the thread then fetch the message and dispatch to registered handler/drivers.

Preemption
I still have not design it well. Since for one kstack per CPU, the kernel usually do not switch process until it return to user. If a thread/preempt all we can do is mark that thread "next" or raise it's probability to be picked by the scheduler; so if another process preempted before the kernel return, there is chance that the previously preempted process may not run next, but later. Can it be an issue?

xenos · Post by **xenos** » Fri Apr 20, 2012 2:04 pm

bluemoon wrote:Please correct me if I'm wrong, my concern are implement kernel threads and handle IRQ and preemption:

I don't have any "kernel threads" except for some "idle thread" running an "sti; hlt;" loop (which is not even a real thread, it's just a wait loop that runs when there is no thread ready for execution), so please take my comments on this with a bit of caution.

For kernel thread I can:
1. (enter kernel) syscall or interrupt trigger, control transfer from application to kernel, CR3 unchanged yet.
2. kernel does the handling (syscall or IRQ)
3. (leave kernel) scheduler pick a thread from { kernel threads, return to same user thread, diff thread of same user process, thread from other process }
4. if kernel thread -> transfer to it
5. if same process, same thread -> return to it
6. if same process, diff thread -> switch TCB
7. if diff process -> switch process

The idea is to add control flow to "leave kernel"

In principle this looks fine to me. If you replace "kernel thread" by "wait loop", it's more or less what my kernel is doing.

To handle IRQ
I plan to have a high priority kernel thread that is blocking on a message queue, when IRQ happen, the handler post a message to the queue and wake that thread (by marking such thread "the next one"? see below for preemption). the thread then fetch the message and dispatch to registered handler/drivers.

I guess it would be more efficient to have a separate message queue for each IRQ, and to completely remove this "high priority kernel thread". Instead, the interrupt handler would push a message to the IRQ's message queue and wake some handler / driver thread that is waiting on that queue. There is no need for an intermediate kernel step which simple dispatches the message from the queue to another thread.

Preemption
I still have not design it well. Since for one kstack per CPU, the kernel usually do not switch process until it return to user. If a thread/preempt all we can do is mark that thread "next" or raise it's probability to be picked by the scheduler; so if another process preempted before the kernel return, there is chance that the previously preempted process may not run next, but later. Can it be an issue?

I'm not quite sure whether I get the point. The case that two threads get preempted at nearly the same time can happen only on systems with multiple CPUs (otherwise you wouldn't have more than one thread running at the same time). You need to make sure that both CPUs can enter the kernel safely without causing problems. One possibility would be to have separate thread queues for each CPU, so that scheduling one thread on one CPU doesn't affect the other CPUs.

bluemoon · Post by **bluemoon** » Fri Apr 20, 2012 2:35 pm

1. Yes I'll have message queue for each IRQ. However since multiple drivers could share the same IRQ I think I need a registration system and kernel thread to dispatch it. Your solution sound like IRQ handler doing condition variable broadcast() and wake all blocking drivers and put them in high priority, this sounds elegant too.

2. The trouble is timer fire during kernel thread, and the thread used all quantum and need to reschedule.
EDIT: never mind I can handle #2.

xenos · Post by **xenos** » Fri Apr 20, 2012 3:13 pm

bluemoon wrote:1. Yes I'll have message queue for each IRQ. However since multiple drivers could share the same IRQ I think I need a registration system and kernel thread to dispatch it. Your solution sound like IRQ handler doing condition variable broadcast() and wake all blocking drivers and put them in high priority, this sounds elegant too.

Instead of sending a broadcast message, you could do something like the following: If an IRQ happens, your kernel sends an IPC from some "interrupt thread" to the first driver thread which is listening on that IRQ. If the driver can handle the IRQ, it sends an acknowledge IPC back to the interrupt thread and the kernel knows that the IRQ has been handled. Otherwise, the driver sends a different IPC and tells the interrupt thread that it cannot handle the IRQ, so it must be sent to the next driver in the list.

The "interrupt thread" doesn't need to be a real thread. The kernel could simply simulate the behavior of some thread which then appears as the sender and receiver of the IPC. For example, the IPC syscall could check whether the destination thread ID is within some reserved range for interrupt threads and call the IRQ dispatcher directly.

2. The trouble is IRQ fire during kernel thread. To avoid leaking in stack, perhaps I can use Brendan's idea on checking ISR's caller CPL, and if it's ring0 (ie no stack switch) I just IRET, otherwise I can call exit_kernel() safely without worry on growing esp as next kernel_enter it reset to esp0.

The problem with this idea is that 1. kernel threads can never be preempted unless they voluntarily yield the CPU, and that 2. IRQs are not handled and get lost if a kernel thread is running. Even worse, if you simply iret and don't send an EOI to the PIC or APIC, you will not get any further interrupts at all.

Actually this is the reason why I think that kernel threads are a bit problematic in the "one kernel stack per CPU" model, and why I don't have them in my kernel anymore. To avoid stack overflows from nested interrupts, there are only few points in the kernel where interrupts are allowed while the kernel stack is in use. These points are outside any nested function levels (which would fill up the stack), so that the kernel stack is almost empty when an interrupt occurs.

The only thing that still may cause problems in my approach is (possibly nested) NMIs since these cannot be prevented from happening in deeply nested function calls. I haven't solved this problem yet.

Brendan · Post by **Brendan** » Fri Apr 20, 2012 4:39 pm

Hi,

XenOS wrote:Instead of sending a broadcast message, you could do something like the following: If an IRQ happens, your kernel sends an IPC from some "interrupt thread" to the first driver thread which is listening on that IRQ. If the driver can handle the IRQ, it sends an acknowledge IPC back to the interrupt thread and the kernel knows that the IRQ has been handled. Otherwise, the driver sends a different IPC and tells the interrupt thread that it cannot handle the IRQ, so it must be sent to the next driver in the list.

That's what I'd do; except that the kernel's IRQ handlers would send "IRQ occurred messages" directly to the first driver/s in the list and the kernel would have a "IRQ_completed(IRQ_number, status)" function that drivers would call (which would either send the EOI to the PIC/APIC or send an "IRQ occurred message" to the next driver in the list). There would be no kernel thread.

I'd also make sure the IRQ's "list of drivers to send messages to" is kept sorted in order from "most likely to have caused the IRQ" to "least likely to have caused the IRQ". This would involve tracking some statistics and re-sorting the list when things change; but would improve "average latency" and reduce overhead.

For multi-CPU systems I'd also be tempted to send an "IRQ occurred message" to the first group of N drivers in the IRQ's list (rather than just the first driver in the list) where N is the number of CPUs. That way several device drivers can be checking if their device caused the IRQ at the same time (on different CPUs). This would improve latency when the first driver's device wasn't responsible for the generating IRQ (at the expense of a little extra overhead).

Cheers,

Brendan

bluemoon · Post by **bluemoon** » Fri Apr 20, 2012 5:17 pm

Thanks. This inspired me.

gerryg400 · Post by **gerryg400** » Fri Apr 20, 2012 6:33 pm

bluemoon wrote:To handle IRQ
I plan to have a high priority kernel thread that is blocking on a message queue, when IRQ happen, the handler post a message to the queue and wake that thread (by marking such thread "the next one"? see below for preemption). the thread then fetch the message and dispatch to registered handler/drivers.

When I get an interrupt I allow the ISR to optionally queue a message to a special interrupt-msg-queue for the current core. Once all the (usually one but possibly nested) interrupts are processed, before returning to userspace the scheduler takes the messages from the interrupt-msg-queue and delivers them to their proper destinations. I do it this way to make sure I don't need any locking between the interrupt code and the msg delivery code. Interrupts are always enabled in my kernel. I don't use a kernel thread. It's not necessary in my design because interrupt message delivery is higher priority than any thread and so the scheduler does it inline.

bluemoon wrote:Preemption
I still have not design it well. Since for one kstack per CPU, the kernel usually do not switch process until it return to user. If a thread/preempt all we can do is mark that thread "next" or raise it's probability to be picked by the scheduler; so if another process preempted before the kernel return, there is chance that the previously preempted process may not run next, but later. Can it be an issue?

Even though the kernel always executes on a single stack, there is still a 'current thread'. It's the thread that would be returned to if you did an iret (or sysexit or sysret). The trick is to keep to a minimum the amount of context information that is kept on the single kernel stack so that switching threads can be done very easily. So during a system call, as much info as possible should be manually stored in the TCB. That way threads can voluntarily give up the CPU if they are not making progress or a higher priority thread is ready.

bluemoon · Post by **bluemoon** » Sat Apr 21, 2012 2:12 pm

My attempt to change to one kernel stack per CPU was a total failure

I revert to one kstack per thread approach, but the scheduler now being smarter that do not switch CR3 if activating a "kernel process".

My "kernel process" are the same structure as ordinary process, except it runs in ring0 and do not require switching CR3.
each kernel process has threads, and all this subject to the same scheduling policy.

For IRQ handling, I take your advices. IRQ handler ask the registered handlers, if handled it send EOI and optionally preempt into driver's thread. No kernel thread involved here.

Thanks all of you guys.

OSDev.org

One kernel stack per CPU (again...?!)

One kernel stack per CPU (again...?!)

Re: One kernel stack per CPU (again...?!)

Re: One kernel stack per CPU (again...?!)

Re: One kernel stack per CPU (again...?!)

Re: One kernel stack per CPU (again...?!)

Re: One kernel stack per CPU (again...?!)

Re: One kernel stack per CPU (again...?!)

Re: One kernel stack per CPU (again...?!)