Since event systems seem to be a recurring theme these days, let me share my experience with the fully asynchronous event handling in Managarm.
First, let's talk about IRQs. IRQs give you only
one bit of information: did the IRQ happen or not? The same applies to notifications that poll(), select() or epoll() return. You only get one bit of information per possible event: did this file become readable in the meantime? Did it become writable?
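To make the "one bit of information per possible event" point concrete, here is a plain POSIX poll() call (nothing Managarm-specific): readability and writability each show up as a single flag in revents, with no indication of how much data arrived or how many times the state changed.

Code: Select all

#include <poll.h>
#include <cstdio>

int main() {
    struct pollfd pfd;
    pfd.fd = 0;                    // stdin, just as an example
    pfd.events = POLLIN | POLLOUT; // the one-bit events we care about

    // Block until at least one of the requested bits is set.
    if (poll(&pfd, 1, -1) > 0) {
        if (pfd.revents & POLLIN)
            std::puts("became readable");
        if (pfd.revents & POLLOUT)
            std::puts("became writable");
    }
}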
The first crucial observation is that these kinds of events can be handled in a stateless way (from the kernel's point of view). The main technique required for this is sequence numbers. Sequence numbers are incredibly powerful, even for more complex events. In the case of IRQs, the mechanism works as follows: the IRQ handler increments a sequence number, and threads (whether in the kernel or in userspace) can wait for the sequence number to increase. If two IRQs happened in the meantime, the consumer just sees that the sequence number incremented twice (which makes no difference for well-designed drivers). This mechanism does not require any per-consumer state inside the kernel, since each consumer keeps track of the last sequence number that it saw. Thus, there is no resource exhaustion issue since
the kernel never queues an unbounded number of notifications.¹
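To illustrate the mechanism, here is a minimal sketch in C++ (the names IrqObject, raise() and awaitAfter() are made up for this post and are not Managarm's actual API; a real kernel would use its own wait queues rather than std::condition_variable):

Code: Select all

#include <condition_variable>
#include <cstdint>
#include <mutex>

struct IrqObject {
    // Called from the IRQ handler: bump the sequence number and wake waiters.
    void raise() {
        {
            std::lock_guard<std::mutex> lock{mutex_};
            ++sequence_;
        }
        cv_.notify_all();
    }

    // Called by a consumer: block until the sequence number exceeds the last
    // value this consumer has seen, then return the current value. If several
    // IRQs fired in the meantime, the consumer just observes a larger jump.
    uint64_t awaitAfter(uint64_t lastSeen) {
        std::unique_lock<std::mutex> lock{mutex_};
        cv_.wait(lock, [&] { return sequence_ > lastSeen; });
        return sequence_;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    uint64_t sequence_ = 0;
};

// Consumer side: the only per-consumer state is `seen`, and it lives entirely
// with the consumer, not inside the kernel.
void driverLoop(IrqObject &irq) {
    uint64_t seen = 0;
    for (;;) {
        seen = irq.awaitAfter(seen);
        // handle all device work that accumulated since the last wakeup
    }
}

Note that the consumer loop cannot miss an IRQ: if the handler fires between awaitAfter() returning and the next call, the sequence number is already larger than `seen` and the next wait returns immediately.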
A second crucial observation is that a pull-based model is much easier to control than a push-based model w.r.t. resource accounting and concurrency issues. Here, by push-based, I mean a model where the kernel (or any
producer) pushes notifications to
consumers, e.g., by invoking a callback. In a pull-based model, on the other hand, the consumer asks for notifications (e.g., Linux's poll() follows this model). From a concurrency point of view, the main advantage of the pull-based model is that all operations flow in one direction: from the consumer to the producer. Let's visualize the direction of the operations:
Code: Select all
PUSH-BASED
Consumer -------- attach consumer -------> Producer
Consumer <-------- invoke callback ------- Producer [X]
Consumer -------- detach consumer -------> Producer
PULL-BASED
Consumer -------- attach consumer -------> Producer
Consumer ------- pull notification ------> Producer [✓]
Consumer -------- detach consumer -------> Producer
This means that the producer can freely take a lock to process the operation without any chance of deadlocking. In the push-based case, one needs to be careful not to introduce producer -> consumer -> producer deadlocks. It also makes attaching and detaching consumers easier since the attach/detach operations are ordered w.r.t. the operations that pull new notifications from the producer. In the push-based case, some mechanism is required that makes sure that after a consumer is detached, no new notifications will be sent (not even from concurrently running CPUs).
This is hard because producers cannot just call into consumers while holding locks (otherwise, the producer -> consumer -> producer deadlock can occur). Regarding accounting, it is easier to ensure that resource exhaustion does not happen: the number of concurrently queued operations can simply be limited on a per-consumer basis. In almost all situations, the producer only has to buffer a single operation per consumer at a time.
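To sketch what this buys you (hypothetical names, not Managarm's actual interface): the producer below holds its own lock across attach, detach, post and pull, never calls into a consumer, and keeps exactly one buffered notification slot per consumer, so memory usage is bounded by construction.

Code: Select all

#include <cstdint>
#include <mutex>
#include <optional>
#include <unordered_map>

struct Notification { uint64_t payload; };

class Producer {
public:
    void attach(int consumerId) {
        std::lock_guard<std::mutex> lock{mutex_};
        slots_.emplace(consumerId, std::nullopt);
    }

    void detach(int consumerId) {
        // Ordered w.r.t. pull(): once this returns, no notification can reach
        // the consumer anymore, and no callback can race with the teardown.
        std::lock_guard<std::mutex> lock{mutex_};
        slots_.erase(consumerId);
    }

    // Producer side: record at most one pending notification per consumer.
    void post(int consumerId, Notification n) {
        std::lock_guard<std::mutex> lock{mutex_};
        auto it = slots_.find(consumerId);
        if (it != slots_.end())
            it->second = n; // coalesce: the newer state overwrites the older one
    }

    // Consumer side: pull the pending notification, if there is one.
    std::optional<Notification> pull(int consumerId) {
        std::lock_guard<std::mutex> lock{mutex_};
        auto it = slots_.find(consumerId);
        if (it == slots_.end())
            return std::nullopt;
        auto n = it->second;
        it->second.reset();
        return n;
    }

private:
    std::mutex mutex_;
    std::unordered_map<int, std::optional<Notification>> slots_;
};

Note that post() coalesces just like the sequence numbers above: if the consumer has not pulled yet, the newer notification simply overwrites the older one instead of growing a queue.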
¹ As a side note: yes, (level-triggered) IRQs do have to be masked when they are not handled synchronously, but that's an orthogonal issue. Aside from the overhead of two additional writes to the PIC, it does not impact the system's performance.