New approach to scheduling (microkernel)

vlad9486 · Post by **vlad9486** » Sat Jul 20, 2024 12:14 pm

I would like to discuss an idea. I have not worked out all the details, I want to know your thoughts on whether it is even viable.

I want to eliminate the thread/process table in the kernel. First, userspace can allocate a page that it is not allowed to read/write (system page). This page is used as a thread control block (for debugging, this page could be a normal page, so the parent process could inspect/edit the child's registers). Second, the user space task (parent) can make a syscall to call an entry function in another address space (child) with a timeout and specifying this thread control block pointer. The child will call a syscall to return (or call deeper), so the kernel should have a stack of stacks for each cpu core to know where to return. If the timeout occurs, this syscall will return the error code "timeout" and store the state in the thread control block. If an interrupt occurs, the provided TCB will also be used, but the parent will not notice anything.

That's how userspace could implement the scheduler. There is no global thread table, but a tree. And each node of the tree will implement a scheduler (of course it will use the same, possibly shared, library, but it could implement scheduling on its own).

Does this approach have a name? I could not google it.

nullplan · Post by **nullplan** » Sun Jul 21, 2024 9:35 am

This does sound horrifyingly like voluntary multitasking. Your approach has several of its weaknesses:

Since there is no preemption (every child directs the scheduling itself), a hung process brings down the whole system. Well, only a CPU in your case, but this also leads me to:
You are going to have subpar CPU utilization. Since each child is allocated on the same CPU as its parent, having a process that spawns many threads expecting to share the load around to other CPUs will instead mean that the threads all are stuck on the same CPU.
Without a kernel process table, there is no kernel process management, and therefore no Task Manager and no way to kill badly behaved programs.
And for all of these issues you get: A slightly slimmer kernel. Not really a big advantage, is it?

vlad9486 · Post by **vlad9486** » Sun Jul 21, 2024 1:48 pm

Thanks for the answer.

Of course there is preemption. The child will be invoked with a timeout. This is the main point! The code just calls another code between address spaces like a normal function call (but through the kernel). And since we cannot fully trust the code in another address space (which is the point of having address spaces in the first place), we need a timeout. This is enough (I think) to replace classical threads and scheduling.

The parent address space could be executed on multiple CPUs and could schedule its children on multiple CPUs. I need to think more about this. This is a good point.

Yes, the kernel cannot kill a misbehaving process, but the parent can! And if parent itself misbehaves, it has another parent, and so on. Of course, an application can misbehave, but the kernel can too.

I aim for locality, better to solve the problem closer to where it is, where we have more information. Parent process knows what childs it has. Kernel just knows it is task0, task1, task2, and so on. Of course parrent can better schedule them.

rdos · Post by **rdos** » Mon Jul 22, 2024 12:57 am

I fail to understand how the timeout would work, and how this can be used to implement preemption. Just because the syscall returns "timeout" to the caller, doesn't mean you have solved the issue. First, you will have race conditions when detecting the timeout. Second, what happens on the child side when the timeout occurs? Just because the parent side returns with timeout, doesn't mean the child side code stops executing. Either you try to stop it from executing after signalling timeout, which might leave the code owing critical resources that will block other code, or the code simply continues until it eventually returns.

In a multicore OS, which you appear to want to support, scheduling is quite tricky, and distributing this in various user spaces seems like a synchronization nightmare to me. You should keep executing threads per CPU core to minimize these problems.

iansjack · Post by **iansjack** » Mon Jul 22, 2024 1:46 am

I aim for locality, better to solve the problem closer to where it is, where we have more information. Parent process knows what childs it has. Kernel just knows it is task0, task1, task2, and so on. Of course parrent can better schedule them.

That seems like fatally false assumption to me. A well written kernel should be looking at the overall state of the system and what is best for that overall state. Individual processes are necessarily selfish, looking out for themselves, and quite possibly malicious.

You can assume that the kernel is not, by design, malicious. You should work under the assumption that a user process may well be malicious and design your tasking model accordingly. What if the parent decides to call the syscall with an indefinite timeout? It could, effectively, kill every other process. Anarchy may be fine in theory, but in practice you need one ring to rule them all.

vlad9486 · Post by **vlad9486** » Mon Jul 22, 2024 1:49 am

rdos wrote: ↑Mon Jul 22, 2024 12:57 am I fail to understand how the timeout would work, and how this can be used to implement preemption.

Parent code does syscall "invoke". This is synchronous syscall, everything in my picture is synchronous. Like this (I did not use C for many years, this is pseudocode):

Code: Select all

int address_space_id = sys_retain_address_space("/user/some_application");
int timeout = 10; // milliseconds
tcb *context = sys_alloc_tcb();

// this call will block until the child address space function returns or a timeout occurs;
// if the timeout occurs, the context of the call remains in the TCB and you can call it again later;
int invocation_result = sys_invoke(address_space_id, context, timeout);
if (invocation_result == timeout) {
	// invoke something else
}

In the kernel, syscall "sys_invoke" will set timer and proceed in child's address space in _entry function. When the timeout occurs, the execution in the child's address space will be interrupted (timer interrupt) and the kernel will see that this is a timeout and proceed in the parent's address space (and set "rax" register with "timeout" error code).

First, you will have race conditions

How can I have a race condition if everything is on the single CPU? Of course there can be more CPUs running, but they will not interact with this picture. Parent calls child synchronously, control flow is transferred. When the timeout occurs, control flow is transferred back to the parent.

which might leave the code owing critical resources that will block other code

The same could happen with classical threads.

vlad9486 · Post by **vlad9486** » Mon Jul 22, 2024 1:54 am

iansjack wrote: ↑Mon Jul 22, 2024 1:46 am Individual processes are necessarily selfish, looking out for themselves, and quite possibly malicious.

Of course they are selfish, but if they spawn childs, they will take care of the child selfishly. Otherwise, why to spawn child at all? So they will take care of scheduling the childs properly.

vlad9486 · Post by **vlad9486** » Mon Jul 22, 2024 2:04 am

iansjack wrote: ↑Mon Jul 22, 2024 1:46 am What if the parent decides to call the syscall with an indefinite timeout? It could, effectively, kill every other process.

If it is topmost parent, there is no way to prevent this. That's why topmost parent is part of the OS (as well as drivers). If this is a different parent (in the middle), it will also be timed out (by its parent).

rdos · Post by **rdos** » Mon Jul 22, 2024 2:36 am

So you use invoke and a IRQ to signal timeout? This means that you child code can be interrupted in kernel space, and while holding critical kernel resources/locks that might bring down the whole system. If you redesign it and use a user space timer instead, then it will become a bit more reasonable, but then the child can manipulate with the timer to gain unlimited access to the CPU.

How do you plan to implement waits and voluntary yields in the context? For instance, the child code might decide it must wait for something before it can complete, and so it either must busy-wait, or block itself. Another scenario is that the child code might become blocked in kernel, and then how would the timeout operate? You don't want to use busy-wait in kernel, which is why a typical scheduler will pick another thread to run, but in your design, there is no kernel scheduler, so should kernel code use busy-waits instead??

iansjack · Post by **iansjack** » Mon Jul 22, 2024 2:41 am

To come back to your original post, why do you want to eliminate the process table? I can't really understand why sys calls with timeouts are any better than time slices with niceness values. Isn't the end result the same, but with a simpler structure (table vs tree) to manage the processes?

vlad9486 · Post by **vlad9486** » Mon Jul 22, 2024 2:57 am

rdos wrote: ↑Mon Jul 22, 2024 2:36 am This means that you child code can be interrupted in kernel space, and while holding critical kernel resources/locks that might bring down the whole system.

I cannot see any difference with classic scheduling. In classic OS child process is interrupted by timer and another thread runs instead. In my OS it is the same except the kernel does not decide which thread to run next, but always runs the parent and the parent decides.

How do you plan to implement waits and voluntary yields in the context?

This is a good question. If the task has to wait for an interrupt, its timer will be paused. There will be more possible return values along with "timeout", at least "waiting". The parent will get such a status and invoke something else (or call the syscall "yield" if there is nothing more to schedule). The status "waiting" means that the child is waiting for an interrupt. But, how does the corresponding parent know that its child has finished waiting for an interrupt and can proceed? This question remains.

vlad9486 · Post by **vlad9486** » Mon Jul 22, 2024 2:59 am

iansjack wrote: ↑Mon Jul 22, 2024 2:41 am To come back to your original post, why do you want to eliminate the process table? I can't really understand why sys calls with timeouts are any better than time slices with niceness values. Isn't the end result the same, but with a simpler structure (table vs tree) to manage the processes?

The fair scheduler gives each thread the same time slice. But process A spawns 2 threads and process B spawns 20 threads. Process B will get 10 times more process time. It that fair?

rdos · Post by **rdos** » Mon Jul 22, 2024 3:19 am

vlad9486 wrote: ↑Mon Jul 22, 2024 2:57 am
rdos wrote: ↑Mon Jul 22, 2024 2:36 am This means that you child code can be interrupted in kernel space, and while holding critical kernel resources/locks that might bring down the whole system.
I cannot see any difference with classic scheduling. In classic OS child process is interrupted by timer and another thread runs instead. In my OS it is the same except the kernel does not decide which thread to run next, but always runs the parent and the parent decides.

There is a definite difference here since the kernel scheduler can implement locks that prevents scheduling at critical times.

vlad9486 wrote: ↑Mon Jul 22, 2024 2:57 am
How do you plan to implement waits and voluntary yields in the context?
This is a good question. If the task has to wait for an interrupt, its timer will be paused. There will be more possible return values along with "timeout", at least "waiting". The parent will get such a status and invoke something else (or call the syscall "yield" if there is nothing more to schedule). The status "waiting" means that the child is waiting for an interrupt. But, how does the corresponding parent know that its child has finished waiting for an interrupt and can proceed? This question remains.

It doesn't need to be waits for interrupts. A thread can be blocked on a mutex waiting for another thread to release it. In kernel space, a thread can be blocked on a kernel mutex.

iansjack · Post by **iansjack** » Mon Jul 22, 2024 3:25 am

vlad9486 wrote: ↑Mon Jul 22, 2024 2:59 am The fair scheduler gives each thread the same time slice. But process A spawns 2 threads and process B spawns 20 threads. Process B will get 10 times more process time. It that fair?

Isn't that what group scheduling in the CFS was designed to eliminate? Scheduling is somewhat more sophisticated than simply dividing CPU time by the number of threads in the system.

vlad9486 · Post by **vlad9486** » Mon Jul 22, 2024 4:43 am

rdos wrote: ↑Mon Jul 22, 2024 3:19 am There is a definite difference here since the kernel scheduler can implement locks that prevents scheduling at critical times.

rdos wrote: ↑Mon Jul 22, 2024 3:19 am It doesn't need to be waits for interrupts. A thread can be blocked on a mutex waiting for another thread to release it. In kernel space, a thread can be blocked on a kernel mutex.

Does this problem disappear if my kernel doesn't have a kernel mutex? Also. The kernel doesn't do any scheduling, memory allocation, or memory mapping. It only switches address spaces. I don't even think that the kernel needs a stack.

OSDev.org

New approach to scheduling (microkernel)

New approach to scheduling (microkernel)

Re: New approach to scheduling (microkernel)

Re: New approach to scheduling (microkernel)

Re: New approach to scheduling (microkernel)

Re: New approach to scheduling (microkernel)

Re: New approach to scheduling (microkernel)

Re: New approach to scheduling (microkernel)

Re: New approach to scheduling (microkernel)

Re: New approach to scheduling (microkernel)

Re: New approach to scheduling (microkernel)

Re: New approach to scheduling (microkernel)

Re: New approach to scheduling (microkernel)

Re: New approach to scheduling (microkernel)

Re: New approach to scheduling (microkernel)

Re: New approach to scheduling (microkernel)