Why and when would a kernel need own stack?

amn · Post by **amn** » Sun Mar 01, 2015 12:20 pm

Hi all,

I've been lurking on OSDev.org for quite a while now, skimming knowledge and mostly talking to myself.

But now we are writing a simple microkernel at my university. I should add that I am taking the course voluntarily, because I have long been genuinely interested in OS theory and implementation (also compilers), so it's not like I am forced to write an OS for my diploma.

The teaching assistants asked me, in writing, "Why do we need two stacks for processes?" (verbatim). I have dug into that, and my conclusion so far is that we actually don't. We also do not need a separate stack for the kernel unless the kernel preempts itself. This is the root of my argument.

As I see it, the only reason to have a dedicated stack per process is that the state of that stack is not known when the kernel preempts the process (on a timer interrupt, for instance). It may be non-empty ("dirty plates stacked up on the table", so to speak), and the process depends on finding exactly the same state when the kernel dispatches it again later. There is not much to do about this stack concurrency; it is inherent in a kernel switching between processes. So that's that, and it's a given. Fair enough.

The kernel we are writing, however, has only one main thread of execution: handling interrupts, syscalls, the usual. Now, obviously, whatever it pushes on a stack during these activities will be removed from that stack one way or another before a process state snapshot is restored and that process is dispatched. That's how a stack works, right? Why would the kernel LEAVE something on a stack when it effectively completes a procedure of sorts? Thus we can argue that it would be perfectly safe for the kernel to just use the stack of whatever process it last preempted: the process's stack pointer will be in the same state it was in when the process was preempted. No kernel plates on the table. Furthermore, if the same process is dispatched, the stack pointer register stored in its PCB will be restored to what it was anyhow.

Am I making sense, and if not, what crucial piece of the puzzle have I missed?

I think the assistants assume a particular style of implementation (they wrote the bulk of the code, which we are tasked with "patching" into a compilable and working state) and are trying to guide me in that direction. I won't have a chance to discuss it with them and clarify what they mean by their question until tomorrow. I am simply intrigued by their apparent confidence and the way they pose their "question".

I am looking forward to reading your educated opinions on this.

Rusky · Post by **Rusky** » Sun Mar 01, 2015 12:47 pm

How does the kernel know the process it just preempted didn't just point its stack pointer somewhere bad? Or if it's even using a stack?

iansjack · Post by **iansjack** » Sun Mar 01, 2015 1:00 pm

Consider a user program that runs wild and causes a stack overflow. This will lead to a page fault exception, which pushes information on the stack; but this can't be done because of the stack overflow. The result is that a buggy program crashes the whole OS. This is generally considered to be not a good thing. Hence you have a separate kernel stack which you guarantee will be OK (assuming your kernel is well behaved). You can generally assume that your kernel is (meant to be) well behaved but that a user program may not be.
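On x86 this guarantee has to come from the hardware rather than from kernel code, because the fault frame is pushed before a single handler instruction runs. A minimal sketch, assuming x86-64 long mode and a single CPU: the kernel stores the top of a stack it owns in the TSS `rsp0` field, and the CPU loads it on any interrupt or exception taken from ring 3. The names `set_kernel_stack` and `kernel_stack` are illustrative, and loading the TSS itself (GDT descriptor plus `ltr`) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* x86-64 Task State Segment. In long mode it no longer holds task state;
 * it mainly tells the CPU which stack to load on entry to ring 0. */
struct __attribute__((packed)) tss64 {
    uint32_t reserved0;
    uint64_t rsp0;        /* stack loaded on ring 3 -> ring 0 transitions */
    uint64_t rsp1;
    uint64_t rsp2;
    uint64_t reserved1;
    uint64_t ist[7];      /* IST1..IST7: optional per-vector stacks */
    uint64_t reserved2;
    uint16_t reserved3;
    uint16_t iomap_base;  /* offset of the I/O permission bitmap */
};

static_assert(sizeof(struct tss64) == 104, "TSS must be 104 bytes");
static_assert(offsetof(struct tss64, rsp0) == 4, "rsp0 lives at offset 4");
static_assert(offsetof(struct tss64, ist) == 36, "IST1 lives at offset 36");

/* One TSS per CPU in a real kernel; a single CPU is assumed here. */
static struct tss64 tss;

/* Hypothetical kernel stack for this CPU (16 KiB, 16-byte aligned). */
static _Alignas(16) unsigned char kernel_stack[16384];

/* Point rsp0 at the top of a stack the kernel controls. When user code
 * faults (for example on a blown user stack), the CPU switches to this
 * stack before pushing SS, RSP, RFLAGS, CS, RIP and any error code, so the
 * exception frame never lands on the broken user stack. */
static void set_kernel_stack(void *base, size_t size)
{
    tss.rsp0 = (uint64_t)(uintptr_t)((unsigned char *)base + size);
}
```

Note that `rsp0` only covers transitions from user mode; a fault taken while already in ring 0 on a bad stack is what the IST entries are for.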

amn · Post by **amn** » Sun Mar 01, 2015 1:09 pm

Very good points, thanks! Ok, I think we can pretty much agree that this is a very weighty factor indeed - security and stability.

So, N+1 stacks then, where N is the number of user-level processes running.

However, why would processes need 2 stacks each? The way I sort of understood the question from the teaching assistants, I think they imply something like 1 stack for the user-level process code and 1 stack for the process code when CPU is switched to ring 0, e.g. as part of a syscall. But if we follow my original counter argument to a degree, then we can still get away with just 1 process stack, used for user-mode only. When a syscall is entered and the CPU mode is switched to ring 0, the kernel will save the state in PCB, and switch the stack pointer to use its own stack. That's it.

Any reason to use 2N or 2N+1 stacks in the system, then?

cmdrcoriander · Post by **cmdrcoriander** » Sun Mar 01, 2015 2:06 pm

This thread might interest you. One quote of particular relevance:

XenOS wrote: Actually this is the reason why I think that kernel threads are a bit problematic in the "one kernel stack per CPU" model, and why I don't have them in my kernel anymore. To avoid stack overflows from nested interrupts, there are only a few points in the kernel where interrupts are allowed while the kernel stack is in use. These points are outside any nested function levels (which would fill up the stack), so that the kernel stack is almost empty when an interrupt occurs.

Rusky · Post by **Rusky** » Sun Mar 01, 2015 4:05 pm

If you only allow one thread to be in the kernel at once, then you do only need one kernel stack (per CPU). But if you ever switch threads while in the kernel (blocking in the kernel, time slice expiration, interrupt for higher priority thread, etc.) then you need separate kernel stacks for each thread.

It's a tradeoff between how long you're willing to keep other threads/interrupts out of the kernel and how much memory you're willing to use. Smaller kernels like microkernels can usually get away with per-CPU kernel stacks, but monolithic kernels may perform better with per-thread kernel stacks.
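The difference between the two models shows up in the context switch. A sketch under stated assumptions: a 16 KiB kernel stack (`KSTACK_SIZE`), `malloc` standing in for the kernel's page allocator, and a plain variable `cpu_kernel_entry_sp` standing in for the per-CPU TSS `rsp0` slot. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define KSTACK_SIZE 16384  /* assumed size; real kernels pick their own */

/* Per-thread model: every thread owns a kernel stack, so a thread can block
 * or be preempted halfway through a system call and its kernel call chain
 * simply stays on that stack until the thread runs again. */
struct thread {
    int tid;
    unsigned char *kstack;   /* base (lowest address) of the kernel stack */
    uintptr_t saved_ksp;     /* kernel stack pointer saved at switch time */
};

/* Hypothetical stand-in for the TSS rsp0 / "stack to use on kernel entry". */
static uintptr_t cpu_kernel_entry_sp;

static int thread_init(struct thread *t, int tid)
{
    t->tid = tid;
    t->kstack = malloc(KSTACK_SIZE);  /* malloc stands in for the kernel allocator */
    if (!t->kstack)
        return -1;
    t->saved_ksp = (uintptr_t)(t->kstack + KSTACK_SIZE);
    return 0;
}

/* With per-thread stacks the scheduler must tell the CPU which kernel stack
 * the incoming thread uses; with per-CPU stacks this value never changes. */
static void switch_to(struct thread *next)
{
    assert(next->kstack != NULL);
    cpu_kernel_entry_sp = (uintptr_t)(next->kstack + KSTACK_SIZE);
    /* ...save the old registers, load next->saved_ksp, etc. (omitted) */
}

/* Kernel stack memory tied up by each model, in bytes. */
static size_t per_thread_cost(size_t nthreads) { return nthreads * KSTACK_SIZE; }
static size_t per_cpu_cost(size_t ncpus)       { return ncpus * KSTACK_SIZE; }
```

With the assumed 16 KiB size, 1,000 threads tie up 16,384,000 bytes of kernel stack in the per-thread model, versus 65,536 bytes for 4 CPUs in the per-CPU model: that is the memory side of the tradeoff.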

chekwob · Post by **chekwob** » Mon Mar 23, 2015 8:17 am

EDIT: oh heh the thread is almost a month old, didn't notice at the time, sorry

If system calls don't utilize a separate stack, they'll leak tons of internal information such as local variables or function call arguments used inside of the kernel while handling the call. I've decided to come up with an example scenario to illustrate the issue. Imagine a system call named sys_login for associating your process with a specific user account when given the correct username and password:


#include <stddef.h>

struct process;

struct user { char *username; char *password; struct process *last_process; };

// hypothetical declarations assumed by this example (not a real API)
struct process { struct user *user; /* ... */ };
extern struct process *current_process;
enum { SYS_OK = 0, SYS_WRONG_USERNAME_OR_PASSWORD = -1 };
enum { KILL_BAD_USER_POINTER = 1 };
extern void kill(struct process *, int reason);
int strcmp_uk(const char *str1, const char *str2);

// some function for iterating over the list of users
extern struct user *userdb_getnext(struct user *);

// login as user (system call)
int sys_login(const char *username,const char *password)
{
    struct user *user = NULL;
    while ((user = userdb_getnext(user)))
    {
        if (!strcmp_uk(username,user->username))
        {
            if (!strcmp_uk(password,user->password))
            {
                current_process->user = user;
                user->last_process = current_process;
                return SYS_OK;
            }
            else
            {
                // since user list is sorted and usernames
                // are unique, we can return immediately
                return SYS_WRONG_USERNAME_OR_PASSWORD;
            }
        }
    }
    return SYS_WRONG_USERNAME_OR_PASSWORD;
}

// some function to read a byte from user space
// assume it returns -1 if the address is kernel-only or unmapped
extern int user_readbyte(const void *);

// compare user-mode string with kernel-mode string
// similar to standard C strcmp
int strcmp_uk(const char *str1,const char *str2)
{
    int left;
    int right;
    do
    {
        left = user_readbyte(str1++);
        right = (unsigned char)*(str2++);
        if (left == -1)
        {
            // user code passed a bad pointer, kill the process
            // assume this never returns
            kill(current_process,KILL_BAD_USER_POINTER);
        }
        if (left != right)
            return left - right;
    } while (left && right);
    return 0;
}

Right off the bat, sys_login is exploitable to see whether or not a given user account exists. (It's actually already vulnerable to a timing attack, but this one is much more convenient.)


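#include <stdint.h>
#include <stdio.h>

// user-mode wrapper for the system call (assumed)
int sys_login(const char *username, const char *password);

int main(int argc,char **argv)
{
    // deliberately reads below the array, into memory the system call just
    // used; undefined behavior in C, and it only "works" if the kernel ran
    // on the same stack as the caller
    uintptr_t stack_access[1];
    sys_login("admin","garbage");
    if (stack_access[-5])
        printf("admin account exists\n");
    else
        printf("admin account doesn't exist\n");
    return 0;
}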

This would work if your stack looked something like this:

So by reading out stack_access[-5] you're reading the value held in "user" from the system call when it finished. If the account doesn't exist, user is NULL and the stack beyond sys_login is actually stuff from userdb_getnext, whatever secrets that might hold. But if it does exist, you can look further down to -11 and -7 to see not only which character you got wrong, but also what the correct character was. This means you can pull someone's password and log in as them whenever you please.

Now the example's a bit contrived, since nobody would handle passwords this way (and a compiler would likely optimize quite a bit of this to use registers instead), but you get the idea. When you return to user mode, you should also clear any registers that might still hold kernel data, for the same reason.
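The core of the leak, and the flaw in the "everything pushed is popped again" argument from the first post, is that popping only moves the stack pointer; it does not erase anything. A toy user-space model (an array-backed, downward-growing stack; all names hypothetical) makes that visible:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy downward-growing stack: sp == 64 means empty. */
struct toy_stack {
    uintptr_t slots[64];
    size_t sp;                /* index of the current top */
};

static void push(struct toy_stack *s, uintptr_t v) { assert(s->sp > 0); s->slots[--s->sp] = v; }
static uintptr_t pop(struct toy_stack *s) { assert(s->sp < 64); return s->slots[s->sp++]; }

/* Stand-in for a system call that runs on the caller's stack, keeps a
 * secret in a local variable, and "cleans up" after itself. */
static void fake_syscall_on_user_stack(struct toy_stack *s, uintptr_t secret)
{
    push(s, secret);          /* local variable spilled to the stack */
    push(s, 0x1234);          /* another local */
    (void)pop(s);
    (void)pop(s);             /* stack pointer is back where it started */
}

/* Zero the `used` slots just below the current top. If the stack really had
 * to be shared, this is the (costly) mitigation; a separate kernel stack
 * avoids the problem instead. */
static void scrub_below(struct toy_stack *s, size_t used)
{
    assert(used <= s->sp);
    memset(&s->slots[s->sp - used], 0, used * sizeof s->slots[0]);
}
```

After `fake_syscall_on_user_stack` returns, `sp` is unchanged, yet the secret is still sitting in the slot just below it, exactly where `stack_access[-5]` looks. Scrubbing costs time on every system call, which is one more reason to simply run the kernel on a stack user code cannot read.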

letli · Post by **letli** » Sat Jul 04, 2020 1:20 am

amn wrote: Very good points, thanks! Ok, I think we can pretty much agree that this is a very weighty factor indeed - security and stability.

So, N+1 stacks then, where N is amount of user-level processes running.

However, why would processes need 2 stacks each? The way I sort of understood the question from the teaching assistants, I think they imply something like 1 stack for the user-level process code and 1 stack for the process code when CPU is switched to ring 0, e.g. as part of a syscall. But if we follow my original counter argument to a degree, then we can still get away with just 1 process stack, used for user-mode only. When a syscall is entered and the CPU mode is switched to ring 0, the kernel will save the state in PCB, and switch the stack pointer to use its own stack. That's it.

Any reason to use 2N or 2N+1 stacks in the system, then?

Nice question; correct me if I'm wrong.

1. to achieve kernel preemption.
2. to save parameters passed from user space

bzt · Post by **bzt** » Sat Jul 04, 2020 8:03 am

amn wrote: However, why would processes need 2 stacks each? The way I sort of understood the question from the teaching assistants, I think they imply something like 1 stack for the user-level process code and 1 stack for the process code when CPU is switched to ring 0, e.g. as part of a syscall.

Or you could take a completely different approach. I have (nproc+2*ncpu) stacks, for example: one for each process (obviously), and two for each CPU core. One is for the "normal" exceptions and interrupts, and one is for debugging and NMI (I do not pass arguments to syscalls on the stack, just in registers). There are almost infinite possibilities with long mode and IST.

The process stack is used to store local variables. When an interrupt happens, it uses a separate stack, for two reasons. First, the user stack might be corrupt, and I want to serve interrupts reliably. Second, I don't want user processes to peek at the leftovers of kernel local variables. The separate debug and NMI stack is there for the same reason: a debug exception (int 3) or an NMI might happen in an ISR as well, and I want to serve those reliably too. Plus, using a separate stack for debug exceptions simplifies things in my internal debugger (it does not mess up the exception stack, so it is easy to debug page faults, for example).

Cheers,
bzt
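For reference, this is roughly how an IST stack is selected on x86-64: each 16-byte IDT gate has a 3-bit IST field, and a non-zero value makes the CPU switch to the matching `ist[n]` stack from the TSS unconditionally, even when the interrupt arrives in ring 0. A sketch, assuming IST1 is reserved for NMI and debug exceptions as in the setup described above; `set_gate` and the `IST_*`/`VEC_*` names are illustrative, and loading the IDT with `lidt` is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* x86-64 interrupt gate descriptor (16 bytes). Bits 0-2 of `ist` select
 * one of the seven IST stacks in the TSS; 0 means "normal stack rules". */
struct __attribute__((packed)) idt_entry {
    uint16_t offset_low;
    uint16_t selector;
    uint8_t  ist;
    uint8_t  type_attr;   /* 0x8E = present, DPL 0, 64-bit interrupt gate */
    uint16_t offset_mid;
    uint32_t offset_high;
    uint32_t reserved;
};

static_assert(sizeof(struct idt_entry) == 16, "IDT gate must be 16 bytes");

static struct idt_entry idt[256];

/* Assumed layout: IST1 holds the debug/NMI stack; everything else uses the
 * regular per-CPU kernel stack. #DB is vector 1, while `int 3` raises the
 * breakpoint exception #BP, vector 3. */
enum { IST_NONE = 0, IST_DEBUG_NMI = 1 };
enum { VEC_DEBUG = 1, VEC_NMI = 2, VEC_BREAKPOINT = 3, VEC_PAGE_FAULT = 14 };

static void set_gate(int vec, uint64_t handler, uint16_t cs, uint8_t ist)
{
    assert(vec >= 0 && vec < 256 && ist <= 7);
    idt[vec].offset_low  = (uint16_t)(handler & 0xFFFF);
    idt[vec].selector    = cs;
    idt[vec].ist         = ist & 0x7;
    idt[vec].type_attr   = 0x8E;
    idt[vec].offset_mid  = (uint16_t)((handler >> 16) & 0xFFFF);
    idt[vec].offset_high = (uint32_t)(handler >> 32);
    idt[vec].reserved    = 0;
}
```

One caveat: because the IST switch is unconditional, a nested interrupt that uses the same IST entry starts again at the top of that stack and overwrites the first frame, so handlers on an IST stack need to guard against that kind of re-entry.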

nullplan · Post by **nullplan** » Sat Jul 04, 2020 9:34 am

Note that this design requires more careful thought about multitasking than the usual tutorials give it. Since there is only a single stack for syscalls and interrupts, syscalls must not block, or else syscalls from other processes cannot run concurrently. Therefore, every syscall that would block has to save the process state in a form that can be restored later, and then schedule another process. This means the task structures have to be flexible enough to say "this process is waiting for that event", where "that event" can be "read of sector X from disk Y", "response on TCP stream", or whatever. It also means all system calls have to be written in that way. Essentially, you need to enumerate every possible thing a process could be waiting for. In the usual "kernel stack per task" approach, by contrast, that is encoded implicitly on the kernel stack.

linguofreak · Post by **linguofreak** » Sun Jul 05, 2020 3:40 am

bzt wrote:
amn wrote: However, why would processes need 2 stacks each? The way I sort of understood the question from the teaching assistants, I think they imply something like 1 stack for the user-level process code and 1 stack for the process code when CPU is switched to ring 0, e.g. as part of a syscall.
Or you could have a completely different approach. I have (nproc+2*ncpu) stacks for example. One for each process (obviously), and two for each CPU core. One for the "normal" exceptions and interrupts and one for debugging and NMI (I do not pass arguments to syscalls on the stack, just in registers). There are almost infinite possibilities with long mode and IST

The process stack is used to store local variables. When an interrupt happens, it uses a separate stack, for two reasons: first, user stack might be corrupt, and I want to serve interrupts reliably. Second, I don't want user processes to peek on the leftovers of kernel local variables. Now the separate debug and NMI stack is there for the same reason: a debug exception (int 3) or an NMI might happened in an ISR as well, and I want to serve those reliably too. Plus using a separate stack for debug exceptions simplifies things in my internal debugger (it does not mess up the exception stack, so it easy to debug page faults for example).

Cheers,
bzt

I'm not sure the OP is around to hear you. The thread had been dead for 5 years.

AndrewAPrice · Post by **AndrewAPrice** » Sun Jul 05, 2020 4:50 am

I also subscribe to the one-kernel-stack-per-processor-core approach rather than one kernel stack per process.

There's no right answer, but I think it simplifies things if you don't make interrupts preemptible.

Make your interrupts and syscalls super fast. If they need to do something time-consuming, they can put the running thread to sleep (by descheduling it and returning into a different thread's context) and create or wake a kernel thread to do the long-running work.
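The "keep the handler short, push the slow part to a kernel thread" pattern can be sketched as a small work queue. This assumes one producer (the handler) and one consumer (a kernel worker thread) and omits the locking or memory barriers a real SMP kernel would need; all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical work item: a function pointer plus its argument. */
struct work {
    void (*fn)(void *);
    void *arg;
};

#define WORKQ_LEN 64   /* assumed capacity; a power of two keeps index wraparound correct */

/* Single-producer / single-consumer ring buffer. */
struct workq {
    struct work items[WORKQ_LEN];
    size_t head;   /* next slot to write */
    size_t tail;   /* next slot to read */
};

/* Called from the interrupt/syscall handler: O(1) and never blocks. */
static bool workq_push(struct workq *q, void (*fn)(void *), void *arg)
{
    if (q->head - q->tail == WORKQ_LEN)
        return false;                    /* full: caller must drop or retry */
    q->items[q->head % WORKQ_LEN] = (struct work){ fn, arg };
    q->head++;
    return true;
}

/* Body of the kernel worker thread: runs the slow work on the thread's own
 * stack, where it can be preempted like any other thread. Returns the
 * number of items run. */
static size_t workq_drain(struct workq *q)
{
    size_t n = 0;
    while (q->tail != q->head) {
        struct work w = q->items[q->tail % WORKQ_LEN];
        q->tail++;
        assert(w.fn != NULL);
        w.fn(w.arg);
        n++;
    }
    return n;
}

/* Example "slow" job: just increments a counter. */
static void count_job(void *arg) { ++*(int *)arg; }
```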

If you don't want to do that, you could use a pool of kernel stacks (and grow it on demand). If you want to make your interrupt or syscall handler preemptible, you could grab a stack from the pool and set it as the next stack for that processor core before re-enabling interrupts.
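The pool of kernel stacks could look roughly like this. Assumptions: 16 KiB stacks, at most 8 CPUs, `malloc` in place of the kernel allocator, and a `next_entry_stack` array standing in for each core's TSS `rsp0`; none of these names come from a real kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define KSTACK_SIZE 16384   /* assumed */
#define MAX_CPUS 8          /* assumed */

/* Free stacks are chained through their first word, so the pool needs no
 * extra bookkeeping memory. */
struct free_stack { struct free_stack *next; };

static struct free_stack *pool_head;
static size_t pool_allocated;        /* total stacks ever created */

/* Per-CPU slot: the stack the next kernel entry on that core will use
 * (in practice its top would be written into the core's TSS rsp0). */
static void *next_entry_stack[MAX_CPUS];

static void *kstack_get(void)
{
    if (pool_head) {
        struct free_stack *s = pool_head;
        pool_head = s->next;
        return s;
    }
    void *s = malloc(KSTACK_SIZE);   /* grow on demand */
    if (s)
        pool_allocated++;
    return s;
}

static void kstack_put(void *stack)
{
    struct free_stack *s = stack;
    s->next = pool_head;
    pool_head = s;
}

/* Before re-enabling interrupts inside a long handler: the current stack now
 * belongs to this handler, so give the core a fresh stack for anything that
 * enters the kernel in the meantime. */
static int make_handler_preemptible(int cpu)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    void *fresh = kstack_get();
    if (!fresh)
        return -1;
    next_entry_stack[cpu] = fresh;
    /* ...write the top of `fresh` into the TSS, then enable interrupts... */
    return 0;
}
```

When the handler finishes, its old stack can go back to the pool with `kstack_put`, so the pool only grows as deep as the worst-case nesting actually seen.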

nexos · Post by **nexos** » Sun Jul 05, 2020 5:01 am

I prefer the kernel-stack-per-thread approach, because it limits the chance of a stack overflow occurring in kernel land, which in some cases would result in a panic or a triple fault. Plus, this approach is very easy to set up and use.

AndrewAPrice · Post by **AndrewAPrice** » Sun Jul 05, 2020 9:29 am

How would a kernel stack per thread limit the chance of an overflow? Your interrupt/syscall handler would always enter with an empty stack.

bzt · Post by **bzt** » Mon Jul 06, 2020 4:53 am

linguofreak wrote: I'm not sure the OP is around to hear you. The thread had been dead for 5 years.

Whoops, you are totally correct. I hadn't realized @letli was necroposting. Sorry about that.

Cheers,
bzt

OSDev.org
