Confused about context switch

JAAman
Member
Posts: 879
Joined: Wed Oct 27, 2004 11:00 pm
Location: WA

Re: Confused about context switch

Post by JAAman »

KrotovOSdev wrote: I think the problem is a stack overflow. When the interrupt handler is called, it calls resched(), which creates a stack frame. If I'm right (but I may not be), I have to bypass that somehow. How can I do this?
1) Why are you using CLI? This is dangerous and should never be done. Instead, use an interrupt gate, which automatically disables interrupts for you. Otherwise, it is possible to receive an interrupt after your handler has been entered but before its first instruction (the CLI) executes, which can lead to stack overflow.

2) Why do you have an STI instruction? The STI instruction should never be used. Ever. There is no place (beyond your bootloader) where an STI instruction should be used. Again, you should be using an interrupt gate, which automatically disables interrupts when it is called; the IRET then restores the previous interrupt state automatically. Enabling interrupts yourself inside the handler can cause stack overflow.
Octocontrabass
Member
Posts: 5501
Joined: Mon Mar 25, 2013 7:01 pm

Re: Confused about context switch

Post by Octocontrabass »

Code:

sched_time_handler:
Interrupt handlers should not start with CLI. If you need interrupts disabled, use an interrupt gate. If you're not using an interrupt gate, interrupts may arrive before the CLI and overflow your stack.

You push EBP but there's no corresponding pop, so the IRET does not pop the correct return address.

You call a C function without saving the registers that may be clobbered by a C function and without clearing the direction flag.

STI before IRET does nothing. IRET pops EFLAGS from the stack, and the stored EFLAGS value determines whether interrupts will be enabled.

Code:

kernel_pf_handler:
Exception handlers should not start with CLI. If you need interrupts disabled, use an interrupt gate. If you're not using an interrupt gate, interrupts may arrive before the CLI and overflow your stack.

You pop the error code into EBX, overwriting the interrupted program's state.

You push EBX without a corresponding pop, causing POPAD and IRET to pop the wrong values from the stack.

You call a C function without clearing the direction flag.

STI before IRET does nothing.
KrotovOSdev wrote:I think the problem is a stack overflow.
Your code has several problems. It's hard to say which one is causing the crash without debugging.
KrotovOSdev wrote:When the interrupt handler is called, it calls resched(), which creates a stack frame.
Rescheduling should not involve creating anything, just switching from one task to another. (And you probably should not switch tasks on every timer interrupt!)
KrotovOSdev wrote:If I'm right (but I may not be), I have to bypass that somehow. How can I do this?
If you don't want to switch tasks on every timer interrupt, keep track of how many timer interrupts have arrived since the last time you've switched tasks.
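A minimal sketch of that tick counting, in C. The names `TICKS_PER_SLICE` and `timer_tick_should_switch` are hypothetical, and the value 5 is only an assumption taken from the "every 5 timer IRQs" figure mentioned later in the thread:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical slice length: switch every 5 timer IRQs. */
#define TICKS_PER_SLICE 5u

/* Timer IRQs seen since the last task switch. */
uint32_t ticks_since_switch = 0;

/* Call once per timer IRQ. Returns true when the time slice has expired
 * and the caller should pick another task; the counter restarts then. */
bool timer_tick_should_switch(void)
{
    ticks_since_switch++;
    if (ticks_since_switch < TICKS_PER_SLICE) {
        return false;   /* slice not used up yet, just return via IRET */
    }
    ticks_since_switch = 0;
    return true;
}
```

The timer handler would call this after acknowledging the interrupt and only invoke the scheduler when it returns true.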
rdos
Member
Posts: 3270
Joined: Wed Oct 01, 2008 1:55 pm

Re: Confused about context switch

Post by rdos »

JAAman wrote:
KrotovOSdev wrote: I think the problem is a stack overflow. When the interrupt handler is called, it calls resched(), which creates a stack frame. If I'm right (but I may not be), I have to bypass that somehow. How can I do this?
1) Why are you using CLI? This is dangerous and should never be done. Instead, use an interrupt gate, which automatically disables interrupts for you. Otherwise, it is possible to receive an interrupt after your handler has been entered but before its first instruction (the CLI) executes, which can lead to stack overflow.

2) Why do you have an STI instruction? The STI instruction should never be used. Ever. There is no place (beyond your bootloader) where an STI instruction should be used. Again, you should be using an interrupt gate, which automatically disables interrupts when it is called; the IRET then restores the previous interrupt state automatically. Enabling interrupts yourself inside the handler can cause stack overflow.
Disagree. OSes that do the whole scheduling process with interrupts disabled end up getting poor interrupt latency. A better design is to only disable interrupts when really necessary, primarily in parts of the IRQ handlers and in spinlocks. Other than that, interrupts should be enabled. That means that the scheduler needs locks, and task switches must be postponed until all IRQs are handled.

OTOH, in a multicore OS, sti/cli should only be used in spinlocks and as parts of IRQs. They are no good for protecting code from reentrance problems.
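One way to postpone task switches until all IRQs are handled is to count nested handlers and remember that a switch was requested. This is only a sketch under assumptions: `irq_enter`, `irq_exit` and `request_resched` are hypothetical names, and a real kernel would keep this state per CPU and update it with interrupts disabled:

```c
#include <stdbool.h>

int irq_nesting = 0;            /* number of IRQ handlers currently active */
bool resched_pending = false;   /* a handler asked for a task switch */

/* Call at the start of every IRQ handler. */
void irq_enter(void)
{
    irq_nesting++;
}

/* Call from a handler that wants a task switch; nothing is switched here. */
void request_resched(void)
{
    resched_pending = true;
}

/* Call at the end of every IRQ handler. Returns true only when the
 * outermost handler is leaving and a switch was requested, which is the
 * point where the scheduler can run safely. */
bool irq_exit(void)
{
    irq_nesting--;
    if (irq_nesting == 0 && resched_pending) {
        resched_pending = false;
        return true;
    }
    return false;
}
```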
rdos
Member
Posts: 3270
Joined: Wed Oct 01, 2008 1:55 pm

Re: Confused about context switch

Post by rdos »

Octocontrabass wrote:If you don't want to switch tasks on every timer interrupt, keep track of how many timer interrupts have arrived since the last time you've switched tasks.
Timers should not be built by counting ticks in IRQs. The PIT can be programmed in one-shot mode and so can be used as a timer. Timers are what you should use for preemption. Alternative timers are the APIC timer and the HPET.
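As a rough illustration of the one-shot approach, the sketch below converts a delay into a PIT count. The function name `pit_oneshot_count` is hypothetical, and using mode 0 (interrupt on terminal count) on channel 0 as the one-shot mode is an assumption about the setup; the kernel would write the command byte to port 0x43 and then the count, low byte first, to port 0x40:

```c
#include <stdint.h>

#define PIT_INPUT_HZ       1193182u /* PIT input clock, about 1.193182 MHz */
#define PIT_COMMAND_PORT   0x43u    /* mode/command register */
#define PIT_CH0_DATA_PORT  0x40u    /* channel 0 data port */
#define PIT_CMD_CH0_MODE0  0x30u    /* channel 0, lobyte/hibyte, mode 0, binary */

/* Convert a delay in microseconds into a 16-bit PIT count.
 * The result is clamped to 1..0xFFFF, so longer delays need several shots. */
uint16_t pit_oneshot_count(uint32_t delay_us)
{
    uint64_t count = ((uint64_t)delay_us * PIT_INPUT_HZ) / 1000000u;
    if (count < 1) {
        count = 1;
    }
    if (count > 0xFFFF) {
        count = 0xFFFF;
    }
    return (uint16_t)count;
}
```

With a 16-bit counter, a single shot tops out at roughly 55 ms, so a longer time slice would have to be split into several shots or use a different timer.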
KrotovOSdev
Member
Posts: 40
Joined: Sat Aug 12, 2023 1:48 am
Location: Nizhny Novgorod, Russia

Re: Confused about context switch

Post by KrotovOSdev »

Octocontrabass wrote:

Code:

sched_time_handler:
Interrupt handlers should not start with CLI. If you need interrupts disabled, use an interrupt gate. If you're not using an interrupt gate, interrupts may arrive before the CLI and overflow your stack.

You push EBP but there's no corresponding pop, so the IRET does not pop the correct return address.

You call a C function without saving the registers that may be clobbered by a C function and without clearing the direction flag.

STI before IRET does nothing. IRET pops EFLAGS from the stack, and the stored EFLAGS value determines whether interrupts will be enabled.
I tried pushing EBP to pass it as a function parameter so that I could free the stack frame manually, but it didn't help.
Of course, I've deleted all the STI and CLI instructions. It looks like I got the interrupt gate idea wrong.
Octocontrabass wrote:

Code:

kernel_pf_handler:
You pop the error code into EBX, overwriting the interrupted program's state.

You push EBX without a corresponding pop, causing POPAD and IRET to pop the wrong values from the stack.
How can I pass the error code as a function parameter?
Octocontrabass wrote:You call a C function without clearing the direction flag.
Do I have to clear DF, or is it enough to push EFLAGS (I've just added this)?
Octocontrabass wrote: Your code has several problems. It's hard to say which one is causing the crash without debugging.

Rescheduling should not involve creating anything, just switching from one task to another. (And you probably should not switch tasks on every timer interrupt!)

If you don't want to switch tasks on every timer interrupt, keep track of how many timer interrupts have arrived since the last time you've switched tasks.
It does not crash, but it throws an exception. How can I avoid creating anything if the handler calls C functions and, when an interrupt happens, the CPU puts some values on the stack? I switch tasks every 50 ms (or every 5 timer IRQs).
Maybe the problem is with the second task. I only have the kernel task and the idle task now, so maybe I should load drivers first?
Octocontrabass
Member
Posts: 5501
Joined: Mon Mar 25, 2013 7:01 pm

Re: Confused about context switch

Post by Octocontrabass »

KrotovOSdev wrote:I tried pushing EBP to pass it as a function parameter so that I could free the stack frame manually, but it didn't help.
Why are you trying to free the stack frame? You need that stack frame to resume the thread.
KrotovOSdev wrote:How can I pass the error code as a function parameter?
You can do something like this:

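Code:

push dword [esp+32] ; error code, assuming PUSHAD has already saved 8 registers (32 bytes) below it
call function_name_here ; void function_name_here( uint32_t error_code );
add esp, 4 ; remove the argument from the stack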
But then you only have the error code and nothing else. Usually you want access to most of the values on the stack, so you'd push a pointer to the stack and define a struct that matches your stack layout:

Code:

push esp
call function_name_here ; void function_name_here( struct stack_frame * context );
add esp, 4
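A possible C definition of such a struct is sketched below. It assumes a 32-bit kernel-mode exception where the CPU pushed an error code and the stub then executed PUSHAD and nothing else; the field names and the helper `faulting_eip` are hypothetical. For interrupts without an error code, the stub would have to push a dummy value for the layout to match, and an interrupt arriving from user mode has ESP and SS after EFLAGS as well:

```c
#include <stddef.h>
#include <stdint.h>

/* Lowest address first, i.e. the order in which the values are popped. */
struct stack_frame {
    /* Saved by PUSHAD. EDI is pushed last, so it sits at the lowest address.
     * The ESP slot holds the value ESP had before PUSHAD; POPAD ignores it. */
    uint32_t edi, esi, ebp, esp_before_pushad, ebx, edx, ecx, eax;
    /* Pushed by the CPU for exceptions that have an error code. */
    uint32_t error_code;
    /* Pushed by the CPU on every interrupt or exception. */
    uint32_t eip, cs, eflags;
};

/* Hypothetical helper: where was the CPU executing when the fault hit? */
uint32_t faulting_eip(const struct stack_frame *context)
{
    return context->eip;
}
```

With this layout, `error_code` lands at offset 32, which matches the `[esp+32]` used in the first snippet.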
KrotovOSdev wrote:Do I have to clear DF, or is it enough to push EFLAGS (I've just added this)?
You need to clear DF. You don't need to push EFLAGS, the CPU automatically pushes it when entering an interrupt handler, and IRET pops it.
KrotovOSdev wrote:It does not crash, but it throws an exception.
Close enough. You can use the values on the stack to examine what the CPU was doing when the exception occurred.
KrotovOSdev wrote:How can I avoid creating anything if the handler calls C functions and, when an interrupt happens, the CPU puts some values on the stack?
I don't understand. When an interrupt happens, the CPU puts some values on the stack, and then you use IRET to remove those values and return to the program. That happens with or without task switching.
KrotovOSdev wrote:Maybe the problem is with the second task. I only have the kernel task and the idle task now, so maybe I should load drivers first?
You don't need drivers. As long as both tasks can run individually without causing exceptions, you should be able to switch between them without exceptions.
KrotovOSdev
Member
Posts: 40
Joined: Sat Aug 12, 2023 1:48 am
Location: Nizhny Novgorod, Russia

Re: Confused about context switch

Post by KrotovOSdev »

Octocontrabass wrote:Why are you trying to free the stack frame? You need that stack frame to resume the thread.
OK, I do need it. I thought it could overflow the stack.
Octocontrabass wrote:

Code:

push esp
call function_name_here ; void function_name_here( struct stack_frame * context );
add esp, 4
I've added this and now I can see all registers. Nice! Thank you

Done
Octocontrabass wrote:
KrotovOSdev wrote:Maybe the problem is with the second task. I only have the kernel task and the idle task now, so maybe I should load drivers first?
You don't need drivers. As long as both tasks can run individually without causing exceptions, you should be able to switch between them without exceptions.
Looks like I got it wrong. My "best" scheduling algorithm is priority-based, so it never preempts the kernel task until that task yields the CPU. It turns out that the kernel task switches to itself. Maybe that causes the stack overflow.
Octocontrabass
Member
Posts: 5501
Joined: Mon Mar 25, 2013 7:01 pm

Re: Confused about context switch

Post by Octocontrabass »

KrotovOSdev wrote:It turns out that the kernel task switches to itself. Maybe that causes the stack overflow.
Performing a task switch that doesn't switch to a different task is a waste of time, but it shouldn't overflow the stack.
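Avoiding that wasted work can be as simple as comparing the chosen task with the current one before switching. In this sketch, `struct task`, `switch_to` and `schedule_to` are hypothetical stand-ins; the real `switch_to` would save and restore registers and stacks instead of just updating a pointer:

```c
#include <stddef.h>

struct task {
    int id;
};

struct task *current_task = NULL;
int switch_count = 0;   /* counts real switches, for illustration only */

/* Stand-in for the real context switch routine. */
static void switch_to(struct task *next)
{
    current_task = next;
    switch_count++;
}

/* Switch to the task the scheduler picked, unless it is already running. */
void schedule_to(struct task *next)
{
    if (next == NULL || next == current_task) {
        return;         /* nothing to do, return to the interrupted code */
    }
    switch_to(next);
}
```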
KrotovOSdev
Member
Posts: 40
Joined: Sat Aug 12, 2023 1:48 am
Location: Nizhny Novgorod, Russia

Re: Confused about context switch

Post by KrotovOSdev »

Octocontrabass wrote:
KrotovOSdev wrote:It turns out that the kernel task switches to itself. Maybe that causes the stack overflow.
Performing a task switch that doesn't switch to a different task is a waste of time, but it shouldn't overflow the stack.
I know, but my main goal right now is to get context switching working and load some drivers.
Trying to fix stack overflow day (unknown)...
Octocontrabass
Member
Posts: 5501
Joined: Mon Mar 25, 2013 7:01 pm

Re: Confused about context switch

Post by Octocontrabass »

How do you know it's a stack overflow and not something else?
KrotovOSdev
Member
Posts: 40
Joined: Sat Aug 12, 2023 1:48 am
Location: Nizhny Novgorod, Russia

Re: Confused about context switch

Post by KrotovOSdev »

Octocontrabass wrote:How do you know it's a stack overflow and not something else?
If I run QEMU with the "-d int" argument, I can see that my OS causes many, many page faults.
This was happening before I rewrote the ISR entry points in assembly. Now it throws a GPF on the second timer interrupt. I guess the problem may be in how I save registers. I'm working on it.
Octocontrabass
Member
Posts: 5501
Joined: Mon Mar 25, 2013 7:01 pm

Re: Confused about context switch

Post by Octocontrabass »

KrotovOSdev wrote:If I run QEMU with the "-d int" argument, I can see that my OS causes many, many page faults.
How do you know the page faults are caused by a stack overflow? What does your page fault handler do in response to the page faults?
KrotovOSdev
Member
Posts: 40
Joined: Sat Aug 12, 2023 1:48 am
Location: Nizhny Novgorod, Russia

Re: Confused about context switch

Post by KrotovOSdev »

Octocontrabass wrote: How do you know the page faults are caused by a stack overflow? What does your page fault handler do in response to the page faults?
I reserve some memory for the kernel's needs and, of course, for the stack. A page fault may notify the kernel that it needs more memory. Or it may also signal a wrong EIP, but I think a wrong EIP causes a different exception, doesn't it?
Octocontrabass
Member
Posts: 5501
Joined: Mon Mar 25, 2013 7:01 pm

Re: Confused about context switch

Post by Octocontrabass »

KrotovOSdev wrote:A page fault may notify the kernel that it needs more memory.
Or it means there is a bug that causes your kernel to access an invalid address.
KrotovOSdev wrote:Or it may also signal a wrong EIP, but I think a wrong EIP causes a different exception, doesn't it?
There's no exception for wrong EIP because the CPU doesn't know when EIP is wrong. You might get a page fault when EIP is wrong, if EIP points to a page that isn't executable.
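To tell these cases apart, the page fault handler can decode the error code the CPU pushes (the faulting address itself is in CR2). The sketch below uses hypothetical names (`struct pf_info`, `decode_pf_error`); the bit positions are the architectural ones, but the instruction fetch bit is only reported when no-execute protection is supported and enabled, so treat that part as an assumption about the setup:

```c
#include <stdbool.h>
#include <stdint.h>

#define PF_PRESENT   0x01u /* 0: page not present, 1: protection violation */
#define PF_WRITE     0x02u /* 0: read access, 1: write access */
#define PF_USER      0x04u /* 0: CPU was in supervisor mode, 1: user mode */
#define PF_RESERVED  0x08u /* a reserved bit was set in a paging structure */
#define PF_FETCH     0x10u /* the fault was caused by an instruction fetch */

struct pf_info {
    bool present;
    bool write;
    bool user;
    bool reserved_bit;
    bool instruction_fetch;
};

/* Split the page fault error code into named flags. */
struct pf_info decode_pf_error(uint32_t error_code)
{
    struct pf_info info;
    info.present           = (error_code & PF_PRESENT) != 0;
    info.write             = (error_code & PF_WRITE) != 0;
    info.user              = (error_code & PF_USER) != 0;
    info.reserved_bit      = (error_code & PF_RESERVED) != 0;
    info.instruction_fetch = (error_code & PF_FETCH) != 0;
    return info;
}
```

A not-present fault on an address the kernel never intended to map is more likely a bug than a request for more memory, so printing these flags together with CR2 and the saved EIP is a reasonable first step.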
KrotovOSdev
Member
Posts: 40
Joined: Sat Aug 12, 2023 1:48 am
Location: Nizhny Novgorod, Russia

Re: Confused about context switch

Post by KrotovOSdev »

Octocontrabass wrote: Or it means there is a bug that causes your kernel to access an invalid address.

There's no exception for wrong EIP because the CPU doesn't know when EIP is wrong. You might get a page fault when EIP is wrong, if EIP points to a page that isn't executable.
It may be a bug, I know. I am trying to understand the problem.
Judging by the faulting address, which is close to 1 MB, the page should be mapped. But now it just causes a GPF.

Even GDB doesn't work with my kernel. When I add the -g option for GCC and NASM and use GDB's remote debugging with QEMU, it says that there is no symbol table, so I can't set breakpoints...