SOLVED: Task switching race conditions

techdude17
Posts: 22
Joined: Fri Dec 23, 2022 1:06 pm

SOLVED: Task switching race conditions

Post by techdude17 »

Hi all,

I use local APIC timer interrupts to preempt usermode tasks (vector 123). However, I noticed that when switching to a usermode task, the CPU core that picks it up deadlocks and the local APIC IRR still shows vector 123 pending. I quickly figured out that this was because RFLAGS did not have the IF bit set after switching tasks, but I'm not sure what's wrong with my code.

The local APIC interrupt handler does the EOI for the local APIC first, then jumps to the IRQ handler. If the request came from usermode, a reschedule test is performed, and if the process is out of its timeslice it is preempted.

However, this usermode process gets preempted multiple times on the BSP, but not when any other core picks it up. I'm confused about why it works on the BSP but not on other cores.

Any help is appreciated. Codebase link: https://github.com/sasdallas/Hexahedron (feel free to rip into me :oops: )
Last edited by techdude17 on Sun Mar 02, 2025 11:55 am, edited 2 times in total.
Octocontrabass
Member
Posts: 5722
Joined: Mon Mar 25, 2013 7:01 pm

Re: Interrupts not being enabled when switching task context

Post by Octocontrabass »

Why does your multitasking look like you based it on setjmp/longjmp instead of doing things the easy way?

Why do you send the local APIC EOI in the assembly interrupt stub?

Why can't kernel-mode processes be rescheduled?
techdude17
Posts: 22
Joined: Fri Dec 23, 2022 1:06 pm

Re: Interrupts not being enabled when switching task context

Post by techdude17 »

1. Not actually sure... My original scheduler was based off the setjmp/longjmp principle, but I think that this might be a better thing to try. Thanks!

2. Why not? It makes it so I don't have to worry about sending the EOI anywhere else.

3. This is new and it was a deliberate decision - I like the idea that kthreads should be able to run until they're ready to yield. I'm open to changing this if the philosophy is stupid - this is my first scheduler after all :oops:
Octocontrabass
Member
Posts: 5722
Joined: Mon Mar 25, 2013 7:01 pm

Re: Interrupts not being enabled when switching task context

Post by Octocontrabass »

techdude17 wrote: Sun Feb 23, 2025 8:30 pm
2. Why not? It makes it so I don't have to worry about sending the EOI anywhere else.
Because now you need to have two copies of that code instead of only one. (Although you actually have a third copy that doesn't seem to be used anywhere.)
techdude17 wrote: Sun Feb 23, 2025 8:30 pm
3. This is new and it was a deliberate decision - I like the idea that kthreads should be able to run until they're ready to yield. I'm open to changing this if the philosophy is stupid - this is my first scheduler after all :oops:
Having a way to stop kernel threads from being preempted is reasonable. Having no way to preempt kernel threads is a bit questionable; usually that sort of thing is a side effect of not following the traditional one-kernel-stack-per-thread design instead of being specifically chosen.
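
One common way to get "kernel threads can opt out of preemption" without making them unpreemptible altogether is a per-thread disable counter. The following is a minimal sketch under assumptions: `struct kthread`, `preempt_disable`, `preempt_enable` and `timer_wants_preempt` are hypothetical names for illustration and do not come from the Hexahedron codebase.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-thread state, not taken from Hexahedron. */
struct kthread {
    unsigned preempt_disable_depth; /* > 0 means "do not preempt me" */
    bool     resched_pending;       /* a preemption was requested while disabled */
};

/* Enter a region in which the timer must not preempt this thread. */
static inline void preempt_disable(struct kthread *t)
{
    t->preempt_disable_depth++;
}

/* Leave the region. Returns true if the caller should now call the scheduler
 * because a preemption request arrived while preemption was disabled. */
static inline bool preempt_enable(struct kthread *t)
{
    assert(t->preempt_disable_depth > 0); /* catch unbalanced calls */
    if (--t->preempt_disable_depth == 0 && t->resched_pending) {
        t->resched_pending = false;
        return true;
    }
    return false;
}

/* Called from the timer IRQ once the timeslice has expired.
 * Returns true if the thread may be preempted right now. */
static inline bool timer_wants_preempt(struct kthread *t)
{
    if (t->preempt_disable_depth > 0) {
        t->resched_pending = true; /* defer until preempt_enable() */
        return false;
    }
    return true;
}
```

With something like this, a kernel thread is preemptible by default and only the short regions that really must not be interrupted are bracketed by the disable/enable pair.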
techdude17
Posts: 22
Joined: Fri Dec 23, 2022 1:06 pm

Re: Interrupts not being enabled when switching task context

Post by techdude17 »

Sure yeah, I'll take a look at adding preemption to my kernel threads. I'm not entirely in the mood to rewrite my scheduler though so I'd like to weigh the pros and cons of following the guide you linked.

In the meantime, even though I have to use two code slices, the acknowledgement does work - it's just that interrupts aren't being reenabled after a task switch.
Octocontrabass
Member
Posts: 5722
Joined: Mon Mar 25, 2013 7:01 pm

Re: Interrupts not being enabled when switching task context

Post by Octocontrabass »

techdude17 wrote: Mon Feb 24, 2025 7:57 am
I'm not entirely in the mood to rewrite my scheduler though so I'd like to weigh the pros and cons of following the guide you linked.
There's a lot going on in that guide, and while it's mostly good advice, you don't have to do everything exactly the way the guide does it. (For example, I prefer to keep the stack-switching assembly code as simple as possible. The caller can deal with stuff like CR3.)
techdude17 wrote: Mon Feb 24, 2025 7:57 am
In the meantime, even though I have to use two code slices, the acknowledgement does work - it's just that interrupts aren't being reenabled after a task switch.
I've run out of things to check by looking at code, so I guess it's time to take a closer look with a debugger. First things first: how did you check RFLAGS on the task that refuses to be interrupted?

Since this is a usermode task that got preempted by an interrupt, restoring the saved context should direct the CPU to an IRETQ instruction that restores RFLAGS from the stack, which should set the interrupt flag. I could see this failing if the saved context is corrupted somehow (e.g. two threads using the same kernel stack) or if restoring the saved context doesn't resume execution inside the interrupt handler where it was saved. What is the saved RFLAGS value right before the thread begins running on an AP? Does that saved RFLAGS value get restored by an IRETQ instruction?

You can also get weird behavior from trying to use the legacy PICs and the APICs at the same time, but that wouldn't affect RFLAGS.
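
The check suggested above (what RFLAGS value does IRETQ actually restore?) can be expressed as a small debugging helper. This is only a sketch: the struct mirrors the five values the CPU pushes for an interrupt in 64-bit mode, while the names `iretq_frame` and `frame_resumes_user_with_irqs` are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RFLAGS_IF (UINT64_C(1) << 9) /* interrupt enable flag */

/* What the CPU pushes for an interrupt in 64-bit mode, lowest address first
 * (no error code for ordinary external interrupts). */
struct iretq_frame {
    uint64_t rip;
    uint64_t cs;
    uint64_t rflags;
    uint64_t rsp;
    uint64_t ss;
};

/* True if IRETQ on this frame would resume ring 3 with interrupts enabled. */
static bool frame_resumes_user_with_irqs(const struct iretq_frame *f)
{
    assert(f != NULL);
    bool to_ring3 = (f->cs & 3) == 3;              /* RPL of the saved CS */
    bool irqs_on  = (f->rflags & RFLAGS_IF) != 0;  /* saved IF bit */
    return to_ring3 && irqs_on;
}
```

Logging the result of such a helper right before a thread is resumed on an AP would show whether the saved frame is intact or has been overwritten.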
techdude17
Posts: 22
Joined: Fri Dec 23, 2022 1:06 pm

Re: Interrupts not being enabled when switching task context

Post by techdude17 »

Well, the way the scheduler works is by calling setjmp in process_yield() during an IRQ. When the task is switched to, it resumes in process_yield() and returns. For the LAPIC this goes back into the common interrupt handler, which restores RFLAGS to whatever it was when the IRQ happened.

I'm not sure if I need to acknowledge the IRQ for the local APIC from the 8259 PIC as well as the APIC itself. Normally I rely on acknowledging the PIC to reset the IF flag in RFLAGS but it seems like I have to do this myself? I read on the wiki to disable the PIC if using the APIC but it didn't clarify whether it was referring to the local or I/O (didn't seem to make a difference last I checked but I can check again)
Octocontrabass
Member
Posts: 5722
Joined: Mon Mar 25, 2013 7:01 pm

Re: Interrupts not being enabled when switching task context

Post by Octocontrabass »

techdude17 wrote: Mon Feb 24, 2025 3:41 pm
Normally I rely on acknowledging the PIC to reset the IF flag in RFLAGS
Huh? Why would acknowledging the PIC have any effect on RFLAGS?
techdude17
Posts: 22
Joined: Fri Dec 23, 2022 1:06 pm

Re: Interrupts not being enabled when switching task context

Post by techdude17 »

Octocontrabass wrote: Mon Feb 24, 2025 4:36 pm
techdude17 wrote: Mon Feb 24, 2025 3:41 pm
Normally I rely on acknowledging the PIC to reset the IF flag in RFLAGS
Huh? Why would acknowledging the PIC have any effect on RFLAGS?
Sorry, misspoke. I know the PIC doesn't reenable the IF bit and that it's done with IRET. Pretty sure I just forgot to reenable interrupts in my codebase somewhere - saw that this is done in unlock_scheduler in the guide you linked, just have to figure out where to put it.

Other than that, if you have any other feedback I'd love to hear it - your comments have been very helpful in my design :D
thewrongchristian
Member
Posts: 441
Joined: Tue Apr 03, 2018 2:44 am

Re: Interrupts not being enabled when switching task context

Post by thewrongchristian »

techdude17 wrote: Tue Feb 25, 2025 3:39 pm
Octocontrabass wrote: Mon Feb 24, 2025 4:36 pm
techdude17 wrote: Mon Feb 24, 2025 3:41 pm
Normally I rely on acknowledging the PIC to reset the IF flag in RFLAGS
Huh? Why would acknowledging the PIC have any effect on RFLAGS?
Sorry, misspoke. I know the PIC doesn't reenable the IF bit and that it's done with IRET. Pretty sure I just forgot to reenable interrupts in my codebase somewhere - saw that this is done in unlock_scheduler in the guide you linked, just have to figure out where to put it.

Other than that, if you have any other feedback I'd love to hear it - your comments have been very helpful in my design :D
I had a "functional" userspace for some time before I realised I'd disabled interrupts whenever in user space!

In my case, I'd just pushed the bootstrap eflags onto the user stack along with the return address to start the init process at, and used iret to get it going. As fork then just duplicates the user state, the cleared interrupt flag propagated to all my user processes.

D'oh!
Octocontrabass
Member
Posts: 5722
Joined: Mon Mar 25, 2013 7:01 pm

Re: Interrupts not being enabled when switching task context

Post by Octocontrabass »

techdude17 wrote: Tue Feb 25, 2025 3:39 pm
Pretty sure I just forgot to reenable interrupts in my codebase somewhere
If interrupts are already enabled when the user process gets interrupted, they will be re-enabled by the IRETQ instruction the next time the user process runs. You only need to enable interrupts when you first create the user mode context, which you appear to be doing already. (You really should initialize the user RFLAGS to a constant instead of copying the kernel RFLAGS. It's also a good idea to initialize all of the user-accessible registers...)
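
The advice about initializing the user RFLAGS to a constant could look roughly like the sketch below. The value 0x202 is bit 1 (which always reads as 1) plus IF (bit 9), with IOPL left at 0. The `user_context` layout and the `user_context_init` function are assumptions for illustration, not Hexahedron's actual structures.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RFLAGS_FIXED1    (UINT64_C(1) << 1)  /* reserved, always reads as 1 */
#define RFLAGS_IF        (UINT64_C(1) << 9)  /* interrupt enable flag */
#define USER_RFLAGS_INIT (RFLAGS_FIXED1 | RFLAGS_IF) /* 0x202 */

/* Hypothetical saved user context: 15 general registers (all but RSP),
 * plus the values IRETQ will consume. */
struct user_context {
    uint64_t gpr[15];
    uint64_t rip;
    uint64_t rsp;
    uint64_t rflags;
};

/* Build a fresh context from constants instead of copying kernel state. */
static void user_context_init(struct user_context *ctx,
                              uint64_t entry, uint64_t stack_top)
{
    assert(ctx != NULL);
    memset(ctx, 0, sizeof *ctx);    /* no stale kernel values leak to ring 3 */
    ctx->rip    = entry;
    ctx->rsp    = stack_top;
    ctx->rflags = USER_RFLAGS_INIT; /* interrupts on, everything else clear */
}
```

Starting from a known constant means the first IRETQ into user mode cannot inherit a cleared IF bit from whatever the kernel happened to be doing at the time.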
techdude17
Posts: 22
Joined: Fri Dec 23, 2022 1:06 pm

Re: Interrupts not being enabled when switching task context

Post by techdude17 »

Octocontrabass wrote: Tue Feb 25, 2025 9:14 pm
techdude17 wrote: Tue Feb 25, 2025 3:39 pm
Pretty sure I just forgot to reenable interrupts in my codebase somewhere
If interrupts are already enabled when the user process gets interrupted, they will be re-enabled by the IRETQ instruction the next time the user process runs. You only need to enable interrupts when you first create the user mode context, which you appear to be doing already. (You really should initialize the user RFLAGS to a constant instead of copying the kernel RFLAGS. It's also a good idea to initialize all of the user-accessible registers...)
After a bit more triaging I found that lapic_timer_irq does interrupt the usermode process with RFL set to 0x200282 (IF bit set), but I noticed interesting behavior when I added a log statement after process_yield() and it didn't print. yield should return the next time the process gets picked (after which the lapic irq returns).

The core that got the process did actually get to usermode code somehow, too (according to GDB).

Also, yeah, saw that. It's in a list of tiny notes I keep on me - will make sure to add that after this starts working
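
For reference, the RFL value quoted above, 0x200282, breaks down into ID (bit 21), IF (bit 9), SF (bit 7) and the always-set bit 1. A small sketch of that decoding follows; the enum and function names are chosen here purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum {
    RFLAGS_FIXED1 = 1u << 1,  /* reserved, always 1 */
    RFLAGS_SF     = 1u << 7,  /* sign flag */
    RFLAGS_IF     = 1u << 9,  /* interrupt enable flag */
    RFLAGS_ID     = 1u << 21  /* CPUID-available flag */
};

/* True if the given RFLAGS image has interrupts enabled. */
static bool rflags_interrupts_enabled(uint64_t rflags)
{
    /* A real RFLAGS image always has bit 1 set; anything else is garbage. */
    assert(rflags & RFLAGS_FIXED1);
    return (rflags & RFLAGS_IF) != 0;
}
```

So `rflags_interrupts_enabled(0x200282)` is true, which matches the observation that the interrupted usermode context did have IF set.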
techdude17
Posts: 22
Joined: Fri Dec 23, 2022 1:06 pm

Re: Interrupts not being enabled when switching task context

Post by techdude17 »

Figured out what was wrong with this. It was a race condition caused by scheduler_update() rescheduling the process before process_yield() could save its context. There are still a few bugs left, though, and I'm getting some crashes and full lockups - I will update this with a triage of those.

Another weird thing I noticed is that for some reason the scheduler runs much faster on i386 than on x86_64. The same local APIC configuration is being used, but it's making me worried that x86_64 isn't properly handling interrupts or something.
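
The first race described in this post (a thread being rescheduled before its context has been saved) is often guarded against with an "on CPU" flag that another core must claim before resuming the thread. The sketch below uses C11 atomics and hypothetical names (`struct thread`, `thread_context_saved`, `thread_try_claim`); it is not the fix that was actually applied in Hexahedron, which the post does not show.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical thread state for illustration. */
struct thread {
    atomic_bool on_cpu; /* true while some core still runs on this thread's stack */
};

/* Outgoing side: call only after the context is completely saved and the
 * core has stopped using this thread's stack. */
static void thread_context_saved(struct thread *t)
{
    assert(atomic_load_explicit(&t->on_cpu, memory_order_relaxed));
    /* Release: the saved context is visible before the flag clears. */
    atomic_store_explicit(&t->on_cpu, false, memory_order_release);
}

/* Picking side: a core may resume the thread only if it wins this claim.
 * Returns false if another core still owns the thread. */
static bool thread_try_claim(struct thread *t)
{
    bool expected = false;
    return atomic_compare_exchange_strong_explicit(
        &t->on_cpu, &expected, true,
        memory_order_acquire, memory_order_relaxed);
}
```

A scheduler that finds `thread_try_claim` returning false can either spin briefly or pick a different thread, so a half-saved context is never restored.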
techdude17
Posts: 22
Joined: Fri Dec 23, 2022 1:06 pm

Re: Interrupts not being enabled when switching task context

Post by techdude17 »

I have not been able to triage any of these failures, sadly. The scheduler still seems to have some race condition, but I don't know where it is or how I should flush it out. Everything seems to have locks and things seem to be okay.

UPDATE: It was another race condition: process_yield() added the process to the queue before the current core had switched to the next thread (and switched stacks), so the process could be picked up again while its old stack was still in use.
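
One way to avoid this kind of race is to defer the enqueue until after the stack switch, so the outgoing task only becomes visible to other cores once nothing is running on its stack. The sketch below simulates that ordering in plain C. All names (`struct task`, `struct cpu`, `yield_begin`, `yield_finish`) are hypothetical, the run queue lock is omitted for brevity, and this is not necessarily how the issue was fixed in Hexahedron.

```c
#include <assert.h>
#include <stddef.h>

struct task { struct task *next; };

/* Shared ready queue; a real one would be protected by a lock. */
struct run_queue { struct task *head, *tail; };

/* Per-core scheduler state. */
struct cpu {
    struct task *current;
    struct task *requeue_after_switch; /* outgoing task, not yet published */
};

static void rq_push(struct run_queue *rq, struct task *t)
{
    t->next = NULL;
    if (rq->tail)
        rq->tail->next = t;
    else
        rq->head = t;
    rq->tail = t;
}

/* Step 1, still on the old stack: remember the outgoing task but do not
 * put it on the queue yet. The real stack switch would follow this call. */
static void yield_begin(struct cpu *c, struct task *next)
{
    assert(c->requeue_after_switch == NULL);
    c->requeue_after_switch = c->current;
    c->current = next;
}

/* Step 2, first thing on the new stack: the old stack is no longer in use,
 * so the outgoing task can safely be made runnable for other cores. */
static void yield_finish(struct cpu *c, struct run_queue *rq)
{
    if (c->requeue_after_switch) {
        rq_push(rq, c->requeue_after_switch);
        c->requeue_after_switch = NULL;
    }
}
```

The important property is the ordering: between `yield_begin` and `yield_finish` the outgoing task is on no queue, so no other core can start running it while its stack is still live.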