Including an unused variable corrupts the multiboot kernel

FrankRay78
Posts: 22
Joined: Fri Jan 05, 2024 10:10 am

Post by FrankRay78 »

Hello,

I've encountered a really strange bug in my nascent kernel development. Whilst PatienceOS is a C# bare metal kernel (NB: nothing close to an OS yet), the simplicity of the codebase and its direct compilation to machine code mean that it's not much more than the C barebones tutorial here.

Bootstrap Assembly: src
Linker template: src
Main function: src
Console struct: src
Build script: src

The checked-in code (above) builds and runs fine in QEMU. However, when I add a single line to the console struct (see below), a variable which is declared but never used/referenced, the kernel no longer boots in QEMU. Rather, the screen flashes as if the multiboot has been corrupted somehow.

Code: Select all

private byte foregroundColor = 0x0F;
I'm guessing it's something to do with the packing of the struct (see here) and/or the memory alignment in the linker template, perhaps.

To be honest, I'm a little out of my depth, but I would really appreciate any suggestions as to how I can practically troubleshoot the situation. I'm more interested in learning how to go about understanding how to fix this, rather than seeking a silver bullet.

Frank
Better software requirements can change the world. Better Software UK.
iansjack
Member
Posts: 4683
Joined: Sat Mar 31, 2012 3:07 am
Location: Chichester, UK

Post by iansjack »

I'd suggest that you run the kernel under a debugger. I'm not familiar with Windows debuggers, but gdb can run on Windows and works in cooperation with qemu. Ideally you debug in the high-level language, but I'm not familiar enough with C# to know how you could set that up. But the program is simple enough for you to just debug the assembly code directly.

Here's a link to gdb for Windows: https://rpg.hamsterrepublic.com/ohrrpgce/GDB_on_Windows

and using gdb with qemu: https://qemu-project.gitlab.io/qemu/system/gdb.html

Learning how to use a debugger is a very good discipline for OS development, and this provides an opportunity to gain that knowledge on a simple system.

I could say that all of this would be much easier if you were using C or Rust with a Linux development machine, but I'm guessing you don't want to hear that. ;)
MichaelPetch
Member
Posts: 774
Joined: Fri Aug 26, 2016 1:41 pm
Libera.chat IRC: mpetch

Post by MichaelPetch »

I don't have an appropriate build environment to build this. Would you be able to make available the kernel.elf file that works (prior to the change) and the kernel.elf that doesn't work? You could put them somewhere in your Github repo.

Post by FrankRay78 »

MichaelPetch wrote:I don't have an appropriate build environment to build this. Would you be able to make available the kernel.elf file that works (prior to the change) and the kernel.elf that doesn't work? You could put them somewhere in your Github repo.
Thank you. I have placed both of them here: https://github.com/FrankRay78/PatienceO ... /Debugging

I load them in QEMU with the following command

Code: Select all

qemu-system-i386 -kernel <kernel filename>.elf

Post by FrankRay78 »

Thank you for the advice and links to the debugger, I will seriously look into this more.
iansjack wrote: I could say that all of this would be much easier if you were using C or Rust with a Linux development machine, but I'm guessing you don't want to hear that. ;)
Believe me, I really did try to get the toolchain working end to end on Linux. An explanation of my failed attempts is here: Commentary on the build environment. Something to come back to, in the fullness of time.

Post by MichaelPetch »

I ran QEMU with these options to see what exceptions and interrupts were occurring:

Code: Select all

qemu-system-i386 -kernel kernel-notworking.elf -d int -no-reboot -no-shutdown
I saw this:

Code: Select all

     0: v=06 e=0000 i=0 cpl=0 IP=0008:00201006 pc=00201006 SP=0010:00207fd4 env->regs[R_EAX]=00000000
EAX=00000000 EBX=00009500 ECX=00207ff0 EDX=00010511
ESI=00000000 EDI=00002000 EBP=00207fe8 ESP=00207fd4
EIP=00201006 EFL=00000006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
CS =0008 00000000 ffffffff 00cf9a00 DPL=0 CS32 [-R-]
SS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
DS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
FS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
GS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
GDT=     000cb2b4 00000027
IDT=     00000000 000003ff
CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
DR6=ffff0ff0 DR7=00000400
CCS=00000014 CCD=00207fd4 CCO=SUBL
EFER=0000000000000000
v=06 is exception 0x06 (Invalid opcode). When I look at address 0x00201006 where the exception occurred I see this:

Code: Select all

201006:       0f 57 e4                xorps  %xmm4,%xmm4
This is an SSE instruction. I didn't look at your code, but I suspect the issue is that SSE instructions are not enabled in the processor before this code executes. I guess the options are to build without SSE instructions (I don't know if you can do that with C#) or to enable SSE instruction support. You can find code to do that here: https://wiki.osdev.org/SSE. In the working version of the kernel, SSE instructions aren't being used. The change you made seems to have prompted some optimizations that include using SSE/SIMD.
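
As a cross-check on the disassembly above, the three bytes can also be decoded by hand. The following is a minimal Python sketch that assumes only the standard `0F 57 /r` encoding of XORPS and the usual ModRM layout; the helper name `decode_xorps` is hypothetical and it only handles the register-to-register form.

```python
def decode_xorps(code: bytes) -> str:
    """Decode a register-to-register XORPS (0F 57 /r) into AT&T syntax."""
    if len(code) != 3 or code[0] != 0x0F or code[1] != 0x57:
        raise ValueError("not a 3-byte XORPS encoding")
    modrm = code[2]
    mod = modrm >> 6          # top two bits: addressing mode
    reg = (modrm >> 3) & 0x7  # middle three bits: destination register
    rm = modrm & 0x7          # low three bits: source register when mod == 3
    if mod != 0b11:
        raise ValueError("memory operands are not handled in this sketch")
    # AT&T syntax (as printed by objdump) lists the source first.
    return f"xorps %xmm{rm},%xmm{reg}"


# Bytes copied from the objdump line at address 0x201006.
print(decode_xorps(bytes.fromhex("0f57e4")))
```

XORing a register with itself is a common way for a compiler to zero it, which fits the idea that the extra field changed how the struct gets initialized.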

Because you don't have an IDT set up with proper exception handlers the processor ends up triple faulting and reboots when it encounters the Invalid Opcode.
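
For longer `-d int` logs it can help to pull out the interesting fields automatically. This is a rough Python sketch, assuming the record format shown in the log above; the function name `parse_int_line` and the small vector table are illustrative, not part of QEMU.

```python
import re

# Names for a few x86 exception vectors (standard architectural numbering).
VECTOR_NAMES = {
    0x00: "#DE Divide Error",
    0x06: "#UD Invalid Opcode",
    0x08: "#DF Double Fault",
    0x0D: "#GP General Protection",
    0x0E: "#PF Page Fault",
}


def parse_int_line(line: str) -> dict:
    """Extract the vector, error code and faulting address from one record."""
    match = re.search(r"v=([0-9a-f]+) e=([0-9a-f]+).*?pc=([0-9a-f]+)", line)
    if match is None:
        raise ValueError("line does not look like a '-d int' record")
    vector, error, pc = (int(group, 16) for group in match.groups())
    return {
        "vector": vector,
        "name": VECTOR_NAMES.get(vector, "unknown"),
        "error_code": error,
        "pc": pc,
    }


# First line of the log above, copied verbatim.
line = ("     0: v=06 e=0000 i=0 cpl=0 IP=0008:00201006 pc=00201006 "
        "SP=0010:00207fd4 env->regs[R_EAX]=00000000")
info = parse_int_line(line)
print(f"{info['name']} at {info['pc']:#010x}")
```

The `pc` value is the address to look up in the `objdump` output.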

Note: I didn't connect a debugger to determine what was at address 0x00201006. I dumped the contents of the ELF file with this command:

Code: Select all

objdump -Dx kernel-notworking.elf

Post by FrankRay78 »

Thank you, MichaelPetch, that's incredibly helpful and very much appreciated. It's amazing seeing what you've done step by step.

For a moment there, I thought the solution would be trivial.

Code: Select all

ilc --help
indicates a number of instruction sets can be used:

Code: Select all

x86: base, sse, sse2, sse3, ssse3, sse4.1, sse4.2, avx, avx2, aes, etc


and

Code: Select all

qemu-system-i386 -cpu help | more
indicates the actual CPU can be specified:

Code: Select all

Available CPUs:
x86 486                   (alias configured by machine type)
x86 486-v1
x86 Broadwell             (alias configured by machine type)
x86 Broadwell-IBRS        (alias of Broadwell-v3)
x86 Broadwell-noTSX       (alias of Broadwell-v2)
etc
So... I explicitly enabled sse in the compilation, and also set the CPU to pentium3 (which has sse support)

Code: Select all

ilc --targetos windows --targetarch x86 --instruction-set base,sse --verbose kernel.ilexe -g -o kernel.obj --systemmodule kernel --map kernel.map -O 
...
qemu-system-i386 -cpu pentium3 -kernel kernel.elf
But alas, the issue still remains.

I'll need to look into this further. I suspect it's either the .NET AOT compiler, ilc, not respecting the command line switch, or my native Windows install of QEMU (which they mark as 'experimental').

Massive progress though, and thanks once again.

Post by MichaelPetch »

Your compile process is already emitting SSE instructions (which is what caused the issue when you added the extra member to the structure). That is the problem. You want to be able to turn that off (not on). The issue revolves around the fact that GRUB doesn't guarantee anything about whether the processor's SSE support is enabled when transferring control to your kernel. It is likely not enabled even on processors that support SSE.

If you want to enable SSE in your kernel you have to programmatically turn it on. Adding the appropriate code to loader.asm before your kernel main is called is where that should be done. https://wiki.osdev.org/SSE has code to do that. I haven't tested this (it is based on the Wiki code) but I think the logic is correct:

Code: Select all

_start:
    cli                   ; block interrupts
    mov esp, stack_space  ; set stack pointer

enablesse:
    ; Is SSE supported on this CPU?
    mov eax, 0x1
    cpuid
    test edx, 1<<25
    jnz .sse                   ; If SSE supported enable it.
.nosse:
    ; SSE not supported - do something like print an error and stop
    jmp $

.sse:
    ;now enable SSE and the like
    mov eax, cr0
    and ax, 0xFFFB             ; clear coprocessor emulation CR0.EM
    or ax, 0x2                 ; set coprocessor monitoring  CR0.MP
    mov cr0, eax
    mov eax, cr4
    or ax, 3 << 9              ; set CR4.OSFXSR and CR4.OSXMMEXCPT at the same time
    mov cr4, eax

    ; Call Main
    call __managed__Main

    ; Infinite loop
    hlt
    jmp $
If you choose not to disable SSE from your code generator, you will need your kernel to check for SSE *support* and if there is none do something (print an error) and go into an infinite loop informing the user that you need a CPU with SSE support. If there is SSE support in the processor then you need to enable the SSE instruction set.
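
To make the bit manipulation in the assembly above easier to follow, here is a small Python sketch of the same arithmetic. It only models the register values; the constant and function names are chosen for illustration, and the starting values are the CR0 and CR4 contents from the QEMU register dump earlier in the thread.

```python
CR0_MP = 1 << 1           # monitor coprocessor
CR0_EM = 1 << 2           # x87 emulation; must be clear for SSE to work
CR4_OSFXSR = 1 << 9       # OS supports FXSAVE/FXRSTOR
CR4_OSXMMEXCPT = 1 << 10  # OS handles unmasked SIMD floating-point exceptions


def enable_sse_bits(cr0: int, cr4: int) -> tuple:
    """Return the (CR0, CR4) values the assembly would write back."""
    cr0 = (cr0 & ~CR0_EM) | CR0_MP           # and ax, 0xFFFB / or ax, 0x2
    cr4 = cr4 | CR4_OSFXSR | CR4_OSXMMEXCPT  # or ax, 3 << 9
    return cr0, cr4


# CR0=00000011 and CR4=00000000 come from the register dump in the log.
new_cr0, new_cr4 = enable_sse_bits(0x00000011, 0x00000000)
print(f"CR0={new_cr0:08x} CR4={new_cr4:08x}")
```

If CR4.OSFXSR is left clear, SSE instructions such as `xorps` raise Invalid Opcode even on a CPU that supports them, which matches the `v=06` seen in the log.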
Last edited by MichaelPetch on Mon Apr 08, 2024 4:45 pm, edited 1 time in total.

Post by MichaelPetch »

FrankRay78 wrote:

Code: Select all

x86: base, sse, sse2, sse3, ssse3, sse4.1, sse4.2, avx, avx2, aes, etc
I assume (just a guess) "base" would be code without SSE. If you can change to that then you may find the code works. From what you are saying SSE code generation could be disabled using `--instruction-set base` (notice I removed SSE). If you can't turn off code generation with SSE instructions you'll have to enable SSE at run time with code similar to what I have in my previous post.

I don't believe the problem here is with QEMU. Use QEMU as you were originally invoking it.

Post by FrankRay78 »

Apologies MichaelPetch, it was late and my response was poor.

I did try everything with ilc to prevent the SSE code from being emitted. The instruction-set switch with only ‘base’ didn’t work. I trawled and trawled GitHub issues and could not find a single bit of documentation on whether this was intended or not. It was at that point I decided to see if I could force SSE to be always on, but ran afoul of what I thought was QEMU not behaving.

Today I plan to log an issue with Microsoft regarding the ‘base’ switch, to confirm whether that should be allowing sse optimisations, and in the meantime, enable sse support in my bootstrapper, which you’ve kindly pointed out. Requiring that startup assembly was a gap in my understanding, even though I was reading about what cpus supported which versions of sse.

Update - An issue has been logged with the Microsoft runtime/AOT team, here: ilc.exe is emitting the sse instruction, xorps, with --instruction-set base

Post by FrankRay78 »

The answers given on the above GitHub issue I raised are clear and unambiguous, namely:
"Firstly, win-x86 is unsupported. Secondly, the baseline is SSE2."
and also
"The support for pre-SSE2 hardware was removed several years back and there is no interest in adding it back. We consider at least SSE, SSE2, CMOV, and CPUID as part of our baseline requirements."
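
For reference, three of the baseline features named in that reply are reported in EDX after executing CPUID with EAX=1: CMOV is bit 15, SSE is bit 25 and SSE2 is bit 26. A minimal Python sketch of checking an EDX value against that baseline follows; the function name and the sample value are hypothetical. CPUID itself cannot be checked this way, because its presence is detected through the ID flag in EFLAGS.

```python
# Feature bits in EDX after executing CPUID with EAX=1.
CPUID_EDX_BITS = {
    "CMOV": 15,
    "SSE": 25,
    "SSE2": 26,
}


def missing_baseline_features(edx: int) -> list:
    """List the baseline features whose bit is clear in the given EDX value."""
    return [name for name, bit in CPUID_EDX_BITS.items()
            if not edx & (1 << bit)]


# Hypothetical EDX value: CMOV and SSE set, SSE2 clear.
example_edx = (1 << 15) | (1 << 25)
print(missing_baseline_features(example_edx))
```

A bootstrap that wants to fail cleanly on older hardware could test the SSE2 bit as well as the SSE bit before calling into the compiled code.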

Post by MichaelPetch »

So keep your build as it was before and modify loader.asm with the code I suggested. Hopefully if I haven't screwed anything up that should work. My code changes to loader.asm check if SSE is supported by the CPU. If it isn't supported it just goes into an infinite loop (you could add code to print an error to the display). If SSE is supported then I enable the SSE features. That should allow your kernel code to run even if it uses SSE.

Initializing the x87 FPU to a valid state probably isn't a bad idea either, although that's not currently an issue for you. On some systems, issuing an x87 FPU instruction may also cause an exception if the FPU has not been initialized ahead of time.
Octocontrabass
Member
Posts: 5501
Joined: Mon Mar 25, 2013 7:01 pm

Post by Octocontrabass »

If your compiler always uses SSE2, that means you'll need to save/restore the SSE registers in every kernel entry/exit point instead of only during a context switch. The same applies to any other registers your compiler might use, but most examples you'll see were written with the assumption that the compiler only uses general-purpose registers.
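
One practical detail when saving SSE state: assuming the kernel uses FXSAVE/FXRSTOR for this (an assumption, since the thread does not say which mechanism would be used), the save area is 512 bytes and must be 16-byte aligned. The Python sketch below only illustrates the address arithmetic for carving such an area out below a stack pointer; the function name is hypothetical and the sample ESP is the one from the QEMU register dump earlier in the thread.

```python
FXSAVE_SIZE = 512   # bytes written by FXSAVE
FXSAVE_ALIGN = 16   # required alignment of the save area


def reserve_fxsave_area(stack_pointer: int) -> int:
    """Return the base of an aligned save area located below stack_pointer."""
    # Step down by the area size, then round down to the alignment boundary.
    return (stack_pointer - FXSAVE_SIZE) & ~(FXSAVE_ALIGN - 1)


# ESP=00207fd4 is taken from the register dump in the log.
base = reserve_fxsave_area(0x00207FD4)
print(f"save area at {base:#010x}")
```

An entry stub would do the equivalent adjustment on ESP before FXSAVE and undo it after FXRSTOR.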

Most Linux distros require i686+SSE2 at minimum, but the Linux kernel (usually) doesn't use SSE registers.

Post by iansjack »

What happens if you initialize the variables in the constructor rather than in the structure definition? Perhaps C# is using an inbuilt memory move routine (which often uses SSE instructions) when there are multiple initialized variables in a structure definition.

If that is the case then, IMO, C# isn’t a suitable tool for OS development. It would be interesting to know whether the same problem exists if open-source tools, such as mono, are used.

Post by FrankRay78 »

Dear MichaelPetch, your suggestion worked and I'm very grateful, here's the commit: Enable cpu support for sse in bootstrap. I'm also very inspired to take my OS learning seriously, given how your support has opened my eyes to this truly fascinating subject.

Dear iansjack, I tried the following:

Code: Select all

        private byte foregroundColor;

        public Console(int width, int height, FrameBuffer frameBuffer, byte foregroundColor = 0x0F)
        {
            this.width = width;
            this.height = height;
            this.frameBuffer = frameBuffer;
            this.foregroundColor = foregroundColor;
        }
I also tried it without the default value specified on the constructor. Both still result in the SSE instruction being emitted.

I don't understand enough about how SSE works, nor the memory move comments, and given the 32-bit AOT compiler isn't officially supported yet, I'm not sure what I can deduce from this. I'll read up some more, and probably inspect the generated IL (for the builds with and without the SSE instruction) to see if that sheds any light.