OSDev.org

Posted: **Sun Nov 17, 2024 1:16 pm**

Since my last post on this forum, I've come a long way, yet I'm again getting very frustrated with the IDT.

My kernel is modular, and the default modules are loaded from a boot-time initramfs. They can include a hook, which is called after loading like so:

Code:

if (mod_info->entry != 0) {
  ((void (*)()) ((uint32_t) load_addr + mod_info->entry - CONTENT_OFFSET(mod_info)))();
}

It's somewhat ugly, but it's worked perfectly so far. I've also created a kernel device registry, with the following way of adding new stuff:

Code:

#define REG_ENTRIES 65536

static struct reg_device devices[REG_ENTRIES];

size_t last_dev = 0;

void register_dev (struct reg_device dev) {
  devices[last_dev++] = dev;
}

The registry device has the following form:

Code:

struct reg_device {
  enum proto proto;
  uint32_t ident_lo;
  uint32_t ident_hi;
  uint32_t pad;
};
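For reference, a bounds-checked variant of the registry could look like the sketch below. It is only an illustration: `register_dev_checked`, `registry_footprint` and the 0/-1 return convention are hypothetical names and choices, and the `enum proto` definition is a stand-in, since only `PROTO_PCI_OLD` appears in this thread.

```c
#include <stddef.h>
#include <stdint.h>

#define REG_ENTRIES 65536

/* Stand-in enum (assumption): the real definition is not shown in the thread. */
enum proto { PROTO_PCI_OLD = 1 };

struct reg_device {
  enum proto proto;
  uint32_t ident_lo;
  uint32_t ident_hi;
  uint32_t pad;
};

struct reg_device devices[REG_ENTRIES];
size_t last_dev = 0;

/* Returns 0 on success, -1 when the table is full (hypothetical convention). */
int register_dev_checked(struct reg_device dev) {
  if (last_dev >= REG_ENTRIES)
    return -1;
  devices[last_dev++] = dev;
  return 0;
}

/* Number of bytes of static storage the table occupies. */
size_t registry_footprint(void) {
  return sizeof devices;
}
```

With a 4-byte enum (GCC's default), each entry is 16 bytes, so the table takes 1 MiB of static storage. That is worth keeping in mind when laying out the kernel image, stack and IDT.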

For simplicity, I'm starting off with a PCI driver, since there's not much I could do without one anyway. The hook uses the standard brute-force scan of all 8192 bus-device combinations (256 buses × 32 devices). It registers each valid device it encounters with the main kernel by means of an interrupt via

Code:

__asm__ __volatile__ ( "int $0xDE" : : "a"(ident_hi), "c"(PROTO_PCI_OLD), "d"(ident_lo) : );

with the following ISR:

Code:

void isr_222 () {
  uint32_t eax;
  uint32_t ecx;
  uint32_t edx;
  struct reg_device dev;
  
  __asm__ ( "movl %%eax, %0\n\t"
	    "movl %%ecx, %1\n\t"
	    "movl %%edx, %2" : "=m"(eax), "=m"(ecx), "=m"(edx) : : );

  dev.proto = (enum proto) ecx;
  dev.ident_hi = eax;
  dev.ident_lo = edx;
  dev.pad = 0;

  register_dev(dev);
}

However, when I run my driver, it finds two devices, then causes a general protection fault (which in turn causes a double, then a triple, fault, then a full machine reboot). What's bamboozling me is that none of the following criteria listed on the wiki seem to apply:

- No segment-related errors, since the interrupt works the first time it's called and we don't change CS, SS or the GDT.
- We're in ring 0.
- We don't change CR0.
- We're in 32-bit mode, without paging yet.
- The reg_device structure should be 32-bit aligned.

It's also well-known that a GPF can occur from a nonexistent vector, which was my first suspicion but isn't the case here either.

Here's the evidence that it's a GPF, in the form of QEMU interrupt logs (minus SMIs):

Code:

     0: v=de e=0000 i=1 cpl=0 IP=0008:002103b5 pc=002103b5 SP=0010:00304f50 env->regs[R_EAX]=12378086
EAX=12378086 EBX=002114d4 ECX=00000001 EDX=06000000
ESI=00000000 EDI=00210469 EBP=00304f98 ESP=00304f50
EIP=002103b5 EFL=00000006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
[...]
     1: v=de e=0000 i=1 cpl=0 IP=0008:002103b5 pc=002103b5 SP=0010:00304f50 env->regs[R_EAX]=70008086
EAX=70008086 EBX=002114d4 ECX=00000001 EDX=06010008
ESI=00000000 EDI=00210469 EBP=00304f98 ESP=00304f50
EIP=002103b5 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
[...]
     2: v=0d e=06f2 i=0 cpl=0 IP=0008:002103b5 pc=002103b5 SP=0010:00304f50 env->regs[R_EAX]=70008086
EAX=70008086 EBX=002114d4 ECX=00000001 EDX=06010008
ESI=00000000 EDI=00210469 EBP=00304f98 ESP=00304f50
EIP=002103b5 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
[...]

0d is a GPF. de is the interrupt vector used by the modules. I also confirmed that it is going straight from int 0xDE to reset by single-stepping in GDB, where we're taken to the destination jumped to by the reset vector, 0x0000e05b (which seems to consist of a bunch of add %al, (%eax)'s).

Can anyone help figure out why this is happening?

Posted: **Sun Nov 17, 2024 1:53 pm**

restingwitchface wrote: ↑Sun Nov 17, 2024 1:16 pm
> It registers the valid devices it encounters with the main kernel by means of an interrupt

You could use an ordinary function call if you give your modules some way to find the functions they'd need to call.

restingwitchface wrote: ↑Sun Nov 17, 2024 1:16 pm

Code:

void isr_222 () {
  uint32_t eax;
  uint32_t ecx;
  uint32_t edx;
  struct reg_device dev;
  
  __asm__ ( "movl %%eax, %0\n\t"
	    "movl %%ecx, %1\n\t"
	    "movl %%edx, %2" : "=m"(eax), "=m"(ecx), "=m"(edx) : : );

Inline assembly doesn't work like that. If you want to pass arguments to a function, you need to follow the ABI.

Normally you'd do it by passing (a pointer to) a struct containing the saved CPU context and let the ISRs decide how to interpret it.
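A minimal sketch of that approach, assuming an assembly stub (not shown) that saves the general-purpose registers with `pusha` and passes the address of the saved block as the handler's only argument. `struct cpu_context` and `isr_222_handler` are hypothetical names, and `enum proto` and `register_dev` are stand-ins so the example is self-contained.

```c
#include <stdint.h>

/* Hypothetical saved-register block. The field order matches what `pusha`
   leaves on the stack, lowest address first; it must agree with the stub. */
struct cpu_context {
  uint32_t edi, esi, ebp, esp, ebx, edx, ecx, eax;
};

/* Stand-ins for the kernel's real definitions (assumptions). */
enum proto { PROTO_PCI_OLD = 1 };

struct reg_device {
  enum proto proto;
  uint32_t ident_lo;
  uint32_t ident_hi;
  uint32_t pad;
};

struct reg_device last_registered;

void register_dev(struct reg_device dev) {
  last_registered = dev;
}

/* Called by the stub with an ordinary C argument, so the handler reads the
   caller's EAX/ECX/EDX from memory instead of from live registers. */
void isr_222_handler(const struct cpu_context *ctx) {
  struct reg_device dev;

  dev.proto = (enum proto) ctx->ecx;
  dev.ident_hi = ctx->eax;
  dev.ident_lo = ctx->edx;
  dev.pad = 0;

  register_dev(dev);
}
```

Because the values come from the saved context, it no longer matters what the compiler does with EAX, ECX and EDX in the function prologue.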

restingwitchface wrote: ↑Sun Nov 17, 2024 1:16 pm
> However, when I run my driver, it finds two devices, then causes a general protection fault

The general protection fault occurs when it attempts to call the interrupt for the second device, and the error code indicates a problem with the descriptor in the IDT. The fact that it was successful the first time confirms that you did have a valid IDT in the beginning. You probably have a memory corruption bug. Maybe a bad pointer somewhere? Maybe your stack grew too large?
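To spell out how the error code is read: a #GP error code that refers to a descriptor uses the selector error code layout, with bit 0 as EXT (external event), bit 1 as IDT, bit 2 as TI (GDT/LDT, only meaningful when IDT is clear) and bits 3 to 15 as the index. The sketch below decodes that layout; the struct and function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded form of a selector error code (hypothetical helper type). */
struct selector_error {
  bool external;   /* bit 0: fault caused by an event external to the program */
  bool idt;        /* bit 1: index refers to a gate in the IDT */
  bool ldt;        /* bit 2: TI, LDT rather than GDT; only valid if !idt */
  uint16_t index;  /* bits 3..15: descriptor or vector index */
};

struct selector_error decode_selector_error(uint32_t code) {
  struct selector_error e;

  e.external = (code & 0x1u) != 0;
  e.idt = (code & 0x2u) != 0;
  e.ldt = (code & 0x4u) != 0;
  e.index = (uint16_t) ((code >> 3) & 0x1FFFu);

  return e;
}
```

For the logged value `e=06f2`, the IDT bit is set and the index is 0xDE, so the fault points at the gate for vector 0xDE itself.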

restingwitchface wrote: ↑Sun Nov 17, 2024 1:16 pm
> (which seems to consist of a bunch of add %al, (%eax)'s)

It only looks like that because GDB doesn't handle segmentation correctly.

Posted: **Sun Nov 17, 2024 2:08 pm**

> You could use an ordinary function call
Sure, but this seems easier right now.

> If you want to pass arguments to a function, you need to follow the ABI.
Sorry, but how is this relevant? I'm deciding to pass arguments to the ISR by means of EAX, ECX and EDX, rather than on the stack (which the i686 SysV calling convention dictates). It's not like I'm accidentally pushing them on the stack and then reading junk into EAX, ECX and EDX.

> You probably have a memory corruption bug. Maybe a bad pointer somewhere? Maybe your stack grew too large?
I guessed something like that. I'd expect a stack overflow to cause #PF, and ESP is equal in the two interrupt logs. The bad pointer seems plausible, but I'm not sure where it would be. What I neglected to mention in the original post is that no weird behaviour (string corruption, assertions failing, etc.) occurs without calling the interrupt 0xDE (to be absolutely sure, I'd have to use UBSan or something):

Code:

qemu-system-i386 -kernel pgos_kern.img -initrd pgos_initramfs.cpio -serial stdio
Successfully located initramfs.
.
Empty file or directory.
./pci.pml
Found PCI device: device ID 0x1237, vendor ID 0x8086, class code 0x06
Found PCI device: device ID 0x7000, vendor ID 0x8086, class code 0x06
Found PCI device: device ID 0x1111, vendor ID 0x1234, class code 0x03
Found PCI device: device ID 0x100e, vendor ID 0x8086, class code 0x02
Successfully loaded.
./vga.pml
Successfully loaded.

> It only looks like that because GDB doesn't handle segmentation correctly.
Alright.

Posted: **Sun Nov 17, 2024 2:31 pm**

restingwitchface wrote: ↑Sun Nov 17, 2024 2:08 pm
> Sorry, but how is this relevant? I'm deciding to pass arguments to the ISR by means of EAX, ECX and EDX, rather than on the stack (which the i686 SysV calling convention dictates). It's not like I'm accidentally pushing them on the stack and then reading junk into EAX, ECX and EDX.

The compiler is free to insert code that does whatever it wants with EAX, ECX, and EDX before your inline assembly, so the result may still be junk! You really do need to follow the C ABI to pass arguments to a function written in C. Differences between the C ABI and the interrupt ABI can be handled by the assembly stub that wraps every ISR.

Although you can choose a different C ABI if you don't like System V. (And you can override it on individual functions, if you really need that level of control.)

restingwitchface wrote: ↑Sun Nov 17, 2024 2:08 pm
> I'd expect a stack overflow to cause #PF,

How? You can't have page faults without paging.

Posted: **Sun Nov 17, 2024 2:48 pm**

> The compiler is free to insert code that does whatever it wants with EAX, ECX, and EDX before your inline assembly, so the result may still be junk!

I was about to say that stack frame setup shouldn't touch those registers... yet, while testing this out, it magically works when I make my local variables in isr_222 global! Any idea why this could be? I was partially right, as the stack frame setup with local variables only touched ESP and EBP.

> You can't have page faults without paging.
You're right; I had recently read a paragraph describing kernel stack overflows, but that was in the context of their bootloader already setting up some paging and a stack.

Posted: **Sun Nov 17, 2024 3:05 pm**

restingwitchface wrote: ↑Sun Nov 17, 2024 2:48 pm
> it magically works when I make my local variables in isr_222 global! Any idea why this could be?

Without tracking down exactly what's overwriting your IDT, it's hard to say. Maybe it decreased stack usage or increased the space between the stack and the IDT enough to prevent the descriptor from getting overwritten?
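One way to narrow down when the descriptor changes is to snapshot the gate while it is known to be good and compare it again at suspicious points, for example before and after each `register_dev` call. The sketch below assumes the IDT is kept as an array of 8-byte protected-mode gates; `struct idt_gate`, `idt_gate_snapshot` and `idt_gate_changed` are hypothetical names, and a freestanding kernel would need to supply its own `memcmp`.

```c
#include <stdint.h>
#include <string.h>

/* 32-bit protected-mode gate descriptor (8 bytes, no padding). */
struct idt_gate {
  uint16_t offset_lo;  /* handler address, bits 0..15 */
  uint16_t selector;   /* code segment selector */
  uint8_t zero;        /* reserved, must be 0 */
  uint8_t type_attr;   /* e.g. 0x8E: present, DPL 0, 32-bit interrupt gate */
  uint16_t offset_hi;  /* handler address, bits 16..31 */
};

/* Copy of the gate taken while it was known to be valid. */
struct idt_gate saved_gate;

void idt_gate_snapshot(const struct idt_gate *idt, unsigned vector) {
  saved_gate = idt[vector];
}

/* Returns nonzero if the gate no longer matches the snapshot. */
int idt_gate_changed(const struct idt_gate *idt, unsigned vector) {
  return memcmp(&saved_gate, &idt[vector], sizeof saved_gate) != 0;
}
```

Calling `idt_gate_changed(idt, 0xDE)` after each step of the interrupt path and printing to the serial port at the first mismatch would show which write clobbers the entry.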

restingwitchface wrote: ↑Sun Nov 17, 2024 2:48 pm
> I was partially right, as the stack frame setup with local variables only touched ESP and EBP.

Sure, but you can't rely on that. It will break in the future if you don't change it now.

IDT troubles... part 2
