Kernel having weird UB issues, I don't understand what's going on

Posted: Mon Dec 23, 2024 7:13 pm
by RayanMargham
Sometimes it takes a 0xD (#GP), sometimes a 0xE (#PF) at memcpy, and sometimes it takes 0xE endlessly in the scheduler. I don't know what to do.
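For reference, 0xD and 0xE are x86 exception vector numbers: 0xD is a general protection fault (#GP) and 0xE is a page fault (#PF). A minimal sketch of a vector-to-name table an exception handler could print next to the raw number; `exception_name` is a hypothetical helper, not part of NyauxKC, and the table only covers vectors 0x00-0x15:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* x86 exception vectors 0x00-0x15 as listed in the Intel SDM. */
static const char *const exception_names[] = {
    "#DE Divide Error",             /* 0x00 */
    "#DB Debug",                    /* 0x01 */
    "NMI Non-Maskable Interrupt",   /* 0x02 */
    "#BP Breakpoint",               /* 0x03 */
    "#OF Overflow",                 /* 0x04 */
    "#BR BOUND Range Exceeded",     /* 0x05 */
    "#UD Invalid Opcode",           /* 0x06 */
    "#NM Device Not Available",     /* 0x07 */
    "#DF Double Fault",             /* 0x08 */
    "Coprocessor Segment Overrun",  /* 0x09 */
    "#TS Invalid TSS",              /* 0x0A */
    "#NP Segment Not Present",      /* 0x0B */
    "#SS Stack-Segment Fault",      /* 0x0C */
    "#GP General Protection",       /* 0x0D */
    "#PF Page Fault",               /* 0x0E */
    "Reserved",                     /* 0x0F */
    "#MF x87 Floating-Point Error", /* 0x10 */
    "#AC Alignment Check",          /* 0x11 */
    "#MC Machine Check",            /* 0x12 */
    "#XM SIMD Floating-Point",      /* 0x13 */
    "#VE Virtualization",           /* 0x14 */
    "#CP Control Protection",       /* 0x15 */
};

_Static_assert(sizeof exception_names / sizeof exception_names[0] == 0x16,
               "one entry per vector 0x00-0x15");

/* Returns a printable name for an exception vector. */
const char *exception_name(uint64_t vector)
{
    if (vector < sizeof exception_names / sizeof exception_names[0])
        return exception_names[vector];
    return "Unknown or reserved vector";
}
```

A handler can then log the name alongside the vector number, which makes logs like the ones in this thread quicker to read.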


Try it yourself, I don't understand what's happening:
https://github.com/rayanmargham/NyauxKC

Posted: Mon Dec 23, 2024 7:19 pm
by Octocontrabass
How do you know it's undefined behavior if you haven't found the code responsible for the exception?

Anyway, you can start by sharing more information about the CPU state when the first exception occurs. Perhaps the output from QEMU's "-d int" log?

Posted: Mon Dec 23, 2024 7:26 pm
by RayanMargham
It's UB because a different exception happens on every QEMU boot.

Here are pastebins of some of the exceptions that can happen on a given boot:
https://pastebin.com/JXErQRpb
https://pastebin.com/s4LAvey6

The exceptions are completely random and make no sense at ALL.

Posted: Mon Dec 23, 2024 7:26 pm
by RayanMargham
I've been debugging this for 6 hours.

I can tell you it's very much UB. I stared at the disassembly for so long and nothing makes sense.

Posted: Mon Dec 23, 2024 7:32 pm
by RayanMargham
This issue most likely won't be solved for weeks, as there is a really hard-to-track bug somewhere in the code that's causing this.

Posted: Mon Dec 23, 2024 7:52 pm
by Octocontrabass
What happens if two CPUs call kmalloc() at exactly the same time?

Posted: Mon Dec 23, 2024 7:54 pm
by RayanMargham
I don't have a lock, so I don't know.
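To illustrate why the missing lock matters, here is a minimal sketch that replays one bad interleaving of an unlocked freelist pop deterministically. It assumes a simple singly linked freelist; `struct free_block` and `unlocked_pop_race` are hypothetical and not NyauxKC's actual slab allocator. Both "CPUs" read the head before either one unlinks it, so both are handed the same block:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical freelist node; NyauxKC's slab layout may differ. */
struct free_block {
    struct free_block *next;
};

/* Step 1 of an unlocked pop: read the current head. */
static struct free_block *pop_read(struct free_block **head)
{
    return *head;
}

/* Step 2 of an unlocked pop: unlink the block that was read in step 1. */
static void pop_commit(struct free_block **head, struct free_block *seen)
{
    assert(seen != NULL);
    *head = seen->next;
}

/* Replays one bad interleaving of two CPUs popping from the same freelist
 * without a lock. Returns true if both CPUs were handed the same block. */
bool unlocked_pop_race(struct free_block blocks[3],
                       struct free_block **cpu_a, struct free_block **cpu_b)
{
    blocks[0].next = &blocks[1];
    blocks[1].next = &blocks[2];
    blocks[2].next = NULL;
    struct free_block *head = &blocks[0];

    *cpu_a = pop_read(&head);  /* CPU A reads head == block 0 */
    *cpu_b = pop_read(&head);  /* CPU B reads head == block 0 before A commits */
    pop_commit(&head, *cpu_a); /* A sets head = block 1 */
    pop_commit(&head, *cpu_b); /* B also sets head = block 1 */
    return *cpu_a == *cpu_b;
}
```

If something like this happens inside kmalloc(), both callers own the same memory and can overwrite each other's data (for example through the memset that zeroes new allocations), which could plausibly produce faults that differ from boot to boot.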

Posted: Mon Dec 23, 2024 7:56 pm
by RayanMargham

Code:

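spinlock_t mem_lock; // serialises every kmalloc()/kfree() call across CPUs

void* kmalloc(uint64_t amount)
{
	spinlock_lock(&mem_lock);
	if (amount > 1024)
	{
		// large allocations come straight from the kernel VMM
		void* him = kvmm_region_alloc(amount, PRESENT | RWALLOWED);
		if (him)
			memset(him, 0, amount);
		spinlock_unlock(&mem_lock);
		return him;
	}
	else
	{
#ifdef __SANITIZE_ADDRESS__
		// allocate a 256-byte redzone after the object and fill it with 0xFD
		void* him = slaballocate(amount + 256);
		if (him)
		{
			memset(him, 0, amount); // zero the object, like the non-ASAN path does
			memset((char*)him + amount, 0xFD, 256);
		}
		spinlock_unlock(&mem_lock);
		return him;
#else
		void* him = slaballocate(amount);
		if (him)
			memset(him, 0, amount);
		spinlock_unlock(&mem_lock);
		return him;
#endif
	}
}

void kfree(void* addr, uint64_t size)
{
	spinlock_lock(&mem_lock);
	// a size with bit 63 set is almost certainly a corrupted (e.g. negative) value
	if (size >> 63)
	{
		kprintf("kfree: memory corruption detected\n");
		spinlock_unlock(&mem_lock);
		__builtin_trap();
	}
	if (size > 1024)
		kvmm_region_dealloc(addr); // large blocks came from the VMM
	else
		slabfree(addr);            // small blocks came from the slab allocator
	spinlock_unlock(&mem_lock);
}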
I added a lock like this, but I'm still having UB issues; nothing really changed.
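The post does not show how `spinlock_t` is implemented. As a hedged sketch of what such a lock commonly looks like, here is a test-and-set spinlock built on C11 atomics. `simple_spinlock` and the helper names are hypothetical, and the pthread harness only stands in for multiple CPUs in user space:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical test-and-set spinlock (not NyauxKC's spinlock_t). */
typedef struct {
    atomic_flag locked;
} simple_spinlock;

#define SIMPLE_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

static void simple_spin_lock(simple_spinlock *l)
{
    /* test_and_set returns the previous value: keep spinning while another
     * thread (CPU) already holds the lock. Acquire ordering keeps the
     * critical section's accesses from moving above this point. */
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        ; /* a kernel would typically execute the `pause` instruction here */
}

static void simple_spin_unlock(simple_spinlock *l)
{
    /* Release ordering publishes the critical section's writes. */
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* User-space harness: each thread stands in for a CPU using shared data. */
struct hammer_arg {
    simple_spinlock *lock;
    long *counter;
    int iterations;
};

static void *hammer(void *p)
{
    struct hammer_arg *arg = p;
    for (int i = 0; i < arg->iterations; i++) {
        simple_spin_lock(arg->lock);
        (*arg->counter)++; /* plain, non-atomic increment protected by the lock */
        simple_spin_unlock(arg->lock);
    }
    return NULL;
}

/* Runs `nthreads` (at most 16) threads doing `iterations` locked increments
 * each and returns the final counter value. */
long locked_increments(int nthreads, int iterations)
{
    simple_spinlock lock = SIMPLE_SPINLOCK_INIT;
    long counter = 0;
    struct hammer_arg arg = { &lock, &counter, iterations };
    pthread_t threads[16];

    assert(nthreads > 0 && nthreads <= 16);
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&threads[i], NULL, hammer, &arg);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(threads[i], NULL);
    return counter;
}
```

With the lock, no increment is lost. In a kernel, a lock that can also be taken from an interrupt handler generally has to be held with interrupts disabled on that CPU; otherwise the handler can spin forever on a lock its own CPU already holds.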

Posted: Mon Dec 23, 2024 7:59 pm
by RayanMargham
No more UB issues: adding a lock has made the behaviour consistent!

Code:

arch_late_init(): CPU 9 is Online!
arch_late_init(): CPU 12 is Online!UBSAN: type_mismatch @ src/sched/sched.c:95:11 (member access within NULL pointer of type 'struct per_cpu_data')

arch_late_init(): CPU 1 is Online!UBSAN: type_mismatch @ src/sched/sched.c:95:11 (member access within NULL pointer of type 'struct per_cpu_data')

arch_late_init(): CPU 2 is Online!UBSAN: type_mismatch @ src/sched/sched.c:95:11 (member access within NULL pointer of type 'struct per_cpu_data')

arch_late_init(): CPU 5 is Online!UBSAN: type_mismatch @ src/sched/sched.c:95:11 (member access within NULL pointer of type 'struct per_cpu_data')

arch_late_init(): CPU 4 is Online!UBSAN: type_mismatch @ src/sched/sched.c:95:11 (member access within NULL pointer of type 'struct per_cpu_data')

Page Fault! CR2 0xaUBSAN: type_mismatch @ src/sched/sched.c:95:11 (member access within NULL pointer of type 'struct per_cpu_data')

arch_late_init(): CPU 17 is Online!
arch_late_init(): CPU 8 is Online!UBSAN: type_mismatch @ src/sched/sched.c:95:11 (member access within NULL pointer of type 'struct per_cpu_data')

arch_late_init(): CPU 16 is Online!UBSAN: type_mismatch @ src/sched/sched.c:95:11 (member access within NULL pointer of type 'struct per_cpu_data')

RIP is 0xffffffff800318ef. Error Code 0x0UBSAN: type_mismatch @ src/sched/sched.c:95:11 (member access within NULL pointer of type 'struct per_cpu_data')

Page Fault! CR2 0xa
RIP is 0xffffffff800318ef. Error Code 0x0
arch_late_init(): CPU 11 is Online!
Page Fault! CR2 0xaUBSAN: type_mismatch @ src/sched/sched.c:95:11 (member access within NULL pointer of type 'struct per_cpu_data')

arch_late_init(): CPU 3 is Online!
-> Function: schedd() -- 0xffffffff80031848UBSAN: type_mismatch @ src/sched/sched.c:95:11 (member access within NULL pointer of type 'struct per_cpu_data')

Page Fault! CR2 0xa
RIP is 0xffffffff800318ef. Error Code 0x0
arch_late_init(): CPU 18 is Online!
arch_late_init(): CPU 14 is Online!UBSAN: type_mismatch @ src/sched/sched.c:95:11 (member access within NULL pointer of type 'struct per_cpu_data')

-> Function: schedd() -- 0xffffffff80031848UBSAN: type_mismatch @ src/sched/sched.c:95:11 (member access within NULL pointer of type 'struct per_cpu_data')

Page Fault! CR2 0xa
RIP is 0xffffffff800318ef. Error Code 0x0
Page Fault! CR2 0xa
RIP is 0xffffffff800318ef. Error Code 0x0
Page Fault! CR2 0xa
RIP is 0xffffffff800318ef. Error Code 0x0
Page Fault! CR2 0xa
RIP is 0xffffffff800318ef. Error Code 0x0
Page Fault! CR2 0xa
RIP is 0xffffffff800318ef. Error Code 0x0
-> Function: schedd() -- 0xffffffff80031848
Page Fault! CR2 0xa
RIP is 0xffffffff800318ef. Error Code 0x0
-> Function: schedd() -- 0xffffffff80031848
Page Fault! CR2 0xa
RIP is 0xffffffff800318ef. Error Code 0x0
arch_late_init(): CPU 10 is Online!
-> Function: sched() -- 0xffffffff80002ed8UBSAN: type_mismatch @ src/sched/sched.c:95:11 (member access within NULL pointer of type 'struct per_cpu_data')
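The repeated "Page Fault! CR2 0xa ... Error Code 0x0" lines can be decoded: CR2 holds the faulting linear address, and the low bits of the #PF error code describe the access. A minimal decoder sketch; `describe_page_fault` and `looks_like_null_deref` are hypothetical helpers, and the NULL heuristic assumes the kernel leaves the page at address 0 unmapped:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Low bits of the x86 page-fault (#PF, vector 0xE) error code. */
#define PF_PRESENT (1u << 0) /* 0: page not present, 1: protection violation */
#define PF_WRITE   (1u << 1) /* 0: read, 1: write */
#define PF_USER    (1u << 2) /* 0: supervisor mode, 1: user mode */
#define PF_RSVD    (1u << 3) /* a reserved bit was set in a paging entry */
#define PF_FETCH   (1u << 4) /* the access was an instruction fetch */

/* Formats a one-line description of a page fault into buf (snprintf semantics). */
int describe_page_fault(char *buf, size_t len, uint64_t cr2, uint64_t error_code)
{
    const char *mode = (error_code & PF_USER) ? "user" : "supervisor";
    const char *access = (error_code & PF_FETCH) ? "instruction fetch"
                       : (error_code & PF_WRITE) ? "write" : "read";
    const char *cause = (error_code & PF_PRESENT) ? "protection violation"
                                                  : "page not present";
    const char *rsvd = (error_code & PF_RSVD) ? ", reserved bit set" : "";

    return snprintf(buf, len, "CR2=%#llx: %s %s, %s%s",
                    (unsigned long long)cr2, mode, access, cause, rsvd);
}

/* Heuristic: assumes the page at address 0 is never mapped, so a fault inside
 * it is most likely a NULL pointer plus a small field offset. */
bool looks_like_null_deref(uint64_t cr2)
{
    return cr2 < 0x1000;
}
```

For this log it reports a supervisor-mode read of a not-present page at 0xa, i.e. ten bytes past address zero, which fits the UBSAN report of a member access through a NULL `struct per_cpu_data` pointer.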

Posted: Mon Dec 23, 2024 8:00 pm
by Octocontrabass
Are there any other places where two CPUs might access the same data structure at the same time?

Posted: Mon Dec 23, 2024 8:01 pm
by RayanMargham
I'm unsure; they usually wouldn't.

But I don't know what is causing this, as it is NOT null??

Posted: Mon Dec 23, 2024 8:08 pm
by Octocontrabass
Clearly it is null, otherwise UBSAN wouldn't be complaining. So when does it become null?
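One way to find out when the pointer becomes NULL is to route every per-CPU access through a checked accessor that reports its call site the first time the pointer is NULL, rather than faulting later at a small CR2. A minimal user-space sketch; the struct layout, the self-pointer convention (which some kernels use), and all names are hypothetical, and a global variable stands in for the GS base that a real x86-64 kernel would read through %gs:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical per-CPU block: the first field points back at the block
 * itself, so a corrupted or stale base can be detected cheaply. */
struct per_cpu_data {
    struct per_cpu_data *self;
    uint32_t cpu_id;
};

/* Stand-in for the GS base; a real kernel would read it via %gs. */
static struct per_cpu_data *simulated_gs_base;

static void set_this_cpu(struct per_cpu_data *p)
{
    simulated_gs_base = p;
}

/* Returns the current CPU's data, or reports the caller's file:line and stops
 * the first time the pointer is NULL or fails the self-pointer check. */
static struct per_cpu_data *this_cpu_checked(const char *file, int line)
{
    struct per_cpu_data *p = simulated_gs_base;
    if (p == NULL || p->self != p) {
        fprintf(stderr, "%s:%d: per-CPU pointer is %s\n", file, line,
                p == NULL ? "NULL" : "corrupt (self-pointer mismatch)");
        abort(); /* a kernel would panic or __builtin_trap() here */
    }
    return p;
}

#define this_cpu() this_cpu_checked(__FILE__, __LINE__)
```

Calling `this_cpu()` at each stage of AP bring-up and at the scheduler's entry point would narrow down the first place where the per-CPU pointer stops being valid.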

Posted: Mon Dec 23, 2024 8:14 pm
by RayanMargham
It shouldn't be, though, because in arch_late_init I create it and put it in the GS segment register.

Posted: Mon Dec 23, 2024 10:16 pm
by RayanMargham
It turned out to be a far return bug in the asm code :D
Solved now!!