Kernel having weird UB issues, I don't understand what's going on

RayanMargham
Member
Posts: 59
Joined: Tue Jul 05, 2022 12:37 pm

Post by RayanMargham »

Sometimes it raises 0xD (general protection fault), sometimes 0xE (page fault) at memcpy, and sometimes it faults endlessly in the scheduler. I don't know what to do.


Try it yourself; I don't understand what's happening:
https://github.com/rayanmargham/NyauxKC
Octocontrabass
Member
Posts: 5568
Joined: Mon Mar 25, 2013 7:01 pm

Post by Octocontrabass »

How do you know it's undefined behavior if you haven't found the code responsible for the exception?

Anyway, you can start by sharing more information about the CPU state when the first exception occurs. Perhaps the output from QEMU's "-d int" log?
Post by RayanMargham »

It's UB because different exceptions happen on every QEMU boot.

Here are pastebins of some of the exceptions that can occur on a given boot:
https://pastebin.com/JXErQRpb
https://pastebin.com/s4LAvey6

The exceptions are very random and make no sense at ALL.
Post by RayanMargham »

I've been debugging this for 6 hours.

I can tell you it's very much UB. I stared at the disassembly for so long and nothing makes sense.
Post by RayanMargham »

This is an issue that most likely won't be solved for weeks, as there is some really hard-to-track bug somewhere in the code that's causing this.
Post by Octocontrabass »

What happens if two CPUs call kmalloc() at exactly the same time?
Post by RayanMargham »

I don't have a lock, so I don't know.
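
The question above points at a data race: if two CPUs enter an unlocked allocator at the same time, both can take the same free slab entry or leave the free list half-updated, which would fit exceptions that differ from boot to boot. A minimal sketch of a test-and-set spinlock that serialises such a critical section, assuming C11 atomics are usable in the kernel build; `demo_spinlock_t` and the `demo_*` functions are hypothetical names, not NyauxKC's actual `spinlock_t` implementation:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration; not the kernel's own spinlock_t. */
typedef struct {
    atomic_flag held;
} demo_spinlock_t;

#define DEMO_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try to take the lock once. Returns true if this caller now owns it. */
static inline bool demo_trylock(demo_spinlock_t *l)
{
    /* test_and_set returns the previous value: false means it was free. */
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Spin until the lock is acquired. */
static inline void demo_lock(demo_spinlock_t *l)
{
    while (!demo_trylock(l)) {
        /* Busy-wait; a real kernel might add a CPU pause hint here. */
    }
}

/* Release the lock so another CPU can enter the critical section. */
static inline void demo_unlock(demo_spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

A caller would wrap the allocator body in `demo_lock(&l)` and `demo_unlock(&l)`. If the same lock can also be taken from an interrupt handler, interrupts would additionally need to be disabled while it is held, otherwise a CPU can deadlock against itself.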
Post by RayanMargham »

Code:

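/* Single global lock serialising every allocator entry point. */
spinlock_t mem_lock;

/* Allocate `amount` bytes; requests above 1024 bytes bypass the slab allocator. */
void* kmalloc(uint64_t amount)
{
	spinlock_lock(&mem_lock);
	if (amount > 1024)
	{
		void* him = kvmm_region_alloc(amount, PRESENT | RWALLOWED);
		memset(him, 0, amount);
		spinlock_unlock(&mem_lock);
		return him;
	}
	else
	{
#ifdef __SANITIZE_ADDRESS__
		/* Pad the allocation with a 256-byte 0xFD red zone after the payload. */
		void* him = slaballocate(amount + 256);
		/* Cast first: arithmetic on void* is a GNU extension, not standard C. */
		memset((uint8_t*)him + amount, 0xFD, 256);
		spinlock_unlock(&mem_lock);
		return him;

#else
		void* him = slaballocate(amount);
		memset(him, 0, amount);
		spinlock_unlock(&mem_lock);
		return him;
#endif
	}
}

/* Free `addr`; `size` must match the size originally passed to kmalloc(). */
void kfree(void* addr, uint64_t size)
{
	spinlock_lock(&mem_lock);
	/* A size with the top bit set is treated as a sign of corruption. */
	if (size >> 63)
	{
		kprintf("kfree: memory corruption detected\n");
		spinlock_unlock(&mem_lock);
		__builtin_trap();
	}
	if (size > 1024)
	{
		kvmm_region_dealloc(addr);
		spinlock_unlock(&mem_lock);
	}
	else
	{
		slabfree(addr);
		spinlock_unlock(&mem_lock);
	}
}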
I added a lock like this.
Still having UB issues; nothing really changed.
Post by RayanMargham »

Not UB issues anymore; adding a lock has made the behaviour consistent!

Code:

arch_late_init(): CPU 9 is Online!
arch_late_init(): CPU 12 is Online!UBSAN: type_mismatch @ src/sched/sched.c:95:11 (member access within NULL pointer of type 'struct per_cpu_data')

arch_late_init(): CPU 1 is Online!UBSAN: type_mismatch @ src/sched/sched.c:95:11 (member access within NULL pointer of type 'struct per_cpu_data')

arch_late_init(): CPU 2 is Online!UBSAN: type_mismatch @ src/sched/sched.c:95:11 (member access within NULL pointer of type 'struct per_cpu_data')

arch_late_init(): CPU 5 is Online!UBSAN: type_mismatch @ src/sched/sched.c:95:11 (member access within NULL pointer of type 'struct per_cpu_data')

arch_late_init(): CPU 4 is Online!UBSAN: type_mismatch @ src/sched/sched.c:95:11 (member access within NULL pointer of type 'struct per_cpu_data')

Page Fault! CR2 0xaUBSAN: type_mismatch @ src/sched/sched.c:95:11 (member access within NULL pointer of type 'struct per_cpu_data')

arch_late_init(): CPU 17 is Online!
arch_late_init(): CPU 8 is Online!UBSAN: type_mismatch @ src/sched/sched.c:95:11 (member access within NULL pointer of type 'struct per_cpu_data')

arch_late_init(): CPU 16 is Online!UBSAN: type_mismatch @ src/sched/sched.c:95:11 (member access within NULL pointer of type 'struct per_cpu_data')

RIP is 0xffffffff800318ef. Error Code 0x0UBSAN: type_mismatch @ src/sched/sched.c:95:11 (member access within NULL pointer of type 'struct per_cpu_data')

Page Fault! CR2 0xa
RIP is 0xffffffff800318ef. Error Code 0x0
arch_late_init(): CPU 11 is Online!
Page Fault! CR2 0xaUBSAN: type_mismatch @ src/sched/sched.c:95:11 (member access within NULL pointer of type 'struct per_cpu_data')

arch_late_init(): CPU 3 is Online!
-> Function: schedd() -- 0xffffffff80031848UBSAN: type_mismatch @ src/sched/sched.c:95:11 (member access within NULL pointer of type 'struct per_cpu_data')

Page Fault! CR2 0xa
RIP is 0xffffffff800318ef. Error Code 0x0
arch_late_init(): CPU 18 is Online!
arch_late_init(): CPU 14 is Online!UBSAN: type_mismatch @ src/sched/sched.c:95:11 (member access within NULL pointer of type 'struct per_cpu_data')

-> Function: schedd() -- 0xffffffff80031848UBSAN: type_mismatch @ src/sched/sched.c:95:11 (member access within NULL pointer of type 'struct per_cpu_data')

Page Fault! CR2 0xa
RIP is 0xffffffff800318ef. Error Code 0x0
Page Fault! CR2 0xa
RIP is 0xffffffff800318ef. Error Code 0x0
Page Fault! CR2 0xa
RIP is 0xffffffff800318ef. Error Code 0x0
Page Fault! CR2 0xa
RIP is 0xffffffff800318ef. Error Code 0x0
Page Fault! CR2 0xa
RIP is 0xffffffff800318ef. Error Code 0x0
-> Function: schedd() -- 0xffffffff80031848
Page Fault! CR2 0xa
RIP is 0xffffffff800318ef. Error Code 0x0
-> Function: schedd() -- 0xffffffff80031848
Page Fault! CR2 0xa
RIP is 0xffffffff800318ef. Error Code 0x0
arch_late_init(): CPU 10 is Online!
-> Function: sched() -- 0xffffffff80002ed8UBSAN: type_mismatch @ src/sched/sched.c:95:11 (member access within NULL pointer of type 'struct per_cpu_data')
Post by Octocontrabass »

Are there any other places where two CPUs might access the same data structure at the same time?
Post by RayanMargham »

I am unsure; they usually would not.

But I don't know what is causing this, as it is NOT null??
Post by Octocontrabass »

Clearly it is null, otherwise UBSAN wouldn't be complaining. So when does it become null?
Post by RayanMargham »

It shouldn't be, though, because in arch_late_init I create it and put it in the GS segment register.
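
One way to find out when the pointer becomes null is to route every per-CPU access through a single checked accessor, so a missing or clobbered GS base is reported at the first use instead of as a later page fault. A minimal sketch, assuming a thread-local variable as a userspace stand-in for the GS base (a real kernel would read it from the GS base MSR or through a `%gs`-relative load); `demo_per_cpu_data`, its fields, and the `demo_*` helpers are hypothetical and not taken from NyauxKC:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU block; the real struct per_cpu_data layout is unknown. */
struct demo_per_cpu_data {
    uint32_t cpu_id;
    void *current_thread;
};

/* Assumption: stands in for the GS base, one value per thread instead of per CPU. */
static _Thread_local struct demo_per_cpu_data *demo_gs_base;

/* Install the per-CPU block (the step described as happening in arch_late_init). */
static inline void demo_set_cpu_data(struct demo_per_cpu_data *p)
{
    demo_gs_base = p;
}

/* Raw read of the per-CPU pointer; may be NULL if never installed or clobbered. */
static inline struct demo_per_cpu_data *demo_get_cpu_data(void)
{
    return demo_gs_base;
}

/* Checked access: returns 0 and stores the CPU id, or -1 if the pointer is NULL. */
static inline int demo_current_cpu_id(uint32_t *out)
{
    struct demo_per_cpu_data *p = demo_get_cpu_data();
    if (p == NULL) {
        return -1; /* caller can log the call site here instead of faulting */
    }
    *out = p->cpu_id;
    return 0;
}
```

With an accessor like this, the scheduler path could log or halt as soon as the check fails, which narrows the search to whatever ran between the last successful check and the first failing one.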
Post by RayanMargham »

It was a far return meme with the asm code :D
Solved now!!