Spreading the kernel around the address space

cardboardaardvark · Post by **cardboardaardvark** » Tue Nov 19, 2024 10:13 pm

I'm trying to avoid a traditional higher-half kernel by having a small interrupt handling trampoline that is the only thing in common between the kernel and user address spaces. Here's an image of what I'm thinking about for the layout of the kernel and user address spaces (note, address space not to scale):

I've already got my kernel implemented using such an address space layout aside from the interrupt trampoline. I'm identity mapping the address space between 0x0 and the end of where the kernel was placed by the boot loader plus the memory used to hold the list of physical page pointers for the physical memory manager. The kernel heap is formed from everything after the identity mapped region.

I want user space programs to load at the start of the first page after 0x0 and to have a possible heap that goes all the way to the end of the address space minus the size of the interrupt trampoline. The trampoline is not yet implemented.

What I'm looking for is some help with splitting up the kernel in its address space. From my research I've discovered it is possible to tell the linker to take the contents of object files and store them in their own named ELF sections. This makes sense to me. It seems then that I could implement the interrupt trampoline in its own files, tell the linker to place those object files into a section dedicated to the trampoline, align that section at a page boundary, then map the contents of that section to the very end of the address space.

What I'm not sure about though is how that would work in practice. Normally the program is split up into the text and bss sections, but what I just described would merge the instructions and variables into one section. Is that going to cause a problem? As well, I'm trying to park the end of the trampoline at the very end of the address space without knowing exactly how large the section will be, which means the linker isn't going to know the correct start address for the section after it has been mapped. I'm wondering if I can solve that problem by compiling the entire contents of the trampoline section as position independent code.

My question here is really about how do I make the C++ compiler and linker pull off this stunt. If this was covered in the wiki I apologize for the noise in the forum.

Thanks for any help and pointers. I've never tried to do something this advanced before.

iansjack · Post by **iansjack** » Wed Nov 20, 2024 3:20 am

If I read your diagrams completely, the address space of user programs overlaps that of the kernel. Assuming that you are using paging (and why wouldn't you?), doesn't this mean that every system call involves a reload of the page tables? That doesn't seem to be very efficient. Surely it's better to use a separate kernel address space.

(And if you are not using paging, presumably you have to reload the kernel for every system call, and then reload the program on return.)

rdos · Post by **rdos** » Wed Nov 20, 2024 6:17 am

The "public" part of the kernel must live in its own reserved address space. The public part includes the code for syscalls, interrupts, physical memory handling, page tables and the scheduler. You could have a "system process" that handles background jobs, or things that can be linked using messaging similar to a microkernel approach.

rdos · Post by **rdos** » Wed Nov 20, 2024 8:31 am

iansjack wrote: ↑Wed Nov 20, 2024 3:20 am If I read your diagrams completely, the address space of user programs overlaps that of the kernel. Assuming that you are using paging (and why wouldn't you?), doesn't this mean that every system call involves a reload of the page tables? That doesn't seem to be very efficient. Surely it's better to use a separate kernel address space.

(And if you are not using paging, presumably you have to reload the kernel for every system call, and then reload the program on return.)

That would not be enough. User programs might pass pointers to the kernel, and if paging is changed as part of the syscall, these parameters must first be copied (or mapped) into kernel space.

iansjack · Post by **iansjack** » Wed Nov 20, 2024 8:41 am

Was that not the point of the interrupt trampoline area?

Octocontrabass · Post by **Octocontrabass** » Wed Nov 20, 2024 11:33 am

cardboardaardvark wrote: ↑Tue Nov 19, 2024 10:13 pm I'm trying to avoid a traditional higher-half kernel

Any particular reason why you're avoiding higher-half specifically? (Not that you need a reason.)

cardboardaardvark wrote: ↑Tue Nov 19, 2024 10:13 pm by having a small interrupt handling trampoline that is the only thing in common between the kernel and user address spaces.

Many OSes already do something like this to work around security vulnerabilities.

cardboardaardvark wrote: ↑Tue Nov 19, 2024 10:13 pm I'm identity mapping the address space between 0x0 and the end of where the kernel was placed by the boot loader plus the memory used to hold the list of physical page pointers for the physical memory manager.

That's probably fine if you're using legacy BIOS, but what about UEFI? UEFI doesn't guarantee available memory at any specific addresses, so you might need to load your kernel at different physical addresses on different PCs.

cardboardaardvark wrote: ↑Tue Nov 19, 2024 10:13 pm Normally the program is split up into the text and bss sections but what I just described would merge the instructions and variables into one section. Is that going to cause a problem?

You don't need to merge everything together into one section, but if you do it anyway, the only problem it can cause is that you won't be able to assign different permissions to things that share the same page.

cardboardaardvark wrote: ↑Tue Nov 19, 2024 10:13 pm As well I'm trying to park the end of the trampoline at the very end of the address space without knowing exactly how large the section will be which means the linker isn't going to know the correct start address for the section after it has been mapped.

Usually this sort of trampoline code is written in assembly to minimize how much of the kernel is mapped at once. This has the added bonus of making the size of the trampoline much more predictable.

cardboardaardvark wrote: ↑Tue Nov 19, 2024 10:13 pm I'm wondering if I can solve that problem by compiling the entire contents of the trampoline section as position independent code.

If you link position-independent code into a static executable, the linker will perform optimizations that make the code no longer position-independent.

cardboardaardvark wrote: ↑Tue Nov 19, 2024 10:13 pm My question here is really about how do I make the C++ compiler and linker pull off this stunt.

If the trampoline is entirely in its own object files, you can tell the linker to treat those files specially. For example:

```
.hightext :
{
    trampoline.o(.text*)
}
.text :
{
    *(.text*)
}
```

Input sections are consumed by the first matching input section description, so the above example wouldn't work if the output sections were listed the other way around.

cardboardaardvark · Post by **cardboardaardvark** » Wed Nov 20, 2024 2:29 pm

iansjack wrote: ↑Wed Nov 20, 2024 3:20 am If I read your diagrams completely, the address space of user programs overlaps that of the kernel. Assuming that you are using paging (and why wouldn't you?), doesn't this mean that every system call involves a reload of the page tables? That doesn't seem to be very efficient. Surely it's better to use a separate kernel address space.

You read the diagrams correctly. That's the plan anyway. When coming out of user space the trampoline will install the kernel page directory and perform a few other tasks such as changing over to a kernel stack. Having to manually do page translation for pointers coming out of user space is not going to be great for performance either. I'm not sure yet if this will wind up being a bad idea.
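For a sense of what that manual translation involves, here is a minimal C++ sketch of a software walk over 32-bit non-PAE page tables. It assumes 4 KiB pages only (no 4 MiB PSE pages). `PhysicalMemory`, `translate_user_address` and the flag constant names are hypothetical, and physical frames are modelled with a map so the sketch can run outside a kernel.

```cpp
#include <array>
#include <cstdint>
#include <optional>
#include <unordered_map>

// Hypothetical model: each 4 KiB frame that holds a page directory or page
// table is looked up by its physical address. A real kernel would instead
// reach the frame through a temporary or permanent mapping.
using Table = std::array<std::uint32_t, 1024>;
using PhysicalMemory = std::unordered_map<std::uint32_t, Table>;

constexpr std::uint32_t kPresent = 1u << 0;        // entry is valid
constexpr std::uint32_t kUser = 1u << 2;           // accessible from ring 3
constexpr std::uint32_t kFrameMask = 0xFFFFF000u;  // upper 20 bits: frame address

// Returns the physical address behind a user virtual address, or nullopt if
// the page is not present or not user-accessible at either level.
std::optional<std::uint32_t> translate_user_address(
    const PhysicalMemory& memory, std::uint32_t page_directory, std::uint32_t virtual_address)
{
    const std::uint32_t directory_index = virtual_address >> 22;
    const std::uint32_t table_index = (virtual_address >> 12) & 0x3FFu;
    const std::uint32_t offset = virtual_address & 0xFFFu;
    constexpr std::uint32_t required = kPresent | kUser;

    const auto directory = memory.find(page_directory);
    if (directory == memory.end()) return std::nullopt;
    const std::uint32_t pde = directory->second[directory_index];
    if ((pde & required) != required) return std::nullopt;

    const auto table = memory.find(pde & kFrameMask);
    if (table == memory.end()) return std::nullopt;
    const std::uint32_t pte = table->second[table_index];
    if ((pte & required) != required) return std::nullopt;

    return (pte & kFrameMask) | offset;
}
```

A buffer that spans several pages has to be translated page by page, since adjacent virtual pages need not be physically contiguous.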

iansjack wrote: ↑Wed Nov 20, 2024 8:41 am Was that not the point of the interrupt trampoline area?

I'm pretty sure this is in reference to 'The "public" part of the kernel must live in its own reserved address space.' and you are correct.

Octocontrabass wrote: ↑Wed Nov 20, 2024 11:33 am Any particular reason why you're avoiding higher-half specifically? (Not that you need a reason.)

Perhaps because I have no idea what I am doing. But really the motivation here is it absolutely feels wrong to have such a large portion of the kernel address space overlap with the user address space. I see it as a least-privilege violation even when accounting for the ability to tell the MMU to keep the kernel pages inaccessible from rings above 0.

Octocontrabass wrote: Many OSes already do something like this to work around security vulnerabilities.

It's good to see my sysadmin gut has at least some idea of what it is doing. Of course my trampoline design brings with it performance issues and some pain in implementation passing data to/from user space. This is going to be an interesting experiment at least.

Octocontrabass wrote: That's probably fine if you're using legacy BIOS, but what about UEFI? UEFI doesn't guarantee available memory at any specific addresses, so you might need to load your kernel at different physical addresses on different PCs.

Thank you for the heads up here. I'm really just cutting my teeth and decided to start with targeting i686. I figured wrapping my head around a turbocharged 386 running in QEMU is a nice simple place to start. On a lark I wanted to see if my kernel would actually boot on real hardware and found out fast that my spare laptop doesn't even know how to boot legacy media. It did boot on my workstation though, which nicely has a more flexible BIOS.

Doing the transition to x86_64 and UEFI and all that I'm sure is going to wind up with my head exploding, regrets, and swearing when I eventually get around to it.

Octocontrabass wrote: You don't need to merge everything together into one section, but if you do it anyway, the only problem it can cause is that you won't be able to assign different permissions to things that share the same page.
...
If you link position-independent code into a static executable, the linker will perform optimizations that make the code no longer position-independent.
...
If the trampoline is entirely in its own object files, you can tell the linker to treat those files specially. For example:

Ahhhhhhhhhh thank you very much for the knowledge sharing. I've never had to worry about these kinds of details before.

I also realized after asking that if I tried to use relocatable code to move the trampoline around then I'd have to maintain and calculate function pointers so the identity mapped chunk of the kernel could call into the trampoline chunk or only interface with the trampoline through interrupts. Neither of those choices sounded particularly great. I think what I might do for a first pass is ask the linker to stick the trampoline around the end of the address space minus 32 MiB. That'll definitely hold for quite a while.
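As a quick check of that arithmetic, a small C++ sketch, assuming a 32-bit (4 GiB) address space; `region_start_from_top` and the constant names are made up for illustration:

```cpp
#include <cstdint>

constexpr std::uint64_t kAddressSpaceSize = std::uint64_t{1} << 32;  // 4 GiB on i686
constexpr std::uint64_t kMiB = 1024 * 1024;
constexpr std::uint32_t kPageAlignment = 4096;

// Start address of a region that occupies the top `reserved_bytes` of the
// address space. The subtraction is done in 64 bits because 4 GiB itself
// does not fit in a 32-bit integer.
constexpr std::uint32_t region_start_from_top(std::uint64_t reserved_bytes)
{
    return static_cast<std::uint32_t>(kAddressSpaceSize - reserved_bytes);
}

// "End of the address space minus 32 MiB" comes out at 0xFE000000,
// which is page aligned.
static_assert(region_start_from_top(32 * kMiB) == 0xFE000000u);
static_assert(region_start_from_top(32 * kMiB) % kPageAlignment == 0);
```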

Do you know how "smart" GRUB/multiboot is going to be about loading an ELF binary with sections that have a very different address than other ones? My linker script currently starts off with a 2 MiB initial offset as suggested by the barebones tutorial. If I ask the linker to put sections at the end of the address space, is multiboot going to see that in the ELF binary and attempt to stick it there? I'm wondering if I'm going to have to make the trampoline its own ELF binary and bring it in with a multiboot module.

rdos · Post by **rdos** » Wed Nov 20, 2024 3:14 pm

If you want to target UEFI or modern hardware, which tends to support UEFI mainly, they all come as x86_64 implementations that cannot directly load a 32-bit OS image. It's possible to switch from long mode to protected mode, but you will need to use binary images or fix up ELF yourself.

The architecture seems similar to a microkernel design, but it doesn't offer all the advantages and still has the same disadvantages of address space switches.

Octocontrabass · Post by **Octocontrabass** » Wed Nov 20, 2024 5:04 pm

cardboardaardvark wrote: ↑Wed Nov 20, 2024 2:29 pm Doing the transition to x86_64 and UEFI and all that I'm sure is going to wind up with my head exploding, regrets, and swearing when I eventually get around to it.

Keep in mind you can have x64 with legacy BIOS or i386 with UEFI.

cardboardaardvark wrote: ↑Wed Nov 20, 2024 2:29 pm Do you know how "smart" GRUB/multiboot is going to be about loading an ELF binary with sections that have a very different address than other ones? [...] If I ask the linker to put sections at the end of the address space is multiboot going to see that in the ELF binary and attempt to stick it there?

GRUB uses the lowest and highest LMA in the program headers to figure out how much memory your binary needs. If the LMA for one section is far away from the others, GRUB will think your binary needs tons of memory and probably won't be able to load it.

GRUB doesn't use the VMA for loading, so you can specify a LMA that's convenient for GRUB and then use paging to map the section to its VMA.
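To put rough numbers on this, here is a C++ sketch that models only the lowest-to-highest LMA arithmetic described above. It is not GRUB's actual code; `Segment` and `load_span` are hypothetical names, with `load_address` and `size` standing in for a program header's physical address and memory size.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-in for one loadable program header.
struct Segment {
    std::uint64_t load_address;  // LMA of the segment
    std::uint64_t size;          // bytes occupied once loaded
};

// Bytes between the lowest LMA and the end of the highest segment.
std::uint64_t load_span(const std::vector<Segment>& segments)
{
    if (segments.empty()) return 0;

    std::uint64_t lowest = std::numeric_limits<std::uint64_t>::max();
    std::uint64_t highest = 0;
    for (const Segment& segment : segments) {
        lowest = std::min(lowest, segment.load_address);
        highest = std::max(highest, segment.load_address + segment.size);
    }
    return highest - lowest;
}
```

With a 64 KiB kernel segment at 1 MiB and a 4 KiB trampoline whose LMA is left at 0xFFF00000, the span is nearly 4 GiB. Giving the trampoline an LMA directly after the kernel shrinks it to 68 KiB.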

Multiboot2 has a relocatable header tag. If you're willing to let go of identity mapping, you could use that tag to tell GRUB it's safe to load your kernel at different physical addresses, and then use paging to map it to the virtual address where you really want it. (You could also use it with an actual relocatable ELF binary, but GRUB doesn't handle ELF relocations, so you'd still have to change your startup code.)

cardboardaardvark · Post by **cardboardaardvark** » Wed Nov 20, 2024 5:56 pm

Octocontrabass wrote: ↑Wed Nov 20, 2024 5:04 pm GRUB uses the lowest and highest LMA in the program headers to figure out how much memory your binary needs. If the LMA for one section is far away from the others, GRUB will think your binary needs tons of memory and probably won't be able to load it.

GRUB doesn't use the VMA for loading, so you can specify a LMA that's convenient for GRUB and then use paging to map the section to its VMA.

This is good to hear. Thank you for the info. I had to look up LMA and VMA and found it nicely explained here with examples of linker scripts to control both of them: https://allthingsembedded.com/post/2020 ... er-script/ I hope that's a useful reference if anyone else stumbles on this post in the future.

cardboardaardvark · Post by **cardboardaardvark** » Wed Nov 27, 2024 2:36 pm

I've made some progress with implementing my (bad) idea. I've got the linker behaving for me with a script that lets me place specific files into their own ELF sections and place those sections at arbitrary places in the virtual address space. I am using the MEMORY feature of the linker to do it. I start the normal kernel at 1 megabyte in memory and carve out the last megabyte to be the shared virtual address space. Here's a simplified version to serve as an example:

```
/* Kernel entry point */
ENTRY(boot)

MEMORY
{
    /* Place the kernel starting at 1 MiB. This will not work with UEFI. */
    KERNEL : ORIGIN = 1M, LENGTH = 4096M - 2M
    SHARED : ORIGIN = 4096M - 1M, LENGTH = 1M
}

SECTIONS
{
    /* Get the multiboot header as early as possible */
    .multiboot : ALIGN(4K)
    {
        *(.multiboot)

        _shared_start_physical = ALIGN(4K);
    } >KERNEL

    .shared_text : ALIGN(4K)
    {
        _shared_start_virtual = ALIGN(4K);

        trampoline.a(.text)

        _shared_end_virtual = ALIGN(4K);
    } >SHARED AT>KERNEL

    .text : ALIGN(4K)
    {
        _shared_end_physical = ALIGN(4K);

        *(.text)
    } >KERNEL

    .rodata : ALIGN(4K)
    {
        *(.rodata)
    } >KERNEL

    .data : ALIGN(4K)
    {
        *(.data)
    } >KERNEL

    .bss : ALIGN(4K)
    {
        *(COMMON)
        *(.bss)
    } >KERNEL

    _link_end = ALIGN(4K);
}
```

(edit: Provided improved version with symbols that can be used to find the physical and virtual addresses of the start and end of the SHARED sections.)

I wrote my trampoline in assembly and it only has text section contents which is why there is only a shared_text section going into the SHARED region. More sections can be added to the SHARED region if needed.

OSDev.org
