Page Frame Manager in x86_64

sebihepp · Post by **sebihepp** » Wed Oct 16, 2024 1:08 pm

Dear Forum,

I am writing an OS for x86_64 just for fun and learning purpose. The kernel is booted via Limine and I have a question about memory management (Only if my understanding is correct).

As paging is required for long mode, Limine sets one up for me, maps my kernel to the virtual address and maps the entire first 4GiB into another address.

I am getting to a point now, that I need to implement my own paging and therfore also a memory manager. My first goal is to implement a manager that allocates physical addresses in 4KiB blocks - to conform with paging:

Easiest solution would be a bitmap, but in long mode the possible amount of memory makes this wasting huge amounts of memory.
Second thought was linked lists, similar to what a EFI Memory Map looks like. But if I store the data in my own allocated memory, i have the chicken egg problem - I need a page frame to store the initial free memory ranges, but to get one page frame I need to know the free ranges. Furthermore, as memory fragments, I need to expand this page frame by more.

So my thought was: "What if I store the information about a free memory range just in the first page frame of that range?"
I keep a pointer to the start, map it to a virtual address, load the data on that address and look, if it is enough. If no, get the next pointer and instead map this address to the same virtual address, loading the data again. If I find a big enough range, I return that address and put new data just to the page behind the returned address, updating the previous range->next pointer.

This can be done with just one virtual page mapped as needed. The Problem is only that I cannot know to where used memory belongs - if a process exits I have to trust it releases all its physical page frames. That may be a problem...

That do you think about it?

Best regards
Sebi

rdos · Post by **rdos** » Thu Oct 17, 2024 1:31 am

I don't think bitmaps are wasting a lot of memory. Eight 4k pages (32k) requires a one byte bitmap. That's 0.003%. The required structures for mapping memory waste a lot more (8 bytes per 4k frame), which is 0.2%.

However, the main reason to use bitmaps is that you can implement a lock-free physical memory manager, something that is impossible with linked lists.

thewrongchristian · Post by **thewrongchristian** » Thu Oct 17, 2024 2:41 am

sebihepp wrote: ↑Wed Oct 16, 2024 1:08 pm This can be done with just one virtual page mapped as needed. The Problem is only that I cannot know to where used memory belongs - if a process exits I have to trust it releases all its physical page frames. That may be a problem...

That do you think about it?

Best regards
Sebi

If you're writing the kernel, you're in control of each process and its memory, and when it exits, the kernel can clean up all its no longer required resources such as memory.

Typically, a process will have a virtual address space description, which will be something like a list of memory mappings. In those mappings, you should have a reference to every page currently mapped, so you can walk the mappings, walk each page in the mapping, removing references to physical pages as you go. For each page that was a unique reference (ie. Not a shared page) you know that page is now free, and it can be released back to the PFM.

A bitmap to manage available memory is not the best. As you say, it wastes memory, and large runs of available memory can be represented in the free memory itself. Also, finding memory to allocate is typically a linear scan of the bitmaps as well, though the scan size is not typically massive.

To be honest, most physical page allocations are probably done on a page by page basis, in response to memory mapping requests, so another option is to have a "stack" of free pages, with a pointer to the current "head" free page. In that "head" free page, you have a pointer to the next free page, and so on.

Freeing a memory page then becomes a matter of recording the current "head" free page in the page you're freeing, then storing the page number of the new free page in your "head" free page.

Allocating a memory page is then just the opposite. You use the "head" free page to map in the page to a temporary address, read the next free page and write that to the "head" free page, then unmap and use the newly allocated page.

Both simple, quick O(1) operations. And another benefit is that memory is reused in LIFO order, so there is still a chance that an allocated page contents are still in a CPU cache.

The downside being that allocating contiguous pages is a bit of a nightmare, as the free page stack will become a fragmented mess. But again, that tends to be not a problem, as the only likely use of contiguous memory is for data buffers to transfer to devices, and most devices these days provide scatter/gather DMA to handle non-contiguous buffers.

sebihepp · Post by **sebihepp** » Thu Oct 17, 2024 3:56 am

rdos wrote: ↑Thu Oct 17, 2024 1:31 am I don't think bitmaps are wasting a lot of memory. Eight 4k pages (32k) requires a one byte bitmap. That's 0.003%. The required structures for mapping memory waste a lot more (8 bytes per 4k frame), which is 0.2%.

However, the main reason to use bitmaps is that you can implement a lock-free physical memory manager, something that is impossible with linked lists.

I am in long mode and to be prepared for the future and the biggest memory amount my bitmap would need space for 2^64 Bytes. That's too big, even if the bitmap only uses 0.003%.

thewrongchristian wrote: ↑Thu Oct 17, 2024 2:41 am The downside being that allocating contiguous pages is a bit of a nightmare, as the free page stack will become a fragmented mess. But again, that tends to be not a problem, as the only likely use of contiguous memory is for data buffers to transfer to devices, and most devices these days provide scatter/gather DMA to handle non-contiguous buffers.

I think I like a split approach:
A bitmap for the lower 1MiB to be able to allocate continuous page frames for old ISA DMA devices
and a "stack" for the rest of the memory.

The Stack takes some time to setup initially, right? As one needs to go through all free page frames, setting the pointers to the next stack frame.

iansjack · Post by **iansjack** » Thu Oct 17, 2024 9:09 am

Your bitmap doesn’t have to be allocated at compile time. The OS can determine its size at boot and allocate just the memory required to account for actual physical memory.

nullplan · Post by **nullplan** » Thu Oct 17, 2024 9:20 pm

I use a memblock allocator, but in the end, that is the same as an RLE encoded bitmap allocator. Advantage of that one is that it doesn't take more time to initialize it when you add more memory. I always felt weird about bitmaps and linked lists for that reason: Double the memory means double the initialization time. That just feels wrong. With a memblock allocator, double the memory just means double the number I write into the "length" part of the block descriptor.

The allocation problem also persists here. I solve it by adding "enough" descriptors into the data section so that I can probably save all the reserved regions where my kernel is (mustn't overwrite the kernel, after all) before allowing the allocator to itself allocate memory for more descriptors. So far, 32 regions for the reserved and free lists each have been enough everywhere.

rdos · Post by **rdos** » Fri Oct 18, 2024 1:09 am

sebihepp wrote: ↑Thu Oct 17, 2024 3:56 am
rdos wrote: ↑Thu Oct 17, 2024 1:31 am I don't think bitmaps are wasting a lot of memory. Eight 4k pages (32k) requires a one byte bitmap. That's 0.003%. The required structures for mapping memory waste a lot more (8 bytes per 4k frame), which is 0.2%.

However, the main reason to use bitmaps is that you can implement a lock-free physical memory manager, something that is impossible with linked lists.
I am in long mode and to be prepared for the future and the biggest memory amount my bitmap would need space for 2^64 Bytes. That's too big, even if the bitmap only uses 0.003%.

For a list based allocator, you need to reserve linear memory for the largest amount of physical memory you want to support. Since long mode only can address 2^48 bytes, you obviously cannot support 2^64 bytes with a list-based allocator. For a bitmap allocator, you need to reserve linear memory (not physical) for the largest amount of physical memory you want to support. You can support 32768 times more physical memory with a bitmap allocator compared to a list based allocator. So, realistically, you can support 2^46 bytes of physical memory using a list based allocator (if you want to sacrifice 1/4 of your linear address space, but 2^61 bytes with a bitmap allocator (using the same reservation).

Obviously, this is why a 32-bit kernel needs to use a bitmap allocator. I reserve 0x2800000 (40MB) of linear address space for the physical memory allocator, and so I support 1.28 TB of physical memory. With a list based allocator, I couldn't support much more 1/4 of GB or so since the kernel linear address space is very limited.

Actually, a list based allocator for long mode cannot support much more physical memory than a bitmap allocator for 32-bit mode.

There are other issues with list based allocators than the needs for locks, like very hard or impossible to allocate more than one continuous 4k page. You might want to use 2M pages to save on page tables, and some PCI devices wants larger continuous blocks of physical memory than 4k. Also, you can make sure you catch incorrect physical memory frees easily (by checking if the page is allocated), while double-frees on a list will corrupt the list without you knowing it.

rdos · Post by **rdos** » Fri Oct 18, 2024 1:44 am

nullplan wrote: ↑Thu Oct 17, 2024 9:20 pm I use a memblock allocator, but in the end, that is the same as an RLE encoded bitmap allocator. Advantage of that one is that it doesn't take more time to initialize it when you add more memory. I always felt weird about bitmaps and linked lists for that reason: Double the memory means double the initialization time. That just feels wrong. With a memblock allocator, double the memory just means double the number I write into the "length" part of the block descriptor.

The allocation problem also persists here. I solve it by adding "enough" descriptors into the data section so that I can probably save all the reserved regions where my kernel is (mustn't overwrite the kernel, after all) before allowing the allocator to itself allocate memory for more descriptors. So far, 32 regions for the reserved and free lists each have been enough everywhere.

I fail to see how the time to initialize would not be increasing linearly with the size of physical memory. I also use blocks/descriptors in my physical memory allocator, and only allocate them when physical memory in the range are present. However, you still need to intialize the contents of your bitmap, and this tends to consume increasing time with increasing amount of physical memory.

nullplan · Post by **nullplan** » Fri Oct 18, 2024 9:29 am

rdos wrote: ↑Fri Oct 18, 2024 1:44 am I fail to see how the time to initialize would not be increasing linearly with the size of physical memory.[...]However, you still need to intialize the contents of your bitmap,[...]

Ah, a misunderstanding. A memblock descriptor consists of a base address and a length. No bitmap. The overall effect is equivalent to a run-length encoded bitmap, but the time to initialize the block descriptors is linear in the number of blocks, not their size. And most motherboards don't make the region above 4GB very complicated, so the time to initialize with 8GB equals that to do it with 16GB or 32GB. In those cases, merely the length part of the last descriptor is higher. And since allocation only means to add a block to the reserved list, and for the most part I can just append to the final reserved block descriptor (or prepend, I suppose), this means often allocation is just one subtraction and one addition (I always allocate the highest address fitting a request).

thewrongchristian · Post by **thewrongchristian** » Fri Oct 18, 2024 11:29 am

rdos wrote: ↑Fri Oct 18, 2024 1:09 am
sebihepp wrote: ↑Thu Oct 17, 2024 3:56 am
rdos wrote: ↑Thu Oct 17, 2024 1:31 am I don't think bitmaps are wasting a lot of memory. Eight 4k pages (32k) requires a one byte bitmap. That's 0.003%. The required structures for mapping memory waste a lot more (8 bytes per 4k frame), which is 0.2%.

However, the main reason to use bitmaps is that you can implement a lock-free physical memory manager, something that is impossible with linked lists.
I am in long mode and to be prepared for the future and the biggest memory amount my bitmap would need space for 2^64 Bytes. That's too big, even if the bitmap only uses 0.003%.
For a list based allocator, you need to reserve linear memory for the largest amount of physical memory you want to support. Since long mode only can address 2^48 bytes, you obviously cannot support 2^64 bytes with a list-based allocator. For a bitmap allocator, you need to reserve linear memory (not physical) for the largest amount of physical memory you want to support. You can support 32768 times more physical memory with a bitmap allocator compared to a list based allocator. So, realistically, you can support 2^46 bytes of physical memory using a list based allocator (if you want to sacrifice 1/4 of your linear address space, but 2^61 bytes with a bitmap allocator (using the same reservation).

Obviously, this is why a 32-bit kernel needs to use a bitmap allocator. I reserve 0x2800000 (40MB) of linear address space for the physical memory allocator, and so I support 1.28 TB of physical memory. With a list based allocator, I couldn't support much more 1/4 of GB or so since the kernel linear address space is very limited.

Actually, a list based allocator for long mode cannot support much more physical memory than a bitmap allocator for 32-bit mode.

I think you've misunderstood how this has been suggested to work.

The linked list is a list of hardware page numbers.

To allocate a page, a page in VM is set aside for the current head of the free page list, given by page number variable, the page number being the page to which this virtual address is mapped.

The free page so mapped contains the page number of the NEXT free page, so allocating a page is:

- Make a copy of the current first free page number.
- Read the next page from the current mapped first free page, and set the first free page number to that page number, and remap the virtual address to that page number.
- Return the old first free page number as the now allocated page.

You only ever need to map one virtual page to maintain the linked list, and the linked list doesn't use virtual addresses at all.

But, there is the obvious drawback that this probably isn't very SMP friendly, as there will be TLB shoot downs flying all over the place, perhaps necessitating a per-CPU list of free pages that can be refreshed when required from a central pool.

rdos wrote: ↑Fri Oct 18, 2024 1:09 am There are other issues with list based allocators than the needs for locks, like very hard or impossible to allocate more than one continuous 4k page. You might want to use 2M pages to save on page tables, and some PCI devices wants larger continuous blocks of physical memory than 4k. Also, you can make sure you catch incorrect physical memory frees easily (by checking if the page is allocated), while double-frees on a list will corrupt the list without you knowing it.

All valid points.

A buddy allocator will resolve all these, at the cost of a little more complexity.

Perhaps a hybrid approach could work? Central pool with buddy allocator, then a per CPU stack of single page allocations.

For the common case, getting a single page, first go to the per-CPU cache of free pages in a linked list.

If that is empty, go to the central pool.

The common case will be using locally mapped virtual addresses, and a CPU-local list of free pages, so no locking or TLB shootdown would be required when allocating and freeing single pages from/to the linked list, just a single local TLB invalidation for the updated mapping.

rdos · Post by **rdos** » Fri Oct 18, 2024 1:20 pm

It was my impression that people would typically combine a list allocator with mapping all physical memory to make it easier to access physical memory in memory schedules of PCI devices. I suppose these things doesn't need to be connected, but only mapping a single 4k page doesn't seem very effective. Long mode has enough linear memory to map physical memory in today's typical computer, at least.

thewrongchristian · Post by **thewrongchristian** » Fri Oct 18, 2024 3:49 pm

rdos wrote: ↑Fri Oct 18, 2024 1:20 pm It was my impression that people would typically combine a list allocator with mapping all physical memory to make it easier to access physical memory in memory schedules of PCI devices. I suppose these things doesn't need to be connected, but only mapping a single 4k page doesn't seem very effective. Long mode has enough linear memory to map physical memory in today's typical computer, at least.

But I've just noticed the topic specifically mentions x86_64, so you're probably right.

Meh, maybe I'm just old school. I'm still targetting 32-bit x86, and so have a limited kernel address space (use just top 256MB, 0xf0000000-0xffffffff) but tested with 3GB physical memory (I don't support PAE yet).

I'm personally more interested in older or smaller computers, so I'm targetting the sort of machines in scale that I grew up with, so we're talking of 16MB being considered a luxury. I'm aiming to be able to scale down, as well as designing to be able to scale up (more done the former, I can run in about 2MB with a small initrd.)

rdos · Post by **rdos** » Mon Oct 21, 2024 1:30 am

thewrongchristian wrote: ↑Fri Oct 18, 2024 3:49 pm Meh, maybe I'm just old school. I'm still targetting 32-bit x86, and so have a limited kernel address space (use just top 256MB, 0xf0000000-0xffffffff) but tested with 3GB physical memory (I don't support PAE yet).

So do I. I reserve 1GB for kernel, and support both 32-bit paging and PAE.

thewrongchristian wrote: ↑Fri Oct 18, 2024 3:49 pm I'm personally more interested in older or smaller computers, so I'm targetting the sort of machines in scale that I grew up with, so we're talking of 16MB being considered a luxury. I'm aiming to be able to scale down, as well as designing to be able to scale up (more done the former, I can run in about 2MB with a small initrd.)

Indeed. I wish I had an operational 386SX with 4MB of RAM to run tests on (similar to the initial one I had when I started with OS development). It should be possible to boot and run some rudimentary applications on it.

WumbologyMajor · Post by **WumbologyMajor** » Thu Jan 09, 2025 4:55 pm

I'm not sure how much more efficient it is than a normal bitmap allocator, but my hobby project (still in early stages) uses a pretty straightforward region-based bitmap where every byte corresponds to one page table's worth of physical memory. Each bit represents a subregion, and they vary in size from 4 pages to 256, with allocations bigger than that either taking up the whole table or even spilling over into the next one. The structure's still pretty big, but it fits easily within the kernel heap space, so no need to hardcode anything. At the end of the day, page tables are going to take up more space than the bitmap is, so the region-based mode is just to save time searching the bitmap for an available region.

OSDev.org

Page Frame Manager in x86_64

Page Frame Manager in x86_64

Re: Page Frame Manager in x86_64

Re: Page Frame Manager in x86_64

Re: Page Frame Manager in x86_64

Re: Page Frame Manager in x86_64

Re: Page Frame Manager in x86_64

Re: Page Frame Manager in x86_64

Re: Page Frame Manager in x86_64

Re: Page Frame Manager in x86_64

Re: Page Frame Manager in x86_64

Re: Page Frame Manager in x86_64

Re: Page Frame Manager in x86_64

Re: Page Frame Manager in x86_64

Re: Page Frame Manager in x86_64