RISC-V which is a relatively new ISA did not choose to implement recursive page tables in its HW page table walker. To me this is astonishing as it has been a known and appreciated trick for 30+ years now. ARM which didn't from the beginning implemented this realized this, and later implemented it in its LPAE variants also now found in Aarch64. RISC-V virtual memory design is set and now it's no way back.
Why did RISC-V do this even if it has been well known for so many years?
Am I missing something with the design, where non recursive page tables offers advantages like free bits?
RISC-V non recursive pagetables
Re: RISC-V non recursive pagetables
It is a known and appreciated trick in the niche community of hobby OS developers. Mainstream operating systems, as far as I know at least, don't use it. They all use a linear map, at least for a part of the address space, and switch out the mappings on demand for the rest.OSwhatever wrote: ↑Sun Feb 23, 2025 8:58 am To me this is astonishing as it has been a known and appreciated trick for 30+ years now.
You have to see that 64-bit ISAs offer so much address space that there is no problem just dedicating even a quarter of address space to a linear map, and then you still have a quarter left over for the logical map of the kernel. And the combination of 32-bit CPUs and lots of memory (which in this case is 1GB or more) is extremely rare.
Besides, recursive mapping algorithms are typically really bad at doing things like demand paging. You can only change the currently loaded address space, so if you run low on memory and would like to swap a page to disk, you have to load the address space it belongs to first so you can change it. But the pages you swap out are almost by definition old pages that haven't been touched in a while, so you poison your cache when you do this.
Carpe diem!
Re: RISC-V non recursive pagetables
I can't brag, that I've understood the wording in wiki's "Recursive paging" paragraph, but if this term is about when (on the simplest example of 2 level ia32 paging) the PD table has the same virtual address as one of the PT, then my guess, that riscv made the decision for the same sake as 32 arm did, to "be different". There is not only no advantages in purposefully making table entries/descriptor formats of different level incompatible, but, imho, it's plain stupid as arm32 has proven. Also, even though I didn't check for mainstream OSs situation on this currently, but I know, that at least 32 bit Windows XP does use this sane approach. You can see PD VA belongs to the PT range. Basically, that page physically is both the PD for the whole space and the PT that maps the map. Is it "recursive mapping"? Then Windows uses it.
nullplan, it's interesting what you said about page outing in demand paging. I feel I definitely am missing something, but how do you expect to tell MMU that a page is not present without clearing the Present bit of a table entry of the page at all?
nullplan, it's interesting what you said about page outing in demand paging. I feel I definitely am missing something, but how do you expect to tell MMU that a page is not present without clearing the Present bit of a table entry of the page at all?
-
- Member
- Posts: 598
- Joined: Mon Jul 05, 2010 4:15 pm
Re: RISC-V non recursive pagetables
The recursive page tables usually applies for only mapping the page table itself in virtual memory and no demand paging is usually ever used for the page table itself.nullplan wrote: ↑Sun Feb 23, 2025 10:01 amBesides, recursive mapping algorithms are typically really bad at doing things like demand paging. You can only change the currently loaded address space, so if you run low on memory and would like to swap a page to disk, you have to load the address space it belongs to first so you can change it. But the pages you swap out are almost by definition old pages that haven't been touched in a while, so you poison your cache when you do this.
Re: RISC-V non recursive pagetables
No, you do need to clear the present bit of that page table entry, that is correct. But with recursive paging, you do that by loading CR3 with the necessary value for the old virtual space, then clearing the present bit, then loading CR3 with the value for the task currently running. With a linear map, you simply clear the bit without reloading CR3, because you can just locate the page table entry.
BTW, if that wasn't enough of a drawback, all algorithms I have seen so far that add to a recursive page table will link uninitialized pages into the page tables before clearing them out (because you can't access the page before linking it in, and linking it in to a temporary place first to clear it out would be more work). Unfortunately, this means that for a brief period the CPU can load the TLB with garbage, which is often enough not cleared out.
Carpe diem!
Re: RISC-V non recursive pagetables
But with the approach you described, you have to have maps of all processes in the public kernel address space part (mapped the same for all processes), what means, that if you have say 10000 processes, they will consume a lot of the space opening yet another possibility to denial of service. it can be a scalability issue. on the other hand, placing page tables of the private part of the process space in the private part of the process map, doesn't have this drawback.
plus, not only this approach requires more complex algorithms and data to organize and find the maps, I mean, not just a VA pointer to the highest level of the map, but with all the levels - if it's something simpler, it would consume more space, if it's more complex, it would be slower. But also, wouldn't this approach result in the same if not worse cache and TLB littering? After all a process, whose pages are about to be gotten out is still a dormant process, and accessing those structures that will give you the process maps and the maps themselves still would be accessing to something, that hasn't been accessed for a significant time and thus has been thrown away from caches?
plus, not only this approach requires more complex algorithms and data to organize and find the maps, I mean, not just a VA pointer to the highest level of the map, but with all the levels - if it's something simpler, it would consume more space, if it's more complex, it would be slower. But also, wouldn't this approach result in the same if not worse cache and TLB littering? After all a process, whose pages are about to be gotten out is still a dormant process, and accessing those structures that will give you the process maps and the maps themselves still would be accessing to something, that hasn't been accessed for a significant time and thus has been thrown away from caches?
Re: RISC-V non recursive pagetables
Linux makes the linear map on 32-bit platforms 768 MB in size. That can fit an awful lot of page tables. And in 64-bit mode there is no issue at all, since all of address space is available in that part anyway.zaval wrote: ↑Tue Feb 25, 2025 8:01 pm But with the approach you described, you have to have maps of all processes in the public kernel address space part (mapped the same for all processes), what means, that if you have say 10000 processes, they will consume a lot of the space opening yet another possibility to denial of service. it can be a scalability issue. on the other hand, placing page tables of the private part of the process space in the private part of the process map, doesn't have this drawback.
But yes, in the end it is possible to run out of memory. But that is nothing new. Sometimes memory requests cannot be fulfilled.
It is truly maddening, the complexity of adding a constant to the physical address. And you have the PA of the other levels available in the page table, anyway.
Not particularly, no. Depends on how you implement your page cache, of course, but even clearing a "present" bit requires you to touch all four levels of the 64-bit paging structure, that is still only four cache lines added you could do without, and up to four TLBs. But you don't drop the entire cache and TLB as with loading CR3.
Carpe diem!
- Demindiro
- Member
- Posts: 111
- Joined: Fri Jun 11, 2021 6:02 am
- Libera.chat IRC: demindiro
- Location: Belgium
- Contact:
Re: RISC-V non recursive pagetables
I briefly considered recursive mappings before dismissing them for making inter-process memory copies impractical and DMA awkward.
I use an identity mapping with which you can just walk the page table for any address space yourself and use physical (+ constant offset) addresses directly. It simplifies things a lot.
You also can use 2MiB and 1GiB hugepages for the identity mapping, which will most definitely improve TLB performance.
As nullplan said, virtual address space on 64-bit is way larger than the physical amount of memory:
- 2^12 * (2^9)^3 = 2^39 = 512GiB (Sv39)
- 2^12 * (2^9)^4 = 2^48 = 256TiB (Sv48, PML4)
- 2^12 * (2^9)^5 = 2^57 = 128PiB (Sv57, PML5)
A quarter of those is 128GiB, 64TiB and 32PiB respectively. Do you really need more?
RV32I might be more awkward, especially since Sv32 actually supports 34-bit addresses for some reason. In practice, I don't expect any 32-bit processor produced today to be used with more than 1GiB of RAM. For high-performance applications that may actually need the extra memory it makes more sense to have 64 bit registers for the larger address space + faster 64-bit scalar processing.
(aside: if you're bothered by cache pressure: you don't need to use 64 bit pointers everywhere, use base pointer + 32 bit offset. The JVM does this for up to 32GiB heaps (8 byte blocks)).
IMO recursive page tables are a neat trick but nothing more.
I use an identity mapping with which you can just walk the page table for any address space yourself and use physical (+ constant offset) addresses directly. It simplifies things a lot.
You also can use 2MiB and 1GiB hugepages for the identity mapping, which will most definitely improve TLB performance.
As nullplan said, virtual address space on 64-bit is way larger than the physical amount of memory:
- 2^12 * (2^9)^3 = 2^39 = 512GiB (Sv39)
- 2^12 * (2^9)^4 = 2^48 = 256TiB (Sv48, PML4)
- 2^12 * (2^9)^5 = 2^57 = 128PiB (Sv57, PML5)
A quarter of those is 128GiB, 64TiB and 32PiB respectively. Do you really need more?
RV32I might be more awkward, especially since Sv32 actually supports 34-bit addresses for some reason. In practice, I don't expect any 32-bit processor produced today to be used with more than 1GiB of RAM. For high-performance applications that may actually need the extra memory it makes more sense to have 64 bit registers for the larger address space + faster 64-bit scalar processing.
(aside: if you're bothered by cache pressure: you don't need to use 64 bit pointers everywhere, use base pointer + 32 bit offset. The JVM does this for up to 32GiB heaps (8 byte blocks)).
IMO recursive page tables are a neat trick but nothing more.
-
- Member
- Posts: 598
- Joined: Mon Jul 05, 2010 4:15 pm
Re: RISC-V non recursive pagetables
Makes me wonder if Alpha and initially MIPS was right having the TLB management in software. Then you can have whatever data structure you want. Another possibility is to add a small core with embedded memory that handles the TLB misses. This core would be so small that you potentially could clock it higher than the main cores.