Cache and devices

devc1 · Post by **devc1** » Mon Jul 31, 2023 2:35 pm

Do I need to map the page as uncached every time I access a device, or can I just map it as write-through and have the CPU notice any updates the device writes to memory?

Or, best of all, can I use write-back with devices?

Octocontrabass · Post by **Octocontrabass** » Mon Jul 31, 2023 4:18 pm

I'm assuming you're talking about x86.

The actual memory type depends on the memory type selected in the page tables and the memory type selected by the MTRRs. Firmware is supposed to initialize the MTRRs so that the page tables can use WB everywhere, but some firmware is buggy. Some combinations of page and MTRR values behave differently on different CPUs, so choose carefully.
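To make the page-table half of this concrete, here is a minimal sketch in C of how a 4 KiB page table entry selects a slot in the IA32_PAT MSR. The bit positions, the type encodings, and the power-on PAT value are architectural; the helper names (`pte_pat_index`, `pat_lookup`) are made up for illustration. It does not model how the result is then combined with the MTRR type, which is the part that needs care.

```c
#include <assert.h>
#include <stdint.h>

/* Memory type encodings as stored in each byte of IA32_PAT. */
enum mem_type {
    MT_UC       = 0x00, /* uncacheable */
    MT_WC       = 0x01, /* write-combining */
    MT_WT       = 0x04, /* write-through */
    MT_WP       = 0x05, /* write-protected */
    MT_WB       = 0x06, /* write-back */
    MT_UC_MINUS = 0x07  /* uncacheable, but an MTRR may override it to WC */
};

/* Power-on default of IA32_PAT (MSR 0x277): WB, WT, UC-, UC, repeated. */
#define PAT_DEFAULT 0x0007040600070406ULL

/* Cache-control bits in a 4 KiB page table entry. */
#define PTE_PWT (1ULL << 3)
#define PTE_PCD (1ULL << 4)
#define PTE_PAT (1ULL << 7)

/* Hypothetical helper: PAT slot (0..7) selected by a 4 KiB PTE. */
static unsigned pte_pat_index(uint64_t pte)
{
    return ((pte & PTE_PAT) ? 4u : 0u) |
           ((pte & PTE_PCD) ? 2u : 0u) |
           ((pte & PTE_PWT) ? 1u : 0u);
}

/* Hypothetical helper: memory type stored in a given PAT slot. */
static enum mem_type pat_lookup(uint64_t pat_msr, unsigned index)
{
    assert(index < 8);
    return (enum mem_type)((pat_msr >> (index * 8)) & 0x07);
}
```

With the default PAT, a PTE with no cache bits set resolves to WB and one with both PCD and PWT set resolves to UC. Large pages keep the PAT bit at a different position (bit 12), which this sketch ignores.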

Ordinary RAM should always be WB. Devices that perform DMA participate in the cache coherency protocol, so you don't need to worry about it. (This is not true on many other architectures!)

Most MMIO needs to be UC. The other memory types allow the CPU to insert or remove reads and writes, which will confuse most devices.

MMIO that accesses your display adapter's linear framebuffer should usually be WC. This memory type is specifically optimized for the typical framebuffer usage.
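The power-on PAT has no WC slot, so an OS that wants WC through the page tables has to put one there. A rough sketch of computing the new MSR value follows; `pat_set_entry` is a hypothetical helper, and replacing slot 1 (normally WT) is just one possible layout, not a convention. Actually writing the value needs `wrmsr` to MSR 0x277 in ring 0, and every CPU should end up with the same PAT; neither is shown here.

```c
#include <assert.h>
#include <stdint.h>

#define PAT_TYPE_WC 0x01ULL /* write-combining encoding in IA32_PAT */

/* Hypothetical helper: return `pat` with slot `index` (0..7) set to `type`. */
static uint64_t pat_set_entry(uint64_t pat, unsigned index, uint64_t type)
{
    assert(index < 8);
    unsigned shift = index * 8;
    pat &= ~(0xFFULL << shift);     /* clear the old slot */
    pat |= (type & 0x07) << shift;  /* only the low 3 bits are defined */
    return pat;
}
```

Starting from the default value 0x0007040600070406, setting slot 1 to WC gives 0x0007040600070106. A 4 KiB PTE with PWT set and PCD and PAT clear then selects WC for the framebuffer mapping.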

I'm not aware of any situations where WT or WP are useful, but I'm sure Intel had some reason to include them...

thewrongchristian · Post by **thewrongchristian** » Mon Jul 31, 2023 4:59 pm

devc1 wrote: Do I need to map the page as uncached every time I access a device, or can I just map it as write-through and have the CPU notice any updates the device writes to memory?

Or, best of all, can I use write-back with devices?

Most modern OSes have a file cache, with the file vnode and file offset as the cache key. Writes to files go to the page cache at this level, operating in write-back mode, before they even reach the filesystem code.

The filesystem itself communicates with the block device using a separate buffer mechanism.

The problem is co-ordinating between the cache and the buffer layer. They operate in separate address spaces, the file cache using the vnode/file offset as the cache key, which is translated by the filesystem to the device offset.

When a device containing a filesystem is mounted, the OS should prevent access to the device in a manner that could undermine the filesystem. So it might be that the device would allow, for example, read-only access to the device, but writes to the device would be blocked.

The benefit of this cache arrangement is that the virtual memory system and virtual file system can be unified, with file reads/writes and virtual memory page resolution using the same cache, keeping open files and memory mapped files synchronised.

Older UNIX implementations, such as SVR2, implemented the cache at the device block level, underneath the filesystem. Any file access would have to go through the filesystem before being resolved to the device buffer cache (if cached.) It's also harder to keep file based access and memory mapped access coherent with a device buffer cache, as memory mapped files redundantly copy data from the filesystem to a page cache (though I'm not certain SVR2 actually had memory mapped files.)

What are you actually trying to achieve?

*edit*

Sigh, I just read Octocontrabass's response, then re-read your initial question. Ignore me, I misread what you were asking.

devc1 · Post by **devc1** » Mon Jul 31, 2023 6:25 pm

So from what I understand, on x86_64 I should use:

Cache disable on most MMIO (like PCI or ACPI?)
Write-back on DMA devices (most PCI devices)
Write-combining on the framebuffer

Another question: should I use cache disable for spinlocks when there are multiple processors? I guess yes on SMP systems, and no need on ccNUMA systems?

Octocontrabass · Post by **Octocontrabass** » Mon Jul 31, 2023 9:32 pm

devc1 wrote: Cache disable on most MMIO (like PCI or ACPI?)
Write-back on DMA devices (most PCI devices)
Write-combining on the framebuffer

Mostly? Make sure you're clear on the distinction between MMIO and RAM. MMIO usually needs to be UC, RAM should always be WB.

devc1 wrote: Another question: should I use cache disable for spinlocks when there are multiple processors? I guess yes on SMP systems, and no need on ccNUMA systems?

Spinlocks are in RAM, which means they should be WB like all other RAM.

devc1 · Post by **devc1** » Tue Aug 01, 2023 5:46 am

You know, I've been doing this for years, so I should know. Thanks for mentioning it, by the way.

I think what you meant by distinguishing MMIO from RAM is that ACPI memory is actual RAM. Correct me if I'm wrong!

And about spinlocks, I'm talking about spinlocks in a multiprocessor system. If spinlocks are WB, two processors may enter at the same time, because they both read their own cache and think the spinlock is free. Or is there some cache coherency protocol between CPUs? I think that's only available with ccNUMA.

So should I use UC with spinlocks?

I figured out recently that I should probably use UC with MMIO, and WB with RAM used for DMA, since it is cache coherent.

Octocontrabass · Post by **Octocontrabass** » Tue Aug 01, 2023 11:23 am

devc1 wrote: I think what you meant by distinguishing MMIO from RAM is that ACPI memory is actual RAM. Correct me if I'm wrong!

ACPI tables are located in RAM, but you may need to interact with MMIO when you interpret the contents of those tables.

devc1 wrote: And about spinlocks, I'm talking about spinlocks in a multiprocessor system.

I am too. CPUs also participate in the cache coherency protocol, so the RAM containing your spinlock can be WB the same as the rest of your RAM. (Ensuring the correct order of operations is a separate issue - that requires either assembly or compiler built-in functions.)
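As an illustration of the "compiler built-in functions" route, here is a minimal test-and-set spinlock sketch using C11 `<stdatomic.h>`. The type and function names (`spinlock_t`, `spin_lock`, `spin_trylock`, `spin_unlock`) are invented for this example. The lock lives in ordinary WB memory; the atomic read-modify-write and the acquire/release ordering are what make it correct, not the cache type.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type: a single atomic flag in ordinary WB RAM. */
typedef struct {
    atomic_flag locked;
} spinlock_t;

#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Spin until the flag was previously clear. Acquire ordering keeps the
 * critical section from being reordered before the lock is taken. */
static void spin_lock(spinlock_t *l)
{
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire)) {
        /* busy-wait; a real kernel would likely add a pause hint here */
    }
}

/* Single attempt: returns true if the lock was acquired. */
static bool spin_trylock(spinlock_t *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

/* Release ordering makes the critical section's writes visible before
 * another CPU can observe the lock as free. */
static void spin_unlock(spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

On x86 the test-and-set typically compiles to a locked exchange, and the coherency protocol ensures only one CPU sees the flag as previously clear.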

devc1 · Post by **devc1** » Tue Aug 01, 2023 12:27 pm

And why would there be a cache-coherent NUMA? I'm thinking about implementing it.

However, thanks for mentioning it. I used UC a lot.

So for DMA I should use WB, and for MMIO with devices such as AHCI, EHCI, and VMSVGA I should use UC?

Octocontrabass · Post by **Octocontrabass** » Tue Aug 01, 2023 1:07 pm

devc1 wrote: And why would there be a cache-coherent NUMA?

That's how x86 was designed. It's always cache-coherent.

devc1 wrote: So for DMA I should use WB, and for MMIO with devices such as AHCI, EHCI, and VMSVGA I should use UC?

Devices like AHCI and EHCI use both DMA and MMIO. But yes, ordinary RAM including RAM used for DMA should be WB, and MMIO should be UC.
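The rule of thumb from this thread can be written down as a small policy function. This is only a sketch: `region_kind` and `cache_bits_for` are hypothetical names, and the framebuffer case assumes the OS has reprogrammed PAT slot 1 from WT to WC, which is not the power-on state.

```c
#include <stdint.h>

/* Hypothetical classification of what a mapping points at. */
enum region_kind {
    REGION_RAM,         /* ordinary RAM, including DMA buffers */
    REGION_MMIO,        /* device registers */
    REGION_FRAMEBUFFER  /* linear framebuffer */
};

/* Cache-control bits in a 4 KiB page table entry. */
#define PTE_PWT (1ULL << 3)
#define PTE_PCD (1ULL << 4)

/* Return the PTE cache bits to OR into a mapping of the given kind. */
static uint64_t cache_bits_for(enum region_kind kind)
{
    switch (kind) {
    case REGION_RAM:
        return 0;                 /* PAT slot 0: WB */
    case REGION_MMIO:
        return PTE_PCD | PTE_PWT; /* PAT slot 3: UC */
    case REGION_FRAMEBUFFER:
        return PTE_PWT;           /* PAT slot 1: WC only under the assumption above */
    }
    return PTE_PCD | PTE_PWT;     /* unknown kind: fall back to UC */
}
```

A DMA buffer for AHCI or EHCI would be mapped with `REGION_RAM`, while the same controller's register BAR would use `REGION_MMIO`.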

OSDev.org
