Microkernels and DMA

OSwhatever · Post by **OSwhatever** » Wed Jan 20, 2021 4:42 am

DMA is a bit of an outcast in microkernels, where the design is a limiting factor for using DMA efficiently. Many times DMA is part of a driver that needs to set up DMA jobs quickly. If the DMA driver were an isolated driver in its own process, that would severely degrade performance, as there would be a significant delay from using IPC to set up DMA jobs.

The question is what a good approach for DMA is in such systems. One possibility is to move the DMA driver into the kernel, where it can be operated through system calls, which are in general faster than IPC. Some polite HW DMA blocks have each channel mapped on a separate 4K page, which makes it possible to map each channel into a separate process. However, is this something that can be assumed when you design a system?

What do you think is a good approach for DMA in microkernel systems?

AndrewAPrice · Post by **AndrewAPrice** » Wed Jan 20, 2021 8:52 am

When I get around to DMA, I'm going to implement a syscall that is: "allocate a block of [x] physical pages below [y], and this block can't cross boundary [z]".

Then with ISA, x = <pages needed>, y = 16MB, z = 64KB.

I'd start scanning for free memory at min(y, system memory), scanning backwards.

16MB of RAM is not much, so it's possible the first 16MB is already allocated. So I'd like to keep a map of physical memory recording which pages are 'pinned' (used for IO and have to stay where they are) and which are 'unpinned'. Unpinned memory can be moved out of the way if we update the owning process's page tables.

The syscall would return 2 pointers - the virtual memory address (that the process can use for reading/writing), and a physical memory address (that we can send to the hardware.)
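The search behind such a syscall could look roughly like the sketch below. It is only an illustration under assumptions that are not in the post: a plain `bool` bitmap (`struct phys_map`) as the physical memory map, 4K pages, and a hypothetical helper name `find_dma_pages`. Pinning, moving unpinned pages and the virtual mapping are left out; the physical address would be the returned page index times `PAGE_SIZE`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Hypothetical physical-page map: used[i] == true means page i is taken. */
struct phys_map {
    bool  *used;
    size_t page_count;
};

/*
 * Find `count` contiguous free pages that lie entirely below `limit` bytes
 * and do not cross a multiple of `boundary` bytes. `boundary` is assumed to
 * be a multiple of PAGE_SIZE. The scan starts at the highest candidate and
 * moves downwards. Returns the index of the first page, or SIZE_MAX if no
 * block fits.
 */
size_t find_dma_pages(const struct phys_map *map, size_t count,
                      uint64_t limit, uint64_t boundary)
{
    size_t limit_pages = (size_t)(limit / PAGE_SIZE);
    if (limit_pages > map->page_count)
        limit_pages = map->page_count;          /* min(y, system memory) */
    size_t pages_per_window = (size_t)(boundary / PAGE_SIZE);

    if (count == 0 || count > limit_pages || count > pages_per_window)
        return SIZE_MAX;

    for (size_t start = limit_pages - count + 1; start-- > 0; ) {
        size_t last = start + count - 1;
        if (start / pages_per_window != last / pages_per_window)
            continue;                           /* would cross boundary z */

        bool free_run = true;
        for (size_t p = start; p <= last; p++) {
            if (map->used[p]) {
                free_run = false;
                break;
            }
        }
        if (free_run)
            return start;
    }
    return SIZE_MAX;
}
```

For ISA this would be called with `limit` = 16MB and `boundary` = 64KB, matching the x/y/z values above.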

------

It might be useful to keep an allowlist of processes allowed to call this syscall. One may be an "ISA DMA" process that acts as a resource manager, and individual drivers would talk to it to find a free ISA DMA channel so you're not stepping on another driver's toes.

I don't see why this would be slow. You can give your individual driver permission to speak to the needed IO ports, or maybe you can focus on improving your IPC latency?

bzt · Post by **bzt** » Wed Jan 20, 2021 3:45 pm

OSwhatever wrote: DMA is a bit of an outcast in microkernels, where the design is a limiting factor for using DMA efficiently. Many times DMA is part of a driver that needs to set up DMA jobs quickly. If the DMA driver were an isolated driver in its own process, that would severely degrade performance, as there would be a significant delay from using IPC to set up DMA jobs.

I don't think this is an issue. First of all, you're using DMA because the operation you're about to perform is going to be slow, so performance doesn't matter. A little overhead in setting up DMA is not a big deal (compared to the time of the actual peripheral operation), plus it only has to be done once: your drivers are going to use the same DMA buffer over and over again.

OSwhatever wrote: The question is what a good approach for DMA is in such systems. One possibility is to move the DMA driver into the kernel, where it can be operated through system calls, which are in general faster than IPC. Some polite HW DMA blocks have each channel mapped on a separate 4K page, which makes it possible to map each channel into a separate process. However, is this something that can be assumed when you design a system?

What do you think is a good approach for DMA in microkernel systems?

IMHO simply don't care. Do it the same way as you would with a monolithic kernel. DMA writes to physical memory, which has to be mapped for the CPU. In a monolithic kernel that mapping is in kernel space. In a microkernel you simply map each buffer into its corresponding task's address space (using a syscall); otherwise everything is the same:
1. the driver starts the operation and blocks,
2. when the peripheral finishes it informs the kernel,
3. the kernel puts the driver into the active queue from the blocked queue
4. the awakened driver task can now read the buffer
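
The four steps above can be sketched as a tiny state machine. This is only an illustration: `struct driver_task`, the state names and the three functions are made up for the example, and a real kernel would move the task between scheduler queues instead of flipping a field.

```c
#include <stdbool.h>

enum task_state { TASK_RUNNING, TASK_BLOCKED, TASK_READY };

/* Hypothetical, minimal view of a driver task as the scheduler sees it. */
struct driver_task {
    enum task_state state;
    bool            dma_pending;   /* a transfer is still in flight */
};

/* Step 1: the driver has programmed the device and now blocks. */
void driver_start_and_block(struct driver_task *t)
{
    t->dma_pending = true;
    t->state = TASK_BLOCKED;
}

/*
 * Steps 2 and 3: the completion interrupt reaches the kernel, which moves
 * the task from blocked to ready. Returns true if a task was woken, false
 * for a spurious interrupt.
 */
bool kernel_on_dma_complete(struct driver_task *t)
{
    if (t->state != TASK_BLOCKED || !t->dma_pending)
        return false;
    t->dma_pending = false;
    t->state = TASK_READY;
    return true;
}

/* Step 4: the buffer may be read once no transfer is pending. */
bool driver_may_read_buffer(const struct driver_task *t)
{
    return t->state != TASK_BLOCKED && !t->dma_pending;
}
```

Nothing in this flow depends on whether the driver lives in kernel space or in its own process, which is the point being made above.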

AndrewAPrice wrote: 16MB of RAM is not much, so it's possible the first 16MB is already allocated. So I'd like to keep a map of physical memory recording which pages are 'pinned' (used for IO and have to stay where they are) and which are 'unpinned'. Unpinned memory can be moved out of the way if we update the owning process's page tables.

This works, but I use a simpler approach. I simply allocate all of the first 16 MB as early as possible, so that my PMM thinks it is used physical memory and malloc never uses it. Then I pass that entire area to the DMA memory allocator, which records which parts are allocated to which driver.
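
A minimal sketch of what such a DMA memory allocator might look like, assuming a per-page owner table and first-fit search. The names (`owner`, `dma_pool_alloc`, `dma_pool_release_all`) and the 16-bit driver id are assumptions for illustration only, and the sketch ignores the ISA 64KB boundary rule.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE      4096u
#define DMA_POOL_BYTES (16u * 1024u * 1024u)
#define DMA_POOL_PAGES (DMA_POOL_BYTES / PAGE_SIZE)   /* 4096 pages */
#define NO_OWNER       0u

/* owner[i] is the id of the driver holding page i of the reserved area,
 * or NO_OWNER if the page is free. */
static uint16_t owner[DMA_POOL_PAGES];

/*
 * First-fit allocation of `count` contiguous pages for `driver_id`
 * (which must be non-zero). Returns the byte offset of the block within
 * the reserved area, or UINT32_MAX on failure.
 */
uint32_t dma_pool_alloc(uint16_t driver_id, size_t count)
{
    if (driver_id == NO_OWNER || count == 0 || count > DMA_POOL_PAGES)
        return UINT32_MAX;

    size_t run = 0;   /* length of the current run of free pages */
    for (size_t i = 0; i < DMA_POOL_PAGES; i++) {
        run = (owner[i] == NO_OWNER) ? run + 1 : 0;
        if (run == count) {
            size_t first = i + 1 - count;
            for (size_t p = first; p <= i; p++)
                owner[p] = driver_id;
            return (uint32_t)(first * PAGE_SIZE);
        }
    }
    return UINT32_MAX;
}

/* Release every page held by `driver_id`, e.g. when the driver exits. */
void dma_pool_release_all(uint16_t driver_id)
{
    for (size_t i = 0; i < DMA_POOL_PAGES; i++) {
        if (owner[i] == driver_id)
            owner[i] = NO_OWNER;
    }
}
```

Recording the owner per page makes cleanup after a crashed driver a single pass over the table.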

Cheers,
bzt

nexos · Post by **nexos** » Wed Jan 20, 2021 4:16 pm

On ISA DMA, that is a problem. If you skip the SoundBlaster 16 and the Floppy Disk Controller, then you won't need to deal with ISA DMA, and you can use PCI bus mastering, which requires no setup (except setting a bit in the configuration space when starting the driver). Then your driver just requests pages to be allocated, and the VMM does it like normal. If you insist on supporting ISA DMA, then with asynchronous I/O the overhead of setting up ISA DMA is negligible, as the thread performing the I/O request can keep making progress. Also, at the cost of a small security hole, you could allow drivers to set up ISA DMA directly, with no DMA server at all.
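
For reference, the bit in question is the Bus Master Enable bit (bit 2) of the 16-bit PCI command register at configuration space offset 0x04. The sketch below shows the read-modify-write; `struct pci_config_ops` and both function names are hypothetical stand-ins for however your kernel exposes configuration space access to a driver.

```c
#include <stdint.h>

#define PCI_COMMAND_OFFSET     0x04u       /* 16-bit command register */
#define PCI_COMMAND_BUS_MASTER (1u << 2)   /* Bus Master Enable bit */

/* Return the command register value with bus mastering switched on,
 * leaving all other bits untouched. */
uint16_t pci_command_with_bus_master(uint16_t command)
{
    return (uint16_t)(command | PCI_COMMAND_BUS_MASTER);
}

/* Hypothetical accessors for one device's configuration space. */
struct pci_config_ops {
    uint16_t (*read16)(void *dev, uint8_t offset);
    void     (*write16)(void *dev, uint8_t offset, uint16_t value);
};

/* Read-modify-write of the command register; skips the write if the
 * bit is already set. */
void pci_enable_bus_master(const struct pci_config_ops *ops, void *dev)
{
    uint16_t cmd = ops->read16(dev, PCI_COMMAND_OFFSET);
    if (!(cmd & PCI_COMMAND_BUS_MASTER))
        ops->write16(dev, PCI_COMMAND_OFFSET,
                     pci_command_with_bus_master(cmd));
}
```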

OSwhatever · Post by **OSwhatever** » Wed Jan 20, 2021 6:09 pm

bzt wrote: IMHO simply don't care. Do it the same way as you would with a monolithic kernel. DMA writes to physical memory, which has to be mapped for the CPU. In a monolithic kernel that mapping is in kernel space. In a microkernel you simply map each buffer into its corresponding task's address space (using a syscall); otherwise everything is the same:
1. the driver starts the operation and blocks,
2. when the peripheral finishes it informs the kernel,
3. the kernel puts the driver into the active queue from the blocked queue
4. the awakened driver task can now read the buffer

I think I wasn't clear enough when I mentioned 4K IO pages. By 4K pages I meant the IO address space of the HW DMA block (that is, the exposed registers for start/stop and other settings of a channel). In some HW implementations the designers have been smart and put each channel's controls on a separate 4K page so that they can be mapped independently into separate processes. This way each process has direct HW register access (via memory-mapped IO) to its DMA channel. In my opinion this is quite ideal when it comes to speed and versatility. However, this doesn't apply to all DMA blocks I've seen. By DMA block I mean DMA controller (I say DMA block as they are an ASIC block in modern SoCs).
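
Whether a given controller allows this could be checked with something like the sketch below. The `struct dma_block_layout` description and both function names are hypothetical, the addresses are expected to come from the SoC documentation, and the check says nothing about shared or global registers, which would still need to stay with a trusted component.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Hypothetical description of a DMA controller's register layout. */
struct dma_block_layout {
    uint64_t channel_base;     /* physical address of channel 0 registers */
    uint32_t channel_stride;   /* bytes between consecutive channels */
    uint32_t channel_count;
};

/*
 * True if every channel's registers start on a page boundary and channels
 * are spaced by whole pages, so one channel can be mapped into a process
 * without exposing its neighbours.
 */
bool channels_are_page_isolated(const struct dma_block_layout *l)
{
    return l->channel_stride != 0 &&
           l->channel_base % PAGE_SIZE == 0 &&
           l->channel_stride % PAGE_SIZE == 0;
}

/* Physical address of the register page for channel `ch`. Only meaningful
 * when channels_are_page_isolated() returned true and ch < channel_count. */
uint64_t channel_page(const struct dma_block_layout *l, uint32_t ch)
{
    return l->channel_base + (uint64_t)ch * l->channel_stride;
}
```

When the check fails, the channels share a page, and the controller has to be driven by a single trusted component (the kernel or a DMA server) instead.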
