Not all hardware supports MSI or MSI-X, but moving interrupts between cores is a lot easier with MSI and MSI-X. Currently, I have no devices operating with MSI-X, so I only move IRQs between cores for MSI-based devices.

bellezzasolo wrote:
Yeah, the crucial thing there was to support a per-core APIC timer, which is used for preemption. That said, the IRQ setup code doesn't need to know in most cases; all that is handled by the PCI MSI(-X) layer and interrupt dispatcher. That still needs fleshing out, though: currently interrupts are assigned to the CPU running the initialisation procedure, which is the BSP. The idea is that the OS handles load balancing and such, and all the driver has to do is assign a particular function and parameter to a particular PCI MSI. Generally the parameter is an opaque class pointer, and the function is a static that just casts the pointer back to the class and calls a member function to handle the interrupt.

rdos wrote:
So you have the IDT per CPU core? That's an interesting idea, but you would need some common exceptions to be shared.
Given that IRQs are a limited resource, and some modern devices want a lot of them, having the IDT per core might better support such devices. However, that means the IRQ setup code must know which core the IRQ is supposed to occur on, and that core must of course be running. I usually default all IRQs to the BSP, and only start additional cores when load goes up. I then move both IRQs and server procedures to other cores when they use a lot of CPU time.
A possible strategy to keep IRQs dynamic is to allocate IRQs as private on BSP, and when they are moved to another core, deallocate on BSP and reallocate on the new core.
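To make the "deallocate on BSP, reallocate on the new core" idea concrete, here is a minimal C++ sketch of per-core vector bookkeeping. All names (`CoreVectors`, `IrqBinding`, `move_irq`, `kMaxCores`) are hypothetical and not taken from either OS, and the sketch only tracks which vectors are in use. It allocates on the new core before releasing the old vector, so a failed move leaves the existing binding untouched; installing the handler and reprogramming the device are left out.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstdint>
#include <optional>

constexpr int kMaxCores = 4;        // hypothetical core count
constexpr int kFirstVector = 0x20;  // 0x00-0x1F are reserved for CPU exceptions
constexpr int kVectorCount = 256;

// Per-core record of which IDT vectors are currently in use.
struct CoreVectors {
    std::bitset<kVectorCount> used;

    // Returns the lowest free vector on this core, or nullopt if none is left.
    std::optional<std::uint8_t> allocate() {
        for (int v = kFirstVector; v < kVectorCount; ++v) {
            if (!used[v]) {
                used[v] = true;
                return static_cast<std::uint8_t>(v);
            }
        }
        return std::nullopt;
    }

    void release(std::uint8_t v) {
        assert(used[v]);  // releasing a free vector is a bookkeeping bug
        used[v] = false;
    }
};

// Where one device interrupt is currently delivered.
struct IrqBinding {
    int core;
    std::uint8_t vector;
};

// Moves an IRQ to new_core. Returns false (binding unchanged) if the
// target core has no free vector.
bool move_irq(std::array<CoreVectors, kMaxCores>& cores, IrqBinding& irq,
              int new_core) {
    assert(new_core >= 0 && new_core < kMaxCores);
    if (new_core == irq.core) return true;

    std::optional<std::uint8_t> v = cores[new_core].allocate();
    if (!v) return false;

    // A real kernel would install the handler on the new core and
    // reprogram the device's MSI message here (omitted in this sketch).
    cores[irq.core].release(irq.vector);
    irq.core = new_core;
    irq.vector = *v;
    return true;
}
```

Since each core has its own vector space in this scheme, the vector number may or may not change when the IRQ moves; only the (core, vector) pair is meaningful.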
The Intel i210 network chip provides an interesting receive queue scheme where 5 different IRQs can be used, and in the optimal case they should be allocated on different cores.
How that interoperates with user applications requesting resources will be a little fun, but there we go.
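The driver-side pattern bellezzasolo describes (an opaque class pointer plus a static function that casts it back and calls a member) might look roughly like the C++ sketch below. The registration API (`register_msi_handler`, `dispatch`, `MsiSlot`) and the `NicDriver` class are invented for illustration and are not his actual interface; the point is only that the dispatcher stores a plain function pointer and a `void*`, and never needs to know the driver's type.

```cpp
#include <cassert>

// Signature the (hypothetical) interrupt dispatcher expects from drivers.
using IrqFn = void (*)(void* param);

struct MsiSlot {
    IrqFn fn = nullptr;
    void* param = nullptr;
};

constexpr int kSlots = 8;  // hypothetical number of MSI slots
MsiSlot g_slots[kSlots];

// The driver hands over a function and an opaque parameter. Which core
// the interrupt lands on is the OS's business, not the driver's.
void register_msi_handler(int slot, IrqFn fn, void* param) {
    assert(slot >= 0 && slot < kSlots && fn != nullptr);
    g_slots[slot].fn = fn;
    g_slots[slot].param = param;
}

// Called by the low-level interrupt entry code for a given slot.
void dispatch(int slot) {
    assert(slot >= 0 && slot < kSlots);
    if (g_slots[slot].fn != nullptr) {
        g_slots[slot].fn(g_slots[slot].param);
    }
}

class NicDriver {
public:
    void attach(int slot) {
        register_msi_handler(slot, &NicDriver::irq_thunk, this);
    }
    int interrupts_seen() const { return count_; }

private:
    // Static thunk: recover the object from the opaque pointer and
    // forward to the real member function.
    static void irq_thunk(void* param) {
        static_cast<NicDriver*>(param)->on_interrupt();
    }
    void on_interrupt() { ++count_; }

    int count_ = 0;
};
```

A driver would call `attach()` once during initialisation; after that, every `dispatch()` for that slot ends up in `on_interrupt()` on the right object.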
However, using MSI-X on the Intel i210 is compelling, and would require the ability to move MSI-X IRQs. OTOH, since MSI-X allows each IRQ to be individually configured, a better solution might be to allocate one MSI-X IRQ and start one receive queue with one receive thread. Then, as load increases, another receive queue is enabled, its IRQ is tied to another core, and another receive thread is started. In this case the IRQ could be the same, and the difference could be in the target CPU core. I suspect this would cause problems with my scheduler code though. Or maybe this is a good strategy for a majority of MSI-X-capable devices.
I'd rather not go back to a situation where the device decides which core the IRQs are tied to. I want this to be automatic based on load. But I could create an MSI-X mechanism where some MSI-X slots can use the same IRQ on several different cores. This would not require IDTs per core though, which I think I want to avoid (too great a risk of breaking things).
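As a rough illustration of "same IRQ, different cores": on x86 each MSI-X table entry carries its own message address and data, and the destination APIC ID sits in bits 19:12 of the address while the vector sits in the low byte of the data. The C++ sketch below retargets one entry, assuming xAPIC physical destination mode, fixed delivery and an edge-triggered message. `MsixEntry` and `retarget` are hypothetical names, and a real driver would access the table through the device's mapped BAR rather than a plain array.

```cpp
#include <cassert>
#include <cstdint>

// Layout of one MSI-X table entry (16 bytes).
struct MsixEntry {
    std::uint32_t addr_lo;
    std::uint32_t addr_hi;
    std::uint32_t data;
    std::uint32_t vector_control;  // bit 0 = mask
};

constexpr std::uint32_t kMsiAddressBase = 0xFEE00000u;
constexpr std::uint32_t kMsixMaskBit = 1u;

// Points an MSI-X entry at (apic_id, vector). The entry is masked while
// address and data are rewritten so the device never sends a
// half-updated message.
void retarget(volatile MsixEntry& e, std::uint8_t apic_id, std::uint8_t vector) {
    assert(vector >= 0x20);  // 0x00-0x1F are reserved for CPU exceptions

    e.vector_control = e.vector_control | kMsixMaskBit;
    // Destination APIC ID in bits 19:12; RH = 0 and DM = 0 (physical mode).
    e.addr_lo = kMsiAddressBase | (static_cast<std::uint32_t>(apic_id) << 12);
    e.addr_hi = 0;
    // Vector in bits 7:0; delivery mode 000 (fixed), edge-triggered.
    e.data = vector;
    e.vector_control = e.vector_control & ~kMsixMaskBit;
}
```

With a single shared IDT, two receive queues could then use the same vector and differ only in the APIC ID, for example `retarget(table[0], 0, 0x41)` and `retarget(table[1], 3, 0x41)`; the handler would have to work out which queue fired from the core it runs on or from device state.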