QEMU runs my OS slower with KVM than without (on AMD?)

pjht
Posts: 5
Joined: Mon Jun 24, 2024 8:01 am
GitHub: https://gitea.pterpstra.com

QEMU runs my OS slower with KVM than without (on AMD?)

Post by pjht »

As of late, QEMU runs my OS significantly slower when I enable KVM than when KVM is disabled. It's definitely specific to my OS, as a Linux QEMU guest runs just fine with KVM enabled and is slower without KVM, as expected.
I'm not sure why, though I have two theories. One, that it might be an AMD issue, as it doesn't happen on my Intel laptop and I only noticed it after changing from Intel to AMD on my desktop. Two, that it has something to do with the graphics display output, specifically updates. My reasoning for the second is that, with KVM enabled, the shell prompt arrives ~100x sooner than the last lines of the boot log finish displaying.

This is the command line I normally use to start QEMU:

qemu-system-x86_64 -nodefaults -m 4G --no-reboot -machine type=q35,accel=kvm,sata=false,smbus=false -bios <path to OVMF> -serial vc -monitor stdio -device ahci,id=ahci -device VGA -blockdev driver=raw,node-name=disk,file.driver=file,file.filename=<path to root disk> -device ide-hd,drive=disk,bus=ahci.0
and I can reproduce it with this slightly narrowed-down version:

qemu-system-x86_64 -m 4G -machine type=q35,accel=kvm -bios <path to OVMF> -device ahci,id=ahci -blockdev driver=raw,node-name=disk,file.driver=file,file.filename=<path to root disk> -device ide-hd,drive=disk,bus=ahci.0
All the code is here: https://gitea.pterpstra.com/mikros. I can share a root image if anyone wants since it's not trivial to build, but not sure if that would be useful since I at least can only recreate it on one of my devices.

Not sure if this is the right sub-forum or even forum overall to ask this in but didn't know where else to put this, also brand new to posting here. Thanks for any help. Am utterly lost myself.
Octocontrabass
Member
Posts: 5722
Joined: Mon Mar 25, 2013 7:01 pm

Re: QEMU runs my OS slower with KVM than without (on AMD?)

Post by Octocontrabass »

What happens if you use a serial port or debug console for output instead of the display? Accessing the framebuffer might have more overhead under hardware virtualization if the hypervisor is trapping accesses to know when the guest is updating the display, and you might be accessing the framebuffer more often than necessary.
pjht
Posts: 5
Joined: Mon Jun 24, 2024 8:01 am
GitHub: https://gitea.pterpstra.com

Re: QEMU runs my OS slower with KVM than without (on AMD?)

Post by pjht »

The serial port runs quite a bit faster, which is part of why I suspected the display initially. Actually, it is definitely the display: the way I ended up testing it just now gave me a good comparison between KVM and non-KVM log timing, and everything else has a 2x-10x speedup with KVM like I would expect. Not sure why I didn't do it that way before. Display updates have (or had, see next paragraph) a pseudo double buffer, so I'm fairly efficient on updates: I draw into a local copy of the framebuffer and then copy that into the actual hardware buffer.

I realized after typing that that I could do real hardware double buffering with the Bochs VBE extensions, and I quickly did that, but it still makes no difference. I am still copying the full draw buffer, but I would think that since the target is a non-visible portion of VRAM, the host shouldn't need to trap accesses to it. Besides, I have to do full updates once scrolling starts, so partial updates only help for so long.

I still think it has something to do with the way I run QEMU, though. Something is going on when the brand of my host CPU determines the speed of the VM to this extent. I am trying to figure out virtio so I can try the virtio-gpu device, since that seems to have a proper "update all at once" signal to the host. It is on the more complex end of the drivers I've written, though.
Demindiro
Member
Posts: 111
Joined: Fri Jun 11, 2021 6:02 am
Libera.chat IRC: demindiro
Location: Belgium

Re: QEMU runs my OS slower with KVM than without (on AMD?)

Post by Demindiro »

Given you're testing on a VM it's probably not relevant, but: what cache attributes are you using?

I recall using SSE2 streaming stores (core::arch::{_mm_stream_si32, _mm_stream_si128}), which proved quite fast on real hardware, before Octocontrabass pointed out I had misconfigured my cache attributes. After fixing that, display updates became practically instant (aside from the very occasional cursor glitch, which I never figured out and was just a minor visual issue anyway).

There are two locations you need to check: the page table and a certain register (msr::IA32_PAT, after checking my own code). I forgot the exact details though.
My OS is Norost B (website, Github, sourcehut)
My filesystem is NRFS (Github, sourcehut)
^ defunct
pjht
Posts: 5
Joined: Mon Jun 24, 2024 8:01 am
GitHub: https://gitea.pterpstra.com

Re: QEMU runs my OS slower with KVM than without (on AMD?)

Post by pjht »

Caching was set to write-through, but configuring the PAT and setting the wiki-recommended write-combining type for the framebuffer had no effect.
Is the 128-bit SSE2 store worth trying over just a u8 copy_from_nonoverlapping? I'd try it now, but I'm currently on my laptop, which doesn't reproduce the issue.
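
For anyone following along, here is a rough sketch of the two places mentioned above (the IA32_PAT register and the page-table bits). It is not taken from either OS in this thread. The helper names `pat_with_entry` and `pte_cache_bits_4k` are made up for illustration, and the choice of PAT entry 4 for write-combining is just one common convention, so treat that as an assumption. Actually writing the MSR needs a privileged `wrmsr`, which is left out so the sketch stays runnable in user space.

```rust
/// Memory-type encodings used inside IA32_PAT entries.
pub const PAT_UC: u64 = 0x00; // uncacheable
pub const PAT_WC: u64 = 0x01; // write-combining
pub const PAT_WT: u64 = 0x04; // write-through
pub const PAT_WB: u64 = 0x06; // write-back
pub const PAT_UC_MINUS: u64 = 0x07; // uncacheable, overridable by MTRRs

/// MSR number of IA32_PAT.
pub const IA32_PAT: u32 = 0x277;
/// Power-on default: entries 0..3 = WB, WT, UC-, UC, repeated for 4..7.
pub const PAT_DEFAULT: u64 = 0x0007_0406_0007_0406;

/// Return `pat` with entry `index` (0..=7) replaced by `mem_type`.
/// (Hypothetical helper; the kernel would then `wrmsr` the result.)
pub fn pat_with_entry(pat: u64, index: u32, mem_type: u64) -> u64 {
    assert!(index < 8, "the PAT has eight entries");
    let shift = index * 8; // one byte per entry
    (pat & !(0xFFu64 << shift)) | ((mem_type & 0x07) << shift)
}

/// Flag bits to OR into a 4 KiB page-table entry so that it selects PAT
/// entry `index`. The index is formed as PAT:PCD:PWT.
/// Note: for 2 MiB / 1 GiB mappings the PAT bit is bit 12, not bit 7.
pub fn pte_cache_bits_4k(index: u32) -> u64 {
    const PWT: u64 = 1 << 3;
    const PCD: u64 = 1 << 4;
    const PAT: u64 = 1 << 7;
    assert!(index < 8, "the PAT has eight entries");
    let mut bits = 0;
    if index & 0b001 != 0 { bits |= PWT; }
    if index & 0b010 != 0 { bits |= PCD; }
    if index & 0b100 != 0 { bits |= PAT; }
    bits
}
```

With these, `pat_with_entry(PAT_DEFAULT, 4, PAT_WC)` gives the value to load into the MSR on every CPU, and `pte_cache_bits_4k(4)` gives the bits to set on each framebuffer page. Whether this changes anything for an emulated VGA device under KVM is a separate question.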
Demindiro
Member
Posts: 111
Joined: Fri Jun 11, 2021 6:02 am
Libera.chat IRC: demindiro
Location: Belgium

Re: QEMU runs my OS slower with KVM than without (on AMD?)

Post by Demindiro »

If the issue is due to the VM trapping every write access, then using 128-bit over 8-bit stores is definitely worth it, since it'll result in 128/8=16 times fewer traps.

EDIT: That said, if you're using compiler-builtins then chances are it's already using rep movsq (and rep movsb to align the destination and handle the tail), which uses 64-bit load/stores unless you set the ermsb target feature, which will replace memcpy with just rep movsb.
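
A minimal sketch of what a 128-bit streaming copy could look like, in case it helps. The function name `copy_streaming` is made up, and it works on plain slices so it can be tried in user space. A real framebuffer copy would work on the mapped VRAM pointer instead. `_mm_stream_si128` needs a 16-byte-aligned destination, so the unaligned head and the tail are copied normally.

```rust
#[cfg(target_arch = "x86_64")]
use core::arch::x86_64::{__m128i, _mm_loadu_si128, _mm_sfence, _mm_stream_si128};

/// Copy `src` into `dst` using 16-byte non-temporal stores where possible.
/// (Hypothetical helper, not from any code in this thread.)
#[cfg(target_arch = "x86_64")]
pub fn copy_streaming(dst: &mut [u8], src: &[u8]) {
    assert_eq!(dst.len(), src.len(), "buffers must be the same size");

    // Copy bytes normally until `dst` reaches a 16-byte boundary.
    let head = dst.as_ptr().align_offset(16).min(dst.len());
    dst[..head].copy_from_slice(&src[..head]);

    // Number of bytes that can be moved in whole 16-byte chunks.
    let body = (dst.len() - head) / 16 * 16;

    // SAFETY: `head + body <= len` for both slices, the destination pointer
    // is 16-byte aligned after `head`, and the source uses unaligned loads.
    unsafe {
        let d = dst.as_mut_ptr().add(head) as *mut __m128i;
        let s = src.as_ptr().add(head) as *const __m128i;
        for i in 0..body / 16 {
            _mm_stream_si128(d.add(i), _mm_loadu_si128(s.add(i)));
        }
        // Make the non-temporal stores visible before any later stores.
        _mm_sfence();
    }

    // Copy whatever is left after the last full chunk.
    let tail = head + body;
    dst[tail..].copy_from_slice(&src[tail..]);
}

/// Fallback for other architectures: an ordinary copy.
#[cfg(not(target_arch = "x86_64"))]
pub fn copy_streaming(dst: &mut [u8], src: &[u8]) {
    dst.copy_from_slice(src);
}
```

If every store really is trapped, the win comes purely from issuing fewer, wider stores. The non-temporal hint itself only matters on real hardware.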
pjht
Posts: 5
Joined: Mon Jun 24, 2024 8:01 am
GitHub: https://gitea.pterpstra.com

Re: QEMU runs my OS slower with KVM than without (on AMD?)

Post by pjht »

I tried the 128-bit stores and they do help, giving about a 2x speedup. Given your edit, which I only saw just now, that makes sense.

I probably should implement output buffering for stdout/stderr. Given the way Rust does output, I get a bunch of little writes to the screen, and with display output as slow as it is, each write currently costs basically the same regardless of size. It still feels like patching over a problem, though. The 400ms per log line I get on average right now is ridiculous even with no buffering, and quite literally 99% of that is the framebuffer update. Disable framebuffer updates and it's 500us per write, down from ~60ms.
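
The buffering I have in mind would look roughly like this. It is only a sketch: `LineBuffered` is a made-up name, it assumes `Vec` is available (so `alloc` in a kernel), and the sink closure stands in for whatever actually draws text and updates the framebuffer.

```rust
/// Collects small writes and hands them to `sink` one batch at a time,
/// so the expensive display update runs once per line, not once per write.
pub struct LineBuffered<F: FnMut(&[u8])> {
    buf: Vec<u8>,
    sink: F,
}

impl<F: FnMut(&[u8])> LineBuffered<F> {
    pub fn new(sink: F) -> Self {
        Self { buf: Vec::new(), sink }
    }

    /// Append `data`; if the buffer now contains a newline, pass everything
    /// up to and including the last newline to the sink in a single call.
    pub fn write(&mut self, data: &[u8]) {
        self.buf.extend_from_slice(data);
        if let Some(pos) = self.buf.iter().rposition(|&b| b == b'\n') {
            (self.sink)(&self.buf[..=pos]);
            self.buf.drain(..=pos);
        }
    }

    /// Push out any partial line (for example before waiting for input).
    pub fn flush(&mut self) {
        if !self.buf.is_empty() {
            (self.sink)(&self.buf);
            self.buf.clear();
        }
    }
}
```

A prompt without a trailing newline would need an explicit `flush()` before reading input, the same as with line-buffered stdout elsewhere.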
Demindiro
Member
Posts: 111
Joined: Fri Jun 11, 2021 6:02 am
Libera.chat IRC: demindiro
Location: Belgium

Re: QEMU runs my OS slower with KVM than without (on AMD?)

Post by Demindiro »

Perhaps you could also update the framebuffer only every 1/"FPS" seconds, which will be a benefit with all programs regardless of stdout/err buffering (the latter usually being unbuffered).
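
Something along these lines, as a rough sketch. `FrameLimiter` and its methods are names I made up, and the microsecond timestamps are assumed to come from whatever monotonic clock the kernel has.

```rust
/// Decides when a dirty framebuffer should be copied to the display,
/// allowing at most one copy per 1/FPS seconds.
pub struct FrameLimiter {
    interval_us: u64,
    last_flush_us: Option<u64>,
    dirty: bool,
}

impl FrameLimiter {
    pub fn new(fps: u64) -> Self {
        assert!(fps > 0, "fps must be non-zero");
        Self { interval_us: 1_000_000 / fps, last_flush_us: None, dirty: false }
    }

    /// Call after every draw into the back buffer.
    pub fn mark_dirty(&mut self) {
        self.dirty = true;
    }

    /// Returns true when the caller should copy the back buffer now.
    /// `now_us` is a monotonic timestamp in microseconds.
    pub fn poll(&mut self, now_us: u64) -> bool {
        if !self.dirty {
            return false;
        }
        let due = match self.last_flush_us {
            None => true, // nothing has been shown yet
            Some(t) => now_us.saturating_sub(t) >= self.interval_us,
        };
        if due {
            self.last_flush_us = Some(now_us);
            self.dirty = false;
        }
        due
    }
}
```

`poll` would also have to be called from a timer tick, not only from the write path, otherwise the last write before the program goes idle never reaches the screen.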
pjht
Posts: 5
Joined: Mon Jun 24, 2024 8:01 am
GitHub: https://gitea.pterpstra.com

Re: QEMU runs my OS slower with KVM than without (on AMD?)

Post by pjht »

Not a bad idea, but I'm not sure it's too helpful for this: 60ms framebuffer updates mean ~16 FPS, which is quite slow. The lag at that frame rate is very noticeable even when typing characters. It would work better if updates took 10-30ms and the slowdown came from lots of really tiny writes in quick succession, since then I could maintain a steady 30-60 FPS.

It would be a mild improvement over what I have now during the log output at least. I may attempt implementing it. Would need extra kernel support but worth a shot.

I suppose that means I only need another 2x speedup though... 256-bit writes might do the trick, but I'd need to add AVX support to my kernel. It limits my processor support too, since SSE2 is required for x86_64 but AVX isn't.

EDIT: I could of course do a runtime AVX check and fall back to SSE2. And if it is in fact something to do with modern AMD CPUs as a VM host, they'd definitely have AVX, so I would always have the extra speed when needed.
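
The check itself would be something like the sketch below. The names `avx_usable`, `pick_copy_width` and `CopyWidth` are made up for illustration. The two inputs are assumed to come from CPUID leaf 1 (the ECX register) and from XGETBV with ECX=0, and in a kernel, OSXSAVE and XCR0 only report AVX state once the kernel has enabled them itself.

```rust
/// CPUID.(EAX=1):ECX feature bits.
const ECX_OSXSAVE: u32 = 1 << 27; // XGETBV is enabled (CR4.OSXSAVE set)
const ECX_AVX: u32 = 1 << 28; // CPU implements AVX

/// XCR0 bits that must both be set before AVX instructions may be used:
/// bit 1 = SSE (XMM) state, bit 2 = AVX (upper YMM) state.
const XCR0_SSE_AVX: u64 = 0b110;

/// True if AVX is both present and enabled for use.
pub fn avx_usable(cpuid1_ecx: u32, xcr0: u64) -> bool {
    (cpuid1_ecx & ECX_AVX) != 0
        && (cpuid1_ecx & ECX_OSXSAVE) != 0
        && (xcr0 & XCR0_SSE_AVX) == XCR0_SSE_AVX
}

/// Which store width the framebuffer copy should use.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CopyWidth {
    /// 128-bit stores; always available on x86_64.
    Sse2,
    /// 256-bit stores; only when AVX is usable.
    Avx,
}

/// Pick the widest usable store, falling back to SSE2.
pub fn pick_copy_width(cpuid1_ecx: u32, xcr0: u64) -> CopyWidth {
    if avx_usable(cpuid1_ecx, xcr0) {
        CopyWidth::Avx
    } else {
        CopyWidth::Sse2
    }
}
```

The choice would be made once at boot and stored, so the copy routine doesn't redo the check on every update.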