CPU bug makes virtually all chips vulnerable

Brendan · Post by **Brendan** » Wed Jan 03, 2018 11:46 pm

Hi,

~ wrote:How do the patches for these vulnerabilities actually work (Meltdown/Spectre)? Are they really just a separation of page tables for kernel and programs?

Currently, for systems without PCID, whenever the kernel is returning to CPL=3 it sets the pages for almost everything in kernel space to "not present", and then when anything causes a switch from CPL=3 back to CPL=0 the kernel restores all of its pages to "present". For systems with PCID, the kernel and the process each have their own "address space ID": when the kernel is returning to CPL=3 it changes the current address space ID to the process' address space ID, and when anything causes a switch from CPL=3 back to CPL=0 the kernel changes the current address space ID back to the kernel's address space ID.

For both cases there's a small piece of the kernel that contains the kernel's entry points and exit points, that can't be protected because it contains the code to adjust the current address space (but the small piece that can't be protected doesn't contain any sensitive data either, so that's not really a problem).

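This works because, while CPL=3 code is running, the CPU has no usable translation for the protected kernel pages, which makes it impossible for the CPU to determine the physical address to speculatively fetch data from.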
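To make the PCID case a little more concrete, here is a minimal sketch of how the value loaded into CR3 could be composed for the two address space IDs. It assumes the x86-64 layout with CR4.PCIDE set (PCID in bits 0-11, top-level table base in bits 12-51, bit 63 asking the CPU to keep TLB entries tagged with that PCID), and it assumes each ID is paired with its own top-level page table. The helper names and the PCID numbers are made up for the example, and the code only builds the values; it does not execute a privileged `mov` to CR3.

```c
#include <assert.h>
#include <stdint.h>

/* CR3 layout on x86-64 when CR4.PCIDE = 1. */
#define CR3_PCID_MASK 0x0000000000000FFFull /* bits 0-11: address space ID (PCID) */
#define CR3_BASE_MASK 0x000FFFFFFFFFF000ull /* bits 12-51: top-level page table */
#define CR3_NOFLUSH   (1ull << 63)          /* keep TLB entries tagged with this PCID */

/* Hypothetical IDs: one for the kernel's view, one for the process's view. */
enum { PCID_KERNEL = 1, PCID_USER = 2 };

/* Build the value a kernel would load into CR3 for one address space. */
static uint64_t make_cr3(uint64_t table_phys, unsigned pcid, int keep_tlb)
{
    assert((table_phys & ~CR3_BASE_MASK) == 0); /* page aligned and in range */
    assert(pcid <= CR3_PCID_MASK);

    uint64_t value = table_phys | pcid;
    if (keep_tlb)
        value |= CR3_NOFLUSH;
    return value;
}

/* Value to load just before returning to CPL=3. */
static uint64_t cr3_for_user(uint64_t user_tables)
{
    return make_cr3(user_tables, PCID_USER, 1);
}

/* Value to load in the entry stub after a switch from CPL=3 to CPL=0. */
static uint64_t cr3_for_kernel(uint64_t kernel_tables)
{
    return make_cr3(kernel_tables, PCID_KERNEL, 1);
}
```

Because the TLB entries are tagged with the ID, switching between the two values does not have to throw away the other side's translations, which is where the saving over the "not present" approach would come from.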

Note: I'm not sure how small the "small piece of the kernel" actually is. Windows is closed source (and hasn't been patched yet?), and I'm too lazy to look at the Linux patches.

Cheers,

Brendan

davidv1992 · Post by **davidv1992** » Thu Jan 04, 2018 2:54 am

For those interested, the papers have been published now at https://spectreattack.com/

Solar · Post by **Solar** » Thu Jan 04, 2018 3:30 am

After quickly skimming the above link (thank you davidv1992):

Sik wrote:
Solar wrote:Apparently there is no way to fix this in hardware / firmware.
AMD is not affected so it's clearly avoidable.

Correction:

AMD is affected by Spectre.

It is unclear if AMD is affected by Meltdown.

The same, by the way, is true for ARM CPUs.

Octacone · Post by **Octacone** » Thu Jan 04, 2018 5:35 am

But what can we (as OS developers) do about it? Is it really that painful to fix?

~ · Post by ~ » Thu Jan 04, 2018 6:01 am

We would probably need to be able to fully reproduce the bugs or we might be solving nothing if we don't fully know what's happening and how to avoid it.

As I said I would start by doing the following:

- Disable the CPU cache fully. Most OSes here don't use the cache due to their beginner-level simplicity. When we learn to use the cache efficiently, we can selectively invalidate the whole cache and wait for the cache flush operation to complete when we don't want data to leak. We could add cases to the kernel, like flushing the cache every 15 or 30 seconds, or when using hashing/cryptography APIs, for the things that would trigger a full cache flush.

- Provide APIs to set the program configuration to invalidate the cache every time it's switched by the multitasking system, and also a flag to invalidate it if it's run as root/administrator.

This measure should be something that can be enabled or disabled for efficiency. In all cases we must understand it, so that we can, for example, come up with highly optimized Assembly tricks that take advantage of the behavior of a CPU and use it to implement robust code that intrinsically avoids this problem by the way it uses the cache, rather than just trying to stop an undesired side effect of the CPU implementation. Always implement for functionality, not merely to patch flaws, which would actually make for poorer code.

Korona · Post by **Korona** » Thu Jan 04, 2018 6:01 am

According to AMD's post on the LKML, AMD CPUs do not perform speculative prefetch from user to supervisor pages, so they are not affected by Meltdown.

There is no absolutely effective software defense against Spectre. We would need ISA updates (e.g. an instruction that invalidates speculative state like the branch prediction buffer). The PoC does not even depend on RDTSC and can read Chrome's address space from JavaScript.

davidv1992 · Post by **davidv1992** » Thu Jan 04, 2018 6:02 am

Based on the articles, for Meltdown we have these options as OS developers:
1) Minimize the attack surface by keeping most of your kernel memory unmapped when running user mode code.
2) Disable the TSC in userspace, which should make the cache miss side channel harder to use and reduce its bandwidth.
3) Disable transaction support, which should do the same (to a point; exploiting branch mispredictions could make this moot) and would allow you to detect trap/exception based Meltdown attacks by their unusually large number of page faults on invalid memory addresses.

The only approach that seems reasonably foolproof to me is 1); the second and third can most likely be worked around by a determined attacker.
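The detection idea in 3) could be sketched roughly as below: a per-process counter that the page fault handler bumps for faults on invalid addresses, with a flag raised when too many arrive inside one time window. The structure, the function name, and both thresholds are hypothetical and only there to show the bookkeeping. As said above, a determined attacker who avoids taking the faults at all would not be caught by this.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical policy: more than FAULT_LIMIT invalid-address faults
 * within WINDOW_TICKS timer ticks marks the process as suspicious. */
#define FAULT_LIMIT  1000u
#define WINDOW_TICKS 100u

struct fault_monitor {
    uint64_t window_start; /* tick at which the current window began */
    uint32_t faults;       /* invalid-address faults seen in this window */
};

/* Call from the page fault handler for faults on unmapped or forbidden
 * addresses. Returns 1 when the process exceeds the limit, else 0. */
static int fault_monitor_record(struct fault_monitor *m, uint64_t now_ticks)
{
    assert(m != 0);

    if (now_ticks - m->window_start >= WINDOW_TICKS) {
        m->window_start = now_ticks; /* start a fresh window */
        m->faults = 0;
    }
    m->faults++;
    return m->faults > FAULT_LIMIT;
}
```

What the kernel does with a flagged process (log it, throttle it, kill it) is a separate policy question that the sketch leaves open.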

Furthermore, from what I understand from the article, it will be rather non-trivial to execute a trap/exception based Meltdown attack from something like JavaScript. If that is the case, an attacker would need to gain execute privileges via some other attack/exploit. Also, Meltdown seems to only affect Intel CPUs for the time being.

Each of the mitigations mentioned above has some amount of cost associated with it. The highest price is paid for 1), because that will cause a large number of TLB misses on switching to the kernel. How big that cost is depends on a number of factors, but in my opinion, the 30% figure is a large overestimate. The more realistic scenarios for frequent switching to kernel mode are most likely I/O bound, in which case the lost performance is largely irrelevant. Furthermore, as mentioned by Brendan, using PCID largely negates the problem of those TLB misses, and should be available on platforms where the I/O speed is sufficient to significantly load the processor.

The more problematic attack is Spectre. This has been demonstrated to work from within a browser, and as far as I can tell there are few countermeasures currently effective in general. It is also able to read any memory that can be read by currently executing code. The one thing that seems to be a recurring trend among most of the side channels suggested for Spectre attacks is that they seem to rely on reasonably accurate timing. Disabling high precision time measurements could perhaps go some way to preventing these, but given that they managed this in JavaScript, I am unsure how one would go about that in practice, and whether this would be enough.
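One way to read "disabling high precision time measurements" is to coarsen whatever timestamps are handed to untrusted code. The sketch below rounds a nanosecond timestamp down to a chosen granularity; the function name and the idea of applying it at the point where the OS or runtime exposes a clock are assumptions for illustration. It does nothing about timers an attacker builds for themselves (for example a counting loop on another core), which is one reason to doubt it would be enough on its own.

```c
#include <assert.h>
#include <stdint.h>

/* Round a nanosecond timestamp down to a multiple of granularity_ns, so
 * that two events closer together than the granularity can report the
 * same time. */
static uint64_t coarsen_timestamp(uint64_t t_ns, uint64_t granularity_ns)
{
    assert(granularity_ns > 0);
    return t_ns - (t_ns % granularity_ns);
}
```

With a granularity of 100000 ns, for instance, timestamps of 1234567 ns and 1299999 ns both come back as 1200000 ns, so a single cached versus uncached memory access can no longer be told apart from one reading of this clock.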

NOTE: This is just my personal analysis. Anything here could be complete bullcrap. DO NOT USE THIS INFORMATION TO MAKE DECISIONS FOR CRITICAL HARDWARE/SOFTWARE SYSTEMS. THE AUTHOR DISCLAIMS, TO THE EXTENT PERMISSIBLE BY LAW, ANY RESPONSIBILITY AND OR CULPABILITY FOR HARM CAUSED BY INACCURACIES IN THE ABOVE INFORMATION.

davidv1992 · Post by **davidv1992** » Thu Jan 04, 2018 6:09 am

~ wrote:We would probably need to be able to fully reproduce the bugs or we might be solving nothing if we don't fully know what's happening and how to avoid it.

As I said I would start by doing the following:

- Disable the CPU cache fully. Most OSes here don't use the cache due to their beginner-level simplicity. When we learn to use the cache efficiently, we can selectively invalidate the whole cache and wait for the cache flush operation to complete when we don't want data to leak. We could add cases to the kernel, like flushing the cache every 15 or 30 seconds, or when using hashing/cryptography APIs, for the things that would trigger a full cache flush.

- Provide APIs to set the program configuration to invalidate the cache every time it's switched by the multitasking system, and also a flag to invalidate it if it's run as root/administrator.

This measure should be something that can be enabled or disabled for efficiency. In all cases we must understand it, so that we can, for example, come up with highly optimized Assembly tricks that take advantage of the behavior of a CPU and use it to implement robust code that intrinsically avoids this problem by the way it uses the cache, rather than just trying to stop an undesired side effect of the CPU implementation. Always implement for functionality, not merely to patch flaws, which would actually make for poorer code.

The solution proposed here won't work. Simply flushing caches won't be enough, as the kernel does not necessarily regain control over the processor between the speculative instructions executing and the code extracting the data executing. On a single core platform, the attacker could simply use transactional memory instructions, and when using multiple cores, extraction of the data can be done in parallel with the speculative execution stream, meaning that the data will have been leaked before the kernel has had any chance to respond.

As for disabling the caches, even simple hobby OSes use these extensively. The cache infrastructure of the processor is mostly transparent to user programs, and even to the kernel, and the performance benefits are ENORMOUS (as in a factor of 100 or more in execution speed these days). While you could use this as a mitigation, the fact that you are using it will most likely relegate you to the amateur space so thoroughly that I would no longer worry about Meltdown anyway.

NOTE: This is just my personal analysis. Anything here could be complete bullcrap. DO NOT USE THIS INFORMATION TO MAKE DECISIONS FOR CRITICAL HARDWARE/SOFTWARE SYSTEMS. THE AUTHOR DISCLAIMS, TO THE EXTENT PERMISSIBLE BY LAW, ANY RESPONSIBILITY AND OR CULPABILITY FOR HARM CAUSED BY INACCURACIES IN THE ABOVE INFORMATION.

~ · Post by ~ » Thu Jan 04, 2018 6:11 am

Wouldn't the intended measures make CPUs consume more power and also stay hotter?

Maybe the situation is more paranoia than an actual problem, and probably there will come a time when only computers that aren't connected to any network/Internet will be truly safe to use for the most irreplaceable data; the rest of the machines could end up always being at risk if they are connected in any way to outside data or code.

Korona · Post by **Korona** » Thu Jan 04, 2018 6:34 am

It seems that the Spectre exploit can be mitigated on Intel by replacing all indirect jumps with the sequence

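push target             # put the real destination on the stack
jmp __trampoline

[...]
__trampoline:
    call 1f             # pushes the address of label 2 as the return address
2:
    lfence              # a speculated return to label 2 is trapped in this loop
    jmp 2b
1:
    lea 8(%rsp), %rsp   # discard the return address pushed by the call
    ret                 # returns to the target pushed before the jmp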

... which is ugly at best and also somewhat inefficient: it introduces two jumps AND prevents branch prediction. It seems that GCC will be patched to use this sequence. People who wrote their OSes in assembly: have fun fixing your jumps.

For calls it gets even uglier as you need a call to a label to push the current RIP before you jump to the trampoline.
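For anyone who finds the stack juggling hard to follow, here is a toy model of the trampoline in C. It is only a sketch under a stated assumption: that the CPU predicts the destination of `ret` from a stack of recent call return addresses rather than from the indirect branch predictor. Real predictors are implementation details, and the struct, the function names, and the symbolic address used for label 2 are all invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Symbolic "address" of label 2 (the lfence loop) in the trampoline. */
#define ADDR_LFENCE_LOOP 2u

struct cpu_model {
    uint64_t stack[8]; /* architectural stack (index grows on push) */
    int sp;
    uint64_t rsb[8];   /* modelled return predictor, filled by calls */
    int rsb_top;
};

static void push(struct cpu_model *c, uint64_t value)
{
    assert(c->sp < 8);
    c->stack[c->sp++] = value;
}

static uint64_t pop(struct cpu_model *c)
{
    assert(c->sp > 0);
    return c->stack[--c->sp];
}

/* A call pushes its return address on the stack and on the predictor. */
static void do_call(struct cpu_model *c, uint64_t return_addr)
{
    assert(c->rsb_top < 8);
    push(c, return_addr);
    c->rsb[c->rsb_top++] = return_addr;
}

struct ret_outcome {
    uint64_t predicted; /* where speculation goes */
    uint64_t actual;    /* where execution really continues */
};

/* Walk through "push target; jmp __trampoline" step by step. */
static struct ret_outcome jump_via_trampoline(uint64_t target)
{
    struct cpu_model c = {0};
    struct ret_outcome r;

    push(&c, target);               /* push target */
    do_call(&c, ADDR_LFENCE_LOOP);  /* call 1f: return address is label 2 */
    (void)pop(&c);                  /* lea 8(%rsp), %rsp: drop that address */
    r.predicted = c.rsb[--c.rsb_top]; /* ret is predicted to go to label 2 */
    r.actual = pop(&c);             /* ret really pops the pushed target */
    return r;
}
```

In this model the predicted destination is always the `lfence` loop and the actual destination is always the pushed target, which is the whole point of the sequence: the speculation is parked somewhere harmless until the real return address is known.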

zaval · Post by **zaval** » Thu Jan 04, 2018 6:45 am

I think hobby OSes could live without any worries about these bugs, since there is not really any harm to "users", due to the absence of such.

Personally, I'm not going to give up mapping the kernel into every process address space, as this is a fundamental part of VM architecture. By the time our OSes are ready to hit the market, Intel won't have these bugs; there will be other ones.

Octacone · Post by **Octacone** » Thu Jan 04, 2018 6:59 am

But why did people even release this information in the first place? We were all fine all these years without any consequences.
If nobody knows that XYZ exists, nobody can interact (in any way) with XYZ. Simple.
Why let hackers know about a bug as crucial as this without fixing it first and then releasing it?

zaval · Post by **zaval** » Thu Jan 04, 2018 7:21 am

^ users didn't want to buy new rocks every year, so they were massaged to get some additional motivation.

davidv1992 · Post by **davidv1992** » Thu Jan 04, 2018 7:36 am

The reason to share information about any vulnerability is twofold. First of all, thinking that the bad guys will never figure it out just because you don't tell them is foolish. Sharing it allows people to be on their guard and implement solutions. Beyond that, opening up the discussion also allows more people to contribute to potential solutions.

Roman · Post by **Roman** » Thu Jan 04, 2018 7:48 am

Octacone wrote:But why did people even release this information in the first place? We were all fine all these years without any consequences.
If nobody knows that XYZ exists, nobody can interact (in any way) with XYZ. Simple.
Why let hackers know about a bug as crucial as this without fixing it first and then releasing it?

> If nobody knows that XYZ exists

Why do you think nobody knows?

Well, what to do in such cases is a debatable topic. If you are interested, it's called full disclosure.
