Breaking ASLR on systems with memory caches

azblue
Member
Posts: 147
Joined: Sat Feb 27, 2010 8:55 pm

Re: Breaking ASLR on systems with memory caches

Post by azblue »

Thank you Brendan and willedwards for your answers :)

willedwards wrote: You can harden your exploit mitigation further by W^X, even for JITed code
It's funny you mentioned W^X in the context of JIT compiling as I was just thinking about that today. I'm trying to figure out how to maintain the security W^X offers while still allowing JIT compiling; my understanding is that the application writes the code to data pages(W) and then asks the OS to change the pages to code pages (X). If the OS always honors this request, then W^X is essentially non-existent (surely an attacker doesn't care if they have to first ask the OS for permission before executing their malicious code if the OS will always grant it).

So the only thing I can figure is that the OS will not always grant permission to change page permissions. Which begs the question, how does the OS differentiate between legitimate requests to change page permissions and malicious requests?

The best I can come up with is that at install time an admin can grant* programs like JIT compilers authority to request changing pages from writable to executable; programs that do not have this permission will be treated as if they're generating a GPF if they try calling this function.

Is there a better approach?

*(I've considered a few such privileges which will be determined at install time).
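
The write-then-flip sequence described in this post can be sketched on a POSIX system with mmap() and mprotect(). This is only an illustration under assumptions: jit_publish is a hypothetical helper name, the code bytes are treated as opaque data and are never executed, and whether the kernel honors the mprotect() call is exactly the policy question raised above.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: copy `len` bytes of generated code into a fresh
 * mapping that is writable but not executable, then flip the mapping to
 * executable but not writable. Returns the mapping, or NULL on failure;
 * on success *mapped_len holds the size to pass to munmap(). */
void *jit_publish(const void *code, size_t len, size_t *mapped_len)
{
    assert(mapped_len != NULL);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || len == 0)
        return NULL;

    /* Round the request up to whole pages. */
    size_t psz = (size_t)page;
    size_t size = (len + psz - 1) / psz * psz;

    /* Phase 1: W but not X. */
    void *buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;
    memcpy(buf, code, len);

    /* Phase 2: X but not W. An OS that restricts this transition
     * would refuse here for programs lacking the privilege. */
    if (mprotect(buf, size, PROT_READ | PROT_EXEC) != 0) {
        munmap(buf, size);
        return NULL;
    }

    *mapped_len = size;
    return buf;
}
```

At no point is the mapping both writable and executable, but nothing in the sketch stops an attacker who can already call jit_publish with their own bytes, which is why the post asks for a per-program privilege on top of it.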
StephanvanSchaik
Member
Posts: 127
Joined: Sat Sep 29, 2007 5:43 pm
Location: Amsterdam, The Netherlands

Re: Breaking ASLR on systems with memory caches

Post by StephanvanSchaik »

Hi,

I also recommend reading the project page for more information.
Brendan wrote:Once upon a time Intel invented a new instruction - "RDTSC". Intel weren't stupid and knew that access to such precise timing would be a security problem, so they also provided a flag in CR4 that an OS can use to prevent CPL=3 code from being able to use the RDTSC instruction, and in the "Volume 3: System Programming Guide" (starting from the very first version that mentioned the time stamp counter) they wrote (highlighting is mine):
Intel in 1995 wrote:A secure operating system would set the TSD flag during system initialization to disable user access to the time stamp counter. An operating system that disables user access to the timestamp counter should emulate the instruction through a user-accessible programming interface.
This same warning (using the exact same wording) has existed in every single version of the Intel manuals since, and still exists today.

Sadly; incompetent morons ignored Intel then and continued to ignore Intel since. The end result is that every 6 months or so (for a period of over 20 years) there's yet another white-paper showing yet another security problem that relies on very precise timing side-channels that could've and should've been impossible.


Cheers,

Brendan
Except that the Javascript implementation of this attack does not even use the RDTSC or RDTSCP instructions at all, and while the native implementation can be configured to use these, it doesn't have to use them either. One of the tricks used by both implementations is to spawn a thread acting as a cycle counter by incrementing a global variable continuously. This was especially useful when testing the native implementation on ARMv7-A and ARMv8-A, where the Performance Monitoring Unit (PMU) is inaccessible from user mode by default. Furthermore, on AMD Bobcat, for instance, the results were actually a lot better when using this approach over the RDTSC and RDTSCP instructions.
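
A minimal sketch of the counting-thread trick, assuming POSIX threads and C11 atomics. The names (ticks, counter_start, counter_read, counter_stop) are made up for illustration and are not taken from the AnC code; the real implementations may differ considerably.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static atomic_uint_fast64_t ticks;   /* the shared "clock" */
static atomic_bool running;
static pthread_t ticker;

/* Spin as fast as possible, bumping the shared counter. */
static void *tick_loop(void *arg)
{
    (void)arg;
    while (atomic_load_explicit(&running, memory_order_relaxed))
        atomic_fetch_add_explicit(&ticks, 1, memory_order_relaxed);
    return NULL;
}

/* Start the counting thread; returns 0 on success. */
int counter_start(void)
{
    atomic_store(&ticks, 0);
    atomic_store(&running, true);
    int err = pthread_create(&ticker, NULL, tick_loop, NULL);
    if (err != 0)
        atomic_store(&running, false);
    return err;
}

/* Read the current tick count; no timer instruction is involved. */
uint64_t counter_read(void)
{
    return atomic_load_explicit(&ticks, memory_order_relaxed);
}

/* Stop the counting thread and wait for it to exit. */
void counter_stop(void)
{
    atomic_store(&running, false);
    int err = pthread_join(ticker, NULL);
    assert(err == 0);
    (void)err;
}
```

To time something, read the counter before and after the operation and take the difference. The resolution depends on how fast the counting thread spins and on it keeping a core to itself, which this sketch does nothing to arrange.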

However, I certainly do agree with you that disabling both the RDTSC and RDTSCP instructions would be a good first step to take to make these attacks a lot less trivial.
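
As one concrete illustration of disabling the instruction: Linux on x86 lets a process turn RDTSC into a faulting instruction for itself through prctl(PR_SET_TSC, ...). The sketch below is Linux-specific and only probes whether that works, restoring the previous state afterwards; tsc_can_be_disabled is a hypothetical name, and an OS that wanted this as a default policy would set the flag itself rather than rely on the process.

```c
#include <assert.h>
#include <sys/prctl.h>

/* Returns 1 if RDTSC could be made to fault for this process, 0 if the
 * kernel refused (for example on non-x86, or if prctl is filtered).
 * The previous setting is restored before returning. */
int tsc_can_be_disabled(void)
{
    int before = 0, during = 0;

    if (prctl(PR_GET_TSC, &before) != 0)
        return 0;
    assert(before == PR_TSC_ENABLE || before == PR_TSC_SIGSEGV);

    /* From here on, executing RDTSC in this process raises SIGSEGV. */
    if (prctl(PR_SET_TSC, (unsigned long)PR_TSC_SIGSEGV) != 0)
        return 0;

    int ok = prctl(PR_GET_TSC, &during) == 0 && during == PR_TSC_SIGSEGV;

    /* Put things back so later code that relies on the TSC still works. */
    prctl(PR_SET_TSC, (unsigned long)before);
    return ok;
}
```

The restore step matters in practice: library code that reads the TSC directly would crash while the restriction is active unless the kernel or runtime emulates it.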
OSwhatever wrote:Is there a good reason cache flush instructions should be available to user space at all?
Do note though that these unprivileged instructions aren't necessary for these attacks. They just make the attacks easier and less time-consuming to carry out. For instance, the native implementation of the AnC attack evicts entire cache sets by allocating a buffer as large as the last-level cache, assuming that it is inclusive and virtually indexed, and by touching all the entries that map to the same cache set. By evicting a cache set and then timing the memory access, you can determine whether that cache set held one or more page table entries.
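
The eviction step can be sketched roughly as follows. The cache geometry (8 MiB, 16-way, 64-byte lines) is an assumption chosen for illustration, not a figure from the AnC work, and the sketch ignores cache slicing and physical addressing; evict_set is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed last-level cache geometry (illustrative only). */
enum { LLC_SIZE = 8 * 1024 * 1024, LLC_WAYS = 16, LINE_SIZE = 64 };

/* Under the stated assumptions, addresses this far apart fall into the
 * same cache set: (number of sets) * (line size) = size / ways. */
enum { SET_STRIDE = LLC_SIZE / LLC_WAYS };

/* Touch every line of `buf` that maps to the set selected by
 * `set_offset` (a line-aligned offset below SET_STRIDE).
 * Returns how many lines were touched. */
size_t evict_set(volatile uint8_t *buf, size_t buf_size, size_t set_offset)
{
    assert(set_offset < SET_STRIDE && set_offset % LINE_SIZE == 0);

    size_t touched = 0;
    for (size_t off = set_offset; off < buf_size; off += SET_STRIDE) {
        (void)buf[off];   /* volatile load pulls the line into the set */
        touched++;
    }
    return touched;
}
```

With a buffer the size of the cache, each call touches as many lines as the set has ways, which is what pushes the previous contents out. The attack then times an access to the target and compares it against a threshold; that timing step is left out here.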


Yours sincerely,
Stephan.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: Breaking ASLR on systems with memory caches

Post by Brendan »

Hi,
Rusky wrote:
Brendan wrote:That doesn't really sound like enough to matter; given that NIC drivers are drivers (which tend to be "more trusted" and could or would be exempt) and the rest don't need precision at all (e.g. benchmark how much work they can do in a 5 day period and you can use a sundial instead ;-) ).
Low-latency networking tends to be done in userspace, not in a driver; performance measurements that only measure overall throughput are useful but insufficient.

Someone's always going to be able to come up with a good reason to want high-precision timing information. The solution is not to get rid of it, but to use a well-designed execution environment with enough mitigations that it becomes impractical to take advantage of timing for malicious purposes.
I think you mean that there will always be whiners that want 2 or more mutually exclusive things at the same time. Fortunately, wanting and needing are very different things.

Deciding between IRQs and polling (and measuring time spent polling to decide if/when you should switch back to using IRQs) is something that only a NIC driver should do. Measuring RTT is something you might do in user-space, but you shouldn't need to measure RTT for every single packet, and microsecond precision is enough. For full/detailed performance measurements, RDTSC has always been a poor substitute for performance monitoring counters.

If you attempt to provide mutually exclusive things at the same time (high precision low overhead timing and protection against timing side-channels) you're guaranteed to fail; but maybe you can choose whichever is more appropriate - e.g. have no protection against timing side-channels on high-end HPC servers that few people own or care about that typically only run specific pieces of software, and have protection against timing side-channels for "generic desktop" where untrained users are far more likely to install malware, and maybe have a special/restricted "debug mode" (enabled for a process at compile time) for developers that unlocks a whole pile of security disasters (like allowing one process to inspect another process' memory, allowing full timing and profiling, etc).


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: Breaking ASLR on systems with memory caches

Post by Brendan »

Hi,
StephanvanSchaik wrote:
Brendan wrote:Sadly; incompetent morons ignored Intel then and continued to ignore Intel since. The end result is that every 6 months or so (for a period of over 20 years) there's yet another white-paper showing yet another security problem that relies on very precise timing side-channels that could've and should've been impossible.
Except that the Javascript implementation of this attack does not even use the RDTSC or RDTSCP instructions at all, and while the native implementation can be configured to use these, it doesn't have to use them either.
The Javascript implementation of this attack not using RDTSC or RDTSCP is so unlikely that I wouldn't even consider it plausible. I'd expect Javascript's timer would be derived from something like POSIX "clock_gettime()" and most OSs have optimised that to use RDTSC in user-space to avoid kernel API overhead.

Note that the paper itself does say:

"Recent work shows that timing side channels can be exploited in the browser to leak sensitive information such as randomized pointers [6] or mouse movement [48]. These attacks rely on the precise JavaScript timer in order to tell the difference between an access that is satisfied through a cache or main memory. In order to thwart these attacks, major browser vendors have reduced the precision of the timer."

And:

"The decreased precision makes it difficult to tell the difference between a cached or memory access (in the order of tens of nanoseconds) which we require for AnC to work."

Essentially; the paper itself confirms what I've been saying - nerfing the precision of RDTSC is relatively successful at thwarting these attacks.
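
Reducing timer precision, in the sense the quoted passages describe, can be sketched as rounding a clock reading down to a coarser step before handing it to untrusted code. The 5 microsecond granularity used in the note below is an arbitrary assumption, and coarsen_ns and coarse_now_ns are hypothetical names.

```c
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime() */
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Round a nanosecond timestamp down to a multiple of the granularity. */
uint64_t coarsen_ns(uint64_t t_ns, uint64_t granularity_ns)
{
    assert(granularity_ns > 0);
    return t_ns - (t_ns % granularity_ns);
}

/* A deliberately imprecise "now" for untrusted callers. */
uint64_t coarse_now_ns(uint64_t granularity_ns)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return 0;
    uint64_t t = (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
    return coarsen_ns(t, granularity_ns);
}
```

With a granularity of 5000 ns, a difference of tens of nanoseconds between a cached and an uncached access disappears inside a single step of the clock, at least for a single measurement.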
StephanvanSchaik wrote:One of the tricks used by both implementations is to spawn a thread acting as a cycle counter by incrementing a global variable continuously. This was especially useful when testing the native implementation on ARMv7-A and ARMv8-A, where the Performance Monitoring Unit (PMU) is inaccessible from user mode by default. Furthermore, on AMD Bobcat, for instance, the results were actually a lot better when using this approach over the RDTSC and RDTSCP instructions.

However, I certainly do agree with you that disabling both the RDTSC and RDTSCP instructions would be a good first step to take to make these attacks a lot less trivial.
Sure; and in a clinical setting where you can lock threads to specific CPUs and know detailed timing between CPUs, and don't have to worry about people noticing a massive performance drain (an entire CPU spending 100% of its time), that might work well; and in a practical attack it makes success significantly harder (and making a successful attack significantly harder is the only thing any security mechanism can do - it's why good security depends on multiple layers, where each layer increases the difficulty).


Cheers,

Brendan
willedwards
Member
Posts: 96
Joined: Sat Mar 15, 2014 3:49 pm

Re: Breaking ASLR on systems with memory caches

Post by willedwards »

Brendan wrote: The Javascript implementation of this attack not using RDTSC or RDTSCP is so unlikely that I wouldn't even consider it plausible.

...

Note that the paper itself does say:

"Recent work shows that timing side channels can be exploited in the browser to leak sensitive information such as randomized pointers [6] or mouse movement [48]. These attacks rely on the precise JavaScript timer in order to tell the difference between an access that is satisfied through a cache or main memory. In order to thwart these attacks, major browser vendors have reduced the precision of the timer."

And:

"The decreased precision makes it difficult to tell the difference between a cached or memory access (in the order of tens of nanoseconds) which we require for AnC to work."

Essentially; the paper itself confirms what I've been saying - nerfing the precision of RDTSC is relatively successful at thwarting these attacks.
The info is quite spread out; from https://www.vusec.net/projects/anc/ it says:
Precise Timing from JavaScript

The AnC attack requires a precise timer in JavaScript to tell the difference between a cached and uncached memory access. Recently, browser vendors have broken the precise JavaScript timer, performance.now(), in order to thwart cache attacks.

We built two new timers that bypasses this mitigation in order to make the AnC attack work. Our new timers not only make the AnC attack possible, they also revive the previously known cache attacks from the browser. For more information, we invite you to read Section 4 of our NDSS’17 paper.
Now I can't seem to find that paper. NDSS’17 is a week from now, so presumably we have to wait.

But I heard the same as StephanvanSchaik (I forget from which outlet): that they used web workers.

Anyway, I promise you, hoping that a lack of a precise timer will stop attackers being able to synthesize one in user-space programmatically is hopelessly optimistic.
willedwards
Member
Posts: 96
Joined: Sat Mar 15, 2014 3:49 pm

Re: Breaking ASLR on systems with memory caches

Post by willedwards »

azblue wrote:Thank you Brendan and willedwards for your answers :)

willedwards wrote: You can harden your exploit mitigation further by W^X, even for JITed code
It's funny you mentioned W^X in the context of JIT compiling as I was just thinking about that today. I'm trying to figure out how to maintain the security W^X offers while still allowing JIT compiling; my understanding is that the application writes the code to data pages(W) and then asks the OS to change the pages to code pages (X). If the OS always honors this request, then W^X is essentially non-existent (surely an attacker doesn't care if they have to first ask the OS for permission before executing their malicious code if the OS will always grant it).

So the only thing I can figure is that the OS will not always grant permission to change page permissions. Which begs the question, how does the OS differentiate between legitimate requests to change page permissions and malicious requests?

The best I can come up with is that at install time an admin can grant* programs like JIT compilers authority to request changing pages from writable to executable; programs that do not have this permission will be treated as if they're generating a GPF if they try calling this function.

Is there a better approach?

*(I've considered a few such privileges which will be determined at install time).
I think that RWX for JITs is not just technical, it's also historic. Back when JITting started, it wasn't seen as much of a problem. Now there is a lot of sunk effort in making those JITs work the way they do, and there's not much appetite for rewriting them. Also, on the technical side, there aren't a lot of options if you want any semblance of portability. It's hard to support new processor or OS features that require major reorganisation when you expect the same binaries to run on older machines.

I believe modern iOS stopped embedding the WebView browser in apps and instead runs the browser in a separate process, using UI composition to give the user the experience of a seamless browser. That lets them enable stronger mitigations like W^X in normal iOS apps, even though they can't enable them in WebKit processes. Something like that.

The normal attack is to string together a row of vulnerabilities. First you want to find a bug that lets you write to arbitrary memory that the language VM (e.g. Java, Flash, JavaScript) can write to. Then you want to find the address of something on the heap, so you can rewrite some of the JavaScript vtable-like pointers and get code executed. Then you want to find somewhere to put your code that is WX, and then you want to write the address of the code you injected into that function pointer in the heap. And finally, now that you can take over the VM and make it run arbitrary code, you want to find a sandbox escape :) Take this to Pwn2Own and make your millions :)

Anyway, fundamentally, even the Mill CPU security is based on strong memory isolation. But a new CPU can facilitate W^X-type mitigations for JITs to thwart or impede the chain of vulnerabilities leading to an exploit. For example: you could have two turfs, one that can X and one that can W. You can portal to the JITter turf that can W in order to do the JIT. This means that the attacker can't just write to executable memory; they have to find a vulnerability in the JIT to get it to do that for them.

Another example is the sandbox escape, which seems to invariably come down to the kernel and graphics subsystem being such a large API that nobody seems to be able to squash all the privilege escalation vulnerabilities. On the Mill, the JavaScript turf can be sandboxed so it can only call a handful of portals, and these portals need not even be the kernel but rather some whitelisting validator. NaCl also went in this direction but perhaps not so far. There can still be bugs in whitelisting validators that let things through.

All in all, the attackers keep winning, but we must never ever give up! The only truly secure code seems to be the code that is never run :)
dchapiesky
Member
Posts: 204
Joined: Sun Dec 25, 2016 1:54 am
Libera.chat IRC: dchapiesky

Re: Breaking ASLR on systems with memory caches

Post by dchapiesky »

Why not just run a couple of processes which load/evict pages randomly?

Have these extra processes actually USE the side channel to communicate with each other and when they can no longer do so then someone is attacking your system....

If they were kernel threads, they could listen to an event stream about valid eviction/loads to effectively cancel out other process noise.

OR

just go the route of partitioned OSes, which partition execution time and reset the entire page table at each context switch.....
Plagiarize. Plagiarize. Let not one line escape thine eyes...
alexfru
Member
Posts: 1112
Joined: Tue Mar 04, 2014 5:27 am

Re: Breaking ASLR on systems with memory caches

Post by alexfru »

Why not just write code properly? (ducks)
Roman
Member
Posts: 568
Joined: Thu Mar 27, 2014 3:57 am
Location: Moscow, Russia

Re: Breaking ASLR on systems with memory caches

Post by Roman »

Why not just write code properly? (ducks)
Because it's already been written. No one wants to rewrite it (and there are no guarantees that it will be better).
"If you don't fail at least 90 percent of the time, you're not aiming high enough."
- Alan Kay
alexfru
Member
Posts: 1112
Joined: Tue Mar 04, 2014 5:27 am

Re: Breaking ASLR on systems with memory caches

Post by alexfru »

Roman wrote:
Why not just write code properly? (ducks)
Because it's already been written. No one wants to rewrite it (and there are no guarantees that it will be better).
https://en.wikipedia.org/wiki/Address_space_layout_randomization wrote:Address space layout randomization (ASLR) is a computer security technique involved in protection from buffer overflow attacks.
You mean code with buffer overflows is proper?
Roman
Member
Posts: 568
Joined: Thu Mar 27, 2014 3:57 am
Location: Moscow, Russia

Re: Breaking ASLR on systems with memory caches

Post by Roman »

I didn't say it's proper.
alexfru
Member
Posts: 1112
Joined: Tue Mar 04, 2014 5:27 am

Re: Breaking ASLR on systems with memory caches

Post by alexfru »

Roman wrote:I didn't say it's proper.
Darn, I fell for it.