MASScan & the Problems of Static Detection of Microarchitectural attacks

 

Introduction

Microarchitectural attacks have been known for more than a decade now.  The designs behind those architectures are typically optimized for performance, cost and backward compatibility. Therefore it seems unlikely that we will see fixes in CPU architectures which address the root cause for vulnerabilities any time soon. With this in mind, the search for software-based solutions to this problem becomes a priority.

As a contribution to this effort Irazoqui  et al. [1] puplished an interesting paper on static methods to detect microachtitetural attacks which is titled “MASScan: Stopping Microarchitectural Attacks Before Execution”.
The idea is as good as it is naïve. In this blog post I will discuss the reasons behind this position. It should also be noted that the paper in question is an early version and subject to changes.

 

MASScan

The analysis works by flagging code that is rare in real-life applications and often used in an attack context. In this case, ‘attack context’ is defined as code which is either required for an attack to work, or because it improves an attacks’s performance. The list is:

Cache flushing instructions

Clflush, clflushopt
(the authors do not mention this clflushopt, but I have included it for completeness)

Non-temporal instructions

monvnti & movntdq

Timers

Counter threads, performance  counters,  rdtscp, rdtsc  instructions and attempts to set thread affinity to gain core co-location which is important for the accuracy of counter threads.

Fences

lfence, mfence & cpuid

Locking instruction

lock prefix

Algorithmic constructions

Eviction set access code, pointer chasing, jumps in a loop

 

The instructions in question are rarely ever used. With the exception of the lock prefix, all of them are part of the 0x0F escape opcodes. In Zombie’s [5] opcode list (which unfortunately is outdated at the time of writing) 0x0F opcodes represent less than 2% of all opcodes, based on data from 1700 executables. The lock prefix is measured, but is rounded down to 0% in this list. This could serve as an indication that vindicates the author’s notion.

The good part is that solid static analysis is able to effectively spot problems, and highlight them to a human analyst. Further manual analysis can be performed based on those indicators to identify malignant behavior, suspicious cases or to vet out false positives. What makes this approach somewhat challenging is the fact that static analysis is very difficult to do well and impossible to do right, especially when factoring in that attackers try to actively evade static analysis.
The following is to demonstrate what such an evasive action might look like.

 

Microarchitectural & Rice

Rice’s theorem states that all non-trivial semantic properties are undecidable. In short, obfuscation is difficult to deal with. The bad news is that microarchitectural attacks are non-trivial semantic expressions and as such, as per Rice’s theorem, undecidable. In other words: you could achieve the same result in an infinite number of ways, without being able to pinpoint the “right” way. You will never be able to deduce from the semantic output which all syntax representation that cause it.

The example I like to use is this: One could build an interpreter which takes the original program as input. The output of the interpreter would then have the exact same semantics, but a different syntax. Obviously one could then build a new interpreter that processes the first interpreter’s syntax output and so on and so forth. Consequently, we cannot generate a database of syntax representation for a given semantic. The clever reader will already know that microarchitectural attacks do not lend themselves well to emulation or obfuscation for that matter. They often rely on rare syntax elements (rare instructions). Execution time is a very real concern and any obfuscation might change microarchitectural states that are important to the attack. However, this doesn’t mean it’s impossible.

Let’s go through the above list from an attacker’s point of view.

Cache flushing instructions

The clflush instructions can replaced by eviction code as demonstrated by Oren et al. [2] as well as  Gruss et al [3]. This relocates the problem from detecting dangerous instructions to detecting dangerous algorithms. It effectively disqualifies the syntactic element clflush for use as an answer to a semantic question.

Non-temporal instructions & timers

Timers are indeed the Achilles heel of most microarchitectural attacks. The rdtsc(p) instructions are a telltale sign for such an attack. Unfortunately, though, they are used by benign applications as well. Often these instructions are wrapped in API functions, e.g. the QueryPerformanceCounters API on Microsoft Windows. The problem with such API calls is that they can be imported dynamically in any number of ways. This makes a static analysis fairly cumbersome.

Counter Threads

As for counter threads, they too can be implemented in numerous ways. Counting does not have to be monotonic increasing, only deterministically changing. As the CPU’s are superscalar, some instructions can be added to the loop at a very low accuracy penalty. And of course, the loop can be camouflaged.  This not only obscures the actual nature of a function (e.g. a counter thread), it also takes the detection into potential false positive territory. Finally, some attacks (like attacks on KASRL) can be repeated. This allows a low accuracy timer to be used multiple times and then using the law of large numbers to average out the noise.

Fences

Fences are rarely a strict requirement for attacks. They do tend to lower the noise, but an attack could often do without them. For instance, Oren et al. [2] does prime+probe in Javascript without a fence. Flush+Reload works fairly decent without fencing as well. Also, makeshift fences can in some cases be produced by gaming reordering. For instance, filling the reorder buffer with dependent instructions before starting a round of the attack will serve well to fence against already pending loads and stores.

Locking instructions

I’m not aware of any substitution for the lock prefix. In this particular case, we indeed have an indicator that is difficult to replace for an attacker. It should be noted that on Microsoft Windows the Interlocked* API functions use the lock prefix and consequently the same problems arise as with the QueryPerformanceCounter API.

Algorithmic constructions

As far as algorithmic constructions are concerned, those can be varied and obfuscated ad nauseam. Therefore, they make for a poor indicator.
For instance, you could perform eviction using a vector, a tree structure or, in fact, any other data structure. Each of them will generate completely different code. Eviction can be triggered by any instruction that uses memory – therefore, any instruction would achieve this. A very old approach has been memset, which comes at a steep performance penalty for the attacker. However, it would likely suffice for spying on keyboards in Gruss et al. [4] . Call qword ptr [address] can touch two cache lines to load the address and one on the stack, as well as the one or two where the instruction itself lies. That is just an example of how ugly eviction can be made. We could argue with a performance penalty in this case. However, we should bear in mind that optimal eviction strategies not only touch uncached memory, but also memory that is already cached – see Gruss et al. [3]!

It gets worse from there: For row hammer I suggested that we do not need not use eviction. Instead, we could bring the cache coherency policy into play to cause write back into memory, see Fogh [8].  This provides yet another algorithm to detect for protection against row hammer, which of course can be implemented in many different ways.

 

Classic malware obsfuscation – Anti static analysis methods

 

Copy protections and malware has historically used a number of methods to defeat static analysis.

Self-modifying code

I wrote my first executable packer in 1995. Packers go back further than that, though. Once an executable has been packed, the only code that is now available to static analysis in the first stub of the unpacker. Unfortunately, malware authors are aware of this technique and it’s even available for purchase online as part of COTS malware as a service. Also, packers used commercially for copy protection can be used for obfuscation like this.

Malware can of course also decrypt data and save it as an executable on disk or even in memory to avoid static analysis. Techniques such as “Run-PE” are widespread in real world malware.

Another example of self-modifying code is JIT compiling, which is what Javascript does. In fact, I use the keystone assembler JIT style for building microarchitectural attacks fairly often, because it gives me a lot more control than I get from the compiler.

Opening hidden browser windows using malicious java script is entirely possible and Oren et. Al demonstrate that prime+probe runs well in JavaScript. It is worth noting that the browser components can be linked into the malware and subsequently do not need to be present on the victim’s computer.

Such ways of hiding code from analysis is already commonplace and no longer qualifies as sophistication in malware.

Anti-disassembly and code reuse

Static analysis can be performed either based source code or on disassembly. Commercial providers, however, tend not to share their source code for intellectual property reasons. This only leaves disassembly as a method for analysis. Unfortunately, however, the x86 platform has a non-fixed length of opcodes. This results in problems to locate the starting point of an instruction. Historically this has been used as a means to thwart disassembly. A clflush instruction can easily be hidden from disassembly as part of, say, a mov instruction. The extreme version of this is doing code-reuse attacks such as ROP. Obviously a clflush “gadget” does not have to be part of the shipped malware, but could very well be part of the operating system – clflush (In the simplest form) assembles to 3 bytes of which the attacker can influence the third by picking the operand, making it reasonable to find a suitable gadget somewhere in the operating system.

 

A peculiar niche case

We have already seen static analysis thwart these kinds of attacks in one special instance: the NaCl sandbox in Chrome. In there, the code is validated during compiling and run in a sandboxed environment to make sure that none of the above tricks are used. Validation will fail if a clflush instruction is generated. Unfortunately, this is not generally applicable. Never-the-less requiring intermediate language representation (say LLVM) when submitting to a shop may assist the authors intention, but many of the issues mentioned above including Rice’s theorem itself applies to intermediate language representations as well.

 

Conclusion

At this point in time, attackers capable of launching microarchitectural attacks have to be considered ‘advanced’.  We must therefore assume that they have ready access to malware obfuscation technology. This technology can effectively thwart classification using static analysis of executables – this is especially true if the “feature set” is small and malleable. This limited feature set further reduces the cost of applying obfuscation for the attacker. The feature set of MASScan is exactly that: small and mallable. Microarchitectural attacks generally have a bit of leeway for modification to blend in with benign code. Consequently, static analysis is unlikely to give defenders a real edge. Static analysis could be augmented with symbolic or even concolic analysis to improve accuracy. However these methods scale poorly and have issues of their own. Given that it produces a <6% false positive ratio, static analysis seems a dull weapon against microarchitectural attacks. This leaves the dynamic approach which I consider the most promising stop-gap-solution.
For instance, my flush+flush detection blog post [7] or my work with Herath on detecting row hammer and cache attacks at BlackHat 2015 using performance counters [6] are examples of how detecting microarchitectural attacks can be automated in controlled environments. These methods are not without flaws, either. But from an attacker’s point of view they are at least more difficult to work around as they are often behavior-based and consequently circumvent the problem presented by the Rice theorem. Despite progress in defense research, we remain without strong defenses against microarchitectural attacks.

 

Literature

[1] Irazoqui, G., Eisenbarth T., an Sunar B.  MASScan: Stopping Microarchitectural Attacks Before Execution. http://eprint.iacr.org/2016/1196.pdf

[2] Oren, Y., Kemerlis, V. P., Sethumadhavan, S., and Keromytis, A. D. The spy in the sandbox: Practical  cache attacks in javascript and their implications. In Proceedings of the 22Nd ACM SIGSAC Conference on Computer and Communications Security (New York, NY, USA, 2015), CCS ’15, ACM, pp. 1406-1418.

[3] Gruss, D., Maurice, C., and Mangard, S. Rowhammer.js: A remote software-induced fault attack in javascript.  In Proceedings of the 13th International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment -Volume 9721 (New  York,  NY,  USA,  2016),  DIMVA  2016,  Springer-Verlag  New York, Inc., pp. 300{321.

[4] Gruss, D., Spreitzer, R., and Mangard, S. Cache template attacks: Automating  attacks  on  inclusive  last level  caches.   In 24th USENIX Security Symposium (2015), USENIX Association, pp. 897-912

[5] Z0mbie, “Opcode Frequency Statistics”. http://z0mbie.daemonlab.org/opcodes.html

[6] Nishat, H., Fogh, A. “These Are Not Your Grand Daddys CPU Performance Counters”. Black Hat 2015. See also  http://dreamsofastone.blogspot.de/2015/08/speaking-at-black-hat.html

[7] Fogh, A. Detecting stealth mode cache attacks: Flush+Flush. Http://dreamsofastone.blogspot.de/2015/11/detecting-stealth-mode-cache-attacks.html

[8] Fogh, A. Row hammer, java script and MESI- http://dreamsofastone.blogspot.de/2016/02/row-hammer-java-script-and-mesi.html