Microarchitectural attacks have been known for more than a decade now. The designs behind those architectures are typically optimized for performance, cost and backward compatibility. Therefore it seems unlikely that we will see fixes in CPU architectures which address the root cause for vulnerabilities any time soon. With this in mind, the search for software-based solutions to this problem becomes a priority.
As a contribution to this effort Irazoqui et al.  puplished an interesting paper on static methods to detect microachtitetural attacks which is titled “MASScan: Stopping Microarchitectural Attacks Before Execution”.
The idea is as good as it is naïve. In this blog post I will discuss the reasons behind this position. It should also be noted that the paper in question is an early version and subject to changes.
The analysis works by flagging code that is rare in real-life applications and often used in an attack context. In this case, ‘attack context’ is defined as code which is either required for an attack to work, or because it improves an attacks’s performance. The list is:
Cache flushing instructions
(the authors do not mention this clflushopt, but I have included it for completeness)
monvnti & movntdq
Counter threads, performance counters, rdtscp, rdtsc instructions and attempts to set thread affinity to gain core co-location which is important for the accuracy of counter threads.
lfence, mfence & cpuid
Eviction set access code, pointer chasing, jumps in a loop
The instructions in question are rarely ever used. With the exception of the lock prefix, all of them are part of the 0x0F escape opcodes. In Zombie’s  opcode list (which unfortunately is outdated at the time of writing) 0x0F opcodes represent less than 2% of all opcodes, based on data from 1700 executables. The lock prefix is measured, but is rounded down to 0% in this list. This could serve as an indication that vindicates the author’s notion.
The good part is that solid static analysis is able to effectively spot problems, and highlight them to a human analyst. Further manual analysis can be performed based on those indicators to identify malignant behavior, suspicious cases or to vet out false positives. What makes this approach somewhat challenging is the fact that static analysis is very difficult to do well and impossible to do right, especially when factoring in that attackers try to actively evade static analysis.
The following is to demonstrate what such an evasive action might look like.
Microarchitectural & Rice
Rice’s theorem states that all non-trivial semantic properties are undecidable. In short, obfuscation is difficult to deal with. The bad news is that microarchitectural attacks are non-trivial semantic expressions and as such, as per Rice’s theorem, undecidable. In other words: you could achieve the same result in an infinite number of ways, without being able to pinpoint the “right” way. You will never be able to deduce from the semantic output which all syntax representation that cause it.
The example I like to use is this: One could build an interpreter which takes the original program as input. The output of the interpreter would then have the exact same semantics, but a different syntax. Obviously one could then build a new interpreter that processes the first interpreter’s syntax output and so on and so forth. Consequently, we cannot generate a database of syntax representation for a given semantic. The clever reader will already know that microarchitectural attacks do not lend themselves well to emulation or obfuscation for that matter. They often rely on rare syntax elements (rare instructions). Execution time is a very real concern and any obfuscation might change microarchitectural states that are important to the attack. However, this doesn’t mean it’s impossible.
Let’s go through the above list from an attacker’s point of view.
Cache flushing instructions
The clflush instructions can replaced by eviction code as demonstrated by Oren et al.  as well as Gruss et al . This relocates the problem from detecting dangerous instructions to detecting dangerous algorithms. It effectively disqualifies the syntactic element clflush for use as an answer to a semantic question.
Non-temporal instructions & timers
Timers are indeed the Achilles heel of most microarchitectural attacks. The rdtsc(p) instructions are a telltale sign for such an attack. Unfortunately, though, they are used by benign applications as well. Often these instructions are wrapped in API functions, e.g. the QueryPerformanceCounters API on Microsoft Windows. The problem with such API calls is that they can be imported dynamically in any number of ways. This makes a static analysis fairly cumbersome.
As for counter threads, they too can be implemented in numerous ways. Counting does not have to be monotonic increasing, only deterministically changing. As the CPU’s are superscalar, some instructions can be added to the loop at a very low accuracy penalty. And of course, the loop can be camouflaged. This not only obscures the actual nature of a function (e.g. a counter thread), it also takes the detection into potential false positive territory. Finally, some attacks (like attacks on KASRL) can be repeated. This allows a low accuracy timer to be used multiple times and then using the law of large numbers to average out the noise.
I’m not aware of any substitution for the lock prefix. In this particular case, we indeed have an indicator that is difficult to replace for an attacker. It should be noted that on Microsoft Windows the Interlocked* API functions use the lock prefix and consequently the same problems arise as with the QueryPerformanceCounter API.
As far as algorithmic constructions are concerned, those can be varied and obfuscated ad nauseam. Therefore, they make for a poor indicator.
For instance, you could perform eviction using a vector, a tree structure or, in fact, any other data structure. Each of them will generate completely different code. Eviction can be triggered by any instruction that uses memory – therefore, any instruction would achieve this. A very old approach has been memset, which comes at a steep performance penalty for the attacker. However, it would likely suffice for spying on keyboards in Gruss et al.  . Call qword ptr [address] can touch two cache lines to load the address and one on the stack, as well as the one or two where the instruction itself lies. That is just an example of how ugly eviction can be made. We could argue with a performance penalty in this case. However, we should bear in mind that optimal eviction strategies not only touch uncached memory, but also memory that is already cached – see Gruss et al. !
It gets worse from there: For row hammer I suggested that we do not need not use eviction. Instead, we could bring the cache coherency policy into play to cause write back into memory, see Fogh . This provides yet another algorithm to detect for protection against row hammer, which of course can be implemented in many different ways.
Classic malware obsfuscation – Anti static analysis methods
Copy protections and malware has historically used a number of methods to defeat static analysis.
I wrote my first executable packer in 1995. Packers go back further than that, though. Once an executable has been packed, the only code that is now available to static analysis in the first stub of the unpacker. Unfortunately, malware authors are aware of this technique and it’s even available for purchase online as part of COTS malware as a service. Also, packers used commercially for copy protection can be used for obfuscation like this.
Malware can of course also decrypt data and save it as an executable on disk or even in memory to avoid static analysis. Techniques such as “Run-PE” are widespread in real world malware.
Such ways of hiding code from analysis is already commonplace and no longer qualifies as sophistication in malware.
Anti-disassembly and code reuse
Static analysis can be performed either based source code or on disassembly. Commercial providers, however, tend not to share their source code for intellectual property reasons. This only leaves disassembly as a method for analysis. Unfortunately, however, the x86 platform has a non-fixed length of opcodes. This results in problems to locate the starting point of an instruction. Historically this has been used as a means to thwart disassembly. A clflush instruction can easily be hidden from disassembly as part of, say, a mov instruction. The extreme version of this is doing code-reuse attacks such as ROP. Obviously a clflush “gadget” does not have to be part of the shipped malware, but could very well be part of the operating system – clflush (In the simplest form) assembles to 3 bytes of which the attacker can influence the third by picking the operand, making it reasonable to find a suitable gadget somewhere in the operating system.
A peculiar niche case
We have already seen static analysis thwart these kinds of attacks in one special instance: the NaCl sandbox in Chrome. In there, the code is validated during compiling and run in a sandboxed environment to make sure that none of the above tricks are used. Validation will fail if a clflush instruction is generated. Unfortunately, this is not generally applicable. Never-the-less requiring intermediate language representation (say LLVM) when submitting to a shop may assist the authors intention, but many of the issues mentioned above including Rice’s theorem itself applies to intermediate language representations as well.
At this point in time, attackers capable of launching microarchitectural attacks have to be considered ‘advanced’. We must therefore assume that they have ready access to malware obfuscation technology. This technology can effectively thwart classification using static analysis of executables – this is especially true if the “feature set” is small and malleable. This limited feature set further reduces the cost of applying obfuscation for the attacker. The feature set of MASScan is exactly that: small and mallable. Microarchitectural attacks generally have a bit of leeway for modification to blend in with benign code. Consequently, static analysis is unlikely to give defenders a real edge. Static analysis could be augmented with symbolic or even concolic analysis to improve accuracy. However these methods scale poorly and have issues of their own. Given that it produces a <6% false positive ratio, static analysis seems a dull weapon against microarchitectural attacks. This leaves the dynamic approach which I consider the most promising stop-gap-solution.
For instance, my flush+flush detection blog post  or my work with Herath on detecting row hammer and cache attacks at BlackHat 2015 using performance counters  are examples of how detecting microarchitectural attacks can be automated in controlled environments. These methods are not without flaws, either. But from an attacker’s point of view they are at least more difficult to work around as they are often behavior-based and consequently circumvent the problem presented by the Rice theorem. Despite progress in defense research, we remain without strong defenses against microarchitectural attacks.
 Irazoqui, G., Eisenbarth T., an Sunar B. MASScan: Stopping Microarchitectural Attacks Before Execution. http://eprint.iacr.org/2016/1196.pdf
 Gruss, D., Spreitzer, R., and Mangard, S. Cache template attacks: Automating attacks on inclusive last level caches. In 24th USENIX Security Symposium (2015), USENIX Association, pp. 897-912
 Z0mbie, “Opcode Frequency Statistics”. http://z0mbie.daemonlab.org/opcodes.html
 Nishat, H., Fogh, A. “These Are Not Your Grand Daddys CPU Performance Counters”. Black Hat 2015. See also http://dreamsofastone.blogspot.de/2015/08/speaking-at-black-hat.html
 Fogh, A. Detecting stealth mode cache attacks: Flush+Flush. Http://dreamsofastone.blogspot.de/2015/11/detecting-stealth-mode-cache-attacks.html
 Fogh, A. Row hammer, java script and MESI- http://dreamsofastone.blogspot.de/2016/02/row-hammer-java-script-and-mesi.html