New cache architecture on Intel I9 and Skylake server: An initial assessment


Intel has introduced the new I9 CPU which is seen as HEDT (High-End-DeskTop) product. The micro architecture is in many respects shared with the new Skylake server micro architecture.I f history is a guide technology introduced in this segment slowly trickles down to more budget friend desktops. From a Micro architecture point of view it seems that several things about these CPU’s will force changes on micro architectural attacks – especially in the memory subsystem. In this blog post I’ll do a short overview on some of relevant changes and the effects they’ll may have on micro architecture attacks. Since I don’t own or have access to an actual I9 processor or Skylake server processor this blog post is just conjecture.


Resume of the “old” cache hierachy

The major changes over earlier process from a micro architecture point of view is that the cache system has received a significant overhaul. Current Intel CPUs have a 3 level cache hierarchy – two very small L1 caches, one for data, one for instructions. The second level cache (L2 or Mid Latency Cache) is somewhat larger. L1 data, L1 code and L2  part of each core and private to the core. Finally, Intel CPU’s had a huge 3rd level cache (usually called L3 or  largest latency cache) shared between all cores. The 3rd level cache is subdivided into slices that are logically connected to a core. To effectively share this cache, Intel connected them on a ring bus called the Quick Path Interconnect. Further the 3rd level cache was an inclusive cache, which means that anything that is anything cached in L1 or L2 must also be cached in L3.



Some of the important changes that has been announced in the Intel Software Optimization Manuals [1]  are:

–    A focus on a high number of cores in the CPU (up to 18 in the HEDT models)

–    A reduced over all cache size per core (compared to similar older models)

–    A very significant increase in the size of the L2 (factor of 4)

–    Doubled the bandwidth of L2, while only slightly increasing the latency

–    Slightly more than offset by a reduction of the shared L3.

–    Reorganized  L3 cache to be a non-inclusive cache

–    Replaces the QPI with a mesh-style bus.

Why does these changes make sense?

Increasing the size of L2 on the cost of L3 makes sense as the L2 is much faster than L3 for applications – one can only assume that making the L3 helps reduce die size and cost. Also the increase in size of the L2 caches reduces the marginal utility of the L3. Finally  as the probability of cache set contention rises with the number of cores, it becomes advantageous to make a larger part of the total cache private. Cache contention in L3 is a problem Intel has battled before. With Haswell they introduced Cache Allocation Technology (CAT) to allow software to control cache usage of the L3 to deal with contention.

The number of cores is probably also the reason why Intel dropped the QPI ring buffer design. There is a penalty for having memory served from another core’s slices. On a ring bus this penalty is proportional to how far the cores are a part on the ring. With more cores, this penalty increases. Moving to a more flexible system seems sensible from this vantage point.

Having an inclusive L3 makes cache coherency easier and faster to manage. However, an inclusive cache comes with a cost as the same data are loaded in multiple caches. The relative loss of total cache storage space is exactly the ratio of the (L2 +L1) to L3 sizes. Previous this ratio has been around 1:10 (depending on actual CPU), but multiplying the L2 size by 4 and making the L3 a tiny bit smaller the ratio is now about 1:1.5. Thus the making the L3 cache non-inclusive is very essential to performance. At this point it’s important to notice that Intel uses the wording “non-inclusive”. This term is not well defined. The opposite of inclusive is exclusive meaning the content of L3 cannot be loaded in L1 and L2. I think Intel would have used the defined term exclusive if the cache really where exclusive. Thus,   it is probably fair to assume that non-inclusive means that data may or may not be cached in L1, L2 and L3 at the same time, but exactly how this is governed is important. Unfortunately there is no information available on this.

It’s worth noting that many of these changes has been tested and developed by Intel for the Knights landing micro architecture. Knights landing is a high throughput micro architecture, sold in relative small amounts. Thus it’s likely that many features developed for this CPU will end up being trickled down.  It’ll be interesting to see if Intel plans to trickle it down into laptop/small desktops or use different cache designs for different classes of CPUs.



Cache side channel attacks

This new cache layout is likely to have profound effect on cache side channel attacks. I think Flush+Reload will work even without the non-inclusive cache. The flush primitive is based on the CLFlush instruction which is part of the instruction set architecture (ISA). Intel has been very reluctant in the past to change the ISA and therefore my estimate is that flushing will work as always. I think the reload primitive will remain active – I find it likely that an uncached load will also load stuff into the shared L3. Also it’s likely that the QPI replacement bus can be used to fetch data from private L2 caches similar to AMD’s cross CPU transmission. This will give Flush+reload a nice flush+transfer flavor, but it’ll still work. For more on Invalidate (Flush)+Transfer see [2]. Since the L3 cache must be filled in someway we can fairly certain that at least one of these things are true, if not both. The latter being my guess.

The big change are likely to be related to evict and prime primitives. Assuming that cache contention between cores was a major reason for redesigning the cache, it’s unlikely that one can can load stuff into another core’s private hierarchy. Thus, the prime and evict primitives for  cross core attacks.  However,both are likely to work decently within a core (Hyper threading or scheduling on same core).

While the ISA behavior of CLFLush is almost certain to remain unchanged, the micro architecture below it will see significant changes. With QPI gone using the flush+flush attack by Gruss et al. [3] to find out how many hubs on the ring bus away you are from a particular slice almost certainly won’t work. This does not mean that you won’t find a side channel here in CLFlush, buses are typically bandwidth limited thus being an obvious source of congestion – without an inclusive L3 the bus bandwidth might even be easier to consume. Also the Flush+Flush attack as a Flush+reload replacement, is likely to produce a different timing behavior in the new micro architecture. My guess upfront is that a timing difference and thus a side channel remains.

Also affected by the non-inclusiveness of the L3 is row buffer side channel attacks such as those presented by Pessl. et al.[4]  Without an effective eviction cross core attacks may be severely stifled. The ability to reverse engineer the DRAM complex mapping function is likely to remain unchanged as it hinges not on eviction, but the CLFlush instruction’s ISA behavior.

With the CLFlush instruction and evict likely still working on local cores, row hammer will  remain effective. But in some scenarios, indirect effects of  micro architecture changes may break specific attacks such as that of  Sarani Bhattacharya, Debdeep Mukhopadhyay [5]. The reason is that such attacks rely on information leakage in the caches, that becomes more difficult to leverage as described above.



While the changes to the cache makes sense with the significant number of cores in the affected systems, it seems unlikely that the changes will trickle down to notebooks and laptops. With only two cores the existing cache design seems sensible. Thus we are likely to see a two tier cache design moving on.


Having a non-inclusive L3 cache is significantly more secure from a side channel perspective than an inclusive in cross core scenarios. This opens up for defending these attacks, by isolating different security domains on different cores potentially dynamically. While flush+reload is likely to be unaffected, this attack is also the easiest to thwart in real life scenarios as avoiding shared memory cross security domains is an available and effective countermeasure. Lots of new research is required to gauge the security of these micro architecture changes.



[1] Intel. Intel® 64 and IA-32 Architectures Optimization Reference Manual. July 2017.

[2] Irazoqui, Gorka, Thomas Eisenbarth, and Berk Sunar. “Cross processor cache attacks.” Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security. ACM, 2016.a

[3] Gruss, Daniel, et al. “Flush+ Flush: a fast and stealthy cache attack.” Detection of Intrusions and Malware, and Vulnerability Assessment. Springer International Publishing, 2016. 279-299.

[4] Pessl, Peter, et al. “DRAMA: Exploiting DRAM Addressing for Cross-CPU Attacks.” USENIX Security Symposium. 2016.

[5] Sarani Bhattacharya, Debdeep Mukhopadhyay: “Curious case of Rowhammer: Flipping Secret Exponent Bits using Timing Analysis”.

The Kings In Your Castle Part 5: APT correlation and do-it-yourself threat research

Welcome back, to the fifth and last part of our blog series The Kings In Your Castle, where we aim to shed light on how A.P.T. functions, how targeted malware looks like and the issues us analysts might find on it. If you are interested on how it started, please check out the parts 1, 2, 3 and 4; namely here, here, here and here.

In part 5 now I will describe how we leveraged our gathered data for correlations, to unveil connections among targeted attacks, reported to CIRCL’s MISP instance. Furthermore, this blog looks ahead on what happened after the presentation of our proof of concept at this year’s Troopers conference in Heidelberg. Large parts of the parsing and correlation functionality has made its way into the code base of MISP. Raphael Vinot has published the MISP Workbench, where MISP users can now perform their own correlation of events found in MISP.


For starters, let’s define the term “correlation”. We define this as the uncovering of links of any kind among events reported to MISP. A requirement for reports that support a correlation is that they refer to one single event which goes beyond the general information already contained in MISP. Through this correlation we can glean extended knowledge of a toolset used by an actor as well as information on target preference. We also get to know about shared tools or techniques among two or more actors. We would also count it as a successful correlation if we find proof that two supposedly related events do not share any links at all.

A big step from classical (mass-)malware detection to research and mitigation of targeted attacks was, to recognize a pattern over time. The fundamental difference between targeted and non-targeted threats is that non-targeted threats does not know and/or care about their target. While campaigns of non-targeted threats also tend to improve over time, targeted actors have a significantly more developed need to stay ahead of their victims. This way, when tracking threat actors, one might spot new additions to their toolset, new intrusion techniques or even see them picking up new “business lines”.

Also, looking at malware campaigns from a historical perspective helps uncovering false flags as well as attempts to cover their tracks. This is especially significant when looking at how actors learned to do this over time.

With all of this reasoning in mind we dug through data sets with (and, occasionally, without) structured approaches, in order to unveil hidden treasures.

Naming is hard

And frequently, when uncovering supposably unknown links among different events, we found ourselves poking at the very same group of aggressors; using the very same malware and attack methods as the linked event would show. But how come?

There are several reasons to this, the most obvious one being that two events were reported, commenting on differently named groups, but actually referring to the same. This is an old issue in threat analytics; naming is hard. The reality was that in most common cases we quickly realized that each time we had a link, we were looking at identical events that were just named differently.

Probably the most popular case of multi-naming is Havex; otherwise known as EnergeticBear, DragonFly, or CrouchingYeti. This group has been mentioned in at least four different reports, with dedicated naming on all four. Similar cases have been observed for Sakula, also called BlackVine, and for WhiteElephant, also called Seven Pointed Dagger.


What we found

The purpose of correlating different events was to try and find any existing links between events that were either instigated by different actors and/or performed at different points in time by the same group. For one, we wanted to proof that we could uncover links among sample sets and groups with little technical effort. For instance, we were able to spot a spear phishing attack within MISP which clearly related to other campaigns driven by an actor known as APT1. APT1 is said to be a Chinese group, performing a plentitude of attacks in (years?). It is hugely beneficial to have advance knowledge of past attacks and current changes to an attacker’s toolset. This is especially true if an attack by a specific group against a given target may be imminent or already in progress.

What turned out interesting also though, was uncovering events documenting the same group, but extending the view to include  the capabilities which were added over time. One example for this is TurboCampaign. This was first reported in 2014. By that time it went by the name of Shell_Crew. When spotted again in 2015, they sported a 64-bit Derusbi implant for Linux machines, which had not been observed earlier. It is unclear at this point, whether this component was just missed in the 2014 report or if it was added less than a year later. The fact that attack groups learn as they go and extend their toolset is not surprising. This learning process gives us exactly the kind of information we set out to learn.

Other things we found in similar ways were for example a link between a report on The Dukes and another campaign dubbed Hammertoss; we linked an RTF spearphishing campaign from 2014 to the PittyTiger group, and we discovered a connection between RedOctober and the Inception Framework.

How to determine what’s related

We should underline again that what we did was not yet another machine learning exercise. The attributes outlined in previous episodes of this series can potentially serve in malware machine learning research, but exploring this was out of scope for what we had in mind. But what DID we do then?

We followed a rather simple approach. The set of indicators mentioned in blogpost #2, combined with attributes already present within MISP, went into a Redis backend, hosted on strong computing power. Redis allows, to perfom rather quick queries on large datasets in memory. By calculating hashes to index data items, large sets can be processed from different angles.

This way we can perform correlation runs based on IP addresses, domain names and file hashes, as well as compilation timestamps, original file names, import table and ssdeep hashes and many more. As outlined mentioned last time, targeted malware is likely to not be packed or obfuscated. It is also likely to be reused, either in its compiled form or only parts of its source code.

The following graphic illustrates a snapshot of data, related to one event within MISP. One can quickly identify certain patterns, as well as absent data. Naturally, PE-specific attributes can only be retrieved from Windows PE binaries. Help on that front is provided by ssdeep hashes, which also serve for file clustering approaches.


Ssdeep clustering

Ssdeep, apart from computing a piece-wise hash of a given file or data blob, is capable of measuring the distance between two or among multiple ssdeep hashes. These distances are named match scores, weighed measures of how similar two given files are to each other. This is possible because of the fact that ssdeep does not calculate cryptographical hashes, which identify the entire file, but only pieces of a file. Those can then be compared to the piecewise hash of a different file. Naturally, this method can be extended to compare multiple files to one file, and also looking into groups of files, calculating distances among all of those.

This principle allows for clustering of files, based on ssdeep hashes. The most popular implementation of ssdeep clustering, and also the one we based our toolchain on, is ssdc by Brian Wallace. As extension to the original implementation of ssdc, we added a multi-process computation module and a Redis connector, to be able to store clustering results directly to the backend.

This way we are able to spot events which are “close” to each other based on similarities within the involved binaries. With this information one could, for instance, establish that a group targeting entities within Russia, is using malware which bears disturbing similarities to malware used by a (different..?) group targeting entities within Afghanistan and Tajikistan. Just saying….

Ssdeep clustering challenges

Processing fuzzy hashes is a very resource intensive task, even more so when clustering samples of them. This doesn’t scale well for sample sets of certain sizes. Our test set is limited, but given that most malware repositories deal with a sample size within the hundred thousands or bigger, some presorting of groups of interest might be feasible. This pre-sorting of samples which match a certain set of criteria (e.g. events of within a specific timeframe, targeted platforms only geographical constraints) can speed up the clustering approach significantly.

Lastly, it’s worth mentioning that ssdeep hashes, just like any other statically retrieved attribute, only help describing the outer surface of a sample. It does not carry information about the purpose or behavior of a binary and is easily disturbed by runtime packers and might even fail its purpose after malware authors apply a simple change in their compiler settings.

Taking things further: MISP Workbench

Please note that, by now, this kind of APT research has been performed by designated threat research teams for a long time and presented techniques are not considered new. The community benefit we see here is the integration of our toolset into MISP’s open source code body, as well as providing results and conclusions to the broader public. Research of targeted attacks is difficult without access to a large malware set and high quality threat intelligence data.

So now here we are. With MISP Workbench the tools we developed and the data we gathered were integrated into the MISP platform in order to support incident responders and researchers to easily perform their own queries when investigating one or more events.

Workbench was built with the objective to incorporate all the presented features into one single tool. Also, it is intended to enhance the existing MISP dataset with a rich feature set, especially regarding the presented PE features. Workbench can now be used to easily group events by attacker tool sets; in connection with MISP Galaxy this is also possible with focus on dedicated adversaries. Workbench supports full text indexing and lookups for keywords, as well as picking through events on singular features or indicators.

Fine folks at CIRCL have built Workbench for standalone use, with a lightweight user interface. A more detailed description can be found here.

The following screenshot shows statistics about binary compilation timestamps within our targeted malware test set. One can quickly see, how 1970 and 1992 must have been very busy years for lots of malware authors. Err… kidding, of course. But it does seem obvious how compilation timestamps, or even just parts of compilation timestamps, can occasionally serve as an interesting grouping attribute.


Furthermore, Workbench in combination with Galaxy, supplies information about threat actors and events linked to them. This enables the analyst to search for links among threat actors on a binary feature level.


As mentioned, the extracted PE features can be used for grouping as well. The following screenshot shows how the filename ‘chrome.exe’ seems to be an all-time favorite of The Dukes, a threat actor operating out of Russia.



And here it ends, this lengthy blog series on Kings In Your Castle. The work will continue, of course. The tools presented,  just as any information sharing platform such as MISP, live from the collective effort of contribution to the tool stack, the threat information base and the distribution chain. In that sense, questions and comments are welcome, just like pull requests, bug reports and feature ideas. At this point let me say thank you once more, to my co-sufferer Raphael Vinot, as well as the team at CIRCL, the team at ERNW and Troopers conference for letting us present our work as well as Morgan Marquis-Boire for supplying samples and ever more samples.

Happy APT hunting everyone 🙂

The Kings In Your Castle Part 4 – Packers, Crypters and a Pack of RATs

In part 4 of our series “The Kings In Your Castle”, we’re back with the question, what does sophistication even mean? I’ll be outlining what complexity from a malware analyst’s perspective means, why malware intends to be undecipherable and why it sometimes just wouldn’t even try. Also, this blog entry serves to present our findings on commodity RATs within the corpus of malware we analyzed, as part of our talk at Troopers conference in March.

If you are interested in previous parts of the series please check them out here, here and here.

What does sophistication even mean?

The complexity of software is a rather soft metric, that hasn’t undergone much scrutiny in definition. For a malware analyst, this sphere even takes on a whole lot of different shades, as malware by nature aims to hide its threats. At least most of it, as one would think?

For analysts, what poses challenges are techniques such as code obfuscation or the use of well-fortified crypters. What also presents a remarkable challenge  is structured application design, although this might sound somewhat counter-intuitive. Multi-component malware with a well thought-out object oriented design and highly dependent components cause more of a headache for an analyst than any crypter out there. In fact, many of the well-known high-profile attack toolsets aren’t protected by a packer at all. The assumption is, that for one, runtime packers potentially catch unwanted attention of security products; but also, for highly targeted attacks they might not even be necessary at all. Malware developers with sufficient dedication are very well able to hide a software’s purpose without the use of a runtime packer. So, why do we still see packed malware?

A software crypter is a piece of technology which obfuscates software and its intentions, but also to changes its appearance. Crypters and packers are frequently applied to malware in order to ensure the reusability of the actual malcode. That way, when malware is detected once, the same detection will not apply to the same malware running on a different system.

Let’s take a step back though. The ‘perfect targeted attack’ is performed with a toolset dedicated to one target only. We can conclude that a crypter is needed if either the authors aren’t capable of writing undetectable malware, or, more likely, if the malware is intended to be reused. This makes sense, if you reconsider, writing malware takes time and money, a custom attack toolset represents an actual (and quite substantial) investment.

Finally, with economical aspects in mind, we can conclude that attacks performed with plain tools are the more expensive ones, while attacks using packers and crypters to protect malware are less resource intensive.


The actual hypothesis we had started our research with was, that most targeted malware comes without a crypter. In more detail, we put up the statement, that targeted malware was significantly less protected than random malicious software. Again, random malware doesn’t know its target and by definition is intended to infect a large number of individuals; while targeted malware supposedly is designed for a limited number of targets.

Packer Detection (Like PEiD Was Broken)

Now, the usage statistics of runtime packers and crypers would be easy to gather from our respective dataset, if the available state-of-the-art packer detection tools weren’t stuck somewhere in 1997. The question whether a file is packed or not in practice is not trivially answered. One of the tools that is frequently used to identify packers is named PEiD. PEiD applies pre-defined signatures to the first code bytes starting from the executable entry point, which easily fails if the code at the packed binary’s entry point changes only slightly.

Aiming for more accurate results, we decided to come up with our own packer evaluation. We put together a set of indicators for abnormal binary structures, which are frequently seen in relation with runtime packers. Please note, the indicators are just that – indicators. The evaluation algorithm leaves some space for discrepancies. In practice though, manual analysis of randomly picked samples has proven the evaluation to be reasonably precise.

We gathered the following PE attributes:

  • Section count smaller than 3
  • Count of TLS sections bigger than 0
  • No imphash value present, thus import section empty or not parseable
  • Entropy value of code section smaller than 6.0 or bigger than 6.7
  • Entry point located in section which is not named ‘.text’, ‘.itext’ or ‘.CODE’
  • Ratio of Windows API calls to file size smaller than 0.1

Of course, no single one of the gathered attributes is a surefire indicator in a packer evaluation process. In reality, some of the mentioned presumed anomalies are frequently seen within unpacked binaries, and depend, for example, on the executable’s compiler. Nevertheless, any of the mentioned features is reason enough to grow suspicion, and as mentioned before, the evaluation works rather reliably on our chosen dataset.

According to our algorithm the values weigh into an evaluation score, based on the final score an analyst can then draw his conclusion. It is worth noting at this point, that the chosen thresholds are quite sensitive, and one would expect to rather detect too many “potentially-packed” samples, instead of too few.

Further details about our packer evaluation method can be found within the published code.

The results can be found in the following charts, showing the evaluation values in relation with sample counts. The maximum score a sample can reach on our scale is 220, meaning that all eval attributes exceed the chosen threshold. The following graphics show the evaluation performed on a benign sample set, on a targeted malware sample set and on a random malware sample set. Attention should be paid to the sample frequency count on the y-pane of the graph.

The benign set
The targeted malware set
The random malware set

The graphs show very well, how roughly a third of the benign samples show low rated indicators; while for the random malware sample set, it is less than a third of the overall set showing no indicators, while more than a third of the set show remarkably high rated indicators. It shall be mentioned, that a look into benign samples rated at 40-50 resulted in the finding, that most of them were packed with UPX, a binary packer used mainly for binary compression.

The remarkable bit at this point is that the set of targeted malware binaries has the overall lowest count of packer indicators. This leaves us with two possible conclusions. Following our hypothesis that targeted malware is significantly less protected by crypters than random malware, we take this result as a proof.

On the other hand, what surely biases the result, is that the chosen attributes are potentially influenced by compilers used to compile the given binaries. This means though, as the results for the targeted set are notably homogenous, that the larger part of targeted malware within our dataset has probably not experienced exotic compilers either. As a research idea for future analysis I’d like to put up the somewhat far-fetched hypothesi, that most targeted malware is written in C/C++ and compiled using Visual Studio compiler. Curious, anyone?

Commodity RATs

Taking the question of malware sophistication further, in the past the analysis community was frequently astonished in the light of yet another incredibly complex targeted malware campaign. While it is true that certain targets require a high level of attack sophistication, most campaigns do not require components for proprietary systems or extremely stealthy methods. An interesting case of high profile attacks being carried out with commodity malware was uncovered last year, when CitizenLab published their report about Packrat. Packrat is a South American threat actor, who has been active for around seven years, performing targeted espionage and disinformation campaigns in various South American countries. The actor is most known for targeting, among others, Alberto Nisman, the late Argentinean prosecutor, raising a case against the Argentinean government in 2015.

Whoever is responsible for said campaigns did have a clear image of whom to target. The actor certainly possessed sufficient personal and financial resources, yet made a conscious decision to not invest in a high-end toolchain. Looking into the malware set used by Packrat, one finds a list of so called commodity RATs, off-the-shelf malware used for remote access and espionage. The list of RATs includes CyberGate, XTremeRAT and AlienSpy; those tools themselves are well-known malware families associated with espionage.

Again, using repackaged commodity RATs is notably cheaper than writing custom malware. The key to to remaining stealthy in this case is the usage of crypters and packers. In the end though, by not burning resources on a custom toolchain, the attacker can apply his resources otherwise – potentially even on increasing his target count.

Hunting down a RAT pack

Looking at the above facts, one question emerges: how prevalent are pre-built RATs within the analysis set at all? To establish a count for commodity RATs count we again relied on detection names of Microsoft Defender. The anti-virus solution from Microsoft has shown in the past to be rather slow in picking up new detections, while providing quite accurate results once detections are deployed to the engines. Accuracy at this point includes certain reliability when it comes to naming malware.

For evaluation, we chose to search for the following list of commodity malware:

  • DarkComet (Fynloski)
  • BlackShades (njRat, Bladabindi)
  • Adwind
  • PlugX
  • PoisonIvy (Poison)
  • XTremeRAT (Xtrat)
  • Handpicked RAT binaries

The selected set is what we noted seeing when going through the malware corpus manually, please note though, the list of existing commodity RATs is by far longer.

The so to say “lazy king of APTing” is PlugX. The commodity RAT popped up in all together 15 different events listed in the MISP database.


The winner in sample numbers was Adwind, a Java based RAT, dedicated to infect different platforms. Adwind itself is malware, that has been sold under different names, including Frutas RAT and Unrecom RAT. Security firm Fidelis published nice insights on Adwind, under the much appreciated title “RAT in a jar”.

The following graphic shows the total number of RATs and related events, found within our data set of 326 events containing 8.927 malware binaries.


In total, we counted that almost a quarter of inspected events made use of one or another RAT. Looking at the sample set, 1/9th of the total set is composed of pre-built RATs. These numbers are rather high, considering that, at least in the heads of analysts, targeted malware is complex and sophisticated.

Still, though, why do we even bother with high numbers of commodity malware? For one, as mentioned before, they help driving down attack cost. Furthermore, they provide functionality that is quite advanced from the beginning, equipping even unskilled attackers with a Swiss Army Knife of malware they couldn’t implement themselves, even if they tried really hard. These do-it-yourself APT tools enable wannabe spies with little budgets, growing the number of potential offenders.

Furthermore, off-the-shelf RATs have been seen in use by certain advanced attackers. They could lead to confusion about the actual offender, as they do not allow for attribution on the base of the binaries at all. In other words, one would not know whether he is being targeted by a script kiddie or a nation state actor. Currently, it remains unclear whether commodity RATs have been used in an attribution concealment approach, but the assumption does lie close.

The Kings In Your Castle Part 3 -Ssdeep being fuzzy while exploits are being scarce

Welcome back, still on it? This is part 3 of our blog series, if you’re curious about part 1 and 2, please check them out here and here. This time I’m happy to introduce a set of borderline funny findings and tackle one of the hypotheses we put together for Raphael Vinot’s and my talk “The Kings In Your Castle”, presented at this year’s Troopers Conference in Heidelberg. I will discuss our findings regarding exploits present in known targeted attacks, the obstacles we faced during analysis and how we worked our way around. So, sit back, relax, here we go.


As you might be aware of, most data sets come with information, as well as most show one or another curiosity. Finding curiosities means learning literally unexpected things, which is why researchers jump at those with the passion of a hungry wolf. Thus, let me start the list of our findings with a curiosity.

While performing clustering on ssdeep hashes we found something we dubbed sddeep collisions, due to lack of better naming. Ssdeep is a program for computing context triggered piecewise hashes. These so called fuzzy hashes, as opposed to cryptographic hashes, do not uniquely identify a given data blob. They are calculated piecewise and are able to identify shared byte sequences among two targets. They are frequently used to ‘describe’ malicious binaries, in order to be able to match with similar binaries and eventually find groups of malware samples or even identify malware families.

The nature of piecewise hashes though implies, that hashes of two binaries cannot be identical, if the binaries show differences. Hence, it is a curious finding, that a number of unique samples within our set showed identical ssdeep hashes. Without spending too much time picking at the implementation of the fuzzy hashing algorithm itself, we assume that ssdeep does not consider the entire binary for hashing. We found a case, where 5 MB of empty padding were no reason for ssdeep to show any difference in the resulting fuzzy hash.


More than padding, ssdeep on some occasions indeed missed significant differences in the data sections of compared binaries. Given that analysts and corporations use ssdeep in work benches and production systems we found it worth a mention, that identical fuzzy hashes do by no means proof, that the compared binaries are identical.


We learned another curiosity when randomly staring at the gathered data. It is fascinating how the human eye manages to find patterns, and indeed very instructive before starting to implement queries or planning for application of machine learning. This way we saw, that for example compilation timestamps of binaries usually follow lose patterns within an event. A number of events though show outliers when it comes to timestamps; such as a binary compiled in 1996 while others are compiled post-2007, or a binary with a stamp from 2036. Of course, such outliers can have multiple explanations. Ones that come to mind the fastest are either the timestamps are falsified, the attackers forgot to falsify all timestamps, the campaign made use of readily compiled (old) tools, or maybe a runtime packer was used which falsifies the timestamps without explicit intention of the attackers.


One conclusion though lies at hand. To freely quote a splendid APT researcher, attackers learn just like we do and improve over time, which implies that they might have made more mistakes in the past. Thus, by analyzing historical data about a campaign or group one might be able to learn unexpected tidbits. Moreover, by looking at things learned by the attacker as in changes in malware and intrusion techniques, one might gather insights about obstacles the attackers faced in the past. Adoption of a runtime packer or adding of a stealth module to a given RAT might expose, that the attacker’s tools at some point were detected by security software.



OMFG!! They used e.x.p.l.o.i.t.s.!!1!

Like in real life, humans tend to conclude that a digital attack, which caused big damage, naturally came with big power. In the world of cyber though, this equation has a lot of unknowns. In fact, the success of an attack and the damage it can cause are influenced by many factors, that are independent of an attacker’s capabilities and wealth. While not true in all cases, mostly, the possession of one or more 0-days involves some level of resources or at least explicit know-how combined with engineering time.

This leads to a natural assumption: Folks who do APTs involving 0-days must be either rich, or very very dedicated. Or both. Or do they? When Stuxnet finally happened, the general public seemed to believe that APT goes hand in hand with 0-day. A considerable time span passed by, understanding started to sink in, that targeted attacks can have all sorts of faces, and barely any post-Stuxnet attack looked anything like what we now call “the first cyber weapon”.

Until today, analysts seem to have settled for the consciousness that Word Macros are just as dangerous to organizations as the latest Flash exploit. There always is someone to open the damn document.

Finally, what this leaves us with is a set of uncertainties. How important are exploits in the APT world? How frequently are they used, how common is the use of precious 0-days? This is the fog we meant to shed some light on.


Exploit prevalence at a glance

In the mass malware scenery, the number of malware strains and infection campaigns that make active use of exploits is rather low, it feels, and seemingly even declined; at least since attackers found out that Macro downloaders do the job just as well. It won’t fail the attentive reader; Word Macros are a lot easier to write and cheaper to get hold of. And back here we are, reducing the cost of an attack allows to maximize the number of potential targets. It’s all about resources.

But let us get to the numbers. In total, we analyzed 326 events within our final analysis set, of which 54 were labeled to involve one or more exploits. Such labels are usually tags of CVE numbers that are added by the initial reporter of an event. About these tags we do know, that a present tag is a strong indicator for an actual exploit being involved (given analysts didn’t make things up); the lack of any tag does not proof at all though that no exploits were used. As a counter metric, we utilized detection names of Microsoft Defender, filtering for names containing one or another CVE number. This way we detected a total of 68 events involving exploits.

Juggling numbers, with considerations of potentially missed detections in mind, roughly a fifth of the analyzed events involved the usage of exploits. With all potential data incorrectness in mind, we are very confident to state that is it not a majority of targeted attacks that involves exploits.


The APT exploit landscape

Relying on tags that are present in the MISP data set, we went on to evaluate the exploits we indeed did see. The graphic below shows a chart of CVE numbers, sorted first by tag counts, secondly by year. The numbers refer to the number of events that make use of the listed CVEs.

It is worth mentioning, that human analysts as well as security software tend to be more reliable in labelling of well-known exploits, than fresh ones or even unknown ones. This chart cannot be used to determine which attacks involved 0-day exploits; in fact, none of the data we got at hand can.


What it does show though is how the curve from frequently to non-frequently seen CVEs holds remarkably old vulnerabilities in the top spots. Absolutely killing it is CVE-2012-0158, a vulnerability within an MS Office OCX library. It can be triggered through a malicious Word document. The vulnerability has long been fixed by Microsoft, but, perhaps, not all Office installations are updated all that frequently? Who knows.

Furthermore, we can see that only a minority of 7 CVE numbers can be called more or less up to date. Given that our data collection ended January 2016, vulnerabilities from 2015 are considered fresh (enough). A total of 12 events involved exploits for non-cyber-stoneage vulnerabilities.

Exceptionally interesting is place number three on the list, CVE-2015-5119, sported by a total of five events. This vulnerability has a history, indeed.


HackingTeam exploit gone wild

CVE-2015-5119 is a vulnerability in Adobe Flash, which got leaked with the tremendous breach of the Italian offensive security company HackingTeam last year. The vulnerability was reported and fixed soon after the breach, but nevertheless made it into at least one exploit pack and the toolsets of four quite well known APT groups. According to our data, Group Wekby and a not closer identified spearphishing campaign targeting US government institutions adopted respective exploits right after the breach went public, in July 2015.

The most recent spotting of CVE-2015-5119 within our data happened beginning of 2016 when it was seen in context with BlackEnergy, the notorious malware that took the Ukrainian power grid offline end of 2015.



Discussing the results

The numbers in the dark, or, everything we do not know, is a significant blind spot to our dataset. There are two considerable unknowns. For one, we do not know whether the data is complete. Two, we do not know whether the labels we retrieved are correct.

Concerning problem number 1, intentionally incomplete reports aside, it is very well possible that an attack is detected and remediated, but the actual infection vector never recovered. In the case of an exploit being used, given for example that it is not contained in an e-mail or a file, the forensic reconstruction of the entire intrusion path can be challenging. A good example of such a case and also a very instructive read on the topic poses Phineas Fisher’s write up of the HackingTeam breach.

Problem number 2, incorrect labeling, stems from the fact that determining a CVE number for an actual exploit involves careful analysis work of a specialist. In practice, deriving CVE numbers from AV detection names is “cheap” and works reasonably well, but relies on the respective analysts doing a scrupulous job when looking into the sample. Nevertheless, mistakes are actually quite possible.

As in all given cases, I am happy to receive suggestions on how to improve on both shortcomings. Meanwhile, we present the discussed numbers with a pinch of salt on the side.


The Kings In Your Castle Part #2 – Dataset and feature extraction

Welcome back to my series of write-ups for “The Kings In Your Castle – All the lame threats that own you, but will never make you famous”. This series covers a project I presented together with Raphael Vinot from CIRCL Luxembourg at Troopers conference in March. If you missed the start, you can find it here.

O’right lets go 😀

TTPAs – Tools, Techniques, Procedures and Actors

The primary aim in the toolification process was to extract accurate binary features, that would help us describe the binaries in relation to the Event data stored in MISP. Therefor we took the feature extraction a step further than usual IOC creation would (Indicators of Compromise).

IOCs are indicators, which describe malware used in an attack or attributes of an attack. They are easy and comparably quick to create, and then distributed and leveraged to scan computers or networks for compromises. They defer from classical, heuristical malware detection, as indicators are not limited to a per-file basis but can also describe domain names of remote servers, names of files created by malware or IP addresses.

Despite their many advantages though, IOCs trade rather shallow features. Domain names, file names or strings within binaries can be easily changed in the course of an operation and at will of the actor. A goal of our research was to extract more granular file features from different domains than the usual IOCs cover, in a sense, more “expensive” features, that we considered less volatile than domain names. This way we expected to be able to find links among different events contained in MISP, that the usual indicators miss. In a targeted operation, it is considered expensive to change a toolset, rewrite malware or switch infection vectors like for example the purchase of a new exploit. Currently used indicators lack capabilities to describe “expensive metrics”, hence the idea to widen the feature space.

However, extraction of binary features is not at all a trivial task. The technical implementation of feature extraction aside, it lies within the nature of malicious binaries to hide their features as thorough as possible; any features, that is. Runtime packing, obfuscation and sandbox evasion are just a few of many techniques that malware authors use to hinder analysis, which in general difficults feature extraction.

The following lists show attributes that were gathered from MISP, as well as those we extracted from the malicious binaries. The attributes are all gained in a static manner, meaning without the execution of binaries. Sole static analysis is frequently faster than dynamic analysis, the tools more portable and large scale analysis more scalable. Next to that we worked with the hypothesis, that targeted malware frequently comes unpacked and without the use of obfuscation. On the other hand, if an actor decides to rely on runtime packing, it should be an interesting question, whether he decides whether to use a custom packer or a commercial one, and whether samples were individually packed, with a dedicated packer. One would think ‘no’, right?

I will go into more details on the packer-or-no-packer questions in a follow up blogpost. For the time being, I’ll ask you for the benefit of doubt that our test set of binaries supplied us with considerably useful data. Here goes our feature list, fitted with short explanations of the underlying trail of thought.

MISP data

  • Event ID
  • Event description
  • Event Submission date
  • CVE numbers associated with malware of given event
  • Domains associated with malware of given event

The attributes we pulled out of MISP mainly describe the respective events, which the binary hashes of our set are linked to. This information is highly valuable, as it puts the malware into context. Frequently events in MISP are linked to vendor reports, which provide a plentitude of context for an operation. Furthermore, the event submission date roughly indicates the time when the operation was reported. CVE numbers are considered an indicator, whether the operation involved exploits. This is a rather soft metric, the lack of an entry for a CVE number does not at all proof that no exploits were being used by a given malicious actor. Nonetheless, listed CVE numbers are valuable information.

Sample data

  • MD5
  • SHA1
  • Associated Event ID
  • Filetype
  • Filesize
  • Ssdeep hash
  • Detectionname of Microsoft Defender

Sample data is a set of descriptive features, associated with any filetype. In the course of our research we put our main focus on Windows executable files, as these pose the biggest group among the analyzed sample set. Our decision to add detection names from the Microsoft Defender anti-virus solution bases on Microsofts accuracy in assigning names to malware. This knowledge we draw from sole experience, although empirical tests have shown excellent results in the past.

An interesting attribute among this set is the ssdeep hash of a file. Ssdeep is an algorithm, which allows to calculate a fuzzy hash of a given data blob. Fuzzy hashes do not uniquely identify the base data, but are calculated piece-wise. This way ssdeep makes it possible to determine similarities among files, and even draw conclusions about the degree of difference between two given files. For more information about ssdeep and fuzzy hashes please visit the sourceforge website. A drawback of fuzzy hashing is, that the required computing load for comparing similarities among binaries increases considerably with the number of binaries.

Executable data

  • Compilation time stamp
  • Imphash value
  • Entry point address
  • Section count
  • Original filename
  • Section names for sections 1-6
  • Section sizes for sections 1-6
  • Section entropies for sections 1-6
  • Number of TLS sections
  • Entry point section
  • Count of calls to Windows APIs
  • Ratio of API call count to filesize

Finally, for the subset of Windows executable files we collected a wealth of descriptive indicators, which apply to meta-information and the internal structure of compiled binaries. Compilation time stamps of binaries can be easily changed, that is true, therefor they have to be taken with a pinch of salt. Singular cases though have shown, that looking at a campaign over a period of time, following related events on MISP that is, sometimes yields unexpected information “leaks”. This means, actors might follow the habit to falsify timestamps, at the same time though erring is human, and sometimes we encounter binaries with seemingly real time stamps. That said, obviously it is of interest to find attacks related to one specific incident, as historical data can reveal unknown traits of a specific actor.

A number of PE attributes serves the detection of packed binaries. The count of PE sections, section names, sizes and entropy, the count of TLS sections (Thread Local Storage) and the section of entry point for example are considered indicators, whether a runtime packer protects the executable. This is interesting information by itself, as it can be concluded which actors use packed malware, which don’t, and which packing habits the different actors have.

Next to that, the attributes also serve to determine the similarity among binaries. While on unpacked binaries, the attributes are highly dependent on the compiler used to compile the binary, on packed executables the same data shows similarities of the various packers.

Two rather uncommon binary metrics we came up with is the total count of calls to Windows APIs within the code and the API call count to file size ratio. The primary consideration hereby is, that packed or protected executables show little interaction with the Win32 API. Another interest in these metrics though is, that the API calls within a binary relate to the actual binary structure. While code changes or additions within pieces of malware very likely change fuzzy hashes, the imphash, the filesize and section attributes, the changes of the API call scheme should remain considerably small.

Data about the data

The beauty of the data collection process, is that it left us with a set of malicious binaries, that are somewhat guaranteed to have been part of a targeted attack approach at some point in the timeline of their use. Furthermore, with the help of MISP we are able to put the binaries into context, as we know in many cases which campaign they were involved with.

For picking events from MISP we applied a lose set of criteria. MISP’s events are pre-classified by the submitter, where a threat level of medium or high indicates, that this event is likely of a targeted nature. From this group we handpicked events, where the description suggested it deals with a known or at least an identified threat actor, or where the nature of the malware clearly suggested a targeted background; like e.g. themed spear phishing would.

The initial data collection started November 2016, so the total set of events only includes cases submitted to MISP until middle of December 2016. However, in follow-up work some of the feature correlation procedures have been adopted by MISP itself. For more details please refer to the website.

Please note, this procedure leaves quite some room for incorrectness. We assume by default, that indicators reported by vendors and their context are correctly associated. This is not always the case, as we found out while performing tests; in some rare occasions data in vendor reports has turned out to be incorrect. As of now we do not have insight which percentage of reports shows errors. Furthermore, the events contained in MISP only show information that actually is reported, meaning that attacks which by the time of analysis yet have to be discovered as well as attributes which are potentially missing from reports pose a blind spot to our research.

Finally, we started off with a set of 501 events, which we assumed to contain information about targeted attacks, containing a total of 15.347 malware samples. From this set we removed another subset of events, which we determined to be overlapping with another event of the same attacker or incident. Duplicate entries happen, as MISP is not striving for accuracy but for completeness. The cleanup left us with a set of 326 events and 8.927 samples.


The graphic shows the file types of the entire sample set. It can be seen, that Win32 PE executables are rather dominant. This is explained by the heavy use of repackaged commodity malware by some actors, but does not represent the general distribution of file types per event. Nevertheless, PE32 is the most important file type within the analyzed sample set, counting more than 11.000 out of the total corpus of 15.347 samples.

In the next blogpost I’ll be introducing our results of an exploit-per-APT analysis, and write about one or another curiosity we found within our final data set.


Micro architecture attacks on KASLR



Recently a number of different micro architecture attacks on Kernel Address Space Layout Randomization(KASLR) have been published. The idea behind KASLRis that code reuse attacks and read/write primitives are useless if the attacker is unable to tell where what code/data is. Consequently modern operating system randomizes the virtual addresses where code/data is stored. However it turns out the CPU leaks information about the address space layout causing failure in the defense. This blog post shortly summarizes the 4 known micro architecture attacks on KASRL and discusses potential mitigations. Part of the mitigation portion will be speculative. Bias up front: I’m a coauthor of the prefetch attack and quite like it.

What Do I mean by micro architecture KASLR break

When I wrote my first blog post [8] on breaking KASLR with micro architecture, I got criticized for using the term “micro architecture”. What I mean to convey when I use this term is that there is nothing in the Instruction Set Architecture (ISA) that specifies that the x86 has to work in ways that allows us to attack KASLR. Rather the features that are being used, are part of the implementation of the ISA. This usage of the term seems in line with common usage in the academic literature. There have been attacks on KASLR using page deduplication e.g. Barresi et al. [1] which some consider a microarchitecture attack. I however, do not and will thus not include these attacks in this discussion.

History of micro architecture KASLR breaks


Using page faults

Hund, Willems & Holz [9] was the first to publish an attack on KASLR. They use a combination of a classical prime+probe cache attack and a novel attack on the paging caches the time it takes for a fault to be thrown on unprivileged access to a page on x86-64 differs significantly depending on weather the translation is cached in the paging caches. An attacker can measure this difference. Further even an unprivileged access to a page causes the translation caches to be loaded and consequently you can use a timing difference attack to tell if memory is mapped on a certain location. They called this the “double page fault” attack.

Using Prefetch

I wrote a blog post in January [8] that become my starting for the paper by Gruss et al [6]. The Hund, Willems & Holz [9] attack suffers from the drawback that the timing is measure across the page fault handler which adds significant noise. This makes the attack itself slow and due to noise the attack needs to be repeated a significant amount of times to become reliable. What I and my coauthors noted was that the prefetch instructions do not cause page faults – they terminate silently. They do however have the same timing properties, as they do the same look ups. Consequently replacing page faults with the prefetch instruction provides a faster and more reliable KASLRbreak.


Using TSX

Wojtczuk wrote in a blog post [3] that intel TSX technology might make Hund, Willems & Holz’s [9] attack significantly more practical.  Jang et al.[ 2] took this up and showed Wojtczuk [3] was right. TSX works by being able to roll back a set of memory accesses, if they cannot be completed immediately due to conflicting access in the relevant portions of the memory sub system – much like transactional access to databases. However, access to a page that would normally trigger a page fault inside a transaction now just causes a rollback. Consequently, a significant speed up and reliability gain can be had by wrapping your “probe” in a TSX section.


Using Branch Target Buffer

The fourth and fairly different attack came using the Branch Target Buffer, BTB.  Evtyushkin, Ponomarev & Abu-Ghazaleh [5] found out that intel caches branch targets in order to better branch prediction in BTB. Correctly predicting branches is important to performance because a failure to predict correctly will under utilize the entire instruction pipeline. Intel however is cheap with the bits and uses only the lower 31 bits to store a branch target in the cache. Since KASLRis typically build on entropy in the lower 31 bits and these bits are shared between kernel and user mode, an attacker can cause conflicts. These conflicts will cause measurable timing differences.



Despite the first three methods evolving around the same theme they do have significantly different advantages and caveats.  The unique BTB method obviously also have diverging properties. I shall stress the advantages and problems of each vector using Hund, Willems & Holz paper [9] as a baseline. This classical method now appears mostly obsolete. That said I consider the paper a milestone for many reasons and it’s well worth a read.

The Advantages of the prefetch vector

Using the prefetch instruction is, as noted above, faster and more reliable. Not only does it make it more reliable, but also more accurate. Without the noise one can spot a timing difference depending on the level in the page table hierarchy an abort was caused. This allows an attacker to efficiently map out the entire address space top down starting out with 1 gigabyte regions. Further the prefetch instruction actually ignores the page table security settings and prefetches an address into a cache. This allows an attacker to recover physical addresses and by pass SMEP & SMAP when the operating system is using a direct mapping.  The prefetch instructions are available on all current CPU’s, as they were introduced back in 1998.

The TSX vectors’s advantages

Like the prefetch attack, no fault handler is called and consequently the attack has significantly reduced the noise. The TSX attack, unlike the prefetch, can trigger code access through execution using a jump or call instruction. This has significantly different timing properties and consequently allows the attacker to distinguish between code and data. As a downside it cannot use the cache loading properties of the prefetch instruction. A major drawback of the TSX vector is the installed base of TSX which is somewhat limited. It was officially introduced with Haswell micro architecture. To this day TSX is only available in high end computers: some I5, I7 and xeon’s. This does not yet include the U series which is the most popular processor for notebooks.


The BTB attack is probably the most limited attack, as it is only able to locate code and data where branches give away the address. For all its limits in this regard it is probably the fastest of the attacks. Further Felix Wilhelm [11] showed that since the attack does not rely on page tables, it translates to host-KASLR breaks too.

Software Mitigation

Intel has in the past not been prone to do anything about micro architectural attacks and consequently I have significant doubt that they will with KASLR attacks. In fact the segment limits that were part of the x86-32 architecture would be a shop stopper for the first three attack vectors, but was removed for performance with x86-64.

It is often suggested to make the timer (RDTSC(p) ) less accurate will mitigate micro architectural attacks which covers all 4 attacks vectors here. The cost is obviously some implementation overhead and a less accurate timer for benign purposes. It could be implemented through the CR4.TCD bit for bare metal operating systems or setting up the MSR’s to get a VMEXIT in hypervisor environments.  These cost seems reasonable to me – I’ve yet to hear a convincing argument why user mode applications in general, need such timing accuracy and I never heard a convincing argument that any application would be significantly slowed down by the overhead required to get less accuracy on RDTSC(p) instruction. That said lower timing accuracy is not a good mitigation in cases where an attack can be repeated. The reason is that once an attacker can repeat a timing attack, he can use the law of large numbers to cancel out any noise or bias added to the timer. For KASLR attacks, there is little reason why an attacker cannot repeat the attack and in fact the Hund, Willems & Holz [9] attack does this to great effect, not because of a noisy timer, but because a the noise added by the page fault handler.  I thus think it’s safe to conclude that a noisy timer cannot be a solution to KASLR micro architecture attacks.

The Hund, Willems & Holz [9] attack is fairly easy to mitigate. Since they actually trigger a page fault, one has a nice call back in the page fault handler. A few lines of code can use client CS and error code on the stack and the linear address in CR2 to check if a user mode program tried to access kernel memory and respond accordingly. As this event should be really rare, the response can be anything from causing a large delay making the attack impractical or shutting down the application.


The in Jang et al. [2] version of the TSX attack ,DrK, use page mapping (number of continuously mapped pages) and page attributes to identify modules. This can easily be thwarted by mapping unused pages in the driver region to a single PFN with only null data. The cost shouldn’t be too bad as only one PFN needs to be referenced mapped from a number of PTE which to some extend can use existing PDE’s. Exactly how many additional PDE’s would be needed to thwart the DrK would be a tuning parameter. This method would also slow down the prefetch attack, however currently it uses a function call to load the caches and thus cannot be completely thwarted in this fashion. It is likely that the TSX vector could be made to work in a similar fashion.

Both the TSX vector and the BTB vector is vulnerable to performance counter detection as there are currently performance counters that maps to micro events caused by the attack. Say for instance the detection I suggested for the Flush+Flush [7] attack could easily be modified to detect both these attacks. The prefetch attack however, does not seem to have a performance counter event that correlate sufficiently with the attack. The performance counters on software prefetch’s was discontinued and remapped with the by now very old Silvermont generation of processors. These kinds of detections are unfortunately either high-overhead or false positive prone.  Consequently they don’t off themselves as general solutions, but rather of a close gap nature.  Without a good performance counter mapping to the prefetch instruction, reliable detection would likely be prohibitively expensive performance wise using these methods.

The BTB and TSX vectors offer other ways of detection.  In particular it seems likely that both will generate activity in the Last Branch Record (LBR) stack. If one were to instrument the RDTSC(p) instruction, as suggested above, the distance between the last RDTSC(p) instruction and the current combined with the LBR would provide clues to nefarious activity quite possibly at a fairly low cost.

More promising is a BTB only trick that relies on the very thing that makes the BTB attack work in the first place. The BTB vector works because only the lower bits are used to store the branch target and thus you get conflicts between kernel and user mode branches. Interestingly this conflict consequently only gives you information on the lower 31 bits and Evtyushkin and Ponomarev [5] notes that using the upper bits for KASLR thus remains a solution. Using all the remaining addressing bits f or KASLR seems a good idea, however all modern operating system sorts different types of code and data into blocks and partically uses these blocks to identify maglinant behavior. For instance drivers in Windows are loaded between 0xFFFFF800’000000000 and 0xFFFFF8FF’FFFFFFFF. Thus to minimize changes to this general structure using only the upper 9 bits (the PML4 bits) would probably be the easiest solution. This would give you 256 different locations which the BTB method could not tell apart (the top most bit is used to distinguish between kernel mode and user mode). Not much, but an attacker is unlikely to get away with causing a machine a 100 times to get it right be random. In fact Windows 10 Anniversary Edition does this for PTE’s, but not for code. See Economou [10] for more info.

The three vectors based on actually accessing kernel memory can be mitigated through address space separation. By treating kernel mode as a separate “process” and swapping the CR3 register immediately after/before a ring change, one can effectively block these attacks. Depending on the amount of ring switching going on, an operating system are looking at about 5% over head as switching the CR3 registers are expensive in and of itself and because it invalidates the paging caches. Both Yang et al [2] and Gruss et al. [6] mentions this method.

A mitigation that I considered for thwarting the BTB attack was flushing the BTB itself, by systematically doing a number of jmp instructions on switching back into user mode. This method is likely to have too much performance overhead to be practical. The reason for this method is the size of the BTB, which Godbolt [4] estimates the BTB in his Haswell to be 2048 entries large – thus requiring 2048 dummy branch instructions on every transition into user mode.


The natural question, which of these methods is the best, is a misunderstood question. While the first method appears obsolete by now, the other 3 methods have different advantages and disadvantages. They base themselves on at least 2, but probably at least 3 different side channels in the CPU and thwarting all four attacks seems like a difficult undertaking. Personally I find it likely that there are more leaking in the processor to be found – maybe I’ll write about that another time. The natural conclusion is that KASLRfor bare metal operating systems seems to be broken for defending against privilege escalation mounted from native code. The caveat here is the “from native code part”. Classical data based attacks, such as a font attacks of the past cannot be doctored to break KASLRusing these micro architectural attacks. Before I cry any crocodile tears ,I should note that KASLR was always a stop gap solution and consequently mitigations become mitigations for a mitigation. The real problem being bugs in kernel mode – though very difficult to fix all kernel mode bugs, minimizing code in the kernel and improving kernel mode code quality must remain the focus of making the kernel secure.



[1]A Barresi , K Razavi, M Payer, T Gross: “CAIN: Silently Breaking ASLR in the Cloud“ –

[2] Y Jang, S Lee, T Kim , “Breaking Kernel Address Space Layout Randomization with Intel TSX”

– Proceedings of the 23rd ACM Conference on  Computer and Communications Security

[3] R Wojtczuk.” TSX Improves Timing Attacks Against KASLR.”

[4] M Godbolt. “The BTB in contemporary Intel chips.”

[5] Evtyushkin D, Ponomarev D, Abu-Ghazaleh N. Jump Over ASLR: Attacking Branch Predictors to Bypass ASLR. InProceedings of 49th International Symposium on Microarchitecture (MICRO) 2016.

[6] D Gruss, C Maurice, A Fogh, M Lipp, S Mangard . “Prefetch Side-Channel Attacks: Bypassing SMAP and Kernel ASLR.” Proceedings of the 23rd ACM Conference on Computer and Communications security 2016

[7] A Fogh. “Detecting stealth mode cache attacks: Flush+Flush.

[8] A Fogh: “Breaking KASRL with micro architecture Part 1”,

[9] R Hund, C Willems, T Holz. „Practical timing side channel attacks against kernel space ASLR” Security and Privacy (SP), 2013 IEEE Symposium on, 191-205

[10] N Economou.  “Getting Physical: Extreme abuse of Intel based Paging Systems” -

[11]  F Wilhelm “mario_baslr”

The Kings In Your Castle Part #1

All the lame threats that own you, but will never make you famous.

In March 2016 I presented together with Raphael Vinot at this year’s Troopers conference in Heidelberg. The talk treated research of targeted malware, the how’s and if’s of malicious binaries, the involvement of sophistication and exploits, the presence or absence of patters within advanced persistent threats (APTs). The write-up of all happenings has taken its time, and as expected kept growing longer and longer and.. finally we figured splitting the outcome into multiple posts would make all our lifes significantly easier.

Five compact blogposts are the result, easy to digest and understand, while covering the following points:

  1. Introduction, hypotheses and analysis process with respective toolset
  2. Description of the data set and the feature extraction process
  3. The findings’ curiosities and results of exploit-per-APT analysis
  4. The use of packers, crypters and commodity RATs within the sample set
  5. Actor correlations and APT naming wars, future research

Here we go, please enjoy part 1 about the kings, your castle, and cyber. At this point I would like to thank Raphael Vinot for joining in on this data shoveling excurse, the team of CIRCL being busy feeding and maintaining their data dragon, and Morgan Marquis-Boire as well as the many other wizards from the research community who kept sending us malicious software.

Part 1: The kings, your castle, and cyber

It is the same question being directed to audiences around the security conference scene: How many people in the room can tell their machine or network is currently not compromised? No hand has been seen to rise in answer.  APT has been fashion five years ago and still rocks the most-feared charts on every cyber threat survey. While tabloid press is generally after the latest most-sophisticated-threat, the analyst community has long resorted to talk about threats that are advanced and persistent.. enough. In terms of sophistication targeted attacks show all shades of grey, on average though tend to be rather shallow. On the other hand, security products all have a single weak spot in common that they will always rely on patterns; whether patterns that are there, like signatures, or patterns that are not there, like anomalies. This enables attackers to evade detections with shallow, but unknown tools which manage to fly under the radar.

In research we conducted in cooperation with CIRCL, the Computer Incident Response Center Luxemboug, we took on the APT myths by formulating hypotheses based on a set of APTs documented in the MISP platform. MISP stands for Malware Information Sharing Platform and is used by hundreds of organizations to share data on incidents, among them a large number of targeted attacks. It is possible to split the content of the information shared between reports of vendors and events seen by the users of the platform. MISP is maintained and developed by the fine people at CIRCL, data is constantly added by CIRCL members, security companies or independent contributors.

Having this information in one single place allows to correlate -supposedly- new threats reported by vendors with existing events seen in the wild now or in the past. At the time of conducting the research, MISP held information about more than 2.000 events.

The data contained helps understand the overall nature of the threats, the tools of trade, the preferred approaches of the attackers, and their evolution. It potentially even allows for actor tracking as the correlation of attributes reveals hidden treasures.

The gathered events from MISP are pre-classified by threat level. We concentrated on targeted threats and conducted a survey on the nature of malware and infrastructure used therein. How much of the analyzed malware is custom made, how much off-the-shelf or simply installs known RATs? How much of it is packed or crypted? Does the fact that malware is not crypted allow conclusions on whether it is used for targeted attacks? How often are exploits used in attack? Does the use of exploits imply more sophisticated tools as the attacker is expected to dispose of higher resources?

On a more abstract level, we also wanted to know if we were able to uncover links among actors, their tools and infrastructure, solely based on OSINT data (open source intelligence).

The reason why this is possible lies in the nature of targeted attacks in general. A targeted attack in reality is not a piece of malware. It is a process, consisting of several phases, and many times even a repetitive circle if one finds himself targeted by a determined attacker.


Figure 1 – The APT process

Furthermore, the stages involving malicious software frequently imply the use of different pieces of malware, and also a certain infrastructure to operate attacks. This means, the attackers require servers, domains and maybe even e-mail addresses or fake social media accounts and fake web sites; to orchestrate the malware, store exfiltrated data and to drive hands-on attacks. All of these items come with a certain cost, in time and money, and believe it or not, most attackers have restrictions – in time, and money.

Limited resources call for repetition, or, in other words, attackers with a hint of economical thinking won’t reinvent the wheel anew every time a new target is chosen. Advanced attack suites, exploits, tailored malware, techniques and practices are assets that are costly to change.

This said, reinventing the wheel is possible, but not a very established practice. Besides requiring extensive resources, building and maintaining more than one attack platform and a large set of unique attack tools is prone to errors. After all, the folks building sophisticated malware are human software developers too. Getting back to the actual cost of an attack, being discovered is not only annoying for an attack group, it is also expensive and potentially dangerous. Exposure might get law enforcement on track or even inspire counter attacks.

Enough about the motivations though, APTs will keep being APTs; so let’s go on with exploring their malware and infrastructure.


MISP is a system that is usually fed with threat indicators, which are then shared and internally correlated with other indicators already present in the database. The primary purpose of MISP is of course the distribution of threat indicators, aided by additional information gained by indicator correlation. As by-product though, MISP provides a decent event documentation with a timeline.

That said, obviously we didn’t just walk in and query databases; the process of gathering data, cleaning up records, writing tools and forming theories that got again discarded was lengthy. In the end what is left is a neat sample set, a ton of data and a set of homegrown tools as well as extensions for already existing tools.

Please note though, this project is by no means an effort to perform large-scale clustering of malware with machine learning methods, nor does it involve any whatsoever sophisticated algorithms. The tools we designed are merely simple, the grouping of samples a mere sorting approach. Keep it simple, then scale it up, a data analysis wizard once told me.


Figure 2 – Tools and extensions created during data processing

MISP provides a web interface, which helps in manually investigating one or more events and look for correlations, but it does not serve an automation purpose. Therefor we established different ways to access and process the MISP dataset. We used a Python connector to shovel MISP data into Viper, a malware sample management framework. This way we were able to easily sort through events and selected the ones which, based on predefined criteria, were highly likely involved with targeted attacks. These events were the base from where we started processing. Selection criteria and musings about attack nature will be outlined in a follow up blogpost. To sketch the workflow, it went roughly the following way:

  1. Event data from Viper -> sample hashes -> sample processing / tools development with SQLite backend
  2. Event data from MISP -> pull to Redis backend -> event attribute processing
  3. Importing of sample processing data into Redis
  4. Data correlation and analysis with Redis

We faced a challenge when seeing to collect all the malware binaries, involved with events that were selected for further processing. MISP holds primarily threat indicators, including sample hashes, domains, e-mail addresses, file names and occasionally Yara signatures; rarely ever binaries. Also, malware, especially when involved in targeted attacks, is not always shared with the broader public. Eventually we managed to aggregate most of the fancied binaries through help of public and private repositories.

The use of two different backend solutions has historical reasons, mainly we started off to work independently with our preferred solution, and in the end found there is no reason to abolish either. Redis is a strong, scalable backend, suited for medium to large scale data analysis; SQLite is practical in portability, small, elegant and effortless to set up.

For feature extraction from binaries we used the Python library pefile and instrumented IDAPro with help of IDAPython. Furthermore, we made use of the ssdeep fuzzy hashing library, exiftool for detailed file type analysis and used RapidMiner community edition for visualization.

Our developments were published in the course of Troopers conference and are available on github at as part of the MISP repository.

The next blogpost of the Kings in your castle series will cover the nature of the analysis data set and include a discussion of the extracted feature set.