The Kings In Your Castle Part #2 – Dataset and feature extraction

Welcome back to my series of write-ups for “The Kings In Your Castle – All the lame threats that own you, but will never make you famous”. This series covers a project I presented together with Raphael Vinot from CIRCL Luxembourg at Troopers conference in March. If you missed the start, you can find it here.

O’right lets go 😀

TTPAs – Tools, Techniques, Procedures and Actors

The primary aim in the toolification process was to extract accurate binary features, that would help us describe the binaries in relation to the Event data stored in MISP. Therefor we took the feature extraction a step further than usual IOC creation would (Indicators of Compromise).

IOCs are indicators, which describe malware used in an attack or attributes of an attack. They are easy and comparably quick to create, and then distributed and leveraged to scan computers or networks for compromises. They defer from classical, heuristical malware detection, as indicators are not limited to a per-file basis but can also describe domain names of remote servers, names of files created by malware or IP addresses.

Despite their many advantages though, IOCs trade rather shallow features. Domain names, file names or strings within binaries can be easily changed in the course of an operation and at will of the actor. A goal of our research was to extract more granular file features from different domains than the usual IOCs cover, in a sense, more “expensive” features, that we considered less volatile than domain names. This way we expected to be able to find links among different events contained in MISP, that the usual indicators miss. In a targeted operation, it is considered expensive to change a toolset, rewrite malware or switch infection vectors like for example the purchase of a new exploit. Currently used indicators lack capabilities to describe “expensive metrics”, hence the idea to widen the feature space.

However, extraction of binary features is not at all a trivial task. The technical implementation of feature extraction aside, it lies within the nature of malicious binaries to hide their features as thorough as possible; any features, that is. Runtime packing, obfuscation and sandbox evasion are just a few of many techniques that malware authors use to hinder analysis, which in general difficults feature extraction.

The following lists show attributes that were gathered from MISP, as well as those we extracted from the malicious binaries. The attributes are all gained in a static manner, meaning without the execution of binaries. Sole static analysis is frequently faster than dynamic analysis, the tools more portable and large scale analysis more scalable. Next to that we worked with the hypothesis, that targeted malware frequently comes unpacked and without the use of obfuscation. On the other hand, if an actor decides to rely on runtime packing, it should be an interesting question, whether he decides whether to use a custom packer or a commercial one, and whether samples were individually packed, with a dedicated packer. One would think ‘no’, right?

I will go into more details on the packer-or-no-packer questions in a follow up blogpost. For the time being, I’ll ask you for the benefit of doubt that our test set of binaries supplied us with considerably useful data. Here goes our feature list, fitted with short explanations of the underlying trail of thought.

MISP data

  • Event ID
  • Event description
  • Event Submission date
  • CVE numbers associated with malware of given event
  • Domains associated with malware of given event

The attributes we pulled out of MISP mainly describe the respective events, which the binary hashes of our set are linked to. This information is highly valuable, as it puts the malware into context. Frequently events in MISP are linked to vendor reports, which provide a plentitude of context for an operation. Furthermore, the event submission date roughly indicates the time when the operation was reported. CVE numbers are considered an indicator, whether the operation involved exploits. This is a rather soft metric, the lack of an entry for a CVE number does not at all proof that no exploits were being used by a given malicious actor. Nonetheless, listed CVE numbers are valuable information.

Sample data

  • MD5
  • SHA1
  • Associated Event ID
  • Filetype
  • Filesize
  • Ssdeep hash
  • Detectionname of Microsoft Defender

Sample data is a set of descriptive features, associated with any filetype. In the course of our research we put our main focus on Windows executable files, as these pose the biggest group among the analyzed sample set. Our decision to add detection names from the Microsoft Defender anti-virus solution bases on Microsofts accuracy in assigning names to malware. This knowledge we draw from sole experience, although empirical tests have shown excellent results in the past.

An interesting attribute among this set is the ssdeep hash of a file. Ssdeep is an algorithm, which allows to calculate a fuzzy hash of a given data blob. Fuzzy hashes do not uniquely identify the base data, but are calculated piece-wise. This way ssdeep makes it possible to determine similarities among files, and even draw conclusions about the degree of difference between two given files. For more information about ssdeep and fuzzy hashes please visit the sourceforge website. A drawback of fuzzy hashing is, that the required computing load for comparing similarities among binaries increases considerably with the number of binaries.

Executable data

  • Compilation time stamp
  • Imphash value
  • Entry point address
  • Section count
  • Original filename
  • Section names for sections 1-6
  • Section sizes for sections 1-6
  • Section entropies for sections 1-6
  • Number of TLS sections
  • Entry point section
  • Count of calls to Windows APIs
  • Ratio of API call count to filesize

Finally, for the subset of Windows executable files we collected a wealth of descriptive indicators, which apply to meta-information and the internal structure of compiled binaries. Compilation time stamps of binaries can be easily changed, that is true, therefor they have to be taken with a pinch of salt. Singular cases though have shown, that looking at a campaign over a period of time, following related events on MISP that is, sometimes yields unexpected information “leaks”. This means, actors might follow the habit to falsify timestamps, at the same time though erring is human, and sometimes we encounter binaries with seemingly real time stamps. That said, obviously it is of interest to find attacks related to one specific incident, as historical data can reveal unknown traits of a specific actor.

A number of PE attributes serves the detection of packed binaries. The count of PE sections, section names, sizes and entropy, the count of TLS sections (Thread Local Storage) and the section of entry point for example are considered indicators, whether a runtime packer protects the executable. This is interesting information by itself, as it can be concluded which actors use packed malware, which don’t, and which packing habits the different actors have.

Next to that, the attributes also serve to determine the similarity among binaries. While on unpacked binaries, the attributes are highly dependent on the compiler used to compile the binary, on packed executables the same data shows similarities of the various packers.

Two rather uncommon binary metrics we came up with is the total count of calls to Windows APIs within the code and the API call count to file size ratio. The primary consideration hereby is, that packed or protected executables show little interaction with the Win32 API. Another interest in these metrics though is, that the API calls within a binary relate to the actual binary structure. While code changes or additions within pieces of malware very likely change fuzzy hashes, the imphash, the filesize and section attributes, the changes of the API call scheme should remain considerably small.

Data about the data

The beauty of the data collection process, is that it left us with a set of malicious binaries, that are somewhat guaranteed to have been part of a targeted attack approach at some point in the timeline of their use. Furthermore, with the help of MISP we are able to put the binaries into context, as we know in many cases which campaign they were involved with.

For picking events from MISP we applied a lose set of criteria. MISP’s events are pre-classified by the submitter, where a threat level of medium or high indicates, that this event is likely of a targeted nature. From this group we handpicked events, where the description suggested it deals with a known or at least an identified threat actor, or where the nature of the malware clearly suggested a targeted background; like e.g. themed spear phishing would.

The initial data collection started November 2016, so the total set of events only includes cases submitted to MISP until middle of December 2016. However, in follow-up work some of the feature correlation procedures have been adopted by MISP itself. For more details please refer to the website.

Please note, this procedure leaves quite some room for incorrectness. We assume by default, that indicators reported by vendors and their context are correctly associated. This is not always the case, as we found out while performing tests; in some rare occasions data in vendor reports has turned out to be incorrect. As of now we do not have insight which percentage of reports shows errors. Furthermore, the events contained in MISP only show information that actually is reported, meaning that attacks which by the time of analysis yet have to be discovered as well as attributes which are potentially missing from reports pose a blind spot to our research.

Finally, we started off with a set of 501 events, which we assumed to contain information about targeted attacks, containing a total of 15.347 malware samples. From this set we removed another subset of events, which we determined to be overlapping with another event of the same attacker or incident. Duplicate entries happen, as MISP is not striving for accuracy but for completeness. The cleanup left us with a set of 326 events and 8.927 samples.

filetypes_basicdata

The graphic shows the file types of the entire sample set. It can be seen, that Win32 PE executables are rather dominant. This is explained by the heavy use of repackaged commodity malware by some actors, but does not represent the general distribution of file types per event. Nevertheless, PE32 is the most important file type within the analyzed sample set, counting more than 11.000 out of the total corpus of 15.347 samples.

In the next blogpost I’ll be introducing our results of an exploit-per-APT analysis, and write about one or another curiosity we found within our final data set.

 

Micro architecture attacks on KASLR

 

Introduction

Recently a number of different micro architecture attacks on Kernel Address Space Layout Randomization(KASLR) have been published. The idea behind KASLRis that code reuse attacks and read/write primitives are useless if the attacker is unable to tell where what code/data is. Consequently modern operating system randomizes the virtual addresses where code/data is stored. However it turns out the CPU leaks information about the address space layout causing failure in the defense. This blog post shortly summarizes the 4 known micro architecture attacks on KASRL and discusses potential mitigations. Part of the mitigation portion will be speculative. Bias up front: I’m a coauthor of the prefetch attack and quite like it.

What Do I mean by micro architecture KASLR break

When I wrote my first blog post [8] on breaking KASLR with micro architecture, I got criticized for using the term “micro architecture”. What I mean to convey when I use this term is that there is nothing in the Instruction Set Architecture (ISA) that specifies that the x86 has to work in ways that allows us to attack KASLR. Rather the features that are being used, are part of the implementation of the ISA. This usage of the term seems in line with common usage in the academic literature. There have been attacks on KASLR using page deduplication e.g. Barresi et al. [1] which some consider a microarchitecture attack. I however, do not and will thus not include these attacks in this discussion.

History of micro architecture KASLR breaks

 

Using page faults

Hund, Willems & Holz [9] was the first to publish an attack on KASLR. They use a combination of a classical prime+probe cache attack and a novel attack on the paging caches the time it takes for a fault to be thrown on unprivileged access to a page on x86-64 differs significantly depending on weather the translation is cached in the paging caches. An attacker can measure this difference. Further even an unprivileged access to a page causes the translation caches to be loaded and consequently you can use a timing difference attack to tell if memory is mapped on a certain location. They called this the “double page fault” attack.

Using Prefetch

I wrote a blog post in January [8] that become my starting for the paper by Gruss et al [6]. The Hund, Willems & Holz [9] attack suffers from the drawback that the timing is measure across the page fault handler which adds significant noise. This makes the attack itself slow and due to noise the attack needs to be repeated a significant amount of times to become reliable. What I and my coauthors noted was that the prefetch instructions do not cause page faults – they terminate silently. They do however have the same timing properties, as they do the same look ups. Consequently replacing page faults with the prefetch instruction provides a faster and more reliable KASLRbreak.

 

Using TSX

Wojtczuk wrote in a blog post [3] that intel TSX technology might make Hund, Willems & Holz’s [9] attack significantly more practical.  Jang et al.[ 2] took this up and showed Wojtczuk [3] was right. TSX works by being able to roll back a set of memory accesses, if they cannot be completed immediately due to conflicting access in the relevant portions of the memory sub system – much like transactional access to databases. However, access to a page that would normally trigger a page fault inside a transaction now just causes a rollback. Consequently, a significant speed up and reliability gain can be had by wrapping your “probe” in a TSX section.

 

Using Branch Target Buffer

The fourth and fairly different attack came using the Branch Target Buffer, BTB.  Evtyushkin, Ponomarev & Abu-Ghazaleh [5] found out that intel caches branch targets in order to better branch prediction in BTB. Correctly predicting branches is important to performance because a failure to predict correctly will under utilize the entire instruction pipeline. Intel however is cheap with the bits and uses only the lower 31 bits to store a branch target in the cache. Since KASLRis typically build on entropy in the lower 31 bits and these bits are shared between kernel and user mode, an attacker can cause conflicts. These conflicts will cause measurable timing differences.

 

Features

Despite the first three methods evolving around the same theme they do have significantly different advantages and caveats.  The unique BTB method obviously also have diverging properties. I shall stress the advantages and problems of each vector using Hund, Willems & Holz paper [9] as a baseline. This classical method now appears mostly obsolete. That said I consider the paper a milestone for many reasons and it’s well worth a read.

The Advantages of the prefetch vector

Using the prefetch instruction is, as noted above, faster and more reliable. Not only does it make it more reliable, but also more accurate. Without the noise one can spot a timing difference depending on the level in the page table hierarchy an abort was caused. This allows an attacker to efficiently map out the entire address space top down starting out with 1 gigabyte regions. Further the prefetch instruction actually ignores the page table security settings and prefetches an address into a cache. This allows an attacker to recover physical addresses and by pass SMEP & SMAP when the operating system is using a direct mapping.  The prefetch instructions are available on all current CPU’s, as they were introduced back in 1998.

The TSX vectors’s advantages

Like the prefetch attack, no fault handler is called and consequently the attack has significantly reduced the noise. The TSX attack, unlike the prefetch, can trigger code access through execution using a jump or call instruction. This has significantly different timing properties and consequently allows the attacker to distinguish between code and data. As a downside it cannot use the cache loading properties of the prefetch instruction. A major drawback of the TSX vector is the installed base of TSX which is somewhat limited. It was officially introduced with Haswell micro architecture. To this day TSX is only available in high end computers: some I5, I7 and xeon’s. This does not yet include the U series which is the most popular processor for notebooks.

BTB

The BTB attack is probably the most limited attack, as it is only able to locate code and data where branches give away the address. For all its limits in this regard it is probably the fastest of the attacks. Further Felix Wilhelm [11] showed that since the attack does not rely on page tables, it translates to host-KASLR breaks too.

Software Mitigation

Intel has in the past not been prone to do anything about micro architectural attacks and consequently I have significant doubt that they will with KASLR attacks. In fact the segment limits that were part of the x86-32 architecture would be a shop stopper for the first three attack vectors, but was removed for performance with x86-64.

It is often suggested to make the timer (RDTSC(p) ) less accurate will mitigate micro architectural attacks which covers all 4 attacks vectors here. The cost is obviously some implementation overhead and a less accurate timer for benign purposes. It could be implemented through the CR4.TCD bit for bare metal operating systems or setting up the MSR’s to get a VMEXIT in hypervisor environments.  These cost seems reasonable to me – I’ve yet to hear a convincing argument why user mode applications in general, need such timing accuracy and I never heard a convincing argument that any application would be significantly slowed down by the overhead required to get less accuracy on RDTSC(p) instruction. That said lower timing accuracy is not a good mitigation in cases where an attack can be repeated. The reason is that once an attacker can repeat a timing attack, he can use the law of large numbers to cancel out any noise or bias added to the timer. For KASLR attacks, there is little reason why an attacker cannot repeat the attack and in fact the Hund, Willems & Holz [9] attack does this to great effect, not because of a noisy timer, but because a the noise added by the page fault handler.  I thus think it’s safe to conclude that a noisy timer cannot be a solution to KASLR micro architecture attacks.

The Hund, Willems & Holz [9] attack is fairly easy to mitigate. Since they actually trigger a page fault, one has a nice call back in the page fault handler. A few lines of code can use client CS and error code on the stack and the linear address in CR2 to check if a user mode program tried to access kernel memory and respond accordingly. As this event should be really rare, the response can be anything from causing a large delay making the attack impractical or shutting down the application.

 

The in Jang et al. [2] version of the TSX attack ,DrK, use page mapping (number of continuously mapped pages) and page attributes to identify modules. This can easily be thwarted by mapping unused pages in the driver region to a single PFN with only null data. The cost shouldn’t be too bad as only one PFN needs to be referenced mapped from a number of PTE which to some extend can use existing PDE’s. Exactly how many additional PDE’s would be needed to thwart the DrK would be a tuning parameter. This method would also slow down the prefetch attack, however currently it uses a function call to load the caches and thus cannot be completely thwarted in this fashion. It is likely that the TSX vector could be made to work in a similar fashion.

Both the TSX vector and the BTB vector is vulnerable to performance counter detection as there are currently performance counters that maps to micro events caused by the attack. Say for instance the detection I suggested for the Flush+Flush [7] attack could easily be modified to detect both these attacks. The prefetch attack however, does not seem to have a performance counter event that correlate sufficiently with the attack. The performance counters on software prefetch’s was discontinued and remapped with the by now very old Silvermont generation of processors. These kinds of detections are unfortunately either high-overhead or false positive prone.  Consequently they don’t off themselves as general solutions, but rather of a close gap nature.  Without a good performance counter mapping to the prefetch instruction, reliable detection would likely be prohibitively expensive performance wise using these methods.

The BTB and TSX vectors offer other ways of detection.  In particular it seems likely that both will generate activity in the Last Branch Record (LBR) stack. If one were to instrument the RDTSC(p) instruction, as suggested above, the distance between the last RDTSC(p) instruction and the current combined with the LBR would provide clues to nefarious activity quite possibly at a fairly low cost.

More promising is a BTB only trick that relies on the very thing that makes the BTB attack work in the first place. The BTB vector works because only the lower bits are used to store the branch target and thus you get conflicts between kernel and user mode branches. Interestingly this conflict consequently only gives you information on the lower 31 bits and Evtyushkin and Ponomarev [5] notes that using the upper bits for KASLR thus remains a solution. Using all the remaining addressing bits f or KASLR seems a good idea, however all modern operating system sorts different types of code and data into blocks and partically uses these blocks to identify maglinant behavior. For instance drivers in Windows are loaded between 0xFFFFF800’000000000 and 0xFFFFF8FF’FFFFFFFF. Thus to minimize changes to this general structure using only the upper 9 bits (the PML4 bits) would probably be the easiest solution. This would give you 256 different locations which the BTB method could not tell apart (the top most bit is used to distinguish between kernel mode and user mode). Not much, but an attacker is unlikely to get away with causing a machine a 100 times to get it right be random. In fact Windows 10 Anniversary Edition does this for PTE’s, but not for code. See Economou [10] for more info.

The three vectors based on actually accessing kernel memory can be mitigated through address space separation. By treating kernel mode as a separate “process” and swapping the CR3 register immediately after/before a ring change, one can effectively block these attacks. Depending on the amount of ring switching going on, an operating system are looking at about 5% over head as switching the CR3 registers are expensive in and of itself and because it invalidates the paging caches. Both Yang et al [2] and Gruss et al. [6] mentions this method.

A mitigation that I considered for thwarting the BTB attack was flushing the BTB itself, by systematically doing a number of jmp instructions on switching back into user mode. This method is likely to have too much performance overhead to be practical. The reason for this method is the size of the BTB, which Godbolt [4] estimates the BTB in his Haswell to be 2048 entries large – thus requiring 2048 dummy branch instructions on every transition into user mode.

Conclusion

The natural question, which of these methods is the best, is a misunderstood question. While the first method appears obsolete by now, the other 3 methods have different advantages and disadvantages. They base themselves on at least 2, but probably at least 3 different side channels in the CPU and thwarting all four attacks seems like a difficult undertaking. Personally I find it likely that there are more leaking in the processor to be found – maybe I’ll write about that another time. The natural conclusion is that KASLRfor bare metal operating systems seems to be broken for defending against privilege escalation mounted from native code. The caveat here is the “from native code part”. Classical data based attacks, such as a font attacks of the past cannot be doctored to break KASLRusing these micro architectural attacks. Before I cry any crocodile tears ,I should note that KASLR was always a stop gap solution and consequently mitigations become mitigations for a mitigation. The real problem being bugs in kernel mode – though very difficult to fix all kernel mode bugs, minimizing code in the kernel and improving kernel mode code quality must remain the focus of making the kernel secure.

 

Literature

[1]A Barresi , K Razavi, M Payer, T Gross: “CAIN: Silently Breaking ASLR in the Cloud“ – https://nebelwelt.net/publications/files/15WOOT.pdf

[2] Y Jang, S Lee, T Kim , “Breaking Kernel Address Space Layout Randomization with Intel TSX”

– Proceedings of the 23rd ACM Conference on  Computer and Communications Security

[3] R Wojtczuk.” TSX Improves Timing Attacks Against KASLR.” https://labs.bromium.com/2014/10/

[4] M Godbolt. “The BTB in contemporary Intel chips.” http://xania.org/201602/bpu-part-three

[5] Evtyushkin D, Ponomarev D, Abu-Ghazaleh N. Jump Over ASLR: Attacking Branch Predictors to Bypass ASLR. InProceedings of 49th International Symposium on Microarchitecture (MICRO) 2016.

[6] D Gruss, C Maurice, A Fogh, M Lipp, S Mangard . “Prefetch Side-Channel Attacks: Bypassing SMAP and Kernel ASLR.” Proceedings of the 23rd ACM Conference on Computer and Communications security 2016

[7] A Fogh. “Detecting stealth mode cache attacks: Flush+Flush. http://dreamsofastone.blogspot.de/2015/11/detecting-stealth-mode-cache-attacks.html

[8] A Fogh: “Breaking KASRL with micro architecture Part 1”, http://dreamsofastone.blogspot.de/2016/02/breaking-kasrl-with-micro-architecture.html

[9] R Hund, C Willems, T Holz. „Practical timing side channel attacks against kernel space ASLR” Security and Privacy (SP), 2013 IEEE Symposium on, 191-205

[10] N Economou.  “Getting Physical: Extreme abuse of Intel based Paging Systems” -https://blog.coresecurity.com/2016/08/25/getting-physical-extreme-abuse-of-intel-based-paging-systems-part-3-windows-hals-heap/

[11]  F Wilhelm “mario_baslr” https://github.com/felixwilhelm/mario_baslr/

The Kings In Your Castle Part #1

All the lame threats that own you, but will never make you famous.

In March 2016 I presented together with Raphael Vinot at this year’s Troopers conference in Heidelberg. The talk treated research of targeted malware, the how’s and if’s of malicious binaries, the involvement of sophistication and exploits, the presence or absence of patters within advanced persistent threats (APTs). The write-up of all happenings has taken its time, and as expected kept growing longer and longer and.. finally we figured splitting the outcome into multiple posts would make all our lifes significantly easier.

Five compact blogposts are the result, easy to digest and understand, while covering the following points:

  1. Introduction, hypotheses and analysis process with respective toolset
  2. Description of the data set and the feature extraction process
  3. The findings’ curiosities and results of exploit-per-APT analysis
  4. The use of packers, crypters and commodity RATs within the sample set
  5. Actor correlations and APT naming wars, future research

Here we go, please enjoy part 1 about the kings, your castle, and cyber. At this point I would like to thank Raphael Vinot for joining in on this data shoveling excurse, the team of CIRCL being busy feeding and maintaining their data dragon, and Morgan Marquis-Boire as well as the many other wizards from the research community who kept sending us malicious software.

Part 1: The kings, your castle, and cyber

It is the same question being directed to audiences around the security conference scene: How many people in the room can tell their machine or network is currently not compromised? No hand has been seen to rise in answer.  APT has been fashion five years ago and still rocks the most-feared charts on every cyber threat survey. While tabloid press is generally after the latest most-sophisticated-threat, the analyst community has long resorted to talk about threats that are advanced and persistent.. enough. In terms of sophistication targeted attacks show all shades of grey, on average though tend to be rather shallow. On the other hand, security products all have a single weak spot in common that they will always rely on patterns; whether patterns that are there, like signatures, or patterns that are not there, like anomalies. This enables attackers to evade detections with shallow, but unknown tools which manage to fly under the radar.

In research we conducted in cooperation with CIRCL, the Computer Incident Response Center Luxemboug, we took on the APT myths by formulating hypotheses based on a set of APTs documented in the MISP platform. MISP stands for Malware Information Sharing Platform and is used by hundreds of organizations to share data on incidents, among them a large number of targeted attacks. It is possible to split the content of the information shared between reports of vendors and events seen by the users of the platform. MISP is maintained and developed by the fine people at CIRCL, data is constantly added by CIRCL members, security companies or independent contributors.

Having this information in one single place allows to correlate -supposedly- new threats reported by vendors with existing events seen in the wild now or in the past. At the time of conducting the research, MISP held information about more than 2.000 events.

The data contained helps understand the overall nature of the threats, the tools of trade, the preferred approaches of the attackers, and their evolution. It potentially even allows for actor tracking as the correlation of attributes reveals hidden treasures.

The gathered events from MISP are pre-classified by threat level. We concentrated on targeted threats and conducted a survey on the nature of malware and infrastructure used therein. How much of the analyzed malware is custom made, how much off-the-shelf or simply installs known RATs? How much of it is packed or crypted? Does the fact that malware is not crypted allow conclusions on whether it is used for targeted attacks? How often are exploits used in attack? Does the use of exploits imply more sophisticated tools as the attacker is expected to dispose of higher resources?

On a more abstract level, we also wanted to know if we were able to uncover links among actors, their tools and infrastructure, solely based on OSINT data (open source intelligence).

The reason why this is possible lies in the nature of targeted attacks in general. A targeted attack in reality is not a piece of malware. It is a process, consisting of several phases, and many times even a repetitive circle if one finds himself targeted by a determined attacker.

howapthappens

Figure 1 – The APT process

Furthermore, the stages involving malicious software frequently imply the use of different pieces of malware, and also a certain infrastructure to operate attacks. This means, the attackers require servers, domains and maybe even e-mail addresses or fake social media accounts and fake web sites; to orchestrate the malware, store exfiltrated data and to drive hands-on attacks. All of these items come with a certain cost, in time and money, and believe it or not, most attackers have restrictions – in time, and money.

Limited resources call for repetition, or, in other words, attackers with a hint of economical thinking won’t reinvent the wheel anew every time a new target is chosen. Advanced attack suites, exploits, tailored malware, techniques and practices are assets that are costly to change.

This said, reinventing the wheel is possible, but not a very established practice. Besides requiring extensive resources, building and maintaining more than one attack platform and a large set of unique attack tools is prone to errors. After all, the folks building sophisticated malware are human software developers too. Getting back to the actual cost of an attack, being discovered is not only annoying for an attack group, it is also expensive and potentially dangerous. Exposure might get law enforcement on track or even inspire counter attacks.

Enough about the motivations though, APTs will keep being APTs; so let’s go on with exploring their malware and infrastructure.

Toolification

MISP is a system that is usually fed with threat indicators, which are then shared and internally correlated with other indicators already present in the database. The primary purpose of MISP is of course the distribution of threat indicators, aided by additional information gained by indicator correlation. As by-product though, MISP provides a decent event documentation with a timeline.

That said, obviously we didn’t just walk in and query databases; the process of gathering data, cleaning up records, writing tools and forming theories that got again discarded was lengthy. In the end what is left is a neat sample set, a ton of data and a set of homegrown tools as well as extensions for already existing tools.

Please note though, this project is by no means an effort to perform large-scale clustering of malware with machine learning methods, nor does it involve any whatsoever sophisticated algorithms. The tools we designed are merely simple, the grouping of samples a mere sorting approach. Keep it simple, then scale it up, a data analysis wizard once told me.

toolification

Figure 2 – Tools and extensions created during data processing

MISP provides a web interface, which helps in manually investigating one or more events and look for correlations, but it does not serve an automation purpose. Therefor we established different ways to access and process the MISP dataset. We used a Python connector to shovel MISP data into Viper, a malware sample management framework. This way we were able to easily sort through events and selected the ones which, based on predefined criteria, were highly likely involved with targeted attacks. These events were the base from where we started processing. Selection criteria and musings about attack nature will be outlined in a follow up blogpost. To sketch the workflow, it went roughly the following way:

  1. Event data from Viper -> sample hashes -> sample processing / tools development with SQLite backend
  2. Event data from MISP -> pull to Redis backend -> event attribute processing
  3. Importing of sample processing data into Redis
  4. Data correlation and analysis with Redis

We faced a challenge when seeing to collect all the malware binaries, involved with events that were selected for further processing. MISP holds primarily threat indicators, including sample hashes, domains, e-mail addresses, file names and occasionally Yara signatures; rarely ever binaries. Also, malware, especially when involved in targeted attacks, is not always shared with the broader public. Eventually we managed to aggregate most of the fancied binaries through help of public and private repositories.

The use of two different backend solutions has historical reasons, mainly we started off to work independently with our preferred solution, and in the end found there is no reason to abolish either. Redis is a strong, scalable backend, suited for medium to large scale data analysis; SQLite is practical in portability, small, elegant and effortless to set up.

For feature extraction from binaries we used the Python library pefile and instrumented IDAPro with help of IDAPython. Furthermore, we made use of the ssdeep fuzzy hashing library, exiftool for detailed file type analysis and used RapidMiner community edition for visualization.

Our developments were published in the course of Troopers conference and are available on github at as part of the MISP repository.

The next blogpost of the Kings in your castle series will cover the nature of the analysis data set and include a discussion of the extracted feature set.