New cache architecture on Intel i9 and Skylake server: An initial assessment

 

Intel has introduced the new i9 CPU, which is positioned as an HEDT (High-End DeskTop) product. Its microarchitecture is in many respects shared with the new Skylake server microarchitecture. If history is a guide, technology introduced in this segment slowly trickles down to more budget-friendly desktops. From a microarchitectural point of view, it seems that several things about these CPUs will force changes on microarchitectural attacks – especially in the memory subsystem. In this blog post I’ll give a short overview of some of the relevant changes and the effects they may have on microarchitectural attacks. Since I don’t own or have access to an actual i9 or Skylake server processor, this blog post is just conjecture.

 

A recap of the “old” cache hierarchy

The major change over earlier processors, from a microarchitectural point of view, is that the cache system has received a significant overhaul. Current Intel CPUs have a 3-level cache hierarchy: two very small L1 caches, one for data and one for instructions, plus a somewhat larger second-level cache (L2, or Mid Latency Cache). L1 data, L1 code and L2 are part of each core and private to it. Finally, Intel CPUs have a huge 3rd-level cache (usually called L3, or last level cache) shared between all cores. The 3rd-level cache is subdivided into slices that are logically connected to a core. To effectively share this cache, Intel connected the cores on a ring bus called the Quick Path Interconnect. Further, the 3rd-level cache was an inclusive cache, which means that anything cached in L1 or L2 must also be cached in L3.

 

Changes

Some of the important changes that have been announced in the Intel Software Optimization Manual [1] are:

–    A focus on a high number of cores per CPU (up to 18 in the HEDT models)

–    A reduced overall cache size per core (compared to similar older models)

–    A very significant increase in the size of the L2 (a factor of 4)

–    A doubling of the L2 bandwidth, with only a slight increase in latency

–    An L2 increase slightly more than offset by a reduction of the shared L3

–    A reorganization of the L3 cache to be non-inclusive

–    A replacement of the QPI with a mesh-style bus

Why do these changes make sense?

Increasing the size of the L2 at the cost of the L3 makes sense, as the L2 is much faster than the L3 for applications – one can only assume that shrinking the L3 helps reduce die size and cost. Also, the increase in the size of the L2 caches reduces the marginal utility of the L3. Finally, as the probability of cache-set contention rises with the number of cores, it becomes advantageous to make a larger part of the total cache private. Cache contention in the L3 is a problem Intel has battled before: with Haswell they introduced Cache Allocation Technology (CAT) to allow software to control L3 cache usage and deal with contention.

The number of cores is probably also the reason why Intel dropped the QPI ring bus design. There is a penalty for having memory served from another core’s slices. On a ring bus this penalty is proportional to how far apart the cores are on the ring. With more cores, this penalty increases. Moving to a more flexible system seems sensible from this vantage point.

Having an inclusive L3 makes cache coherency easier and faster to manage. However, an inclusive cache comes with a cost, as the same data is loaded into multiple caches. The relative loss of total cache storage space is exactly the ratio of the (L1 + L2) to L3 sizes. Previously this ratio was around 1:10 (depending on the actual CPU), but with the L2 size multiplied by 4 and the L3 made a tiny bit smaller, the ratio is now about 1:1.5. Thus making the L3 cache non-inclusive is essential to performance. At this point it’s important to notice that Intel uses the wording “non-inclusive”. This term is not well defined. The opposite of inclusive is exclusive, meaning the content of the L3 cannot be loaded in L1 and L2. I think Intel would have used the well-defined term exclusive if the cache really were exclusive. Thus it is probably fair to assume that non-inclusive means data may or may not be cached in L1, L2 and L3 at the same time – but exactly how this is governed is important, and unfortunately there is no information available on it.

It’s worth noting that many of these changes have been tested and developed by Intel for the Knights Landing microarchitecture. Knights Landing is a high-throughput microarchitecture sold in relatively small volumes. Thus it’s likely that many features developed for this CPU will end up trickling down. It’ll be interesting to see whether Intel plans to trickle them down into laptops and small desktops, or will use different cache designs for different classes of CPUs.

 

Effects

Cache side channel attacks

This new cache layout is likely to have a profound effect on cache side channel attacks. I think Flush+Reload will keep working even without an inclusive L3. The flush primitive is based on the CLFLUSH instruction, which is part of the instruction set architecture (ISA). Intel has been very reluctant to change the ISA in the past, and therefore my estimate is that flushing will work as it always has. I think the reload primitive will also remain usable – I find it likely that an uncached load will still allocate into the shared L3. It’s also likely that the QPI replacement bus can be used to fetch data from private L2 caches, similar to AMD’s cross-CPU transmission; this would give Flush+Reload a flush+transfer flavor, but it would still work. For more on Invalidate (Flush)+Transfer see [2]. Since the L3 cache must be filled in some way, we can be fairly certain that at least one of these things is true, if not both – the latter being my guess.

The big changes are likely to be related to the evict and prime primitives. Assuming that cache contention between cores was a major reason for redesigning the cache, it’s unlikely that one can load data into another core’s private hierarchy. Thus, the prime and evict primitives are likely to break for cross-core attacks. However, both are likely to work decently within a core (hyper-threading, or scheduling on the same core).

While the ISA behavior of CLFLUSH is almost certain to remain unchanged, the microarchitecture below it will see significant changes. With the QPI gone, using the Flush+Flush attack by Gruss et al. [3] to find out how many hops on the ring bus you are away from a particular slice almost certainly won’t work. This does not mean that CLFLUSH won’t offer a side channel here: buses are typically bandwidth limited and thus an obvious source of congestion – without an inclusive L3, the bus bandwidth might even be easier to saturate. Also, the Flush+Flush attack as a Flush+Reload replacement is likely to produce different timing behavior on the new microarchitecture. My upfront guess is that a timing difference, and thus a side channel, remains.

Also affected by the non-inclusiveness of the L3 are row buffer side channel attacks such as those presented by Pessl et al. [4]. Without an effective cross-core eviction primitive, such attacks may be severely stifled. The ability to reverse engineer the complex DRAM mapping function is likely to remain unchanged, as it hinges not on eviction but on the CLFLUSH instruction’s ISA behavior.

With the CLFLUSH instruction and eviction likely still working on the local core, row hammer will remain effective. But in some scenarios, indirect effects of the microarchitectural changes may break specific attacks, such as that of Bhattacharya and Mukhopadhyay [5]. The reason is that such attacks rely on information leakage through the caches, which becomes more difficult to leverage as described above.

 

Future

While the changes to the cache make sense for the significant number of cores in the affected systems, it seems unlikely that they will trickle down to notebooks and small desktops. With only two cores, the existing cache design seems sensible. Thus we are likely to see a two-tier cache design going forward – one for many-core CPUs, another for CPUs with few cores.

Conclusion

Having a non-inclusive L3 cache is significantly more secure from a side channel perspective than an inclusive one in cross-core scenarios. This opens up the possibility of defending against these attacks by isolating different security domains on different cores, potentially dynamically. While Flush+Reload is likely to be unaffected, this attack is also the easiest to thwart in real-life scenarios, as avoiding shared memory across security domains is an available and effective countermeasure. Lots of new research is required to gauge the security of these microarchitectural changes.

 

Literature

[1] Intel. Intel® 64 and IA-32 Architectures Optimization Reference Manual. July 2017. https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf

[2] Irazoqui, Gorka, Thomas Eisenbarth, and Berk Sunar. “Cross processor cache attacks.” Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security. ACM, 2016.

[3] Gruss, Daniel, et al. “Flush+ Flush: a fast and stealthy cache attack.” Detection of Intrusions and Malware, and Vulnerability Assessment. Springer International Publishing, 2016. 279-299.

[4] Pessl, Peter, et al. “DRAMA: Exploiting DRAM Addressing for Cross-CPU Attacks.” USENIX Security Symposium. 2016.

[5] Bhattacharya, Sarani, and Debdeep Mukhopadhyay. “Curious Case of Rowhammer: Flipping Secret Exponent Bits Using Timing Analysis.” http://eprint.iacr.org/2016/618.pdf

Security for Sale? – On Security Research Funding in Europe

On Wednesday, Feb. 22, 2017, a collective of 20 journalists from eleven countries published their research on the European security industry. The article, Security for Sale, published at The Correspondent, mostly revolves around the question of how the funding several players in the field received from the Horizon 2020 (H2020) and FP7 framework programmes is put to use for the European people. H2020 is the European Commission’s current funding initiative for research and innovation. Here is the commission’s own explanation.

Simplified, the authors of Security for Sale conclude that the benefit of funding security research for Europe as a whole is limited, but that the funding works pretty well as a hidden subsidy for the industry itself. Their wording is more lenient than mine, but nevertheless I feel that the picture the authors draw is incomplete, and I’d like to add another perspective. I can only assume that the data for the article stems from the Secure Societies line of funding, as its core topics are emphasized in the article and some of the articles it links to refer to it – regrettably, the article does not contain any straightforward references to its sources. And in my opinion, the Secure Societies line of funding does indeed sometimes yield research results that are scary to everyone who did not answer the question of ‘how do we want to live?’ with ‘I liked the setting depicted in Minority Report quite a bit, but it’s missing the effectiveness of Judge Dredd’.

The authors describe the landscape in the security sector roughly as 1) the big players, where kinetic and digital technology converge, 2) research organizations, 3) universities, and 4) small and medium enterprises (SMEs). Our sector, of Ze Great Cybers, primarily hides in 4) and, increasingly, in 1). The article does not explicitly touch the field of information security, and neither do many calls in the Secure Societies context; however, most of the projects need to touch the digital domain at some point or other, making it clear that a division between information security and all the other potential flavors of security is merely artificial, given our current state of technology. As the authors observe, this is also reflected by the upcoming funding opportunities:

“similar programs are being set up for cybersecurity and military research”

The EU and some of its member states are late to the game, and I’m aware that not everybody hacking at computers likes the notion that InfoSec and defense converge. I also dislike the idea, and I somehow liked the Internet better when it was still a lot emptier – a sentiment Halvar Flake has expressed as well.

However, I came to enjoy civilization, and like most people I rely on the critical infrastructures that make our societies tick. I’ve been leading incident response assignments in hospitals more than once in the last year, and as a human who may suddenly require the services of a hospital at some point or another, I am very grateful for the effort and dedication my colleagues and the clients’ staff put into resolving the respective incidents. If this means I’m working in defense now, then I still dislike the notion, but I see that the work is necessary, and also that we need to think more on the European scale if we want to protect the integrity of our societies. Packets only stop at the borders of oppressive societies, and that shouldn’t be us.

Now let’s have a look at where the EU’s security research funds go, according to the article. The authors state

“Companies received by far the most money. That’s not particularly surprising; these same companies were the ones influencing funding policy.”

and illustrate it with the following figure:

Figure 1: Security research funding distribution. Illustration by The Correspondent.

In 2015, I was directly responsible for four H2020 grant applications and consulted on several other applications for nationally or regionally managed funding opportunities. Security in its various flavors is a tiny part of that picture. I’m happy to state that we had an above-average success rate. The illustration in the article struck me as familiar: to me, Figure 1 simply shows a typical distribution of funding within the majority of grant applications I’ve worked on. So it is hardly surprising that the global distribution of funding within the technology sector of the programme looks very similar. There is nothing much sinister about that.

Larger corporations do have more opportunities to influence policy, as they can afford the time and resources to lobby. But the main reasons for the distribution are salaries on the one hand and grant policy on the other. An engineer working at a large corporation will be more expensive by a factor of 1.2 to 2.3 than a PhD student, depending on the country and the respective corporation. A factor of ~1.9, as in the figure, does not look unreasonable, given that the figure accounts for accumulated costs, not just personnel, and that a corporation or a research institute is more likely to take on the effort of building a demonstrator or pilot installation of a technology, as universities regrettably tend to lack monetization strategies for research results.

Government entities, as the next in line, tend to have very limited personnel resources for research projects and do not have a lot of wiggle room when it comes to contributions. With a funding scope that aims at technological advances, funding for advisory firms is often politically limited, and rightly so. As the figures are very much aggregated, I can only assume that the complete sum also contains Coordination and Support Actions (CSAs) – financially rather limited funding schemes that aim at connecting related projects and generally at the systematization of knowledge, to avoid arriving at one insight at twice or thrice the funding. This work is sometimes done by entities that could classify as advisory firms. ‘Other’ can be translated to network and dissemination partners, or dedicated project management (which makes a lot of sense – H2020 projects can be large in terms of the number of partners).

Now let’s not talk about militarizing corporations enriching themselves, as the article’s authors suggest; let’s talk about research funding effectiveness. In my conversations with the granting side of research funding, it is a constant pain that a significant number of funded research projects never amount to a product. I have seen quite a few projects where I would judge without hesitation that the project was a WoMBaT (i.e., a Waste of Money, Brains, and Time) and primarily served to compensate for the lack of public funding for universities.

But that is only a small part of the picture. Another part is that there is a very expensive zone between ‘things you can publish’ and ‘things you can actually use’. Simplifying, technological progress has a tendency to increase complexity. To specify their expectations regarding the results of a funding measure, the European Commission adopted NASA’s Technology Readiness Levels (TRLs). In the rather broad field of engineering, we often see requirements ranging from a validation under lab conditions (TRL 4) to a working prototype under field conditions (TRL 7), depending on the class of funding action. The funding of a given project often ends at that point.

Using my terminology from above, in the best case you then have something you can publish and/or show off, i.e., an interesting approach that has been shown to be feasible. Between that and monetization lie the roughly two to seven years you’ll often need in engineering to go from a prototype to a product. And strictly speaking, research funding ends here, because research ends here. There are almost no publicly funded actions that will allow you to go towards product development, although piloting of a technology in the field can be funded (TRL 9). Nevertheless, either a company is now able to fund the continued development towards a product, or it is not. This is still a limited view, as one doesn’t need a new product to monetize research results: it is just as desirable to improve an existing set of products and services based on new research results. It is, however, by far not as visible.

Now why so much funding for the big players? Research grants tend to have a bias towards those applicants who were able to successfully complete a project, presented impressively in the past, and are generally held in good standing. Sounds familiar? Yep, it sounds like selecting talks for Black Hat or any other non-academic security conference, where the process is single-blind and the committee needs to judge based only on an abstract, not a full paper with a proper evaluation section and extensive related work. If the data is relatively poor, judgment needs to rely on the applicant’s reputation and previous work to some extent. And in comparison to a well-backed research paper, grant applications are by nature always speculative, although much more detailed than the abstract of a conference submission. If one knew how something was done in the first place, there’d be no need to call it research and there’d be no reason for a grant. The important point is not to fail, if one wants to continue receiving grants (cf. the bias above).

And that still does not fully account for the observation that big players are doing very well in grant applications. An H2020 application is a lot of work – often a hundred pages just for sections 1 to 3, which can easily amount to 100 person-days for the coordinator until everything is properly polished. The acceptance rate for Research and Innovation Actions can be as low as 6%; academic tier-1 conferences are relaxed in comparison. A small company may simply be unable to compete, economically – i.e., it cannot accept the realistic risk of putting a lot of effort into naught. It’s not as hard in other types of actions, but the risk is still significant, and acceptance rates have been getting worse, not better.

“Our investigation reveals that EU security policy emphasizes technology: a high-tech solution is being sought for a societal problem.”

From an outside position I feel that I’m unable to judge the intentions and the mindset of the various individuals responsible for the actual wording of the H2020 calls. The persons from this context I’ve met in the past did not, however, leave an impression of exceptional naivety. I genuinely believe that it is in the best interest of European countries and the EU to fund security research, without denying that there may be recipients of funds with questionable ethical standards. As Europeans, we need to address these issues. I do not want to live in a militarized society, and I don’t believe in solving societal problems purely through technology. However, I see a significant number of opportunities where usable, efficient technology can enable solutions for societal problems. Given my socialization, I may have a bias towards the security spectrum of research, but that said, I see quite a few things that can and should be done to positively impact the security of our societies. In a changing political climate, Europe needs to step up its security game, also and especially in the digital domain.

And now excuse me, I need to continue that research grant proposal. I’m not doing it for kicks, nor purely out of self-interest.