Too much WinWord. Too much Tex. Too many meetings. Too little CPU. It was time for a short pause from the grind and dig into some tetravalent metalloid. My current project was too big a mouthful to get into before going to Black Hat, so I dug up a pet project to play around with. And then it happened – I needed some info from the intel documentation and before you know it had what I believe is a new side channel in my head. Then a second one. Then a third. Two of these side channels looked like they would make for good arguments for a point I’ve been meaning to make. The third is more complex and it might be fairly useful. Consequently, I decided to write this small blog post on the two first side channels and make my point. Both of these side channels are technically interesting, however only the second is likely to be of any practical importance. I have not spent too much time or effort researching these channels as they are just a side project of mine. Beware: I’ve not checked the literature for these two particular channels, so I might be repeating other people’s research. I do not recall having read anything though.
Not all side channels are equal
Not all side channels are created equal. Some are inherently more powerful than others. However, there is no good way to measure how good a side channel is – it simply often comes down to the subjective question of what do an attacker need the channel for.
The simplest use for a side channel is the so called covert channel. Most microarchitecture side channels can be used to communicate between two trust domains when an attacker has access to both. An attacker can use covert channels for data exfiltration. The most important two reasons are to stay hidden or because regular channels are not open. An example of a covert channel would be an implant on a virtual machine in the cloud that communicates through, say a cache side channel, with another VM on the same physical computer. In this scenario the attacker doesn’t leave any TCP/IP logs and can communicate with his C2 server despite the defender turning off his network adapter.
While most microarchitecture side channels can be used as a covert channel, some side channels provides insights into normal system behavior and thus are powerful enough to be used for spying even when the attacker does not have access to both attacker and victim’s trust domain.
In this blog post we’ll be looking at two side channels which falls into this category of side channels, which are mostly irrelevant for spying but useful for covert channels. Since these channels are probably limited in this way I have refrained from researching them very deep. The point of this blog isn’t the two new side channels, but a different one. However both side channels are from a technical view point interesting. The second side channel might even be practical.
Time for a pause – side channel with the pause instruction
Being old one forgets the name of the instructions from time to time and I needed a pause instruction and was searching for “wait”. Anyhow I picked up the optimization guide because I know the right name would be in there. And so it was, but then I read the description again and found the pause instruction has a profound impact on hyper threads because it does not significantly block resources of the other hw thread on the computer. But that has probably been utilized by somebody already. But more interesting if two hyper threads both use the pause instruction simultaneously power can be saved. One has to wonder if that makes difference in the execution time of the pause instruction. And it does. But the story is longer. The latency of the pause instruction is heavily dependent on power management settings and the current activity of the computer. So the channel depends on those settings. Further Intel has been tinkering with the pause latency, so you’re likely to see different results on different computers. My results are from an I3-5005U, I did not test on other computers. My power settings where “Balanced”. Notice we are talking about a laptop here, which might be important too. Below a plot of latency measurements of the pause instruction with a core-co-located thread doing a loop of pause instructions and with a identically co-located thread doing a normal integer operation (xor eax,eax – did test with other instructions with similar results). The plot is for a loop of 40.000 measurements.
Usually we’d expect a speed up when a collocated hw-thread would pause, because the thread effectively yields all resources to the other hw-thread in the core and in the meassurements there is a bit of overhead that could be sped up. However we are seeing the exact opposite. I can only speculate about the reason, but it seems likely that with both co-located hw-threads in pause mode, the entire core goes into pause mode.That is the c-state of the core is increased and this comes at a small latency cost. Whatever the cause is, we have a measurable side channel. There is not much to spy on with this channel we could possibly spot when the kernel is doing a spinlock for some reason, but really juicy information does not seem to be a likely catch in the side channel. Since the channel depends heavily on power settings and on computer activity the channel might not be very useful in practice. Further the core-colocation requirement obviously detracts from value of the side channel. Also it’s a timing side channel using rdtsc(p) instruction which gives leverage for detection as well as mitigation through timing inaccuracy. In fact this attack would be difficult to get to run under VirtualBox (Ubuntu host) as this virtual machine is setup to cause a vmexit on rdtsc which makes the measurement of such small time differences difficult (Though not impossible). Even worse for the value of this side channel is that pause itself can be made to cause vmexit and thus be used for direct detection/mitigation of any pause based side channel attack. So while we have a side channel here, it’s not likely to be a very practical one.
Sowing a seed too many – side channel with rdseed
“Under heavy load, with multiple cores executing RDSEED in parallel, it is possible for the demand of random numbers by software processes/threads to exceed the rate at which the random number generator hardware can supply them. This will lead to the RDSEED instruction returning no data transitorily.”
That sounds like a very obvious side channel indeed. The RDSEED instruction has been added with the Broadwell micro architecture and uses heat entropy in the CPU to generate cryptographically strong random number. The problem with this instruction seems that for enough entropy to build up a bit of time is needed between calls to RDSEED. Thus intel designed the instruction to return an “error” using the carry flag if insufficient time has passed since the last call of RDSEED. So the basic idea is that an attacker could create a covert channel using this instruction. To send a 1 bit the sender implant loops an rdseed instruction and mean while the receiver runs a loop spaced with plenty of time between rdseed. The information is extracted in the recievesr’s end from a count of failed rdseed instructions. My simple test setup was an infinite sender loop which either called the rdseed instruction or not depending on the bit I wanted to send. My receiver looped 1000 times around did an rdseed instruction followed by a 10 ms Sleep() call. 0 bits caused zero failures in the receiver loop and typically around 800 failures in the 1bit scenario. I tested only on a I3-5005U Broadwell laptop, but with the sender and receiver thread pinned on same core as well as on different cores. Results where near identical.
This particular channel is interesting in so many ways, despite its limited use. It is a really rare thing that side channels in the CPU are not timing attacks based around the rdtsc(p) instruction. Infact I know of only one other attack: The instruction reordering attack of Sophia D’Antoine which if I understand it correctly is limited to scenarios that are not really reflective of the real world. This attack however works cross core without other limiting factors – however it’s limited to micro architectures that support the instruction which is only the newest intel CPU’s. Further it does not appear to cause any interrupt that would allow instrumentations and does not appear to be wired to performance counters that would allow detection thus making low impact software mitigation difficult. Finally, theoretically the channel could be used to spy on benign usage of the rdseed instruction, but probably wouldn’t gain much information.
The real point
The point of this blog isn’t that I found two new side channels. The point here is an entirely different one. The point is that I as a byproduct found at least two side channels in the processor of which at least one does not seem like it’s fixable in software. It might be in microcode, but I don’t think that’s relevant either. The point I wish to make is that with the amount of side channels already known and me finding two while not looking for them, it seems likely that there’ll be a great many side channels waiting to be discovered. Not to mention the likelihood that Intel will with time add new ones (recall the second side channel arrived with the Broadwell microarchitecture). Even if we knew all the side channels we’d be hard pressed to come up with software detection/mitigation and it seems unlikely that Intel will fix these side channels as they’ve not done so in the past. Consequently we should probably accept the fact that covert channels in cloud computing is a fact of life and consider them an unavoidable risk. This of cause does not mean we shouldn’t continue to find and document the channels, since our only defense against them is if malware analysts are able to spot one from a malicious implant.