Attacking Processor Integrity on x86 through Undervolting

In 2017, I watched Adrian Tang's presentation on his CLKSCREW attack (which allowed him to breach TrustZone isolation on ARM from the normal world) and talked a bit with him afterwards.
It struck me as pretty interesting, since he essentially managed to achieve malicious run-time behavior, similar to what you would get from a memory-corruption bug but originating at the hardware level.
In that sense it was similar to rowhammer, which I had previously been working on, but differed in that the attack targeted the CPU instead of the RAM module.
One of the things I asked him was whether he had ever tried something similar on x86, which he said would be interesting but probably very hard to do.

So when I saw people overclocking Intel CPUs at run time and purely from software in the following year, I started a project in September 2018 with a master's student (Zijo Kenjar) to investigate the feasibility of undervolting those CPUs from software.
It turned out that the interface Intel provides for adjusting core voltage at run time was undocumented, but some software used it unofficially under the name OC Mailbox (essentially just a nickname for MSR 0x150).
However, it wasn't until February 2019 that we had some initial results and the question was how to frame it as a publication, since the method required root privileges (i.e., assuming a rather strong adversary).
Luckily, we found that SGX enclaves were susceptible to the same faults we had seen in userland and kernel code.
More precisely, we found:

What is the bug?
When the supply voltage is lowered far enough, some instructions continue to execute without crashing but produce results with flipped bits, which allows exploitation.
We found several instances, the simplest being:
vpxor %xmm1, %xmm2, %xmm3
vmovdqu %xmm3, (%rsp)
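
To make "flipped bits in their results" concrete, here is a minimal sketch of how one could compare the value stored by the vmovdqu against the XOR result computed at stock voltage. The helper name flipped_bits and the operand values are hypothetical, and the fault is simulated in software; this is not our PoC code.

```python
from typing import List


def flipped_bits(expected: int, observed: int, width: int = 128) -> List[int]:
    """Return the bit positions (0 = least significant) where the values differ."""
    diff = (expected ^ observed) & ((1 << width) - 1)
    return [i for i in range(width) if (diff >> i) & 1]


# Hypothetical 128-bit operands standing in for %xmm1 and %xmm2.
a = 0x0123456789ABCDEF0123456789ABCDEF
b = 0xFFFFFFFF00000000FFFFFFFF00000000
expected = a ^ b

# Simulated fault: pretend the undervolted core flipped bit 17 of the result.
observed = expected ^ (1 << 17)

print(flipped_bits(expected, expected))  # no difference
print(flipped_bits(expected, observed))  # position of the simulated flip
```

On a real machine the observed value would be read back from the stack slot written by the vmovdqu, and the check would run in a tight loop while the voltage is lowered.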

Under which conditions does it happen?
The exact conditions depend on the machine and will be slightly different for each physical core.
In our experiments we achieved the most consistent behavior at a core voltage of around 700mV (p-state 0x1B, voltage offset -250mV) and 770mV (p-state 0x20, voltage offset -265mV) on the first machine used in our evaluation.
Core temperature appears to be a significant factor, with ~45°C for p-state 0x1B, ~50°C for p-state 0x20, and ~55°C for p-state 0x24 yielding the best results we've seen so far.
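
For readers wondering what a voltage offset such as -250mV looks like at the register level, the sketch below builds the 64-bit value that public undervolting tools write to MSR 0x150. Since Intel does not document the interface, the bit layout used here (an 11-bit signed offset in units of 1/1024 V in bits 31:21, a write flag in bit 32, a plane index starting at bit 40 with plane 0 taken to be the CPU core, and a busy bit in bit 63) is an assumption taken from those tools, and encode_voltage_offset is a hypothetical helper name. The code only computes the value; it does not touch any MSR.

```python
OC_MAILBOX_MSR = 0x150  # MSR number as named in the text


def encode_voltage_offset(offset_mv: float, plane: int = 0) -> int:
    """Build an OC Mailbox write command (layout is an assumption, see above)."""
    if not 0 <= plane <= 4:
        raise ValueError("plane index out of range")
    steps = round(offset_mv * 1.024)  # convert mV to units of 1/1024 V
    if not -1024 <= steps <= 1023:
        raise ValueError("offset does not fit the 11-bit signed field")
    offset_field = (steps & 0x7FF) << 21  # two's complement, bits 31:21
    command = (1 << 63) | (plane << 40) | (1 << 36) | (1 << 32)
    return command | offset_field


for mv in (-250, -265):
    print(f"{mv} mV -> {encode_voltage_offset(mv):#018x}")
```

Actually applying such a value requires write access to model-specific registers (for example through the msr kernel module on Linux), which is why the attack assumes root privileges.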

Which platforms are affected?
We tested three different CPUs spanning two different microarchitectures: a Lenovo Ideacentre 720 with an i7-7700 (stepping 9), an HP Z240 with an i7-7700K (stepping 9), and a Lenovo P330 with an i7-8700K (stepping 10).
According to the documentation, the mainboards of both Kaby Lake machines have similar IMVP8-compliant NCP81203 voltage regulators, but we suspect the underlying issue to be present more broadly.

So what is the impact of this research? Well, it essentially means that the security guarantees that the CPU was sold with are broken on those platforms. One example we lay out in detail in our paper is Intel SGX (i.e., one can affect enclave execution through flipped bits in the processor).
We reported the issue to Intel on August 12. Intel assigned CVE-2019-11157, kept the issue under embargo until December 10, and then published an advisory as part of their monthly patch cycle.

Update (Dec 11):
Curiously, the very same issue was reported independently to Intel by two additional groups of researchers: first, a group from University of Birmingham, KU Leuven, and TU Graz, and second, a group from University of Maryland and Tsinghua University.
We did not collaborate or communicate with these two groups (since we did not know of them at the time of submission and did not know their identities until the end of the embargo on December 10).

At present, we understand that we are the only team that provided Intel with a PoC that demonstrates a deviation of control flow during enclave execution. The first of the two other groups focused on recovering cryptographic keys (one of the primary use cases for SGX *wink*), and the second provided no PoC to Intel.

The Plundervolt paper and website state that the attack only applies to SGX, because the attacker requires write access to model-specific registers, which is available to root users by default on both Windows and many Linux distributions. We believe that there might be other scenarios where processor faults could be exploited within this strong adversarial model, such as mandatory access control (e.g., SELinux) or even bootloader code (since OC Mailbox will retain set values throughout soft resets).

Moreover, in our experiments we have seen errors related to instruction decoding, so at least in theory it could be possible to craft malicious instructions through faults. However, we have not been able to exploit such faults (yet). Does this mean that undervolting-based attacks are going to appear "in the wild"? Possible but highly unlikely, since root privileges are sufficiently broad to enable many kinds of attacks in practice. So far, I am also not aware of CLKSCREW/VoltJockey or rowhammer attacks being used in the wild.

I guess this means that exploiting software (although harder now than 10 years ago) is still way too easy ;-)