>>My name is Stefan Saroiu, and together with
my colleague Alec Wolman, we’re the session co-chairs
for the security session. I’m sure all of you
are here to find out about the work that both
Microsoft and academia have been doing on these attacks that stem from
speculative execution in CPUs, like Meltdown and Spectre, or from the density of the cells
packed in DRAM, like RowHammer. I was really hoping to have a discussion throughout the talk. We’re going to have
three speakers and we’re going to
introduce each of them. I think it’s going
to be difficult and the reason for that
is because I think the dynamics of
this room really are not amenable to the discussion
I would like to have. Nevertheless, I strongly
encourage you to raise your hand and
ask questions if you like in the middle of the talk, and we have ample time for Q&A at the end of each presentation. Okay. So, it’s my pleasure to introduce to you the first
speaker, Christopher Ertl. He’s a Security Software
Engineer in what’s called MSRC at Microsoft. MSRC stands for
Microsoft Security Response Center, and this is the team that deals with security vulnerabilities
especially in Azure, but is your guys’ mandate just for Azure or for
all of Microsoft?>>No, so it’s for
all Microsoft products, including the browser,
Office, et cetera.>>Okay. So, he’s going to talk a little
bit about some of the things that Microsoft
has been doing to mitigate Meltdown and
Spectre. Thank you.>>All right. Thank you, Stefan. So, good morning everyone. Once again, my name
is Christopher Ertl, and I’m going to be talking about the Spectre and Meltdown
vulnerabilities and how we’re able
to mitigate them. All right, so Spectre
and Meltdown, these issues gained
a huge amount of interest from the research community when they were disclosed
in January this year. The reason for that is because they represent
a fundamentally new class of hardware security vulnerability
which allows leaking information across security
boundaries from the browser, the hypervisor, et cetera. All right. So, when we were first
made aware of these issues in June last year, we kicked off an SSIRP
incident response process and this is typical for whenever we’re made aware of a critical security
vulnerability, either being exploited
in the wild or just a high threat which requires mobilizing a large number
of people within Microsoft to drive
remediation of the fixes. So, Spectre and Meltdown. Once again, they
have implications across nearly every
security boundary and allow potentially disclosing information
such as passwords in the browser, or guest-to-guest in a
virtualization context. So, what I’m going to
do now is I’m going to break down the
attacks themselves, more generally how
speculative execution can lead to a side channel. Then afterwards, we can
go on to how we might be able to mitigate
those. All right. So before we can get into
speculative execution itself, I’m going to need to
explain a bit about how a modern CPU works. So, typically, when
we see assembly code and we consider how a CPU executes it, we generally think
of each instruction executing one after
the other sequentially. But in reality, it’s a bit
more complicated than this. Instructions are first
decoded into a series of micro-operations which are
placed into a reorder buffer, and from there, the CPU
is able to make use of several optimizations. The first of these
is being superscalar: it is able to execute certain micro-operations
in parallel, concurrently. The second is
out-of-order execution and this essentially
allows the CPU to start executing later
instructions before earlier ones to make best use of the available execution units. Yeah, this is faster
than just waiting for each instruction to complete before the next one can start. So, speculative execution is just an extension of
out-of-order execution. So, when the CPU has some dependency on
the result of an operation, rather than waiting for that to resolve and the results
to be made available, it can begin executing
speculatively according to a prediction that
it makes on this outcome. The reasoning behind this is
that once the result is made available and if
the prediction was correct, the results of speculative
execution can be committed, so that’s the calculated
register values and any memory
stores for example. This is much quicker
than waiting for the outcome to be made available before
starting execution. Conversely, if the prediction that speculation ran
on was incorrect, the results will be discarded
and the execution unrolled. All right. So the fundamental problem with speculative execution which led to the Spectre
and Meltdown vulnerabilities is that not everything
is thrown away when incorrect speculative
execution is unrolled. In particular, changes to the cache state are
not always unrolled, and that can contain private data which an attacker might
be able to later observe. All right. So now, I’m going to move on to
the variants themselves. So, starting with variant one. This was where a conditional
branch would mispredict. So here, we have
a typical bounds check on an untrusted index
before using it as an array index
for this buffer. This is a typical code pattern, very common in C and C++ code, but consider if this bounds
check is mispredicted and the inner code is executed while untrusted index is actually greater than or equal to length. In this case, a value will be read out of bounds. Depending on the types involved, considering if buffer is a byte pointer and untrusted index is a 64-bit value, this could result in reading a value from an arbitrary virtual address. Then after that, a secondary
index will be performed which loads
a different cache line depending on this private value. So, the result of this is that if an attacker can execute
this code speculatively, a different cache line will
be loaded as an artifact of that secret value that
should not be made available. Variant two was where the target of an indirect branch
would be mispredicted. So, indirect branches are used when the compiler
doesn’t know at compile time, what the target of
the branch will be. So typically, a function pointer
or a vtable for example. If speculative execution
executes one of these indirect branches
on a register, it might jump to
an incorrect target. Similar to before,
what might happen is reading a byte from an attacker controlled
register and then loading a cache line according
to this secret value. Since the cache line size is 64 bytes, in this gadget we simply shift the value left six times. So, variant three is specific to the kernel-to-user information
disclosure scenario. So, if these last three
instructions are executing speculatively
due to a conditional branch
mispredict, for example, what can happen is
that if we try to load from kernel memory
in userland execution, speculative execution will
actually be able to retrieve that value and pass it on to
subsequent instructions before the exception
will be triggered, and so it can persist the results by loading a
cache line, for example, as we saw before,
and with this variant, userland code execution is
able to read kernel memory. So now, what I’m going
to do is I’m going to create a taxonomy
of these attacks, and so we can
systematically go through the key components required and then move on to
the mitigations. All right. So there are four key components required for a speculative execution
side channel. The first of which is a method
of gaining speculation. So, as we saw, that might be conditional branch
mispredict for example. Second thing we’ll need
is a windowing gadget, and this is used to extend
how long speculation can run before the CPU realizes it was speculating with
an incorrect prediction value. The third thing we need is
a disclosure gadget to persist the results made during
speculative execution. So, as we saw, that might be
loading a cache line according to a private value. Finally, we need a way to observe those results to
determine, for example, which cache line was
loaded and from that, infer the secret that was loaded during
speculative execution. If any one of these four
parts are not present, the speculative execution
side channel will not be able to succeed. So, starting with
speculation techniques, we have the three from
the three variants reported. We have conditional
branch mispredicts. This doesn’t have to
be bounds checked, this can be any
conditional branch. So, for example, type check could lead to speculative type
confusion if mispredicted. But the thing is, these conditional branches can be trained based on past behavior. So, we can make it
very likely that speculative execution will
take the conditional path we
desire during speculation. Variant two was the indirect
branch misprediction. Similarly, as the CPU executes, it maintains what’s
called the BTB, the Branch Target Buffer, which maintains a list of branch targets during execution, and speculative
execution will use this internal buffer to
predict where to jump. We can also collide
different entries, so we can have
two different addresses that point to the same
internal BTB entry. Finally, Meltdown was where the CPU can perform, for example, a kernel load from userland and forward the result
of that onto subsequent microoperations before the permission fault
will be delivered. So, now that we’re able
to trigger speculation, we need a windowing gadget. Once again, this is required so that speculation can execute for long enough that we are able to persist the results by
reaching a disclosure gadget. So, the key point here is that windowing gadgets can
naturally occur in code; they can be something as simple as a dependency chain of
arithmetic operations, for example, or more commonly even just performing
an uncached load. So, with speculation running, we can now begin to see how a side channel can
be formed from this. So, a side channel has
three stages generally. The first is priming the system into a known initial state. The second is
triggering or waiting for some victim
activity to occur. Finally, an attacker would
need to observe whether the state changed to
infer information about what happened during
the victim activity. So, in the context of
speculative execution, the disclosure gadget
will typically be loading a cache line according to some secret value
which might have been read out of
bounds, for example. So, for a flush and
reload primitive, what an attacker
will do is they will first flush an array
of cache lines, the disclosure gadget
will then load one of those according
to a secret, and finally the disclosure primitive will time how
long it takes to load each of those cache lines
and whichever one is faster is likely to have been
loaded into the cache, and from that we can infer what the secret value was during
speculative execution. So, just to sum up again the four components of
a speculation attack. We have the
speculation primitive, the windowing gadget, the disclosure gadget and
the disclosure primitive. Once again, we need all
four of these to be able to leak information
through a side channel. So, relevance to
software security. Variant three is specific to the kernel to user information
disclosure scenario, that’s exception delivery,
but all the others generally apply universally
across the board and so, we’re going to need
mitigations for those. So, now that we
understand exactly how speculative execution can lead
to a side channel attack, we can begin to go
into the mitigations that we can put in
place for these. So, we have three tactics. The first is preventing
speculation techniques. Specifically, what we
mean by this is we want to prevent
unsafe speculation, where speculative execution can lead to a disclosure gadget. The second is removing
sensitive content from memory. So, this is limiting what speculative execution
will be able to read. This can eliminate
entire scenarios or simply reduce the risk
from certain scenarios. Finally, removing
observation channels. This is making it more difficult or even impossible
for an attacker to infer what changes were made to the cache state
during speculative execution. But once again, there’s
no silver bullet. We require a combination
of different hardware, and software, and mitigations for each of
the scenarios present. So, starting with preventing
speculation techniques. Once again, the goal
here is to prevent a speculation primitive from leading to a disclosure gadget. First thing we can do
is use some kind of serialization of the
instruction pipeline. So, on X86, we have
the LFENCE instruction, which has the neat property of acting as a speculation barrier. So, if we go back to variant one, we see this bounds check
on an untrusted index. What we can do is
insert an LFENCE as a speculation barrier
here after the check. What this will guarantee is that the subsequent two
array indexes will not be executed until speculation
has resolved to this point. So, this code will never execute speculatively with
untrusted index out of bounds. The second thing we can do is have some kind of
implicit serialization. So, this is forcing safe behavior down an architecturally
incorrect path. So, going back to a variant one, what we can do is, considering if the inner code executes even when
untrusted index is after bounds, we can use conditional
move instruction to set untrusted index to this zeroed register if it is greater than or
equal to length. What this will do is it will make the behavior of
speculative execution safe because it will
simply load zero from this buffer which is
going to be in bounds. For doing this, we have the /Qspectre command
line flag in Visual C++, and this will
automatically identify potentially vulnerable patterns and insert appropriate
serialization. Similarly, in Microsoft
Edge we have mitigations in the Chakra JavaScript
engine which inserts serialization to
prevent an attacker from being able to
construct these patterns. The second thing we can do is
have some workload isolation. So, we talk about
the Branch Target Buffer. Typically, these kinds of prediction state exist either per core, or per simultaneous multi-thread in the case of simultaneous multithreading, such as Intel Hyper-Threading. So, what we can do is, in Hyper-V we can use CPU groups and minroot to assign a certain core to
a particular guest, and then the others for the host. What this will do is, since the branch prediction state is not shared between
the host and the guest, a malicious guest
will have no way of colliding the branch
prediction state. So, the next thing is the- with the recent microcode updates provided
by Intel and AMD, we have some new modal
specific registers, which can control
indirect branches. So, we have IBRS first of all, which essentially
acts as a way of creating
two different privilege levels. So, you can set IBRS to zero for the less
privileged state, and then on kernel entry
for example, you can set it to one, and this will create
the guarantee that the more privileged
state will not be able to be influenced by predictions made in
the less privileged state. The next thing we have is IBPB, which essentially allows us to flush the prediction state, and this can be used
when switching between different hypervisor
contexts for example, to prevent different
contexts from poisoning each other’s
prediction state. Finally, we have STIBP
which addresses the fact that, once again, certain prediction state
will be shared among two sibling hyper threads
on a single core. When we set STIBP to one, it just offers the guarantee that sibling hyper threads
won’t be able to poison each other’s
branch prediction state. All supported versions of Windows client make use
of these by default. The final thing we can do to prevent speculation techniques is to use safely speculated or non-speculated indirect branches. So, on Intel CPUs, FAR JMP and FAR RET instructions, which are indirect jumps
which change the segment, will not be predicted, and so we can replace
indirect branches with these, and that will
prevent variant two. Similarly, for AMD we can use the LFENCE
serialization instruction, which will guarantee
that the behavior is safe during speculation. Finally, we have
this proposal from Google for “retpoline”, and this acts as a way of catching speculative execution
in an infinite loop, while the architectural path will perform the indirect
jump as usual. For the Hypervisor and Windows
kernel, we’re exploring a
combination of these to get the best performance. For removing sensitive
content from memory, the goal is once again to eliminate
entire attack scenarios, or just limit the risk
as best as possible. So, the first thing we can do is have Hypervisor
address-space segregation. So, what this means is that the Hypervisor will
only ever map guest physical memory on
demand, as it’s needed, as opposed to
historically, where all of guest memory
was always mapped. What this means is that if a guest VM performs a hypercall
into the Hypervisor, only its own guest
memory will be mapped, speculation in the Hypervisor
will not be able to read any memory of other guests. The next thing we
have is KVA shadow. So, this applies specifically
to variant three. Previously, during
user mode execution, we had the kernel page
table entries mapped, but just marked as inaccessible. What we do with KVA shadow is when transitioning between them, we ensure that the
user mode execution never has the kernel page
table entries mapped. What this means is
that speculative execution in user mode will not be able
to read kernel memory, because it’s not
physically present. All supported versions of Windows client make use of this, and the final tactic we have is removing observation channels. So, once again
the goal here is to make it difficult
or impossible for an attacker to observe Changes made during
speculative execution. Best thing we can
do is we can map guest physical memory as
uncache when the Hypervisor. So, here we have
some system physical memory in the guest is still
mapped as write back cache, so there’ll be no performance impact for the guest itself. Within the Hypervisor we
map it is uncacheable. What this means is that
a speculative execution and the Hypervisor attempts
to perform a load. Since it’s marked as
uncacheable memory, it will never bring that
into the cache and this acts as a generic mitigation for
host-guest flush and reload, which requires
shared cache lines. The next thing we can do
is we can ensure that we never share any physical
memory between guests. So, similarly, we
want to prevent flush and reload between guests, so we just ensure that each guest has its own copy of everything in physical memory, and so they can never influence
each other’s cache state. The final thing we can do is decrease browser
timer precision. So, there was this API, performance.now, accessible
from JavaScript, which could be used to time
a single load and determine whether the memory it was loading was
in the cache or not. What we did is decrease
the precision of this and add random jitter to prevent
clock-edging techniques. So, it is now impossible
for an attacker to infer whether a single load is in the
cache or not. All right. So, for closing remarks, I just
want to sum up once again, that there’s no silver bullet. For each of the scenarios
present we’re going to require a different
combination of mitigations. Once again going
over the variants. They’re all hardware
vulnerabilities, variant one is going to
require software changes. So, that might be adding appropriate serialization
by the compiler. Variant two, is
mitigated by the OS making use of the indirect
branch controls as we saw. Finally, variant three,
this was the kernel to user information
disclosure Meltdown, and it’s completely
mitigated now with KVA shadow. All right, so, since
then we’ve been made aware of some new variants. We have Speculative Store Bypass, which made use of mispredicting data
dependencies between load and store instructions. This can be mitigated
by identifying vulnerable code patterns and inserting instruction
serialization once again. It can also be
mitigated by disabling this memory disambiguation
optimization by the CPU. But this is not done by default, because there are currently no known exploitable patterns in Windows code. The second variant is lazy
floating-point state restore. So, this was
an optimization made by the Operating System when context switching
between processes, the floating point
registers would not be copied; they would simply
be marked as inaccessible, and then the first time they are used, this will
trigger an exception, where the kernel
would restore them. To mitigate this we just disable this optimization and the floating point registers
are always copied. Then we have Bound
Check Bypass Store. This was similar to variant one: a conditional branch mispredicting, leading to
an out-of-bounds store. If that store corrupts
an indirect branch target, that can leave an attacker
with arbitrary speculation at an arbitrary address. The way we mitigate
that is just by adding speculation
barriers again, similar to variant one. Finally, we have NetSpectre, which is the first speculative
execution side channel not using the cache; instead, it was timing
the AVX instructions. The mitigation for that
is once again just inserting our speculation barrier
at vulnerable patterns. We expect speculative execution side
channel vulnerabilities to be a continuing
source of research, and so we have our Speculative Execution
Side Channel Bounty, max payout is 250k for
new variants. We also have technical write-ups and blogs with more
technical analysis of the variants, as well as developer guidance. All right, so thanks
for listening, and thank you to everyone
who has worked on this. It’s been a tremendous
undertaking. Thank you.>>Questions?>>So, I’m going to ask a question if- Oh Chris,
sorry right there. Okay, yeah, yeah. Chris.>>How expensive are
the various mitigations?>>Sorry?>>How expensive are
the various mitigations?>>So that depends on the operating system itself and how recent your CPU is. So from our analysis, on the latest Windows 10
with a modern processor from within the last two years,
it’s less than 10%. For older operating systems
such as Windows 7, where there are some differences (for example, the kernel does all the font parsing, so there are more kernel-to-user transitions), it’s slightly more expensive. But yeah, for the latest Windows 10 with a modern processor, the performance impact is not that noticeable; it’s single-digit. There’s more analysis
on our website, if you want more details.>>So you told us about this
complex grid of mitigations, where it seems like
it’s hard to tell that you’re done filling out that grid and coming up with all the relevant advice to
avoid security problems. I wonder how much of this
you think is coming about because the hardware
is closed to us, and we can’t even in
principle do an end-to-end foundational
analysis of security. Could this be some motivation for adopting more
open source hardware?>>Good question. So, we have
been working with Intel. We have a non-disclosure
agreement with them. So we have some information, but yeah absolutely some
details are not known to us. That’s why we rely on
our bounty, partly, for more information
to be made available to us and we’ll react
as best we can. Thank you for your question.>>Sure.>>I also have questions along the lines like
sort of similar. My question was
a very interviewer question. Right. I mean, I think
it’s not unlikely that within the next year
or so there’s going to be yet another way of doing speculative execution to exploit kernels. Right, so->>Okay.>>-and it looks like the process we have in place
right now is: hopefully, whoever discovered this is not going to release it to the public, and hopefully gives time to the Microsofts and Googles of the world to go patch the kernels. It seems pretty sad to me. I don’t know. It seems like
a sad state of affairs. So, is there any
investment into having a more principled solution
to these things, like you were mentioning, sort of
open-sourcing hardware? In some ways it’s very
nice; on the other hand, Intel is probably not
going to do that anytime soon. So it just sort of feels
to me that we’re kind of stuck with a bad situation
on our hands.>>So the mitigations we
have in place are designed to not only mitigate
existing vectors, but also to proactively consider reducing the attack
surface as much as possible. So as we saw with
the indirect branch controls, we can flush the prediction
state regularly as well as others but yeah, we have our own
internal research, it’s ongoing and we’ll continue to mitigate
it as best possible. Thanks.>>So I have another question. So it seems like a bunch of
the mitigations that you mentioned require, for instance, modifying binaries so that certain instruction
sequences have the appropriate
protections in them. But presumably, there are
situations where customers who are potentially even
attackers are allowed to load their own code that they’ve compiled
or written in assembly. So, in terms of mitigations, is there been any thought into what can be done to address
those kinds of situations?>>Yeah, very good question. So, you mentioned situations where an attacker is able
to supply their own code. So for example, one of
those scenarios is in the browser where we’re running arbitrary JavaScript
from an attacker. As I mentioned, Chakra, the JavaScript engine of Edge, has its own heuristics to
detect patterns such as variant one, and inserts
appropriate serialization. More generally, though,
within Microsoft, we have a lot of code and rebuilding the whole world
isn’t always possible. It’s why we have a
combination of mitigations just aiming to mitigate the
problem as best as possible. Yeah. Also for
hypervisor scenario, we have mitigations from guest to guest as I talked about. So, really, it’s just
limiting the severity of the attacks and yeah, doing as much as we can. Hopefully that answered
your question.>>I guess I’m looking for a slightly higher-level answer
in the following sense. Does the current state of
affairs keep you up at night?>>So, I think->>Or you’re relatively
happy with the mitigations?>>I think at the moment the mitigations
are pretty strong. Once again, we have our bounty. So, if real-world attacks
are submitted to us they might be eligible for
a bounty and we’ll try to mitigate them as
best as possible. But once again,
it’s a continuing, ongoing matter of research. So, at the moment, I think we’re well protected, but we’re ready to react if more information
becomes available to us.>>Are there
any known public instances of Meltdown or Spectre attacks, rather than
the proof of concepts researchers are showing, “Look
we can do this.”>>To my knowledge, we have-.>>My question is not
just about Microsoft, it’s just in general like
if you happen to know.>>Yeah. To my knowledge, I
don’t think these attacks are being actively used in
the world right now, but what we see from our detections is
only a small sample. So, I think it’s possible in the future
they might be used by attackers in real world scenarios but I can’t comment
further right now.>>Any other
questions? Okay, let’s thank the speaker then.>>Thank you.>>Okay. Our next speaker is Professor Margaret Martonosi
from Princeton University, and her research interests
are computer architecture and mobile computing with
a particular focus on power-efficient systems. Her current research focuses on hardware-software interface approaches to managing
heterogeneous parallelism and power performance
trade offs in systems, ranging all the way
from smart phones, all the way up to
large scale data centers. Professor Martonosi is
a fellow of IEEE and ACM, and she’s won numerous awards; I’ll just mention two. In 2015, she won the ISCA Long-Term Influential Paper Award, and in 2017,
the ACM SIGMOBILE Test of Time award. Take
it away Margaret.>>Thank you.
Good morning, everyone. So, this follows nicely from the wonderful
previous talk because the previous talks gave
us the state of play, and what I’m going to try to give here is some thoughts that relate a bit to your question about our attempt at a principled way forward. So, parts of the story do start in January with Spectre and Meltdown, but a lot of the story
starts much earlier. I’m going to take you through the flow from
our earlier work, Verifying Memory Consistency
Models to our current work, Synthesizing Security
Exploits Automatically. So, we started in about five years ago
with a simple goal. Memory Consistency
Models have to do with enforcing the ordering of memory events in hardware
software systems in a well-specified way, and we had the goal of saying for a particular part of the Memory
Consistency Model namely, from the specification
given by the ISA to a particular implementation
in hardware, is that correct? Does that pipeline
correctly implement, say Intel’s total store
order memory model, or Arm’s weaker memory
model, and so forth? We did that based on an axiomatic approach that I’ll go through in
a little bit of detail. After that, we
recognized that actually that localized view of
just the microarchitecture compared to the ISA was
insufficient in many cases because there are
so many other parts of the Memory Consistency
model landscape. In particular, high level
languages have a memory model. C specifies a memory model with atomics and sequential
consistency and so forth. The compiler and the OS
play a role as well because the compiler maps from
those C constructs down to assembly language and
the OS manages virtual to physical address
translations that also have a role to play in
Memory Consistency Models. Then lastly, the
microarchitecture specified as a pipeline is
only a piece of the puzzle, because there’s the full coherent memory hierarchy
to worry about, and there’s also the fact
that eventually this gets mapped down to Verilog
or something like that, and we need to make
sure that that, too, represents
a correct implementation. So, over the course of
the past five years, we’ve developed a suite
of tools that addressed this with this general philosophy being unified across all of them. The basic approach that we
use across all of them is to have an axiomatic
specification that’s given alongside
the implementation, and that can be
automatically translated into a set of
Happens-Before graphs. Now, Happens-Before
graphs have been used by higher level compiler and software people for a while, where the nodes in them are typically instructions
or coarser granularity. We’re taking those
Happens-Before graphs down so the microarchitecture and the implementation
level where they map more to hardware features. The key thing that we’re
doing is we’re saying, if there’s an axiom that says that A must happen before B, then we can draw an edge for it; if there’s another axiom that says if B must happen before C, we can draw an edge for it; if there’s an axiom that
says that C must happen before A, we can
draw an edge for it; and in fact we can enumerate
this effectively across all possible orderings for the software running on
a given hardware implementation, and so that’s why I
show multiple layers of these Happens- Before graphs. The key thing is that
if A happens before B, B happens before C, and C happens before
A, that’s a cycle. That’s saying that A is happening before itself,
physically impossible. So, every time we can show that a particular Happens-Before
Graph is cyclic, we can show that that is
a physically unobservable event; it will not happen. So, if there’s
something that we’re verifying that
should be forbidden, it should never happen, we need to ensure that
every possible interleaving, every possible Happens-Before
Graph is cyclic, and so that’s the secret sauce of all of these tools
up and down. The key thing that’s
relevant here is to recognize that
the same sorts of event ordering through memory issues that make a Memory Consistency
Model correct or incorrect also play intricately
in the space of these side channel attacks
that we just heard about, because the ordering in which you access memory is
a key part of it. So, over the past two years, we did this sort of transition from Memory Consistency Models
into the Security Space. So, first I want to
tell you a little bit about these Axiomatic Models. Here’s a sort of a simple view
of a dual-core processor. Five stage pipelines;
fetch, decode, execute. Kind of like your
architecture class for undergrad along with some sort
of Coherence Protocol, Single Writer Multiple
Reader, and so forth. We can take that and we
can ask the designer, or we can help automate the process of expressing
that as a set of axioms, and I’ve shown
a very simple case here, and just two of the axioms. So in this case, the top
half of this box is an axiom written in
our domain-specific language called muSpec that
basically says, instructions are
fetched in order. Okay. The second half is for
this very simple processor, a very simple axiom that says, if instructions are
fetched in order, they will also be
executed in order. That's all. So, this is very simple but we
have actually built up axiomatic specifications for processors as complex
as Intel Sandy Bridge, including the virtual to physical address
translation issues in which we have parts of the specification that
correspond to hardware, and other parts of
the specification that correspond to
axioms that are actually enforced by orderings done by the operating system. These axioms can be composed so that axioms can be written
by an OS specialist, and the hardware axioms
can be written by a hardware specialist and we
can put the two together. So as I said, we have a process of effectively,
exhaustively, but we’re using SMT solver, so we aren’t sort of stupidly exhaustively enumerating
all possible interleavings. So, what you can see there
are a whole bunch of Happens-Before graphs
starting to be enumerated. Each of the nodes in one of those columns corresponds
to one stage in a pipeline. This is for a microarchitecture
level consideration. So as you go down, one of those columns, that’s an instruction
going through fetch-decode-execute
and perhaps a memory hierarchy stages as well. The different columns in
each one of those boxes, the different columns here correspond to
different instructions, and every arrow that’s
drawn is drawn based on some axiom that we learn
from the specification. Nothing is assumed, so we don’t assume program order
or anything like that, we check everything about
the microarchitecture. So, we come up with this family of microarchitectural
Happens-Before graphs, and then we use SMT solver techniques to make it efficient to check for cyclicity or acyclicity for each one of these many
Happens-Before graphs, and as long as we find a cycle when something
was supposed to be forbidden, we’re good. If there was something
that was supposed to be forbidden and we find
an acyclic case, we can give that to
the designer and we can say, here’s your problem. We have had cases where we give that to a designer
or we look at it ourselves and we can figure
out where the erroneous design aspect was
that caused us to be missing an edge that would have ordered things
appropriately. We’ve also found cases
where we were missing an axiom and had to add an axiom. So we can go either way and the tools are fast enough
to be interactive. For these kinds of
specifications, the runs are seconds, minutes, occasionally hours,
not too often hours. So, we started as I said, from that sort of ISA to microarchitecture
view, but clearly, real system span from high-level languages through
OS and compilers and down to microarchitecture and below and so we wanted
a more comprehensive view. A more recent tool
that we started about three years ago called TriCheck, has this sort of
three-layer view. So we start from high-level language litmus
tests written in C, in our case it could
be another language, and we take them through some sort of evaluator that
says what is supposed to be permitted or
forbidden about that litmus test from the high-level language memory
models point of view. So we get that permitted or
forbidden output up top. We also take them through compiler mappings that take from C down to an instruction
level view of things, and then across through
our axiomatic models to a microarchitectural
hardware-aware view of what is observable or unobservable and we put those two together, and you can see this sort of matrix that results from this. If the software says that
something is supposed to be permitted and our model says
it’s observable, we’re okay. If the software says that
something is supposed to be forbidden and our model says I have enumerated everything and every single case is cyclic, it will never be observable, also okay and those are
the two green boxes. If the software model says
something is supposed to be permitted and we say it
will never be observed, that is overly strict
but not a bug. So that’s a case where
you might be leaving some performance on
the table but it’s okay. If the software says that
something is supposed to be forbidden and we find a case
where it’s observable, we find an acyclic happens
before graph, that is a bug. So, to test out the utility
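That four-way outcome matrix can be written down as a tiny lookup. This is only an illustrative restatement of the matrix just described, not TriCheck's actual interface:

```python
def classify(software_says, model_says):
    """TriCheck-style outcome matrix: what the high-level language memory
    model intends vs. what the microarchitectural model shows is observable."""
    assert software_says in ("permitted", "forbidden")
    assert model_says in ("observable", "unobservable")
    if software_says == "permitted":
        # Permitted but never observed: correct, just overly strict,
        # possibly leaving performance on the table.
        return "ok" if model_says == "observable" else "overly strict"
    # Forbidden by the software model: observing it even once is a bug.
    return "bug" if model_says == "observable" else "ok"

print(classify("forbidden", "observable"))    # bug
print(classify("permitted", "unobservable"))  # overly strict
```

The two "ok" cells are the green boxes from the slide; the "bug" cell is the red square that the RISC-V case study kept landing in.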
of this kind of a framework, we tried it out on
a new emerging instruction set architecture called RISC-V. I should also say, as I said, this is fast enough. These are sort of
minutes of execution. This is fast enough that you can iteratively run through
design processes. You can decide when
you find a bug, what do you want to change? Do you want to change
the ISA itself, the compiler or the
microarchitecture? We actually found bugs basically
up and down the stack. We have found bugs in compilers, we have found bugs in microarchitectures and as
I will talk about now, we have found bugs
in an instruction set architecture, namely RISC-V. So for the RISC-V case study, we started a couple years ago with RISC-V's instruction
set architecture, the now widely known, open-source instruction
set architecture. At the time it was in
a draft specification mode, but it was still being widely
used and talked about. We took 1700 different
C11 programs as our high-level language
litmus tests and we developed axioms for seven distinct RISC-V
implementations. So each of these would be a legal processor
within the RISC-V spec, but different amounts
of out of orderness. So you can imagine one being a simple in-order
single-issue processor with no speculation all the way up to fancy out-of-order
pipelines with lots of reordering
and speculation. They all abided by
the specs though, but they varied in reordering. What we found when we went
through this process, hundreds of times we were
ending up in the red square, the buggy outcome square. It was true both for the base specification of the ISA as well as
for one that had additional support for atomics that was supposed to actually help with exactly these
kinds of problems; providing appropriate
fences and so forth. The problem was that it actually didn’t provide
appropriate fences. So in the previous talk,
for example, Christopher talked about
inserting LFENCEs at key points to
order parts of the code. RISC-V did not have a sufficiently strong, what's
called cumulative, type of fence to bring back ordering when it
was needed and in fact, it could not legally compile many C programs
as a result of that. Because there are C constructs in the C11 memory model that say that a programmer is
supposed to be able to ask for sequential consistency, and if you don’t have
the right kind of fence to actually
implement that ordering, you can never compile
that program correctly. So that’s one of several
issues that were found, that led to these kinds
of buggy outcome results. We worked to get RISC-V's
attention and eventually, after our paper was published, we did get their attention, and a memory model
working group was formed about a year ago to address
these issues and it’s really a nice sort of win-win situation in the sense that the memory model working group
was able to work through the issues and create a memory model that’s
not just a more correct than before but is also formally specified and a lot
cleaner than before. Just last week, the memory model working group
and the RISC-V consortium members voted to ratify this new improved
RISC-V memory model. We’re going through
the final dotting the I’s and crossing the
T’s of making that real, that ratification
real. So that’s great. What about spectrum meltdown? So as I said, about a year ago, we were making
this mental transition from the memory ordering
issues that you worry about for Memory
Consistency Models to the memory ordering issues that you worry
about for security. So here’s my one
slide simplistic version of what you just heard
a half an hour about. Spectre and Meltdown
essentially take a well-known cache
side-channel attack, in this case Flush and Reload, and mix it with a widely used
hardware feature, speculation, and what was surprising was not either of
those on their own, but the facility with which new
exploits could be created and so clearly there was
an awful lot of news that broke. We had actually already
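The Flush and Reload ingredient just mentioned can be sketched as a toy simulation. A real attack uses `clflush` and cycle-accurate timing on a real cache; this hypothetical model only shows the information flow, that a cache hit on the reload reveals whether the victim touched the line:

```python
# Toy model of the Flush and Reload side channel (illustration only).
class Cache:
    def __init__(self):
        self.lines = set()          # addresses currently cached

    def flush(self, addr):
        self.lines.discard(addr)    # evict the shared line

    def access(self, addr):
        hit = addr in self.lines    # stands in for a timing measurement
        self.lines.add(addr)
        return hit

def victim(cache, secret_bit):
    # The victim touches the probe line only when the secret bit is set,
    # e.g., via a secret-dependent (possibly speculative) load.
    if secret_bit:
        cache.access("probe")

def flush_and_reload(secret_bit):
    cache = Cache()
    cache.flush("probe")            # FLUSH the shared line
    victim(cache, secret_bit)       # let the victim run
    return cache.access("probe")    # RELOAD: a hit means secret_bit was 1

print(flush_and_reload(1))  # True
print(flush_and_reload(0))  # False
```

Speculation supplies the secret-dependent access; the cache state then survives the squash, which is the combination the talk is describing.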
been working for about six months at
that point on a tool that would build off of TriCheck and address some of these issues, and so in January, we set to work to recreate Spectre and Meltdown and see what else we could find along the way. The basic principled approach that we wanted was to step
away from the idea of security being kind of “close the door after the horse
is out of the barn” to a more principled
forward-looking approach where we could give designers tools that would help
them reason about their systems in advance
and more automatically. So that you don’t have
to stare so much at individual designs
but instead you can have more automated analysis. Our goal was the following; we wanted to be able to
give someone- give a system a good specification
of the system to study, in a specification of a class of attack patterns and
then from that say, go, analyze, synthesize,
tell me what you find. Can you find that attack pattern exploitable in that specified
system? That was the idea. Either output synthesized attacks or determine that
none are possible. Now you could say that this
is a malware generator. It kind of is, but the goal
is to have this be in the hands of designers
rather than in the hands of people
who want malware. So what we did is we did that. We developed a tool called
CheckMate to do this based on the microarchitectural happens before graphs that I
already talked about, and the too long didn’t read
version of this is that the tool automatically synthesized Spectre and Meltdown, as well as two new distinct
exploits and many variants. The top link here is our arXiv paper from January where we talk just about the two new variants and then the bottom link is the draft hot off
the presses of a paper that will get published
in October about the actual tool by which we did this and
techniques by which we do this, that I’m going to
talk about next. So in more detail, the idea is to frame
these classes of attacks as patterns of
event interleavings, but hey, that’s what memory consistency models
are already doing. That’s what our happens before
graphs are already doing. So essentially,
we’re saying here is a fragment of a
happens before graph. Do you find it anywhere
in an execution? And second of all, we want the executions
to be hardware specific. We want to know if the attack is realizable on a given
hardware implementation. So we need a way of specifying
hardware and we do that with the muSpec axioms
the same as before. So as before, we have the ability to
take a microarchitecture. Take a microarchitecture
and turn it into axioms and we have this ability, unlike before, to
give it a pattern. So, instead of saying take the axioms and tell me
if you find a cycle, it’s take the axioms
and tell me if you find this pattern in
an acyclic execution. So, it’s a little bit beyond, it’s actually a lot beyond where we were before because it’s a cycle check with a pattern
finding action as well. So, for example, one of the things that I
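The "cycle check plus pattern finding" idea can be illustrated with a deliberately tiny brute-force search. The real tool delegates this to relational model finding (Alloy and Kodkod, described later); this sketch just shows what it means for an abstract threat pattern to embed into an execution graph:

```python
from itertools import permutations

def find_pattern(edges, pattern):
    """Look for a mapping of pattern nodes onto graph nodes such that every
    pattern edge lands on a graph edge. A toy stand-in for the relational
    model finding that the actual tool uses."""
    nodes = sorted({n for e in edges for n in e})
    pat_nodes = sorted({n for e in pattern for n in e})
    edge_set = set(edges)
    for perm in permutations(nodes, len(pat_nodes)):
        mapping = dict(zip(pat_nodes, perm))
        if all((mapping[a], mapping[b]) in edge_set for a, b in pattern):
            return mapping          # the pattern embeds in this execution
    return None                     # pattern not realizable here

# A hypothetical acyclic execution and an abstract "X before Y before Z"
# threat pattern (names are illustrative, not from the real tool).
execution = [("flush", "victim"), ("victim", "reload"), ("reload", "time")]
threat = [("X", "Y"), ("Y", "Z")]
print(find_pattern(execution, threat) is not None)  # True
```

The enumeration in CheckMate then asks this question over every acyclic, hardware-consistent execution graph rather than a single hand-built one.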
didn’t have time to talk about in the Memory Consistency
model space is we have a notion of how to
manage cache lifetime. So, when a value comes from cache line and where
it was sourced from. We call those values in
cache lifetimes or ViCLs. So, we can come up
with constructs that allow us to reason about the possible
sources of a value. Did it come from the store buffer or did it come from the cache? If so, was it an eviction out of the cache in between
and so forth? So, a ViCL create corresponds to something new coming into the
cache, and a ViCL expire corresponds to something being evicted out of the cache. So, you take your
microarchitectural axioms, you take a pattern that
you’re looking for, and you take some constraints
on the number of cores, the number of threads, the number of instructions
to keep things tractable, and you send that into
your tool CheckMate. Again, it enumerates
possible execution graphs that, A, are acyclic and that, B, shows this pattern. Now I think you can
see that this is a case where
automation is a huge, sort of, brain helper. You would hate to
have to stare at complicated graphs and look
for that pattern in them, you want some help with this. So, specification is
essentially the same as before. Speculation is something that muSpec already supported. Basically, we can allow for
items to be brought into the cache in a way that isn’t necessarily ordered with
instruction execution, right? We can allow for items to be brought into the
cache in a way that isn’t particularly ordered
with branch instructions. So, those are the kinds
of things that actually were raised
in the previous talk. In addition, because we have
this full-stack analysis, we can handle user-level
software operating system or hardware events and locations, user-level software
operating system and hardware ordering details. Hardware optimizations like
speculation fit in well because we've handled
operating systems in the past in a tool
called COATCheck. We can handle processes
and resource-sharing, and memory hierarchies and
cache coherence protocols. I don’t want to oversell this. We are not going down to
Verilog in this tool, although we do in other tools. But we do think that once you’re operating in
this axiomatic landscape, this is a huge help in
automating the analysis. So, the last piece of
this puzzle is how do we do these pattern enumerations, and that’s with
Relational Model Finding. So, in the first half
of the talk, we did the cycle analysis
using SMT techniques. This is Relational Model Finding because we need to
find a pattern, not just a cycle. So, RMF essentially tries to find the satisfying instances or sub-parts within a larger graph. In this case, we’re doing
this by taking the muSpec axioms, translating them into Alloy, which is a domain-specific
language that’s intended to fit into a relational model
finding approach. The RMF problems get mapped onto a model-finder
called Kodkod, which in turn uses
off-the-shelf SAT solvers. So, in this way, we can feed together and
automate the process. So, here is what Spectre
looks like in our world. So, as before, these columns of nodes correspond to
instructions being executed. You can see by the label on the top the different
threads that are involved. You can see, for example, the Spectre was based on a Flush and
Reload threat pattern. So, in the upper right, that’s the pattern that the Relational Model Finding
technique is looking for. You can see that you
would not want to analyze that graph to look
for that pattern in there, but it’s in there. Follow the red arrows,
and you can find it. The last thing that it does is it generates
the skeleton or the security litmus test that would correspond to
a version of Spectre. This is a template for code. You still have to make it concrete with particular
addresses and so forth. But the step from here to
a real piece of Spectre code is pretty straightforward piece of programming work for
someone who is familiar with the instruction set
that they’re operating in. Okay. So, that’s Spectre. One of the things that we
wanted to do was to say, Spectre was based on one class of exploits called a Flush
and Reload threat pattern. We wanted to see what about
other threat patterns? Prime and Probe being one, that has been talked about
a lot in the literature. So, we said, “What if we put
in this different pattern, the Prime and Probe pattern. What happens then?" In fact, what happened
then was that we found two distinct variants
of the exploit. In this case, using
invalidation patterns between two cores
rather than Flush and Reload patterns on a single core
to create a new case of almost identical to Spectre, but with invalidations being the way that things
were evicted out of the cache rather than
flushes on a single core. Again, the tool generates the security litmus
tests automatically. We call them security litmus
tests rather than malware because the idea is that just as in
Memory Consistency Models, we have built up as a community
a suite of tests over time that designers use to
stress test their systems. We view the ability to generate security litmus tests as
an important construct for designers to help
them design against these threats and to explore
new classes of threats. So, one of the things
that we want to do in subsequent work is to automatically generate
the threat patterns that might be most interesting, and that’s actually
where we are right now. Okay. So, second to last slide, this is the money
shot in some ways. So, the top part of this table are the Flush
and Reload patterns, the bottom part of this table
are Prime and Probe, as the exploit pattern. The number of instructions
that we place as a bound in the Relational Model Finding
because that does affect the execution time
of all these tools. But what you can see is that with relatively small
instruction counts, you can generate pretty real
exploits such as Spectre, Meltdown, and the new variants. The minutes to synthesize
the first exploit, that is, the time for the relational model finding to find the first exploit, is five minutes
to a couple of hours. It continues to run until it has found all the
possible exploits, all the possible ways that that pattern can be found in all the possible graphs
that could be enumerated, and that takes seven minutes
to three or four hours. The number of exploits that were synthesized
correspond to all the different possible
ways that you can create something that’s
Spectre-like or SpectrePrime-like, and you can see that
those numbers get quite high. So, one of the values
that we see in this work is the ability
to give someone the reassurance that
they found not just a way that someone could
exploit their code, but hopefully
a whole bunch of ways that someone could exploit
their hardware. So, in terms of takeaways, yes, we found two new variants, SpectrePrime and MeltdownPrime, that use cache invalidations rather than CLFLUSH. But more importantly to us is this key overall philosophy that the event ordering issues of security exploit patterns
align strongly with the Memory Consistency
model analysis that we’ve already been doing. It’s a very principled step from ad hoc one-off analysis of different exploits to
formal automated synthesis. The goal has always been to span software, operating system, and hardware for a holistic,
but hardware-aware analysis. Those are the two papers that
I am inviting you to read. If you remember nothing else, please wake up and look at
the two names in red because everyone in the room
should want to hire these two
wonderful students. Caroline Trippel sat down after our last TriCheck
paper and said, “I want to work on security.” Within six months,
she was doing this. She is an amazing
PhD student who will be on the market this year. Yatin Manerkar also did a lot of the work that
I talked about today, including finding errors through
our toolset, formally proving
correct compiler mappings, and finding errors in the compilers
that go with them, and some other universal Memory Consistency Model analysis beyond litmus tests that I didn't have time to talk about. So, with that, I'm happy
to answer some questions. Thanks.>>That was a really
interesting talk. Thank you. So, one
question I do have is, you talk about having
these specific microarchitecture happens-before graphs. Once you've got one
of those, you can go and synthesize
a whole bunch of examples. This was a natural follow-up
question, which is, if you look at something
like the NetSpectre attack, that discussed
a side channel based on the power state of the AVX2 units, or if you look
at variant 4, those are about
memory dependent speculation. So, there’s this
natural question of, can you take what you’ve got here and actually go and find these new classes or is it something where
you do need to know all these µhb graphs upfront and then once you've got them,
you can go find them?>>So, as I said, we have found bugs that
span OS to hardware. We have found bugs that
span between cores. So, one thing I
want to stress is, we aren’t doing this per graph, we’re doing this for
a set of axioms. So, as long as one
can write the axioms, we can enumerate the graphs. So, that’s one thing.
The second thing is the things like
memory dependence; they're already within
our model. So, yes. NetSpectre, I am pretty sure we could write axioms to handle how the packet processing feeds through the rest
of the system.>>No. To clarify my question. I mean, you synthesize Prime and Probe or you
synthesize this Spectre Prime, Meltdown Prime, things like this. Essentially, either this tool
should have spat out that you can have memory dependence-based speculation
bugs or it didn’t. Either way is interesting.>>So, for example, as far as I know, NetSpectre
is still Flush and Reload. It’s just a different
style of flush and reload that causes the invocation
to happen differently.>>We can take this offline.>>Let me finish.>>Okay.>>So, if it’s within
Flush and Reload, then with the right axioms
we can synthesize it. I agree and I said at the end
that we do want to be able to enumerate new classes
of exploits. That’s what we’re
working on now. Stefan.>>So based on your experience
with CheckMate, do you have any advice
for hardware designers, for how to design a CPU so that CheckMate won’t
find any exploits?>>When we started
this work five years ago, people I think were very
reluctant at the idea of having to have axiomatic
specifications alongside their design. I’m hoping that over the set of observations that we’ve
made over these five years, we’re increasingly
finding designers more open to having that be a key part of
the design process. So, one analogy I
make is 20 years ago, architects didn’t
think about power. They were encouraged not
to think about power. That was for later
in the design chain. Today, if I say
the word verification, people think about
something that’s very late in the hardware
design process. One of our goals is
to make tools that are amenable to being
used earlier in the design process so that hardware designers
will be more open to using them
because they will be at interactive speeds. So, we think that
the axiomatic approach, while not natural to
today’s architects, is helpful enough that people should be
coming around to it. We can talk about whether to do a correct by-construction
flow where the axioms follow
automatically from synthesis tools or whether
the axioms are written alongside a traditional design. I’m okay with either one. The main thing is I think we need these interface specifications
that let us say, “At this point, here are some rules you should
be able to count on.” I think that’s key.
If we start to have these interface specifications
with corresponding axioms, then we can automate different analysis techniques
some of which would be synthesis driven or
correct by construction, and some of which would be ancillary but still
formal documentation. One of the key things is
most Memory Consistency Models, for example, are still
written in English. But increasingly, people are coming around to
the idea that they should be written in a way that can be automatically
analyzed and verified. So increasingly, for example, RISC-V went from not very correct and written
in English spec, to now being something that
is formally specified. There are formal models for it. I think that’s a good sign. Yes.>>Let’s go back to five
years ago where we don’t know how to exploit
speculative execution. Do you think your
methodology can identify any kind of exploit variants
that we know right now?>>So, I’m not going to say that. There’s a chicken and egg aspect
of this of modelling enough to be able
to find the things. We were finding bugs before Spectre and Meltdown broke and we found different bugs
after the news broke. We hadn’t been using speculation in all of our
models before January. So we added it in afterwards. There could be something that we are choosing to abstract away now that we should include
in a model going forward. But the basics of Spectre and Meltdown have been
known for a while. Like speculation and Flush and Reload are both concepts that have been known for
more than five years. So in that sense, people should have known, but I think people
were unaware of the facility with which
they could be exploited.>>It seems that there’s
some area that we couldn’t identify the problem
by using your methodology. It looks very formal
and very nice. Do you have any idea what you can provide through this method in general, and what are the things that you couldn't provide? At the end of the day,
we're going to see other types of side channel, probably completely
different variants from speculative execution. But could we tell whether there's actually such a thing
just by using your methodology?>>So as I've said, we are looking at ways to
automatically generate the new attack pattern classes so that we can, for example, you can imagine genetic
algorithms or something that creates new graph snippets and then says “Is this
an attack class or not?" So it's an ongoing thing but the ability to automatically analyze once you have an attack class seems like an important step forward.>>Great talk. I really
enjoyed your talk. My question is a variant of, I don’t know if I’m
asking the same question or not actually, but so you mentioned
the two side channel attacks you mentioned are based
on memory caches. So the Prime and Probe
and the Flush and Reload. There are a lot of caches in the architecture, not just that. The question is how amenable
is your technique to actually reapply these things to other caches in
the CPU or elsewhere?>>I think everyone in
the room would like it if there was one big answer. There clearly isn't. There are steps on the way. I feel that we have all been lied to about what architectural state
is. Let’s be honest. When I teach an undergrad
architecture class, we talk about architectural state being what the software can see but that is
an extremely nebulous thing. So for example,
Christopher’s talk talked about timing jitter. That’s because there’s a form of observability that comes
from what you can time. One of the things
that we’re working on right now is ways to take these graphs and put quantifiers
onto the edges to say, “This is an exploit.” If the timing sequence is sufficiently observable
and that has to do with the timing analysis
granularity against the performance variations
but just as we could imagine
an observer model that puts edge weights based on time, you could get
edge weights based on power dissipation,
on radio emanations, on temperature and say that, “If someone’s in the room
and could measure temperature variations
across the chip, then this becomes a side channel that we
need to worry about.” So there are ways to add
quantifiers to some of this that seem
promising but I’m not, we’re not done with that yet.” [inaudible] and see what happens.>>In some sense
that’s my question is this work at
the level of you need the caroline in the room
to actually do this or more engineers that can
actually use this tool?>>The goal is to have
them be engineers. We gave a tutorial at a school about a year
and a half ago. Our materials are online. The tools: CheckMate isn't open sourced yet, but TriCheck and
PipeCheck and so forth, those tools are all open sourced. The DSL is available. I'll be honest. It's still a pair programming experience
probably at its best. You’re sitting alongside
someone who knows what’s what. But the goal is to
have it be something that a hardware designer
can use on their own.>>All right. Let’s
thank the speaker.>>Thank you.>>Okay, our next talk
is going to be given by Onur Mutlu. Onur is a Professor of
Computer Science at ETH Zurich and he’s also on the faculty at
Carnegie Mellon University. Onur's broad research
interests are in computer architecture, systems,
and bioinformatics. A major current focus of his is on memory and storage systems. He's going to talk today
about memory systems. So, we’re going to be
changing a little bit from Meltdown and Spectre to
RowHammer and Beyond. Onur has a history with us at Microsoft Research in fact he was the first member of the Computer
Architecture Group at Microsoft Research back in 2006. Onur has won numerous awards and I’ll just mention one here. He was the winner of the
inaugural IEEE Computer Society, Young Computer Architect Award. So, take it away Onur.>>Thank you very much
Alec. Is this working? Okay. It's great to be back
here at Microsoft as always, and thanks for the invitation
Stefan and Alec. I'll talk about RowHammer. I actually see that
it’s going to be a change compared to the previous things but I
actually see these as related, because it’s about
the mindset that hardware is not as vulnerable and you
can actually attack things. I think there’s a history
if we had time, that we could go over of
these hardware-related attacks. I think Meltdown and Spectre
happened because some things like
RowHammer for example, instigated some people to
actually examine issues in hardware and they actually found out
other issues in hardware. We can talk about
that separately. But before, let me see.
This is not working.>>[inaudible].>>Oh, okay. So, before I go into RowHammer, basically we’re going to talk about the main memory system. It’s a critical component of all systems that
we designed today. Whatever you’re
designing you got to have some working storage. This system must scale
in many dimensions in terms of size,
technology, efficiency, cost algorithms we used
to manage it et cetera, to maintain the
performance goals and the scaling benefits that
have been used to so far. In regards to whatever you
attached to main memory, your bottleneck by
that interface the main memory. I’ll very quickly go over
some trends that are affecting main memory
to set the stage, and how we came to a RowHammer. Basically, these are
three major trends that are affecting main memory
as I see them. We want more capacity,
more bandwidth, more quality of service,
more performance. This what I think evidence in
[inaudible] with the beast and the megabeast engines that had terabytes and terabytes
of memory actually. Energy and power is
a key system design concern and technology scaling is ending. This talk is going to be
about technology scaling. But, to understand that, I think we need to cover
the other trends also. We were able to put a lot
of course on machines as applications are
becoming increasing data intensive and we want to
consolidate more and more. That’s driving the capacity
bandwidth and quality of service requirements up and
up performance requirements. This is one example. This was actually from a paper by HP Labs and the
University of Michigan. They’ve shown that core count is increasing much faster
than DRAM capacity. That’s why we are bottlenecked
by DRAM capacity. You could argue with all of
the numbers on this graph, and you could say that
this trend is not continuing, but if you think
about why the trend may not be continuing, it is that we may not be able to feed the cores with
the data they need. So we actually may not be placing as many cores as we
were in the past. But the trend is
actually similar to this in GPUs. Anyway, we want more
capacity for memory and that drives the capacity
of the DRAM chip. Let’s take a look at
the history of the DRAM in the last 18 years in terms of, how much capacity bandwidth
and latency has improved? This has been always a
capacity focused business. If you look at this,
capacity has improved by more than 100x in
the last 18 years. You can see that in
the last few years, the trend is not exponential, it’s actually stagnating
a little bit. So we’re having difficulties
in DRAM scaling. This is one piece of evidence of that. I’ll give you more
evidence in the talk. Bandwidth has not
improved as much, but you could
potentially improve it. What do you think of latency? How much has it improved
in the last 18 years? This much? Yeah, I agree it’s
this much in this graph. It’s basically 30 percent.
Think about that. If you want to pay
for it, you can of course give an arm and a
leg and you can pay for it. But latency is almost constant, even though DRAM is critical
for performance, capacity, latency, bandwidth, different applications have
different requirements. I think these are
backward-looking applications, we have many more
forward-looking applications that are going to put even more pressure on DRAM. The second major trend energy, is a key system
design concern and memory consumes
a lot of the energy. This is a paper from IBM in 2003, where they showed that in
their big iron servers, 40 to 50 percent of the entire system energy is spent on the off-chip
memory hierarchy. Fast-forward to today: there are reports from IBM
again, on POWER8, that more than 40 percent of the power is spent just solely in DRAM. That’s true for GPUs also;
that other paper, from ISCA 2015, is on GPUs, and our results
actually show that also. So, memory energy is
becoming a big concern, and one of the issues is that DRAM consumes power
even when it’s not used; you need to periodically
refresh it and this turns out to be
a scaling problem also which we may get to
toward the end of the talk. So, on top of all of this, we’re requiring a lot more from memory; going forward, we’re
going to require even more with the new applications. But on top of this, we’re having difficulties with
the DRAM technology scaling. Basically, we relied on reducing the size of the DRAM cell
to increase the capacity, but this is ending. Basically ITRS has been
projecting for a long time that DRAM will not scale
below X nanometers. I like keeping X over here, because I don’t need to change my slides but they change
their projections of course. I’ll give you the numbers
for X in the next slide. But scaling has enabled
us to get more capacity, reasonable energy
scaling, lower cost. It didn’t help us with
latency that much but it did help
with other things. So, what is the scaling problem that we’re having with DRAM? For any memory to work, you need to have
a storage device; in DRAM the storage device
is the capacitor. And you need an access device; in DRAM the access device is the access transistor, along with the bitline and the sense amplifier. Both of these components
need to work reliably for any memory to work. In DRAM, this capacitor must be large enough for
reliable sensing, and this access transistor and the sensing structures
must be large enough for low leakage
and high retention time. This was the value
that was assigned to X by ITRS in 2013. They basically said scaling below 35 nanometers
is challenging. What do you guys
think where we are at memory feature size today? Is it 35 nanometers? This is the dimensions
of the cell. Ten? Any guesses?>>[inaudible].>>Ramsey, that’s good. Yes we’re about
maybe 17 nanometers or so. Clearly we’ve gone
below 35 nanometers. But we’ve had issues. So, basically, DRAM scaling
has become increasingly difficult and we’re
going to talk about one of the big problems
in DRAM scaling. So, what have people
done about it? Basically, this has led to the proliferation of
different types of DRAM, driven both by the application requirements and by the requirements
from the bottom. As a result there are
many emerging technologies. You can see that there’s
3D-Stacked DRAM, where you get higher bandwidth;
Reduced-Latency DRAM; Low-Power DRAM;
Non-Volatile Memory. They all have greens, but they all have reds also. So there is no single memory
that’s good at everything. As a result, one major trend has been going into
hybrid memory technologies, where you have multiple
different technologies. Potentially, multiple
different DRAMs. You design the hardware
and the software to manage data allocation
and movements such that you achieve the greens as much as possible while avoiding
the reds as much as possible. This requires clearly changes to the interface and changes to become more intelligent in terms of how we manage memory. But this doesn’t change
the fact that we need to have a memory in the system and the memory
needs to scale. This is one way of
trying to scale memory, but it turns out it’s very difficult to get rid of
DRAM from the system. People have looked at
using other memories, for example PCM, Phase-Change Memory, but it’s going to be
very difficult to get rid of all of
the DRAM from the system. Let’s go a little bit more into detail in the memory
scaling problem. There’s a lot in
the memory problem space, and we’re
working on a lot of it. But I’ll start with
the security part of it, or reliability and safety. I see these as interconnected. I’m going to make the connection. But there’s a lot more to do in the memory areas you can see. Why start with security? I like tying this to
human lives also. How many people here know
about Abraham Maslow? That’s great. He was a very
famous American psychologist. He dedicated his life
to understanding why people do things
they do, as a result, basically, this is
his major work, that book that he iterated
over during his lifetime. He’s probably more famous
for this one essentially, which is Maslow’s
Hierarchy of Needs. He basically said that, “We need to start
with reliability and security,” because if you’re
not reliable and secure, you cannot think
about relationships, friends and you definitely
don’t care about higher levels of art if you’re about to die at this moment. So, that’s why we need to start with reliability and security. This is another thing that I
use actually in my classes. Probably this should
be familiar to people who are living in
Washington State. This is the Tacoma Narrows Bridge that doesn’t exist anymore. It was built in 1940, and six months later, it collapsed this way because
of aeroelastic flutter. The new bridges are
actually doubled bridges. It was actually put in there for bandwidth reasons but it’s
good for reliability also, having two bridges over there. While I was at
Microsoft Research, I interacted with a lot
of security people and this definition of
security I like a lot. It’s really about preventing
unforeseen consequences. I see the previous talk, and the previous two talks,
as actually thinking about potentially unforeseen
consequences and how we can prevent them. Let me tie back into
the DRAM scaling problem, this is a slide I showed earlier. Basically, we are having
difficulties with reducing the size of the circuit. As we reduce the size
of the circuit, both of the
reliability properties are difficult to maintain. Essentially, this capacitor
becomes unreliable, it becomes more
vulnerable to noise, and this access
transistor becomes more leaky and more
vulnerable to noise. As a result, it’s really difficult to reduce
the size of the circuit. We’ve been doing
a lot of studies, both at the large scale, I’ll give you one example of the large scale that
we’ve essentially analyzed in this paper from 2015. All of the memory errors
that Facebook has recorded over the course of the year in their
entire server fleet, this is a lot of
servers actually. This is a correlational study
as you can see. It turns out as
chip density increases, the server failure
rate increases. This is because of memory errors
not due to other errors. So, there’s a clear
correlation between higher capacity
and higher errors. There’s a lot more data
in this paper if you’re interested which
I’m not going to cover. When we first started studying
the DRAM scaling problem, we also wanted to do
the small-scale studies, and we built this infrastructure which is essentially an FPGA
based memory controller, where we could do a lot of tests using this
memory controller. We could configure anything we wanted, and we kept
improving it. We wanted to study
retention issues first, but we discovered the RowHammer problem by building this infrastructure. Actually, this was
the infrastructure where we discovered
the RowHammer problem, you could do many tests in
parallel with different FPGAs. We opened sources infrastructure so if you’re interested you can download it at C plus plus program but now it’s much
more programmer-friendly, and you can do
the studies on the FPGAs. We don’t provide the FPGAs.
That you got to buy. So, with this kind
of instruction, you can actually do a lot of
studies on real DRAM chips. We’ve studied DRAM Retention, I’m not going to talk about, this is really interesting, and this is really
the fundamental scaling issue with DRAM. As you reduce the size
of the circuit, data becomes very difficult
to maintain inside a cell. So, a charge escapes
and charge leaks. How do you figure out how long the charge will stay in the cell, so that you can determine
your refresh rate? We’ll get back to
that if we have time. But while we were actually doing studies in
this infrastructure, we were inspired by
other studies that we were doing in flash memory. Flash memory is very much
prone to read disturbance. We said, “Oh, maybe there is read disturbance in DRAM also. Let’s test it using
this infrastructure.” What we found was actually
curious at that time. We basically found that you can predictably induce memory errors, bit flips, in most DRAM
memory chips at the time. This is called the DRAM
RowHammer problem. It’s essentially a simple
hardware failure mechanism that can create a widespread system security vulnerability. You can do it in
a programmatic way. People wrote things like this as one of
the examples, I like this, I put it over here because
I like the title it says, “Forget software-now hackers
are exploiting physics.” This actually I think explains the problem in a nice way.
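The code people wrote for this is only a few lines. Here is a minimal C sketch of that kind of user-level hammering loop; the function name and addresses are illustrative assumptions, and on its own this only demonstrates the access pattern — a real attack must first find two addresses that map to different rows of the same DRAM bank, which is omitted here:

```c
#include <emmintrin.h>  /* _mm_clflush (x86 SSE2 cache-line flush) */
#include <stdint.h>

/* Hypothetical sketch: read two addresses, then flush them from the
   cache so every iteration goes all the way to DRAM and re-activates
   the rows. On a non-vulnerable or mitigated system this is just a
   tight loop; it returns the number of DRAM reads issued. */
static long hammer(volatile uint8_t *x, volatile uint8_t *y, long iters) {
    long activations = 0;
    for (long i = 0; i < iters; i++) {
        (void)*x;                      /* activate row holding X */
        (void)*y;                      /* activate row holding Y (closes X's row) */
        _mm_clflush((const void *)x);  /* force the next read to miss the cache */
        _mm_clflush((const void *)y);
        activations += 2;
    }
    return activations;
}
```

On a vulnerable module, sustaining this loop for the hundreds of thousands of activations that fit in one refresh window is what flips bits in the adjacent rows.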
So, what is the problem? If you look at DRAM, it consists of a bunch of rows, and if you want to
read data from a row, you need to activate that row, which means that you
need to apply high voltage to that wordline. If you want to read
some other row, you need to deactivate
that row, or, as it’s called in DRAM, precharge it:
apply low voltage. Now if you keep doing
this repeatedly, activate pre-charge,
activate pre-charge, activate pre-charge,
activate pre-charge. Before the cells get refreshed, and if you do it enough times, it turns out in
most modern DRAM chips adjacent rows get bit flips. Some bits flip from one to zero or zero to one
depending on the encoding. Now, that’s not supposed
to happen clearly because you were not
even writing to memory, you’re reading from memory, and you’re affecting the
cells that are around you. Those cells could belong to some other application, or to the operating system. Essentially, there’s
a reliability problem but this could also be
a security vulnerability. So, we call these
the Hammered Row, and these the Victim Rows. It turns out that
most real DRAM chips that you can buy in the market, more than 80% of them at the time we did
these tests, were vulnerable. We could predictably
induce these errors. This is actually
a scaling problem because this didn’t
happen before 2010. The first instance that
we saw was in 2010, and all of the chips
that were manufactured between 2012 and 2013 that we tested, were
actually vulnerable. Why is this a scaling problem? Essentially, cells got
too close to each other. They’re not isolated
electrically enough from each other. I’ll talk about the causes
very briefly later on but one intuitive cause is essentially
electromagnetic coupling. Because one wordline is too
close to the other wordline, whenever you toggle
this wordline and apply high voltage,
the other wordline is not electrically
isolated enough, so you’re toggling it
a little bit; as a result you’re
partially opening that wordline. Which means that the cells
that are vulnerable to this effect are
leaking a little bit, and if you do it enough times, they leak a little
bit enough times. If you do it enough times
before the cells get refreshed, you basically depleted the charge on some of the cells over there. If the cells weren’t too
close to each other, which was the case back in 2008, you
didn’t have this problem. This is a very fundamental
problem in any kind of memory, actually any kind of
memory: when it scales, you get these sorts of
read disturbance issues. If we have time, we’ll talk about flash memory, but we won’t
have time for that. So, what’s more interesting
about this being in DRAM is DRAM is directly exposed
to the programming language; this is one example of a programming language,
assembly language. So, we wrote this code, which essentially executes
at the user level. What it does is it basically avoids cache hits for
these two addresses, avoids row hits for
those two addresses, and it basically ping-pongs
activates to X and Y, to the same bank, and if
the chip is vulnerable, it’ll essentially
get these errors. You can download this code
and write it on your laptop. Actually, you can download
Google’s code which improved our code and you’re more
likely to discover bit flips. At the time we did these studies, this was around 2012, basically, we ran it on real systems and you can see that
as long as you have a memory controller that’s
good at activating fast, that’s able to
access memory fast, you’re able to
induce these errors. There’s nothing special
about Intel and AMD. All of the memory
controllers that are out in the market are capable of doing that in
real processors today. So, it’s a real reliability
and security issue. In fact, we thought it was more of a security issue
than a reliability issue. When we wrote the paper,
the first sentence we used was, “Memory isolation is
a key property of a reliable and secure
computing system, and access to one memory
address should not have unintended side effects on data stored in other addresses.” I still believe this. I think
this is very fundamental. We should keep this invariant. We also said that
you could actually design an attack that could take over an entire system by
exploiting the bit flips. The good folks at Google
Project Zero did exactly that. They published
this beautiful blog post, it’s beautiful system
security engineering, where they said they exploited
the DRAM RowHammer bug (I don’t like the term bug; I think failure mechanism is
a nicer one here) to gain kernel privileges. This is directly
copied and pasted from their blog posts from 2015. They basically test
a selection of laptops and found a subset of them, exhibit the problem,
and they built two working privilege
escalation exploits. One of them is less
interesting to me, it’s actually
Google Native Client. The other one essentially is able to run a user level process, and it’s able to induce
these bit flips. They were able to
induce bit flips in the page table entries of that user level process that point to their own page table. If you’re able to
actually do that, now you can change the contents
of your own page table. For example, you can gain
write-enable access to your own page table, and
once you have that access, you have full access
to the entire memory. That’s essentially what they did. They were able to do this
successfully on, I believe, 50 percent of the machines
that they’ve tested, laptops. This became even more
interesting at that time, it’s called RowHammer
Vulnerability and people started drawing
pictures like this. I like analogies and this is a beautiful analogy that
someone had on Twitter, “It’s like breaking into
an apartment by repeatedly slamming a neighbor’s door until the vibrations open the door
that you were really after.” So, if you want to escape from here, you might want to start banging on these walls over here. There’s a lot of attacks that were
developed on top of this, I’m not going to go over this, these slides are available. You can go over
these, people have developed a lot of attacks over the years even
very recently. I’m going to highlight
a couple of them, this is one of the
attacks from TU Graz, these are actually
the same folks who developed Meltdown
and Spectre later on. They basically show
that you could remotely gain access to
the system of a user visiting a website by inducing
RowHammer bit flips through JavaScript. Very interesting.
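JavaScript has no cache-flush instruction, so attacks like this one replace the flush with cache eviction: between accesses they touch enough other memory that the aggressor addresses fall out of the cache. A hedged C sketch of that idea follows; the linear buffer walk and the names here are simplifying assumptions, since a real attack needs eviction sets matched to the CPU’s actual cache mapping:

```c
#include <stdint.h>

/* Buffer walked between accesses; sized to exceed a typical last-level
   cache (an illustrative value, not a measured one). */
enum { EVICT_SIZE = 8 * 1024 * 1024 };

/* Hammer two aggressor addresses without any flush instruction by
   touching one cache line every 64 bytes of a large buffer, which
   (ideally) evicts x and y so the next reads go to DRAM again.
   Returns the number of aggressor reads issued. */
static long hammer_by_eviction(volatile uint8_t *x, volatile uint8_t *y,
                               uint8_t *evict, long iters) {
    long activations = 0;
    for (long i = 0; i < iters; i++) {
        (void)*x;  /* activate aggressor row X */
        (void)*y;  /* activate aggressor row Y */
        for (long j = 0; j < EVICT_SIZE; j += 64)
            evict[j]++;  /* push x and y out of the cache hierarchy */
        activations += 2;
    }
    return activations;
}
```

The eviction pass is far slower than a clflush, which is why eviction-based hammering needs a fast way to reach DRAM to stay within one refresh window.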
This is another one; this basically showed that
you could do this on an Android system and
an ARM processor. What they did was,
because they knew how
the operating system actually allocated pages, they were able to figure out which pages are
vulnerable to RowHammer through a profiling process. They were able to fool
the operating system into allocating a page table into a page that they knew
was vulnerable to RowHammer, and they would hammer that
and they would gain access deterministically to
many cell phones this way. That’s another beautiful
paper actually, and you can download
their app I think. I don’t know if this
is still functional, if you’d like to be hacked. This is actually more recent. This is May 2018, the same folks at Amsterdam. They basically showed
that you could do this through the GPU, again in an integrated, mobile system. Because a GPU
can access memory much faster, you can actually induce these bit flips much better. You could actually do
it over the network also through the RDMA
by exploiting RDMA. I believe there’s more to come, maybe one solution to RowHammer. This is another attack that
could drive people crazy. I don’t think it’s
a good solution. Let me very quickly go over understanding RowHammer
and then we’ll talk about solutions and then maybe some future
vulnerabilities. So, as I said there
are a bunch of causes; it’s a complex problem
as the circuit becomes smaller. You have many failure
mechanisms that affect this that in combination
lead to RowHammer. I’m not going to go
into this in detail, but manufacturers are very well aware of it and we’re
in touch with them. If you have this infrastructure,
you can do many, many studies, and I’m going to talk about
a couple of these. Basically, what is
the difference between the address of a row that you’re hammering and
the victim rows? We did the study and it turns out most of them are
adjacent rows as expected, but some of them are
not adjacent, because of some internal
address remapping that
DRAM does internally. So, if you want to
hammer really perfectly, you may want to know
this address mapping, or if you want to protect. The access interval
today you can access memory every 55 nanoseconds; that’s the tRC, or row cycle
time, of a single bank. If you actually prohibit this fast access, you can get rid of the errors; clearly this is one solution. You can throttle
the accesses to memory by reducing the access rate, so this is clear,
you can do that. This is not a good
solution I believe because this reduces your
performance clearly. Refresh Interval is another parameter that you can play with. Clearly, if you refresh
the DRAM more often, the probability of
attack reduces. This is you reduce
the refreshes by seven x, it gets rid of every single error that we see in the DRAM, but increasing the refreshes by seven x is probably not a good solution, even though
it solves the problem. This is very interesting because the attack is actually much more possible if your data pattern is conducive to the attacks. So, if your data pattern
is solid like this, you don’t get a lot of errors, but if your data pattern
is this way which induces much more coupling between the different cells that
are adjacent to each other, you get many, many more errors. Okay, so there are
a bunch of other results. I’m not going to go
through this. The red ones are the important ones
for security. Errors are repeatable: if you
can actually flip a bit, you’re going to flip it again and again and again and again. You can actually get many errors per cache-line, which means that simple error-correcting codes are not able to get rid
of all of the errors, you need more sophisticated
error-correcting codes. Cells are actually affected by two aggressor rows
on either side. This is actually what
Google exploited to make the attack much more powerful. They basically did
this double-sided RowHammering: they attacked a single victim row by sandwiching it between two aggressor rows
that are hammered. There’s been a lot more
in RowHammer analysis in this paper and a recent paper
that I’ve written. I’d be happy to talk about
that separately also. But, let’s talk about
solutions a little bit. These are more
traditional solutions, I think, which all
have downsides. Clearly, you can make
better DRAM chips, but that’s going to
be difficult to do. You can refresh frequently,
we’ll get back to that. You can have sophisticated ECC, and you can have access
counters to throttle. But, all of these actually come with downsides, I believe. So, we want to have simple
solutions to the problem and our paper actually looked at all of these
different solutions. Let me tell you about what is employed in existing systems
because in existing systems, you have to employ something
to be able to patch it. This is Apple’s patch
for RowHammer. Basically, they said
that they mitigated the RowHammer issue by increasing the memory refresh rates. This is, I think,
employed by industry. This is the
configurability that we have in our memory
controllers today. We can do it and as
a result we do it. I believe it is
a reasonable solution, which is much simpler than the software-based
solutions that could potentially detect the attacks. Of course, the downside is we actually don’t want to
increase the refresh rates. In real systems, we
want to get rid of refresh as much as possible. If you increase the refresh
rates you’re increasing the performance impact
and also power impact. So, our solution was
more probabilistic; we call it the probabilistic
adjacent row activation. The idea is after
you close the row, you activate one of
the neighbors or both of the neighbors with
very low probability. This gives you a reliability
guarantee that’s better than the reliability guarantee
that you have for hard disks today, so this is pretty strong,
depending, of course, on how you set your p
probability over here. But, the big advantage of this is you don’t refresh
the entire memory, you refresh only in a targeted way and very,
very infrequently. As a result, the overheads are very low, and it’s also stateless: you don’t need to keep track of any state to be able to do that, because you know which
row you’re closing before you refresh its neighbors
probabilistically. So, there are multiple ways
of actually implementing it. The first one is
actually employed in DRAM chips going forward. I’m not sure if this is
a really good idea fully inside the DRAM chip going forward,
because the way it’s employed in existing DRAM chips
without changing the interface is by exploiting the slack
in timing parameters. Whenever you close a row,
there’s enough slack in the timing parameters that
the DRAM manufacturers can sneak in a refresh to the adjacent rows or one
of the adjacent rows. So, we’ve actually shown
that there’s plenty of slack today that you can exploit to be able to do this reliably. But, going forward, we actually want to remove that slack also, so that we can make
DRAM lower latency. So, I don’t believe this
is a really good solution without changing the interface. The second solution is doing
it in the memory controller, having a more intelligent
memory controller that basically knows which rows are physically adjacent
to each other. This information is not known to the memory controller
today because DRAM actually does remapping of rows internally for
various reasons. But, if this information is communicated to
the memory controller, I believe there could be
much better solutions. So, we need a better
DRAM interface and more intelligent memory
controllers to solve these problems in
a nice way I think. So, this was actually
something that I recently saw. Apparently, this is
one of the Thinkpads. In the BIOS, we can have
different RowHammer solutions. You can either double your refresh rate or have this hardware
RowHammer protection, which is kind of
mysterious, but clearly, they’re doing some
probabilistic solution. So, you can actually
change the RowHammer activation probability
in some way. You can decide
your protection level if you will over here, it was fun to see this. Okay, so industry is actually writing
papers about it, too. This is not related to RowHammer, but it talks about the DRAM scaling challenges in general. It focuses on what I said are
the real scaling challenges: the refresh problem and the
variable retention time problem, which we will cover if
we have time still. But, the key point that
I want to make, other than recommending
this paper, which was written by two partners unlikely to ever write a paper together, Samsung and Intel, is that they also
say a good solution for them is actually
co-architecting DRAM and controllers together and having an intelligent controller. This paper actually proposed Error-Correcting Codes
to be inside the DRAM. If you went to the
DRAM manufacturers 10 years ago and said I want error-correcting
codes in your chip, you would be kicked out of
the door as soon as possible, probably because they don’t want to reduce their capacity. But, now actually, DRAM chips going forward will have
error-correcting codes. But, as I said,
error-correcting codes are good at solving random errors. They’re actually a costly solution. We want to really target the solutions to
the problems at hand. So, I think, RowHammer
can be solved in a much easier way than
error-correcting codes. The reason they’re putting
error-correcting codes is because of retention:
they’re not able to determine
the retention times really easily, and as a result error-correcting
codes can correct some of those errors that are happening because of
retention issues. Okay. So, I said Intelligent Memory Controllers
are one solution, and we actually know
how to build this. We’ve actually been building this for flash memory for a long time. I believe DRAM is going to look increasingly more
like flash memory as it scales down. If you look at
the flash memory controller, there is a paper that we
recently wrote based on about eight years of research that we’ve
done in the field. There’s a lot of
error correction mechanisms that go into the
memory controller. The memory controller
really understands the different types of errors, and actually targets the error
correction mechanisms to the different types of errors; it specializes its mechanisms. I’d be happy to talk about
that in more detail certainly. So, basically, a key
takeaway, I think, to solve these issues
going forward is, we want the Intelligent
Memory Controllers. Clearly, we have a challenge and opportunity going forward. How do we design
fundamentally secure, reliable, and safe
computing architectures? Okay. How much time do we have?>>We have the room until noon.>>Until noon.>>So, it’s up to you as to
how much everyone [inaudible].>>Okay. Any questions so far? You said you wanted
this to be interactive, so I can take some questions, and then maybe I can continue.>>You told us about
some hardware mitigations for these kinds of problems. Is it conceivable
that there would be some simple conservative
characterization of software that would prove that
even the old style of hardware isn’t
going to have more problems? Maybe your compiler
would then make an effort to meet
these conditions on the software.>>So, you’re thinking
of basically somehow analyzing the software
and saying, I think it’s certainly possible, you could potentially
analyze these cases. I’m not sure if it’s really worth the effort because usually, this is you’re
probably thinking of this being a reliability problem in a real production environment, not as a security problem.>>Both.>>For security problem,
I guess you could, then you have to analyze all of the code that runs
on your system. You need to be in
a protected environment where you disallow code, or maybe you change the code dynamically if it
does RowHammer. I think it’s certainly
possible, yes. I believe it’s
a higher overhead solution, because I think hardware, this is really the problem that can be fixed
relatively easily in hardware. That’s my belief.>>Okay.>>But, people have
actually proposed performance counter
based mechanisms, not necessarily static
or program analysis mechanisms that
tried to figure out whether a program is
row hammering. But people have looked at
performance counters and tried to figure out, “Oh, is this code doing hammering?” But, there’s performance overhead clearly associated with those.>>Thanks.>>Sure. Yes.>>Great talk. I have
always wondered what was the
hypothesis that led you guys
to discover RowHammer. Right? So, maybe there are lessons there to discover
a great question, I think. Well, basically, I’ll
say the hypothesis was this infrastructure that
we built for flash memory. So, we built this infrastructure
for flash memory, earlier than we did for DRAM. We knew that there
are a lot of errors there; read disturb
is actually a clear problem
with flash memory, and controllers actually
take those into account. We knew that read disturb is actually a problem in other memory
technologies also, SRAM for example, and
we wanted to test whether it could potentially happen
in DRAM as it scales down. So, I think this is the value of the infrastructure, I must say. If we didn’t have this flash
memory infrastructure, maybe we wouldn’t have built
the DRAM infrastructure also. Okay. One more.>>What do you
think about Intel’s hardware mitigation called TRR, targeted row
refresh, or whatever?>>Yeah, I think we can have a longer conversation
about that. I believe a probabilistic
solution is much simpler.>>Okay.>>Just to clarify,
in your opinion, the research community
has proposed simpler and more effective
solutions than what the hardware vendors, whether Intel or the DRAM vendors,
have decided to adopt.>>So, no, not exactly. I think targeted row
refresh changes the interface a little bit without
exposing the DRAM internals. So, I believe if you change the interface a little bit differently, exposing
the DRAM internals, you can have
a much better solution. So, it does
a different trade-off. Basically, they don’t want to. They don’t want to expose DRAM internals to the
Memory Controller. I believe that’s why they went to the targeted row
refresh solution. But, I think if we relax
the interface a little bit, which we have many, many other reasons for doing
so, for example, if you want to enable
in-memory computation, if you want to enable
lower latency, it’s good to get
rid of some of the, change the interface
a little bit. Then, I think you can
go into other source. The other answer
to your question is that DRAM manufacturers are actually internally adopting something similar to what we had proposed, except they’re doing it within the boundaries of the current interface. As a result, I’m not sure if the solution is going to be very long-lasting.>>I see. I also wanted
to add that there is one research publication out there claiming that they mounted a RowHammer attack on a DIMM that implements TRR according to the spec. You don’t know whether the DIMM does TRR or not, unless you work for the memory manufacturer, but according to the spec it implements TRR, and they were able to RowHammer it still.>>Just one brief follow-up on that. I was wondering if you have any comment on the resilience against attack of pTRR, pseudo targeted row refresh, versus TRR, targeted row refresh.>>Okay, what’s
the exact difference?>>Intel has marketed both as specifications that manufacturers can comply with, but not all of those details are open.>>Exactly. I think that’s
part of the problem. If the details are not open, it’s very hard to reason about the efficacy of the solutions.>>I just want to
know what you heard.>>Yeah. That’s all I can say.>>So, you briefly
mentioned SRAM. Have there been any observations of RowHammer-like vulnerabilities in SRAM?>>As far as I know, not in real systems, but a lot of people have shown that when they build circuits at very small feature sizes, SRAM is also vulnerable to read disturb errors. But there are protection mechanisms, I believe, in existing SRAMs in processors, because they’re easy to do. Right? You don’t need to change any interfaces for those protection mechanisms.>>So, you mentioned bringing down the refresh time, but for that don’t
you think you need to know the retention times of each row, which may be widely variable across rows?>>You mean as a solution to [inaudible]?>>Yeah.>>So, they’re basically increasing the refresh frequency. Basically, you’re refreshing more often. That’s not a problem.>>No, but how often? Because different rows will have variable retention times due to manufacturing variability.>>That’s true. But their goal is basically to refresh more frequently such that you cannot do as many activations within the refresh interval.>>Okay.>>But it doesn’t matter what the retention time of the rows is; as long as you’re refreshing more frequently, you don’t have any correctness issues in terms of retention loss. But you prevent
RowHammer attacks. But your question, I think, is how much you should increase the refresh rate. According to our results, if your only solution is refresh, and you want to get rid of every single error that we’ve seen in our DIMMs, you want to increase the refresh rate by 7x. Clearly they’re not doing 7x; they’re doing 2x, in my opinion. The picture that I showed you from the ThinkPad BIOS was 2x; that was the only option. Is 2x enough to get rid of all of the errors?
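The arithmetic behind that question can be sketched with a few illustrative DDR3-era numbers. The timing values and the hammer threshold below are assumptions chosen for illustration, not figures from the talk:

```python
# Rough sketch: how many activations can an attacker issue to one row within
# a refresh window, and how much faster must refresh run to stay under a
# RowHammer threshold? All constants are illustrative assumptions.
TREFW_MS = 64.0             # DDR3-era refresh window: each row refreshed every 64 ms
TRC_NS = 55.0               # row cycle time: minimum gap between activations of a bank
HAMMER_THRESHOLD = 139_000  # order-of-magnitude activation count reported to flip bits

def activations_per_window(refresh_multiplier: float) -> int:
    """Activations one aggressor row can receive before the victim's next refresh."""
    window_ns = (TREFW_MS / refresh_multiplier) * 1e6
    return int(window_ns / TRC_NS)

for mult in (1, 2, 7):
    print(f"{mult}x refresh: ~{activations_per_window(mult):,} activations per window")

# Refresh speed-up needed to keep activations below the threshold:
needed = activations_per_window(1) / HAMMER_THRESHOLD
print(f"needed speed-up: ~{needed:.1f}x")
```

With these assumed numbers the required speed-up comes out around 8x, the same ballpark as the ~7x the speaker cites; the exact factor depends on a module’s timing parameters and its worst-case hammer count.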
That’s a good question.>>Thank you.>>Sorry, I’ve got to ask this question, otherwise I couldn’t follow what you were talking about. So, a while back, when I was still in the memory area, my understanding was that RowHammer is caused by a broken state in the cell, the band gap, due to the metal layer. If you use a poly-silicon gate, the problem would have gone away, right? So, are you saying that even today Samsung is still using a metal gate, and that’s why you say we have this problem. Is that the case?>>So, I cannot
speak for Samsung, but I think the cause that you mentioned is certainly still valid.>>Okay.>>But the cause that I mentioned is really a combination of those reasons.>>Okay.>>There are multiple reasons, as far as we know.>>So your experiment was done on the latest memory, the latest DRAM, and you still see the problem?>>So the experiments
that I reported are from 2012 to 2014, when we discovered the problem. The paper was published in 2014. The latest DRAM, we’re looking into it. There are reports that the latest DRAM also has these errors that Stefan mentioned. But we didn’t do those studies ourselves. I agree with you: if you can solve the problem by changing the gate, that would be ideal. I’m not sure it’s going to be very easy. Yeah, I agree. ECC is not a good solution to this problem. But I think the probabilistic solution is maybe cheaper than the gate solutions, depending on the constraints. Okay, so let me use the last
few minutes to conclude. I think we had a good discussion. I’m not going to go over these future challenges, unfortunately; I think there are a bunch, but you can take a look at the slides. Clearly refresh is going to be a challenge, and these slides actually have a lot of detail on refresh if you’re interested in that. I believe there are actually retention time issues that may be slipping into the field, but they may be harder to exploit than RowHammer, at the moment at least. So, how do we keep memory secure? I think clearly we have issues with DRAM, and we have issues with Flash memory, though Flash memory is a little bit farther from the system today. But emerging memory
technologies actually all have their
reliability problems: read disturb, write disturb, many, many different reliability problems. I think we need some principled approaches. We need to somehow predict and prevent such safety issues, and I go back to Galloping Gertie, which is the Tacoma Narrows Bridge. People have developed principled designs for this. This particular bridge is actually taught in civil engineering and physics classes, if you will. So, how do we do it for memory? This is my proposal. I think we want to first understand. It’s very difficult to really model these effects. We’ve done a lot of circuit simulations; it’s very, very difficult to model something like RowHammer in circuits. You really need to somehow predict based on
other technologies, based on past experience. So we want solid methodologies for failure modeling
and discovery. I believe this has to
come from real devices, both at the small scale
and large scale. We want to build models that
can predict the future. We want to build models
that can predict from different devices
potentially. How do we do that? I think that’s an open research question. I mean, we do want to develop metrics for secure architectures, and I say secure over here, but I think RowHammer demonstrated that reliability, safety, and security are really very much related to each other in this particular context. On top of this, I believe it’s architecting. We need to have
principled co-architecting of the system and memory. We need to have
a good partitioning of duties across the stack. So, I believe ECC is not a good solution because it’s not a good partitioning of the duties for the given problem of RowHammer. So, for each problem we need to find the good partitioning, and I believe Flash memory is a very good example where people actually found the right partitioning. So, they solved some of the problems with ECC, but they solved a lot of the problems with voltage scaling as well. I believe good architecting
requires figuring out, or potentially preventing, these unforeseen consequences. So how do you prevent unforeseen consequences? I believe if we had better programmability built into our memory controller, we wouldn’t be refreshing our entire memory at 2x or 4x. So, if we had better programmability or better patchability in the field, we would be doing better today. I think this design needs to change. Basically, today we’re not really thinking about security in our hardware designs. We don’t really design with security in mind. I believe we need to change that also. I didn’t talk about it,
but one of the ways of having a design that can, over time, fix some of these reliability issues is having a design that can do online testing, which is essentially what Flash memory is doing today. If we have a mechanism to do online testing in a low-overhead manner in DRAM, I think that would go a long way, because that can also enable patchability, potentially. So, that’s what we’ve been doing to understand: we built these infrastructures, both for Flash memory and DRAM, and we’ve been doing large-scale and small-scale studies. I believe there are actually vulnerabilities in Flash memory also. We’ve been exploring some of these that are similar. Read disturb is one example over there, but it is much, much harder to exploit, because Flash memory is not directly exposed to the programming model today. But there’s a lot to do over there. I’m not going to cover this. I think there are two other
solution directions that I will briefly talk about. One is new technology. You can say, “Oh, why don’t we get rid of DRAM and come up with some other technology that doesn’t have these problems?” Good luck. I think it’s definitely good to explore these technologies, but all of these technologies, as they scale to small sizes, will have reliability problems. Actually, some of them have endurance problems also. Maybe the second solution
is even more interesting. You can embrace unreliability, but you’ve got to do it very carefully. Basically, you can design memories with different reliability and store data intelligently across them: your secure data may be in a very, very reliable memory that’s much more expensive, and your data that doesn’t require a lot of security or reliability may be in the mass of memory that’s not so reliable but very low cost. As long as you do that partitioning right, I think that’s a really good opportunity. But how you do that partitioning right is a difficult question. I believe both of these solutions over here require co-design across the hierarchy. So, it may not be that easy to adopt both of these solutions. But I think there’s a lot more to do in this heterogeneous reliability memory area; that may be a good solution. So, let me conclude.
So, let me conclude. I believe memory
reliabilities reducing, there’s a lot of data that is in the field and
that I’ve shown you. Reliability issues open up security vulnerabilities as well. These are very hard
to defend against, or you come up with very suboptimal solutions like increasing the refresh
rates across the board. RowHammer is an example. I believe there will
be more examples. I believe the RowHammer implicational system security
research are tremendous and exciting and there
continues to be a lot of papers that are being written
on RowHammer these days. So, there’s good news, we have
a lot more to do, clearly. I believe we need to come up with principled methodologies and designs to be able to solve problems like this, like RowHammer and whatever comes next after RowHammer. I think this is one principle that we will need to adopt going forward somehow: we need to change the processor-memory interface somehow, and have more intelligence in the memory controller. Okay. Thank you.
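As a footnote to the Q&A above: the probabilistic solution the speaker favors (a PARA-style mitigation from the 2014 RowHammer paper) can be sketched in a few lines. This is a toy model under assumed parameters; the probability `p` and the simple row-neighbor model are illustrative, not the hardware implementation:

```python
import random

# Toy model of a PARA-style mitigation: on every row activation, the memory
# controller also refreshes one adjacent row with small probability p.
# No per-row counters are needed, which is why it is simple in hardware.
P_REFRESH = 0.001  # illustrative; the real p trades refresh overhead vs. failure odds

def activate(row, disturb, rng, p=P_REFRESH):
    """Model one activation of `row`; neighbors accumulate disturbance until refreshed."""
    for neighbor in (row - 1, row + 1):
        disturb[neighbor] = disturb.get(neighbor, 0) + 1
    if rng.random() < p:
        victim = row + rng.choice((-1, 1))
        disturb[victim] = 0  # refreshing a neighbor resets its disturbance count

# Hammer one aggressor row a million times and track the worst disturbance
# any neighbor ever accumulates between two of its refreshes.
rng = random.Random(42)
disturb, worst = {}, 0
for _ in range(1_000_000):
    activate(7, disturb, rng)
    worst = max(worst, max(disturb.values()))
print(f"worst neighbor disturbance between refreshes: {worst:,}")
```

Even under a million activations, the worst gap between refreshes of a victim row stays far below typical hammer counts, at a cost of only ~0.1% extra activations; the trade-off is that the protection is probabilistic rather than deterministic.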

CPU & DRAM Bugs: Attacks & Defenses