Mar 17 2025

When people first hear about unsafe in Rust, they often have questions. A very normal thing to ask is, “wait a minute, doesn’t this defeat the purpose?” And while it’s a perfectly reasonable question, the answer is straightforward but also has a lot of nuance. So let’s talk about it.

(The straightforward answer is “no”, by the way.)

Memory safe languages and their implementations

The first way to think about this sort of thing is to remember that programming languages and their implementations are two different things. This is sometimes difficult to remember, since many programming languages only have one implementation, but this general principle applies even in those cases. That is, there are the desired semantics of the language you are implementing, and then the codebase that makes programs in that language do those things.

Let’s examine a “real” programming language to talk about this: Brainfuck. Brainfuck programs are made up of eight different characters, each one performing one operation. Brainfuck programs can only do those eight things, and nothing more.

However, this simplicity leads to two interesting observations about the differences between languages in the abstract and their implementations.

Unsafety in implementation

The first is that properties of the language itself are distinct from properties of an implementation of the language. The semantics of the Brainfuck language don’t provide an opportunity to call some sort of re-usable function abstraction, for example. Yet, when we write a Brainfuck interpreter or compiler, we can use a programming language that supports functions. That doesn’t change the fact that Brainfuck programs themselves can’t define or call functions.

And in fact, we often want to implement a language in another language that has more abilities than the language we’re implementing. For example, while it’s possible to implement Brainfuck in Brainfuck, it’s much easier to do so in a language that lets you define and call functions. Another classic example is interacting with hardware. Brainfuck has the . instruction that allows you to produce output. But in order to actually produce output to, say, a terminal, you interact with your platform’s API to do so. And inside of that API, eventually you’ll hit a layer where there are no abstractions left: you need to talk to hardware directly. Doing so cannot be statically guaranteed to be memory safe, because the hardware/software interface often boils down to “put some information at this arbitrary memory location and the hardware will take it from there.” Our Brainfuck program is operating in an environment just like any other; at the end of the day it has to coordinate with the underlying operating system or hardware at some point. But just because our implementation must do so doesn’t mean that our Brainfuck programs do so directly. The semantics of programs written in Brainfuck aren’t changed by the fact that the implementation can (and must) do things that are outside of those semantics.
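To make this concrete, here is a minimal sketch of a Brainfuck interpreter written in Rust. This is my own toy code, not any particular real implementation, and it leaves input handling out, but it illustrates the point: the interpreter freely uses functions, Vec, and pattern matching, none of which exist in Brainfuck itself.

// A toy Brainfuck interpreter (illustration only; the `,` input instruction is left out).
fn run(program: &str) {
    let code: Vec<char> = program.chars().collect();
    let mut tape = vec![0u8; 30_000]; // the classic 30,000-cell tape
    let mut ptr = 0usize; // data pointer
    let mut pc = 0usize; // program counter

    while pc < code.len() {
        match code[pc] {
            '>' => ptr += 1,
            '<' => ptr -= 1,
            '+' => tape[ptr] = tape[ptr].wrapping_add(1),
            '-' => tape[ptr] = tape[ptr].wrapping_sub(1),
            '.' => print!("{}", tape[ptr] as char),
            '[' if tape[ptr] == 0 => {
                // jump forward past the matching ']'
                let mut depth = 1;
                while depth > 0 {
                    pc += 1;
                    match code[pc] {
                        '[' => depth += 1,
                        ']' => depth -= 1,
                        _ => {}
                    }
                }
            }
            ']' if tape[ptr] != 0 => {
                // jump back to just after the matching '['
                let mut depth = 1;
                while depth > 0 {
                    pc -= 1;
                    match code[pc] {
                        ']' => depth += 1,
                        '[' => depth -= 1,
                        _ => {}
                    }
                }
            }
            _ => {} // brackets that don't jump, plus any comment characters, fall through
        }
        pc += 1;
    }
}

fn main() {
    run("++++++++[>+++++++++<-]>.+."); // prints "HI"
}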

Let’s take Ruby as a more realistic example than Brainfuck. Ruby does not let us modify arbitrary memory addresses from within Ruby itself. This means that pure Ruby programs should not be able to produce a segmentation fault. But in the real world, Ruby programs can segfault. This is possible if there’s a bug in the Ruby interpreter’s code. Sure, it’s our Ruby code that’s making the call to a function that results in a segfault, but the real fault lies with the code outside of Ruby’s purview.

That this can happen doesn’t mean that Ruby’s guarantees around memory manipulation are somehow useless, or that Ruby programs suffer segmentation faults at the same rate as programs in a language that lets you manipulate arbitrary memory as a matter of course. But it does mean that we don’t need to look at our entire program to figure out where the problem comes from: it instead comes from our non-Ruby code. Ruby’s guarantees have helped us eliminate a lot of suspects when figuring out whodunit.

Extending languages and unsafety

The second property is that certain implementations may extend the language in arbitrary ways. For example, I could write a Brainfuck interpreter and say that I support a ninth instruction, @, that terminates the program when invoked. This would be a non-portable extension, and Brainfuck programs written in my variant wouldn’t work in other interpreters, but sometimes, this technique is useful.

Many languages find that these sorts of extensions are useful, and so offer a feature called a “foreign function interface” that allows your program to invoke code written in a different language. This provides the ability to do things that are outside of the domain of the language itself, which can be very useful. Brainfuck does not offer an FFI, but if it did, you could imagine that @ could be implemented in terms of it, making programs that use @ portable again, as long as the extended functionality was included somehow, often as a library of some kind.

Just like our implementation has the ability to do things outside of our language’s semantics, FFI and similar extension mechanisms also give us the ability to do arbitrary things. I can write an extension for Ruby that writes to arbitrary memory. And I can cause a segfault. But we’re in the same place that we were with our implementation issues: we know that if we get a segfault, the blame lies not with our Ruby code, but instead with either the implementation or our FFI.
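Rust, the subject of this post, has exactly this kind of mechanism. As a small illustration of what FFI looks like there (my own example, calling abs from the C standard library):

extern "C" {
    fn abs(input: i32) -> i32;
}

fn main() {
    // Calling across the language boundary is `unsafe`: the Rust compiler
    // can't check what the C code on the other side actually does.
    let three = unsafe { abs(-3) };
    println!("{three}");
}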

So how is this memory safe?

It may seem contradictory that we can call a language “memory safe” if real-world programs have the ability to cause memory problems. But the thing is, it isn’t really programs in that language that caused the issue: it was FFI, which is in a different language entirely, or it was implementation issues, and implementations must do memory unsafe things, thanks to a need to interact with the OS or hardware. And so the definition of “memory safe language” is commonly understood to refer to languages and their implementations in the absence of either implementation bugs or FFI bugs. In practice, these bugs occur at such a low rate compared to languages that are clearly not memory safe that this practical definition serves a good purpose, even if you may feel a bit uncomfortable with these “exceptions.”

But there’s actually a deeper reason why these exceptions are acceptable, and that’s due to how we understand properties of programs and programming languages in the first place. That is, this isn’t just a practical “well these exceptions seem fine” sort of thing, they’re actually okay on a deeper level.

Programs, properties, and proofs

So how do we know anything about how programs and programming languages work at all?

Computer science is no more about computers than astronomy is about telescopes.

  • Edsger Wybe Dijkstra

A fun thing about computer science is that it’s closely related to other disciplines, including math! And so there’s a long history of using math techniques to understand computers and their programs.

One technique of building knowledge is the idea of proofs. If you get a degree in computer science, you’ll engage with proofs quite a lot. I even took a class on logic in the philosophy department as part of my degree.

I don’t intend to give you a full introduction to the idea of proofs in this blog post, but there are some high-level concepts that are useful to make sure we’re on the same page about before we go forward.

Aristotelian syllogisms

Here is a very classic example of a form of reasoning called a “syllogism”, given by Aristotle in 350 BCE:

  1. All men are mortal.
  2. Socrates is a man.
  3. Therefore, Socrates is mortal.

These first two lines are called “propositions,” and the third is a conclusion. We can base our conclusion on a logical relationship between the information given to us by the propositions.

But how do we know propositions are true? Are all men mortal? What’s important here is that we, for the purposes of our proof, assume that propositions are true. And we do this because, on some level, we cannot know everything. And so to begin, we have to start somewhere, and make some assumptions about what we know. It’s true that later, we may discover a fact that disproves our proposition, and now our proof no longer works. But that’s just the way that the world works. It doesn’t prevent these sorts of proofs from being useful to help us gain knowledge about the world and how it works, as best as we can tell at the current time.

So on some level, this is also why Ruby is a memory safe language even though a C extension can segfault: there are always some things that we have to assume are true. Memory safe languages are ones where the amount of code we have to assume is memory safe, rather than having that guaranteed by the semantics, is small, and preferably indicated in the code itself. Put another way, the amount of code we need to trust to be memory safe is large in a memory unsafe language, and small, but not zero, in a memory safe one.

Logic and programs

As time went on, so did our understanding of logic. And, of course, we even have competing logics! And then this gets fun, because terms can mean something slightly different. For example, in more recent logics, we’d call something like “All men are mortal” an axiom, rather than a proposition. Same idea: it’s something that we accept without proof.

As computers appeared, people sought to apply the rules for mathematical logic onto them. We even call circuits that carry out classic logical operations “logic gates.” Number theory and logic were foundational to making computers work. And so, once high level languages appeared on the scene, there was interest in applying these mathematical tools to understanding programs as well. This discipline is called “formal verification,” and the general idea is to describe various properties we wish a system to have, and then use formal methods from mathematics to demonstrate that this is true.

This area of study is very deep, and I don’t plan to cover the vast majority of it here. However, I do want to pursue one particular thread in this area.

Hoare Logic

“Hoare Logic” is, well:

In this paper an attempt is made to explore the logical foundations of computer programming by use of techniques which were first applied in the study of geometry and have later been extended to other branches of mathematics. This involves the elucidation of sets of axioms and rules of inference which can be used in proofs of the properties of computer programs. Examples are given of such axioms and rules, and a formal proof of a simple theorem is displayed. Finally, it is argued that important advantage, both theoretical and practical, may follow from a pursuance of these topics.

Incidentally, C. A. R. Hoare, the author of this paper, and Graydon Hoare, the creator of Rust, are unrelated.

How does Hoare logic work? Yet again, not going to cover all of it, but the general idea is this: In order to figure out if a program does what it is supposed to do, we need to be able to reason about the state of the program after execution happens. And so we need to be able to describe the state before, and its relationship to the state after. And so you get this notation:

P { Q } R

P is a precondition, Q is a program, and R is a postcondition: it describes the state of things after Q has executed, assuming P held beforehand.
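A toy example of my own, using the same notation:

x = 1 { x := x + 1 } x = 2

That is: if x is 1 before the statement runs, then after it finishes, x is 2.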

But most programs have more than one statement. So how could we model this? Hoare gives us the Rule of Composition:

If P { Q1 } R1 and R1 { Q2 } R then P { Q1; Q2 } R.

This allows us to build up a program by proving each statement in turn.
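Continuing my toy example from above: since x = 1 { x := x + 1 } x = 2 and x = 2 { x := x * 2 } x = 4, the Rule of Composition lets us conclude x = 1 { x := x + 1; x := x * 2 } x = 4.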

Hoare logic is very neat, and I’ve only scratched the surface here. People did a lot of work to extend Hoare logic to include more and more aspects of programs. But then, something else happened.

Separation Logic

In 2002, Separation Logic: A Logic for Shared Mutable Data Structures was published.

In joint work with Peter O’Hearn and others, based on early ideas of Burstall, we have developed an extension of Hoare logic that permits reasoning about low-level imperative programs that use shared mutable data structure.

Hmm, shared mutable data structures? Where have I heard that before…

Let’s see what they have to say:

The problem faced by these approaches is that the correctness of a program that mutates data structures usually depends upon complex restrictions on the sharing in these structures.

For sure. Well, what are we to do about this?

The key to avoiding this difficulty is to introduce a novel logical operation P * Q, called separating conjunction (or sometimes independent or spatial conjunction), that asserts that P and Q hold for disjoint portions of the addressable storage.

What disjoint portions of addressable storage might we care about?

Our intent is to capture the low-level character of machine language. One can think of the store as describing the contents of registers, and the heap as describing the contents of an addressable memory.

Pretty useful!
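To get a feel for the separating conjunction, here’s a toy example of my own rather than the paper’s: x ↦ 1 * y ↦ 2 asserts that x and y are two distinct addresses, one holding 1 and the other holding 2. Ordinary conjunction, x ↦ 1 ∧ x ↦ 1, is perfectly satisfiable, but x ↦ 1 * x ↦ 1 never is, because the two halves would have to describe disjoint pieces of the heap.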

Before we talk a bit about how separation logic works, consider this paragraph on why it’s named as such:

Since these logics are based on the idea that the structure of an assertion can describe the separation of storage into disjoint components, we have come to use the term separation logics, both for the extension of predicate calculus with the separation operators and the resulting extension of Hoare logic. A more precise name might be storage separation logics, since it is becoming apparent that the underlying idea can be generalized to describe the separation of other kinds of resources.

The plot thickens.

Anyway, in Separation Logic, we use slightly different notation than Hoare Logic:

{ P } C { Q }

This says that we start with the precondition P, and if the program C executes, it will not have undefined behavior, and if it terminates, Q will hold.

Furthermore, there is the “frame rule”, which I am going to butcher the notation for because I haven’t bothered to install something to render math correctly just for this post:

If { p } c { q }

then { p * r } c { q * r }

where no free variable in r is modified by c.

Why is it interesting to add something to both sides of an equation, in a sense? Well, what this gives us is the ability to add predicates about parts of the program that c doesn’t modify or mutate. You might think of &mut T or even just ownership in general: we can reason about just these individual parts of a program, separately from the rest of it. In other words, we have some foundational ideas for ownership and even bits of borrowing, and while this original paper doesn’t involve concurrency, eventually Concurrent Separation Logic would become a thing as well.
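Here’s a small Rust illustration of that flavor of reasoning. This is my own toy example, not a formal encoding of the frame rule:

fn bump(x: &mut i32) {
    *x += 1;
}

fn main() {
    let mut a = 1;
    let mut b = 2;

    // `bump(&mut a)` can only touch `a`, so everything we know about `b`
    // still holds afterwards, and vice versa. Each call can be reasoned
    // about in isolation.
    bump(&mut a);
    bump(&mut b);

    assert_eq!((a, b), (2, 3));
}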

I think the paper explains why the frame rule matters better than I can:

Every valid specification {p} c {q} is “tight” in the sense that every cell in its footprint must either be allocated by c or asserted to be active by p; “locality” is the opposite property that everything asserted to be active belongs to the footprint. The role of the frame rule is to infer from a local specification of a command the more global specification appropriate to the larger footprint of an enclosing command.

What this gives us is something called “local reasoning,” and local reasoning is awesome. Before I talk about that, I want to leave you with one other very interesting paragraph:

Since our logic permits programs to use unrestricted address arithmetic, there is little hope of constructing any general-purpose garbage collector. On the other hand, the situation for the older logic, in which addresses are disjoint from integers, is more hopeful. However, it is clear that this logic permits one to make assertions, such as “The heap contains two elements” that might be falsified by the execution of a garbage collector, even though, in any realistic sense, such an execution is unobservable.

Anyway. Let’s talk about local vs global analysis.

Global vs local analysis

Proving things about programs isn’t easy. But one thing that can make it even harder is that, for many properties of many programs you’d want to analyze, you need to do a global analysis. As an example, let’s use Ruby. Ruby is an incredibly dynamic programming language, which makes it fairly resistant to static analysis.

Global analysis

Here is a Ruby program. Do you know if this program executes successfully?

class Foo
  def self.bar
    "baz"
  end
end

p Foo.bar

Yeah, it prints baz. But what about this Ruby program?

class Foo
  def self.bar
    "baz"
  end
end

require "foo_ext.rb"

p Foo.bar

We can’t know. foo_ext.rb may contain no relevant code, but it also might include something like this:

Foo.class_eval { undef :bar }

In which case, when we try and call Foo.bar, it no longer exists:

<anonymous>': eval:13:in `block in <main>': undefined method `bar' for class `Foo' (NameError)
eval:13:in `class_eval'
eval:13:in `<main>'
-e:in `eval'

Ouch. So in this case, we’re going to need the whole code of our program in order to be able to figure out what’s going on here.

Incidentally, Sorbet is a very cool project to add type checking to Ruby. They do require access to the entire Ruby source code in order to do their analysis. But they also made some decisions to help make it more tractable; if you try Sorbet out on the web, the type checker is fast. What happens when you try the above code with Sorbet?

editor.rb:5: Unsupported method: undef https://srb.help/3008
     5 |Foo.class_eval { undef :bar }
                         ^^^^^^^^^^

This is a very fair tradeoff! It’s very common with various forms of analysis to choose certain restrictions in order to make what they want to do tractable. Ruby has so much evil in it, just not supporting some of the more obscure things is completely fair, in my opinion.

Local analysis

The opposite of global analysis is, well, local analysis. Let’s consider a Rust equivalent of our Ruby:

struct Foo;

impl Foo {
    fn bar() -> String {
        String::from("baz")
    }
}

fn main() {
    dbg!(Foo::bar());
}

Can we know if this program works? Sure, everything is here. Now, if we tried the same trick as the Ruby code, we know that it would work, because Rust doesn’t have the ability to remove the definition of bar like Ruby does. So let’s try something else:

struct Foo;

impl Foo {
    fn bar() -> String {
        // body hidden
    }
}

fn main() {
    dbg!(Foo::bar());
}

Can we know if main is properly typed here? We can, even though we know nothing about the body. That’s because dbg! takes any value that implements the Debug trait, and we know that Foo::bar() returns String, which implements Debug. Typechecking main is local to main’s implementation; we don’t need to look into the bodies of any function it calls to determine if it’s well-typed. We only need to know their signatures. If we didn’t require bar to have a type signature, we’d have to peek into its body to figure out what it returns, and so on for any function that bar calls in its body. Instead of this arbitrarily deep process, we can just look at the signature and be done with it.

Composability of proofs

So why is local analysis so helpful? The first reason is either speed or scalability, depending on how you want to look at it. If you are running global analysis checks, they will get more expensive the larger your codebase gets, since they require checking the entire thing. Local analysis, by contrast, only requires a local context: it doesn’t get any more expensive when you add more code to your project, only when you change that local context. So when you’re trying to scale checks up to larger projects, local analysis is crucial.

But I also like to personally think about it as a sort of abstraction. That is, global analysis is a leaky abstraction: changes in one part of the codebase can cascade into other parts. Remember this line about the frame rule?

The role of the frame rule is to infer from a local specification of a command the more global specification appropriate to the larger footprint of an enclosing command.

If we have local reasoning, we can be sure that changes locally don’t break out of the boundaries of those changes. As long as the types of our function don’t change, we can mess with the body as much as we want, and we know that the rest of the analysis of the program is still intact. This is really useful, in the same way that abstractions are useful when building programs more generally.
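For instance, going back to my earlier Foo (a partial snippet, in the same spirit as the ones above): we could swap bar’s body out entirely, and as long as the signature stays the same, main and every other caller type-check exactly as they did before.

impl Foo {
    fn bar() -> String {
        // a completely different body, but the same signature, so nothing
        // outside this function needs to be re-checked
        "baz".repeat(2)
    }
}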

RustBelt

Okay, we have taken a real deep dive here. What’s this all have to do with unsafe Rust?

Well, seven years ago, RustBelt: Securing the Foundations of the Rust Programming Language by Ralf Jung, Jacques-Henri Jourdan, Robbert Krebbers, and Derek Dreyer, was published:

In this paper, we give the first formal (and machine-checked) safety proof for a language representing a realistic subset of Rust.

But it’s more exciting than that:

Our proof is extensible in the sense that, for each new Rust library that uses unsafe features, we can say what verification condition it must satisfy in order for it to be deemed a safe extension to the language.

Very good!

The paper begins by talking about why verifying Rust is such a challenge:

Consequently, to overcome this restriction, the implementations of Rust’s standard libraries make widespread use of unsafe operations, such as “raw pointer” manipulations for which aliasing is not tracked. The developers of these libraries claim that their uses of unsafe code have been properly “encapsulated”, meaning that if programmers make use of the APIs exported by these libraries but otherwise avoid the use of unsafe operations themselves, then their programs should never exhibit any unsafe/undefined behaviors. In effect, these libraries extend the expressive power of Rust’s type system by loosening its ownership discipline on aliased mutable state in a modular, controlled fashion

This is basically the question asked at the start of this post: can you honestly say that a function is safe if it contains unsafe code as part of its implementation?

They go on to describe the first challenge for the project: choosing the correct logic to model Rust in (some information cut for clarity):

Iris is a language-generic framework for higher-order concurrent separation logic which in the past year has been equipped with tactical support for conducting machine-checked proofs of programs in Coq. By virtue of being a separation logic, Iris comes with built-in support for reasoning modularly about ownership. Moreover, the main selling point of Iris is its support for deriving custom program logics for different domains using only a small set of primitive mechanisms. In the case of RustBelt, we used Iris to derive a novel lifetime logic, whose primary feature is a notion of borrow propositions that mirrors the “borrowing” mechanism for tracking aliasing in Rust.

Separation logic!

So how does this work? There are three parts to the proof. I’m going to summarize them because I’m already quoting extensively; please go read the paper if you want the exact details, as it is well written.

  1. Verify that the typing rules are sound when interpreted semantically.
  2. Verify that if a program is well typed semantically, it does not exhibit unsafe behavior.
  3. For any library that uses unsafe, check that it satisfies the semantics that its interface requires.

If all three of these things are true, then the program is safe to execute. This is also exciting because, thanks to 3, when someone writes new unsafe code, they can get RustBelt to let them know which properties they need to satisfy to make sure that their unsafe code doesn’t cause problems.

Weaknesses

Of course, there is more to do following up on RustBelt:

We do not model (1) more relaxed forms of atomic accesses, which Rust uses for efficiency in libraries like Arc; (2) Rust’s trait objects (comparable to interfaces in Java), which can pose safety issues due to their interactions with lifetimes; or (3) stack unwinding when a panic occurs, which causes issues similar to exception safety in C++.

However, the results were good enough to find some soundness bugs in the standard library, and if I remember correctly, widen one of the APIs in a sound manner as well. Another interesting thing about RustBelt is that it was published before Rust gained non-lexical lifetimes; however, it modeled them, since folks knew they were coming down the pipeline.

The theoretical vs the empirical

So, we have some amount of actual proof that unsafe code in Rust doesn’t undermine Rust’s guarantees: what it does is allow us to extend Rust’s semantics, just like FFI would allow us to extend a program written in a language that supports it. As long as our unsafe code is semantically correct, then we’re all good.

… but what if it’s not semantically correct?

When things go wrong

At some point maybe I’ll write a post about the specifics here, but this post is already incredibly long. I’ll cut to the chase: you can get undefined behavior, which means anything can happen. You don’t have a real Rust program. It’s bad.

But at least it’s scoped to some degree. However, many people get this scope a bit wrong. They’ll say something like:

You only need to verify the unsafe blocks.

This is true, but also a bit misleading. Before we get into the unsafe stuff, I want you to consider an example. Is this Rust code okay?

struct Foo {
    bar: usize,
}

impl Foo {
    fn set_bar(&mut self, bar: usize) {
        self.bar = bar;
    }
}

No unsafe shenanigans here. This code is perfectly safe, if a bit useless.

Let’s talk about unsafe. The canonical example of unsafe code being affected outside of unsafe itself is the implementation of Vec<T>. Vecs look something like this (the real code is different for reasons that don’t really matter in this context):

struct Vec<T> {
    ptr: *mut T,
    len: usize,
    cap: usize,
}

The pointer is to a bunch of Ts in a row, the length is the current number of Ts that are valid, and the capacity is the total number of Ts that the allocation has room for. The length and the capacity are tracked separately so that memory allocation can be amortized; the capacity is always greater than or equal to the length.

That property is very important! If the length were greater than the capacity, then when we tried to index into the Vec, we could end up accessing random memory outside of our allocation.
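To see why, here’s a sketch of an indexing method that relies on that invariant. This is simplified, and not the real standard library code:

impl<T> Vec<T> {
    fn get(&self, index: usize) -> Option<&T> {
        if index < self.len {
            // SAFETY: this relies on the invariant that the first `len`
            // elements are initialized and that `len <= cap`, so this
            // pointer arithmetic stays inside our allocation.
            unsafe { Some(&*self.ptr.add(index)) }
        } else {
            None
        }
    }
}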

So now, this function, which is the same as Foo::set_bar, is no longer okay:

impl<T> Vec<T> {
    fn set_len(&mut self, len: usize) {
        self.len = len;
    }
}

This is because the unsafe code inside of other methods of Vec<T> needs to be able to rely on the fact that len <= capacity. And so you’ll find that Vec<T>::set_len in Rust is marked as unsafe, even though it doesn’t contain unsafe code.
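In sketch form, that looks something like this (again, simplified relative to the real code):

impl<T> Vec<T> {
    /// Callers must guarantee that `new_len <= self.cap` and that the
    /// first `new_len` elements are actually initialized.
    unsafe fn set_len(&mut self, new_len: usize) {
        // There are no unsafe operations in the body; the `unsafe` is on
        // the signature, shifting the burden of upholding the invariant
        // onto the caller.
        self.len = new_len;
    }
}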

And this is why the module being the privacy boundary matters: the only code that can set len directly in safe Rust is code within the same privacy boundary as the Vec<T> itself. And so, that’s the same module, or its children.

This is still ultimately a lot better than needing to audit every line of code in the whole codebase, but it’s not quite as small a boundary as you may think at first.

How can we check that things haven’t gone wrong?

Okay, so while RustBelt could give you some idea if your code is correct, I doubt you’re about to jump into Coq and write some proofs. What can you do? Well, Rust provides a tool, miri, that can interpret your Rust code and let you know if you’ve violated some of the rules unsafe code must follow. But it’s not complete; that is, miri can tell you if your code is wrong, but it cannot tell you that your code is right. It’s still quite useful.
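For example, here’s a tiny program of my own, deliberately broken: it may well appear to run fine normally, but miri will reject it, because the read is out of bounds and therefore undefined behavior.

fn main() {
    let x = 42i32;
    let p = &x as *const i32;

    // Creating this pointer is fine; reading through it is not, because it
    // points one element past `x`.
    let oob = unsafe { *p.add(1) };

    println!("{oob}");
}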

To get miri, you can install it with Rust nightly:

$ rustup +nightly component add miri

And then run your test suite under miri with

$ cargo +nightly miri test

For more details, consult miri’s documentation.

Do things go wrong?

Do people write good unsafe code, or bad?

Well, the obvious answer to the above question is “yes.” Anyone who says all unsafe code in Rust is sound is obviously lying. But the absolute isn’t what’s interesting here, it’s the shade of grey. Is unsafe pervasive? Does Rust have memory related bugs at the same rate as other memory safe languages, or closer to memory-unsafe languages?

It’s still sort of early days here, but we do have some preliminary results that are helpful. In 2022, Google reported that:

To date, there have been zero memory safety vulnerabilities discovered in Android’s Rust code.

I am very interested to see a more up to date report about how that’s changed, but as they say:

We don’t expect that number to stay zero forever, but given the volume of new Rust code across two Android releases, and the security-sensitive components where it’s being used, it’s a significant result. It demonstrates that Rust is fulfilling its intended purpose of preventing Android’s most common source of vulnerabilities. Historical vulnerability density is greater than 1/kLOC (1 vulnerability per thousand lines of code) in many of Android’s C/C++ components (e.g. media, Bluetooth, NFC, etc). Based on this historical vulnerability density, it’s likely that using Rust has already prevented hundreds of vulnerabilities from reaching production.

We’ll see if these results replicate themselves in the future, and also outside of Google. But it at least looks like, in practice, most unsafe code has not led to problems. At least so far!

