There are some things you keep learning and forgetting (or perhaps you’ve never learned them in the first place?).
For me, one of those things is Pin/Unpin
in Rust.
Every time I read an explanation about pinning, my brain is like "got it!", and a few weeks later is like 🤔🤨.
So, I’m writing this as a way to force my brain to retain (pin?) this knowledge. We’ll see how it goes!
Pin
Pin
is a type of pointer, which can be thought of as a middle ground between &mut T
and &T
.
The point of Pin<&mut T>
is to say:
- This value can be modified (like
&mut T
), but
- This value cannot be moved (unlike
&mut T
)
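As a rough illustration of "can be modified, cannot be moved", here is a minimal sketch using only the standard library. The `Counter` type and its `increment` method are made up for this example; `PhantomPinned` is used to opt the type out of `Unpin`.

```rust
use std::marker::PhantomPinned;
use std::pin::{pin, Pin};

/// Hypothetical type that opts out of `Unpin`, so it must stay put once pinned.
struct Counter {
    count: u32,
    _pinned: PhantomPinned,
}

impl Counter {
    /// Mutates the value in place through the pinned pointer.
    fn increment(self: Pin<&mut Self>) {
        // Safety: we only update a field and never move the value out.
        let this = unsafe { self.get_unchecked_mut() };
        this.count += 1;
    }
}

fn count_after_two_increments() -> u32 {
    let mut counter = pin!(Counter { count: 0, _pinned: PhantomPinned });
    counter.as_mut().increment();
    counter.as_mut().increment();
    // `std::mem::swap(&mut *counter, ..)` would not compile here:
    // `Pin` only hands out a `&mut Counter` if `Counter: Unpin`.
    counter.count
}

fn main() {
    println!("count = {}", count_after_two_increments());
}
```

Modification goes through a method that takes `Pin<&mut Self>`, while anything that needs a plain `&mut Counter` (and could therefore move the value) is rejected in safe code.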
Why? Because some values must never be moved, or special care is needed to do so.
A prime example of this is self-referential data structures. They occur naturally when using async
, because futures tend to reference their own locals across await points.
This seemingly benign future:
use std::time::Duration;

async fn self_ref() {
    let mut v = [1, 2, 3];
    let x = &mut v[0];
    tokio::time::sleep(Duration::from_secs(1)).await;
    *x = 42;
}
requires a self-referential structure, because under the hood futures are state machines (unlike closures).
Note that self_ref
passes control back to the caller on the first await
. This means that even though v
and x
look like regular stack variables, something more complex must be going on here.
The compiler wants to generate something like this:
enum SelfRefFutureState {
    Unresumed,        // Created and wasn't polled yet.
    Returned,
    Poisoned,         // `panic!`ed.
    SuspensionPoint1, // First `await` point.
}

struct SelfRefFuture {
    state: SelfRefFutureState,
    v: [i32; 3],
    x: &'problem mut i32, // a "reference" to an element of `self.v`,
                          // which is a big problem if we want to move `self`.
                          // (and we didn't even consider borrowchecking!)
}
Each await
then becomes an update to the state
field, followed by running the associated code (see the full example at the end).
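To see why moving such a struct is a problem, here is a small sketch with a hypothetical `SelfRef` struct that stores a raw pointer into its own array. It only compares addresses and never dereferences the stale pointer, so it stays within safe code.

```rust
/// Hypothetical struct holding a raw pointer into its own `v` field.
struct SelfRef {
    v: [i32; 3],
    x: *const i32,
}

fn pointer_goes_stale_after_move() -> bool {
    let mut s = SelfRef { v: [1, 2, 3], x: std::ptr::null() };
    // Point `x` at the first element of `v` (the reference coerces to a raw pointer).
    s.x = &s.v[0];
    let valid_before_move = std::ptr::eq(s.x, &s.v[0]);

    // Moving the struct to the heap copies the bytes, including the old address in `x`.
    let boxed = Box::new(s);
    let stale_after_move = !std::ptr::eq(boxed.x, &boxed.v[0]);

    valid_before_move && stale_after_move
}

fn main() {
    println!("stale after move: {}", pointer_goes_stale_after_move());
}
```

After the move, `x` still holds the address of the old location, which is exactly the situation a self-referential future must never end up in.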
But! You can totally move this future if you try:
let f = self_ref();
let boxed_f = Box::new(f); // Evil?
let mut f1 = self_ref();
let mut f2 = self_ref();
std::mem::swap(&mut f1, &mut f2); // Blasphemy?
What gives? As a wise compiler once said:
futures do nothing unless you
.await
or poll them
#[warn(unused_must_use)]
on by default
- rustc
This is because calling self_ref
really does nothing: we actually get back something more like this [1]:
struct SelfRefFuture {
    state: SelfRefFutureState,
    v: MaybeUninit<[i32; 3]>,
    x: *mut i32, // a pointer into `self.v`,
                 // still a problem if we want to move `self`, but only after it is set.
    //
    // .. other locals, like the future returned from `tokio::time::sleep`.
}
which can be moved safely [2] in its initial (Unresumed
) state.
impl SelfRefFuture {
    fn new() -> Self {
        Self {
            state: SelfRefFutureState::Unresumed,
            v: MaybeUninit::uninit(),
            x: std::ptr::null_mut(),
            // ..
        }
    }
}
Only when we start polling f
do we get into the self-ref problem (once the x
pointer is set),
and if f
is wrapped in a Pin
all those moves become unsafe
, which is exactly what we want.
Because a lot of futures shouldn’t be moved around in memory once they “start”, they can only be worked with safely if they are wrapped in a Pin
, and so async-related functions tend to accept a Pin<&mut T>
(assuming they don’t need to move the value).
A tiny example
Here, no pinning is required:
use std::time::Duration;
use tokio::time::timeout;

async fn with_timeout_once() {
    let f = async { 1u32 };
    let _ = timeout(Duration::from_secs(1), f).await;
}
But if we want to call timeout
multiple times (for example, because we want to retry) we’ll have to use &mut f
(or we’ll get use of moved value
), which is going to cause the compiler to complain about pinning:
use std::time::Duration;
use tokio::time::timeout;

async fn with_timeout_twice() {
    let mut f = async { 1u32 };
    // error[E0277]: .. cannot be unpinned, consider using `Box::pin`.
    //   required for `&mut impl Future<Output = u32>` to implement `Future`
    let _ = timeout(Duration::from_secs(1), &mut f).await;
    // An additional retry.
    let _ = timeout(Duration::from_secs(1), &mut f).await;
}
Why?
Because a few levels down, timeout
is calling Future::poll
which is defined as
fn poll(self: Pin<&mut Self>, ...) -> ... { ... }
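To make that signature concrete, here is a minimal sketch of calling `poll` by hand using only the standard library. The `NoopWake` type and the `poll_ready_future` function are made up for this example; a real executor would use a waker that actually reschedules the task.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical waker that does nothing when woken.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn poll_ready_future() -> Option<u32> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);

    // `poll` needs a `Pin<&mut F>`, so the future is pinned to the stack first.
    let mut fut = pin!(async { 1u32 });
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(value) => Some(value),
        Poll::Pending => None,
    }
}

fn main() {
    println!("{:?}", poll_ready_future());
}
```

There is no way to call `poll` with a plain `&mut` reference to the async block: the receiver type forces the caller to pin the future first.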
When we await
ed on f
itself, we gave up ownership of it.
This allowed the compiler to handle the pinning for us, but it can’t do that if we only provide a &mut f
, since we could easily break Pin
’s invariants:
use std::time::Duration;
use tokio::time::timeout;

async fn with_timeout_twice_with_move() {
    let mut f = async { 1u32 };
    // error[E0277]: .. cannot be unpinned, consider using `Box::pin`.
    let _ = timeout(Duration::from_secs(1), &mut f).await;
    // .. because otherwise, we could move `f` to a new memory location, after it was polled!
    let mut f = *Box::new(f);
    let _ = timeout(Duration::from_secs(1), &mut f).await;
}
So we don’t care about pinning, our future is not really special in any way (or is it? More on that later!), and we don’t move it anywhere. But we are using an API that also accepts futures that are special, so we need to play along by pin!
ing our future:
use std::time::Duration;
use tokio::pin;
use tokio::time::timeout;

async fn with_timeout_twice() {
    let f = async { 1u32 };
    pin!(f); // f is now a `Pin<&mut impl Future<Output = u32>>`.
    let _ = timeout(Duration::from_secs(1), &mut f).await;
    let _ = timeout(Duration::from_secs(1), &mut f).await;
}
This is sort of like the beloved
expected `&u32`, found `u32`
help: consider borrowing here: `&1u32`
with a few more steps, traits, opaque types and macros.
We do need those extra steps: creating a Pin<&mut T>
requires a little more effort because we also need to make sure that no &mut T
is left around or can be obtained later (like we saw above) which would defeat the purpose of the Pin
.
This leads us to a more accurate phrasing of the no-move rule: the pointed-to value must not move until the value is dropped (regardless of when the Pin
is dropped!).
That’s the job of the pin!
macro: it makes sure that the original f
is no longer visible to our code, thus enforcing Pin
’s invariants (we can’t move it if we can’t see it).
Tokio’s pin!
implementation expands pin!(f)
to this:
// Move the value to ensure that it is owned
let mut f = f;
// Shadow the original binding so that it can't be directly accessed
// ever again.
#[allow(unused_mut)]
let mut f = unsafe {
    Pin::new_unchecked(&mut f)
};
The standard library’s version of pin!
is a bit cooler, but the same reasoning is used: shadow the original value with a newly created Pin
so it can no longer be accessed and moved.
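For comparison, here is a sketch of the same "retry" pattern with the standard library's `std::pin::pin!` and no runtime. The `poll_once` helper is a stand-in for an API like `timeout` that takes a future by value; it, `YieldOnce`, `yield_once` and `NoopWake` are all invented for this example.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical waker that does nothing when woken.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical future that returns `Pending` once, then `Ready`.
struct YieldOnce {
    yielded: bool,
}

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            Poll::Ready(())
        } else {
            self.yielded = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

fn yield_once() -> YieldOnce {
    YieldOnce { yielded: false }
}

/// Stand-in for an API that takes a future by value and polls it once.
fn poll_once<F: Future>(fut: F, cx: &mut Context<'_>) -> Poll<F::Output> {
    let fut = pin!(fut);
    fut.poll(cx)
}

fn retry_demo() -> (bool, Option<u32>) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);

    // `f` is a `Pin<&mut impl Future<Output = u32>>`; the async block itself is hidden.
    let mut f = pin!(async {
        yield_once().await;
        7u32
    });

    // `&mut f` is itself a future, so it can be handed out more than once.
    let first = poll_once(&mut f, &mut cx);
    let second = poll_once(&mut f, &mut cx);

    let second_value = match second {
        Poll::Ready(value) => Some(value),
        Poll::Pending => None,
    };
    (first.is_pending(), second_value)
}

fn main() {
    println!("{:?}", retry_demo());
}
```

Passing `&mut f` works here because a `Pin<&mut T>` is itself `Unpin`, so a mutable reference to it implements `Future`.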
A 📦
So Pin
is a (zero-sized wrapper around another) pointer, and it’s a bit like &mut T
with more rules.
The next problem is going to be “returning borrowed data”.
We can’t return the pinned future from before:
use std::future::Future;
use std::time::Duration;
use tokio::pin;
use tokio::time::timeout;

async fn with_timeout_and_return() -> impl Future<Output = ()> {
    let f = async { 1u32 };
    pin!(f); // f is now a `Pin<&mut impl Future<Output = u32>>`.
    let s = async move {
        let _ = timeout(Duration::from_secs(1), &mut f).await;
    };
    // error[E0515]: cannot return value referencing local variable `f`
    s
}
It should now be clearer why: the pinned f
is now a pointer,
and it points to data (the async block) that won’t be there once we return from the function.
We can therefore use Box::pin
[3]:
-pin!(f);
+let mut f = Box::pin(f);
Making f
a Pin<Box<impl Future<Output = u32>>>
.
But didn’t we just say that Pin<&mut T>
is a (wrapper around a) pointer “in between” &mut T
and &T
?
Well, a mut Box<T>
is also like a &mut T
, but with ownership.
So a Pin<Box<T>>
is a pointer “in between” a mutable Box<T>
and an immutable Box<T>
, with the same exceptions (the value can be modified but cannot be moved).
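A small sketch of why this works: moving a `Pin<Box<T>>` only moves the pointer, while the pinned value stays at the same heap address. The `Unmovable` type below is made up for this example.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

/// Hypothetical `!Unpin` type standing in for a self-referential future.
struct Unmovable {
    data: u32,
    _pinned: PhantomPinned,
}

fn address_is_stable_across_moves() -> bool {
    let pinned: Pin<Box<Unmovable>> = Box::pin(Unmovable {
        data: 7,
        _pinned: PhantomPinned,
    });
    let before: *const Unmovable = &*pinned;

    // Moving the `Pin<Box<_>>` into a `Vec` moves the pointer, not the heap value.
    let holder = vec![pinned];
    let after: *const Unmovable = &*holder[0];

    std::ptr::eq(before, after) && holder[0].data == 7
}

fn main() {
    println!("address stable: {}", address_is_stable_across_moves());
}
```

This is what makes it possible to return a `Pin<Box<T>>` from a function: the owner changes, the pointed-to value does not move.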
Unpin
Unpin
is a trait. It’s not “the opposite” of Pin
, because Pin
is a type of pointer and traits (however good their marketing) cannot be the opposite of pointers.
Unpin
is also an auto-trait (the compiler implements it for you automatically when possible), marking a type whose values can be moved after being pinned (so for example, it will not be self-referential).
The main point is that if T: Unpin
, we can always Pin::new
and Pin::{into_inner,get_mut}
values of T
, meaning we can easily go to and from a “regular” mutable value and ignore the complexities associated with working directly with pinned values.
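A short sketch of that round trip, using `u32` as an arbitrary `Unpin` type (the function name is made up for this example):

```rust
use std::pin::Pin;

fn unpin_roundtrip() -> (u32, u32) {
    let mut a = 1u32;
    let mut b = 2u32;

    // `u32: Unpin`, so the safe constructor is available.
    let mut pinned: Pin<&mut u32> = Pin::new(&mut a);

    // `DerefMut` works because the target is `Unpin`.
    *pinned = 10;

    // We can even get a plain `&mut u32` back and move the value around.
    std::mem::swap(pinned.as_mut().get_mut(), &mut b);

    // Unwrap the pin entirely and keep using the regular reference.
    let inner: &mut u32 = Pin::into_inner(pinned);
    *inner += 1;

    (a, b)
}

fn main() {
    println!("{:?}", unpin_roundtrip());
}
```

None of this needs `unsafe`, which is the practical meaning of "pinning has no effect on `Unpin` types".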
Unpin
is also why Box::pin
is so useful: It (or rather, Box::into_pin
) can safely call the unsafe Pin::new_unchecked
because “it is not possible to move or replace the insides of a Pin<Box<T>>
when T: !Unpin
”, and the resulting Box
is always Unpin
because moving it doesn’t move the actual value.
Another tiny example
We can create an Unpin
future by hand:
use std::future::Future;
use std::pin::Pin;

fn not_self_ref() -> impl Future<Output = u32> + Unpin {
    struct Trivial {}

    impl Future for Trivial {
        type Output = u32;

        fn poll(self: Pin<&mut Self>, _cx: &mut std::task::Context<'_>) -> std::task::Poll<Self::Output> {
            std::task::Poll::Ready(1)
        }
    }

    Trivial {}
}
Now, we can call timeout
on it multiple times without pinning:
async fn not_self_ref_with_timeout() {
    let mut f = not_self_ref();
    let _ = timeout(Duration::from_secs(1), &mut f).await;
    let _ = timeout(Duration::from_secs(1), &mut f).await;
}
Any future which is created using the async fn
or async {}
syntax is considered !Unpin
- meaning that once we put it inside a Pin
, we won’t be able to take it out again.
Summary
- Pin is a wrapper around another pointer which is a bit like &mut T, with the additional rule that it is unsafe to move the value it points to until the value is dropped.
- To safely work with self-ref structures, we must prevent them from moving once we set a self-referential field.
- Placing a value inside a Pin does exactly that.
- Constructing a Pin is harder because Pin promises that moving is impossible for the lifetime of the value, so we can’t create it without giving up the ability to create a &mut T later on and break the Pin’s invariants.
- When awaiting on an owned Future, the compiler can handle the pinning because it knows that the Future won’t move once ownership is transferred.
- Otherwise, we need to handle the pinning ourselves (for example with pin! or Box::pin), which is a bit tricky because of all this.
- Unpin is a marker trait that says that a type can be safely moved even after it was wrapped in a Pin, making everything simpler.
- Most structures are Unpin, but async fn and async {} always generate !Unpin structures.
Appendix A - A hand rolled self-referential Future
Note: I’m not an unsafe
lawyer. This code is intended for educational purposes about async
, so there might be some glaring UB issues I missed.
We will not use MaybeUninit
s here so we can focus only on the unsafe
operations w.r.t Pin
,
but IRL the compiler won’t use Option
(or initialize v
twice, etc) as it is unneeded & slower.
use std::future::Future;
use std::pin::Pin;
use std::time::Duration;

enum SelfRefFutureState {
    Unresumed,        // Created and wasn't polled yet.
    SuspensionPoint1, // First `await` point.
    Returned,
    Poisoned,         // `panic!`ed.
}

struct SelfRefFuture {
    state: SelfRefFutureState,
    v: [i32; 3],
    x: *mut i32, // a pointer into `self.v`,
                 // a problem if `self` moves, but only after it is set.
    sleep: Option<tokio::time::Sleep>,
    // Ensure that we are !Unpin.
    _m: std::marker::PhantomPinned,
}

impl SelfRefFuture {
    fn new() -> Self {
        Self {
            state: SelfRefFutureState::Unresumed,
            v: [0; 3],
            x: std::ptr::null_mut(),
            sleep: None,
            _m: std::marker::PhantomPinned,
        }
    }
}
impl Future for SelfRefFuture {
    type Output = ();

    fn poll(
        self: Pin<&mut Self>,
        cx: &mut std::task::Context<'_>,
    ) -> std::task::Poll<Self::Output> {
        // Safety: We aren't going to move `self`, promise.
        let this = unsafe { self.get_unchecked_mut() };
        match this.state {
            SelfRefFutureState::Unresumed => {
                this.v = [1, 2, 3];
                // Point at `v[0]`, mirroring `let x = &mut v[0];` in `self_ref`.
                this.x = this.v.as_mut_ptr();
                this.sleep = Some(tokio::time::sleep(Duration::from_secs(1)));
                this.state = SelfRefFutureState::SuspensionPoint1;
                // Safety: We are the owners of `sleep`, and we aren't moving it.
                let pinned_sleep = unsafe { Pin::new_unchecked(this.sleep.as_mut().unwrap()) };
                Future::poll(pinned_sleep, cx)
            }
            SelfRefFutureState::SuspensionPoint1 => {
                // Safety: Same as above.
                let pinned_sleep = unsafe { Pin::new_unchecked(this.sleep.as_mut().unwrap()) };
                if let std::task::Poll::Pending = Future::poll(pinned_sleep, cx) {
                    return std::task::Poll::Pending;
                };
                // Safety: We initialized `v` and `x` before moving to this state,
                // No one else can move us because `Self` is wrapped in a `Pin`,
                // so `x` is still valid.
                unsafe { this.x.write(42) };
                this.state = SelfRefFutureState::Returned;
                std::task::Poll::Ready(())
            }
            SelfRefFutureState::Returned => std::task::Poll::Ready(()),
            SelfRefFutureState::Poisoned => {
                panic!()
            }
        }
    }
}
#[tokio::main]
async fn main() {
    let f = SelfRefFuture::new();
    f.await;
}
[1] The actual layout of generators is more complex in practice.
[2] In fact, all types can be moved (mem::swap accepts any type T). Values, however, can be wrapped in a Pin, which enforces the rules above (getting a &mut T from Pin<&mut T> is unsafe in the general case).
[3] In this example, we can also delay the pinning by moving the original future into the async block and pinning it there.