Is there some way to get automated guidance when writing unsafe blocks? Some unsafe things are certainly more unsafe than others, and once one opens Pandora's box by deciding to use unsafe code for something, it seems like there's no help other than "do your own research" about how to ensure that the unsafe block turns out safe.

For example, I was going to use transmute to cast a slice with one element type to a slice with another where I know the element types have identical representations - or actually, I only know that because I looked at the source, and they do. But then I find out that the Rust compiler is free to implement slices with different element types differently. So I need to use slice::from_raw_parts. But then it occurs to me that I'm just trusting that the element types stay having identical representations, with no safety net about that.
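As a concrete sketch of the `slice::from_raw_parts` approach with a compile-time safety net: the wrapper types below are hypothetical, but `#[repr(transparent)]` is a documented layout guarantee (unlike facts gleaned from reading the source), and inline `const` assertions (stable since Rust 1.79) make the build fail if the layout assumption ever stops holding.

```rust
use std::mem::{align_of, size_of};

// Hypothetical wrapper types for illustration. `#[repr(transparent)]` over
// the same field type is a documented guarantee of identical layout.
#[repr(transparent)]
struct UserId(u64);

#[repr(transparent)]
struct RawId(u64);

fn as_raw_ids(xs: &[UserId]) -> &[RawId] {
    // Compile-time safety net: the build fails if the layouts ever diverge.
    const { assert!(size_of::<UserId>() == size_of::<RawId>()) };
    const { assert!(align_of::<UserId>() == align_of::<RawId>()) };
    // SAFETY: both types are `#[repr(transparent)]` wrappers around `u64`,
    // so size, alignment, and validity match; the output lifetime is tied
    // to `xs` by the function signature.
    unsafe { std::slice::from_raw_parts(xs.as_ptr().cast::<RawId>(), xs.len()) }
}

fn main() {
    let ids = [UserId(7), UserId(8)];
    let raw = as_raw_ids(&ids);
    assert_eq!(raw.len(), 2);
    assert_eq!(raw[1].0, 8);
    println!("ok");
}
```

This doesn't make the compiler prove the conversion safe, but it does turn one of the unstated assumptions into a check that fails loudly at compile time.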

What I'd like to do is somehow write something that walks the Rust compiler through the individual steps in the conversion in such a way that the Rust compiler can produce an error when any step fails to uphold its underlying assumption. And know that I've done that and not left an "unsafety gap".

One case of this would be the ability to assert that for two sets of instantiations, certain code fully obeys parametric polymorphism, at least with respect to decisions made by the current Rust compiler, under all optimization levels (or even just certain specified optimization levels). Another case would be asserting that two types have bit-wise compatible underlying representations. Another would be that the following unsafe manipulations cannot lead to dangling references. Compile-time pre- and post-condition checks? Something that asserts that the unsafe block hasn't manufactured a lifetime out of whole cloth?

I've had experience with proof assistants - so I have that model in mind: the Rust compiler has certain automation to prove safety, and the automation relies on the user adhering to rules that give the Rust compiler a not-too computationally complex way to do those proofs. That's great! But, as soon as some code needs to do something that does not adhere to those rules, is there anything the user can do that still results in a verified safety guarantee under current compiled conditions?

1 Like

Use Miri, more Miri and even more Miri. It is a great tool.

I was thinking the other day: maybe someone could assemble a checklist of 101 mistakes you can make with unsafe. [I guess by now I have made at least 5-10, but my memory is not good.]

2 Likes

Doesn't using Miri for this require that I suspect a certain usage as unsafe in a certain way and write a test for it somehow? One problem is that I don't know the full range of things to suspect. Another is that I can't write (or perhaps just don't know how to write) a test for all of those things that is guaranteed to trigger a complaint under all circumstances. I think that the examples I gave with slices would have passed Miri, based on my understanding of Miri.

But, if you're saying that Miri is better than nothing, certainly that's true.

Yes, you have to get lucky for Miri to pick something up, but if you have lots of tests, especially "badly behaved" ones, you have a chance of being lucky.

I suppose it depends a bit on what you are doing with unsafe.

Unfortunately that's not possible in principle.

I mean: sure, certain unsafe patterns can be used safely… but when that happens they get a “normal”, “safer” wrapper (like the aforementioned slice::from_raw_parts), and then you have some confidence in those implementations.

If you work with raw transmute or other similar extra-low-level details your best bet is to create another set of such “safer” wrappers and discuss them (here or on some other forum where Rust developers happen to be present).

In general the idea is to provide higher-level implementations that you can share with others.

Writing correct unsafe code is hard, so it's natural to share the burden and reuse code as much as possible.

But you always have to be prepared to change your implementation if unsoundness is found in your code.

It took many years for the Rust community to develop some more-or-less safe handling of self-referential structures, e.g.

Current favorite, ouroboros is perceived to be safe… but what would happen tomorrow?

The best advice that can be offered is to move “questionable” unsafe code into a separate crate, where it can be discussed and tested without dragging in your “business logic”.

3 Likes

There are a few lints, but I agree, not nearly enough. There's an extreme unsafe cliff not far beyond trivialities like get_unchecked. I'd be happy to learn of more tools for checking Rust unsafe at the source code level too.

No tool can be complete, as even things like #[repr(transparent)] should be considered implementation details unless accommodated by documented guarantees.[1] But it would be nice to at least have something, like say linting on the lack of a strictly defined layout in a transmute call.

Incidentally, crates for safer transmutes include bytemuck and zerocopy. I doubt we'll ever get "the compiler promises to decide layout the same way for these types of things." More generally, as far as I know, the teams are still entirely uninterested in language/soundness dialects based on disabling optimizations.

More than that, you have to actually hit UB.

So definitely use Miri, but yes, Miri doesn't help with things like "upstream is allowed to change their implementation details" and "rustc is allowed to pick a different layout next time you compile". It helps with "you did something UB during execution."

Or to rephrase, it helps with UB you hit today, but doesn't detect latent unsoundness which may bite later.


  1. This could perhaps be approximated as, "relying on things you only know because you read the source is unsound". ↩︎

2 Likes

Miri sounds a lot like C tools I'm used to, such as Valgrind. So that's not what I'm looking for, although it is very nice to have for what it is.

I found this crate: static_assertions

That's helpful, as it allows compile-time checks on things like size and alignment, and whether certain traits are implemented by certain types or not.
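For the size-and-alignment half of those checks, the same thing can now be written in plain std without the crate, using module-level `const` items (a failed `assert!` in const context is a compile error, not a runtime panic). The `Point` type here is a made-up example:

```rust
use std::mem::{align_of, size_of};

struct Point {
    x: i32,
    y: i32,
}

// Plain-std equivalents of static_assertions' assert_eq_size!/assert_eq_align!.
// Note: `repr(Rust)` layout is unspecified, so these assertions document an
// assumption about the current compiler rather than a language guarantee --
// which is exactly why having them checked at build time is valuable.
const _: () = assert!(size_of::<Point>() == 2 * size_of::<i32>());
const _: () = assert!(align_of::<Point>() == align_of::<i32>());

fn main() {
    println!("layout checks passed at compile time");
}
```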

For parametric polymorphism, which would be really helpful to assert, I'd need more than that - but nothing the Rust compiler doesn't know. When trying to convert Foo<T> to Foo<U>, I'd like to assert that Foo has parametric polymorphism over at least T and U - meaning that there are no specific impls of Foo<T> or Foo<U>, nor impls of traits that only one of T or U implement, and similarly for other generic types used by Foo. Part of that is a source code check - which is probably implementable by a Rust macro. But another part is asking the Rust compiler if it is monomorphising Foo<T> vs. Foo<U> any differently itself - in other words, if it is acting like there is an implicit impl for Foo<T> or Foo<U> even if there isn't source for either.

Another very trivial and very useful case would be to assert (and have Rust check that assertion) that nothing in the unsafe block is altering the semantics of pointers in any way.

There's no way to express that in code (e.g. trait-bound-esque), and I doubt there's interest in providing one, given that it makes adding implementations a breaking change. Additionally, the compiler doesn't actually know that in the general case (downstream crates, which are compiled separately, can provide more implementations).

You can assert specific trait bounds you need hold. It's a semver-esque check, as making a previously satisfied bound not hold is a breaking change.
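One minimal way to pin such a bound down, sketched below with a hypothetical `Token` type: a zero-sized helper function whose only job is to fail type-checking when a bound stops holding, so a semver-breaking change surfaces as a compile error at this call site.

```rust
// A helper whose only job is to fail type-checking when a bound stops
// holding; calling it is effectively a compile-time semver check.
fn assert_bounds<T: Send + Sync + Clone>() {}

#[derive(Clone)]
struct Token(u64);

fn main() {
    // If a future change makes Token non-Send (say, by adding an Rc field),
    // this call becomes a compile error, surfacing the breakage loudly.
    assert_bounds::<Token>();
    println!("bounds hold");
}
```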

These impl concerns seem quite far away from transmutation soundness to me. But even at the layout level, there are no guarantees unless they've been opted into, and that's intentional.

But that's exactly why it's needed! If an unsafe code block depends on parametric polymorphism, then it might break in an unobvious way when a trait implementation is added. It would be better if an assertion about the unsafe code block broke in an obvious way instead.

Note that the static_assertions crate has an assert_not_impl_any! macro that fails if a trait it names is implemented. What I'm looking for is just a more elaborate version of that.

The whole point is to have changes that break the underlying assumptions of unsafe code break in obvious and predictable ways, instead of hidden and unpredictable ways. That's the reason for all assert...! macros.

Note that const { assert!(size_of::<T>() == size_of::<U>()) }; is riding the trains to be stable in 1.79 (June 13th). Those more directly allow you to write those checks in a way that will fail ASAP instead of waiting for runtime.
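A nice property of those inline `const` blocks is that in a generic function they are evaluated once per monomorphization, so the check guards every concrete instantiation. A sketch (the helper name is made up, and it is deliberately an `unsafe fn` because layout equality alone doesn't make the cast sound -- validity is still the caller's contract):

```rust
use std::mem::{align_of, size_of};

/// # Safety
/// Every bit pattern of `T` must be a valid `U`.
unsafe fn checked_cast_slice<T, U>(xs: &[T]) -> &[U] {
    // Evaluated per monomorphization: any T/U pair with mismatched layout
    // fails at compile time, at the call site.
    const { assert!(size_of::<T>() == size_of::<U>()) };
    const { assert!(align_of::<T>() == align_of::<U>()) };
    // SAFETY: layout is verified above; validity is the caller's contract.
    unsafe { std::slice::from_raw_parts(xs.as_ptr().cast::<U>(), xs.len()) }
}

fn main() {
    let xs: &[i32] = &[-1, 0, 1];
    // SAFETY: every bit pattern of an i32 is a valid u32.
    let ys: &[u32] = unsafe { checked_cast_slice(xs) };
    assert_eq!(ys[0], u32::MAX);
    // By contrast, `checked_cast_slice::<i32, i64>(xs)` would fail to compile.
    println!("ok");
}
```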

The whole point of unsafe is that you can't have its invariants checked by the compiler because there's simply not enough information in the type system. If it could be checked automatically, then it wouldn't have been unsafe in the first place.

I think there is definitely some room for improvement for things like transmuting pointers, which is basically never the right thing to do. That could be a static rule (e.g. deny-by-default lint), but most of the hard stuff in unsafe cannot be checked, by definition.
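To illustrate why pointer transmutes are never needed: raw-pointer `cast` (or the `as` operator) expresses the same conversion in safe code, with no chance of accidentally transmuting to or from a differently-sized type. A small sketch:

```rust
fn main() {
    let x: i32 = 42;
    let p: *const i32 = &x;

    // Instead of: unsafe { std::mem::transmute::<*const i32, *const u8>(p) }
    // the cast itself is safe code:
    let q: *const u8 = p.cast::<u8>();

    // Reading through the cast pointer is still unsafe and still needs a
    // justification, but the lint-worthy transmute is gone.
    // SAFETY: q points into x, which is live, and u8 has no invalid values.
    let first_byte = unsafe { *q };
    assert_eq!(first_byte, x.to_ne_bytes()[0]); // endianness-agnostic check
    println!("ok");
}
```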

2 Likes

Can you show us an example? Starting with some imaginary “business case” and going down to the level of “unsafe parametric polymorphism”?

Yes, but that's not the Rust philosophy; it's more like the modern C++ philosophy. The Rust philosophy is that the implementation of unrelated traits shouldn't affect your unsafe code, and that unsafe code shouldn't do tricks that may be affected by the existence of unrelated traits.

It may sound like “just a more elaborate version of that”, but it asks for something the compiler wouldn't even know at compile time: it needs global knowledge about the whole program, while assert_not_impl_any! is purely local.

My issue started with inspecting someone else's code to see what they were doing, because I'm still learning Rust, and need to develop an understanding of how to use unsafe.

The code snippet is something like:

fn fun(x: &[T]) {
...
let y = unsafe { slice::from_raw_parts(x.as_ptr() as /*some ptr conversion*/, x.len()) };
...
}

My first thought was - why not just transmute the whole slice? I searched to find out why not - and found something that says that the Rust compiler is free to implement different slice types differently even in the same compilation. In other words, the representational parametric polymorphism I assumed was there isn't. One slice type might have the length first and the pointer second, while another might have the pointer first and the length second. So the Rust compiler is that unpredictable. Certainly a trap I would have fallen into, coming from C/C++. The online source gave a very similar recipe to the above code snippet as the right way to do it.

But then, on closer reading of slice::from_raw_parts documentation, it says:

Caveat

The lifetime for the returned slice is inferred from its usage. To prevent accidental misuse, it’s suggested to tie the lifetime to whichever source lifetime is safe in the context, such as by providing a helper function taking the lifetime of a host value for the slice, or by explicit annotation."

And I noticed that the above unsafe code snippet does nothing about the slice lifetime. The resulting slice y has an unbounded lifetime. The unbounded lifetime is absolutely not needed, because y should and could have the same lifetime as the original slice x. Is that a problem? If the Rust compiler is that unpredictable, maybe it is. The source slice x is probably safe for the rest of that function call, but what if that function is inlined somewhere and the Rust compiler decides to re-order operations after inlining? If y had the proper lifetime, the Rust compiler wouldn't place anything depending on y after it had dropped the source vector on which x is based, or allowed that source vector to mutate. But given that y isn't tied to x by lifetime, would the Rust compiler assume that there's no alias there to worry about, and do whatever re-orderings it wants that don't conflict with other aliasing it infers from other lifetimes?

Even if the Rust compiler has some safe rule like: "When in the presence of an unbounded lifetime, turn off operation reordering", and if the original author of the unsafe block made sure that the code following the unsafe block could resist the unbounded lifetime on y, what about future modifications to that code? Do people maintaining Rust code always carefully study unsafe blocks that precede modifications they make? There's no comment in the code.

So, what help from Rust's compiler would I have liked if I was writing something like this from scratch myself? And that was what led to my initial post in this thread.

That's called safe code /hj

2 Likes

It's safe code because the Rust compiler forces me to adhere to a set of rules that allow it to automatically infer safety itself. In order for it to be able to do so automatically, the rules have to be very conservative. My point was about venturing outside that envelope but still within the space where Rust can verify a proof of safety. There's a very large space of things that can't be proven automatically but which can still be verified automatically.

Don't get me wrong, I very much like your idea. But I think the current situation can be summarized as "the Rust compiler uses everything it can to prove you don't violate the rules". It certainly is possible that the compiler's capabilities improve over time. Current work includes Tree Borrows, which will allow code to compile that is currently rejected. It may also be possible that the "rules" unsafe code needs to adhere to are relaxed. See Issues · rust-lang/opsem-team · GitHub for examples. But keep in mind that there will always be technically valid programs the compiler will reject.

I wish that were the case, but it seems more like: "the Rust compiler can prove you don't violate the rules, until your first unsafe block. Then the burden is on you." And this burden is heavier than with C/C++, because in C/C++ the compiler obeys a 50-year-old ABI and won't budge, and assumes there may be aliases lurking everywhere. The Rust compiler obeys only its momentary whim about ABI, might do just about anything, and assumes the only aliases that exist are those bound together by common lifetimes.

True: you used an unsafe block, you ought to follow through with your responsibilities. But, "ought implies can" - a point maybe not so ironically attributed to someone named I Kant.

1 Like

Add enough types with safety invariants that you can write each step as an abstracted safe thing that you can prove independently correct. Even better, put those small individual steps into crates separate from your use of them, publish them, and get eyes on them to prove those parts.

This is the essence of Rust. Find a useful thing that the compiler can't prove safe on its own, and wrap it up into a safe interface. Then compose those safe interfaces without worrying because they're safe.

This is why things like https://doc.rust-lang.org/nightly/std/primitive.slice.html#method.flatten_mut are interesting. They're a small bit of straight-forward unsafe code, written with comments justifying the implementation, that then others can use without needing to think about those things.

EDIT: it's now https://doc.rust-lang.org/nightly/std/primitive.slice.html#method.as_flattened_mut, per team feedback to get it stabilized.

8 Likes

BTW:

I think the right way to do this is:

fn fun<'a>(x: &'a [T]) {
...
let y = unsafe { slice::from_raw_parts::<'a, _>(x.as_ptr() as /*some ptr conversion*/, x.len()) };
...
}

But actually, y's element type (U) also needs a lifetime annotation - so it ends up more like:

fn fun<'a>(x: &'a [T]) {
...
let y = unsafe { slice::from_raw_parts::<'a, U<'a>>(x.as_ptr() as /*some ptr conversion*/, x.len()) };
...
}

Could there be a lint to suggest this? Or something?

The right way is usually to extract it out into a function and have lifetime elision get the lifetime correct without you needing to think about it.

For example, you don't put the from_raw_parts in the middle of a long function, but instead do a

pub fn as_slice_of_unsigned(x: &[i32]) -> &[u32] {
    let ptr = x.as_ptr().cast::<u32>();
    let len = x.len();
    // SAFETY: `i32` and `u32` have exactly the same size & alignment,
    // and they both have the validity invariant that all bitpatterns are allowed and no padding,
    // so the input slice existing is enough to know that the alignment and reasonability requirements are met
    unsafe { std::slice::from_raw_parts(ptr, len) }
}

Then the code calling as_slice_of_unsigned doesn't have to think about lifetimes, because lifetime elision tied things together properly.

6 Likes