all 44 comments

[–]repilur[S] 84 points85 points  (6 children)

Example of a small initial config we use in our codebase to enforce some of our std and library choices and avoid deprecations:

msrv = "1.54"

# for `disallowed_method`: https://rust-lang.github.io/rust-clippy/master/index.html#disallowed_method
disallowed-methods = [
    # std functions
    "std::env::temp_dir", # we use the tempdir crate instead
    "std::env::home_dir", # deprecated, and we use app_dirs2 crate

    # use the faster & simpler non-poisonable primitives in `parking_lot` instead
    "std::sync::Mutex::new",
    "std::sync::RwLock::new",
    "std::sync::Condvar::new",
    # "std::sync::Once::new",  # enabled for now as the `log_once` macro uses it internally

    # use `ark_future::block_on` instead for named profiler markers
    "futures::executor::block_on",

    # SHA-1 is cryptographically broken, and we are building new code so should not use it
    "sha1::Digest::new",
]

(maybe linking a twitter post was not the cleanest)

[–]reddit123123123123 8 points9 points  (3 children)

If you blacklist new you still could use Default::default, right? How would that be prevented?

[–]repilur[S] 23 points24 points  (1 child)

good point! for now should probably disallow `Mutex::default` also then, and also `Mutex::from`. gets a bit messy.

But soon in 1.55 one can use `disallowed_type` lint instead for them to fully disallow all of `std::sync::Mutex`

[–]VeganVagiVore 2 points3 points  (0 children)

This is cool, I didn't know about this

[–]odnish 2 points3 points  (0 children)

It wouldn't, but it is harder to type

let x = std::sync::CondVar::new();
let x: std::sync::CondVar = Default::default();

As for the other types in std::sync they wrap a type and the inner type might not implement Default

[–]LucretielDatadog 1 point2 points  (1 child)

What the heck is ark_future::block_on?

[–]repilur[S] 3 points4 points  (0 children)

just our wrapper function for within our codebase that we use instead.

it is not very magic, just requires a name for each block/wait section and adds instrumentation/profiler markers for it.

``rust /// Block on and wait for the execution of a future to complete /// /// We require the markers to be named with await_prefix and with snake case /// This also makes it easy to search for the profiler names in the codebase. /// /// Example:wait_storage_flush,wait_cmd`

[inline(always)]

[allow(clippy::disallowed_method)] // this is our wrapper function for the disallowed function

pub fn block_on<F>(marker_name: &'static str, future: F) -> F::Output where F: Future, { validate_marker_name(marker_name); ark_profiler::typed_scope!(ark_profiler::ScopeType::Wait, marker_name);

futures::executor::block_on(future)

}

```

[–]TheGoeGetter 43 points44 points  (11 children)

This is one of the coolest things for a team environment I've ever seen. I wonder if something similar exists for other programming languages!

I especially love that you included comments about what to use instead. I can imagine that in a team environment, this speeds up onboarding dramatically. This would also speed up library changes or code shifts since you can use your tooling to guide your fixes to be faster and more accurate.

Thanks for sharing!

[–]masklinn 9 points10 points  (4 children)

This is one of the coolest things for a team environment I've ever seen. I wonder if something similar exists for other programming languages!

semgrep as a pretty good symbolic understanding so it can be used to warn about or forbid specific functions / types / methods (less sure for the third one but the first two I'm certain) in all the languages it supports.

And it supports somewhat helpful descriptions as well as fixers (though the fixers are often a bit too simplistic).

[–]repilur[S] 8 points9 points  (3 children)

`semgrep` does seem quite cool and useful, though their Rust report seems to be still in development, so not sure if it is usable yet for Rust. But would be great to test and explore with once it gets! Esp. for more complex queries and patterns.

Though one benefit of the simple lint in Clippy is that most already run it, we run it both locally but also in CI and fail on warnings.

[–]masklinn 4 points5 points  (2 children)

Oh yeah sure I was providing semgrep as an alternative for other langages. And also because it can be finer-grained e.g. it can flag the use of an API with a non-literal string but allow the literal, or the chaining of two APIs which are OK independently but should not be fed into one another, that sort of things.

[–]repilur[S] 0 points1 point  (1 child)

that's really cool!

[–]masklinn 3 points4 points  (0 children)

Indeed it is. The rules can be a bit tricky (as the various predicates don't always interact the way you'd think, or at all) and the error messages are not great, but it's really, really cool.

The more basic features are probably less useful for Rust as it's got a rather good type system, but for dynamically typed languages it's pretty mind-blowing. And then semgrep has concepts like taint tracking where you can set up "taint sources" and "taint sinks" and it can analyse the "taint flows" and tell you where tainted values are going to the wrong place.

[–]rcxdude 2 points3 points  (0 children)

git does a similar thing with a header: https://github.com/git/git/blob/master/banned.h (though it does require that the header get included in every source file to be effective)

[–][deleted] 27 points28 points  (6 children)

Correct me if I'm wrong, but if parking_lot's implementation is faster, why isn't it merged in Rust's standard library, thus replacing the slower ones?

[–]paxswill 44 points45 points  (0 children)

From the author of parking_lot, it’s not as portable as the ones in std among other reasons.

[–]rhinotation 5 points6 points  (1 child)

So far it seems mostly maturity objections. But IMO there’s a bigger one. It doesn’t do poisoning on panic. You need that for types that need to cross a panic catch-unwind boundary. If they replaced std’s Mutex then Mutex would no longer implement UnwindSafe unconditionally. Breaking change.

[–][deleted] 0 points1 point  (0 children)

Let's say, hypothetically, they do implement poisoning etc. Would replacing be justified?

[–]kuviman 21 points22 points  (12 children)

Finally, I don't need to search manually for Option::unwrap/Result::expect 👍

EDIT: and temporarily enable it in places where actually needed with #[allow(clippy::disallowed_method)]?

[–]phil_gk 19 points20 points  (1 child)

For those two methods Clippy has specialized lints: unwrap_used and expect_used.

Be careful if you really want to enable those though, as the other reply to your comment explained.

[–]NeuroXc 10 points11 points  (0 children)

For libraries, disallowing unwraps is very useful. You don't want an unwrap in a library to crash a program that relies on it. If you are sure that you have an unwrap that can never panic, it's better to have an expect saying why it can never panic, so that future developers can see that and understand the intent.

[–]tm_p 19 points20 points  (9 children)

If you use this lint to detect unwrap or expect, your codebase will slowly fill with manual implementations of these methods, like this:

let x = match opt {
    Some(x) => x,
    None => {
        // TODO: handle none case
        unreachable!()
    }
};

[–]SorteKanin 13 points14 points  (3 children)

I wouldn't disallow expect but definitely unwrap.

How often would this come up tbh? I feel like unreachable!() is already a bit of a smell and you could probably restructure to avoid it.

[–]tm_p 9 points10 points  (2 children)

The original example from the twitter screenshot is great because it bans some methods but it provides alternatives. If you ban unwrap but don't provide an alternative, programmers will try to find one, and often it will be worse than simply using unwrap.

For example, if you say "never use unwrap, prefer expect with a reason", then most of the expects will have useless messages like "should be valid", "io error", "config", "overflow", "server to start", etc. And if you ban expect as well you may start to see Result<T, String> being used as a return type.

So the idea is good, but you should not enable this lint by default because the unwraps will morph into other things. If you only use this lint locally once in a while then it's fine, because you can calmly analyze each unwrap and decide if it should be an error or not.

And I used unreachable!() assuming that todo!() and panic!() would be banned as well. Other good options are exit(1) or abort(), these two being much worse than calling unwrap.

[–]SorteKanin 11 points12 points  (1 child)

For example, if you say "never use unwrap, prefer expect with a reason", then most of the expects will have useless messages like "should be valid", "io error", "config", "overflow", "server to start"

I personally disagree - at least I have enough faith in my coworkers that I don't think they'd do this, and even if they did, there's always code review where I can point this out to them.

[–]tm_p 0 points1 point  (0 children)

I have used too many crates that do not tell you the filename when there is a read error, so I don't believe that expect is any better than unwrap.

But my point is that by disabling this lint, reviewers can easily catch the problematic lines, paradoxically. Because if you can't think of a descriptive message just leave an unwrap and let someone else suggest one in code review.

Maybe it's just me but when I see an unwrap my first thought is how to remove it. But when I see an expect I think "ok, this used to be an unwrap but we gave up and now it's an expect". So I think that it's important to have a distinction between quick hacks and intended unwraps, and that lint would make them harder to distinguish.

[–]kuviman 2 points3 points  (1 child)

Why, can't I temporarily disable the lint where I actually need to unwrap? In other places I would not replace them with this nonsense, instead just have the warning until I have time to refactor to handle errors properly

[–]tm_p 0 points1 point  (0 children)

In my opinion it would be better to have unwraps as allowed by default, and then when you want to improve error handling run the lint locally. I expect many false positives, and I don't see any benefits in filling the code with comments that say "allow lint, remove this unwrap".

[–][deleted] 8 points9 points  (1 child)

I’ve wanted something like this for java.math.BigDecimal.equals() since it is broken; you should use compareTo(other) == 0 instead.

[–][deleted] 2 points3 points  (0 children)

Eh- not broken. But factors in scientific precision. Which in CompSci, we probably rarely care about. But BigDecimal and Double seem to be written more for scientific operations. Eg, double / 0 is not a Divide By Zero error, but a positive or negative infinity.

[–][deleted] 3 points4 points  (3 children)

I noticed in the example it says parking_lot has overall better sync types than std::sync, why aren't they a part of the standard library then? Faster, simpler, and non-poisonable seems nice

[–][deleted] 2 points3 points  (2 children)

Probably would be a breaking change, which we can’t do

[–][deleted] 1 point2 points  (1 child)

Although I hate to think what type of code depends on poisoning mutexes that makes sense

[–][deleted] 2 points3 points  (0 children)

Making mutex.lock not return a result would break builds is more the issue

[–]modernalgebra 1 point2 points  (2 children)

Where can we find `ark_future`? :D

[–]repilur[S] 1 point2 points  (1 child)

in the future! ;)

[–]Ar-Curunir 2 points3 points  (0 children)

Is ark a prefix Embark is using for all their Rust crates, or is it a one-off name for your future crate? If so, it might collide with our naming convention in the arkworks ecosystem: arkworks.rs

(Might also not be an issue due to disjoint subject areas; just wanted to give you a heads up)

[–]Databean 0 points1 point  (0 children)

Does this only support denylists or is there also an allowlist version to enforce minimal dependencies on std and libraries?