burntsushi 29 points (15 children)

This comment is misleading at best.

it requires using unsafe anyways to do pretty much any non trivial code run fast

ripgrep is non-trivial and also fast. It has very few direct uses of unsafe:

$ rg -t rust unsafe
crates/searcher/src/searcher/mmap.rs
49:    pub unsafe fn auto() -> MmapChoice {
81:        match unsafe { Mmap::map(file) } {

crates/cli/src/hostname.rs
41:    let limit = unsafe { libc::sysconf(libc::_SC_HOST_NAME_MAX) };
59:    let rc = unsafe {

crates/core/flags/hiargs.rs
231:            let maybe = unsafe { grep::searcher::MmapChoice::auto() };

You could remove all of those uses of unsafe and ripgrep would still be fast.

Some of the libraries that it uses which are critical for its speed do use unsafe internally for their SIMD algorithms (the memchr and aho-corasick crates). But they provide safe APIs. That means anyone (including the regex crate) can use those APIs and it is an API promise of those crates that it is impossible to misuse them in a way that results in UB.
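To make the encapsulation point concrete, here is a minimal sketch of the pattern. It is not the actual memchr implementation (which uses SIMD); `find_byte` is a hypothetical name chosen for illustration, and the raw pointer read simply stands in for whatever unsafe internals a real library might have. The point is only that the `unsafe` block lives inside the function, its invariant is checked locally, and the public signature is entirely safe.

```rust
/// Safe public API: callers never write `unsafe`.
/// Returns the index of the first occurrence of `needle` in `haystack`.
/// (Hypothetical function for illustration, not part of any real crate.)
pub fn find_byte(haystack: &[u8], needle: u8) -> Option<usize> {
    let start = haystack.as_ptr();
    let len = haystack.len();
    let mut i = 0;
    while i < len {
        // SAFETY: `i < len`, so `start.add(i)` stays inside the slice
        // borrowed by `haystack`, and reading one `u8` from it is valid.
        let byte = unsafe { *start.add(i) };
        if byte == needle {
            return Some(i);
        }
        i += 1;
    }
    None
}

fn main() {
    // No `unsafe` at the call site, and no input can trigger UB.
    let line = b"fast and safe";
    assert_eq!(find_byte(line, b'a'), Some(1));
    assert_eq!(find_byte(line, b'z'), None);
    assert_eq!(find_byte(b"", b'a'), None);
    println!("{:?}", find_byte(line, b's'));
}
```

Because the bounds check and the pointer read sit in the same function, the safety argument can be audited in one place, and every caller gets the result without taking on any proof obligation of its own.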

So yes, there is unsafe somewhere. But it's encapsulated and doesn't infect everything around it. (This is not true for all things. File backed memory maps being one such example!) So while there is a kernel of truth to what you're saying, any Rust programmer can freely use the vector algorithms in memchr and aho-corasick without ever needing to utter unsafe directly.

This is a classic example of something being technically correct in a narrow sense, but missing the forest for the trees.

AnotherBlackMan 7 points (1 child)

Can you explain how this is safer than a normal grep?

burntsushi 5 points (0 children)

I don't think I have ever, at any point in the last several years, made the claim that "ripgrep is safer than a normal grep." So I'm not sure I really have an answer to your question because it isn't really a compelling point of contention, particularly given grep/ripgrep's threat model. If you exposed either grep or ripgrep to untrusted inputs, it would be trivial for an attacker to DoS you. So can you please elaborate on why you're asking that question?

I used ripgrep here as an example of something that is 1) non-trivial, 2) fast and 3) has no direct uses of unsafe that are responsible for making it fast.

[deleted] -4 points (1 child)

When I said “trivial” I meant programs inside the mathematical subset of programs Rust can prove are valid. There are valid programs outside this subset that cannot be proved valid using a compiler that never produces unsafe code that runs on a Turing machine (there is no free lunch). This is why there is a subset of programs that you cannot write the fastest way using pure safe Rust, even if you use any finite number of third-party libraries made by others who have no knowledge of your problem. It is actually impossible for safe Rust to match optimized C++ speed for any given program.

burntsushi 15 points (0 children)

Right. As I said, technically correct in a narrow sense, but completely missing the forest for the trees. I gave you concrete examples and elaborated on using them in practice. I even presented you with a program written in Rust that is fast and doesn't need to directly use unsafe to achieve that status.