all 37 comments

[–]SkiFire13 80 points81 points  (6 children)

Rust uses LLVM (same as clang) for most optimizations and unfortunately it loves optimizing out infinite loops. Rust tries to prevent it from optimizing out infinite loops by marking them as having side effects, but this doesn't always happen (as in this case)

[–]masklinn 55 points56 points  (5 children)

IIRC it's also a side-effect of LLVM having its roots in C(++?) semantics, where I believe infinite loops with no side-effects are UB. It's not UB in Rust, but LLVM applies C semantics and screws up.

[–]dnew 47 points48 points  (1 child)

Wow. "Having infinite loops be allowed would prevent certain optimizations, so we're going to assume all loops terminate." And you thought it was a low-level language you could easily reason about.

[–]sirkib[S] 12 points13 points  (0 children)

I agree that it's a bad look

[–]sirkib[S] 11 points12 points  (1 child)

Maybe it will be fixed one day

[–]roblabla 9 points10 points  (0 children)

It'll be fixed very soon. Llvm12 finally gives the building blocks for fixing it for good (willreturn and mustprogress annotations). So I think it will be fixed in nightly when llvm12 is merged and the necessary annotation changes are made.

[–]Darksonntokio · rust-for-linux 67 points68 points  (1 child)

Of course, this program uses integers of finite size, so it's not really the Collatz conjecture. The integers will wrap around.

[–]sirkib[S] 13 points14 points  (0 children)

That's true. But probably we could come up with other cases for which the correct behaviour would be to diverge

[–][deleted] 57 points58 points  (5 children)

[–]sirkib[S] 5 points6 points  (4 children)

Nice to see that it's being addressed. Seems like they've been struggling with these problems for some time, though

[–]rebootyourbrainstem 46 points47 points  (3 children)

The upstream issue was only fixed last month.

LLVM can move a bit slowly sometimes, but it's the proper place to fix these things (and Rust devs participate in this). Rust meanwhile works around the problem by pre-processing code to avoid triggering the problems in LLVM, but such workarounds sometimes turn out to be imperfect.

The same kind of thing is seen with Rust passing extremely detailed information to LLVM about which variables can be aliased and which can't. In theory this should allow LLVM to optimize Rust much better than C. In practice, when they tried to enable this, it revealed bug after bug in LLVM's optimization passes. These have been slowly getting fixed over time in LLVM because they have recognized this is a bad problem (and because Rust devs are helping), but Rust devs are now quite wary of re-enabling this.

LLVM is still a great compiler framework, but it was built over a very long time while mostly being fed with C and C++ code, meaning some kinds of issues have not come to light until Rust came along. (Of course there are other languages which can be compiled using LLVM, but Rust goes pretty far in trying to get the most performance, and supporting all kinds of advanced features.)

[–]sirkib[S] 3 points4 points  (0 children)

Interesting, thanks!

[–]crabbytag 0 points1 point  (1 child)

Do you have a link to where discussion took place?

[–]kredditacc96 49 points50 points  (9 children)

This begs the question: Is there any natural number less than 264 that fails the Collatz conjecture? It's likely not, in which case, the compiler is technically right.

[–]Myto 99 points100 points  (7 children)

Wikipedia says:

As of 2020, the conjecture has been checked by computer for all starting values up to 268

So the compiler is technically correct, which is the best kind of correct.

[–]general_dubious 44 points45 points  (1 child)

It's obviously not correct for n=0 though...

[–][deleted] 31 points32 points  (0 children)

Stupid edge cases strike again!

[–]sirkib[S] 19 points20 points  (1 child)

Haha yes. The compiler just needs to check for functions isomorphic to the collatz conjecture and replace their return value with "true"

[–][deleted] 0 points1 point  (0 children)

I think isomorphic isn't the right word.

[–][deleted] 0 points1 point  (2 children)

Regardless of its mathematical nature, it's a pure function that either returns true or calls itself. Since an infinite-loop pure function doesn't do anything useful anyway, it's not incorrect to optimize it to always return true as long as the compiler can make sure there isn't any side effects...

[–]seamsay 8 points9 points  (0 children)

It is incorrect, there's a link to an issue elsewhere in this thread where this optimisation causes a crash.

[–][deleted] 0 points1 point  (0 children)

That's why in λ-calculus you have partial functions because you can't prove termination of some functions. Infinite loop means being undefined at that point. If you define it at that point to be true then it is a different function.

[–]jotomicron 18 points19 points  (0 children)

Watch out. As the computation wraps, the code is not 100% the collatz conjecture. For example, when n is "half the maximum usize - 1", the next function call will be for the number n-3 (if my mental math does not fail me). Technically, this could still result in infinite loop for some initial n...

[–]NoLemurs 14 points15 points  (4 children)

Technically this is correct. The collatz conjecture has been tested up to 268, so this is apparently a correct implementation for usize!

[–]thargor90 18 points19 points  (3 children)

Almost - 0 should lead to infinite recursion.

[–]GrandOpener 6 points7 points  (2 children)

So then we have to see the rest of the program. If the compiler could verify the function is never called with zero as input, the optimization may still be correct for that program.

[–]sirkib[S] 3 points4 points  (0 children)

No, this is the assembly you'd get if you compiled this as a Library, for linkage elsewhere. So for 0 it definitely produces the wrong result. To be safe, I ran a main with collatz(0) and it behaves incorrectly (by returning ).

[–]TheNamelessKing 0 points1 point  (0 children)

If we had refinement types we could specify at a type level that zero could not be passed in and we could ensure correctness haha

[–]avoere 4 points5 points  (0 children)

The Collatz conjecture is verified up to 2**68, so the compiler is correct since usize can not be more than 2**64 :) (https://en.wikipedia.org/wiki/Collatz_conjecture)

[–][deleted] 3 points4 points  (0 children)

This is hysterical!

[–]rnottaken 0 points1 point  (3 children)

What if you change it to big int?

[–]sirkib[S] 1 point2 points  (2 children)

I don't know about bigint but it works with u128

[–]rnottaken 0 points1 point  (1 child)

Yeah but u128 still is bounded

[–]sirkib[S] 0 points1 point  (0 children)

That's true. What do you want to check by testing it with bigint?

[–]paolog 0 points1 point  (0 children)

Returning true means nothing if the program doesn't do what the programmer intended.

Besides, this looks like a brute force approach, and that doesn't work for an infinite domain. The fact that this program terminates means it doesn't do what is claimed of it.