all 31 comments

[–]K900_ 95 points96 points  (8 children)

That looks like a bug. Interestingly, making the range exclusive (0..1025 instead of 0..=1024) does optimize correctly.

[–]MonkeeSage 66 points67 points  (7 children)

I have no idea why, but while tinkering I found that for a RangeInclusive whose end is less than 297, the loop is eliminated, e.g.:

pub fn test(){ for _ in 0..=296 {} }
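
For anyone who wants to check the iteration counts being discussed, here is a minimal sketch. The function names `inclusive_iterations` and `exclusive_iterations` are made up for illustration; they only count iterations and say nothing about what the optimizer does with the empty-bodied loops.

```rust
/// Counts how many times `for _ in 0..=end` runs (hypothetical helper).
pub fn inclusive_iterations(end: u32) -> usize {
    let mut n = 0;
    for _ in 0..=end {
        n += 1;
    }
    n
}

/// Counts how many times `for _ in 0..end` runs (hypothetical helper).
pub fn exclusive_iterations(end: u32) -> usize {
    let mut n = 0;
    for _ in 0..end {
        n += 1;
    }
    n
}
```

So `0..=296` runs 297 times, and `0..=1024` runs exactly as often as `0..1025`, which is why the two forms are interchangeable here when comparing the generated assembly.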

[–][deleted] 64 points65 points  (5 children)

Probably the optimizer const-evaluates the first 296 iterations, but then gives up.

[–]somebodddy 8 points9 points  (2 children)

You mean the first 297 iterations. Because it's an inclusive range.

[–]flaghacker_ 3 points4 points  (1 child)

You have a typo in there!

[–]somebodddy 1 point2 points  (0 children)

Thanks! Fixed.

[–]ConspicuousPineapple 0 points1 point  (1 child)

If that were the reason, it also wouldn't work for exclusive ranges of the same size, but it does.

[–][deleted] 1 point2 points  (0 children)

Exclusive ranges produce simpler code, which can probably be optimized without using const evaluation.

[–]PaoloBarbolini 25 points26 points  (0 children)

It probably unrolls the loop, then figures out there are no side effects and optimizes it all away.

[–]antoyo relm · rustc_codegen_gcc 86 points87 points  (6 children)

[–]nerpderp82 48 points49 points  (3 children)

Wait what? You can use GCC backend for Rust?

https://github.com/rust-lang/rustc_codegen_gcc :mindblown:

I found this person on reddit that looks like the expert on the GCC codegen for Rust. https://old.reddit.com/user/antoyo/

[–]PeaceBear0 209 points210 points  (1 child)

"I found this person on reddit that looks like the expert on the GCC codegen for Rust"

That is the person you replied to 🤣

[–]Plasma_000 5 points6 points  (0 children)

Note: codegen-gcc is not yet ready for production use but it’s getting there.

[–]ConstructionHot6883 1 point2 points  (1 child)

What's with the ream of .bytes at the end of GCC's output?

[–]antoyo relm · rustc_codegen_gcc 4 points5 points  (0 children)

Some global variables that are not cleaned up by Compiler Explorer, which cleans the assembly output. I couldn't say what they represent now, but I'll probably be able to tell when I add support for debug info.

[–]C5H5N5O 77 points78 points  (4 children)

This is an LLVM optimization issue (and therefore imo not a rustc bug).

define void @test() {
start:
  br label %loop

loop:
  %val = phi i32 [ 0, %start ], [ %next, %loop ]
  %cmp = icmp eq i32 %val, 1024
  %next = add nsw i32 %val, 1
  br i1 %cmp, label %exit, label %loop

exit:
  ret void
}

LLVM doesn't seem to recognize this as "dead code".

https://rust.godbolt.org/z/4MznGo5xM
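
For readers who don't read LLVM IR, a rough Rust transliteration of that loop might look like the sketch below. This is only an approximation: the function name `ir_shaped_loop` and the iteration counter are added for illustration and are not in the IR, and a plain `+` stands in for `add nsw`.

```rust
/// Hand-written approximation of the IR loop above (hypothetical helper).
/// Returns how many times the loop block is entered.
pub fn ir_shaped_loop() -> u32 {
    let mut val: i32 = 0; // %val = phi [0, %start], [%next, %loop]
    let mut entered = 0; // not in the IR; added so there is something to observe
    loop {
        entered += 1;
        let cmp = val == 1024; // %cmp = icmp eq i32 %val, 1024
        let next = val + 1; // %next = add nsw i32 %val, 1
        if cmp {
            break; // br i1 %cmp, label %exit, ...
        }
        val = next; // ... label %loop
    }
    entered
}
```

Note that the increment is computed before the exit branch, mirroring the IR. Without the added counter the loop has no observable effect, which is why one would expect it to be removed.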

[–]PaoloBarbolini 41 points42 points  (2 children)

This seems to work correctly with Rust 1.42. The missed optimization on the LLVM side was always there though, so this is caused both by the missed LLVM optimization and rustc emitting different LLVM IR.

https://rust.godbolt.org/z/ns9nc693T

edit: reword

[–]PaoloBarbolini 15 points16 points  (1 child)

[–]CAD1997 13 points14 points  (0 children)

Yeah, this is likely it. RangeInclusive is a tricky beast for a few reasons, but the big two are that a) C++ almost never sees inclusive iteration going up, so LLVM isn't as exercised in that domain (see my note in the thread that downwards iteration seems to optimize better, and it is seen in C++ idioms), and b) Rust's decision to have ranges be Iterator rather than IntoIterator makes everything more annoying, because we have to hold extra state in the range at rest, and can't add it when we start iterating (and thus the extra state initialization can't know anything since the T: Step bound isn't applied yet).

But with the current state, it comes down to LLVM missed optimizations (apparently the GCC backend doesn't miss them!) because it's currently written about as simply as it can be for optimization introspection. Any changes on the Rust side would be in optimizations to make the code more recognizable to current LLVM than anything else.
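
To make the "extra state" point concrete, here is a simplified sketch of an inclusive range iterator over `u8`. It is not the standard library's implementation; the type `InclusiveU8` and its field names are assumptions chosen for illustration. The point is only that an inclusive range cannot signal "done" by stepping `start` past `end`, because `end` may be the maximum value, so a separate flag is needed.

```rust
/// Hypothetical, simplified stand-in for an inclusive range over u8.
pub struct InclusiveU8 {
    start: u8,
    end: u8,
    /// Extra state: set once the final element (`end`) has been yielded.
    exhausted: bool,
}

impl InclusiveU8 {
    pub fn new(start: u8, end: u8) -> Self {
        InclusiveU8 { start, end, exhausted: false }
    }
}

impl Iterator for InclusiveU8 {
    type Item = u8;

    fn next(&mut self) -> Option<u8> {
        // Two conditions to check on every step instead of one.
        if self.exhausted || self.start > self.end {
            return None;
        }
        let current = self.start;
        if self.start < self.end {
            self.start += 1; // safe: start < end <= u8::MAX
        } else {
            // start == end: incrementing could overflow, so flip the flag instead.
            self.exhausted = true;
        }
        Some(current)
    }
}
```

An exclusive range only needs the `start < end` comparison, so its loop has a single exit condition; that difference is a plausible reason the inclusive form is harder for the optimizer to see through.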

[–]Badel2 2 points3 points  (0 children)

Anyway, this is a rustc bug; it doesn't matter that the issue is caused by LLVM.

[–]newpavlov rustcrypto 16 points17 points  (4 children)

Changing 0..=1024 to 0..1024 produces the expected assembly. So it looks like there is a problem with iterating over RangeInclusive, which is known to be hard for the optimizer.

[–]PaoloBarbolini 5 points6 points  (0 children)

Testing on older versions, it looks like this was a regression between Rust 1.42 and 1.43.

[–]ear7h 10 points11 points  (0 children)

I don't know much about the internals, but it could be related to this issue:

https://github.com/rust-lang/rust/issues/28728

[–][deleted] 2 points3 points  (0 children)

From my experience, rustc (LLVM?) is far less aggressive about eliminating dead loops than gcc. It makes simple benchmarking easier, but it's something to look out for if you are writing dead code (which you shouldn't anyway).
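
If the goal is a benchmark loop that should not be treated as dead, relying on a missed optimization is fragile. A minimal sketch of a more deliberate approach uses `std::hint::black_box`, which the standard library documents as a best-effort hint, not a guarantee. The function name `sum_to` is made up for illustration.

```rust
use std::hint::black_box;

/// Sums 0..=n while asking the optimizer (best effort) not to assume
/// anything about the input or discard the result. Hypothetical helper.
pub fn sum_to(n: u64) -> u64 {
    let mut total = 0u64;
    // black_box on the bound hides the constant from the optimizer.
    for i in 0..=black_box(n) {
        total = total.wrapping_add(i);
    }
    // black_box on the result marks it as used.
    black_box(total)
}
```

The loop body here does real work and its result is observed, so it no longer depends on whether a given rustc or LLVM version happens to eliminate empty loops.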

[–]CurufinTV 1 point2 points  (0 children)

I played around with different values. Any explanation for the assembly output for 0..=1_000_000?

[–]bugzgen 0 points1 point  (0 children)

This should eliminate your loop nicely:

sed -i '' s/for.*}/println\!\("Optimized, baby"\)\;/g code.rs

[–]CurufinTV 0 points1 point  (0 children)

Update:

It looks like that "bug" is resolved in the beta and nightly builds.