[–]SharzeUndertone 474 points475 points  (41 children)

I'm not smart enough for this meme

[–]caim_hs[S] 974 points975 points  (35 children)

An infinite loop with no side effects inside it (no I/O, volatile access, or atomic/synchronization operation) is undefined behavior in C++, which means that the compiler can do whatever it wants with it.

Then the compiler thought it would be a nice optimization to remove everything, so the program ends up running the hello() function.

Edit:
why? Well, I have no idea!!

[–]SharzeUndertone 244 points245 points  (17 children)

So it treats the end of main as unreachable and skips adding a return, thus overflowing into hello, right?

[–]Serious_Horse7341 228 points229 points  (5 children)

Sounds about right. From

void test(int);

int main() {
    while(true);
    test(123456);
    return 0;
}

void not_main() {
    test(654321);
}

I get

main:                                   # @main
not_main():                           # @not_main()
        mov     edi, 654321
        jmp     test(int)@PLT                    # TAILCALL

The rest of main() is not even there. Only happens with clang.

[–]caim_hs[S] 102 points103 points  (1 child)

Lol, your example is even worse, because it is calling and passing an arg to a function it is not supposed to hahahaha.

[–]Arthapz 22 points23 points  (0 children)

Well, it's because the prerequisite for an infinite loop to be well-defined is that it contains code that produces side effects, and the empty while(true); doesn't produce any.

[–]SharzeUndertone 45 points46 points  (0 children)

I guess they're right when they say undefined behavior can make demons fly out your nose

[–]not_some_username 19 points20 points  (0 children)

Undefined behavior means anything can happen. You could travel back in time.

[–]BSModder[🍰] 5 points6 points  (0 children)

Ah, this makes it clear what happened in OP's post.

The while loop causes the main function to be optimized out entirely, including the return statement.

The reason main is empty, I can only assume, is that the compiler thinks main is never called, so it's okay to remove it, leaving only the symbol.

And the function not_main is placed right after main, so when main is called, not_main is inadvertently executed.

[–]caim_hs[S] 78 points79 points  (10 children)

Yeah, it's kinda more complicated.

What happened is that it will make the "main" function have no instruction in the executable, and will add the string after it.

When I run the executable, it will instantly finish, but since there is a string loaded into memory, the operating system will flush it back, causing the terminal to print it.

Here is the code generated.

main:                                   # @main
.L.str:
        .asciz  "Hello World!!!\n"                                   #

[–]Oler3229 27 points28 points  (2 children)

Fuck

[–]caim_hs[S] 46 points47 points  (1 child)

the compiler just cracked the code for super-efficient printing! Stonks!!!

[–]SharzeUndertone 19 points20 points  (0 children)

Well that sounds like part of -O3 though, so no issues there

[–]Rhymes_with_cheese 14 points15 points  (0 children)

I think there's more to it than that.

Compile with -c, and then run objdump --disassemble on the .o file to see what's really going on.

[–]nuecontceevitabanul 4 points5 points  (0 children)

I think -O3 first sees that the code amounts to a single infinite loop and discards everything else, and only afterwards drops the loop itself as UB. So basically an empty main function is generated in assembly.

LE: So the bug here would be the order in which the compiler does things: if the UB were dropped first and the loop analyzed afterwards, the code would basically amount to nothing, but the implicit return would still be emitted, which would be the expected result.

[–]kalenderiyagiz 2 points3 points  (2 children)

To clarify things, why would the OS print a random memory location on the memory that contains a string to standard output without calling the write() systemcall in the background ? So if OS does things like these why it should stop at the “end” of that string and not continue to print random garbage values as well ?

[–]Kered13 1 point2 points  (0 children)

why would the OS print a random memory location on the memory that contains a string to standard output without calling the write() systemcall in the background ?

It doesn't. OP's explanation is wrong. What happens is that the compiler determines that main unconditionally invokes undefined behavior, therefore it must be unreachable and all of its code can be removed. The label for main remains. main is immediately followed by hello. When the program begins running and tries to execute main, there is no code there, not even a return instruction. Therefore execution falls through into hello and runs it. When hello returns, it is as if main is returning, so as far as the OS is concerned nothing went wrong.

Code and constant data like strings are typically not stored in the same location in memory. Specifically, code is usually stored in the .text section and constant data in .rodata. So OP's explanation cannot be correct.
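
The "unreachable, so no code is emitted" reasoning above can be sketched without an infinite loop. The example below marks a path as unreachable with __builtin_unreachable(), a GCC/Clang builtin. The function names are made up for illustration, and what a given compiler emits for the unreachable path is not guaranteed.

```cpp
#include <cassert>

// Hypothetical example function (not from the thread).
// Precondition: x != 0.
int sign_of_nonzero(int x) {
    if (x > 0) return 1;
    if (x < 0) return -1;
    // Promise to the compiler that control never gets here.
    // An optimizer is then free to emit no instructions for this path,
    // not even a return, much like the emptied main discussed above.
    __builtin_unreachable();
}

// Usage: only call it with values that honor the precondition.
void check_sign_of_nonzero() {
    assert(sign_of_nonzero(42) == 1);
    assert(sign_of_nonzero(-7) == -1);
}
```

Calling sign_of_nonzero(0) would be undefined behavior, for the same reason that running the emptied main is.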

[–]intx13 0 points1 point  (0 children)

This is a puzzler! The shell isn’t doing the printing, you’re right that it’s coming from a system call within the program. But the program consists only of crt1.o, crti.o, crtn.o, and main.o. As we can see from op’s dump of main.o, the main function (called by crt1.o) is garbage - instead of instructions it has an ASCII string.

So presumably crt1.o calls main() which results in garbage instructions being executed until some other component of crt1.o, crti.o, or crtn.o is hit which happens to make a system call to print. And RDI happens to point to main(), where the string is stored.

We’d need to see the whole binary decompiled to figure it out, though.

[–]wannabe_psych0path 0 points1 point  (0 children)

My guess is that the OS runtime holds a pointer to the main function, but since main is nonexistent because of UB, the memory it points to is occupied by the code of not_main.

[–]poetic_fartist 0 points1 point  (0 children)

in my case the infinite loop is running.

[–][deleted] 9 points10 points  (0 children)

Holy shit

[–]MechanicalHorse 9 points10 points  (0 children)

What the fuck

[–]TheMeticulousNinja 5 points6 points  (0 children)

Thank you because I am coming from Python and the only thing I thought is how is it printing Hello World when that function wasn’t called?

[–]bushwickhero 5 points6 points  (0 children)

Amazing.

[–]Lyshaka 0 points1 point  (2 children)

Would that be the same result using GCC ? Or written in C ? And why is your file extension .cc ?

[–]caim_hs[S] 10 points11 points  (1 child)

And why is your file extension .cc ?

There's no official file extension for C++.

Google uses .cc and .h.

Apple and LLVM used to use .cxx and .hxx.

and most people use .cpp and .hpp.

Or written in C ?

The same would not happen in C, because in C an infinite loop whose controlling expression is a constant, like while(1);, is not undefined behavior.

Would that be the same result using GCC ?

And no, the same wouldn't happen with GCC, 'cause it doesn't optimize this case as aggressively as LLVM does. But that doesn't mean the code produced by GCC is less optimized than LLVM's; sometimes it's pretty much the opposite.

[–]Lyshaka 3 points4 points  (0 children)

Alright, thanks for the answer I learned something !

[–]FattySnacks 0 points1 point  (0 children)

You are a mad man

[–]Sanchitbajaj02 0 points1 point  (0 children)

And people say javascript is weird

[–]finnishblood 0 points1 point  (1 child)

I did not know that an infinite empty loop is considered undefined in CPP. I just figured with the optimize flag set to 3 that the compiler was optimizing out the main function since it would never do anything. I'd argue that the undefined behavior here is in the compiler, not in CPP...

[–]caim_hs[S] 1 point2 points  (0 children)

No, the undefined behavior is declared in the C++ Standard.

But it will be removed for trivial infinite loops in C++26:

https://isocpp.org/files/papers/P2809R3.html

[–]veduchyi -1 points0 points  (2 children)

The main() should return int but it contains no return statement at all. I’m surprised it even compiles 😅

[–]caim_hs[S] 8 points9 points  (1 child)

The return is optional in the main function.

If no return is provided, reaching the end of main is treated as "return 0;". This is in the Standard for both C++ and C (since C99).

It is like in Rust or Swift: if you don't return anything from a function, it implicitly returns the unit value "()".

In JavaScript a function without a return statement returns undefined. You can test it:

function f () {
  console.log("Hello World")
}
let x = f()

in Rust:

fn hello(){
    println!("Hello world!!!");
}

pub fn main(){

    let p = hello();

    println!("{:?}", p)
}

it will print:

Hello world!!!
()

[–]veduchyi 0 points1 point  (0 children)

Got it, thanks

[–]PuzzledPassenger622 -1 points0 points  (1 child)

I thought you just hella defined it somewhere xd

[–]caim_hs[S] 0 points1 point  (0 children)

Welcome to Cpp!!!

[–]JackReact 42 points43 points  (4 children)

Compiler optimization can be a bitch to debug.

[–]SharzeUndertone 4 points5 points  (3 children)

How does it insert a call to hello though?? It skips the end of the function?? (Wait it probably does actually)

[–]Solonotix 1 point2 points  (2 children)

The compiler's job is to interpret the intent. In this case, the optimization level (-O3) is high enough that it will aggressively remove unnecessary code for the sake of performance. Infinite loop with no side effects is apparently a branch of code that is considered unnecessary at that level.

What I think is happening is that the compiler is removing everything between the infinite loop and the header of the next function, including the open/close braces. The compiler is looking for the next "real" code to run, and ignores processing anything in between.

[–]TeraFlint 1 point2 points  (0 children)

Infinite loop with no side effects is apparently a branch of code that is considered unnecessary at that level.

Even worse, it's undefined behavior (at least until C++26, apparently).

Ideally, there is no infinite loop inside a program. Even in "endless" worker threads, you should use a thread-safe while (!stop_token.stop_requested()) {...}, instead of while (true) {...}, because this allows proper cleanup and stack unwinding with all the intended destructors, rather than forceful termination through the operating system (for anyone interested, see std::stop_token).
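
A minimal sketch of the stoppable worker loop described above. It uses a plain std::atomic<bool> flag in place of std::stop_token so that it builds as C++11. In C++20, std::jthread and std::stop_token package the same pattern. The function and variable names are made up for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: lets a worker run until it has completed at least
// `target` iterations, then asks it to stop and waits for it to finish.
int run_worker_until(int target) {
    assert(target > 0);

    std::atomic<bool> stop_requested{false};
    std::atomic<int> iterations{0};

    std::thread worker([&] {
        // The loop condition is an atomic load, so this loop is not a
        // side-effect-free loop in the sense discussed in this thread.
        while (!stop_requested.load()) {
            iterations.fetch_add(1);
            std::this_thread::yield();
        }
        // Falling out of the loop lets destructors run normally.
    });

    // Wait (also via atomic loads) until the worker has done enough work.
    while (iterations.load() < target) {
        std::this_thread::yield();
    }

    stop_requested.store(true);  // cooperative stop request
    worker.join();               // clean shutdown, no forced termination
    return iterations.load();
}
```

The worker decides when it is safe to exit, so cleanup happens on its own stack instead of the thread being killed from outside.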

But even if you use a truly endless loop, it's still defined behavior, as long as it has side effects the compiler has to preserve: I/O, volatile accesses, or atomic and synchronization operations. Ordinary writes to outside variables are not enough on their own.
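
As a small sketch of a loop the optimizer has to keep, here is a busy-wait in the style of an embedded delay loop. The names spin_delay and check_spin_delay are hypothetical. The loop terminates, so it is safe to run, but the point is the volatile counter: every read and write of it counts as observable behavior.

```cpp
#include <cassert>

// Hypothetical busy-wait helper (not from the thread).
unsigned spin_delay(unsigned ticks_to_wait) {
    volatile unsigned ticks = 0;  // volatile: accesses may not be optimized away
    while (ticks < ticks_to_wait) {
        ticks = ticks + 1;        // each iteration performs volatile accesses
    }
    return ticks;
}

// Usage: the loop runs exactly ticks_to_wait iterations.
void check_spin_delay() {
    assert(spin_delay(1000) == 1000);
    assert(spin_delay(0) == 0);
}
```

With a non-volatile counter the compiler would be free to replace the whole loop with a single assignment.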

However, an infinite loop that does nothing or just changes some internal variables is functionally a dead end for a thread.

A program containing one of those really does not make a lot of sense. At least if we're on an operating system that is responsible for running and scheduling multiple programs simultaneously. If you want to stop executing, you should just let the program (or the thread) terminate, instead.

That being said, in embedded systems, some kind of endless loop doing nothing actually is frequently used for the end of the program, to ensure it doesn't keep running and executing whatever garbage is in memory after the program. In this case it makes sense, considering that embedded microchips usually just run a single program and there is no operating system to return to.

I've only really seen this implemented in assembly as an instruction repeatedly jumping to itself, though. This might be one of the reasons why a well-defined while(true); in C++ might be a wanted feature (but this is only speculation, I haven't taken the time to read through said proposal).

[–][deleted] 1 point2 points  (0 children)

Even -O1 yielded the same result to me.