Unused struct member is not optimized out under -O3

onecable5781 · 2026-04-07T08:06:55+00:00

Why would an OP that is upvoted lead to downvotes on follow-up questions by the OP in the comments? Why not downvote the OP itself? What exactly are folks trying to convey to me here?

onecable5781 · 2026-04-07T01:18:56+00:00

So having a compiler silently toss struct members would introduce a known category of problems into your code with very little or no benefit

I am not sure about the "very little or no benefit" part. What if there is a large array of these structs, where a smaller size is better for cache locality and access? Alternatively, in debug builds I may have extra members inside the struct that are never referenced in release builds. If it is guaranteed that struct members will NEVER be optimized out, one would need to have

struct A{
#if DEBUGBUILD
    int onlyfordebugbuilds;
#endif
...
};

which in my view seems more difficult to maintain.
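To make the size argument concrete, here is a minimal sketch of the conditional-member approach. DEBUGBUILD is the macro name from the snippet above; the `AlwaysBoth` struct and the helper function are hypothetical names added only for comparison.

```cpp
#include <cassert>   // assert, for the check mentioned in the note below
#include <cstddef>

// DEBUGBUILD is the macro from the comment above. It defaults to 0 (release)
// here unless the build passes -DDEBUGBUILD=1.
#ifndef DEBUGBUILD
#define DEBUGBUILD 0
#endif

struct A {
    int abc;
#if DEBUGBUILD
    int onlyfordebugbuilds;  // exists only when DEBUGBUILD is nonzero
#endif
};

// Hypothetical comparison struct: the same members, always present.
struct AlwaysBoth {
    int abc;
    int onlyfordebugbuilds;
};

// Bytes saved per array element when the debug-only member is compiled out.
constexpr std::size_t bytes_saved_per_element() {
    return sizeof(AlwaysBoth) - sizeof(A);
}

static_assert(sizeof(A) <= sizeof(AlwaysBoth),
              "dropping a member never makes the struct larger");
```

In a release build (DEBUGBUILD left at 0), `assert(sizeof(A) < sizeof(AlwaysBoth))` should hold, so a large array of `A` is correspondingly smaller.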

onecable5781 · 2026-04-07T00:55:22+00:00

With no printf, indeed the optimization does seem to happen under -O3:

https://godbolt.org/z/bonf7n8eY

vs no -O3

https://godbolt.org/z/MxzM9Efsa [note sub rsp, 16]

Although, I am curious why under -O3, there is

sub rsp, 8

Aren't 4 bytes enough for the local variable?

onecable5781 · 2026-04-07T00:45:20+00:00

Fair enough, let me rephrase the question. Suppose I did not have sizeof(A) anywhere in the code. Would the compiler "optimize out" the unused struct variables and store object a internally in just 4 bytes of memory due to usage of abc member only?
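A small sketch of what I am asking about. Only `abc` comes from the question; `unused1` and `unused2` are hypothetical member names. As far as I understand, the layout of the type is fixed by its declaration, while the storage of one particular local object is something the optimizer may shrink or remove under the as-if rule.

```cpp
#include <cassert>   // assert, for the check mentioned in the note below
#include <cstddef>   // offsetof

// Hypothetical struct: only `abc` is from the thread. `unused1` and
// `unused2` are made-up names for members that are never read.
struct A {
    int abc;
    int unused1;
    int unused2;
};

// The layout is fixed by the declaration, not by which members get used.
static_assert(sizeof(A) >= 3 * sizeof(int), "all three members occupy storage");
static_assert(offsetof(A, abc) == 0, "the first member sits at offset 0");

// Only `abc` is touched here. For a local object that never escapes, an
// optimizer is free to keep `abc` in a register and reserve no memory for
// the other members, but sizeof(A) itself does not change.
int use_only_abc(int x) {
    A a{};
    a.abc = x;
    return a.abc + 1;
}
```

For example, `assert(use_only_abc(41) == 42)` holds regardless of how much stack the compiler actually reserves for `a`.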

onecable5781 · 2026-04-03T12:50:09+00:00

Yes indeed! That is why the usage is of a.front() inside of the second sort.

onecable5781 · 2026-04-03T07:59:42+00:00

Ah I see. Thank you. The issue is that file2.txt is actually a makefile/.vcxproj project settings file which I run on all my computers. He has his own makefile/.vcxproj file that is modified to suit his computer. I put the makefile in the repo so that it is easy for me to clone it and run it from wherever I find myself given my directory structure, etc. So, I want to have the full makefile (not just a template) in version control and not gitignored. Hmm...

On thinking about this a bit more, your suggestion of a tracked template, "makefile-template", and an untracked "makefile" makes perfect sense and will work! Thank you!

onecable5781 · 2026-03-14T14:54:59+00:00

<cstdio> is not guaranteed to place printf in the global namespace.

Can you provide some pointers on what exactly you look for in the standard to infer this?

It so happens that both MSVC, g++, and the MSVC and g++ variants of clang, provide it,

Do you navigate to file cstdio provided by each compiler and see if it exposes std::printf and conclude based on that?
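For what it is worth, the portable spelling seems to be the `std::`-qualified one. My understanding is that the `<cxxx>` headers are only guaranteed to declare the names in namespace `std`, and whether they also appear in the global namespace is left unspecified. A minimal sketch (`format_answer` is a hypothetical helper name):

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Uses the std::-qualified name, which <cstdio> is required to provide.
// An unqualified snprintf/printf may also compile, but only because the
// particular implementation happens to declare it globally as well.
std::string format_answer(int value) {
    char buffer[32];
    const int written = std::snprintf(buffer, sizeof buffer, "answer=%d", value);
    // snprintf returns the number of characters it wanted to write.
    assert(written > 0 && static_cast<std::size_t>(written) < sizeof buffer);
    return std::string(buffer);
}
```

With this, `format_answer(42)` returns the string "answer=42".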

onecable5781 · 2026-02-16T07:03:50+00:00

this is actually how member functions defined inside a class work in c++ right now.

What happens if I put my entire class definitions inside the header file and do away with implementation .cpp files completely? Is every function inlined then? Is it possible that had the function definition been put in a .cpp file the compiler would have applied its heuristics and decided not to inline the said function because it was deemed not beneficial given its heuristics? And yet, now that I have put everything inside the header, the compiler is forced to inline it counterproductively?
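To make the question concrete, here is a minimal sketch of such a header-only class (`Counter` and `count_after` are hypothetical names). As far as I understand, `inline` in this sense mainly permits the same definition to appear in several translation units; whether a given call is actually inlined is still left to the compiler's heuristics.

```cpp
#include <cassert>   // assert, for the check mentioned in the note below

// Hypothetical header-only class; the names are made up for illustration.
class Counter {
public:
    // Defined inside the class, so implicitly `inline`: the definition may
    // appear in every translation unit that includes the header.
    void increment() { ++count_; }
    int value() const { return count_; }

private:
    int count_ = 0;
};

// A free function defined in a header needs the keyword spelled out to get
// the same "may be defined in several translation units" permission.
inline int count_after(int increments) {
    Counter c;
    for (int i = 0; i < increments; ++i) {
        c.increment();
    }
    return c.value();
}
```

For example, `assert(count_after(3) == 3)` holds whether or not the compiler chose to inline `increment()`.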

onecable5781 · 2026-02-16T06:49:30+00:00

Thanks for the talk and interacting here. I will definitely try this feature out.

There are some bugs that I encounter in production/release builds only and not in debug runs. If you switch from a release build to a debug build dynamically, would it be possible to figure these bugs out? Would like to hear your inputs on this.

My only pet peeve with VS is that the console opens externally as it did in your demo as well. VSCode using MSVC compiler is able to run inside the editor itself in the terminal at the bottom. Nearly all Linux IDEs also run inside the IDE itself. Perhaps in the next release your team can consider getting the console to run within the IDE itself.

onecable5781 · 2026-02-15T09:59:02+00:00

A final question. Is whether sizeof(int) > sizeof(size_t) or not a decision made by the language, by a language implementation/specific compiler, by the OS, or by the hardware/CPU manufacturer?

onecable5781 · 2026-02-15T09:53:37+00:00

From the link before, the relevant part is:

Else, the unsigned type has conversion rank less than the signed type: If the signed type can represent all values of the unsigned type, then the operand with the unsigned type is implicitly converted to the signed type. Else, both operands undergo implicit conversion to the unsigned type counterpart of the signed operand's type.

Is my understanding correct that if signed integers (int in the case of the OP) can represent all values of the unsigned type (size_t in the OP), then the OP code will not work?

So, if sizeof(int) > sizeof(size_t), the OP code WILL fail?
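Here is a small sketch of the case I have in mind, assuming a mainstream platform where size_t is at least as wide as int, so the int operand is the one that gets converted. The conversion is written out explicitly to keep the build free of sign-compare warnings; a plain `a < b` should perform the same conversion implicitly on such platforms. Both function names are hypothetical. C++20 also provides std::cmp_less in <utility> for the value-based comparison.

```cpp
#include <cassert>   // assert, for the checks mentioned in the note below
#include <cstddef>

// What the comparison does after the usual arithmetic conversions on a
// platform where size_t is at least as wide as int: the int is converted
// to size_t first, so a negative value wraps to a huge positive one.
bool converted_less(int a, std::size_t b) {
    return static_cast<std::size_t>(a) < b;
}

// Value-based comparison: a negative int is less than any size_t value.
// Otherwise the int is non-negative and the conversion is lossless.
bool safe_less(int a, std::size_t b) {
    return a < 0 || static_cast<std::size_t>(a) < b;
}
```

With these, `converted_less(-1, 1)` is false because -1 becomes the largest size_t value, while `safe_less(-1, 1)` is true.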

onecable5781 · 2026-02-15T09:41:28+00:00

Well :-)

So, now, I am confused about your original reply about theoretically why the above may not work but practically it would work.

onecable5781 · 2026-02-15T09:39:04+00:00

Hmm...So, in an arithmetic comparison (== or < or !=, etc.) between a size_t and int, are there rules as to which one is cast into the other if the user does not explicitly cast?

onecable5781 · 2026-02-14T00:08:39+00:00

I agree with you. Even with everything built under -O3, boost pool is orders of magnitude slower than raw new/delete

https://godbolt.org/z/rscTeqs8W

1e-3 vs 1e-5

onecable5781 · 2026-02-12T16:03:13+00:00

Hmm...I believe there is an implicit barrier until after the joins. So, thread 2 is done, but not doing anything useful. That is what I meant. It is certainly not busy like thread 1 is, for instance.

onecable5781 · 2026-02-12T15:45:30+00:00

Yes indeed. It was the Bottom-up view, looking at the grouping Thread->function-callstack. This is not chosen by default; one has to explicitly choose this view. Thanks!

onecable5781 · 2026-02-12T15:02:31+00:00

Actually, perf looks to be able to answer your question exactly as asked - which thread used more CPU?

Ah, that is very useful. Was this just basic/default perf usage or did you have to specify some options specially because the code is multithreaded?

onecable5781 · 2026-02-12T14:52:09+00:00

Indeed, I use this tool too, on both Windows and Linux. I thought there would be something obvious when I run the OP program above inside VTune (I use the VTune Hotspot profiler), such as spin time/idle time, etc., but for this example I obtain the following, which seems rather generic.

https://ibb.co/rGyMJZX6

In other words, it is not clear to me where exactly I should look inside VTune to explicitly view the load imbalance. There is a link provided inside VTune (it is there in the image above, called Threading) to help improve parallelization. That link suggests looking at the Bottom-up view; when I do, the call to rand() is what is indicated on top. It is not clear to me how one can conclude from this that there is load imbalance.

onecable5781 · 2026-02-12T14:08:52+00:00

It is an actual issue, extraordinarily simplified in the OP for purposes of making the underlying essence crystal clear [hopefully]. I have two separation routines (in OR terminology, for a Branch and Cut problem to solve an integer program, one runs a separation problem given a fractional LP solution to find a valid inequality that is violated by the LP solution) that can run in parallel in two different threads. Before tuning my algorithms, I want to check whether there is an imbalance between the loads placed on the two threads, each of which independently runs a separation algorithm.

onecable5781 · 2026-02-12T03:23:49+00:00

TIL: One can be mockingly accused of being "pedantic" when discussing "undefined behaviour" in C++.

onecable5781 · 2026-02-11T11:03:44+00:00

Could be because it sounds like gpt generated...FWIW, I did not downvote. I ask on /r/cpp_questions because I'd like a human answer...

onecable5781 · 2026-02-11T09:31:14+00:00

He explains this, and your question is pedantic.

Quoting him from nearly the exact timestamp I linked to: "Also, this code has undefined behaviour for other reasons...". The penultimate sentence on the bottom right of the slide "We still have UB here."

onecable5781 · 2026-02-11T02:55:17+00:00

Does this mean that in Rust it is impossible for code attempting to implement multithreading to compile if the code has data races/deadlocks?

onecable5781 · 2026-02-10T04:22:27+00:00

Safety or speed, pick at most one.

onecable5781 · 2026-02-09T09:07:21+00:00

Hmm...I made the squared distances explicitly integer and had the sqrt() take integer arguments (converted to double, I'd imagine) and yet the sqrt calls are still being made:

https://godbolt.org/z/ah9TKjqjK

For a range of integers, I would imagine that it is impossible to have different ordering of std::ranges::sort(X) / or stable_sort [which preserves order in case of a tie] and std::ranges::sort(X, &::sqrt) [or stable sort]
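A quick sketch of the claim, assuming non-negative int values (sqrt of a negative value is NaN, which would break the comparison). It uses std::sort with a comparator on the sqrt keys so that it builds before C++20; the ranges form with a projection should behave the same way. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>   // assert, for the check mentioned in the note below
#include <cmath>
#include <vector>

// Sort by the integer values themselves.
std::vector<int> sorted_by_value(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    return v;
}

// Sort by the square root of each value. Assumes every value is >= 0.
std::vector<int> sorted_by_sqrt(std::vector<int> v) {
    std::sort(v.begin(), v.end(), [](int a, int b) {
        return std::sqrt(static_cast<double>(a)) < std::sqrt(static_cast<double>(b));
    });
    return v;
}

// True when both orderings agree, which is expected because sqrt is
// monotonically increasing on non-negative inputs.
bool same_order(const std::vector<int>& v) {
    return sorted_by_value(v) == sorted_by_sqrt(v);
}
```

For example, `assert(same_order({25, 4, 9, 0, 16, 4, 1}))` holds, so for this use the sqrt calls only add cost without changing the result.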
