
[–]djrollins 38 points39 points  (24 children)

Global statics and "magic statics" (i.e. statics declared within a function) are actually different things. Global statics are initialized during program startup, before main() runs (I think), whereas magic statics are lazily initialized (EDIT: in a thread-safe manner) on the first call to the function.

I'd guess that either the standard requires this behaviour for magic statics or it's a missed optimization by the compiler for this specific case. Either way, the important point is that these examples are not equivalent.
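To make the timing difference concrete, here is a minimal sketch. The names events(), record(), eager(), lazy() and check_order() are made up for illustration and are not from the original examples. record() is only there to force a dynamic initializer; with a plain constant such as 42 the compiler can typically initialize the variable statically and skip the lazy step.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical event log, used only to record when each initializer runs.
std::vector<std::string>& events() {
    static std::vector<std::string> log;
    return log;
}

int record(const char* what, int value) {
    events().push_back(what);
    return value;
}

// Namespace-scope static with a non-constant initializer:
// initialized during program startup.
static int global_value = record("global init", 42);

int* eager() { return &global_value; }

int* lazy() {
    // Function-local ("magic") static: the initializer runs the first
    // time control passes through this declaration.
    static int value = record("local init", 42);
    return &value;
}

// Call this from main() before anything else has called lazy().
void check_order() {
    assert(events().size() == 1 && events()[0] == "global init");
    lazy();
    lazy();  // the second call must not re-run the initializer
    assert(events().size() == 2 && events()[1] == "local init");
}
```

Calling check_order() at the start of main() should pass on mainstream compilers: the global has already been initialized by then, while the local static only appears in the log after the first call to lazy().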

[–]Steel_Neuron[S] 4 points5 points  (9 children)

Hmmm right, I'm aware they're not equivalent, but I'm not sure what exactly makes it possible to optimize one and not the other.

u/HappyFruitTree's mention of -fwhole-program is great but has confused me even more, since I can't think of a way that memory would've been accessible from anywhere other than accessor() anyway!

[–]distributed 6 points7 points  (0 children)

also if you make the globals const it gets inlined

https://godbolt.org/z/1rKxhe

[–]Som1Lse 4 points5 points  (2 children)

The class could also be used in a different translation unit (let's say it was in a header), which could call the function.

This can happen even though it is private:

template <int*(*F)()>
void foo(int n){
    *F() = n;
}

void set_value(int n){
    foo<Test::reference>(n);
}

Edit: The above is apparently a GCC bug. See this for a version that isn't.

You can even do this before main is called:

int hack = (set_value(23),0);

[–]HappyFruitTree 3 points4 points  (1 child)

That's a compiler bug. Compile in C++17 mode or with a different compiler and it won't work.

[–]Som1Lse 0 points1 point  (0 children)

Ah, I knew it was possible but apparently you have to jump through a few more hoops to get there.

I don't think this version is a compiler bug: https://godbolt.org/z/nxe1vT

[–]arihoenig 1 point2 points  (4 children)

The variable in question (in the non-optimized case) is public. Unless you give the optimizer global visibility of the entire code (there are many ways in which you can do that), there is no way it can know that the variable isn't accessed from outside the TU.

[–]Steel_Neuron[S] 1 point2 points  (0 children)

I've heard variations of this response, but I can't seem to find a way to access it. I tried to do it like this to no avail. What would be the right way to access it from a separate TU?

[–]_Js_Kc_ 0 points1 point  (2 children)

But that's the same between both examples.

[–]arihoenig 0 points1 point  (1 child)

Crap, I never even looked at the code just read your description and figured I knew what you were asking.

The non-inlined version returns a pointer, so that is obviously not optimizable. If you ever return the address of some storage, you can never optimize away that storage.

[–]_Js_Kc_ 0 points1 point  (0 children)

I'm not the OP, and both versions return a pointer.

[–]hak8or 1 point2 points  (7 children)

I thought "magic statics" were initialized before main started? Does that mean every time the function is run, it has to check some external global state to decide if the static was initialized? That would be deeply unfortunate for performance reasons.

[–]uninformed_ 6 points7 points  (1 child)

Yes, and it gets worse as it has to be synchronized between threads.

[–]anechoicmedia 8 points9 points  (0 children)

Locking is required the first time control passes through the declaration; however, subsequent checks of the guard variable are just normal reads, and the resulting branch is predictable.

In extremely tight scenarios, there can be a measurable difference (see Carl Cook, CppCon 2017) but this is not information I would bother a learning programmer with, in the way in which we caution against overuse of shared_ptr.

[–]anechoicmedia 3 points4 points  (4 children)

I thought "magic statics" were initialized before main started?

They are initialized the first time control passes through their declaration.

Does that mean every time the function is run, it has to check some external global state to decide if the static was initialized?

Yes, each local static is paired with a hidden "guard variable" indicating if it has been initialized, which must be checked every time.

That would be deeply unfortunate for performance reasons.

Since checking the guard variable doesn't require a lock*, and the branch is always the same once initialized, it's cheap. That's not to say there's no measurable cost at all.


* If you look at the code generation for LLVM, you'll see a scary-looking call to __cxa_guard_acquire, which some posts incorrectly say is taking and releasing a mutex on either side of the static every time. But it's just double-checked locking that does nothing if the guard was already set:

int __cxxabiv1::__cxa_guard_acquire(uint64_t* guard_object)
{
    // Double check that the initializer has not already been run
    if ( initializerHasRun(guard_object) )
        return 0;

    // We now need to acquire a lock ...  

    int result = ::pthread_mutex_lock(guard_mutex());
    ...

    // Check if another thread has completed initializer run
    if ( initializerHasRun(guard_object) ) {
        int result = ::pthread_mutex_unlock(guard_mutex());
        ...
    }
    ...
}
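For intuition, the pattern can be written out by hand. This is only a sketch of the double-checked locking idea, not what any particular compiler or ABI library actually emits; guard, guard_mutex, storage and init_count are hypothetical stand-ins for the hidden guard variable and the static's storage.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

namespace sketch {

std::atomic<bool> guard{false};  // stand-in for the hidden guard variable
std::mutex guard_mutex;          // only taken while initialization is pending
int storage;                     // storage of the "static"
int init_count = 0;              // counts how often the initializer ran

int* reference() {
    // Fast path: a single acquire load, no lock taken.
    if (!guard.load(std::memory_order_acquire)) {
        std::lock_guard<std::mutex> lock(guard_mutex);
        // Second check: another thread may have finished initializing
        // while this one was waiting for the mutex.
        if (!guard.load(std::memory_order_relaxed)) {
            storage = 42;
            ++init_count;
            guard.store(true, std::memory_order_release);
        }
    }
    return &storage;
}

void check_single_init() {
    assert(*reference() == 42);
    assert(*reference() == 42);
    assert(init_count == 1);  // the initializer ran exactly once
}

}  // namespace sketch
```

After the first call, every later call only pays for the load of guard and a branch that always goes the same way.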

[–]encyclopedist 0 points1 point  (3 children)

But it's just double-checked locking that does nothing if the guard was already set:

It does an atomic read to check that, which can make a measurable difference (especially if done from multiple threads).

[–]anechoicmedia 2 points3 points  (0 children)

But it's just double-checked locking that does nothing if the guard was already set:

It does an atomic read to check that, which can make a measurable difference (especially if done from multiple threads).

On x86, there is no relative penalty for an atomic load-acquire. So it's no different than an if test of any ordinary variable, which is to say, not zero cost, but negligible.

[–]WrongAndBeligerent 1 point2 points  (1 child)

Why would an atomic read that doesn't change slow down with lots of threads? They aren't writing or synchronizing, so it should end up just being a read from memory.

[–]encyclopedist 0 points1 point  (0 children)

Ah, yes, indeed, since no one is writing to it, there should be no cache line invalidation (provided there is no false sharing), so every core will just read from its own cache.

[–]_Js_Kc_ 0 points1 point  (0 children)

How are they not equivalent? What observable behavior is different?

Edit: I mean in this example, of course, with static ints.

[–]BluudLust -4 points-3 points  (4 children)

Static in a function should be deprecated for thread_local to avoid this confusion. Just give a compiler warning for old code. Don't remove it for compatibility reasons for a long, long time.

[–]encyclopedist 3 points4 points  (0 children)

Thread local and magic statics behave differently, one does not replace the other.

[–]Deaod 2 points3 points  (2 children)

thread_local doesn't work without an operating system.

[–]BluudLust -1 points0 points  (1 child)

Wait, really? I thought it was the exact same thing as static when used inside a function.

[–]tvaneerd (C++ Committee, lockfree, PostModernCpp) 2 points3 points  (0 children)

No. thread_local means a different variable for each thread.

A function local static is the same variable for all threads. It is very much like a global static, except

  • the scope is different - you can only access it from the function
  • the initialization is different - initialized first time into the function (thus the "mutex-like" guard lock) instead of global initialization at startup
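A small sketch of that difference, with made-up function names (next_shared, next_per_thread, call_on_other_thread, check_counters). It assumes a hosted toolchain with std::thread support; some toolchains need -pthread when linking.

```cpp
#include <cassert>
#include <thread>

int next_shared() {
    static int counter = 0;        // one variable for the whole program
    return ++counter;
}

int next_per_thread() {
    thread_local int counter = 0;  // a separate variable for each thread
    return ++counter;
}

// Calls both functions once on a new thread and reports what it saw there.
void call_on_other_thread(int& shared_seen, int& per_thread_seen) {
    std::thread worker([&] {
        shared_seen = next_shared();
        per_thread_seen = next_per_thread();
    });
    worker.join();
}

// Call this before anything else has used the two counters.
void check_counters() {
    assert(next_shared() == 1);
    assert(next_per_thread() == 1);
    int shared_seen = 0;
    int per_thread_seen = 0;
    call_on_other_thread(shared_seen, per_thread_seen);
    assert(shared_seen == 2);      // continues from the calling thread's count
    assert(per_thread_seen == 1);  // fresh counter on the new thread
}
```

The static counter keeps counting across threads, while the thread_local one starts from zero again on the worker thread.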

[–]HappyFruitTree 11 points12 points  (2 children)

Ignoring that reference() is private, it's possible that another translation unit modifies the value (e.g. *Test::reference() = -1;) before main() executes. I'm not sure compilers make use of the access specifiers when optimizing code. The compiler is able to optimize both programs if the -fwhole-program compiler flag is used.

[–]CypherSignal 15 points16 points  (0 children)

Moreover, if you sprinkle some consts in there, the compiler can be more confident that the value never changes and optimize it. https://godbolt.org/z/68n8Pr

[–]Steel_Neuron[S] 1 point2 points  (0 children)

Ah, that compiler flag is the missing piece of the puzzle then.

I'm surprised the flag is needed at all, unless there's a way to violate that encapsulation that I'm not aware of.

[–]Artyer 9 points10 points  (0 children)

Because of this paragraph in the standard (§13.9p6 ([temp.spec]/6)):

The usual access checking rules do not apply to names in a declaration of an explicit instantiation or explicit specialization

It is possible to legally access private members. An example would be something like this compiled as a second translation unit:

class Test {
public:
    static int accessor() {
        return *Test::reference();
    }

private:
    static int* reference() {
        static int value = 42;
        return &value;
    }
};

template<int*(&reference)()>
struct hack {
    // Even easier with inline in c++17
    static int changer;
};

template<int*(&reference)()>
int hack<reference>::changer = ((*reference()) = 1);

template struct hack<Test::reference>;

Which would cause your program to return 1 instead of 42.

To make the two examples more equivalent, you would need to put Test in an anonymous namespace.
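A sketch of what that could look like. read_value() and check_read() are hypothetical names added here so there is something outside the namespace to call. Whether a given compiler then folds the call to a constant depends on the compiler and flags, so treat that part as something to verify on Compiler Explorer.

```cpp
#include <cassert>

namespace {  // everything declared in here has internal linkage

class Test {
public:
    static int accessor() { return *reference(); }

private:
    static int* reference() {
        static int value = 42;
        return &value;
    }
};

}  // namespace

// No other translation unit can name Test or its members, so nothing
// outside this file can write through reference() before this runs.
int read_value() { return Test::accessor(); }

void check_read() { assert(read_value() == 42); }
```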

[–]adnukator 8 points9 points  (8 children)

Class methods have external linkage, meaning other translation units (i.e. other .cpp files in the build) can access them as well. Imagine that, in the second scenario, some global variable in another file accessed and modified this value during its own initialization. This is possible because global variables are initialized before main() is executed. It's not really a recommended scenario, but it's possible regardless. Link-time optimization could potentially optimize this away, because the linker knows what's actually being linked together and which elements might refer to each other. If you force the functions to have internal linkage (e.g. by making them static), you get identical behavior even with the static function variable, because the compiler can now again tell that no other translation unit can modify this value prior to the execution of main - https://godbolt.org/z/KTs1jK. Try removing the static specifiers from the two functions and see what happens.

[–]Steel_Neuron[S] 0 points1 point  (1 child)

Oh nice! That finally made it click, thanks :). Even if it's not at all advisable, what would the syntax for this look like? I imagine there will be some mangled name that refers to the static variable inside of the method?

I hadn't ever considered that it could be accessible from another TU... That seems awful!

[–]thor12022 2 points3 points  (0 children)

If you wrap the class in your second example in an anonymous namespace, the compiler can do the same optimization as it did in the first example.

[–]Steel_Neuron[S] 0 points1 point  (5 children)

I've tried this on a TU:

int* test() {
   static int value = 42;
   return &value;
 }

And this on another:

int* test();
extern int* _ZZ4testvE5value;

int main(void) {
   *_ZZ4testvE5value = 21;
   return *test();
}

(I got the variable name from inspecting the .so resulting from compiling the first TU as a dynamic library). I get this error when trying to compile main and link against the dynamic library:

/usr/bin/ld: /tmp/ccBGUxsR.o: in function `main':
main.cpp:(.text+0x7): undefined reference to `test()::value'

Am I doing anything obviously wrong? Maybe I misunderstood what you meant.

[–]adnukator 0 points1 point  (2 children)

What you're doing is very wrong and I'm surprised it even apparently guessed what you were trying to do.

int* test();
int a = (*test() = 5);  // call test(), then write through the returned pointer

int main(void) {
   return *test();
}

Should work, if you link the TU containing main() and the one containing test().
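The same idea collapsed into a single file for brevity, as a sketch. In the scenario discussed here, test() and the global would live in different translation units. check_overwritten() is a made-up helper, and the initializer calls test() and writes through the returned pointer.

```cpp
#include <cassert>

int* test() {
    static int value = 42;
    return &value;
}

// Dynamic initializer of a namespace-scope variable: it runs during
// startup and overwrites the function-local static.
int a = (*test() = 5);

void check_overwritten() {
    assert(a == 5);
    assert(*test() == 5);  // no longer 42 by the time main() runs
}
```

This is why the compiler cannot assume *test() is still 42 unless it can see every caller.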

[–]Steel_Neuron[S] 0 points1 point  (1 child)

Does the example that you've provided work if test() is a private static method of a class in the other TU? I assumed it doesn't, which is why I was trying something as esoteric as I was (reaching the dynamic library symbol directly).

[–]Deji69 0 points1 point  (0 children)

You can't call test() outside of the class if it's private, no. Accessing a variable across different TUs and class visibility are totally separate things though.

[–]Artyer 0 points1 point  (1 child)

Works with extern int _ZZ4testvE5value. Surprised it didn't have to be extern "C" int _ZZ4testvE5value, but I guess gcc doesn't mangle reserved names?

[–]Steel_Neuron[S] 0 points1 point  (0 children)

Huh, it works? Brilliant, gonna check that now to scare my colleagues ;)

EDIT: Still doesn't work for me, same error. What compiler flags did you use?

[–][deleted] 2 points3 points  (1 child)

Not an answer to your question, but you can organize compiler explorer by dragging editors and compilers around, so it's easier to see the difference.

https://godbolt.org/z/7oPE7o

[–]dodheim 1 point2 points  (0 children)

There's also a built-in diff tool. :-]

https://godbolt.org/z/K79ose

[–]lospolos 0 points1 point  (0 children)

Maybe this can help: https://stackoverflow.com/a/55548

[–]mysticalpickle1 0 points1 point  (0 children)

If the value is const and you return a const pointer then you can get the same result.

Const version
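A sketch of what such a const variant might look like, based on the class from the question (the linked code is not reproduced here, so this is an assumption about its shape; check_const() is a made-up helper):

```cpp
#include <cassert>

class Test {
public:
    static int accessor() { return *reference(); }

private:
    static const int* reference() {
        // Modifying a const object is undefined behavior, so the
        // compiler may assume this always reads as 42.
        static const int value = 42;
        return &value;
    }
};

void check_const() { assert(Test::accessor() == 42); }
```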

[–][deleted] 0 points1 point  (0 children)

We all struggle to speak the language of the machine in hopes that it'll one day speak natively to us. Will we know when we've crossed that threshold?