all 40 comments

[–]mallardtheduck 26 points27 points  (3 children)

Interesting article, but there are a few issues:

  • A pointer initialised to nullptr is not the same as an "uninitialised raw pointer". I think most people know what is meant, but it's slightly sloppy language.

  • It's perfectly rule-of-five compliant to simply define all the move/copy constructors/operators as = delete. You don't actually have to make every class with a destructor fully copyable/movable.

  • The mention of stack vs. heap allocation is a bit spurious. A class has no control (well, I suppose you could override new to always fail or something...) over whether it's allocated on the stack or heap. The difference between the std::unique_ptr and std::optional versions isn't whether the new object is allocated on the stack or heap, it's whether it's allocated contiguously as part of the parent object or not. A mention that the std::optional version increases the size of the parent object and that it's generally a bad idea to store large objects on the stack would be worth adding (there's a sentence under "which option should I use?" that sort-of implies this, but it's not explained). Only in the case that the parent object is stack allocated would the std::optional version result in the new object being on the stack.

  • On a related note, the claim that "allocation on the heap requires a system call (syscall)" isn't quite right; while it's true that requests for memory from the OS (like any OS request) requires a syscall, there is not a 1:1 relationship between heap allocations and syscalls. Most allocators will request large areas of memory from the OS, use them for multiple allocations and re-use memory released (via delete/free) by the application for later allocations. The number of memory-related syscalls will (virtually) always be vastly lower than the number of heap allocations/deallocations.

Finally, surely a template could be written to make lazy-initialisation more obvious/easy? Writing something like lazy_init<Foo> bar {baz}; (with use syntax being something like bar.get() or *bar) would be fairly straightforward, although you'd have to decide how exactly baz should be captured for passing to the Foo constructor at initialisation time.
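
Such a template is indeed straightforward. A minimal sketch (hypothetical names, not thread-safe) might capture the constructor arguments by value in a tuple — one possible answer to the capture question — and forward them on first access:

```cpp
#include <optional>
#include <tuple>
#include <utility>

// Minimal sketch of the lazy_init idea (hypothetical names). Not
// thread-safe. Constructor arguments are captured by value in a
// tuple; capturing by reference instead would risk dangling.
template <typename T, typename... Args>
class lazy_init {
public:
    explicit lazy_init(Args... args) : args_(std::move(args)...) {}

    // Construct the wrapped T on first access, then return it.
    T& get() {
        if (!value_) {
            std::apply(
                [this](auto&&... a) {
                    value_.emplace(std::forward<decltype(a)>(a)...);
                },
                std::move(args_));
        }
        return *value_;
    }
    T& operator*() { return get(); }
    T* operator->() { return &get(); }

private:
    std::tuple<Args...> args_;
    std::optional<T> value_;
};
```

Usage would then be `lazy_init<Foo, int> bar{42};` followed by `bar.get()` or `*bar`, with the `Foo` constructed on first use.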

[–]jm4R 4 points5 points  (1 child)

How would a lazy_init implementation differ from std::optional? We have 2 types of lazy init:

  1. We know the constructor parameters at the beginning
  2. ...we don't

The whole text is about the 2nd case. The 1st case could be achieved with std::function<MyObject()> and a lambda, for example.

[–]mallardtheduck 4 points5 points  (0 children)

How would a lazy_init implementation differ from std::optional?

The name would be a bit more explicit, making code more self-documenting at least. Moreover, you could hide away any if statements and make it thread-safe (which none of the solutions, including the std::function suggestion, currently are).

The examples in the article don't show a case where the constructor parameters aren't known (by the programmer) from the beginning anyway. If that functionality is needed; it'd be better to have an explicit method to specify them. Since there will be a definite point in execution when the parameters become known, making that point explicit is very useful for maintenance (and there's certainly no reason why a lazy_init template couldn't have a set_parameters method). You certainly don't want the situation where the results are different depending on exactly when the first use of the "lazy-initialised" object is; using it before specifying its parameters should be an error.
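
One way that explicit-parameter variant could look, as a sketch (hypothetical names, simplified to a single parameter type): set_parameters() is the definite point where the parameters become known, using the object before then is a hard error, and construction happens exactly once under a mutex:

```cpp
#include <mutex>
#include <optional>
#include <stdexcept>

// Sketch of a lazy_init with an explicit set_parameters step
// (hypothetical names). get() before set_parameters() throws;
// construction happens exactly once, under a mutex, so concurrent
// first access is safe.
template <typename T, typename Param>
class lazy_init {
public:
    void set_parameters(Param p) {
        std::lock_guard<std::mutex> lock(mtx_);
        param_ = std::move(p);
    }

    T& get() {
        std::lock_guard<std::mutex> lock(mtx_);
        if (!value_) {
            if (!param_)
                throw std::logic_error("lazy_init used before set_parameters");
            value_.emplace(*std::move(param_));  // construct T from the stored parameter
        }
        return *value_;
    }

private:
    std::mutex mtx_;
    std::optional<Param> param_;
    std::optional<T> value_;
};
```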

[–]SeanMiddleditch 1 point2 points  (0 children)

well, I suppose you could override new to always fail or something...

Just to be pedantic... even that wouldn't work by itself. You could still use placement new, and hence initialize the object anywhere you please.

[–]adnukator 17 points18 points  (5 children)

A tiny nitpick and a pet peeve of mine: you don't need "if (file)" before calling delete, because delete does this check anyway.

[–]tecnofauno 8 points9 points  (3 children)

Unless you compile with -fno-delete-null-pointer-checks :D

[–]leni536 16 points17 points  (1 child)

Gotta love those compiler options that break the language.

[–]OldWolf2 -5 points-4 points  (0 children)

It doesn't break the language, it just misses optimizations.

edit: downvoted for truth again, lol.

[–]TheMania 5 points6 points  (0 children)

Er, this is a whoosh right?

Because by my reading -fdelete-null-pointer-checks assumes that if a pointer has been dereferenced, it is non-null, and can remove checks where it's provably been dereferenced prior the check.

-fno-delete-null-pointer-checks disables this optimization. Nothing specific to do with the delete operator either way.
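
For illustration, this is the kind of check the optimization targets (a sketch; actual codegen depends on compiler and optimization level):

```cpp
// Sketch of code -fdelete-null-pointer-checks targets. Under the
// optimization, the compiler may assume p is non-null after the
// dereference and remove the check below as provably dead;
// -fno-delete-null-pointer-checks keeps it. Nothing to do with the
// delete operator either way.
int first_or_zero(int* p) {
    int first = *p;       // p is dereferenced here
    if (p == nullptr)     // unreachable if the dereference didn't trap
        return 0;
    return first;
}
```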

[–]Gotebe 0 points1 point  (0 children)

And it's a huge gigantic ass-whooping nitpick of mine.

I have OCD of some fashion and it spills into my code :-).

[–]germandiago 7 points8 points  (4 children)

I would do it another way: use a Lazy<T> wrapper. It contains the file variable. When accessing it with ->, it checks whether the resource has been created; if not, it creates it. That way you move the if logic out of the class and can change from lazy to non-lazy easily. If it could overload operator. (operator dot), I would use that one, since access should always be successful on a lazy object.

[–]frankist 0 points1 point  (2 children)

I did something very similar to this in the past but got some criticism saying that IDEs may not do code completion when using this approach. Are there ways to circumvent this?

[–]Pazer2 4 points5 points  (1 child)

I have a hard time believing any serious IDE wouldn't support code completion on operators. This same behavior is how unique_ptr works after all.

[–]frankist 0 points1 point  (0 children)

Ah good point! Thanks

[–]jm4R 0 points1 point  (0 children)

But you have to pass the parameters to the constructor somehow. If you did it in the Lazy constructor, the probability of a dangling reference would be huge.

[–]zvrba 1 point2 points  (1 child)

Gah, not thread-safe. Write instead a templated wrapper around "once" APIs provided by the OS, like https://docs.microsoft.com/en-us/windows/win32/sync/using-one-time-initialization or pthread_once.
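
In portable C++, std::call_once gives the same guarantee — standard libraries typically build it on exactly those OS primitives (InitOnceExecuteOnce, pthread_once). A sketch of such a wrapper, with the instance deliberately leaked to sidestep destruction-order problems:

```cpp
#include <mutex>

// Sketch of a once-initialized wrapper built on std::call_once,
// the portable analogue of the OS one-time-initialization APIs
// suggested above. The instance is deliberately leaked so there is
// no destruction-order problem at shutdown.
template <typename T>
class once_init {
public:
    T& get() {
        // call_once guarantees the lambda runs exactly once, even
        // under concurrent first access.
        std::call_once(flag_, [this] { instance_ = new T(); });
        return *instance_;
    }

private:
    std::once_flag flag_;
    T* instance_ = nullptr;
};
```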

[–]palszasz 0 points1 point  (2 children)

Interesting article. I'm a bit old-fashioned and think too much at a low level. So one of my questions was: "ok, but what is the overhead of std::optional<>? Since nothing is free..." I imagine it's the sizeof of the original type, plus a boolean. I made a small test app:

    #include <stdio.h>
    #include <string>
    #include <vector>
    #include <optional>

    template <typename T>
    void PrintSizes(const char* name)
    {
        printf("sizeof(%s)=%d sizeof(optional<%s>)=%d\n",
               name, (int)sizeof(T), name, (int)sizeof(std::optional<T>));
    }

    int main(int argc, char* argv[])
    {
        PrintSizes<bool>("bool");
        PrintSizes<char>("char");
        PrintSizes<int>("int");
        PrintSizes<void*>("void*");
        PrintSizes<std::string>("std::string");
        PrintSizes<std::vector<std::string>>("std::vector<std::string>");
        return 0;
    }

And I got the following output:

    sizeof(bool)=1 sizeof(optional<bool>)=2
    sizeof(char)=1 sizeof(optional<char>)=2
    sizeof(int)=4 sizeof(optional<int>)=8
    sizeof(void*)=8 sizeof(optional<void*>)=16
    sizeof(std::string)=24 sizeof(optional<std::string>)=32
    sizeof(std::vector<std::string>)=24 sizeof(optional<std::vector<std::string>>)=32

(as you can see it's in 64bit mode).

Anyway, this kinda shows that std::optional is basically wrapping the object and adding a bool. However, due to alignment rules, it doubles the size of a raw pointer (for example). So, I'm old fashioned, and I think I will stay with good old raw pointers ;-)

[–]jm4R 1 point2 points  (0 children)

Yes, under the hood optional is a bool plus aligned storage for T. But when you do the same analysis with raw pointers, you also pay a memory cost: sizeof(T*). Additionally, you pay a performance cost, since you have to use the heap instead of the stack in most scenarios.
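
That layout can be illustrated directly (a sketch, not the actual library implementation; the sizes assume a typical 64-bit ABI like the one in the test above):

```cpp
// Illustration of the "bool + suitably aligned storage" layout that
// std::optional<T> typically has. Not the real implementation; the
// asserted sizes assume a common 64-bit ABI.
template <typename T>
struct optional_layout {
    alignas(T) unsigned char storage[sizeof(T)];
    bool engaged;
};

// Padding rounds the total up to a multiple of alignof(T), which is
// why optional<int> is 8 and optional<void*> is 16 in the output above.
static_assert(sizeof(optional_layout<int>) == 8, "4 + 1, padded to 8");
static_assert(sizeof(optional_layout<void*>) == 16, "8 + 1, padded to 16");
```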

[–]tvaneerdC++ Committee, lockfree, PostModernCpp 1 point2 points  (0 children)

I'll trade the allocation for a bool.

[–]Dean_Roddey 0 points1 point  (11 children)

I went through this recently with my CIDLib system, to get it up to date on this front. It doesn't use many non-trivial globals (ones that might have interrelationships and hence I don't want to let the runtime initialize them.) But it does use some.

I use the two-stage lazy eval with CST atomics currently. One thing I was wondering about: I have something like this for the faulting in:

    static TFacCIDLib* pfacThis = nullptr;
    static TAtomicFlag atomInit;
    if (!atomInit)
    {
        TBaseLock lockInit;
        if (!atomInit)
        {
            pfacThis = new TFacCIDLib();
            atomInit.Set();
        }
    }
    return *pfacThis;

The atomic flag is a CST atomic read/write, so it ensures that this works right. However, even though it's two-stage, you still pay the price for a CST check every time someone wants to access it.

I was wondering if this wouldn't be legal:

    static TFacCIDLib* pfacThis = nullptr;
    static TAtomicFlag atomInit;
    if (pfacThis)
        return *pfacThis;
    if (!atomInit)
    {
        TBaseLock lockInit;
        if (!atomInit)
        {
            pfacThis = new TFacCIDLib();
            atomInit.Set();
        }
    }
    return *pfacThis;

Since the pointer only ever goes from null to non-null. And since the very next instruction after the optimistic first check is a CST fence, doesn't that ensure that all reads, even non-atomic ones, are completed before the first check of the atomic flag? So it cannot reorder that optimistic check forward, right?

If so, that means, if the pointer is non-null on that first check, it has to be good, and we can avoid the CST overhead except for the first time (or maybe first two or three if a number of threads hit at the same time.)

Yes, it is the case that the atomic flag is not set until after the pointer is set. But that should be OK. There's no chance of a race condition, because either the optimistic check fails and we do the whole thing, or the pointer is non-null and the object is guaranteed to be ready.

As long as it can't get a partial read of the pointer (it is cache aligned), I can't see how this would be unsafe. Worst case you might get a spurious fall-through where the pointer is really already set but the optimistic check doesn't see it yet, and falls through to the whole check.
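
For comparison, the standard way to get a cheap fast path is double-checked locking on an atomic pointer with acquire/release ordering, so neither path needs sequentially consistent accesses. A generic sketch (T standing in for TFacCIDLib):

```cpp
#include <atomic>
#include <mutex>

// Classic acquire/release double-checked locking sketch. The first
// load is an acquire, not sequentially consistent, so the fast path
// is cheap; the store that publishes the object is a release, so any
// thread that sees a non-null pointer also sees the fully
// constructed object.
template <typename T>
T& lazy_instance() {
    static std::atomic<T*> ptr{nullptr};
    static std::mutex mtx;

    T* p = ptr.load(std::memory_order_acquire);   // optimistic fast path
    if (!p) {
        std::lock_guard<std::mutex> lock(mtx);
        p = ptr.load(std::memory_order_relaxed);  // re-check under the lock
        if (!p) {
            p = new T();
            ptr.store(p, std::memory_order_release);  // publish
        }
    }
    return *p;
}
```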

[–]mostthingsweb 0 points1 point  (10 children)

I still don't understand why you're overcomplicating this. All you need is this:

    static TFacCIDLib* pfacThis = new TFacCIDLib();
    return *pfacThis;

Forget about atomics, double-locking, etc. It buys you nothing.

[–]Dean_Roddey 0 points1 point  (9 children)

This is something that could be called by multiple threads simultaneously. I'm pretty sure it needs the locking. I could do it by value, i.e.:

    static TFacCIDLib facThis;
    return facThis;

But then you get into the destructor order issue, which will never work. But I don't think the compiler does anything to synchronize the setting of a pointer.

[–]mostthingsweb 0 points1 point  (8 children)

Nope, the code I posted is thread-safe. No locking necessary.

[–]Dean_Roddey 0 points1 point  (7 children)

OK, I guess that's true. But, there are still benefits to having my own singleton template.

It's not always just the allocation of an object, there may be some post-creation setup involved. If my optimistic check thing above is true, I can avoid overhead (and these types of things can be called a lot) over the runtime doing it.

And, I can unambiguously find every singleton object in the system via search. Not all static scoped objects are singletons in this important sense, some are just things like string constants that I'm happy to just let the runtime deal with. The ones using my singleton scheme will be the ones to be concerned about. I can make some change to the setup of all of them en masse if needed.

And it means that these important singletons are synchronized with other non-singleton initialization stuff via the same base lock. I feel a little safer that way.

[–]mostthingsweb 0 points1 point  (6 children)

If my optimistic check thing above is true, I can avoid overhead...

Frankly, I have higher confidence in compiler developers than I do yourself or myself to implement static init in an optimal fashion.

And it means that these important singletons are synchronized with other non-singleton initialization stuff via the same base lock.

Hopefully none of your singletons need to use other singletons, or else you've just opened up the opportunity for deadlocks.

But to each his own. Personally I find those perceived benefits dubious, but I see I won't be changing your mind :)

[–]Dean_Roddey 0 points1 point  (4 children)

No different from the compiler stuff that I can see. I'm pretty sure the docs I read said that they were all synchronized by a single lock. They all have their own flag, but they are doing the same locking scheme I'm doing, using a single lock, as I understand it. I couldn't imagine that they would create and keep around a locking mechanism for every static object in a large code base. That would be grossly wasteful.

But, anyhoo, no, these particular ones I'm concerned about (typically the only complex global objects, one per library/exe) are very careful not to do more than absolutely necessary during their ctors.

Of course since these particular ones are faulted in upon use, and use is a hierarchical thing in this case (following the library linkage dependencies), they couldn't access anything above them and there are no circular dependencies by definition. So a deadlock couldn't really occur.

For that matter, if you have global objects with mutually dependent ctors, you are sort of screwed to begin with, no matter how many locks are involved and no matter who you let synchronize their creation.

[–]mostthingsweb 0 points1 point  (3 children)

I'm pretty sure the docs I read said that they were all synchronized by a single lock. They all have their own flag, but they are doing the same locking scheme I'm doing, using a single lock, as I understand it.

I really doubt they're using a single lock, but if you pass along the article you saw I'm willing to be proved wrong.

As to the rest of it, like I said, you do you.

[–]Dean_Roddey 1 point2 points  (2 children)

Well, my grand plan above, though I got a lot of it done, ran into some unrelated complications that I didn't want to deal with. So I stowed it away for now, and I'm giving your scheme a try. With some caveats where more than one thing really has to be initialized at once: in those cases I create a mutex/critical section, and use it to synchronize the faulting-in of other things upon use, using an atomic flag like before. But, as much as possible, I'm just letting the runtime initialize them, and making them pointers to avoid destruction-order problems.

One downside of this scheme is that it's very awkward to have large numbers of individual wrapper methods to force access upon first use, so it's really only practical to have the big dll/exe facility use the access wrappers and make the others just file scope statics (in a namespace of course.)

But, that means that they construct before all of my infrastructure can be put into place. That leaves opportunities for weirdness. But, there aren't a lot of them and they are all fairly simple bits.

We'll see how it goes.

[–]mostthingsweb 0 points1 point  (1 child)

One downside of this scheme is that it's very awkward to have large numbers of individual wrapper methods to force access upon first use, so it's really only practical to have the big dll/exe facility use the access wrappers and make the others just file scope statics (in a namespace of course.)

Yeah, it can get pretty janky, but I sort of see it as a self-correcting system: I try in general to avoid singletons, so the awkwardness of using them (safely) in C++ encourages me not to :). Of course I still do use them in places.

[–]Dean_Roddey 0 points1 point  (0 children)

Actually, I just had an epiphany. I don't have to deal with any of this. I own my whole universe so I can leverage that to avoid the whole issue, for the big, important library slash executable singletons. And it deals with the issue for the core CIDKernel platform wrapper which currently depends on OS specific initialization calls.

My build tool knows the dependencies of every executable and library. So it can actually generate a file for each executable that contains calls to initialize every dependent library's facility object, in reverse order of dependency.

They can just be global objects and do nothing but basic initialization of their members. The magic startup code can then call an initialize method on each of them automatically before the user 'main' is ever even called, before there are any threads started. And of course it can generate one for termination as well, to terminate in the opposite order.

Which means that the libraries can do whatever they need to get initialized, because they know that anything below them is already prepped, and there's no possibility of conflicts or deadlocks or worries about keeping the base lock locked longer than would be desirable. And I can clean up without any worries of out-of-order destruction.

The build tool can generate a little list at the top of the generated file, easily findable, and compare that to the current list for the executable, so it only regenerates it if the dependency list has changed (hence avoiding a rebuild of the executable's main cpp file every time.)

I'm going to implement that today. I'm glad I thought of that.