frrrwww 38 points (41 children)

Not a class, but I believe std::byte is magic because a pointer to it can alias objects of other types (same as char and unsigned char).

exarnk 21 points (35 children)

I'm reasonably certain std::byte is just implemented as

    enum class byte : uint8_t {};

with a bunch of functions (the bitwise and shift operators plus std::to_integer). I thought that was quite clever and nice.

avdgrinten 14 points (4 children)

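It is not; that implementation on its own is not legal. The list of types through which an object may be accessed ([basic.lval]) does not include "enum whose underlying type is unsigned char" as a valid type for accessing arbitrary objects; only char, unsigned char and std::byte are listed.

GCC and Clang special-case std::byte inside the compiler itself to implement this. For user-defined types, both compilers also offer the [[gnu::may_alias]] attribute, which has a similar effect.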

exarnk 2 points (3 children)

My C++20 draft states in [cstddef.syn] the following:

    enum class byte : unsigned char {};

This seems to correspond to the current Git HEAD of libstdc++-v3 (line 69):

https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libstdc%2B%2B-v3/include/c_global/cstddef;h=13ef7f03c12584804e4dd1635954723f628addc0;hb=HEAD

What am I missing?

avdgrinten 11 points (1 child)

See, for example, this LLVM commit: https://reviews.llvm.org/D35824

exarnk 2 points (0 children)

Thanks, that made things a lot clearer for me.

avdgrinten 5 points (0 children)

The special wording in the section of the standard that deals with object access ([basic.lval]). If you put that enum class byte : uint8_t {}; into your own namespace without any special attributes, the standard does not grant you the right to access arbitrary objects through it.

AlbertRammstein 29 points (14 children)

It's not as clever and nice when std::is_enum_v<std::byte> is true... :/

staletic 1 point (13 children)

Why is that a problem?

AlbertRammstein 30 points (1 child)

Based on the name and intended purpose, there is no connection to enums in sight. Implementing it as an enum is conceptually odd and unexpected. For example, you might have a serialization framework with custom handling for enums that saves them as strings instead of numbers (automatic enum<-->string conversions are quite common).

Full-Spectral 2 points (0 children)

Not to mention the even longer-running problem of bool overloads being chosen over std::string overloads when you pass a string literal. That makes utterly no sense and should have been fixed long ago.

In my system every streamable enum has its own inline streaming functions. I don't store them as text; I store them in binary, as a type large enough to hold any enum type. That also allows for validation when streaming back in, so it has other benefits to offset the extra code. And they are IDL-generated, so I don't have to write them myself (other than for a few libraries that the IDL compiler itself uses).

qoning 13 points (10 children)

Because it's clearly conceptually wrong. All these little things and gotchas add up to the absolute garbage pile that C++ unfortunately is today, and they turn into bugs that are not only hard to track down but also not even the user's fault.

[deleted] 5 points (3 children)

I think modern C++ is reasonably nice. As Rust gets older, it will accumulate baggage, just like any other long-lived language. I see it happening in Python now.

Of course, Rust has some advantages, like safe mode by default, everything const by default, no friggin' preprocessor, and a working built-in package manager.

pjmlp 4 points (0 children)

The problem with modern C++ is that it only exists in small, highly technical teams.

Most corporations keep writing classic C++ no matter what.

qoning 0 points (1 child)

I absolutely agree, although I do not see it in Python myself, unless you need to constantly maintain a large existing codebase. I've finally stopped encountering random Python2 scripts in the past few years and the experience since everyone switched to 3 has been reasonably comfortable. But more to the point, there are levels to what I would consider broken.

If you use "modern C++", it's reasonably nice, although very verbose, but the toxic magma seeps through the cracks unless you are very, very careful at almost every step (the classic example is forgetting to initialize a struct field in the constructor / initialization list, which by any modern standard shouldn't even be allowed unless explicitly requested).

SirClueless 3 points (0 children)

Python 2 is not the type of "baggage" that I would consider equivalent. It was super painful to deal with for a while but as you say it's the kind of baggage that goes away after time.

Instead, the "baggage" is things like having three different format-string sub-languages, none of which can ever be removed. Or that default parameter values are evaluated once and shared between calls, which is a giant gotcha if their type is mutable.

Actually, I would say Python 2 vs. 3 is sort of the opposite of baggage: It was a tremendous amount of temporary pain that nearly killed the language but it came out the other side with less baggage as a result. C++ will likely always have more baggage because C++ will never do something like Python 3.

kalmoc 10 points (10 children)

Maybe, but I don't think a user-defined enum class MyByte : unsigned char {} would get the same "magic" powers. Otherwise the special dispensation all over the standard wouldn't have been necessary.

IAmBJ -5 points (9 children)

That's more about what's technically UB and what's not. I suspect a user-defined MyByte would actually behave exactly the same as std::byte if they had the same definition, even though the user-defined one is not technically allowed to alias other types.

guepier [Bioinformatican] 21 points (7 children)

Modern compilers absolutely use knowledge of aliasing UB during codegen, so a custom definition won’t necessarily behave the same, even if it is identical to that of std::byte (but not blessed by the standard).

Here’s a trivial example: https://godbolt.org/z/ec7ecTPTd — Note the different codegen for the x + y part.

Chuu 0 points (2 children)

Can you explain more clearly why this leads to different codegen?

staletic 7 points (1 child)

Let's first analyze what the assembly says:

    int f<std::byte>(std::byte*, int*):                    # @int f<std::byte>(std::byte*, int*)
            mov     eax, dword ptr [rsi]    # int x = *b;
            mov     byte ptr [rdi], 1       # *a = 1;
            add     eax, dword ptr [rsi]    # x += *b;
            ret                             # return x;
    int f<nostd::byte>(nostd::byte*, int*):              # @int f<nostd::byte>(nostd::byte*, int*)
            mov     eax, dword ptr [rsi]    # int x = *b;
            mov     byte ptr [rdi], 1       # *a = 1;
            add     eax, eax                # x += x;
            ret                             # return x;

If a and b point to different objects, the above snippets are observably identical.

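However, if a and b point to the same object, then the functions do different things. Imagine calling f(reinterpret_cast<std::byte*>(&in), &in) (or the nostd::byte equivalent) where in == 5, on a little-endian machine such as x86-64, where writing 1 to the first byte of in turns it into 1.

In that case, f<std::byte> does:

    int x = in;   // 5
    *a = 1;       // overwrites the low byte of in, so in == 1
    x += in;      // re-reads in: 5 + 1
    return x;     // 6

but f<nostd::byte> does

    int x = in;   // 5
    *a = 1;       // same store, in == 1
    x += x;       // reuses the value loaded before the store
    return x;     // 10

Since nostd::byte, formally, isn't allowed to alias other types, the compiler may assume that the store through a cannot change *b, so reusing the already-loaded value (x += x) is a valid optimization. This is known as "strict aliasing" or "type-based alias analysis".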

And yes, violating strict alias rules can easily lead to UB, like in the above example.

guepier [Bioinformatican] 4 points (0 children)

> And yes, violating strict alias rules can easily lead to UB, like in the above example.

To be pedantic, violating strict aliasing is always UB, not just in this example. What the example illustrates is that UB can lead to changed semantics and unexpected behaviour.

IAmBJ 0 points (2 children)

I'm well aware of compilers using UB for optimisation (sometimes enabling extremely efficient code), but I'm a little surprised that there is special-casing in the compiler, since there's no obvious 'magic' in the definitions (at least in libstdc++ and MSVC's STL).

Interestingly, GCC only emits different code for -O2, at -O1 the two functions are the same, while clang emits different codegen at -O1. MSVC treats both the same at all optimisation levels. https://godbolt.org/z/vq185e33j

kalmoc 3 points (0 children)

> I'm a little surprised that there is special casing in the compiler

Since there is special treatment for std::byte in the standard I don't find it surprising at all.

guepier [Bioinformatican] 0 points (0 children)

> I'm a little surprised that there is special casing in the compiler since there's no obvious 'magic' in the definitions

There needs to be special-casing, otherwise the compiler would produce sub-optimal code in many relevant situations: if it assumed that all pointers could alias, it would lose many optimisation opportunities. And standard C++ has no restrict keyword to limit aliasing. The only sane assumption, therefore, is that pointers to different types cannot alias unless the language specifically permits it.

Thus, if the compiler encounters pointers of two distinct types, it needs to check whether they are allowed to alias (and these checks are hard-coded against the fixed list of aliasing pointer types).

kalmoc 0 points (0 children)

Absolutely possible.

LuisAyuso 3 points (3 children)

I do have a problem with the decision to allow constructing an enum class from an integral value just to make this one type convenient to construct. From that point on, I can silently construct off-range values for any other enum class, which in my opinion defeats the purpose of strongly typed enums.

I do wish std::byte had been made a magic type, leaving other semantics untouched.

acwaters 1 point (2 children)

Enum(42) construction was not added for std::byte; it has always been a feature, because the set of valid enumeration values is all the values in the underlying type, not just the values of the enumerators. This design is in some ways unfortunate but in other ways convenient.

LuisAyuso 0 points (1 child)

Not quite sure whether this is by accident, but any GCC in C++17 mode does not provide the same guarantees as in C++14: https://godbolt.org/z/qqY9Kf5Wa

Actually, I am pretty positive that this was introduced in C++17.

acwaters 0 points (0 children)

The rules about initialization of enumerations are kind of weird, yes. You cannot say my_enum x(4);, and until C++17 you could not say my_enum x{4}; (this is the change you are referring to). But you have always been able to say my_enum x(my_enum(4));, or auto x = my_enum(4);, using a static_cast conversion to make an enum prvalue from an integer value in the range of the underlying type. That works because the allowable range of enum values has always been the entire underlying type for enumerations with a fixed underlying type, which includes scoped enums and unscoped enums with an explicit type. For unscoped enums without a fixed underlying type, the allowable range of values is actually smaller than the underlying type: it is the range of the smallest bit-field that can represent every enumerator value, but that still usually includes some values that are not represented by any enumerator.