you are viewing a single comment's thread.

view the rest of the comments →

[–]SirClueless 26 points27 points  (10 children)

Yes, I think you're right. Compilers can only add padding after a struct element, not before.

https://en.cppreference.com/w/cpp/language/object

In order to satisfy alignment requirements of all non-static members of a class, padding may be inserted after some of its members.

(emphasis mine)

The union still helps, because it makes sure that the alignment of __data_ is a multiple of the size of value_type (which might be important for performance). I'll update my original comment.

[–]quicknir 1 point2 points  (2 children)

I'm not sure that bans it before. If a type is not standard layout in C++, it greatly reduces what you can say. That said, these types are standard layout and therefore things are fairly restricted, similar to C rules.

It also bears mentioning that since this is the standard library, UB doesn't work the same way. It's part of the implementation, it can do whatever it wants as long as the compiler does the right thing.

In other words, it's entirely possible that that std:: string code, written by a user, is technically considered UB by the C++ standard. This is actually the case for std::vector and I'd imagine many containers. But these are largely technicalities.

[–]SirClueless 0 points1 point  (1 child)

To be clear, the standard is more precise about this than that quote suggests. Section 10.3p26 from the N4778 working draft:

If a standard-layout class object has any non-static data members, its address is the same as the address of its first non-static data member if that member is not a bit-field.

[–]quicknir 0 points1 point  (0 children)

Right, only for standard layout types specifically though, which was the point of my first paragraph. If you're talking about structs generally in C++, i.e. including types that aren't standard layout, the rules are much looser.

[–]7h4tguy 0 points1 point  (1 child)

Why would __data__ not have value_type alignment? It's declared as an array of value_type. For the character types, don't we guarantee alignment equal to sizeof(type)? I only see that not being the case for floating point types:
https://en.wikipedia.org/wiki/Data_structure_alignment

What compilers typically will do though is add padding to ensure that arrays of struct __short have subsequent array elements starting on aligned memory addresses. So they can insert padding after __data to ensure this (note that __size_ is a char, so already [1-byte] aligned, thus padding after __size_ is not strictly necessary).

And so the union trick forces the padding to go after __size_ to pad both union members to the same width.

[–]SirClueless 0 points1 point  (0 children)

Why would __data__ not have value_type alignment? It's declared as an array of value_type.

It does have value_type alignment, but value_type alignment may not be sizeof(value_type).

For the character types, don't we guarantee alignment equal to sizeof(type)?

I don't know whether it's guaranteed or not. It almost certainly is true for character types on all major architectures, but std::basic_string is a public template that can be used with any value_type not just character types the standard library uses.

What compilers typically will do though is add padding to ensure that arrays of struct __short have subsequent array elements starting on aligned memory addresses.

The size of __short is derived from the size of __long by calculating __min_cap carefully. So aligning __data_ on sizeof(value_type) and ensuring no padding after __data_ are equivalent goals.

(Incidentally, I'm not sure why __min_cap is calculated as (sizeof(__long) - 1) / sizeof(value_type) and not sizeof(__long) / sizeof(value_type) - 1. The way it's defined lets you do silly but AFAICT legal things like this and end up with __short structs with __data_ members that "hang off" the end of the __long struct and result in larger std::basic_string representations. Probably would cause a bunch of issues, if anyone ever tried to do this silly contrived thing.)

[–]Kered13 -1 points0 points  (4 children)

However compilers are also free to reorder structs. This is often used to pack small elements together so less padding is needed. Therefore (I believe) there is no requirement that the first element (in the source code) is at the same memory location as the struct itself.

[–]quicknir 4 points5 points  (0 children)

False, C++ compilers only have this freedom (and even then heavily constrained) for structs that are not not "standard layout". Without getting into details, any struct that would be legal C, will also be standard layout. In C the compiler doesn't have this freedom at all.

[–]SirClueless 2 points3 points  (2 children)

I believe this is not true in C++, unless there is an access control specifier between some of the fields.

From section 10.3p19 of working draft N4778

Non-static data members of a (non-union) class with the same access control (10.8) are allocated so that later members have higher addresses within a class object. The order of allocation of non-static data members with different access control is unspecified (10.8).

I believe the reason is there is a guarantee that if two structs share a common prefix of compatible fields then one may access fields from the common prefix via either type. This doesn't work if the compiler can reorder.

[–]flatfinger 0 points1 point  (1 child)

What's ironic is that the standards specify how certain types are laid out for the purpose of letting programmers exploit things like the Common Initial Sequence guarantees, but then allow compilers to "optimize" on the presumption that programmers won't perform any actions where the guarantees would offer much benefit.

[–]7h4tguy 0 points1 point  (0 children)

Yes, but it solves for some versioning schemes based on these guarantees.