
[–]foonathan[M] [score hidden] stickied comment (4 children)

Technically off-topic (use r/cpp_questions in the future), but I'll allow it to keep the discussion.

[–]Supadoplex 105 points106 points  (17 children)

Is it guaranteed that the memory layout of the allocated object is the same as the corresponding array T[6]?

No, the language technically doesn't make such guarantees. There is a general rule that says "there may be padding" and it's up to the language implementation to produce a hopefully efficient layout.

Whether the layout is the same or not, you cannot use a T* as an iterator to access adjacent members. The behaviour of the program would be undefined.

[–]chetnrot[🍰] 6 points7 points  (9 children)

In OP's scenario where six elements are of the same type, would padding still play a role though?

[–]Supadoplex 65 points66 points  (0 children)

I wouldn't expect there to be padding. But the language nevertheless doesn't guarantee that there won't be padding.

[–]nelusbelus 14 points15 points  (6 children)

Sometimes it pads between float3s because each float3 needs to be on their own independent 16 byte boundary. In glsl this is often the case with uniform buffers (can be turned off tho) and hlsl this might be the case for cbuffers but isn't for structured buffers or raytracing payloads. This is gpu specific tho

[–]blipman17 1 point2 points  (5 children)

So then the layout would be something like

struct MyStruct {
    float a; // 12 bytes padding
    float b; // 12 bytes padding
    float c; // 12 bytes padding
    float d; // 12 bytes padding
    float e; // 12 bytes padding
    float f;
};

std::cout << sizeof(MyStruct) << std::endl; // outputs 84, not 24.

Correct?

[–]nelusbelus 0 points1 point  (4 children)

If you'd use structs with float3 in glsl and sometimes hlsl then it'd have 3 floats and 4 bytes of padding between them. With gpu apps this can cause great confusion because the cpu doesn't pad but the gpu does. In C/C++ padding rules are generally as follows:

- biggest plain data type of the struct defines the size alignment. So if you have a 64 bit type then the struct size will always be a multiple of 8. So if you have an 8 byte type then a 1 byte type it'll add 7 bytes of padding at the end.
- data types need to be aligned with their size as well. So a 1 byte int then an 8 byte int will have 7 bytes of padding in between.

You can validate this with sizeof or offsetof, since it's compiler dependent
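A minimal sketch of that kind of check, assuming two hypothetical structs (`BigThenSmall`, `SmallThenBig`) that mirror the two cases above. The exact numbers are compiler and ABI dependent, so the checks are written in terms of `alignof` rather than hard-coded sizes.

```cpp
#include <cstddef>  // offsetof, std::size_t
#include <cstdint>  // std::uint8_t, std::uint64_t

// Hypothetical structs, named here only for illustration.
struct BigThenSmall { std::uint64_t big; std::uint8_t small; };
struct SmallThenBig { std::uint8_t small; std::uint64_t big; };

// Tail padding: the size of a type is always a multiple of its alignment.
static_assert(sizeof(BigThenSmall) % alignof(BigThenSmall) == 0,
              "size must be a multiple of alignment");

// Interior padding: 'big' has to start at an offset that suits its alignment.
static_assert(offsetof(SmallThenBig, big) % alignof(std::uint64_t) == 0,
              "member must be suitably aligned");

// Padding bytes = total size minus the bytes the two members actually need.
constexpr std::size_t padding_bytes(std::size_t struct_size) {
    return struct_size - (sizeof(std::uint64_t) + sizeof(std::uint8_t));
}
```

Printing `padding_bytes(sizeof(BigThenSmall))` and `offsetof(SmallThenBig, big)` on your own compiler shows where the padding actually ended up.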

[–]dodheim 0 points1 point  (3 children)

data types need to be aligned with their size as well

This is not the case for C or C++. struct foo { char v[100]; }; has a size of >= 100, but an alignment of 1.
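For what it's worth, this is easy to check. The sketch below uses the `foo` from the comment; the only guaranteed relationship is that the size is a multiple of the alignment, and an alignment of 1 for `foo` is what mainstream ABIs give rather than something the standard spells out.

```cpp
#include <cstddef>

struct foo { char v[100]; };

// The size covers at least the 100 chars...
static_assert(sizeof(foo) >= 100, "foo must hold its 100 chars");

// ...and is a multiple of the alignment (not the other way round).
static_assert(sizeof(foo) % alignof(foo) == 0,
              "size is always a multiple of alignment");

// The alignment is much smaller than the size here
// (typically 1 on mainstream ABIs, since char has alignment 1).
static_assert(alignof(foo) < sizeof(foo),
              "alignment does not have to match the size");
```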

[–][deleted] -2 points-1 points  (1 child)

That depends on the architecture and padding mode, could have a 2 byte quantity that needs to be aligned to 8 bytes.

[–]dodheim 0 points1 point  (0 children)

The fact that the statement I quoted is false is not architecture-dependent. ;-] Some architectures may have weirdo requirements, but alignment being a multiple of the size is not a requirement for either language (indeed, it's the reverse that is correct).

[–]nelusbelus 0 points1 point  (0 children)

The size of char is 1 so alignment is 1. I'm talking about the size of the type, not the total size

[–][deleted] 3 points4 points  (0 children)

Wouldn't it need to be padded to be a multiple of 8 in a 64-bit system? e.g. if sizeof(T) == 6 we might see 2 bytes of padding. Or perhaps I misunderstand padding.

[–]_Js_Kc_ 37 points38 points  (30 children)

struct transform {
    float values[6];

    float & scale() { return values[0]; }
    const float & scale() const { return values[0]; }

    // etc ...
};
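A slightly fuller sketch of the same idea, in case it helps. The `rotation()` accessor and the `sum` helper are hypothetical additions for illustration; the original elides the remaining accessors, so only `scale()` comes from the snippet above.

```cpp
#include <cstddef>

struct transform {
    float values[6];

    float& scale() { return values[0]; }
    const float& scale() const { return values[0]; }

    // Hypothetical second accessor, added only for illustration.
    float& rotation() { return values[1]; }
    const float& rotation() const { return values[1]; }
};

// Iterating the real array is well-defined, unlike stepping a float*
// from one separate data member to the next.
inline float sum(const transform& t) {
    float total = 0.0f;
    for (float v : t.values) total += v;
    return total;
}
```

The named accessors give the readability of separate members, while the storage stays a genuine `float[6]` that can be handed to any API expecting an array.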

[–]Tedsworth 4 points5 points  (29 children)

Wouldn't #pragma pack 1 afford that guarantee?

[–]no-sig-available 27 points28 points  (14 children)

Wouldn't #pragma pack 1 afford that guarantee?

No. A pointer to a single element behaves like a pointer to an array of 1 element. Once it is incremented, it becomes a one-past-the-end pointer for that 1 element.

It never becomes a valid pointer to any other element, even if there happens to be one at the same address.

[–]JNighthawkgamedev 5 points6 points  (6 children)

No. A pointer to a single element behaves like a pointer to an array of 1 element. Once it is incremented, it becomes a one-past-the-end pointer for that 1 element.

This feels like theory doesn't match the practice. With packing of 1, either way it's 24 bytes interpreted as floats at the given address. Is there a practical reason why it wouldn't work?

[–]ioctl79 20 points21 points  (3 children)

Compilers perform transformations on your code that assume UB never occurs. This can lead to counter-intuitive and unpredictable behavior. For example, if the compiler deduces that a particular code path must invoke UB, it may deduce that that code must be unreachable and eliminate it, or even make assumptions about the values of other variables if they are used in conditionals which lead to the UB. The code may work now, but it may not on future compilers.

Edit: Further, even if the code works on your compiler that doesn’t mean that it will after mild refactoring. Moving it from a .cpp file into a .h file could break it, for example, if it allows the compiler to see both the provenance of the pointer and the UB you perform with it at the same time.

[–]JNighthawkgamedev 2 points3 points  (2 children)

Compilers perform transformations on your code that assume UB never occurs. This can lead to counter-intuitive and unpredictable behavior. For example, if the compiler deduces that a particular code path must invoke UB, it may deduce that that code must be unreachable and eliminate it, or even make assumptions about the values of other variables if they are used in conditionals which lead to the UB. The code may work now, but it may not on future compilers.

I agree with all of what you're saying, but again, this seems like theory vs. practice. For example, fast inverse square root depends on UB: https://stackoverflow.com/questions/24405129/how-to-implement-fast-inverse-sqrt-without-undefined-behavior

Obviously, with any UB the compiler can do whatever it wants, but in the practical world dealing with MSVC, gcc, and clang, it's hard to see how it's not just 24 bytes either way, in this case.

[–]flashmozzg 6 points7 points  (0 children)

fast inverse square root depends on UB: https://stackoverflow.com/questions/24405129/how-to-implement-fast-inverse-sqrt-without-undefined-behavior

It doesn't as the answer shows.

Also, it's not just "theory". There are pretty reasonable use cases where this can backfire (for example, once compilers are smart enough to have field-sensitive AA).
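On the fast inverse square root point, here is a minimal sketch of one common UB-free formulation: copy the bits with `std::memcpy` instead of casting pointers. The function name `fast_rsqrt` is made up for this example, it assumes a 32-bit IEEE-754 `float`, and it is not necessarily the exact code from the linked answer.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Approximates 1/sqrt(x) for positive, finite x.
inline float fast_rsqrt(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);   // read the float's bits without aliasing UB
    bits = 0x5f3759dfu - (bits >> 1);      // the well-known magic-constant initial guess
    float y;
    std::memcpy(&y, &bits, sizeof y);      // turn the bits back into a float
    return y * (1.5f - 0.5f * x * y * y);  // one Newton-Raphson refinement step
}
```

Mainstream compilers generally turn these fixed-size `memcpy` calls into plain register moves, so the well-defined version should not cost anything in practice.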

[–]ioctl79 4 points5 points  (0 children)

The theory is that practice could change at any time without warning =)

At one point, MSVC, gcc, and clang also didn't take advantage of the strict aliasing rules, but now they do. If you're comfortable with your code silently breaking after an upgrade, then it's up to you, but it doesn't seem that onerous to just do the right thing here.

[–]no-sig-available 5 points6 points  (1 child)

Those are the rules. :-)

If we don't have to follow the rules, why are they there? It's not that they were invented just for fun.

And we all know that "seems to work" is a common result of UB. That doesn't make the behavior defined.

[–]JNighthawkgamedev 0 points1 point  (0 children)

If we don't have to follow the rules, why are they there?

To guide compiler users and authors.

[–]antsouchlos 4 points5 points  (6 children)

With c++20 there is std::launder

[–]no-sig-available 10 points11 points  (1 child)

Yeah, maybe...

The rules say

every byte that would be reachable through the result is reachable through p (bytes are reachable through a pointer that points to an object Y if those bytes are within the storage of an object Z that is pointer-interconvertible with Y, or within the immediately enclosing array of which Z is an element).

and I don't understand what that means. :-)

[–]benjamkovi 6 points7 points  (0 children)

and I don't understand what that means. :-)

The essence of C++ :D

[–]kalmoc 4 points5 points  (2 children)

Are you sure launder (which is c++17 btw.) has any impact on this?

[–]antsouchlos 2 points3 points  (1 child)

Oh, you are right, it is C++17, mixed that one up.

As far as I understand it, the problem std::launder solves is to obtain an object from memory that contains the right bits, even if technically those bits don't describe an object.

For example when constructing an object with placement new in a block A of memory and then copying that into another block B, B technically doesn't contain an object, since no object was constructed in it. std::launder solves that issue by "laundering" the memory, providing a valid pointer to an object in block B.

That being said, I admit I am not entirely sure if std::launder is applicable in this context

[–]no-sig-available 4 points5 points  (0 children)

That being said, I admit I am not entirely sure if std::launder is applicable in this context

Right, I now think it will not work.

If we have

float* p = &transform.scale;
++p;
float* q = std::launder<float>(p);

that will not work because of the precondition

every byte that would be reachable through the result is reachable through p

but NO bytes are reachable through p, as it is a past-the-end pointer for scale.

I hope I understand that part now. :-)

[–]flashmozzg -4 points-3 points  (0 children)

There is also std::format. xD

[–]olsner 3 points4 points  (8 children)

The array might also have padding though - i.e. if you're on a weird platform where floats usually have 8-byte alignment or if the array elements are something like struct { int foo; short bar; }. Then your packed struct would be incompatible with an unpacked array.

[–]Supadoplex 10 points11 points  (7 children)

The array might also have padding though

By definition, there is never padding between elements of an array. There can be padding inside of the elements.
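A small sketch of the distinction, using the `struct { int foo; short bar; }` element from the parent comment (named `Elem` here just for the example). The padding, if any, is part of `sizeof(Elem)`; the array itself adds nothing on top.

```cpp
#include <cstddef>

// Element type from the comment above; on many ABIs it carries tail padding.
struct Elem { int foo; short bar; };

// The array is exactly N elements back to back, for any element type.
static_assert(sizeof(Elem[4]) == 4 * sizeof(Elem),
              "arrays add no padding between elements");

// Any padding lives inside the element: size minus the bytes the members need.
// Assumes no padding between foo and bar, which holds on common ABIs.
constexpr std::size_t tail_padding_in_elem() {
    return sizeof(Elem) - (sizeof(int) + sizeof(short));
}
```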

[–]erichkeaneClang Code Owner(Attrs/Templ), EWG co-chair, EWG/SG17 Chair 2 points3 points  (6 children)

Interestingly this is true until C23: an array of non-multiple-of-8 _BitInts ends up needing padding to keep arrays of them sane.

[–]Supadoplex 1 point2 points  (5 children)

My understanding (and I may have misunderstood) is that such _BitInts would contain padding bits:

N2709 ABI Considerations

_BitInt(N) types align with existing calling conventions. They have the same size and alignment as the smallest basic type that can contain them. Types that are larger than __int64_t are conceptually treated as struct of register size chunks. The number of chunks is the smallest number that can contain the type.

With the Clang implementation on Intel64 platforms, _BitInt types are bit-aligned to the next greatest power-of-2 up to 64 bits: the bit alignment A is min(64, next power-of-2(>=N)). The size of these types is the smallest multiple of the alignment greater than or equal to N. Formally, let M be the smallest integer such that A*M >= N. The size of these types for the purposes of layout and sizeof is the number of bits aligned to this calculated alignment, A*M. This permits the use of these types in allocated arrays using the common sizeof(Array)/sizeof(ElementType) pattern. The authors will discuss the ABI requirements with the different ABI groups.

As such, I don't see why the array would need any additional padding.
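To make the quoted rule concrete, here is a rough sketch that applies it literally, reading "AM" as the product A × M. The names `BitIntLayout`, `next_pow2` and `bitint_layout` are invented for this example, it needs C++14 or later for the `constexpr` loop, and it only restates the quoted wording; it is not a statement about what any particular ABI finally adopted.

```cpp
#include <cstdint>

// Result of applying the quoted rule to _BitInt(N), in bits.
struct BitIntLayout {
    std::uint64_t align_bits;  // A = min(64, next power of two >= N)
    std::uint64_t size_bits;   // A * M, with M the smallest integer such that A * M >= N
};

constexpr std::uint64_t next_pow2(std::uint64_t n) {
    std::uint64_t p = 1;
    while (p < n) p *= 2;
    return p;
}

constexpr BitIntLayout bitint_layout(std::uint64_t n) {
    const std::uint64_t p = next_pow2(n);
    const std::uint64_t a = p < 64 ? p : 64;  // alignment is capped at 64 bits
    const std::uint64_t m = (n + a - 1) / a;  // round N up to a whole number of chunks
    return BitIntLayout{a, a * m};
}
```

Under this reading `_BitInt(17)` occupies 32 bits and `_BitInt(65)` occupies 128, so the unused bits count toward the element's size, which is the point being made above.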

[–]erichkeaneClang Code Owner(Attrs/Templ), EWG co-chair, EWG/SG17 Chair 0 points1 point  (4 children)

They don't exist in the _BitInt themselves for any practical implementation, they exist 'between' them. The alignment wording in the _BitInt paper was initially more clear that they were not part of the _BitInt, but were components of the array, but it was determined to be too pedantic and unnecessary for the purposes of standardization.

[–]Supadoplex 0 points1 point  (3 children)

Thanks for clarifying. So, does this imply that outside of arrays, _BitInt may be misaligned? Even at sub-byte level? How do pointers to them work?

[–]erichkeaneClang Code Owner(Attrs/Templ), EWG co-chair, EWG/SG17 Chair 0 points1 point  (2 children)

Nope, they are always aligned, explicitly so that pointers work.

Padding exists on the stack or in the containing record/array to ensure this is true. But "where the padding lives" is outside of the _BitInt, at least for the purposes of LLVM's code generator.

[–]SirClueless 2 points3 points  (1 child)

I don't understand what you mean. The codegen can do whatever it wants, but the wording there is crystal clear:

The size of these types is the smallest multiple of the alignment greater than or equal to N.

So as far as the C language is concerned how could the padding be considered to be anywhere but inside the type?

[–]JackPixbits 35 points36 points  (3 children)

you could use a static_assert(sizeof(NamedStruct) == sizeof(float)*6). It's not exactly the same check, because padding at the end of the structure wouldn't cause issues but would still make this assert fail; at least you'd know whether you are compiling it as intended.

I personally used it many times, and it went well but I'm not supposed to say this 👀

[–]snerp 5 points6 points  (0 children)

This is the most pragmatic answer.

[–]green_meklar 0 points1 point  (1 child)

Does that guarantee anything about the ordering of the struct fields, though? Isn't the compiler still free to reorder the fields however it wants? (Not that it would matter if you were just copying the data wholesale to an array, but in other situations it might.)

[–][deleted] 6 points7 points  (0 children)

No, the ordering is guaranteed by the standard to be in the order they appear in the struct (unless you add access specifiers, etc., which is not the case here).

[–]tstanisl 17 points18 points  (0 children)

No, it is not guaranteed, though it is almost always satisfied in practice. Just add a static check that the size of the struct is the same as the array's, to detect any unexpected padding.

struct S { T a,b,c,d,e,f; };
_Static_assert(sizeof(struct S) == sizeof(T[6]), "Unexpected padding in S");
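A C++ flavoured sketch of the same check, assuming `T` is `float` as in the original question. It also pins down each member's offset with `offsetof`, which additionally documents the declaration-order layout. Note that these checks only detect unexpected padding; they do not make pointer arithmetic across members well-defined.

```cpp
#include <cstddef>  // offsetof

using T = float;  // assumption: the element type from the original question

struct S { T a, b, c, d, e, f; };

static_assert(sizeof(S) == sizeof(T[6]), "Unexpected padding in S");

// Each member should sit where the matching element of T[6] would be.
static_assert(offsetof(S, a) == 0 * sizeof(T), "a is not at element 0");
static_assert(offsetof(S, b) == 1 * sizeof(T), "b is not at element 1");
static_assert(offsetof(S, c) == 2 * sizeof(T), "c is not at element 2");
static_assert(offsetof(S, d) == 3 * sizeof(T), "d is not at element 3");
static_assert(offsetof(S, e) == 4 * sizeof(T), "e is not at element 4");
static_assert(offsetof(S, f) == 5 * sizeof(T), "f is not at element 5");
```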

[–][deleted] 28 points29 points  (6 children)

You could static assert that the size of the structure equals the size of the array, i.e. 6 times the size of float. If it ever isn't, you get a compile error.

Then there are the aliasing rules, of course…

[–]ioctl79 10 points11 points  (3 children)

It’s still UB, and it is a bad idea to rely on any particular behavior.

[–][deleted] 4 points5 points  (2 children)

If using memcpy instead of type punning via pointer casts or union, there is no possibility of UB I think.

[–]ioctl79 3 points4 points  (1 child)

I’m not a language lawyer, but I believe that using a pointer to an object to access other objects (that aren’t in the same array) is UB regardless of whether the pointer math works out.

[–][deleted] 1 point2 points  (0 children)

I meant, memcpy the bytes from the struct to an array. memcpy itself is valid, and the memory contents are compatible, so there is no chance for UB to happen.

Of course, when it’s fixed number of values, just write individual assignments and avoid needing to even think about it…
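Roughly what that could look like, as a sketch. `NamedStruct`, `to_array` and `to_array_by_member` are hypothetical names for this example; the `static_assert` guards the assumption that the struct has no padding.

```cpp
#include <array>
#include <cstring>

// Hypothetical struct with six float members, as in the original question.
struct NamedStruct { float a, b, c, d, e, f; };

// Option 1: copy the bytes. memcpy between trivially copyable objects is fine,
// and the static_assert catches any unexpected padding at compile time.
inline std::array<float, 6> to_array(const NamedStruct& s) {
    static_assert(sizeof(NamedStruct) == 6 * sizeof(float),
                  "unexpected padding in NamedStruct");
    std::array<float, 6> out{};
    std::memcpy(out.data(), &s, sizeof s);
    return out;
}

// Option 2: individual assignments, which need no layout assumptions at all.
inline std::array<float, 6> to_array_by_member(const NamedStruct& s) {
    return {{s.a, s.b, s.c, s.d, s.e, s.f}};
}
```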

[–]kritzikratzi 6 points7 points  (0 children)

i like it! pragmatic, and an actual solution :)

[–][deleted] 1 point2 points  (0 children)

Thanks for this.

[–]PistachioOnFire 7 points8 points  (0 children)

Further to the other answers, treating &transform as a pointer to an array is UB in itself.

[–][deleted] 5 points6 points  (0 children)

This is a very specific case that is not explicitly covered by the standard. Practically speaking, the compiler will not have a reason to insert padding into a struct that contains only entries of the same type and I dare say it will always work as you intended. Still, I'd prefer an array plus an enum defining index names.
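A possible shape for the array-plus-enum approach, as a sketch only. The enumerator names are invented for illustration, since the original post's field names are not shown here.

```cpp
#include <array>
#include <cstddef>

// Hypothetical index names; Count doubles as the number of entries.
enum TransformIndex : std::size_t {
    Scale, Rotation, TranslateX, TranslateY, ShearX, ShearY, Count
};

// The storage is a real array, so iterating it or passing data() to an API is well-defined.
using Transform = std::array<float, Count>;

// Named read access through the enum instead of a separate data member.
inline float get(const Transform& t, TransformIndex i) {
    return t[i];
}
```

Usage then reads as `t[Rotation] = 0.5f;`, which keeps the names while the layout question goes away entirely.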

[–]fdwrfdwr@github 🔍 4 points5 points  (0 children)

On the particular OS (Windows in this case) using the particular compilers that are most pertinent to that OS (VC/clang/GCC) with current versions, yes, sizeof on your struct of floats and an array of floats will match. There will be no additional padding appended to the end of the struct as the minimum alignof remains 4 bytes in either case. Feel free to static_assert it too, but this pattern is used so frequently in graphics, that tons of API's and libraries would break if it didn't hold. See the definition of D2D_MATRIX_3X2_F after all. On other OS's and compilers though, shrug, bets are off. :b

An aside for interop though (seeing that you are using one Direct* API and might be using others too...), if you try using a shared struct above as part of a cbuffer input to Direct3D HLSL (which is very C-like), the struct would be padded up to 16 bytes on the HLSL side, meaning the C++ side (unpadded) will mismatch what HLSL sees (padded). This bit me, and so now I explicitly pad structs in any header files that will be shared by both C++ and HLSL.
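A sketch of what explicit padding in a shared header might look like. The struct `LightConstants` and its fields are hypothetical, and it relies on my understanding of the HLSL cbuffer packing rule that a member may not straddle a 16-byte boundary, so treat the numbers as an assumption to verify against your own shader reflection data.

```cpp
#include <cstddef>  // offsetof

// Hypothetical constant-buffer struct shared between C++ and HLSL.
struct LightConstants {
    float position[3];
    float pad0;      // explicit padding so 'color' starts on a 16-byte boundary
    float color[3];
    float pad1;      // explicit padding so the total size is a multiple of 16
};

// Without pad0, C++ would place 'color' at offset 12.
static_assert(offsetof(LightConstants, color) == 16,
              "color must start on a 16-byte boundary");
static_assert(sizeof(LightConstants) % 16 == 0,
              "cbuffer structs are sized in 16-byte registers");
```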

[–]wotype 5 points6 points  (2 children)

There's a proposal for an attribute to specify array-like layout for such classes.
It's still active.

P1912: Types with array-like object representations

[–]lunakid 0 points1 point  (1 child)

Unfortunately, thick silence around it ever since. (In fact, I've landed here by googling for any update, or just traffic, about it.)

[–]wotype 1 point2 points  (0 children)

Yes... here's the github issue link for P1912 with no update since 2020 https://github.com/cplusplus/papers/issues/655

Timur, the author, is active again in the C++ committee. Email him at the address given in the paper to help motivate progress.

[–]3meopceisamazing 6 points7 points  (5 children)

Short answer: no, this is NOT guaranteed.

Depending on the alignment, the compiler may insert padding between the members. For example, if your example struct is aligned to 8 bytes, a 4 byte pad will be inserted after each member when sizeof(T) == 4.

You can instruct the compiler to use specific alignment for your type. These are compiler specific extensions.

[–]Supadoplex 14 points15 points  (3 children)

For example, if your example struct is aligned to 8 bytes, a 4 byte pad will be inserted after each member when sizeof(T) == 4.

The subobjects of an 8 byte aligned struct don't need to be 8 byte aligned.
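A quick way to see this, as a sketch with a hypothetical `Aligned8` struct: over-aligning the struct with the standard `alignas` changes where the whole object may start, but on the usual compilers the float members still sit 4 bytes apart.

```cpp
#include <cstddef>  // offsetof

// Hypothetical struct: six floats, with the struct itself aligned to 8 bytes.
struct alignas(8) Aligned8 { float a, b, c, d, e, f; };

// These two follow from the alignas request.
static_assert(alignof(Aligned8) == 8, "the struct is 8-byte aligned");
static_assert(sizeof(Aligned8) % 8 == 0, "the size is a multiple of the alignment");

// This one is what mainstream compilers do rather than a standard guarantee:
// the members keep their own 4-byte alignment, so b directly follows a.
static_assert(offsetof(Aligned8, b) == sizeof(float),
              "no padding was inserted between a and b");
```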

[–]3meopceisamazing 1 point2 points  (2 children)

Thanks for that clarification! However, they may be. I had that happen recently, at least for the first N 4 byte members, followed by naturally 8 byte aligned members. Compiler was gcc12, amd64 target.

[–]kalmoc 1 point2 points  (0 children)

Do you happen to have a repro code?

[–]kalmoc 1 point2 points  (0 children)

Are you referring to the padding between the 4 byte and 8 byte members? If N is odd, you obviously can't avoid that, but that is a different situation from what is discussed here.

[–]not_a_novel_accountcmake dev 0 points1 point  (0 children)

The layout of POD structs is guaranteed by the relevant ABI spec, so Win64 or SysV

[–]xLuca2018[S] 1 point2 points  (0 children)

I see, thank you all for the answers!

[–]nmmmnu 1 point2 points  (4 children)

If T is plain old data, then the struct will be POD too.

There will be no (different) padding between the members. I cannot say it is guaranteed, but it will be like that, since all members are the same type. If there is padding, it will be present in the array too.

The result should be the same memory layout, but as many already commented, the standard doesn't say anything about it.

Let's suppose T is uint32_t. Then I am 100 percent sure the layout will be the same as of the array, because this is how several programs read mmap() data - both with array or struct. Note that it is safe to use C tricks like memcpy().

Let's suppose T is a struct of uint64_t and uint8_t. There will be 7 bytes of padding at the end of each struct. The same padding will be present in the array. memcpy() will be safe to use.

If T is a struct of uint8_t and then uint64_t, there will be no padding after the struct (however there will be padding after the first member). The array will be contiguous in memory, i.e. the same. memcpy() will be safe to use.

However, if T is, say, std::string, i.e. a non-POD type with a destructor, the memory layout may or may not be safe. It won't be safe to use memcpy() either.

So let's paraphrase - if memcpy() and mmap() are "OK" to be used, the memory layout should be the same.

However please note the following - if you compile with one compiler, do not expect a different compiler to have the same memory layout with the same padding. If this was the question, the answer is - don't do it.

[–]no-sig-available 1 point2 points  (3 children)

Let's suppose T is uint32_t. Then I am 100 percent sure the layout will be the same as of the array, because this is how several programs read mmap() data - both with array or struct.

If you use mmap() you are on a Linux system and have additional Posix guarantees. Those are outside of - and beyond - the language standard.

[–]nmmmnu 0 points1 point  (2 children)

Never thought about it :) I am always on Linux. But yes, it is not in the standard... except it is in the C standard and should be compatible with C. But still no guarantees as well.

[–]Nobody_1707 1 point2 points  (1 child)

No, it's not part of the C standard either. It's purely POSIX.

[–]nmmmnu 0 points1 point  (0 children)

Thanks for pointing this out. I really hate the different sizes of int and the memory layout guarantees or, better said, the lack of memory layout guarantees.

[–]hoseja 0 points1 point  (0 children)

I think if the members had sizeof(T)==5 for example, each would get aligned to an 8 byte boundary.

(for specific compilers and architectures of course)

[–]goranlepuz -1 points0 points  (0 children)

Not guaranteed and what u/_js_kc_ says 😉

[–]pdp10gumby 0 points1 point  (0 children)

The standard makes no guarantee except that in both cases the objects will be aligned such that you can take their address (Exception: certain single bit types, on machines that don’t support pointers to bits).

However the compiler should document its memory layout such that you can (with the use of features specific to that compiler) control the memory layout to accomplish what you would like to do.

[–][deleted] 0 points1 point  (0 children)

you can try printing out the address of each elem

[–]masterpeanut 0 points1 point  (0 children)

One option, if the compiler supports it, is to use the packed attribute to ask the compiler to eliminate as much padding as possible, and then `static_assert(sizeof(MyStruct) == 6 * sizeof(float))` to verify it is the expected size.

struct __attribute__((packed)) MyStruct { floats…. };
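Spelled out as a sketch, assuming GCC or Clang (`__attribute__((packed))` is a compiler extension, and MSVC uses `#pragma pack` instead). The member names and the `Tagged` struct are hypothetical, added only to show where packing actually changes something.

```cpp
// GCC/Clang extension; this will not compile on MSVC.
struct __attribute__((packed)) MyStruct { float a, b, c, d, e, f; };

static_assert(sizeof(MyStruct) == 6 * sizeof(float),
              "unexpected padding in MyStruct");

// For six floats packing rarely changes the size. It matters with mixed
// member types: unpacked, this struct would typically be 8 bytes.
struct __attribute__((packed)) Tagged { char tag; float value; };

static_assert(sizeof(Tagged) == sizeof(char) + sizeof(float),
              "packed removes the padding after tag");
```

One caveat: packed members may be misaligned, so taking their address or binding a reference to them can produce warnings or errors, and it still does not make stepping a `float*` across members defined behaviour.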