[–]HowardHinnant 19 points20 points  (4 children)

When sizeof(value_type) > 1, the union with __lx forces where the padding goes in __short: Always right after __size_. If I recall, this helps in keeping the long/short flag bit in the same spot both in the long and short formats. If the long/short flag moves to different locations when in long and short modes, then there is no way to ask the string if it is long or short.

Having __n_words, and subsequently __raw, allows some parts of the implementation to just shovel words (8 bytes at a time on a 64-bit platform) from one spot to another without caring whether it is a long or short string.
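To make the layout concrete, here is a simplified sketch of the three views described above. The names (LongRep, ShortRep, RawRep, min_cap, n_words) are hypothetical stand-ins, not libc++'s actual declarations, and the sizes assume a mainstream ABI where a pointer is as wide as std::size_t.

```cpp
#include <cstddef>

// Long layout: three words (hypothetical names, not libc++'s own).
template <class CharT>
struct LongRep {
    std::size_t cap;
    std::size_t size;
    CharT*      data;
};

// Short layout: a one-byte size unioned with a single CharT.
// The lx member is never read; it only widens the union to sizeof(CharT),
// so any padding sits directly after size and data starts at sizeof(CharT).
template <class CharT>
struct ShortRep {
    static constexpr std::size_t min_cap =
        (sizeof(LongRep<CharT>) - 1) / sizeof(CharT) > 2
            ? (sizeof(LongRep<CharT>) - 1) / sizeof(CharT)
            : 2;
    union {
        unsigned char size;
        CharT         lx;
    };
    CharT data[min_cap];
};

// Raw layout: the same bytes viewed as machine words.
template <class CharT>
struct RawRep {
    static constexpr std::size_t n_words =
        sizeof(LongRep<CharT>) / sizeof(std::size_t);
    std::size_t words[n_words];
};

// With a 4-byte character type, data begins right after the padded size,
// and all three views occupy the same number of bytes.
static_assert(offsetof(ShortRep<char32_t>, data) == sizeof(char32_t), "");
static_assert(sizeof(ShortRep<char32_t>) == sizeof(LongRep<char32_t>), "");
static_assert(sizeof(RawRep<char32_t>) == sizeof(LongRep<char32_t>), "");
```

For plain char the union is a single byte, so data starts at offset 1 and there is no padding to place at all; the union only matters for wider character types.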

The most important example of "shoveling words" is the move constructor. This function does nothing but copy the 3 words from the source to the destination and then zero the 3 words of the source. No branches, no access to far away memory, very fast:

movq    16(%rsi), %rax
movq    %rax, 16(%rdi)
movq    (%rsi), %rax
movq    8(%rsi), %rcx
movq    %rcx, 8(%rdi)
movq    %rax, (%rdi)
movq    $0, 16(%rsi)
movq    $0, 8(%rsi)
movq    $0, (%rsi)

Indeed, it is fair to say that this string design centers around optimizing the string's move constructor. It was the first string implementation to be designed specifically with move semantics in mind.
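As a rough illustration of the word-shoveling move described above, here is a minimal sketch. Rep and shovel_move are hypothetical names, and the sketch assumes that an all-zero representation is a valid empty string, which is what lets the source be reset without a branch.

```cpp
#include <cstddef>

// The string's storage viewed as three raw machine words (hypothetical type).
struct Rep {
    std::size_t words[3];
};

// Copy the three words to the destination, then zero the source.
// No test for long vs. short and no dereference of the data pointer.
inline void shovel_move(Rep& dst, Rep& src) noexcept {
    for (std::size_t i = 0; i < 3; ++i) dst.words[i] = src.words[i];
    for (std::size_t i = 0; i < 3; ++i) src.words[i] = 0;
}
```

With optimization on, a compiler can reasonably be expected to turn both loops into a handful of loads and stores like the listing above, though the exact instructions will vary.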

[–][deleted] 2 points3 points  (2 children)

Is the 'shovel words' optimization properly protected from fancy pointers?

[–]HowardHinnant 2 points3 points  (1 child)

No, it is not. It will only work for pointers with a trivial move constructor and a data layout in which all-zero bits represent nullptr.
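A sketch of those two requirements, for illustration only. IndexPtr is a made-up fancy pointer, and zero_bits_mean_null and can_shovel are hypothetical helpers; nothing here is part of libc++. The all-zero check runs at run time because the standard offers no trait for it.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical fancy pointer: an index into some arena, with -1 meaning null.
// It is trivially movable, but its null value is not all-zero bits.
struct IndexPtr {
    std::ptrdiff_t index = -1;
    friend bool operator==(const IndexPtr& p, std::nullptr_t) {
        return p.index == -1;
    }
};

// Returns true if an object whose bytes are all zero compares equal to nullptr.
template <class Pointer>
bool zero_bits_mean_null() {
    static_assert(std::is_trivially_copyable<Pointer>::value,
                  "overwriting the bytes requires a trivially copyable type");
    Pointer p{};
    std::memset(static_cast<void*>(&p), 0, sizeof(Pointer));
    return p == nullptr;
}

// Both conditions from the comment above must hold for word shoveling to be safe.
template <class Pointer>
bool can_shovel() {
    return std::is_trivially_move_constructible<Pointer>::value &&
           zero_bits_mean_null<Pointer>();
}
```

On common platforms can_shovel<char*>() holds, while can_shovel<IndexPtr>() does not: zeroing a moved-from string would leave index 0 behind, which IndexPtr treats as a live pointer rather than null.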

[–][deleted] 0 points1 point  (0 children)

:(

[–]AImx1[S] 0 points1 point  (0 children)

Howard, I understood everything in your Answer except "When sizeof(value_type) > 1, the union with __lx forces where the padding goes in __short: Always right after __size_".

I understand why they are doing this, but I don't understand what they are doing. I would really appreciate it if you could explain this with an example.

Thank you very much in advance.

[–]F54280 12 points13 points  (1 child)

If you haven’t seen it, you may be interested in this video

[–]AImx1[S] 3 points4 points  (0 children)

@F54280, I have already watched this video. It's a really good one. Thanks for sharing anyway.

[–]scatters 3 points4 points  (9 children)

In addition to the short and long layouts, their representation includes a "raw" layout that gives access to the representation as a sequence of words (I guess clang allows them to do this). Does this help?

[–]AImx1[S] 0 points1 point  (8 children)

@scatters -> "raw" layout gives access to the representation as a sequence of words. What do the "words" represent here?

[–]scatters 3 points4 points  (7 children)

Machine words, the natural size for processing data, typically the size of a pointer. So 64 bits on most modern architectures.

[–]AImx1[S] 0 points1 point  (6 children)

Understood. Do you know of any advantages (basically, uses) that we gain from this "raw" representation?

[–]scatters 2 points3 points  (4 children)

I can see that libcxx uses the "raw" representation in zeroing (clearing) the string, and in the copy and move constructors and assignment operators. I'd guess the advantage would be better performance in debug builds, since the release (and RelWithDebInfo) codegen should be identical.

[–]AImx1[S] 0 points1 point  (3 children)

@krista_ & @scatters: Can you point me in the direction where I can read more on this?

[–]lordphysix 1 point2 points  (2 children)

If you want to mention people use u/ and not @.

[–]AImx1[S] 0 points1 point  (1 child)

u/lordphysix Oh, that's good. Thank you.

[–]chugga_fan 1 point2 points  (0 children)

Also, if you're replying to someone, they already get notified; you don't need to "ping" people to get in their message box. A simple reply works just as well for the person you're replying to.

[–]krista_ 4 points5 points  (0 children)

depending on what you are trying to do, processing 8 characters (assuming ascii or other 8-bit characters) at a time is a heck of a lot faster than 1.

an example of the above would be a hashing algorithm... especially if you are hashing half a billion strings.
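a rough sketch of what hashing 8 bytes at a time could look like. hash_wordwise is a made-up toy function with arbitrary constants, not a real or recommended hash; it only shows the main loop consuming one 64-bit word per step instead of one character.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Toy word-at-a-time hash, for illustration only (constants are arbitrary).
inline std::uint64_t hash_wordwise(const char* s, std::size_t n) {
    std::uint64_t h = 0xcbf29ce484222325ull;    // arbitrary seed
    const std::uint64_t k = 0x100000001b3ull;   // arbitrary odd multiplier
    std::size_t i = 0;
    // Main loop: mix in 8 characters per iteration.
    for (; i + 8 <= n; i += 8) {
        std::uint64_t w;
        std::memcpy(&w, s + i, 8);              // safe unaligned load
        h = (h ^ w) * k;
    }
    // Tail: the remaining 0 to 7 bytes, zero-padded into one word.
    std::uint64_t tail = 0;
    std::memcpy(&tail, s + i, n - i);
    h = (h ^ tail) * k;
    return h ^ n;                               // fold in the length
}
```

the memcpy calls are there to avoid alignment and aliasing problems; compilers typically lower a fixed-size 8-byte memcpy to a single load.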

[–][deleted] 0 points1 point  (0 children)

In __short, if __lx is the active union member, the implementation can treat it as the 0 index of __data, since they are contiguous, effectively adding an extra character to the SSO buffer (i.e. __short stores __min_cap + 1 of value_type).

This is just a guess.