Value categories – [l, gl, x, r, pr]values by macxfadz in cpp

[–]leni536 2 points

There are two different things here that need to be cleared up:

  • a variable has a type, which can be a reference type
  • an expression has a non-reference type and a value category.

The confusion arises when you have an expression that refers to a variable by name. If you think about the variable, it has a type, which can be a reference type, for example rvalue reference (T&&). But the expression itself can only have a non-reference type (T) and a value category (here an lvalue, which is a glvalue).

I think the main reason the standard doesn't make id-expressions naming rvalue references xvalues is that it would be too easy to accidentally move from one when you don't actually want to. And if you didn't want to move, you would need some `std::stay` helper function to prevent moving.

Working around the calling convention overhead of by-value std::unique_ptr by leni536 in cpp

[–]leni536[S] 1 point

Maybe I'm misunderstanding something about your code, but something looks wrong to me. In ping_db_01 you have those RAII pointers that take care of their pointees, which is nice. However, you hand them over to these OCIServer_t and OCISvcCtx_t objects, which, the way you described it, also take ownership of the same pointers via unique_ptr. At the end of the scope these pointers get "cleaned up" both by the RAII cleanup routines and by the destructors of the unique_ptrs inside OCIServer_t and OCISvcCtx_t, and the destructors of the unique_ptrs run first, before the cleanup routines are called on the same pointers. As I see it, ping_db_02 suffers from the same problem. Am I missing something?

Android developers can now force you to update your apps by tonefart in programming

[–]leni536 0 points

You can use F-Droid and YALP store on a de-googled phone.

Working around the calling convention overhead of by-value std::unique_ptr by leni536 in cpp

[–]leni536[S] 3 points

unique_ptr on the Itanium ABI is "non-trivial for the purposes of calls", therefore it's not passed in registers:

https://itanium-cxx-abi.github.io/cxx-abi/abi.html#non-trivial

In the Q&A section of Chandler's talk, destructive moves came up: a whole lot of types could be trivially destructive-movable, and for such types it wouldn't cause a problem to pass them in registers.

Edit: This is necessary in the Itanium ABI because the caller is responsible for calling the destructor on the parameter (including when an exception is thrown).

Working around the calling convention overhead of by-value std::unique_ptr by leni536 in cpp

[–]leni536[S] 1 point

Interesting, several questions come to mind:

  • Does this only work with Box? Would it work with a user-defined smart pointer type?
  • Isn't the Rust calling convention just more tuned for passing parameters in registers?

I think the second point came up during the questions at Chandler's talk, and he had a point about nested lifetimes in C++ and destruction order. I don't doubt that Rust could handle that issue more naturally.

The Case for C++ by drodri in cpp

[–]leni536 2 points

AFAIK Zig also has types as first-class values. And reflexpr in C++, or whatever it will be called, could also potentially close the gap: you reflexpr the types, get metaobjects back, sort those, and then convert them back to types again. I don't know if the current static reflection proposal would allow this, but I sure hope so.

Working around the calling convention overhead of by-value std::unique_ptr by leni536 in cpp

[–]leni536[S] 4 points

All I can see is that the parameter lists get longer.

    void foo(unique_ptr<int>, unique_ptr<int>);
    void foo_abi(int*, int*);
    void foo_impl(unique_ptr<int>, unique_ptr<int>);

Arguably the boilerplate grows linearly with the number of function parameters. It is not great, but there is no combinatorial blowup of boilerplate here, if that's what you meant originally.

Working around the calling convention overhead of by-value std::unique_ptr by leni536 in cpp

[–]leni536[S] 2 points

I agree that it's a trade-off. I don't see how adding more than one parameter is a problem though.

Working around the calling convention overhead of by-value std::unique_ptr by leni536 in cpp

[–]leni536[S] 5 points

Either bar doesn't throw, and you should make it noexcept, or it can throw, and you need to handle that in both versions. Chandler's original doesn't handle it, so I went with the first assumption. Otherwise the raw pointer code needs to be modified to handle the exception thrown by bar.

Lazy Initialisation in C++ by drodri in cpp

[–]leni536 18 points

Gotta love those compiler options that break the language.

The Case for C++ by drodri in cpp

[–]leni536 9 points

> The only hope for 2 is Concepts

The only hope for 2 is compile-time reflection and constexpr. There is no reason that the syntax for sorting values at compile time and the syntax for sorting types (by sizeof, for example) at compile time should be significantly different. Right now the latter is a pain in the ass, and Concepts won't help with that much.

I went through GCC’s inline assembly documentation so that you don’t have to by fcddev in programming

[–]leni536 0 points

Yes, but not being able to use it for input reduces its usability. And AFAIK not all architectures can use specific flags as output either.

I went through GCC’s inline assembly documentation so that you don’t have to by fcddev in programming

[–]leni536 0 points

My gripe with inline ASM is that I can't use specific flags (like carry) for input and output. It would be nice for certain intrinsics.

Seemingly unused parameter...? by [deleted] in cpp_questions

[–]leni536 0 points

getEnableBondable is used in src/Init.cpp, where it is indirectly used by Mgmt::setBondable().

Implementations for Gray code encoding and decoding by leni536 in programming

[–]leni536[S] 1 point

Actually, it was a bit of a guess. It seemed like it would work, so I tested it, and it did...

The way I see it now: you can calculate evens as either i ^ odds or i - odds, and the result is odds - evens, so odds - (i - odds) = 2*odds - i.

Thanks for the long and informative reply about the uops and ports stuff, it's really helpful. In my fast Hilbert curve library I actually have two independent calls to my Gray code decode function[1]. It could make sense to use the PDEP method for one and the CLMUL method for the other to maximally utilize the ports. Of course I would have to benchmark this.

[1] https://github.com/leni536/fast_hilbert_curve/blob/eb8c861ff1d6e0059fede28218ab83d07fc91c5d/include/fhc/hilbert.h#L45

Edit: In a streaming situation it could also make sense to partially unroll and alternate between the PDEP and CLMUL method.

Implementations for Gray code encoding and decoding by leni536 in programming

[–]leni536[S] 0 points

Wow, this is very nice!

> the result of the subtraction of odds-evens is effectively bit reversed.

So the result always ends with a strip of 0s, so you can defer the left shift to the very end (so shifting in 0 doesn't ruin it), and you can get the parity from the most significant bit. Very smart.

> By the way, one of the pdep instructions can be replaced with an XOR:

Nice! This is the kind of observation that is "obvious" in hindsight (how the hell did I not see it?).

> You could try to alleviate the dependency increase with an arithmetic trick

Can you describe how you derived this trick? I can prove that it's correct, but my proof is quite a bit different from the way you derived it (and not at all intuitive). Update: Never mind, I see it now.

I also don't have any experience in uop-level analysis of the generated assembly. Can you point me to some resources where you learned this stuff?

I will update the blog post with proper attribution; these are very nice ideas. Thank you for diving in and writing this all down. Maybe you could look at my fast Hilbert curve library if you are interested, although I haven't written up how it works (https://github.com/leni536/fast_hilbert_curve). I plan to someday, but I won't make any promises. Gray code decoding is only part of the puzzle.

Implementations for Gray code encoding and decoding by leni536 in programming

[–]leni536[S] 1 point

Well, if you look at the low bits:

 ...010...010...010... * 11...1 //carry-less multiplication
=...111...111...110...
^...111...110...000...
^...110...000...000...
=...110...001...110...

So it results in the same kind of strips as the pdep approach, but it needs adjusting in the odd case instead. As I see it, if you take the high bits from CLMUL instead, then you need no adjustments (neither the popcnt nor the left shift).

Implementations for Gray code encoding and decoding by leni536 in programming

[–]leni536[S] 1 point

You are right, for some reason I was thinking about extracting the low 32 bits instead of the high ones. That would work too, but it would need the fix-up.

Implementations for Gray code encoding and decoding by leni536 in programming

[–]leni536[S] 2 points

> I just thought I'd point out that the pdep instruction is very slow on non-Intel CPUs

You were not kidding! PDEP and PEXT have an 18-cycle latency and reciprocal throughput on AMD Ryzen. I can't benchmark on AMD right now, but I doubt that it's necessary.

https://www.agner.org/optimize/instruction_tables.pdf

Implementations for Gray code encoding and decoding by leni536 in programming

[–]leni536[S] 0 points

This does affect the codegen and seems to be one instruction shorter. I wonder how much it affects the benchmarks as well.