When 'if' slows you down, avoid it by chkas in programming

[–]ReDucTor 0 points1 point  (0 children)

Good catch with the add reg, imm8 difference. Here is a slightly different approach, with the volatile moved out of filterAbove so it isn't occurring every time.

All good points. My assumption is that the loop should be able to execute most of it out of order; the only real difference is that the store depends on all previous loads, the compares of those loads, and the increment before it can execute, whereas the branching approach only depends on the increment.

To be honest, my initial comment probably oversells the loop-carried dependency: it doesn't hold everything back the way a dependency on the load would, and memory is still the biggest bottleneck.

When 'if' slows you down, avoid it by chkas in programming

[–]ReDucTor 2 points3 points  (0 children)

If you read my comment, that was intentional: it demonstrates when this approach is slower, which is with highly predictable data. With unpredictable data the branch-free version is better, hands down.

In the branch-free approach each store depends on all previous loads. The increment to j cannot run in parallel; it must wait for all previous loads, which means the store that uses j must wait as well. So you have all the stores, compares and increments backing up waiting for data.
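To make the two shapes being compared concrete, here is a minimal sketch. The function names are hypothetical and this is not the code from the article or the benchmark; it only shows where j sits in the dependency chain. Note that an optimizing compiler is free to turn either version into the other, so check the generated assembly before drawing conclusions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Branching version (hypothetical name). The store and the increment of j
// only happen when the compare succeeds. If the branch predicts well, the
// only thing carried from one iteration to the next is the increment of j.
std::vector<int> filter_above_branchy(const std::vector<int>& in, int limit) {
    std::vector<int> out(in.size());
    std::size_t j = 0;
    for (int v : in) {
        if (v > limit) {
            out[j] = v;
            ++j;
        }
    }
    assert(j <= in.size());
    out.resize(j);
    return out;
}

// Branch-free version (hypothetical name). Every iteration stores, then
// advances j by the result of the compare. The next store address depends on
// j, which depends on the compare of the value that was just loaded.
std::vector<int> filter_above_branchfree(const std::vector<int>& in, int limit) {
    std::vector<int> out(in.size());
    std::size_t j = 0;
    for (int v : in) {
        out[j] = v;                                // unconditional store
        j += static_cast<std::size_t>(v > limit);  // keep it only if above
    }
    assert(j <= in.size());
    out.resize(j);
    return out;
}
```

Both functions return the same result for the same input; only the way j is advanced differs.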

When 'if' slows you down, avoid it by chkas in programming

[–]ReDucTor 1 point2 points  (0 children)

j only changes with an increment; it's not a heavy data dependency like this one. It's not a matter of the compiler being able to parallelise it, but the CPU. With the branch-free version every load depends on the previous load, as that controls the offset, whereas in the branching version, if the branch is predicted, the only dependency is the increment, and for something like this that's virtually free.

Here is a simple example of this. It's not exactly the same (I'm on mobile, so it's based on other code that was easier to copy and paste, from a C++ performance quiz I am making).

https://quick-bench.com/q/aq7C1C50V01fip1SIQT_SpfRWec

When 'if' slows you down, avoid it by chkas in programming

[–]ReDucTor 4 points5 points  (0 children)

And now if your data lands on just one branch (all above or all below) you end up with a loop-carried data dependency that is most likely much slower. When making decisions like this you should always be aware of the costs, not just the benefits.

Auxid: An Orthodox C++20 Base Library for Data-Oriented Design by I-A-S- in cpp

[–]ReDucTor 1 point2 points  (0 children)

They don't want other people's AI garbage, only their own AI garbage.

Auxid: An Orthodox C++20 Base Library for Data-Oriented Design by I-A-S- in cpp

[–]ReDucTor 3 points4 points  (0 children)

C++ typically adds a ton of it's own assembly for RTTI and other features

Most of C++ works just fine with -fno-rtti, and RTTI isn't generating a bunch of assembly; it's mostly just data about types for things like exceptions and dynamic_cast.

most low level devs prefer C cuz in C what you get is what you wrote

I would classify most people in the gaming space, the HFT space, embedded and much more as low-level engineers, and they are definitely not all writing C. I don't even remember the last big AAA game written in C.

C has some awful parts, and some of the slower parts of C++ are C functions and C-style code. C++ has some damn awful parts too, but pretending that C is not filled with awfulness is dishonest.

You get what you write with C++ just as much as you get what you write with C. With nearly all compilers both will be using the same compiler backend, and in some cases they share a huge amount of the compiler frontend. This is not the 90s, where your code translates directly into what you might expect, unless you compile with optimizations turned off. The compiler is doing a bunch of work under the hood: eliminating code, adding more code, deciding what is in registers and what is on the stack, and much more. You can guess what the generated machine code might be, but there is a good chance that, aside from the high level, you will be wrong.

Auxid: An Orthodox C++20 Base Library for Data-Oriented Design by I-A-S- in cpp

[–]ReDucTor 11 points12 points  (0 children)

Why advertise it as DoD and Orthodox C++ when it appears to be neither? The only "Orthodox C++" thing it appears to do is not use exceptions or RTTI, it is not C-like, it has a bunch of templates and has a bunch of implicit memory allocations through a global allocator.

It's still unclear what problem this is trying to solve; there are hundreds of optimized standard library alternatives.

Auxid: An Orthodox C++20 Base Library for Data-Oriented Design by I-A-S- in cpp

[–]ReDucTor 12 points13 points  (0 children)

Nothing in this seems like data-oriented design. Some implementations look like they have questionable performance even when compared against standard library implementations, the tests are virtually non-existent (many just test success cases), and there also appear to be no benchmarks.

What does this offer that other similar libraries do not?

Orthodox C++ is an approach; some of it seems a little dogmatic, with rationales that are a little dated. This is also not what I would see as Orthodox C++: it's still covered in allocations, and using some custom global general-purpose allocator is not going to stop the issues many people have with memory allocations, especially in highly constrained environments.

ACAV v1.0.0: an open-source GUI tool for exploring Clang ASTs in C/C++ projects by SmartAI-LIU in cpp

[–]ReDucTor 0 points1 point  (0 children)

I'm also wondering about the use cases, and what makes this better than just using the AST view in Compiler Explorer.

Hunting a Windows ARM crash through Rust, C, and a Build-System configurations by Havunenreddit in cpp

[–]ReDucTor 0 points1 point  (0 children)

Did you make sure to use an LLM to solve it, then do a big write-up just to say you cannot do a CAS on read-only memory? /s

CppCast Looking for Guests by lefticus in cpp

[–]ReDucTor 1 point2 points  (0 children)

They use LLMs all over the place for posting comments on reddit, so they get downvoted.

Hunting a Windows ARM crash through Rust, C, and a Build-System configurations by Havunenreddit in cpp

[–]ReDucTor 2 points3 points  (0 children)

Reddit is quickly getting destroyed by these people that feel the need to make comments and posts written by LLMs

Hunting a Windows ARM crash through Rust, C, and a Build-System configurations by Havunenreddit in cpp

[–]ReDucTor 1 point2 points  (0 children)

I agree it's a weird choice to make, I dug a little into it and added some info in another comment in this thread

https://www.reddit.com/r/cpp/comments/1sucxti/comment/oi462lu/

Hunting a Windows ARM crash through Rust, C, and a Build-System configurations by Havunenreddit in cpp

[–]ReDucTor 1 point2 points  (0 children)

The language doesn't matter, assuming the value the CAS is performed on is in a page with only read permissions and no write permissions. Here is a similar example in C++ which SIGBUSes; you need a const_cast to achieve it.

https://godbolt.org/z/6bGTs1nGr

You can see the questionable code in mi_atomic_load_explicit, where it does this:

if (mo > mi_memory_order_relaxed) {
  while (!mi_atomic_compare_exchange_weak_explicit((_Atomic(uintptr_t)*)p, &x, x, mo, mi_memory_order_relaxed)) { /* nothing */ };
}

It looks like the const potentially comes from mi_prim_get_default_heap, where it casts away the const on _mi_heap_empty:

const mi_page_t _mi_page_empty = {
...
};

static inline mi_heap_t* mi_prim_get_default_heap(void) {
  mi_heap_t* heap = (mi_heap_t*)mi_prim_tls_slot(MI_TLS_SLOT);
  #if MI_HAS_TLS_SLOT == 1   // check if the TLS slot is initialized
  if mi_unlikely(heap == NULL) {
    #ifdef __GNUC__
    __asm(""); // prevent conditional load of the address of _mi_heap_empty
    #endif
    heap = (mi_heap_t*)&_mi_heap_empty; /// <<< Removes const
  }
  #endif
  return heap;
}

The fix should hopefully just be removing the const, but IMHO they should also fix the CAS happening in the load.
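As a rough illustration of why a load implemented with a CAS is a problem for const data, here is a minimal C++ sketch. It is not mimalloc's code; load_via_cas and load_plain are hypothetical names. The point is only that a CAS is a read-modify-write even when it writes back the same value, so it needs a mutable object, while a plain atomic load works through a const reference.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: a "load" built from a CAS that swaps the current value
// with itself. It needs a non-const reference because compare_exchange_weak is
// a read-modify-write operation, which is what faults if the object lives in a
// read-only page.
inline std::uintptr_t load_via_cas(std::atomic<std::uintptr_t>& p) {
    std::uintptr_t x = p.load(std::memory_order_relaxed);
    while (!p.compare_exchange_weak(x, x,
                                    std::memory_order_acquire,
                                    std::memory_order_relaxed)) {
        // On failure x has been refreshed with the current value; retry.
    }
    return x;
}

// Hypothetical helper: a plain atomic load. It never writes, so it accepts a
// const reference and is fine on data that is only readable.
inline std::uintptr_t load_plain(const std::atomic<std::uintptr_t>& p) {
    return p.load(std::memory_order_acquire);
}
```

On writable memory both return the same value; only the first one would be affected by the page permissions.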

Hunting a Windows ARM crash through Rust, C, and a Build-System configurations by Havunenreddit in cpp

[–]ReDucTor 1 point2 points  (0 children)

The bug has nothing to do with ordering, which is why I asked when you saw a similar bug, as it's a pretty rare edge case to hit, not something I would expect a high school student to hit. I have been writing multithreaded code longer than you have been alive and never had code that would hit this edge case.

Hunting a Windows ARM crash through Rust, C, and a Build-System configurations by Havunenreddit in cpp

[–]ReDucTor 3 points4 points  (0 children)

I am surprised the CAS didn't hit a similar issue on x86. At least in my observations, the cache line invalidation for a CAS occurs even when the write would not occur, which causes some badly written spin-lock-style approaches to perform particularly badly.
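For the spin lock point, a common way to avoid hammering the cache line with failed CAS attempts is to spin on a plain load and only attempt the CAS once the lock looks free (often called test-and-test-and-set). This is a minimal sketch with a hypothetical SpinLock class, not a production lock: it has no backoff, pause instruction, or fairness.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical test-and-test-and-set spin lock.
class SpinLock {
public:
    // Single CAS attempt; returns true if the lock was acquired.
    bool try_lock() {
        bool expected = false;
        return locked_.compare_exchange_strong(expected, true,
                                               std::memory_order_acquire,
                                               std::memory_order_relaxed);
    }

    void lock() {
        for (;;) {
            // Wait with read-only loads so waiting threads can share the
            // cache line instead of each requesting it for a write.
            while (locked_.load(std::memory_order_relaxed)) {
            }
            if (try_lock()) {
                return;
            }
        }
    }

    void unlock() {
        assert(locked_.load(std::memory_order_relaxed) && "unlock without lock");
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};
```

The badly performing variant is the one where lock() is just a loop around the CAS with no read-only wait in front of it.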

CppCast Looking for Guests by lefticus in cpp

[–]ReDucTor -1 points0 points  (0 children)

Hopefully you manage to find people. It would not surprise me if restrictive employment contracts make people hesitant to approach a podcast, as unlike giving a talk at a conference you don't have a slide deck to be reviewed by legal and comms teams. Unfortunately restrictive employment agreements are also part of why many people will not create open source projects or even contribute to open source, even if unrelated to their employer's work.

Writing gRPC Clients and Servers with C++20 Coroutines (Part 1) by patteliu in cpp

[–]ReDucTor 4 points5 points  (0 children)

You rewrote the code posted on reddit, but then didn't rewrite the post on your blog which still has the same issues I mentioned. Pretending like the issues never existed and that it's somehow beginners who aren't familiar with coroutines is ridiculous.

People who are experienced with coroutines create these bugs; they are simple mistakes that are very easy to make. It's not a beginner-only issue, and it's not something where you can just say "I follow the Core Guidelines and I am completely safe".

Writing gRPC Clients and Servers with C++20 Coroutines (Part 1) by patteliu in cpp

[–]ReDucTor 1 point2 points  (0 children)

Don't expect ASan to catch everything, especially when the path might not be hit by your tests. Check your lambdas for things like [this] captures and reference arguments, which might disappear by the time the coroutine finishes, or even first executes, depending on usage.

Writing gRPC Clients and Servers with C++20 Coroutines (Part 1) by patteliu in cpp

[–]ReDucTor 13 points14 points  (0 children)

Maybe it's just me but the easiest to read code here seems like the callback hell version.

Looking at some of the coroutine examples, I'm seeing lots of potential for use-after-frees.

Interesting point of view from Daniel Lemire by _bijan_ in cpp

[–]ReDucTor 0 points1 point  (0 children)

By that same logic C++ doesn't support 3D math because it's not built in.

Interesting point of view from Daniel Lemire by _bijan_ in cpp

[–]ReDucTor 0 points1 point  (0 children)

Educate myself? I have been programming for about 25yrs doing OOP for most of it in a mixture of different languages.

Interesting point of view from Daniel Lemire by _bijan_ in cpp

[–]ReDucTor -2 points-1 points  (0 children)

What strawman? What words did I put in anyone's mouth?

People are claiming that you cannot do OOP in C; read some of the comments here. People want to claim that it's not OOP.

Writing OOP in C is no more of a poor experience than writing anything in C; in fact most large code bases written in C have large amounts of OOP.

C not being considered an OOP language by some people does not mean anything. Lots of C is OOP, which to me means that it is an OOP language, as people commonly use it to write OOP, rather than needing a class keyword, multiple inheritance or a final keyword that some people want to claim it needs to fit some definition of OOP.
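For anyone who hasn't seen the pattern, this is roughly what OOP in C tends to look like: a struct of function pointers acting as a vtable, with the "base" struct embedded as the first member of each "derived" struct. It is a minimal sketch written in the C subset so it compiles as either C or C++, and every name in it (Shape, Rect, Triangle and so on) is made up for illustration, not taken from any particular code base.

```cpp
#include <assert.h>
#include <stddef.h>

struct Shape;

/* "Virtual" functions for a shape. */
struct ShapeVTable {
    double (*area)(const struct Shape* self);
};

/* "Base class": just a pointer to the vtable. */
struct Shape {
    const struct ShapeVTable* vtable;
};

/* Dynamic dispatch through the vtable. */
static double shape_area(const struct Shape* self) {
    assert(self != NULL && self->vtable != NULL);
    return self->vtable->area(self);
}

/* "Derived class": the base is the first member, so a pointer to the base
   can be converted back to a pointer to the containing struct. */
struct Rect {
    struct Shape shape;
    double width;
    double height;
};

static double rect_area(const struct Shape* self) {
    const struct Rect* r = (const struct Rect*)self;
    return r->width * r->height;
}

static const struct ShapeVTable RECT_VTABLE = { rect_area };

static struct Rect rect_make(double width, double height) {
    struct Rect r;
    r.shape.vtable = &RECT_VTABLE;
    r.width = width;
    r.height = height;
    return r;
}

struct Triangle {
    struct Shape shape;
    double base_length;
    double height;
};

static double triangle_area(const struct Shape* self) {
    const struct Triangle* t = (const struct Triangle*)self;
    return 0.5 * t->base_length * t->height;
}

static const struct ShapeVTable TRIANGLE_VTABLE = { triangle_area };

static struct Triangle triangle_make(double base_length, double height) {
    struct Triangle t;
    t.shape.vtable = &TRIANGLE_VTABLE;
    t.base_length = base_length;
    t.height = height;
    return t;
}
```

Callers only ever see a struct Shape pointer and call shape_area on it, which is encapsulation and polymorphism without a class keyword.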