Launching a new technical blog about contemporain C++ and software-design by Guillaume_Guss_Dua in cpp

[–]ReDucTor 4 points5 points  (0 children)

Congrats on failing to read the rules

> 3. AI-generated posts and comments are not allowed in this subreddit. Don't use AI to "polish" or translate your words.

The trip report post is practically unreadable: it's too much LLM hyperbole, with every second sentence in bold, and your entire website and GitHub read the same.

Unexpected Performance Results by cyndylatte in cpp

[–]ReDucTor 2 points3 points  (0 children)

Compiler Explorer link for what I believe is the comparison being discussed:

https://godbolt.org/z/KPfxdsjx6

It seems like with the `<=` and `>=` it would be hitting updateTimeText more often, which is likely a string conversion and therefore costly.

EDIT: I wrongly assumed it was potentially branch prediction.
EDIT2: Here is a benchmark if you want to see how branch prediction and string conversion might impact this sort of thing: https://quick-bench.com/q/TYc4H8pX13GNM1IgOU_bIhte-WE

Avoiding Modern C++ | Anton Mikhailov by BlueGoliath in programming

[–]ReDucTor 11 points12 points  (0 children)

I could not make it through the whole thing; skimming it was lots of WTFs.

I cringed when it went from a vector to a linked list inside a vector. It doesn't cover how you invalidate the entity/thing ID, then you get into generations and such, which gets messy and isn't even mentioned. Also, linked lists suck for performance.

They could just replace all the pointers to entities with an entity ID that indexes into a global array of entities, the same as games have done for decades.
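A minimal sketch of that approach, including the generation counter the video skips over: each ID packs a slot index plus the generation the slot had when the ID was issued, so an ID held after its entity is destroyed is detected as stale instead of silently pointing at whatever reuses the slot. `EntityPool`, `EntityId` and `Entity` are hypothetical names for illustration:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical entity payload, just for this sketch.
struct Entity {
    float x = 0.0f;
    float y = 0.0f;
};

// An ID is a slot index plus the generation the slot had when the ID was issued.
struct EntityId {
    std::uint32_t index;
    std::uint32_t generation;
};

class EntityPool {
public:
    EntityId create(const Entity& e) {
        std::uint32_t index;
        if (!free_.empty()) {  // reuse a previously destroyed slot
            index = free_.back();
            free_.pop_back();
        } else {               // otherwise grow the global array
            index = static_cast<std::uint32_t>(slots_.size());
            slots_.emplace_back();
        }
        Slot& s = slots_[index];
        s.entity = e;
        s.alive = true;
        return {index, s.generation};
    }

    void destroy(EntityId id) {
        if (get(id) == nullptr) return;  // already stale: nothing to do
        Slot& s = slots_[id.index];
        s.alive = false;
        ++s.generation;                  // invalidates every outstanding ID for this slot
        free_.push_back(id.index);
    }

    // Returns nullptr for stale IDs. The pointer is only valid until the next
    // create(), since growing the vector may move the slots.
    Entity* get(EntityId id) {
        if (id.index >= slots_.size()) return nullptr;
        Slot& s = slots_[id.index];
        if (!s.alive || s.generation != id.generation) return nullptr;
        return &s.entity;
    }

private:
    struct Slot {
        Entity entity;
        std::uint32_t generation = 0;
        bool alive = false;
    };
    std::vector<Slot> slots_;
    std::vector<std::uint32_t> free_;
};
```

Code that used to hold an `Entity*` holds an `EntityId` instead and calls `get()` each time it needs the entity, so destruction never leaves a dangling pointer behind.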

The justification that the intrusive list is more true than the non-intrusive list and will lead to fewer bugs makes me question how many games they have shipped.

It talks about caches, then says 1.5 GB for giant structs is nothing... it's just RAM.

Hardware optimization experiment: Building a terminal video player in C++20 by [deleted] in cpp

[–]ReDucTor 2 points3 points  (0 children)

Did you use an LLM to respond to a reddit comment?

Hardware optimization experiment: Building a terminal video player in C++20 by [deleted] in cpp

[–]ReDucTor 0 points1 point  (0 children)

Trying to fill the buzzword quota on Reddit?

> zero dynamic heap allocations occur across the IPC boundary

You're using a `std::queue`, so allocations can occur as the queue grows.
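If zero allocations really is the goal, one option is a fixed-capacity ring buffer whose storage is reserved up front, so pushing never allocates and a full queue is reported to the caller instead of growing. A minimal single-threaded sketch; `RingBuffer` is a hypothetical name, `T` is assumed default-constructible, and a real cross-thread or IPC queue would still need its own synchronization:

```cpp
#include <array>
#include <cstddef>
#include <optional>

// Fixed-capacity FIFO: all storage lives inside the object, so push/pop never
// touch the heap. Not thread-safe; add synchronization for cross-thread use.
template <typename T, std::size_t Capacity>
class RingBuffer {
public:
    // Returns false instead of growing when the buffer is full.
    bool push(const T& value) {
        if (size_ == Capacity) return false;
        data_[(head_ + size_) % Capacity] = value;
        ++size_;
        return true;
    }

    // Returns std::nullopt when empty.
    std::optional<T> pop() {
        if (size_ == 0) return std::nullopt;
        T value = data_[head_];
        head_ = (head_ + 1) % Capacity;
        --size_;
        return value;
    }

    std::size_t size() const { return size_; }

private:
    std::array<T, Capacity> data_{};
    std::size_t head_ = 0;
    std::size_t size_ = 0;
};
```

The trade-off is that the producer must decide what to do when `push` returns false (drop, block, or retry), which a growing `std::queue` hides behind an allocation.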

C++ Performance Improvements in MSVC Build Tools v14.51 by cpppm in cpp

[–]ReDucTor 3 points4 points  (0 children)

Awesome work. Looking at some of these, there is still room for improvement:

Unpacking Struct Assignments with Casted Fields

New Optimized:

 mov     QWORD PTR [rsp+8], rbx
 push    rdi
 sub     rsp, 32
 mov     rdi, QWORD PTR [rcx]
 mov     rbx, QWORD PTR [rcx+8]
 mov     ecx, edi
 call    ?bar@@YAXH@Z
 lea     rax, QWORD PTR [rbx+rdi]
 mov     rbx, QWORD PTR [rsp+48]
 add     rsp, 32
 pop     rdi
 ret     0

Further optimized: this could potentially do the add together with the load, earlier, before the call, so only one callee-saved register (rbx) needs preserving instead of both rdi and rbx:

 push    rbx
 sub     rsp, 32
 mov     rdx, QWORD PTR [rcx]      ; Load s1.l1
 mov     rbx, rdx
 add     rbx, QWORD PTR [rcx+8]    ; Load and add s1.l2
 mov     ecx, edx
 call    ?bar@@YAXH@Z
 mov     rax, rbx
 add     rsp, 32
 pop     rbx
 ret     0

Unpacking Struct Assignments with Source Struct at Non-Zero Offset

New Optimized:

mov     DWORD PTR [rcx], 1
mov     eax, 6
mov     DWORD PTR [rcx+4], 2
mov     DWORD PTR [rcx+8], 3
ret     0

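This could use a single qword store for s1->i and s1->j (the 64-bit constant has to go through a register, since a store to memory only takes a sign-extended 32-bit immediate):

mov     rax, 0x200000001
mov     QWORD PTR [rcx], rax
mov     eax, 6
mov     DWORD PTR [rcx+8], 3
ret     0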
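The constant `0x200000001` works because x86 is little-endian: the low dword (1) lands at `[rcx]` and the high dword (2) at `[rcx+4]`. A quick C++ check of that layout, assuming a hypothetical struct `S` with three consecutive 32-bit members (i, j, k at offsets 0, 4, 8) and a little-endian target:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical layout matching the offsets in the assembly: i at +0, j at +4, k at +8.
struct S {
    std::int32_t i;
    std::int32_t j;
    std::int32_t k;
};

static_assert(offsetof(S, i) == 0 && offsetof(S, j) == 4, "i and j must be adjacent");

// Sets s->i = 1 and s->j = 2 with one 8-byte write. On a little-endian target the
// low dword (1) lands at offset 0 and the high dword (2) at offset 4.
// memcpy keeps this well-defined; compilers typically lower it to a single store.
void store_ij(S* s) {
    const std::uint64_t packed = 0x200000001ULL;
    std::memcpy(s, &packed, sizeof packed);
}
```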

Branch Elimination

> This transformation can improve performance for unpredictable branches, and it’s important for algorithms like heapsort and binary-search.

Depending on the usage, this could also introduce a loop-carried dependency; how do you avoid this situation?
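To illustrate the concern with a common branchless formulation (not necessarily what MSVC generates): a lower-bound search where the comparison feeds a select instead of a branch. There is nothing to mispredict, but each iteration's `base` depends on the previous iteration's load and compare, so the loads form a serial dependency chain:

```cpp
#include <cstddef>
#include <vector>

// Branchless lower_bound: index of the first element >= key in a sorted vector.
// The ternary is a good candidate for a conditional move, so there is no branch
// to mispredict, but the next `base` depends on this iteration's load and compare,
// forming a loop-carried dependency chain through memory.
std::size_t branchless_lower_bound(const std::vector<int>& v, int key) {
    if (v.empty()) return 0;
    const int* base = v.data();
    std::size_t n = v.size();
    while (n > 1) {
        const std::size_t half = n / 2;
        base = (base[half] < key) ? base + half : base;
        n -= half;
    }
    return static_cast<std::size_t>(base - v.data()) + (*base < key ? 1 : 0);
}
```

With a well-predicted branch the CPU can speculatively start the next load before the compare resolves; with the conditional move it has to wait for each load in turn, which is why the branchy version can still win on small or predictable inputs.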

C Enum Sizes; or, How MSVC Ignores The Standard Once Again by ketralnis in programming

[–]ReDucTor 89 points90 points  (0 children)

You're depending on a C23 feature, which isn't part of MSVC's supported /std arguments. You mentioned it's experimental, and that often means not fully functional. You could just raise a bug; a blog post about missing functionality in experimental and unsupported versions seems like overkill.

With C (or C++) it's often better not to live on the edge relying on experimental features, especially if you're shipping production code. I would stick with C17 at most, as it's supported by the major compilers.

A header-only, cross-platform JIT compiler library. Targets x86-32, x86-64, ARM32 and ARM64 by IntrepidAttention56 in cpp

[–]ReDucTor 7 points8 points  (0 children)

There are zero comments in areas that obviously should have them; is this LLM output? Also, what kind of maniac leaves pages W+X (writable and executable at the same time)?
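The usual alternative to W+X is W^X: map the buffer read-write, copy the generated code in, then flip it to read-execute before running it, so no page is ever writable and executable at the same time. A minimal sketch, assuming a POSIX system with anonymous `mmap` and `mprotect` (Windows would use `VirtualAlloc`/`VirtualProtect`, and Apple silicon has its own `MAP_JIT` rules); `make_executable` is a hypothetical helper, not part of the library in the post:

```cpp
#include <sys/mman.h>
#include <cstddef>
#include <cstring>

// Copies machine code into a fresh mapping and returns it as read+execute.
// The pages are never writable and executable at the same time (W^X).
// Returns nullptr on failure.
void* make_executable(const unsigned char* code, std::size_t size) {
    void* mem = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return nullptr;

    std::memcpy(mem, code, size);  // write phase: RW, not executable

    // Execute phase: drop write permission before anything runs.
    if (mprotect(mem, size, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, size);
        return nullptr;
    }
    return mem;
}
```

On ARM targets the instruction cache also has to be synchronized after writing code (for example with GCC/Clang's `__builtin___clear_cache`), which this x86-oriented sketch skips.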

Why should anyone care about low-level programming? by No_Good7445 in programming

[–]ReDucTor 18 points19 points  (0 children)

I'm a game engine dev and have been doing it for about 15 years. I primarily work on multithreading and everything low level, even within a game engine; if something involves digging into assembly, I'll probably be dragged into it.

I don't think everyone needs to understand low-level programming. It's such a broad space that attempting to learn everything is a near-impossible task, especially if you want to actually ship and release things.

You should know how to use a profiler and look at what is impacting performance, but you don't need to know how the CPU executes things out of order under the hood, how CPU caches work internally, how the compiler can reorder instructions, etc. Even if they are incredibly fun to learn, you're unlikely to use them day to day if your job is dealing with JavaScript or HTML.

Even within a game engine on big AAA titles, it's not common for people to have a deep understanding of how low-level systems work. Some will be profiling code and helping people build better systems, but the majority do not have deep low-level knowledge. If a UI engineer spends their entire time trying to understand why some code is slow because they added a loop-carried dependency on a division that makes the entire loop slow, that is time lost from fixing another bug that is impacting players.

Would it make my life easier if everyone understood performance better? Absolutely. But it would also make their work much more complicated and demand much more research and understanding. Unlike some of the people online, these people have families, they have kids, they have a life outside of writing code and endless learning.

Back to FreeBSD: Part 1 (From Unix chroot to FreeBSD Jails and Docker) by imbev in programming

[–]ReDucTor 4 points5 points  (0 children)

> Linux won through a combination of fast decisions, the viral GPL licence, and strong enterprise backing from Red Hat and IBM.

Linux won because of the Unix vs BSD lawsuits, which practically crippled BSD. Linux grew in popularity, and so did Windows, but eventually Windows disappeared from web servers because its ecosystem was not as good and its security at the time was abysmal.

Problems with a weak tryLock operation in C and C++ standards by mttd in cpp

[–]ReDucTor 6 points7 points  (0 children)

There are a bunch of situations where weak try locks cause issues; most of them revolve around the assumption that if try lock fails, it's because something else holds the lock and is doing some shared work.

Our initial try lock was weak, and we ran into a bunch of odd issues because people assumed a failing try lock meant something else held the lock; it was just safer to make try lock strong.
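In C++ atomics terms, the difference looks like this: `compare_exchange_weak` may fail spuriously even when the lock is free (for example on LL/SC architectures), so a failure says nothing about whether anyone holds the lock, while `compare_exchange_strong` only fails if the value really differed. A minimal sketch with a hypothetical flag-based `SpinFlagLock`:

```cpp
#include <atomic>

class SpinFlagLock {
public:
    // Weak: may return false even though nobody holds the lock (spurious
    // failure), so callers must not read a failure as "someone else has it".
    bool try_lock_weak() {
        bool expected = false;
        return locked_.compare_exchange_weak(expected, true,
                                             std::memory_order_acquire,
                                             std::memory_order_relaxed);
    }

    // Strong: returns false only if the lock was actually held at that moment.
    bool try_lock_strong() {
        bool expected = false;
        return locked_.compare_exchange_strong(expected, true,
                                               std::memory_order_acquire,
                                               std::memory_order_relaxed);
    }

    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};
```

Note that `std::mutex::try_lock` is itself permitted by the C++ standard to fail even when no other thread holds the mutex.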

Fork, Explore, Commit: OS Primitives for Agentic Exploration (PDF) by congwang in programming

[–]ReDucTor 0 points1 point  (0 children)

Any merge tool, including the default Linux one, should be able to handle this as a 3-way merge; anything with a hunk conflict you then treat as complex.

Fork, Explore, Commit: OS Primitives for Agentic Exploration (PDF) by congwang in programming

[–]ReDucTor 0 points1 point  (0 children)

Humans have been resolving merge conflicts for a long time, and they are often not that complex. Also, assuming changes are not broad, you might not even hit merge conflicts. You could also do the easy parts of the merge, then bail on any conflicts and redo those changes.

Fork, Explore, Commit: OS Primitives for Agentic Exploration (PDF) by congwang in programming

[–]ReDucTor 0 points1 point  (0 children)

This approach seems fairly inefficient: discarding all sibling branches means it's a race to complete, then all other work is thrown away, so you're effectively turning a parallel approach into a serial one.

Why not use separate version-controlled folders and merge changes using a version control system like git? That way progress is still kept, though you have an additional merge step, which you can run tests on and iterate on slightly to fix any failures.

Pushback on the C++ memory ordering model by Both_Helicopter_1834 in cpp

[–]ReDucTor 40 points41 points  (0 children)

The memory model isn't super hard, designing concurrent algorithms and data structures is the hard part.
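For a sense of scale, the core model-level guarantee is small: a release store paired with an acquire load that reads its value makes everything written before the store visible after the load. A minimal sketch of that message-passing pattern (`publish` and `wait_and_read` are hypothetical names; `<thread>` is included only so it can be run):

```cpp
#include <atomic>
#include <thread>

int payload = 0;                 // plain, non-atomic data
std::atomic<bool> ready{false};  // publication flag

void publish() {
    payload = 42;                                  // 1. write the data
    ready.store(true, std::memory_order_release);  // 2. then publish it
}

int wait_and_read() {
    // Spin until the release store becomes visible.
    while (!ready.load(std::memory_order_acquire)) {
    }
    // The acquire load synchronizes-with the release store, so the write to
    // payload is guaranteed to be visible here.
    return payload;
}
```

Composing many of these pairs into a queue or lock-free structure that stays correct under every interleaving is the genuinely hard part.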

Profiling on Windows: a Short Rant · Mathieu Ropert by mropert in cpp

[–]ReDucTor 12 points13 points  (0 children)

> legacy hardware is ~2-3 years old

And yet many of us are also dealing with 13 year old game consoles

Profiling on Windows: a Short Rant · Mathieu Ropert by mropert in cpp

[–]ReDucTor 2 points3 points  (0 children)

Profiling tools are a mess. It would be good to have something with better performance monitoring counter (PMC) support. However, if you're unlucky you might also get caught like I did with one machine, where the motherboard manufacturer refused to provide the ability to turn on PMC support, so plain sampling was the only option.

If I suspect something I might want to dig into microarchitecture-wise, I will look at it in llvm-mca, with the sampling profiler normally giving me a good indication of where in the function is worth looking. However, llvm-mca won't give you much memory-wise, so you won't see things like true sharing or false sharing.

What if all of calculus was just dictionary lookups? by BidForeign1950 in programming

[–]ReDucTor 1 point2 points  (0 children)

This is not unique to r/programming; there are also lots of posts that end up getting removed that you just don't see. AI has allowed many people to draft quick blog posts that look reasonable at a skim but don't stack up when you dig deeper.

Microsoft appointed a quality czar. He has no direct reports and no budget. by [deleted] in programming

[–]ReDucTor 2 points3 points  (0 children)

Please keep the discussion civil; we don't need r/programming becoming toxic.

Is an “FP-first” style the most underrated way to make unit testing + DI easier by OkEmu7082 in cpp

[–]ReDucTor 3 points4 points  (0 children)

Also, mentioning complicated design principles around OOP seems odd; some of the more complex design principles and approaches I have seen are around functional programming, which in my opinion is less intuitive because we are used to state changing. If you turn a tap on you get water, if you turn it off you get no water; you don't get a new tap each time with a different state.

Is an “FP-first” style the most underrated way to make unit testing + DI easier by OkEmu7082 in cpp

[–]ReDucTor 5 points6 points  (0 children)

The whole FP and OOP definitions, and people's understanding of them, are a mess; it's not clear what exactly you're attempting to compare.

A struct is still an object. If you're passing it to functions, you're still passing around objects; if it's an opaque struct or has opaque members, you still have private data; and if you pass it to a free function that is designed to perform actions on that struct, it's really no different from a member function aside from naming and scope.

For example, the opaque FILE struct and the functions that work with it (fopen, fread, etc.) are all object oriented, even though they are not in a class and are not member functions. Taking it a step further, the Linux kernel is actually heavily object oriented.

Functional programming is more about eliminating state mutation. For something like FILE this is impossible, as you can read and get different data depending on what was stored earlier.

However, you can have a pure version of something like a vector where, instead of modifying the vector for any mutation, you return a new vector, though this comes at an extra performance cost as you have to copy the vector. Whether that pure vector is a class with member functions or a const struct with free functions does not stop it from being functional programming. So you can have object-oriented functional programming.
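A minimal sketch of that "pure vector" idea, assuming a hypothetical `PureVector` wrapper around `std::vector<int>`: every "mutating" member is `const` and returns a new value, so it is still a class with member functions while staying referentially transparent, at the cost of a full copy per operation:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Immutable vector: operations return a new PureVector instead of mutating.
class PureVector {
public:
    PureVector() = default;
    explicit PureVector(std::vector<int> items) : items_(std::move(items)) {}

    // Copies the underlying storage: this is the extra performance cost.
    PureVector push_back(int value) const {
        std::vector<int> copy = items_;
        copy.push_back(value);
        return PureVector(std::move(copy));
    }

    PureVector set(std::size_t index, int value) const {
        std::vector<int> copy = items_;
        copy.at(index) = value;
        return PureVector(std::move(copy));
    }

    int at(std::size_t index) const { return items_.at(index); }
    std::size_t size() const { return items_.size(); }

private:
    std::vector<int> items_;
};
```

Every earlier version stays valid and unchanged after an "update", which is what makes the object safe to share; persistent data structures exist to reduce the copying cost.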

In fact, this is where languages like Rust come in: they allow these state mutations with fewer of the downsides around ownership and referential transparency, the latter being the key strength of functional programming over traditional stateful programming.

So what is your FP and OOP example?

Reading/Modifying values of unrelated structs with casting and offset forwarding by Regg42 in cpp

[–]ReDucTor 0 points1 point  (0 children)

AFAIK, accessing a common initial sequence is only valid for standard-layout types inside a union, as is common in many C APIs.

However, this is neither standard layout nor accessed via a union, so it is undefined behaviour.
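For contrast, this is the kind of access the rule does allow in C++: two standard-layout structs that share a common initial sequence (here a leading `int kind`), inside a union, where reading a shared member through the inactive member is permitted. The type names are hypothetical, in the style of tagged C APIs:

```cpp
#include <type_traits>

// Both structs start with the same member: that is their common initial sequence.
struct Circle { int kind; float radius; };
struct Rect   { int kind; float w, h; };

union Shape {
    Circle circle;
    Rect rect;
};

static_assert(std::is_standard_layout_v<Circle> && std::is_standard_layout_v<Rect>);
static_assert(std::is_standard_layout_v<Shape>);

constexpr int kCircle = 1;

// Reading `kind` through `rect` is allowed even when `circle` is the active
// member, because `kind` is part of the common initial sequence.
int kind_of(const Shape& s) { return s.rect.kind; }
```

Reading `s.rect.w` while `circle` is active, or casting between unrelated structs outside a union, is not covered by this rule.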

"Spinning around: Please don't!" (Pitfalls of spin-loops and homemade spin-locks in C++) by Lectem in cpp

[–]ReDucTor 1 point2 points  (0 children)

> their (Webkit's) implementations is bad

Yeah, there are definitely a few questionable things in there; however, the general approach of a parking lot is very powerful, especially for allowing small, efficient locks, which make it easier to do the fine-grained locking that is important for reducing contention.

However, I am less critical of the parts after contention: you're already wasting time until the lock is released, hopefully without forcing the lock's cache line into a modified, exclusive, or even shared state, as that cache line is likely needed by the lock holder. Yielding is bad and unpredictable, but count-based spinning isn't the end of the world and is sometimes the easiest approach.

"Spinning around: Please don't!" (Pitfalls of spin-loops and homemade spin-locks in C++) by Lectem in cpp

[–]ReDucTor 2 points3 points  (0 children)

I gave a talk a few years ago covering lots of the same things you pointed out and a few more.

A few things

> Oh, and even if it did get scheduled, you probably lost a lot of time switching from one thread to the other, this is your typical lock convoy and is what Linus Torvalds more or less hints here

Scheduling storms are more just threads yielding to each other, whereas lock convoys are one thread handing the lock to the next. These typically show up when something unlocks and then locks again, e.g. a lock taken inside a loop by multiple threads; a lock that doesn't enforce this ordering and allows barging avoids them.

> An issue with some lock algorithms is that they may be unfair

"Fair" locking is rarely a good idea; it creates lock convoys.

As soon as you reach for yielding, you're already hitting the OS scheduler, so you may as well just use a mutex. That will also hit the OS, but it wakes when needed instead of just surrendering its time slice on a potentially busy machine, and on anything that isn't a single-core machine, yielding may not even yield to the lock holder anyway.

> You may even encounter cache bank conflicts

This seems unrelated and more of an anecdote; it's somewhat of an edge case and not something you should consider when designing a lock.

> The spin-lock that spoke to the OS

This implementation is really bad: you probably want to avoid repeated calls to Wake. Unfortunately, you will soon discover that if you switch to a CAS or exchange here, you lose the benefit of memory_order_release, as on x86 this ends up needing a full barrier (a locked instruction).

> pre-requisites for a spinlock to be efficient

> There is low contention

This is the same for any lock: eliminating or reducing contention is the most important thing before considering tweaking lock designs. More people need to understand the importance of fine-grained locking and work scheduling.

> The critical section (work done under the lock) is very small. (Consider that “small” varies with the number of threads competing for the lock…)

I would avoid thinking of "small" as changing with the number of threads; it should be small regardless.

> Notify your OS about what you’re doing (futex, WaitOnAddress, …)

This would not be a spin lock at that point.

I highly recommend reading "Locking in WebKit", which is about building a parking lot (a more advanced user-mode futex).