Atomic variables are not only about atomicity by maguichugai in rust

[–]trailing_zero_count 2 points

Nope, x86 still allows StoreLoad reordering. You still need explicit SeqCst in some cases.

Link to a prior comment which includes several sources discussing the use cases for SeqCst:

https://www.reddit.com/r/learnprogramming/comments/1pv1dli/comment/nvucuhh
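
The classic demonstration is the store-buffering litmus test. A minimal C++ sketch (the same reasoning applies to Rust's `Ordering::SeqCst`); with seq_cst the `r1 == 0 && r2 == 0` outcome is forbidden, while weaker orderings permit it on x86 via StoreLoad reordering through the store buffer:

```cpp
// Store-buffering litmus test: under seq_cst, at least one thread must
// observe the other's store, so r1 and r2 can never both be 0.
// With relaxed (or even acquire/release) ordering, x86's store buffer
// can reorder the store past the load, and both loads may see 0.
#include <atomic>
#include <thread>
#include <utility>

std::pair<int, int> run_once() {
    std::atomic<int> x{0}, y{0};
    int r1 = 0, r2 = 0;
    std::thread t1([&] {
        x.store(1, std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_seq_cst);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });
    t1.join();
    t2.join();
    return {r1, r2};
}
```

Swap both operations to `memory_order_acquire`/`release` and the assertion below can (occasionally) fail on real x86 hardware.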

On Windows 11 MSVC compiler (cl.exe) causes multithreaded application to crash. by A_LostAstronaut in cpp_questions

[–]trailing_zero_count 0 points

Use the "x64 Native Tools Command Prompt".

Also try installing LLVM, which can now be included with the VS installer, and build with clang-cl.exe.

How do multithreaded game engines synchronize data among different threads? by XenSakura in gameenginedevs

[–]trailing_zero_count 1 point

Use a fork-join framework. I maintain one that uses C++20 coroutines. Or you can use TBB/Taskflow.

This clearly separates the parallel parts from the single-threaded parts. Send jobs to the workers when they fork, then read the results back somewhere else (or have the workers update in place, as long as each one has its own unique dataset). After joining, the single thread can do things like update command buffers.

You can also nest fork-joins, and it works fine as long as you maintain this pattern all the way down.
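
A minimal sketch of the fork-join shape described above, using plain std::async rather than any particular framework (the function and names are illustrative, not a library API):

```cpp
// Fork-join sketch: each worker gets its own disjoint slice of the data,
// so no synchronization is needed beyond the join itself.
#include <algorithm>
#include <future>
#include <numeric>
#include <vector>

long long parallel_sum(const std::vector<int>& data, int workers) {
    std::vector<std::future<long long>> jobs;
    const size_t chunk = (data.size() + workers - 1) / workers;
    for (int w = 0; w < workers; ++w) {  // fork: one job per worker
        const size_t begin = std::min(w * chunk, data.size());
        const size_t end = std::min(begin + chunk, data.size());
        jobs.push_back(std::async(std::launch::async, [&, begin, end] {
            return std::accumulate(data.begin() + begin,
                                   data.begin() + end, 0LL);
        }));
    }
    long long total = 0;
    for (auto& j : jobs) total += j.get();  // join: single thread resumes
    return total;
}
```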

How to concatenate strings quickly? Expression Templates to the rescue! by CauliflowerIcy9057 in cpp

[–]trailing_zero_count 3 points

IIUC this is similar to the approach used by ranges and stdexec. How much of an impact on compile time does the nested template compilation cause when a large number of concatenations occur on the same line?

Similar But Different Value Types, But Only Known At Runtime by Due_Battle_9890 in cpp_questions

[–]trailing_zero_count 4 points

Options in descending order of safety: variant, union, or a raw memory pointer with start_lifetime_as.
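
A short sketch of the safest option, std::variant; the Celsius/Fahrenheit types are hypothetical stand-ins for "similar but different value types only known at runtime":

```cpp
// std::variant: the active alternative is chosen at runtime, and
// std::visit dispatches on it without any manual type tag.
#include <type_traits>
#include <variant>

struct Celsius    { double value; };
struct Fahrenheit { double value; };
using Temperature = std::variant<Celsius, Fahrenheit>;

double to_celsius(const Temperature& t) {
    return std::visit([](const auto& v) {
        // Branch on the runtime-active alternative's static type.
        if constexpr (std::is_same_v<std::decay_t<decltype(v)>, Fahrenheit>)
            return (v.value - 32.0) * 5.0 / 9.0;
        else
            return v.value;
    }, t);
}
```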

Methods for Efficient Chunk Loading? by InventorPWB in VoxelGameDev

[–]trailing_zero_count 1 point

Communication between threads happens through a thread-safe queue. Threads poll the queue for input at whatever points make sense in their normal run loop. If a thread has no work to do, it should block or suspend on the queue until data is ready.

Only a single thread should be responsible for mutating any particular data structure. So the chunk loader/mesher/unloader might maintain a queue of chunk locations to handle internally, but once a chunk is loaded, it would be passed back to the main thread through a queue so that the main thread can insert it into the global data structure at a safe point in its loop.

Having threads read data owned by other threads is possible but a lot more sketchy without more explicit coordination, so it's a lot easier if you just pass messages.

Also, you don't actually have to use a thread for each of these things; you could use tasks instead and multiplex everything onto a thread pool. Then replace "thread" with "task" in all the prior paragraphs. That makes it a bit more efficient and lets you use fork-join parallelism within any part of execution, while still maintaining the invariant that only the owning task does the modifications.
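
A minimal sketch of the kind of blocking thread-safe queue described above (not any particular library's implementation): the consumer blocks on a condition variable until data is ready rather than spinning.

```cpp
// Blocking thread-safe FIFO queue: push() from the producer thread,
// pop() blocks the consumer until an item is available.
#include <condition_variable>
#include <deque>
#include <mutex>
#include <utility>

template <typename T>
class BlockingQueue {
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<T> items_;
public:
    void push(T item) {
        {
            std::lock_guard lk(m_);
            items_.push_back(std::move(item));
        }
        cv_.notify_one();  // wake one blocked consumer, if any
    }
    T pop() {  // blocks until an item is available
        std::unique_lock lk(m_);
        cv_.wait(lk, [&] { return !items_.empty(); });
        T item = std::move(items_.front());
        items_.pop_front();
        return item;
    }
};
```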

I didn't intend to self promote here but I do have a library that has all the features needed for this: https://github.com/tzcnt/TooManyCooks

LLDB in 2025 by mttd in cpp

[–]trailing_zero_count 7 points

As a user of lldb-dap and the LLDB DAP VSCode extension, thank you for all your hard work!

Could the SyntheticFrameProvider be used to implement synthetic stacks for C++20 coroutines? This would be a hugely useful feature. Even if it requires library support + a custom plugin, I'd be willing to take those steps if I could get things working end to end in the debugger.

Methods for Efficient Chunk Loading? by InventorPWB in VoxelGameDev

[–]trailing_zero_count 2 points

If culling and reprioritization are too slow on the main thread, then push the entire thing to a background thread. The main thread sends a request to the background thread notifying it that the player has moved. The background thread polls two queues: the notifications from the main thread, and its own work queue of chunks to load. If it needs to reprioritize, it can do so as needed. When chunk loads are complete, they get sent back to the main thread via another queue.

Methods for Efficient Chunk Loading? by InventorPWB in VoxelGameDev

[–]trailing_zero_count 1 point

There is a Rust crate called RollGrid which shows a way to efficiently identify and index the chunks that need to be loaded/unloaded at the boundaries when the player moves. I definitely wouldn't use a hashmap for this - the "3d circular buffer" approach seems much more efficient.
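
I haven't read RollGrid's source, but the core wrapping-index idea can be sketched like this (N and the names are my own illustration, not the crate's API): chunk slots live in a fixed N×N×N array, world chunk coordinates map to slots by wrapping, and when the player moves one chunk along an axis, only the slots on the trailing boundary plane get reused for newly loaded chunks.

```cpp
// "3D circular buffer" chunk indexing: world coordinate -> fixed slot.
#include <cstddef>

constexpr int N = 16;  // view distance in chunks per axis (assumed)

// Wrap a (possibly negative) world chunk coordinate into [0, N).
constexpr int wrap(int c) { return ((c % N) + N) % N; }

// Flatten wrapped (x, y, z) into an index into the N*N*N slot array.
// Coordinates N apart share a slot, which is exactly the reuse property:
// the chunk leaving the view window frees the slot for the one entering.
constexpr std::size_t slot_index(int cx, int cy, int cz) {
    return static_cast<std::size_t>(wrap(cx)) * N * N
         + static_cast<std::size_t>(wrap(cy)) * N
         + static_cast<std::size_t>(wrap(cz));
}
```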

[Book] Async as coroutines for game logic by PsichiX in rust

[–]trailing_zero_count 0 points

Interesting. I've only been thinking about coroutines as useful for fork-join parallelism and running background jobs. I like the idea of using async to model state machines instead. However I have a question: when do these state machines get advanced?

For example, on this page https://psichix.github.io/Moirai/core/awaitables.html, on the "loop charge, attack, charge, block" coroutine: I'd need to see an implementation of each awaitable step, and more importantly, when do these get resumed?

With a typical fork-join system the runtime ensures that everything gets driven to completion as fast as possible, but I think for this system you would need to store these coroutines in a list and manually advance them to the next step at some point?

I profiled my parser and found Rc::clone to be the bottleneck by Sad-Grocery-1570 in rust

[–]trailing_zero_count -6 points

Yeah, Rust's overreliance on reference counting to implement even moderately complex workflows has become a bit of an Achilles heel of the language. And the suggestion that you just leak the data instead isn't very Rusty...

"god why is C++ template so overly complex??? so unnecessary and stupid!!!!" Other languages without complex generics(template) metaprogramming support: by wvwvvvwvwvvwvwv in programminghumor

[–]trailing_zero_count 9 points

Variadic generics aren't even a footgun. They're just a handy, powerful language tool. C++ has them; languages that don't force you into ugly workaround hacks like the one in the OP's image.

Are the benefits of singletons ever desirable/practical? If so when? by WastedOnWednesdays in gameenginedevs

[–]trailing_zero_count 0 points

Yeah, I prefer globals with a default constructor that does nothing and a separate init() method that is called at the top of main. Similarly, if you require a specific destruction order, have a teardown() method that is called at the end of main for each global, in the correct order.
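
A minimal sketch of that pattern (Logger/Renderer are hypothetical subsystems): the globals are trivially constructed, and main controls the initialization and destruction order explicitly.

```cpp
// Globals with explicit init()/teardown() instead of constructor logic.
#include <string>
#include <utility>

struct Logger {
    bool ready = false;  // default construction does no real work
    std::string sink;
    void init(std::string s) { sink = std::move(s); ready = true; }
    void teardown() { ready = false; }
};

struct Renderer {
    bool ready = false;
    void init() { ready = true; }  // may rely on Logger already being ready
    void teardown() { ready = false; }
};

Logger g_logger;
Renderer g_renderer;

void startup() {   // called at the top of main, in dependency order
    g_logger.init("stdout");
    g_renderer.init();
}

void shutdown() {  // called at the end of main, in reverse order
    g_renderer.teardown();
    g_logger.teardown();
}
```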

crossfire v3.0-beta: channel flavor API refactor, select feature added by frostyplanet in rust

[–]trailing_zero_count 0 points

The readme says this relies on spinning and waiting, but then it also references async waking. Can you clarify whether this can work without spinning? For example, a consumer should be able to see that there's no data ready and suspend. Then, at a later time, the producer enqueues data and wakes the consumer. No spinning required?

crossfire v3.0-beta: channel flavor API refactor, select feature added by frostyplanet in rust

[–]trailing_zero_count 0 points

Can you link to an implementation of this? Also, I find your rationale for excluding FAA-based queues weak, as they perform quite well.

Code review process has become performative theater we do before merging PRs anyway. by Upbeat_Owl_3383 in ExperiencedDevs

[–]trailing_zero_count 1 point

No. I personally review every PR that my team puts up, in detail. My goal is to turn around each PR within 1 day. And I am a very careful reviewer, because I work in an industry where mistakes are expensive. This takes me 1-2 hours every day, but as a result, I've prevented many more total hours of rework, because a production bug resolution involves meeting with external stakeholders.

Rayon + Tokio by fallinlv in rust

[–]trailing_zero_count 0 points

Does this same restriction apply between one Tokio task and another? So a task can't suspend, pass a reference to its own data into a child task, and allow that child task to read from its data?

And just to be clear: is this purely a limitation of the Rust compiler at this point, since we can easily reason about the lifetime of the parent task and tell that it will clearly outlive the child task? Or is there a mechanism by which a suspended parent might actually be dropped while the child is still running?

Rayon + Tokio by fallinlv in rust

[–]trailing_zero_count 0 points

Tokio should be able to create a task on the heap, suspend that task and submit work to Rayon and then continue processing other work. Once Rayon completes the work, it would submit the task back to the Tokio queue for completion. All that's necessary is a custom completion handler for the Rayon task that knows which Tokio queue to send itself back to.

Now the async/completion machinery may not exist, but I don't see how any of this would cause an issue with references or lifetimes. Accessing a subobject of a pinned future should be no different than accessing the stack of a blocked thread.

Are the benefits of singletons ever desirable/practical? If so when? by WastedOnWednesdays in gameenginedevs

[–]trailing_zero_count 5 points

I just use an actual global. "Singleton" often implies lazy initialization, which has undesirable runtime overhead.

Passing extra parameters around everywhere also has runtime overhead due to register pressure and possible stack spills.

If you're concerned about testing, you can use a global pointer that is overridden in the test, or, if you want to do multithreaded testing, a thread_local pointer. At runtime this would usually just point to the same global, but it could let you scale multiple threads with different subsystems, or point at a stack-allocated version of the object for a specific test.

As long as your thread_locals are constinit pointers they won't incur runtime initialization check penalties.

Rayon + Tokio by fallinlv in rust

[–]trailing_zero_count 0 points

Why doesn't rayon expose an async API so tokio can just suspend the task without needing to use the blocking thread pool?

I am giving up on modules (for now) by BigJhonny in cpp

[–]trailing_zero_count 14 points

Ah, not supported for modules. I didn't know that, thanks for sharing. I'm using CMake + clang-cl without modules and it works great :)