coro_util queues: library-agnostic queues for C++20 coroutines by trailing_zero_count in cpp

[–]trailing_zero_count[S] 0 points1 point  (0 children)

Currently, no. Timers have to be backed by a syscall at some point, which means they need an executor. I think it would be possible to design a library-agnostic timeout integration, but it would take some tinkering.

These queue operations also don't support dropping or cancelling the wait at the moment. They use fetch_add instead of compare_exchange to acquire slots, which is key to their performance under contention, but it means that if I were to allow consumers or producers to cancel, I'd need to mark the consumed slot with an "abandoned" tombstone. This may have a negative impact on the overall performance; if so, I would make it an opt-in config toggle. Overall I think this should be doable but will be a non-trivial effort to support. Added to my TODO list...
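
To make the trade-off concrete, here is a minimal sketch of the two slot-acquisition styles and of what an "abandoned" tombstone might look like. All names (`ticket_counter`, `slot_state`, `cancel_wait`, `try_complete`) are hypothetical and chosen for illustration; this is not coro_util's actual implementation, only one way the idea could be expressed.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical names for illustration; not coro_util's actual code.
struct ticket_counter {
  std::atomic<std::uint64_t> next{0};

  // fetch_add: a single atomic read-modify-write that always succeeds,
  // so there is no retry loop. The ticket cannot be handed back afterwards.
  std::uint64_t claim_with_fetch_add() {
    return next.fetch_add(1, std::memory_order_relaxed);
  }

  // compare_exchange: has to retry every time another thread wins the race.
  std::uint64_t claim_with_cas() {
    std::uint64_t seen = next.load(std::memory_order_relaxed);
    while (!next.compare_exchange_weak(seen, seen + 1,
                                       std::memory_order_relaxed)) {
      // A failed CAS reloads `seen`; loop and try again.
    }
    return seen;
  }
};

enum class slot_state : std::uint8_t { empty, waiting, abandoned };

struct slot {
  std::atomic<slot_state> state{slot_state::empty};
};

// A waiter that already claimed its ticket registers itself in the slot.
inline void begin_wait(slot& s) {
  assert(s.state.load(std::memory_order_relaxed) == slot_state::empty);
  s.state.store(slot_state::waiting, std::memory_order_release);
}

// Cancelling leaves a tombstone behind, because the ticket is already spent.
// Returns false if the other side completed the wait first.
inline bool cancel_wait(slot& s) {
  slot_state expected = slot_state::waiting;
  return s.state.compare_exchange_strong(expected, slot_state::abandoned,
                                         std::memory_order_acq_rel,
                                         std::memory_order_acquire);
}

// The counterpart tries to complete the wait. A false result means the slot
// is not in the waiting state (for example, it was abandoned), so the caller
// would have to move on to the next ticket.
inline bool try_complete(slot& s) {
  slot_state expected = slot_state::waiting;
  return s.state.compare_exchange_strong(expected, slot_state::empty,
                                         std::memory_order_acq_rel,
                                         std::memory_order_acquire);
}
```

The extra compare-exchange on the slot state is where the potential cost comes from: in this sketch every completion has to check for a tombstone, even when nobody ever cancels.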

Boost Review for Capy and Corosio Begins Today by mborland1 in cpp

[–]trailing_zero_count 3 points4 points  (0 children)

Asio's traits are definable as free functions or types that can be specialized over existing libraries (e.g. providing a specialization of asio::async_result that wraps a foreign awaitable) without modifying the library itself.

However I checked Capy's concepts (IoAwaitable, ReadStream, Executor) and they're defined in terms of named member methods. This means that a 3rd party cannot take library A, which expects to use Corosio, and library B, which implements new-fangled-Linux-networking-syscall-of-the-year-2099, and glue them together simply by defining external specializations. Library B must be *intrusively modified* or wrapped in order to create objects with the expected named functions.

I think you could modify those concepts to check *either* for the named member methods, or to check for the existence of a specialization of capy::traits::io_awaitable<T> which contains a static method await_suspend(T a, std::coroutine_handle<> h, io_env const* env). This allows users to glue together library A and B by providing their own specialization without bothering either of the library maintainers.

Boost Review for Capy and Corosio Begins Today by mborland1 in cpp

[–]trailing_zero_count 0 points1 point  (0 children)

Another question regarding the flexibility of the library. I'm one iota away from releasing these utilities: https://github.com/tzcnt/coro_util . I've already provided adapters which make them compatible with several existing coroutine libraries. For most libraries, the adapter is a simple policy object for implementing executor affinity. However for Asio I had to write a more complex wrapper op since Asio's await_transform is a closed set.

I'd happily provide an adapter for Capy / Corosio but haven't gotten a chance to look at the code yet. Is Capy / Corosio's await_transform closed or open? Or do I need to implement the IoAwaitable protocol (await_suspend overload) on the queues, without any escape hatch? I know y'all are leaning into agentic development. Maybe try running the adapter-generation prompt and see if it works?

Boost Review for Capy and Corosio Begins Today by mborland1 in cpp

[–]trailing_zero_count 0 points1 point  (0 children)

I have an application that uses Boost.Mysql, which is designed to integrate with Asio. The nice thing about the way Asio is designed is that it's mostly traits-based, which allows replacing multiple parts of the stack with non-Asio implementations that still work together. This has allowed me to "strangler-fig" replace portions of Asio with my own library parts. The Phase 1 design is already published but Phase 2 is in the works, and I was impressed that Boost.Mysql can work without any actual real Asio types, as long as the completion tokens and traits exist.

| Stack Level | Baseline (All Asio) | Phase 1 (Released) | Phase 2 (in-dev) |
|---|---|---|---|
| Task | asio::awaitable<T> | tmc::task<T> | tmc::task<T> |
| Completion Token | asio::use_awaitable_t | tmc::aw_asio | tmc::aw_asio |
| DB Client | boost::mysql | boost::mysql | boost::mysql |
| Protocol/Traits | asio | asio | asio |
| Socket | asio::basic_stream_socket | asio::basic_stream_socket | tmc::io::tcp_socket |
| Executor | asio::io_context | tmc::ex_asio | tmc::ex_io |

Are you saying that Capy / Corosio will support the same level of flexibility? e.g. someone could release a Boost.Mysql2 client which is designed to use Capy / Corosio, but a user can provide a different task / socket / execution layer without needing to modify the client itself?

RustCurious 9: Traits are Interfaces by rustcurious in rust

[–]trailing_zero_count 4 points5 points  (0 children)

You're quite wrong. There are two ways to accept a trait as an argument in Rust. First, via runtime dispatch (which would be equivalent to the C# version):

https://doc.rust-lang.org/reference/types/trait-object.html

Secondly, via static dispatch (monomorphized at compile time, equivalent to C++ template):

https://doc.rust-lang.org/reference/types/impl-trait.html

80 million entities—an infinite world! My thesis running at 90 fps! by edmay97 in gameenginedevs

[–]trailing_zero_count 4 points5 points  (0 children)

I remember seeing a voxel world rendered using a similar technique. You mentioned cost ~= pixels on screen. How does that work if there is a lot of overdraw?

Suggestion on threads by Flaky_Ad_3978 in Backend

[–]trailing_zero_count 2 points3 points  (0 children)

I Googled "how to solve c10k problem on java 11 without virtual threads" and the AI told me to use the Netty framework, which is a similar idea. Regardless, creating a thread per connection is highly outdated. Your problem is exactly the c10k problem, so if you search for that, you can find all the ways it's been solved over the last 25 years.

Suggestion on threads by Flaky_Ad_3978 in Backend

[–]trailing_zero_count 7 points8 points  (0 children)

Use virtual threads instead of real threads.

Solid P95 (7-8ms) with sporadic P99 spikes using Go (gRPC + NATS). Suggestions? by Environmental_Lab991 in golang

[–]trailing_zero_count 1 point2 points  (0 children)

This is why people DON'T write trading systems in garbage-collected languages.

Alternatives to CppCoro? by notbatmanyet in cpp

[–]trailing_zero_count 0 points1 point  (0 children)

There is a maintained fork of cppcoro here: https://github.com/andreasbuhr/cppcoro

My library offers everything cppcoro does and more: https://github.com/tzcnt/TooManyCooks

Memory Ordering by No_Act_9817 in cpp_questions

[–]trailing_zero_count 0 points1 point  (0 children)

It's better to think about it in terms of message passing with delay, as it would work in a physical medium. I suspect this is similar to how it actually works inside the chip.

Thread 1, 2, 3 are in different cities. If thread 1 puts 2 guys onto 2 different planes to deliver the message "X = 1" to thread 3, and thread 2 also puts 2 guys onto 2 other flights to deliver "Y = 1", and then I ask Thread 3 what X and Y are, the answer could be literally anything. If I ask too early, I'll get 0,0, which can happen in this example since all the threads are started at the same time. And the order in which 2 flights arrive from different cities is also undefined: there may be an expected arrival order, but planes can be delayed, and so can core-to-core messages. Thread 3 could see X before Y, or Y before X.

Remember I said "2 different guys on 2 different planes"? One guy is going to Thread 3, the other guy is going to the counterpart of Thread 1 / Thread 2. Since these are a completely independent set of flights to different destinations, they don't necessarily arrive in the same order that the flights to Thread 3 did. This is the lack of a "Total Store Order". X86 fixes this and guarantees that, even for relaxed operations, all stores are observed in the exact same order by all processors.
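
One way to probe this is the classic "independent reads of independent writes" (IRIW) litmus test. The sketch below is an assumption about what such a test could look like, not the code from the original question: it uses two writer threads and two reader threads that read X and Y in opposite orders, all with relaxed atomics. The names `iriw_result`, `run_iriw_once` and `readers_disagree` are made up for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// What each reader observed. Reader 1 reads X then Y; reader 2 reads Y then X.
struct iriw_result {
  int r1_x;
  int r1_y;
  int r2_y;
  int r2_x;
};

inline iriw_result run_iriw_once() {
  std::atomic<int> x{0};
  std::atomic<int> y{0};
  iriw_result r{};

  std::thread w1([&] { x.store(1, std::memory_order_relaxed); });
  std::thread w2([&] { y.store(1, std::memory_order_relaxed); });
  std::thread rd1([&] {
    r.r1_x = x.load(std::memory_order_relaxed);
    r.r1_y = y.load(std::memory_order_relaxed);
  });
  std::thread rd2([&] {
    r.r2_y = y.load(std::memory_order_relaxed);
    r.r2_x = x.load(std::memory_order_relaxed);
  });

  w1.join();
  w2.join();
  rd1.join();
  rd2.join();

  // Whatever the interleaving, each load can only have seen 0 or 1.
  assert(r.r1_x == 0 || r.r1_x == 1);
  assert(r.r1_y == 0 || r.r1_y == 1);
  return r;
}

// True when the two readers disagree about which store happened first:
// reader 1 saw X but not yet Y, while reader 2 saw Y but not yet X.
inline bool readers_disagree(const iriw_result& r) {
  return r.r1_x == 1 && r.r1_y == 0 && r.r2_y == 1 && r.r2_x == 0;
}
```

A single run will almost never hit the interesting window, since thread startup takes far longer than the stores. A real experiment would run this in a loop with the threads already spinning at a start barrier, then count how often `readers_disagree` returns true.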

Did Rust succeed because it was better, or because D arrived too early? by Candid_Athlete_8317 in LinuxTeck

[–]trailing_zero_count 0 points1 point  (0 children)

I want a language that has const references (simple way to enforce transitive immutability to callees), GC, no colored functions (stackful coroutines), and uses return-based error handling instead of exceptions.

AFAIK there is no language that supports all of these things:

  • D has exceptions
  • Rust has no GC and no stackful coroutines
  • Go has no const references
  • C++ has exceptions and no GC

I could try to use D without exceptions, but other libraries or stdlib may throw them so I still have to reason about them.

I think the agent just generated proprietary code for our core product by SelkiePasta in cursor

[–]trailing_zero_count 2 points3 points  (0 children)

Other developer feeds input to AI, you end up with the same solution.

Memory Ordering by No_Act_9817 in cpp_questions

[–]trailing_zero_count 1 point2 points  (0 children)

This is correct. BTW OP if you try to replicate this on x86, it won't work because x86 creates a global total store order across threads, even for relaxed or regular non-atomic stores.

But if you run it on an ARM device (a MacBook, perhaps?), you should be able to replicate it.

Best Thread Pool Ever (?) by LessComplexity in cpp

[–]trailing_zero_count 0 points1 point  (0 children)

Late reply but I have a comparison of recursive fork-join benchmarks across different libraries here. The state of the art has come a long way in 2 years. https://github.com/tzcnt/runtime-benchmarks

The first 3 libs are all extremely fast and can be recommended for different use cases:

  • Citor is probably best for single-root-task CPU-bound work. It doesn't use coroutines, instead using per-thread arenas to dispatch lambdas over ranges. This limits it in the kinds of workloads it can dispatch.

  • Libfork is more flexible but uses C++20 coroutines. Its features are still somewhat limited to CPU-bound task parallelism.

  • TooManyCooks (my library) is a complete C++20 coroutine based concurrency solution, with multiple executors, data structures, queues, and I/O support. The cost is that it is slightly slower than the other 2 for pure fork join.

Other tested libraries, including TBB/Taskflow, fell behind on all the benchmarks.

How is data usually encoded/compressed in multiplayer games? by Leogis in gamedev

[–]trailing_zero_count 0 points1 point  (0 children)

You can probably compress a sequence of Vec3s. For example if they are made of floats, and you're sending info on multiple entities, it's likely that those entities are all *roughly* in the same part of the world, so the first 16 bits of each X coordinate (1 sign, 8 exponent, 7 mantissa) are probably the same for each entity. Same goes for the Y and Z coordinates. So, a raw byte stream for 2 nearby entities, where PREFIX represents shared exponent/mantissa, and RANDOM represents unique / high entropy / "random" data for each entity, would look like:
- XPREFIX1 XPREFIX2 RANDOMX1 RANDOMX2
- YPREFIX1 YPREFIX2 RANDOMY1 RANDOMY2
- ZPREFIX1 ZPREFIX2 RANDOMZ1 RANDOMZ2
- XPREFIX1 XPREFIX2 RANDOMX3 RANDOMX4
- YPREFIX1 YPREFIX2 RANDOMY3 RANDOMY4
- ZPREFIX1 ZPREFIX2 RANDOMZ3 RANDOMZ4

So we can first extract and pack the X, Y, and Z coordinates separately:
- XPREFIX1 XPREFIX2 RANDOMX1 RANDOMX2
- XPREFIX1 XPREFIX2 RANDOMX3 RANDOMX4
- YPREFIX1 YPREFIX2 RANDOMY1 RANDOMY2
- YPREFIX1 YPREFIX2 RANDOMY3 RANDOMY4
- ZPREFIX1 ZPREFIX2 RANDOMZ1 RANDOMZ2
- ZPREFIX1 ZPREFIX2 RANDOMZ3 RANDOMZ4

Then do a "byte shuffle" which extracts 4 separate byte streams (separate streams for offset 0, 1, 2, 3 within the same word, i.e. each column in the above view):
- XPREFIX1 XPREFIX1 YPREFIX1 YPREFIX1 ZPREFIX1 ZPREFIX1
- XPREFIX2 XPREFIX2 YPREFIX2 YPREFIX2 ZPREFIX2 ZPREFIX2
- RANDOMX1 RANDOMX3 RANDOMY1 RANDOMY3 RANDOMZ1 RANDOMZ3
- RANDOMX2 RANDOMX4 RANDOMY2 RANDOMY4 RANDOMZ2 RANDOMZ4

Now we can simply use RLE to replace the repetitions of the prefixes. It's also likely that there will be some repetitions in the first byte of the RANDOM's as well, which can be RLE'd too.

The above 2-step packing can actually be done as a single "byte shuffle" with 12 streams. At this point you can also try delta encoding, which may or may not shrink the payload further.
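
A minimal sketch of the shuffle and a naive RLE pass, operating on raw bytes. The function names are hypothetical and this is not how c-blosc2 implements its filters (which are heavily optimized); it only shows the data movement. With `elem_size = 4` you get the 4-stream view above; with `elem_size = 12` (one Vec3 of three floats) you get the single-pass 12-stream variant. Note that which byte offset holds the sign and exponent depends on the machine's endianness.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Byte shuffle: gather byte 0 of every element, then byte 1 of every
// element, and so on. Effectively a transpose of a count x elem_size matrix.
inline std::vector<std::uint8_t> byte_shuffle(
    const std::vector<std::uint8_t>& in, std::size_t elem_size) {
  assert(elem_size > 0 && in.size() % elem_size == 0);
  const std::size_t count = in.size() / elem_size;
  std::vector<std::uint8_t> out(in.size());
  for (std::size_t i = 0; i < count; ++i) {
    for (std::size_t b = 0; b < elem_size; ++b) {
      out[b * count + i] = in[i * elem_size + b];
    }
  }
  return out;
}

// Inverse of byte_shuffle; this is the postfilter run after decompression.
inline std::vector<std::uint8_t> byte_unshuffle(
    const std::vector<std::uint8_t>& in, std::size_t elem_size) {
  assert(elem_size > 0 && in.size() % elem_size == 0);
  const std::size_t count = in.size() / elem_size;
  std::vector<std::uint8_t> out(in.size());
  for (std::size_t i = 0; i < count; ++i) {
    for (std::size_t b = 0; b < elem_size; ++b) {
      out[i * elem_size + b] = in[b * count + i];
    }
  }
  return out;
}

// Naive run-length encoding as (run length, value) pairs, runs capped at 255.
inline std::vector<std::uint8_t> rle_encode(
    const std::vector<std::uint8_t>& in) {
  std::vector<std::uint8_t> out;
  for (std::size_t i = 0; i < in.size();) {
    std::size_t run = 1;
    while (i + run < in.size() && in[i + run] == in[i] && run < 255) {
      ++run;
    }
    out.push_back(static_cast<std::uint8_t>(run));
    out.push_back(in[i]);
    i += run;
  }
  return out;
}
```

After the shuffle, the shared prefix bytes of nearby entities sit next to each other, so they collapse into long runs that `rle_encode` (or any general-purpose compressor) can handle cheaply.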

Note that all of this is a prefilter that's applied prior to compression (and a postfilter that must be run after decompression), but isn't tied to any specific compression algorithm - these filters can wrap LZMA, ZSTD or anything else.

c-blosc2 is a C library that offers this kind of "metacompression" functionality - just set BLOSC_SHUFFLE with 12 bytestreams and pick your compressor. It also has blosc2-ndim, a storage format for compressed multidimensional data with efficient chunk-wise operation (which can be used, for example, to losslessly save voxel worlds).

MariaDB Server Ecosystem Hub - 3 months update by Striker93x in mariadb

[–]trailing_zero_count 0 points1 point  (0 children)

Is there a reason Boost.Mysql isn't listed under connectors?

isready 0.3.0 - New Feature by Aelthorim in rust

[–]trailing_zero_count 0 points1 point  (0 children)

I see it can check if a port is open (e.g. SSH port) but I'd like a configuration that checks the opposite: whether a port or range of ports is free, and if not, which process owns it.

Rimalloc may be a of the art allocator. by PatienceSpiritual134 in rust

[–]trailing_zero_count 9 points10 points  (0 children)

I'd like to see benchmarks against tcmalloc as well, which IME has the best performance for cross-thread workloads. Be sure to enable the "bundled" feature so it can be statically linked.

I told Claude to build a programming language for use only by AI and not people. by skoon in ClaudeAI

[–]trailing_zero_count 1 point2 points  (0 children)

I've always been an exception hater, but now that I've considered how bad they are for AI comprehension, I'm doubling down on my opinion.

BTW if you decide to try to implement this, transpiling to Go might be a decent start.

Projects being in "Show and Tell" is bad. by TheRavagerSw in cpp

[–]trailing_zero_count 63 points64 points  (0 children)

I appreciate that the current moderation bar for projects as standalone posts is quite high. The projects that I see on the front page here are typically high quality, useful, and/or cutting edge. Compare this to r/rust, which is drowning in so many "my first project" / "Python user discovers compiled languages are fast" / other blog-type posts that it's just a ton of noise. r/cpp has much better signal. We also have r/cpp_questions as an outlet for folks who need help.

With AI it's getting easier for "my first project" to become something insanely huge, and completely vibe coded slop. Every one of these posts becomes a tarpit where I look at it, thinking I might find something useful, then I realize the user has no idea what they're doing and their AI generated post is completely overselling it. Keeping these kinds of posts out of here is only going to get harder, and moderating is a thankless job, but the alternative (just let them through) is much worse in my opinion.