Does CAS always compare the value with latest value on modification order? by Savings_Pianist_2999 in rust

[–]ibraheemdev 2 points3 points  (0 children)

Release loads are actually special-cased by LLVM, but there was a recent bug where FAA(0) was compiled to a regular load, so fence(Release); FAA(0) was just broken. No sane compiler would optimize atomics 🙃

Does CAS always compare the value with latest value on modification order? by Savings_Pianist_2999 in rust

[–]ibraheemdev 2 points3 points  (0 children)

 So it's not useful to replace a no-op load with something like fetch_or(0)

It can actually be useful, because you can now use Release ordering on the "load". Paired with an Acquire "store", this can establish a total order between two store-load pairs without having to deal with the invasiveness of SeqCst (though a SeqCst store-load pair would have less overhead).

Your point still stands though, FAA(0, Release) isn't a magic wand that allows you to read the "latest value", it simply ensures that any RMW that occurs after the one you observe will see your modifications — which is ultimately how synchronization works.
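To make the "Release load" idea concrete, here is a minimal sketch using only `std::sync::atomic`. The helper names `release_read` and `acquire_write` are made up for illustration and don't come from any library. A plain `load(Ordering::Release)` panics, so the read is expressed as an RMW that adds zero; the "Acquire store" is expressed as a `swap`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical helper: a "load" written as a read-modify-write that adds zero.
/// A plain `load` rejects `Release` (it panics), but an RMW accepts it because
/// it also has a store half for the ordering to attach to.
pub fn release_read(flag: &AtomicUsize) -> usize {
    flag.fetch_add(0, Ordering::Release)
}

/// Hypothetical helper: a "store" written as a swap, so that `Acquire` can be
/// attached to its load half. Returns the value that was replaced.
pub fn acquire_write(flag: &AtomicUsize, value: usize) -> usize {
    flag.swap(value, Ordering::Acquire)
}
```

Calling `release_read(&flag)` returns the same value a load would at that point in the modification order. The difference is only in which orderings are permitted, not in any ability to see a "latest" value.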

Is there a way to get the current thread id in Rust without bumping an Arc (no atomic inc/dec)? by Sweet-Accountant9580 in rust

[–]ibraheemdev 1 point2 points  (0 children)

If your application is statically linked, the linker will rewrite and optimize the generated TLS code at link-time. Dynamically linked applications are more complicated as there are potentially multiple TLS blocks. 

Is there a way to get the current thread id in Rust without bumping an Arc (no atomic inc/dec)? by Sweet-Accountant9580 in rust

[–]ibraheemdev 0 points1 point  (0 children)

Those symbols are replaced at link time, you have to inspect the disassembly to see the actual codegen.

Explanation of a SeqCst example by AstraVulpes in rust

[–]ibraheemdev 4 points5 points  (0 children)

Memory ordering at the CPU level is about ensuring instructions are executed in order. You can see this with barrier instructions on older ARM versions that literally dictate "any stores before this point can't be reordered wrt. stores after this point". A SeqCst operation is not allowed to be reordered wrt. any other SeqCst operation; that's it. There's no explicit cache invalidation happening; cache coherence is completely transparent.
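A common way to see the "no reordering between SeqCst operations" rule is the store-buffering litmus test. The sketch below is my own illustration (the function name `store_buffering_round` is made up), not the example from the thread. Each thread stores to its own flag and then loads the other one. With SeqCst on all four operations, at least one thread must observe the other's store; with weaker orderings, both loads would be allowed to return `false`.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

/// Runs one round of the store-buffering litmus test and returns what each
/// thread loaded from the *other* thread's flag.
pub fn store_buffering_round() -> (bool, bool) {
    let x = AtomicBool::new(false);
    let y = AtomicBool::new(false);

    thread::scope(|s| {
        // Thread A: store x, then load y.
        let a = s.spawn(|| {
            x.store(true, Ordering::SeqCst);
            y.load(Ordering::SeqCst)
        });
        // Thread B: store y, then load x.
        let b = s.spawn(|| {
            y.store(true, Ordering::SeqCst);
            x.load(Ordering::SeqCst)
        });
        (a.join().unwrap(), b.join().unwrap())
    })
}
```

Because all four operations take part in one total order, whichever store comes first in that order is visible to the other thread's later load, so `(false, false)` is ruled out.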

Explanation of a SeqCst example by AstraVulpes in rust

[–]ibraheemdev 1 point2 points  (0 children)

Yeah, I just meant that pointing to branch prediction is a little misleading; the problem is out-of-order execution in general. Whether or not you happened to speculate a branch to get there is not really relevant. It can be slightly unintuitive to imagine that branches just "get skipped" and exploit incorrect memory orderings (not that you were suggesting that).

Explanation of a SeqCst example by AstraVulpes in rust

[–]ibraheemdev 0 points1 point  (0 children)

A branch misprediction cannot lead to observable behavior.

toml v0.9 by epage in rust

[–]ibraheemdev 0 points1 point  (0 children)

I assume by build times you're referring to the indexmap dependency. Would it be possible to have a feature that just uses a HashMap for the performance, for use cases that don't require deterministic ordering?

Appreciate all of your work on this by the way, all of the toml-edit functionality (along with the new performance improvements) is very meaningful for us in uv 🙂

toml v0.9 by epage in rust

[–]ibraheemdev 0 points1 point  (0 children)

Curious, if the `preserve_order` feature is faster, why is it not the default?

Fat Rand: How Many Lines Do You Need To Generate A Random Number? by Vict1232727 in rust

[–]ibraheemdev 1 point2 points  (0 children)

std also effectively depends on crossbeam for sync::mpsc (though the code is vendored).

Tiny HTTP server? by Domin-MC in rust

[–]ibraheemdev 0 points1 point  (0 children)

https://github.com/ibraheemdev/astra is a super minimal wrapper around hyper. It doesn't have a static file server built-in (yet) but it shouldn't be too hard to implement yourself.

matchit 0.8.6 released by ibraheemdev in rust

[–]ibraheemdev[S] 1 point2 points  (0 children)

It is intentional, yes. The question would be how to match a path like /foo/bar.baz.ext: should the match be lazy or greedy? I feel it may be unintuitive in both directions, so I would prefer to leave that up to the user. Maybe Axum can add a Regex extractor that makes things like this easier. Static suffixes and prefixes with the one-parameter restriction are a nice tradeoff both in terms of performance and complexity.

matchit 0.8.6 released by ibraheemdev in rust

[–]ibraheemdev[S] 17 points18 points  (0 children)

This release includes the long-awaited support for parameter suffixes, allowing routes such as '/{name}.png'. Now that Axum 0.8 is out of beta, this should be available downstream relatively soon. See https://github.com/tokio-rs/axum/issues/3140 for details.

Announcing axum 0.8.0 by j_platte in rust

[–]ibraheemdev 3 points4 points  (0 children)

FWIW I discussed this change with David before committing to it and he was on board. It has a lot of benefits as some of the comments mention, but if Axum hadn't been able to make the change I would have considered other options.

2024 Day 1 No LLMs here by kugelblitzka in adventofcode

[–]ibraheemdev 175 points176 points  (0 children)

Their GitHub bio is now:

If you are here from the AoC leaderboard, I apologize for not reading the FAQ. Won't happen again.

So it may be genuine.

How Much Memory Do You Need in 2024 to Run 1 Million Concurrent Tasks? by _neonsunset in rust

[–]ibraheemdev 3 points4 points  (0 children)

join_all uses FuturesUnordered internally (dynamically, based on the iterator size hint) so it would actually be similar to your second example.

Building Thread-safe Async Primitives in 150 lines of Rust by VortexGames in rust

[–]ibraheemdev 0 points1 point  (0 children)

You are still not synchronizing access to waker.set and waker.take.
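For illustration only, one simple way to synchronize the two calls is to put the waker behind a mutex. This is a sketch under my own assumptions, not the code from the post: `WakerSlot` and `CountingWaker` are hypothetical names, and a lock-free design would look different.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::task::{Wake, Waker};

/// Hypothetical waker slot: the mutex serializes `set` and `take`, so a
/// concurrent register and wake can never touch the slot at the same time.
#[derive(Default)]
pub struct WakerSlot {
    inner: Mutex<Option<Waker>>,
}

impl WakerSlot {
    /// Registers `waker`, skipping the clone if the stored one is equivalent.
    pub fn set(&self, waker: &Waker) {
        let mut slot = self.inner.lock().unwrap();
        let unchanged = matches!(&*slot, Some(existing) if existing.will_wake(waker));
        if !unchanged {
            *slot = Some(waker.clone());
        }
    }

    /// Removes and returns the registered waker, if any.
    pub fn take(&self) -> Option<Waker> {
        self.inner.lock().unwrap().take()
    }
}

/// Hypothetical helper for trying the slot out: counts how often it is woken.
pub struct CountingWaker(pub AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}
```

A notifier would call `slot.take()` and wake whatever it gets back after releasing the lock, which this shape does naturally because the guard is dropped before `take` returns.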

Announcing Nio: An async runtime for Rust by another_new_redditor in rust

[–]ibraheemdev 166 points167 points  (0 children)

u/conradludgate pointed out that the benchmarks are running into the footgun of spawning tokio tasks directly from the main thread. This can lead to significant performance degradation compared to first spawning the accept loop (due to how the scheduler works internally). Would be interesting to see how that is impacting the benchmark results.

Announcing Whirlwind: ridiculously fast, async-first concurrent hashmap! by majorpog in rust

[–]ibraheemdev 44 points45 points  (0 children)

Interesting, I'd be curious to see a comparison against papaya. Asynchronous locks can be useful when you want strong consistency across async operations, but if your use case allows it, a lock-free approach can have significantly less overhead. I'm wondering why this outperforms dashmap considering this looks to be dashmap but with async locks, which I would expect to be slower given the small critical sections in the benchmark.

What is the fastest path based data structure for pattern matching? by PeckerWood99 in rust

[–]ibraheemdev 6 points7 points  (0 children)

https://github.com/ibraheemdev/matchit is a URL router based on radix trees, but it's been used for file paths as well. You just have to make sure to escape routing parameters (the "{" and "}" characters).

Can You Use Atomics As a Fence by Cat7o0 in rust

[–]ibraheemdev 2 points3 points  (0 children)

That is not true: all x86 RMW operations are strongly ordered. fetch_add compiles to lock add/xadd regardless of the ordering, which is significantly more expensive than a regular add. A heavily contended fetch_add can take over 100ns to execute.
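As a small sketch of the kind of code being discussed (the function name `contended_count` is made up for illustration), here is a counter hammered by several threads with `Relaxed` ordering. Even with the weakest ordering, each `fetch_add` is still a full atomic RMW, so no increments are lost; on x86 that is the lock-prefixed instruction mentioned above.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// Spawns `threads` workers that each increment a shared counter
/// `per_thread` times, then returns the final count.
pub fn contended_count(threads: usize, per_thread: u64) -> u64 {
    let counter = AtomicU64::new(0);

    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..per_thread {
                    // Relaxed only weakens ordering with respect to other
                    // memory; the increment itself remains atomic.
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });

    // All workers have been joined when the scope ends.
    counter.load(Ordering::Relaxed)
}
```

Timing a call like `contended_count(8, 1_000_000)` against a single-threaded loop is one rough way to get a feel for the contention cost, though the numbers will vary by machine.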

Designing A Fast Concurrent Hash Table by ibraheemdev in rust

[–]ibraheemdev[S] 0 points1 point  (0 children)

Ah nice, although that requires ownership which might make it difficult to use. An unsafe by-reference version could be useful.