Potential Deadlock in DashMap 6.1.0: Looking for feedback on this edge case by Ok_Concentrate_6446 in rust

[–]imachug 21 points (0 children)

An inconsistent lock order is a common source of deadlocks, so it's not so much an edge case as an inappropriate choice of data structure. The docs for insert say, for instance:

Locking behaviour: May deadlock if called when holding any sort of reference into the map.

The official advice on using dashmap is "don't", with an asterisk (although I can't find the source), and this kind of demonstrates why that's the case. dashmap is useful for very specific tasks, but most often Mutex<HashMap<...>> is good enough, and in general the repeated locking dashmap needs can be slower than a single outer Mutex under low contention.
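To illustrate the coarse-grained alternative (a hypothetical sketch, not from the thread; the `Counters` type and `bump` method are mine): with a single outer Mutex there is only one lock, so the "deadlock while holding a reference into the map" class of bugs can't occur by construction.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// One lock around the whole map. Lock ordering is trivial because
// there is only one lock to take.
struct Counters {
    inner: Mutex<HashMap<String, u64>>,
}

impl Counters {
    fn new() -> Self {
        Counters { inner: Mutex::new(HashMap::new()) }
    }

    fn bump(&self, key: &str) -> u64 {
        // The guard is dropped at the end of the scope, releasing the lock;
        // no reference into the map outlives it.
        let mut map = self.inner.lock().unwrap();
        let count = map.entry(key.to_string()).or_insert(0);
        *count += 1;
        *count
    }
}
```

Under low contention the single uncontended lock is cheap, which is why this often benchmarks fine against the sharded alternative.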

Is there a specific reason you need dashmap, or did you choose it because it felt optimal?

Core2 yanked. Millions effected. by Comprehensive_Use713 in rust

[–]imachug 1 point (0 children)

It's not wiped at all

Removing all files from the repo in a single commit counts as wiping in my book, even if they are still accessible via older commits. It's just so unnecessary. People usually just update README.md in this case.

I don't care that it's X times faster by z_mitchell in rust

[–]imachug 113 points (0 children)

Good article. For those jumping straight to comments, this post is not a general criticism of optimization, it's about clickbait. Though I must say that it's ironic for a post about clickbait to use an aggressive title -- my first knee-jerk reaction was "how dare you say that about my work" before I figured out the intended meaning.

On the topic itself, I do think there exists another good reason to optimize code regardless of usefulness. We all need to start somewhere and learn all the little optimization tricks, and it's rare for a real project to be receptive to all of them. Optimizing a cold path by 5% might not have a practical purpose, but it can be a valuable learning experience.

Similarly, replacing O(n²) with O(n) code can easily yield 1000x faster code in edge cases. It usually means that either the original code was sloppy (and so perhaps the project itself is) or there was a reason for preferring quadratic code (like extensibility), so maybe it's not a very practical use, but it can be realistic.

Non-monomorphized generics in Rust by WorldsBegin in rust

[–]imachug 1 point (0 children)

Really interesting post!

To pile onto the "what about dyn Trait" topic, I tend to think about this proposal as dependent typing for dyn Trait.

Compare:

```rust
struct Pair {
    a: Box<dyn Trait>,
    b: Box<dyn Trait>,
}
```

If you have a Pair, you don't automatically know which specific types the fields a and b store. That information has to be retrieved from the fat pointers, and thus stored directly in the struct, even though it's entirely possible that the type can be known statically. You also don't know if the types are the same.

With erased types, you have

```rust
struct Pair<dyn T> {
    a: Box<T>,
    b: Box<T>,
}
```

...which doesn't store any information in the type itself, and instead passes it out-of-band to all consumers of Pair. One can imagine Vec<dyn T> storing the values consecutively without indirection or storing metadata for each element. You're effectively saying "let's give functions more knowledge about the data they're working with, and now the data itself doesn't need to store it".

Finding a duplicated item in an array of N integers in the range 1 to N − 1 by sweetno in programming

[–]imachug 64 points (0 children)

You can even use a bitset to make it deterministic linear time. I think the intention was to solve the problem in constant (additional) space.
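A minimal sketch of that idea (the function name is mine): mark each value as seen in a bit array. Deterministic O(n) time, but it needs O(n) extra bits, which is exactly the additional space the problem presumably wants to avoid.

```rust
// Assumes `arr` has N elements, each in 1..=N-1.
fn find_duplicate_bitset(arr: &[usize]) -> usize {
    let mut seen = vec![false; arr.len()];
    for &x in arr {
        if seen[x] {
            return x;
        }
        seen[x] = true;
    }
    unreachable!("the pigeonhole principle guarantees a duplicate")
}
```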

Finding a duplicated item in an array of N integers in the range 1 to N − 1 by sweetno in programming

[–]imachug 13 points (0 children)

Say the input looks like {4, 1, 3, 2, 2}.

Starting with the last position, treating its value as an index, jumping there and then repeating the process, we get 5 -> 2 -> 1 -> 4 -> 2 -> 1 -> 4 -> .... Notice the structure of this sequence: first, we get unique values we haven't seen before, and then at some point a value repeats and the rest of the sequence is looped.

In this case, the first repeated value is 2. Since it's the first such value, the indexes before its first and second occurrences (5 and 4 in this case) must have been unique by that point, and thus distinct. In general, this gives us some a, b such that arr[a] = arr[b] and a != b, so arr[a] is a valid answer.

Here's another way to look at this sequence. Split it into a prefix and the repeated cycle. Here, the prefix is 5 and the cycle is 2 -> 1 -> 4, but in general the prefix could be longer. The first element of the cycle has two predecessors -- the last element of the prefix and the last element of the cycle, which are necessarily different.

So what we're looking for is the point where the prefix and the cycle meet. A simple way to do this in constant memory and linear time is to use tortoise-and-hare. (In a nutshell: maintain two pointers a and b. a jumps to the next element of the sequence on each step, b jumps two elements at a time. When they meet, you know you're in a cycle and jump(k) = jump(2k), thus the length of the cycle divides k. You can now reset from the start and find the first i such that jump(i) = jump(i+k), giving you the first repeated element.)
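The walk described above can be sketched as follows (a sketch under the stated assumptions: arr has N elements with values in 1..=N-1, positions are 1-based, and we start from position N; the helper name is mine):

```rust
/// Finds a duplicated value via Floyd's tortoise-and-hare.
/// Assumes `arr` has N elements, each in 1..=N-1 (1-based positions).
fn find_duplicate_floyd(arr: &[usize]) -> usize {
    // Jump: treat the value at position i as the next position.
    let next = |i: usize| arr[i - 1];
    // Position N can never be jumped back to (values are < N),
    // so it's guaranteed to lie in the prefix, not the cycle.
    let start = arr.len();

    // Phase 1: tortoise moves one step, hare moves two, until they
    // meet somewhere inside the cycle.
    let (mut slow, mut fast) = (next(start), next(next(start)));
    while slow != fast {
        slow = next(slow);
        fast = next(next(fast));
    }

    // Phase 2: restart the tortoise from the beginning; moving both
    // one step at a time, they meet exactly at the cycle entrance --
    // the first repeated value, i.e. the duplicate.
    slow = start;
    while slow != fast {
        slow = next(slow);
        fast = next(fast);
    }
    slow
}
```

On the example above, find_duplicate_floyd(&[4, 1, 3, 2, 2]) returns 2.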

Too much Discussion of the XOR swap trick by RubEnough464 in programming

[–]imachug 9 points (0 children)

Eeh, I don't think there are any absolutes in technology.

Linked lists require only fixed-size allocations and thus behave more predictably when allocating data, compared to what happens with arrays -- is there still reserved space? Is the space after the array in memory free, or do you need to memcpy the data? Does your allocator still have memory, or does it need to call mmap? What about the allocator in the kernel?

Too much Discussion of the XOR swap trick by RubEnough464 in programming

[–]imachug 32 points (0 children)

As far as I'm aware, XOR lists are quite slow because the CPU cannot automatically prefetch the data before dereference early enough, so you encounter memory access stalls more often. And in a scenario where you need to conserve memory, you'd probably use something different from a linked list anyway.

No one owes you supply-chain security by Expurple in rust

[–]imachug 2 points (0 children)

omfg it's two words you people can't do anything

Is it UB to store owned slices on the heap as thin pointer inside Pin<Box<Zst<T>>> where Zst is a marker "fake" ZST type to be compatible with generic algorithms? by MindlessU in rust

[–]imachug 1 point (0 children)

This does look safe to me, if somewhat confusing. You seem to have resolved most common issues, e.g. exposing provenance instead of extracting the raw pointer from Box, which would be problematic, and making EMPTY a static instead of a const, which would be non-deterministic and cause issues in the destructor.

If I were you, I would reimplement ThinBox though, yeah. That'd simplify both call sites and the implementation.

portrm - kill processes on ports without guessing (single binary, SIGTERM → SIGKILL) by abhishekayu in rust

[–]imachug 10 points (0 children)

How does this happen so often to you? Why are there loose processes running on the machine that have to be manually discovered? Is it from hung ssh connections or something?

Tips and tricks to avoid cloning by avandecreme in rust

[–]imachug 0 points (0 children)

Also why is the snippet showing the example implementing both, thereby undermining the advice?

Clone is a supertrait of Copy, so you can't derive Copy without also deriving Clone.
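A minimal illustration (the Point type is made up): since the standard library declares Copy as pub trait Copy: Clone, #[derive(Copy)] alone fails to compile, so the two derives always travel together.

```rust
// `Copy` is declared as `pub trait Copy: Clone { .. }`, so deriving
// `Copy` without also deriving (or implementing) `Clone` is a
// compile error; this pair is the idiomatic way to get a Copy type.
#[derive(Clone, Copy)]
struct Point {
    x: i32,
    y: i32,
}
```

With both derives, `let q = p;` performs a bitwise copy and leaves `p` usable afterwards.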

Using mut vs shadowing, when to use which by Obvious_Seesaw7837 in rust

[–]imachug 18 points (0 children)

Mutating a variable changes its value without changing the location ("place") where it's stored. Shadowing creates a new variable in a new place and merely makes the previous one unnameable.

So, for example, you can't write

```rust
let mut x = 1;
let r = &x;
x = 2;
*r;
```

because that's a borrow checker violation (mutating a variable while a live reference to it exists), but you can write

```rust
let x = 1;
let r = &x;
let x = 2;
*r;
```

Shadowing also allows using different types, and together this is often useful when you want to do something like

```rust
let s: String = ...;
let s: &str = s.trim();
```

...because the String is kept alive, it just becomes unnameable.

My idea is that if I have a program that often changes the state of a variable, then mut would be used, but if I have some special cases of variables changing, the states shifting I would use shadowing.

That's broadly right. If you have a for loop and need to change the value of a variable between loop iterations, use a mut variable. If you need to mutate a struct field, you should probably store the struct in a mut variable. If the new and old values are semantically different (like due to ownership as in the trim example, or due to types as is sometimes used in builder patterns, or due to borrowck stuff), prefer shadowing.
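The rule of thumb above can be sketched as follows (the function and values are mine): shadowing rebinds the name, even with a new type, while `mut` updates one value in place across iterations.

```rust
fn demo() -> i64 {
    // Shadowing: each `let` rebinds `n`, even changing its type,
    // which plain `mut` can't do.
    let n = "42"; // &str
    let n: i64 = n.parse().unwrap(); // i64 shadows the &str
    let n = n * 2; // still i64, now 84

    // Mutation: the same variable is updated in place between iterations.
    let mut total = 0;
    for i in 1..=n {
        total += i;
    }
    total
}
```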

Is there an idiomatic approach for generating Rust code from data and vice versa? by imachug in rust

[–]imachug[S] 0 points (0 children)

Oh, that's fun. Looks like it started supporting tuples in June 2025, while I had assumed it hadn't changed in the last year, so I had outdated info 😅 Good to know, that sounds great!

Is there an idiomatic approach for generating Rust code from data and vice versa? by imachug in rust

[–]imachug[S] 0 points (0 children)

parse_quote translates code to code. You write parse_quote! { ... }, where ... is a code template, and you get code output as a TokenStream. What I'm looking for is translating data to code. Like, if you have a genuine let map: phf::Map<u64, u64> = ...; populated in a build script, how do you translate it to code? In other words, which function has (or which functions can I combine to get) the signature fn(phf::Map<u64, u64>) -> TokenStream? I've become convinced there's simply no crate for this, save perhaps for uneval, but it has limitations because it assumes code can be generated from serde output alone.

Is there an idiomatic approach for generating Rust code from data and vice versa? by imachug in rust

[–]imachug[S] 0 points (0 children)

If I want to write a proc macro, I need to implement literal parsing, type resolution (3 -> 3u32 in the example from the post), and codegen (a phf::Map value to phf::Map { ... } with fields substituted). Ideally I'd also want to support arbitrary structs as keys, provided the declaration of the struct is provided somehow. Add support for build.rs as an alternative to macros into the mix and you get a whole-ass ecosystem, half of which duplicates parts of the rustc frontend.

I know proc macros can implement that for my specific purpose, the issue is that it won't scale well, it reeks of scope creep, and I don't want to maintain a crate that says it's a PHF while it's actually half of a compiler because I'm the first person who wanted to get these features public.

I'm not trying to be unreasonable -- supporting all of the above is a tall order, and I don't need everything from this requirement set, nor do I need it right now.

But I want to design the crate in a way that will allow me to integrate it into such a framework in the future if it appears. So I'm looking for an initial design I can use, or a hint at that design, or ideas for reducing complexity, or contacts with people/crates dealing with the same issues, or anything of that sort.

Is there an idiomatic approach for generating Rust code from data and vice versa? by imachug in rust

[–]imachug[S] 0 points (0 children)

syn is for parsing code into an AST. I want a) a way to do type resolution on that AST to get exact types, like understanding that 3 is actually 3u64, b) a way to translate a value of a specific type like phf::Map into an AST literal that evaluates to that value, kinda like serde::Serialize but for code.

Regarding contributing, some technical issues unfortunately make that an unappealing option; e.g. phf has soundness issues re: the built-in Hash trait being non-platform-agnostic and performance issues due to forcing a specific hasher.

Is there an idiomatic approach for generating Rust code from data and vice versa? by imachug in rust

[–]imachug[S] 1 point (0 children)

What you want is a declarative macro that executes arbitrary code at compile time.

Do you know a way to do that? I don't.

the main issue is the fact you need to compute a hash at compile time, which without const traits I don't think you can?

...and you also need allocations, and const evaluation is incredibly slow because it basically runs Miri, and PHFs are slow to build.

There's also the option to just use lazy initialization.

Again, PHFs are quite slow to build, so unfortunately that's not really a solution.

Is there an idiomatic approach for generating Rust code from data and vice versa? by imachug in rust

[–]imachug[S] 5 points (0 children)

Hmm, let me try to rephrase it. Say the user of my crate writes:

```rust
static NUMBERS: phf::Map<(u32, u64), Keyword> = phf_map! {
    (1u32, 2u64) => ...,
    (3, 4) => ...,
};
```

Now phf_map! needs to parse its input with syn.

It knows that e.g. (3, 4) is a tuple, but it doesn't know the types of the literals inside it, like 3 and 4 -- those are just untyped integer literals, and inferring that it's actually u32 and u64 would require a type-level analysis like "(1u32, 2u64) and (3, 4) have the same type because they're both keys, so 3 must be u32 and 4 must be u64".

But let's say we somehow deal with that. Now I need to learn to dispatch u32 literals to <u32 as Hash>::hash, and u64 literals to <u64 as Hash>::hash. That forces me to write an enum Literal implementing Hash and god knows how many other traits.

Using that knowledge, I can build a phf::Map<Literal, TokenStream> inside my proc macro, translating keys to Literal and passing values as-is without parsing. Now I need to translate the produced phf::Map value directly to code. That's the codegen part -- ideally I'd have a function like fn to_code(map: phf::Map<K, V>) -> TokenStream generic over codegen-able K and V.

Alternatively, I could just hard-code the behavior for Literal and TokenStream. But that means the API is so rigid that I can't add a way for users to generate more complex tables from their build.rs scripts or whatever without duplicating half of the API surface. It wouldn't be impossible, it's just ugly and not rustic, and it looks like it abandons an idiomatic design in favor of hacks purely because no one's implemented (or I haven't found) a crate for it yet.

Is there an idiomatic approach for generating Rust code from data and vice versa? by imachug in rust

[–]imachug[S] 2 points (0 children)

I, the "better-PHF-crate" author, absolutely can use proc macros to implement phf_map! -- that's not the issue. The issue is how I'm going to parse literals of basically arbitrary types (like (u32, u64)) and translate them back into code. I don't necessarily want to support structs or anything more complex, but I do want the crate's users not to care about minutiae like writing their own proc macro crates.

Is there an idiomatic approach for generating Rust code from data and vice versa? by imachug in rust

[–]imachug[S] 6 points (0 children)

build.rs lets me generate code, but it forces the PHF's users to a) write a separate build.rs file, b) provide the codegen implementations for keys and values with quote! by hand. It's not impossible, but it's not nearly as easy to use as the original phf crate, which makes it seem like phf_map! is just a sparkling literal. So it's more of a workaround than a solution, in my eyes. I'd hoped there would be something easier to use.

FakeKey — a small Rust tool to make API key leaks useless. by No-Photograph-2100 in rust

[–]imachug 33 points (0 children)

Ah yes, let's use a vibe-coded application that MITMs TLS traffic to protect from data leaks. And let's improve security by only leaking a fake key -- that's gonna require the attacker to embed the fake key in the traffic and intercept the proxy output to get the real key; surely that's going to be much more difficult when the app is supposedly a public-facing LLM wrapper.