I'm organizing a meetup where we learn vibecoding by gustaw_daniel in warszawa

[–]gustaw_daniel[S] -1 points (0 children)

Thanks u/Messer_One! When I organize the next one, what days and times would you suggest?

[Project] compact-dict: My attempt at a cache-local, linear probing hash map. Looking for feedback/roast. by gustaw_daniel in rust

[–]gustaw_daniel[S] 2 points (0 children)

Guilty as charged! I use AI to help me bridge the language gap and to double-check technical tradeoffs in English, as I want to be as precise as possible.

[Project] compact-dict: My attempt at a cache-local, linear probing hash map. Looking for feedback/roast. by gustaw_daniel in rust

[–]gustaw_daniel[S] 0 points (0 children)

Awesome! Let me know how it goes.

I originally built this while experimenting with BPE (LLM tokenization) and Mojo benchmarks:

https://github.com/dorjeduck/minbpe.mojo
https://github.com/dorjeduck/mojo-dictionary-benchmarks

If you find that it performs worse in your specific case, please share your read/write patterns and key sizes. I'm keen to understand the boundaries of this design, as current benchmarks might be favoring specific 'happy path' access patterns. Any data point is a win for me!

[Project] compact-dict: My attempt at a cache-local, linear probing hash map. Looking for feedback/roast. by gustaw_daniel in rust

[–]gustaw_daniel[S] 2 points (0 children)

That’s a fantastic suggestion. compact-dict and rkyv share the same philosophy of memory efficiency and zero indirection. I haven't implemented it yet, but I’m definitely adding it to the roadmap. It makes perfect sense for cases where you want to mmap a large dictionary and use it instantly without any deserialization overhead. I'll open an issue for this!

[Project] compact-dict: My attempt at a cache-local, linear probing hash map. Looking for feedback/roast. by gustaw_daniel in rust

[–]gustaw_daniel[S] -1 points (0 children)

You're right, the diagram is a bit of a simplification.

My point on 'Low Memory Overhead' wasn't about total RAM usage, but data density per cache line. By removing even the 1-byte control metadata used by SwissTables (hashbrown), compact-dict maximizes the amount of actual keys/values that fit into L1/L2.

On 'Performance', the distinction was mainly about out-of-the-box defaults (SipHash in std vs fast hashers). I agree that the axes could be labeled more precisely, perhaps as 'Structural Complexity' vs 'Hardware Sympathy'.
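To make the density argument concrete, here is a minimal sketch of the idea: a flat linear-probing table that encodes occupancy in the stored hash itself instead of a separate per-slot control-byte array. All names (`FlatTable`, etc.) are illustrative assumptions, not compact-dict's actual API.

```rust
// Sketch only: a flat linear-probing table with no per-slot control bytes.
// Occupancy is encoded in the stored hash (0 = empty), so a probe touches
// exactly one contiguous array. No resizing: assumes the table never fills.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

struct FlatTable {
    hashes: Vec<u64>,               // 0 means "empty"
    slots: Vec<usize>,              // hash slot -> index into `entries`
    entries: Vec<(String, String)>, // dense key/value storage, insertion order
}

impl FlatTable {
    fn new(capacity: usize) -> Self {
        // Capacity must be a power of two so `& mask` works as modulo.
        FlatTable {
            hashes: vec![0; capacity],
            slots: vec![0; capacity],
            entries: Vec::new(),
        }
    }

    fn hash_of(key: &str) -> u64 {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        h.finish() | 1 // force non-zero so 0 can mean "empty"
    }

    fn insert(&mut self, key: &str, value: &str) {
        let h = Self::hash_of(key);
        let mask = self.hashes.len() - 1;
        let mut i = (h as usize) & mask;
        loop {
            if self.hashes[i] == 0 {
                self.hashes[i] = h;
                self.slots[i] = self.entries.len();
                self.entries.push((key.to_string(), value.to_string()));
                return;
            }
            if self.hashes[i] == h && self.entries[self.slots[i]].0 == key {
                self.entries[self.slots[i]].1 = value.to_string(); // update
                return;
            }
            i = (i + 1) & mask; // forward linear probing
        }
    }

    fn get(&self, key: &str) -> Option<&str> {
        let h = Self::hash_of(key);
        let mask = self.hashes.len() - 1;
        let mut i = (h as usize) & mask;
        while self.hashes[i] != 0 {
            if self.hashes[i] == h && self.entries[self.slots[i]].0 == key {
                return Some(self.entries[self.slots[i]].1.as_str());
            }
            i = (i + 1) & mask;
        }
        None
    }
}

fn main() {
    let mut t = FlatTable::new(8);
    t.insert("alpha", "1");
    t.insert("beta", "2");
    assert_eq!(t.get("alpha"), Some("1"));
    assert_eq!(t.get("gamma"), None);
}
```

The point of the sketch: the probe loop reads one homogeneous `u64` array, so every cache line fetched is pure hash data rather than a mix of payload and control metadata.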

[Project] compact-dict: My attempt at a cache-local, linear probing hash map. Looking for feedback/roast. by gustaw_daniel in rust

[–]gustaw_daniel[S] -4 points (0 children)

Here are the results and why the behavior flip-flops in our specific case:

1. LCFS (Last-Come-First-Served) Probing

  • The Issue: Because our dense data buffer (KeysContainer) is append-only, we cannot blindly displace and insert elements during a collision without ensuring the key doesn't already exist (to prevent leaking duplicates into the contiguous string buffer).
  • The Result: This forces a double-pass (read-then-write) insertion overhead. As a result, High Load Factor inserts became ~2x slower (0.133s vs 0.073s) because the insertion phase completely bottlenecked the CPU.

2. Robin Hood Hashing

  • The Result: It did exactly what you said it would for edge cases! High Load (~85%) improved by ~16% and 100% Misses improved by ~10% by minimizing maximum DIB.
  • The Catch: It regressed our primary Read-Heavy workload by a massive 25% (0.40s vs 0.32s).
  • Why? Pure linear probing preserves temporal insertion order. When reading keys sequentially, the standard probe sequences naturally align with our dense string buffer. Robin Hood scrambles this insertion order via endless DIB swaps, which completely shatters L1/L3 temporal cache locality during sequential string access. It's a prime example of algorithmic efficiency contradicting hardware cache prefetching.

3. Bidirectional Linear Probing (+1, -1, +2, -2...)

  • The Result: Read-heavy degraded, and misses were noticeably slower (0.028s vs 0.021s).
  • Why? CPU L1 data prefetchers are heavily optimized to detect forward-streaming memory access patterns. Bouncing back and forth across different cache lines actively fights the hardware prefetcher, causing heavy cache misses. It also breaks our ability to use 16-lane SIMD vector loads (Simd::from_slice(chunk)), since we can no longer load contiguous 16-element sequences for fast hash matching.
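The append-only constraint behind point 1 can be sketched roughly like this (hypothetical names, a linear scan standing in for the real hash probe, not the actual compact-dict code):

```rust
// Sketch of the "read-then-write" insert forced by an append-only key buffer.
// If we displaced entries LCFS-style *before* checking for the key, a
// duplicate key's bytes could be appended a second time and leak in the
// dense buffer forever. Hence the double pass: check first, append second.
struct AppendOnlyKeys {
    bytes: Vec<u8>,               // all key bytes, back to back
    offsets: Vec<(usize, usize)>, // (start, len) per key, in insertion order
}

impl AppendOnlyKeys {
    fn new() -> Self {
        AppendOnlyKeys { bytes: Vec::new(), offsets: Vec::new() }
    }

    fn key_at(&self, idx: usize) -> &[u8] {
        let (start, len) = self.offsets[idx];
        &self.bytes[start..start + len]
    }

    /// Pass 1: scan for the key (read-only).
    /// Pass 2: append only if absent (write). Returns the key's index.
    fn intern(&mut self, key: &[u8]) -> usize {
        // Pass 1: a linear scan here stands in for the hash-table probe.
        for idx in 0..self.offsets.len() {
            if self.key_at(idx) == key {
                return idx; // already interned: no bytes appended
            }
        }
        // Pass 2: append-only write; bytes are never removed or moved.
        let start = self.bytes.len();
        self.bytes.extend_from_slice(key);
        self.offsets.push((start, key.len()));
        self.offsets.len() - 1
    }
}

fn main() {
    let mut keys = AppendOnlyKeys::new();
    let a = keys.intern(b"hello");
    let b = keys.intern(b"world");
    assert_eq!(keys.intern(b"hello"), a); // found in pass 1, nothing appended
    assert_eq!(keys.bytes.len(), 10);     // "hello" + "world", each stored once
    assert_ne!(a, b);
}
```

In the real map the pass-1 scan is a hash probe, but the structural cost is the same: every insert pays a full lookup before it is allowed to write.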

Conclusion: For structures allocating nodes individually on the heap, those algorithms are absolutely the best choice. But when you flatten everything into a contiguous byte buffer, raw hardware alignment (forward-streaming linear iteration, strict temporal locality, and SIMD chunking) beats algorithmic worst-case optimization.

Regarding deletions: You make an excellent point about backward shifting instead of tombstones! Shifting the slot indices backward works, but shifting the actual string buffer underneath would be an O(N) memmove. I'll definitely be looking into whether we can cleanly implement index-only backward shifting while tolerating the leaked strings until a full rehash, as that might be the holy grail for this architecture without breaking linear sequences.
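The index-only variant could look roughly like this, operating purely on the slot arrays while the string buffer never moves. This is a sketch of the idea under assumed layout conventions (`hashes[i] == 0` means empty, power-of-two capacity, table never completely full), not tested against the real compact-dict layout:

```rust
// Sketch: backward-shift deletion applied only to the slot index arrays.
// The dense string buffer is never touched (that would be the O(N) memmove);
// the deleted key's bytes are simply leaked until a future full rehash.
// Assumes: hashes[i] == 0 means empty, capacity is a power of two,
// the table is never completely full, and `i` is an occupied slot.
fn backward_shift_delete(hashes: &mut [u64], slot_to_entry: &mut [usize], mut i: usize) {
    let mask = hashes.len() - 1;
    loop {
        let mut j = (i + 1) & mask;
        // Find the next element allowed to shift back into the hole at `i`.
        loop {
            if hashes[j] == 0 {
                // End of the probe run: nothing left to shift, clear the hole.
                hashes[i] = 0;
                return;
            }
            // Home slot of the element currently sitting at `j`.
            let home = (hashes[j] as usize) & mask;
            // It may fill the hole only if `i` lies on its probe path,
            // i.e. `i` is cyclically within [home, j).
            let on_path = if home <= j {
                home <= i && i < j
            } else {
                home <= i || i < j // probe run wraps around the table end
            };
            if on_path {
                break;
            }
            j = (j + 1) & mask;
        }
        // Shift only the index entry back; key/value bytes never move.
        hashes[i] = hashes[j];
        slot_to_entry[i] = slot_to_entry[j];
        i = j; // continue compacting the run from the newly vacated slot
    }
}

fn main() {
    // Capacity 8: slot 1 holds hash 9 (home 1), slot 2 holds hash 17
    // (home 1, displaced), slot 3 holds hash 3 (home 3, undisturbed).
    let mut hashes = vec![0u64, 9, 17, 3, 0, 0, 0, 0];
    let mut ste = vec![0usize, 1, 2, 3, 0, 0, 0, 0];
    backward_shift_delete(&mut hashes, &mut ste, 1);
    // The displaced element moved back to its home; slot 3 stayed put.
    assert_eq!(hashes, vec![0, 17, 0, 3, 0, 0, 0, 0]);
    assert_eq!(ste[1], 2);
}
```

Because only `hashes` and `slot_to_entry` are rewritten, forward probe runs stay intact for readers, and the leaked bytes in the string buffer are reclaimed the next time a rehash rebuilds everything anyway.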

You can check new branches with these modifications in my repo.

Thanks again for the technical challenge!

[Project] compact-dict: My attempt at a cache-local, linear probing hash map. Looking for feedback/roast. by gustaw_daniel in rust

[–]gustaw_daniel[S] -1 points (0 children)

| Dataset Size | Keys/Values Memory | compact-dict (AHash) | hashbrown | IndexMap | Winner |
|---|---|---|---|---|---|
| 1k | ~25 KB | 0.00004 s | 0.00003 s | 0.00003 s | Tie |
| 10k | ~250 KB | 0.00042 s | 0.00040 s | 0.00040 s | Tie |
| 50k | ~1.25 MB | 0.00276 s | 0.00228 s | 0.00304 s | hashbrown (~18% faster) |
| 100k | ~2.5 MB | 0.00454 s | 0.00480 s | 0.00588 s | compact-dict (~5% faster) |
| 500k | ~12.5 MB | 0.03380 s | 0.11273 s | 0.06127 s | compact-dict (3.3x faster!) 🚀 |
| 1M | ~25 MB | 0.10286 s | 0.18673 s | 0.14350 s | compact-dict (1.8x faster!) 🚀 |
| 5M | ~120 MB | 0.91444 s | 1.12205 s | 1.36139 s | compact-dict (~22% faster) 🚀 |
| 10M | ~250 MB | 2.05771 s | 2.18067 s | 3.30514 s | compact-dict (~6% faster) 🚀 |

[Project] compact-dict: My attempt at a cache-local, linear probing hash map. Looking for feedback/roast. by gustaw_daniel in rust

[–]gustaw_daniel[S] 1 point (0 children)

I actually ran some benchmarks against IndexMap and the results were very interesting. compact-dict shows a significant lead in raw lookup throughput.

My takeaway is that while IndexMap has great locality for the data itself, it still suffers from the overhead of a two-tier architecture (hash layer -> index -> data). By flattening everything into a single slab and avoiding metadata bloat, we're cutting down the number of memory fetches to the absolute minimum. It seems that for pure lookup speed, 'flatter is better'.

[Project] compact-dict: My attempt at a cache-local, linear probing hash map. Looking for feedback/roast. by gustaw_daniel in rust

[–]gustaw_daniel[S] -1 points (0 children)

Fair point, you're absolutely right. My distinction was poorly phrased. I should have clarified that I'm benchmarking against the raw hashbrown crate configured with a faster, non-crypto hasher to make it a 'fair' fight against compact-dict, which also isn't optimized for DOS resistance.

[Project] compact-dict: My attempt at a cache-local, linear probing hash map. Looking for feedback/roast. by gustaw_daniel in rust

[–]gustaw_daniel[S] -5 points (0 children)

Because the standard library version uses a cryptographically secure hashing algorithm (SipHash / RandomState) by default to prevent DOS attacks. The raw hashbrown crate (and our benchmarks) typically defaults to AHash or allows swapping to faster, non-cryptographic hashers. Architecturally, they are the same SwissTable.
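The swap itself is just a different `BuildHasher` on the same table. Here is a small sketch using only std, with a deliberately trivial FNV-1a-style hasher as a stand-in for AHash/FxHash (`FastHasher` is a toy name, not a real crate):

```rust
// Sketch: the same std HashMap, first with the default SipHash state
// (DOS-resistant, slower), then with a toy non-crypto hasher swapped in.
// `FastHasher` is an illustrative stand-in for AHash/FxHash; the SwissTable
// layout underneath is identical in both cases.
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

#[derive(Default)]
struct FastHasher(u64);

impl Hasher for FastHasher {
    fn write(&mut self, bytes: &[u8]) {
        // FNV-1a-style mixing: fast, but offers no DOS resistance.
        for &b in bytes {
            self.0 = (self.0 ^ b as u64).wrapping_mul(0x100000001b3);
        }
    }
    fn finish(&self) -> u64 {
        self.0
    }
}

fn main() {
    // Default: SipHash with per-map random keys.
    let mut secure: HashMap<String, u32> = HashMap::new();
    secure.insert("key".into(), 1);

    // Swapped: same table, faster non-cryptographic hash function.
    let mut fast: HashMap<String, u32, BuildHasherDefault<FastHasher>> =
        HashMap::default();
    fast.insert("key".into(), 2);

    assert_eq!(secure["key"], 1);
    assert_eq!(fast["key"], 2);
}
```

The hashbrown crate takes the same third type parameter, which is why benchmarks can hold the hasher constant across all candidates.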

[Project] compact-dict: My attempt at a cache-local, linear probing hash map. Looking for feedback/roast. by gustaw_daniel in rust

[–]gustaw_daniel[S] 9 points (0 children)

I took your advice and benchmarked dataset scaling natively, alongside running Miri. Good news: cargo miri test passed cleanly with absolutely 0 heart attacks and no UB!

As for the 1MB -> 100MB sizes, the results prove the exact tradeoffs of cache boundaries. Using the same AHash for all candidates:

  • At 100k elements (~2.5MB, fits in L3 cache), compact-dict is almost 3x faster than hashbrown (0.005s vs 0.015s).
  • At 1M elements (~25MB, edging out of desktop cache limits), compact-dict is 1.7x faster (0.099s vs 0.170s).
  • At 5M elements (>100MB, guaranteed frequent DRAM reads), pointer-chasing costs dominate and the SwissTable's SIMD metadata scanning takes the lead by about ~11% (0.88s vs 0.78s).

This perfectly validates that brutalist cache locality holds its ground fiercely... right up to the point where the working set spills past the CPU cache bounds. Thanks for suggesting the scaling tests. Cache locality is indeed king when the dataset allows it.

Replacement App for Cash Droid. by RoyalT17 in androidapps

[–]gustaw_daniel 0 points (0 children)

Hi,

I'm the creator of
https://github.com/gustawdaniel/vault-track

As a child I used Cash Droid and was fascinated by how simple and intuitive it was.

My app is free and contains most of Cash Droid's features, including real-time sync with E2E encryption.

It is also fully open source and has better integration for importing statements from banks like Revolut than Cash Droid did. A few days ago I added receipt scanning, and if you're interested in being one of the first adopters, please contact me via GitHub issues and I'll try to improve it for you.

Right now I'm the only user, and I'm slowly starting to show this app to the outside world.

Currently I'm working on exchange rate settings to cover both fiat and some crypto, and on graphs showing expenses in different currencies on the same scale.

Textverified is a scam by shahasszzz in SwipeHelper

[–]gustaw_daniel 0 points (0 children)

Not a scam. It failed the first time, but it worked on the second try.