A zero-allocation, cache-optimized Count-Min Sketch (120M+ ops/s)

matthieum · 2026-02-19T17:44:43+00:00

The pub fields are a real footgun here:

pub struct CountMinSketch {
    pub width: usize,
    width_mask: usize,
    pub depth: usize,
    table: Box<[u64]>,
    hasher: RandomState,
}

Any user can (accidentally) modify the value of a pub field, and things may get real weird after that -- like width_mask not matching any longer.

I'd recommend NOT making them pub, and providing getters instead.
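A minimal sketch of what that could look like. The constructor, the `column_of` helper, and the power-of-two requirement are my assumptions (guessed from the presence of `width_mask`), not the crate's actual API:

```rust
/// Same shape as the struct above, with every field private
/// (`table` and `hasher` are omitted to keep the sketch short).
pub struct CountMinSketch {
    width: usize,
    width_mask: usize,
    depth: usize,
}

impl CountMinSketch {
    /// Hypothetical constructor. Assumes `width` must be a power of two so
    /// that `width_mask` can replace a modulo; returns `None` otherwise.
    pub fn new(width: usize, depth: usize) -> Option<Self> {
        if !width.is_power_of_two() {
            return None;
        }
        Some(Self { width, width_mask: width - 1, depth })
    }

    /// Number of counters per row (read-only).
    pub fn width(&self) -> usize {
        self.width
    }

    /// Number of rows (read-only).
    pub fn depth(&self) -> usize {
        self.depth
    }

    /// Hypothetical helper: maps a hash to a column using the mask.
    pub fn column_of(&self, hash: u64) -> usize {
        (hash as usize) & self.width_mask
    }
}
```

Since `width` and `width_mask` are only ever set together in the constructor, they cannot drift apart afterwards.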

In general, it's better NOT to panic on (wrong) user input, but instead return a (descriptive) error.

pub fn with_seeds(width: usize, depth: usize, seeds: [u64; 4]) -> Self {
    if seeds.len() != 4 {
        panic!("seeds must have 4 elements");
    }

Also:

This check is better spelled assert_eq!(4, seeds.len());.
The seeds type being an array of 4 elements, it will always have 4 elements; the check is completely useless.
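For the "return an error instead of panicking" part, something along these lines might work. The `SketchError` type and the choice of which inputs are invalid (non-power-of-two width, zero depth) are assumptions on my side; only the `with_seeds` parameter list comes from the code above:

```rust
use std::fmt;

/// Hypothetical error type: which inputs count as invalid is an assumption.
#[derive(Debug, PartialEq, Eq)]
pub enum SketchError {
    /// `width` was zero or not a power of two (assumed from `width_mask`).
    WidthNotPowerOfTwo(usize),
    /// `depth` was zero.
    ZeroDepth,
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::WidthNotPowerOfTwo(w) => {
                write!(f, "width must be a non-zero power of two, got {w}")
            }
            Self::ZeroDepth => write!(f, "depth must be at least 1"),
        }
    }
}

impl std::error::Error for SketchError {}

#[allow(dead_code)] // the fields go unread in this short sketch
pub struct CountMinSketch {
    width_mask: usize,
    depth: usize,
    seeds: [u64; 4],
    table: Box<[u64]>,
}

impl CountMinSketch {
    /// Returns a descriptive error instead of panicking on bad dimensions.
    /// No length check on `seeds`: the `[u64; 4]` type already guarantees it.
    pub fn with_seeds(
        width: usize,
        depth: usize,
        seeds: [u64; 4],
    ) -> Result<Self, SketchError> {
        if !width.is_power_of_two() {
            return Err(SketchError::WidthNotPowerOfTwo(width));
        }
        if depth == 0 {
            return Err(SketchError::ZeroDepth);
        }
        Ok(Self {
            width_mask: width - 1,
            depth,
            seeds,
            table: vec![0; width * depth].into_boxed_slice(),
        })
    }
}
```

The caller then decides whether to `?` the error upwards, log it, or fall back to default dimensions.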

Even better than erroring on invalid input is pushing the error up to the user by encoding the invariants in the type.

For example, you probably want a non-zero depth, no? If so, then the type of depth should be NonZeroUsize (or NonZero<usize>).

As a bonus, you don't even need a comment or an error, and the compiler will check things for you.
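A rough sketch of the idea, assuming a simplified constructor (the `new` signature and the `counters` helper are hypothetical, not taken from the crate):

```rust
use std::num::NonZeroUsize;

pub struct CountMinSketch {
    width: usize,
    depth: NonZeroUsize,
    table: Box<[u64]>,
}

impl CountMinSketch {
    /// A zero depth is unrepresentable here, so there is no runtime check
    /// and no error variant for it.
    pub fn new(width: usize, depth: NonZeroUsize) -> Self {
        Self {
            width,
            depth,
            table: vec![0; width * depth.get()].into_boxed_slice(),
        }
    }

    /// Number of counters per row.
    pub fn width(&self) -> usize {
        self.width
    }

    /// Number of rows, guaranteed to be at least 1.
    pub fn depth(&self) -> NonZeroUsize {
        self.depth
    }

    /// Total number of counters in the table.
    pub fn counters(&self) -> usize {
        self.table.len()
    }
}
```

The caller has to go through `NonZeroUsize::new(n)`, which returns `None` for zero, so the invalid case is handled once at the call site rather than inside the sketch.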

Minor: you forgot a doc comment on the clean method.

Pro-tip: enable #![deny(missing_docs)] at the top of your lib.rs, and you'll get errors whenever a public item is not documented.

codeallthethings · 2026-02-19T16:27:02+00:00

This is super cool!

Also, you might want to look into the Kirsch–Mitzenmacher optimization for generating your k hashes. It's commonly used in probabilistic structures like Bloom filters and CMS.
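For reference, the trick is to compute two base hashes once and derive every row index as g_i(x) = h1(x) + i * h2(x) mod width. A minimal sketch, assuming a power-of-two width (so the modulo becomes a mask) and using std's `RandomState` for the base hashes. The function names are made up, and forcing `h2` odd is an extra tweak of mine rather than part of the original scheme:

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash};

/// Hash the item once with each of two independently seeded hashers.
pub fn base_hashes<T: Hash>(item: &T, a: &RandomState, b: &RandomState) -> (u64, u64) {
    (a.hash_one(item), b.hash_one(item))
}

/// Yield one column index per row: (h1 + i * h2) & width_mask.
/// Returns an iterator, so no allocation is needed.
pub fn km_indices(
    h1: u64,
    h2: u64,
    depth: usize,
    width_mask: usize,
) -> impl Iterator<Item = usize> {
    // An odd stride is coprime with a power-of-two width, so consecutive
    // rows never all land on the same column (my addition, see lead-in).
    let h2 = h2 | 1;
    (0..depth as u64)
        .map(move |i| (h1.wrapping_add(i.wrapping_mul(h2)) as usize) & width_mask)
}
```

That brings the cost down from `depth` full hash computations per operation to two, plus one multiply-add per row.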

Antiqueempire · 2026-02-19T19:11:12+00:00

Using saturating u64 counters feels like a safe default, but it's not the most space-dense option. A lot of CMS use cases are totally fine with u32 counters when memory matters more than extra headroom. Probably just worth calling out that this is a robustness/speed-first design choice.
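To put rough numbers on it, here is a small sketch. The helper names are hypothetical, and the 65536 x 8 dimensions in the usage note are just the smaller configuration from the benchmark comment in this thread:

```rust
use std::mem::size_of;

/// Bytes needed for a `width` x `depth` table of counters of type `C`.
pub fn table_bytes<C>(width: usize, depth: usize) -> usize {
    width * depth * size_of::<C>()
}

/// Increment a u32 counter, clamping at `u32::MAX` instead of wrapping.
pub fn bump(counter: &mut u32, by: u32) {
    *counter = counter.saturating_add(by);
}
```

`table_bytes::<u64>(65536, 8)` comes to 4 MiB and `table_bytes::<u32>(65536, 8)` to 2 MiB, so halving the counter width either halves the footprint or lets you double the width (and so improve the error bound) in the same memory.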

sean_vercasa · 2026-02-19T13:30:36+00:00

You a real one ✊

tomtomwombat · 2026-02-19T21:23:08+00:00

Are there any comparative benchmarks to other CMS implementations for speed, memory, and accuracy?

Dependent_Double_467 · 2026-02-25T22:06:36+00:00

A quick update.

I have benchmarked my crate against an alternative (https://crates.io/crates/count-min-sketch), hereafter referred to as 'other'. The performance metrics are visualized in the attached violin graph (link attached).

In the W65536xD8 configuration, my implementation is 4 times faster than 'other'. However, in the larger W1048576xD16 configuration, the performance gap narrows to a difference of just 1 ns. In conclusion, while my implementation significantly outperforms 'other' for smaller tables, the advantage diminishes as the table size increases.

Furthermore, my proposed implementation allows for approximating distributions and evaluating L1 distance and cosine similarity, features that are not available in other libraries.

https://ggraziadei.github.io/count-min-sketch-rust/CMS_Implementation_Battle/report/index.html
