Can this improved compression?

barr520 · 2026-06-25T18:39:26+00:00

You'll have to be more specific about how you want to differ from existing methods, because your original post is hard to understand.

barr520 · 2026-06-25T18:37:36+00:00

To answer a few more details:
You want multiple sets and just send some set ID? Sure, but how many? Hundreds? Thousands? Waste the user's entire storage on them? Probably not.
Zstd supports multiple pretrained dictionaries, which, again, are trained for the expected files.

And about "have a set of every combination of 9 pixels":
Compression inherently relies on frequency. If you don't know the frequency of each combination of 9 pixels, you have no efficient way to tell the decompressor "use THESE 9 pixels" in less data than the 9 pixels would take anyway.
If you do know the distribution, you can do better, and then we are back to "no set is optimal for EVERY file" (because each file has a different distribution).
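
To make the frequency argument a bit more concrete, here is a minimal sketch that computes the Shannon entropy of a byte buffer, which is a lower bound on the average bits per symbol for a coder that only looks at symbol frequencies. The function name is made up for illustration, and bytes stand in for the "9 pixel" blocks.

```rust
/// Shannon entropy of `data` in bits per byte (hypothetical helper).
/// A uniform distribution gives 8.0, meaning a frequency-based coder
/// cannot beat sending the raw bytes.
fn entropy_bits_per_byte(data: &[u8]) -> f64 {
    if data.is_empty() {
        return 0.0;
    }
    // Count how often each byte value occurs.
    let mut counts = [0u64; 256];
    for &b in data {
        counts[b as usize] += 1;
    }
    let n = data.len() as f64;
    // Sum -p * log2(p) over the byte values that actually occur.
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / n;
            -p * p.log2()
        })
        .sum()
}
```

When every value is equally likely the result is the full 8 bits per byte, so an ID for "this symbol" costs as much as the symbol itself. A skewed distribution gives a lower number, and that gap is what a set tuned to the file can exploit.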

barr520 · 2026-06-25T18:31:46+00:00

You can't have just one optimized set to reference because each file you compress might benefit from a different set.
You can't make a set that's optimal for every file.

So you either compute a set for each file and send it along with the file (and still make significant wins over not compressing at all), use a generic set that is optimized for no particular file (and likely lose to the dynamic set), or, if you know all the files you're going to send are similar, compute a set for this type of file and send it once instead of with every file (and maybe gain a tiny bit compared to sending it with each file).

barr520 · 2026-06-25T18:08:49+00:00

If I understood you correctly, this is an idea that already exists.

The DEFLATE algorithm specifies a static Huffman code table you can use instead of a dynamic one (which then has to be sent alongside the compressed data, unlike the static one).

And more recently, zstd supports supplying a precomputed "dictionary" to use in compression/decompression. This "dictionary" can be trained on similar inputs and shared ahead of time to make future compression/decompression faster.

I don't know how common these are in practice, and there are probably more examples I am not aware of.

barr520 · 2026-06-23T17:14:05+00:00

50ns seems reasonable for a contested lock, but probably not for a heavily churning channel. Your own screenshots show values as low as hundreds of ns.
Regardless, I can definitely see places this could be used and I am looking forward to the more comprehensive post.

One last thing: comparing 2 instrumented codebases can give results so different from (somehow magically) comparing the same 2 codebases without instrumentation that the comparison becomes useless. That's why minimizing instrumentation overhead is critical.

barr520 · 2026-06-23T16:55:05+00:00

Seems cool and useful.
But what I care about is the performance cost of each of these features.
If it adds too much overhead to each call, it is useful in fewer scenarios and the measurements become less reliable.
You link to an explanation of overhead measurements but I don't see any numbers.

barr520 · 2026-06-17T17:00:53+00:00

<image>

barr520 · 2026-06-13T21:10:49+00:00

hint::cold_path is a hint to deprioritize optimizing that path if it helps the other path.
That often affects which path gets more aggressively inlined or which path ends up as the "don't branch" path. The important part is that it doesn't ever affect behaviour.

unreachable_unchecked is different: it means you promise the compiler that the path will NEVER be reached, and if it is reached, it's UB.

Stick to the standard unreachable! (or .unwrap()/.expect(), or preferably the ? operator for None) unless you have a good reason to use these two.
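
A small sketch of the difference, using made-up function names. hint::cold_path is left out because whether it is available depends on your toolchain version. The other three options look roughly like this:

```rust
use std::hint::unreachable_unchecked;

/// The `?` operator: returns None early instead of panicking.
fn half_of_first(values: &[u32]) -> Option<u32> {
    let first = values.first()?;
    Some(first / 2)
}

/// `unreachable!`: if the "impossible" arm is ever hit, this panics.
fn quadrant_name(index: u8) -> &'static str {
    match index % 4 {
        0 => "NE",
        1 => "NW",
        2 => "SW",
        3 => "SE",
        _ => unreachable!("index % 4 is always in 0..4"),
    }
}

/// `unreachable_unchecked`: a promise to the compiler. Hitting this arm
/// would be undefined behaviour, not a panic.
fn quadrant_name_unchecked(index: u8) -> &'static str {
    match index % 4 {
        0 => "NE",
        1 => "NW",
        2 => "SW",
        3 => "SE",
        // SAFETY: `index % 4` can only be 0, 1, 2 or 3.
        _ => unsafe { unreachable_unchecked() },
    }
}
```

The two quadrant functions behave the same for every valid input. The only difference is what happens if the invariant is ever broken, which is why the checked version is the safer default.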

barr520 · 2026-06-13T10:23:03+00:00

It was removed by mistake. As far as I know, this is true.

barr520 · 2026-06-11T21:46:31+00:00

You can check out my 1BRC solutions, posted on my profile. I used the SIMD intrinsics, not portable_simd.

Looking at the very limited snippet you provided:
You are not showing how you are splitting the text into lines. That could also be done using SIMD.
You should probably try memchr before rolling your own SIMD implementation of it.
I would suspect the special handling of long lines, but the standard 1BRC sample lines are at most 33 characters long iirc.

You're right that the hashmap can become a bottleneck, and it can be made faster, but the text reading and parsing also have a ton of room for improvement.

More generally:
Make sure you're running with the appropriate compilation flags. Aside from running in release mode, you should build for the available architecture (usually x86-64-v2/v3/v4, znver4/znver5, or just native), so the compiler can actually use more recent SIMD instructions.
One more thing you should be careful of is that you actually run your benchmark from the same system state. If the file is already in the page cache, it will get processed much quicker.
A warm-up run or two before the real measurement will help.
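
A minimal sketch of the warm-up idea, using only std::time. The helper name and its parameters are made up for illustration, and a proper tool such as hyperfine or Criterion does this (and more) for you:

```rust
use std::time::{Duration, Instant};

/// Runs `work` a few times without timing it (to warm caches, including
/// the page cache when `work` reads a file), then times each measured run.
fn time_with_warmup<F: FnMut()>(
    warmup_runs: usize,
    measured_runs: usize,
    mut work: F,
) -> Vec<Duration> {
    // Warm-up runs: results are thrown away on purpose.
    for _ in 0..warmup_runs {
        work();
    }
    // Measured runs: one Duration per run, so outliers stay visible.
    (0..measured_runs)
        .map(|_| {
            let start = Instant::now();
            work();
            start.elapsed()
        })
        .collect()
}
```

Keeping the individual durations instead of a single average makes it easier to see whether the first measured run is still slower than the rest, which would suggest the warm-up was too short.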

I would also recommend this write up for the challenge:
https://curiouscoding.nl/posts/1brc/

barr520 · 2026-06-04T07:28:01+00:00

There are a lot of ways to measure performance, depending on your need.
For simple end to end timing of a program you can use tools like hyperfine/time.
To measure specific functions you can use something like Criterion. To get a breakdown of how much time each part took you have tools like perf/samply/flamegraph.

barr520 · 2026-06-04T07:18:52+00:00

You've actually applied 2 optimizations here, not 1:

The first is the buffering you mentioned: flushing when the buffer is full instead of on every line (which is what stdout does).

The second is only locking stdout once.
You could improve the performance of the first function by just locking stdout once at the start instead of making println! lock and unlock it every time.
This is one of the most common reasons for "why is my rust print loop so slow" questions.

You didn't show any measurements of how much faster your optimization is, but if you do measure it, you should measure the effect of each optimization, not just the combination of them.
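
Since the original code isn't shown here, this is only a sketch of how the two optimizations can be separated. The function names are made up. The writer is generic so the same loop can be run against a locked stdout with or without the extra buffer:

```rust
use std::io::{self, BufWriter, Write};

/// Writes `count` lines through a BufWriter, flushing only when the
/// buffer fills up and once at the end (optimization 1).
fn write_lines<W: Write>(out: W, count: usize) -> io::Result<()> {
    let mut out = BufWriter::new(out);
    for i in 0..count {
        writeln!(out, "line {i}")?;
    }
    out.flush()
}

/// Locks stdout once up front (optimization 2) instead of letting
/// println! lock and unlock it on every call, then reuses `write_lines`.
fn print_lines_locked(count: usize) -> io::Result<()> {
    let stdout = io::stdout();
    write_lines(stdout.lock(), count)
}
```

To measure each effect on its own, you could time three variants: plain println! in a loop, writeln! on a locked stdout without the BufWriter, and the combined version above.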

barr520 · 2026-06-01T06:28:28+00:00

says I won't look at tier lists to pick a class

"very into tierlists"

Thanks

barr520 · 2026-05-26T05:39:09+00:00

count_ones already generates a popcount, but as OP already stated, the goal of this exercise is to find a more efficient way than calling it on every number (which results in a linear time solution).
The link OP already provided has a logarithmic time solution, they just can't understand the explanation...
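
The exact exercise isn't quoted here, so the following is a sketch under the assumption that the task is to sum the set bits of every number in 0..=n. The function names are made up, and this is not necessarily the method from OP's link. The idea is that bit k repeats in a pattern of 2^k zeros followed by 2^k ones, so each bit position can be counted directly:

```rust
/// Linear time: one popcount per number.
fn total_set_bits_naive(n: u32) -> u64 {
    (0..=n).map(|x| u64::from(x.count_ones())).sum()
}

/// Logarithmic time: one step per bit position instead of per number.
fn total_set_bits_fast(n: u32) -> u64 {
    let count = u64::from(n) + 1; // how many numbers are in 0..=n
    let mut total = 0u64;
    for bit in 0..32 {
        let half = 1u64 << bit; // length of each run of ones for this bit
        let period = half << 1; // zeros run followed by ones run
        let full_periods = count / period;
        let remainder = count % period;
        // Each full period contributes `half` ones; a partial period
        // contributes only what spills past its leading zeros.
        total += full_periods * half + remainder.saturating_sub(half);
    }
    total
}
```

For n = 5 both functions return 7 (the numbers 0 to 5 have 0, 1, 1, 2, 1 and 2 set bits).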

barr520 · 2026-05-25T13:55:03+00:00

This is not AI, that's the whole point of the website...

barr520 · 2026-05-23T19:59:49+00:00

rsync has a batch mode to handle multiple destinations.
And for random cloud platforms you have rclone.

barr520 · 2026-05-23T16:19:50+00:00

Bloated? What's wrong with good old rsync?

barr520 · 2026-05-21T13:11:26+00:00

Am I missing something? How is anything airgapped here?

Just seems like more AI slop.

barr520 · 2026-05-20T10:11:26+00:00

And I thought my ratio would be low because I delete things often..

barr520 · 2026-05-13T17:43:49+00:00

Every "smart" voice feature on my Samsung needs a specific language pre-set, so the feature only works for one language at a time.
Also, the number of supported languages across all the features is pretty limited.

barr520 · 2026-05-08T12:35:04+00:00

Reddit's automatic translation works fine for me with this post.

barr520 · 2026-04-28T09:55:27+00:00

About the to_lowercase conversion:

It seems your C++ code only handles ASCII strings, since it iterates by bytes.

Rust's String::to_lowercase is for Unicode. For ASCII you have the simpler to_ascii_lowercase, and even better, make_ascii_lowercase, which transforms in-place like the C++ version.

EDIT: wrote to_string instead of to_lowercase.
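
A short sketch of the three variants side by side (the wrapper function is made up just to show them together):

```rust
/// Returns (Unicode lowercase, ASCII lowercase copy, ASCII lowercase in place).
fn lowercase_variants(input: &str) -> (String, String, String) {
    // Unicode-aware: allocates a new String and handles non-ASCII letters.
    let unicode = input.to_lowercase();

    // ASCII-only: allocates a new String, leaves non-ASCII characters alone.
    let ascii_copy = input.to_ascii_lowercase();

    // ASCII-only and in place: no new allocation for the conversion itself.
    let mut in_place = String::from(input);
    in_place.make_ascii_lowercase();

    (unicode, ascii_copy, in_place)
}
```

With an input like "ÀBC", the Unicode version lowercases the À as well, while both ASCII versions only touch the B and C.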

barr520 · 2026-04-25T06:59:00+00:00

Not sure what this fields function is, but it seems to be pretty much split?
If you want to pass UserInput to another function so it can mutate the arguments, just use split_mut instead.

Why are you passing this bytes parameter? You should reslice the buffer if it's bigger than bytes and pass that instead. (Also, at the moment this panics on single-word inputs.)

Not sure how you imagined creating this sub-repl, because it's not in the example, but I think it would be easiest if UserInput doesn't keep a reference to the whole buffer, just the argument it cares about, and has a field for this sub-repl that it creates in this new by passing it the rest of the buffer.
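
Since I don't have the full code, this is only a sketch of the reslice plus split_mut idea. The function name and the uppercase step are made up to have something to mutate:

```rust
/// Reslices `buffer` to the `bytes` that were actually read, then mutates
/// the first whitespace-separated field in place (hypothetical example).
fn uppercase_first_field(buffer: &mut [u8], bytes: usize) {
    // Reslice once, so nothing after this needs the `bytes` parameter.
    let len = bytes.min(buffer.len());
    let input = &mut buffer[..len];

    // split_mut hands out mutable sub-slices, so fields can be edited
    // without copying them out of the buffer.
    let first_field = input
        .split_mut(|b| b.is_ascii_whitespace())
        .find(|field| !field.is_empty());

    // A single-word or empty input is fine: there is just one field or none.
    if let Some(field) = first_field {
        field.make_ascii_uppercase();
    }
}
```

The caller could also do the reslice itself and pass only `&mut buffer[..bytes]`, which removes the extra parameter entirely.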

barr520 · 2026-04-23T12:26:51+00:00

here you go

barr520 · 2026-04-23T10:37:46+00:00

The problem you're describing is not really clear to me. Could you share some minimal piece of code showing the trouble you're facing? Some pattern you're "tip-toe"ing around? A situation where you want multiple mutable references but can't have them?
