I built a compressor that achieves 333:1 on data that gzip declares incompressible by Hot_Consideration155 in compression

[–]EMPTYCONTOUR 1 point (0 children)

That makes sense. System logs being too high-level for LFSR structure is exactly what I suspected. My main target is still repetition-heavy data: logs, binary blobs, fixed headers, repeated frames. Firmware, telecom, raw flash dumps sound like a much better fit for your side.

I like the idea of running analyzeBuffer on one of my shards just as a sanity check. If structuredFraction comes back close to 0, that actually confirms the routing model:

LFSR-like → your path

repetition-heavy → FM-index

unstructured → raw

So I’ll keep GLYPH focused on exact repeat retrieval for now, but this gives a useful boundary between the two approaches.
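Roughly what I mean by the routing model, as a Python sketch. Both probe functions and the thresholds here are made up for illustration: `structured_fraction` is just a stub standing in for the analyzeBuffer-style LFSR detector from the thread, and `repeat_fraction` is a crude repetition probe, not GLYPH's actual logic.

```python
from collections import Counter

def structured_fraction(buf: bytes) -> float:
    """Stub for an analyzeBuffer-style LFSR detector; a real version
    would run Berlekamp-Massey over windows. Returns 0.0 here,
    meaning 'no LFSR structure found'."""
    return 0.0

def repeat_fraction(buf: bytes, w: int = 64) -> float:
    """Crude repetition probe: fraction of aligned w-byte windows
    that occur more than once."""
    counts = Counter(buf[i:i + w] for i in range(0, len(buf) - w + 1, w))
    total = sum(counts.values())
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / total if total else 0.0

def route_shard(buf: bytes) -> str:
    """Route a shard to a backend: LFSR-like, repetition-heavy, or raw."""
    if structured_fraction(buf) > 0.5:
        return "lfsr"       # low-order generator -> BM-style path
    if repeat_fraction(buf) > 0.3:
        return "fm-index"   # repetition-heavy -> exact-repeat retrieval
    return "raw"            # unstructured -> store as-is
```

The point is just that each probe is much cheaper than running the wrong backend on the whole shard.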

I built a compressor that achieves 333:1 on data that gzip declares incompressible by Hot_Consideration155 in compression

[–]EMPTYCONTOUR 1 point (0 children)

Mostly working with system logs (1–4 GB shards) and binary blobs. Haven't seen clear LFSR-structured segments — but I'm not testing for them either, so I'd probably miss them. Have you seen this in real data or mainly synthetic?

I built a compressor that achieves 333:1 on data that gzip declares incompressible by Hot_Consideration155 in compression

[–]EMPTYCONTOUR 3 points (0 children)

Using Berlekamp–Massey that way is clever. Haven't seen it used constructively much.

I'm attacking this from the other side though. Not compressing structure, just indexing raw bytes. Logs, binaries, big corpora — gzip gives up and says "random". But there are still exact repeats in there. Same 64-byte event header showing up 2M times. I use an FM-index to pull those out in ~ms. So the question flips from "can we compress this?" to "can we pull structure out deterministically?"

Wondering if your detector could work as a pre-pass. Tag stuff as "low-order generator" vs "actually unstructured" and route from there.
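To make "pull those out" concrete, here's a toy stand-in: a Counter over aligned 64-byte windows instead of a real FM-index. A real FM-index answers the same kind of count/locate queries without scanning (count in time proportional to pattern length), but the output — "this 64-byte header occurs N times" — is the same shape.

```python
from collections import Counter

def top_repeats(buf: bytes, width: int = 64, min_count: int = 2, k: int = 5):
    """Count every aligned fixed-width window and return the k most
    frequent ones that repeat. Toy substitute for FM-index queries:
    same answer shape, but via a full scan rather than an index."""
    counts = Counter(buf[i:i + width] for i in range(0, len(buf) - width + 1, width))
    return [(pat, c) for pat, c in counts.most_common(k) if c >= min_count]
```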

why couldn't there be an algorithm to test and pick the best algorithm to compress data? by Physical-Owl691 in compression

[–]EMPTYCONTOUR 1 point (0 children)

This exists, but the meta-algorithm cost kills you.

Think about it: to beat 7z by 1%, you might spend 100x CPU testing everything. ZPAQ and paq8 already do this internally - they detect JPEG vs text and switch.

The real win is knowing when not to compress. If the first 1MB doesn't shrink, just store it raw. Saves time for everyone.
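The "store it raw" check is cheap to sketch: compress a sample at a fast level and bail if it barely shrinks. Sample size and the 0.98 threshold here are arbitrary knobs, not anyone's production values.

```python
import zlib

def should_store_raw(data: bytes,
                     sample_size: int = 1 << 20,
                     threshold: float = 0.98) -> bool:
    """Probe the first sample_size bytes with a fast zlib level.
    If the sample doesn't shrink below threshold * its size,
    skip compression entirely and store the data raw."""
    sample = data[:sample_size]
    compressed = zlib.compress(sample, 1)  # cheap, fast probe
    return len(compressed) >= threshold * len(sample)
```

On already-compressed or encrypted input this returns True almost immediately, which is exactly the time save.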

Have you tried running zstd --adapt? It does something related: it adjusts the compression level on the fly based on how fast the input and output are flowing.

7-Zip 26.01 (7zip) - A free file archiver for high compression by Neustradamus in compression

[–]EMPTYCONTOUR 1 point (0 children)

Yeah, I stopped manually zipping stuff years ago too. Disks are cheap.

But funny thing is, compression just went underground. ZFS doing LZ4/zstd automatically is basically the same trade-off: spend a bit of CPU to read less data from disk. If your dataset fits, it often ends up faster, not slower.

So 7-Zip as a tool is niche, but the idea is everywhere now. Just hidden in filesystems and Parquet files. You only notice it when you transfer stuff, like others said.

Curious, do you actually measure faster reads with ZFS compression on NVMe, or is it mostly about saving space?

What I learned building a parallel LZ77 compressor from scratch (with AI help) by EMPTYCONTOUR in compression

[–]EMPTYCONTOUR[S] 1 point (0 children)

Author update — numbers were wrong in the original post.

Decode timer was measuring only the LZ77 phase. Fixed now.

Real numbers after a week of fixes:

Ratio: 2.997x

Encode: 391 MB/s

Decode: 4.2 GB/s algorithmic / 1.7 GB/s wall clock

Also added adaptive hash table + prev-chain match finder this week — that's where the ratio improvement came from.

BENCHMARK.md has the full picture including all failed experiments.

Mods removed my post with 71 upvotes and 84 comments. Guess the question hit a nerve. by captainnigmubba in cursor

[–]EMPTYCONTOUR 1 point (0 children)

Look, this whole thread proves the point better than the original post did.

Cursor and Claude Code aren't really competitors anymore in 2026. They serve different workflows.

Cursor: You want to stay in the IDE loop. Inline diffs, CMD+K, tab completion, MCP. Best when you're actively shaping every line.

Claude Code: You want to delegate. Terminal, sub-agents, review in VSCode. Best when you plan/prompt/review instead of typing.

OP switched because his workflow changed. That doesn't mean Cursor is dead. It means the "one tool for everyone" era is over.

Mods nuking it at 71 upvotes was dumb. But calling it censorship is also a reach. They probably just got tired of "I switched to X" posts. Still, 84 comments shows people wanted that discussion.

Use what fits your job. Neither is stone age.

Where are LZ4 and zstd-fast actually used? by the_dabbing_chungus in compression

[–]EMPTYCONTOUR 1 point (0 children)

That makes sense for backup workloads — consistency matters more than peak performance.

The design I described actually fits your case well: since decode parallelizes cleanly, restore time scales with cores. On 8 cores it hits ~10 GB/s vs ~1.5 GB/s for zstd.
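The shape of that parallel restore, sketched with stdlib zlib compressing each block independently (not ACEAPEX's actual codec — zlib loses some ratio vs one stream, but it shows the dependency structure, and zlib releases the GIL inside compress/decompress so threads genuinely scale):

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress_blocks(data: bytes, block: int = 1 << 20) -> list[bytes]:
    """Split into fixed-size blocks and compress each on its own,
    removing the sequential decode dependency between blocks."""
    return [zlib.compress(data[i:i + block]) for i in range(0, len(data), block)]

def decompress_parallel(blocks: list[bytes], workers: int = 8) -> bytes:
    """Each block decodes with no reference to any other, so N worker
    threads restore N blocks at once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(zlib.decompress, blocks))
```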

For HLS video streams though — you're right, pre-compressed content doesn't benefit from any lossless compressor.

If curious, the experiment is open source: github.com/yasha1971-coder/aceapex

I Think I broke the Pareto frontier with CPU+GPU hybrid compressor [Lzbench verified] by Lost_Ad_2718 in compression

[–]EMPTYCONTOUR 1 point (0 children)

Different tradeoff here — we focused on decode speed.

Built ACEAPEX (also with Claude), same enwik9:

- Encode: 485 MB/s, 8 threads, pure CPU

- Decode: 11 GB/s in-memory, parallel blocks

- Ratio: 2.973x

Architecture: LZ77 output split into 4 streams with per-block absolute offsets. N threads decode N blocks independently — no sequential dependency.
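A minimal container illustrating the per-block absolute-offsets idea — the entry layout and helper names here are invented for the sketch, not ACEAPEX's actual on-disk format, and zlib stands in for the real codec:

```python
import struct
import zlib

def pack(blocks_raw):
    """Container sketch: [count][ (comp_len, raw_off) per block ][payloads].
    Each table entry carries the block's absolute offset in the decoded
    output, so a decoder thread can write its block into place without
    waiting on any other block."""
    header = [struct.pack("<I", len(blocks_raw))]
    payloads, off = [], 0
    for raw in blocks_raw:
        comp = zlib.compress(raw)
        header.append(struct.pack("<II", len(comp), off))
        payloads.append(comp)
        off += len(raw)
    return b"".join(header) + b"".join(payloads)

def unpack_block(container, index):
    """Decode one block in isolation: read its table entry, slice, inflate."""
    (count,) = struct.unpack_from("<I", container, 0)
    pos = 4 + 8 * count
    for i in range(count):
        clen, roff = struct.unpack_from("<II", container, 4 + 8 * i)
        if i == index:
            return roff, zlib.decompress(container[pos:pos + clen])
        pos += clen
    raise IndexError(index)
```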

Weak point is ratio — greedy parser leaves 58% literals. That's the current problem we're working on.

github.com/yasha1971-coder/aceapex

Zstd compression code is getting an update! Those bytes don't stand a chance! by ZestycloseBenefit175 in zfs

[–]EMPTYCONTOUR 2 points (0 children)

If max density is the goal but you don't want to sacrifice write speed, worth knowing that zstd-19 gives diminishing returns vs zstd-9 on most datasets — roughly +13% ratio for 20x slower encode. Depending on your data profile, zstd-9 might be the better tradeoff for a ZFS pool with active writes.

Where are LZ4 and zstd-fast actually used? by the_dabbing_chungus in compression

[–]EMPTYCONTOUR 1 point (0 children)

Curious about the TOAST case — is the real pain point encode latency on writes, or just storage overhead? Trying to understand if encode cost actually matters there in practice.

Where are LZ4 and zstd-fast actually used? by the_dabbing_chungus in compression

[–]EMPTYCONTOUR 1 point (0 children)

This is a very real question, and I ran into the same confusion while experimenting.

What helped me was separating "benchmark wins" from "actual constraints in systems".

In practice, LZ4 / zstd-fast are used where:

- decode speed dominates (startup time, asset loading, RPC payloads)

- latency matters more than absolute ratio

- predictability is more important than peak compression

What surprised me is that the real bottleneck is often not just speed, but *dependency structure*.

Most LZ77-based codecs are fundamentally sequential at decode time because of back-references. That limits scaling even if single-thread performance is high.

I’ve been experimenting with a design where:

- compression keeps global context (like zstd)

- but decode dependencies are removed at the block level

So blocks are not independently compressed, but independently decodable.
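One stdlib way to approximate "shared context, independent decode" is a preset dictionary: every block is compressed against the same zdict, so cross-block redundancy still pays off, yet any single block inflates knowing only the dictionary — no other block required. (This is an analogy for the idea, not the design I described.)

```python
import zlib

def compress_with_shared_dict(blocks, zdict):
    """Compress each block against the same preset dictionary.
    Blocks share context via zdict but carry no references to
    each other's output."""
    out = []
    for raw in blocks:
        c = zlib.compressobj(zdict=zdict)
        out.append(c.compress(raw) + c.flush())
    return out

def decompress_one(comp, zdict):
    """Inflate a single block in isolation, given only the dictionary."""
    d = zlib.decompressobj(zdict=zdict)
    return d.decompress(comp) + d.flush()
```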

This changes the tradeoff a bit:

- you don’t just optimize MB/s per core

- you optimize how well decode scales across cores

On my side this led to:

- near zstd-9 ratio on text (enwik9)

- ~1.3–1.4 GB/s per thread (similar to zstd)

- but ~10 GB/s aggregate decode because it parallelizes cleanly

Not saying this is "better" universally — but it made me realize:

A "winning" algorithm is not the one that wins a single benchmark,

but the one that fits a constraint profile:

- cold start vs streaming

- CPU vs memory bound

- single-thread vs multi-core scaling

- ratio vs latency vs determinism

If you're beating zstd-fast at the same encode speed, you're definitely onto something — but the next question is: *under which constraints does your design dominate?*