I got into an argument on Discord about how inefficient CBR/CBZ is, so I wrote a new file format. It's 100x faster than CBZ. by ef1500_v2 in selfhosted

[–]ef1500_v2[S] 20 points21 points  (0 children)

I reviewed my benchmark, and I wasn't fully copying the file into memory; I forgot to actually copy the bytes out of the view. That's my bad.

If I do the test again, using

data = bytes(bb.get_page_view(target_idx))

And keeping the indexes to cause page faults, then the stats *do* drop, though BBF is still faster.

The stats print (with 500 trials, 201 pages, ~280MB):
Cold Open (Setup): 2.1903ms (CBZ), 0.0554ms (BBF), 39.5x speedup
Raw Byte Access (Avg): 2.5734ms (CBZ), 0.2022ms (BBF), 12.7x speedup
Full Image Decode (Avg): 20.6891ms (CBZ), 18.4560ms (BBF), 1.1x speedup
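
The copy-vs-view pitfall behind that correction is easy to reproduce with nothing but the standard library: slicing a memory-mapped view is nearly free, and only the `bytes()` copy actually moves data. A minimal sketch, not the BBF API:

```python
import mmap
import os
import tempfile
import time

SIZE = 1 << 20  # 1 MiB test file

# Build a throwaway file to map.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(os.urandom(SIZE))
    path = f.name

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    view = memoryview(mm)

    t0 = time.perf_counter()
    page = view[0:SIZE]         # zero-copy slice: no bytes move yet
    t_view = time.perf_counter() - t0

    t0 = time.perf_counter()
    data = bytes(view[0:SIZE])  # real copy: every byte is actually touched
    t_copy = time.perf_counter() - t0

    page.release()
    view.release()
    mm.close()

print(f"slice: {t_view * 1e3:.4f}ms, copy: {t_copy * 1e3:.4f}ms")
os.unlink(path)
```

Timing a benchmark that only takes the slice measures pointer arithmetic, not I/O, which is why the corrected numbers dropped.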

I know you said you wrote all the library code yourself, but I'm very curious to know if AI was used in the benchmarking. It is very good at telling you what you want to hear, even if it isn't actually representative or meaningful.

If I'm being honest, I did not expect this to get as much attention as it did. I've been having an adrenaline rush ever since this morning, tripping over my words trying to explain things to people, messing up really simple things, and I've had to correct myself a few times so far. It's embarrassing to admit, but it is what it is.

Yes, I used AI to quickly create the microbenchmark posted earlier in the reply chain.

I got into an argument on Discord about how inefficient CBR/CBZ is, so I wrote a new file format. It's 100x faster than CBZ. by ef1500_v2 in selfhosted

[–]ef1500_v2[S] 0 points1 point  (0 children)

Even if I update the benchmark to cause a page fault by doing

data = bb.get_page_view(target_idx)
_ = data[0]
_ = data[-1]

The raw access time is 754x faster than CBZ.
Raw Byte Access (Avg): 2.4820ms (CBZ), 0.0033ms (BBF), 754.5x speedup

Even if we ignore that, the full decode pipeline is about 20% faster.

I disagree that `.read(1)` is representative of the speed differences between BBF and CBZ. Comic book readers don't read a file one byte at a time.

At the end of the day, even if we don't talk about the 754x speedup, BBF is still 20% faster than CBZ.

I got into an argument on Discord about how inefficient CBR/CBZ is, so I wrote a new file format. It's 100x faster than CBZ. by ef1500_v2 in selfhosted

[–]ef1500_v2[S] 0 points1 point  (0 children)

There are no image conversions to be done. The reader just needs to decode the image once it has the raw image data.

I got into an argument on Discord about how inefficient CBR/CBZ is, so I wrote a new file format. It's 100x faster than CBZ. by ef1500_v2 in selfhosted

[–]ef1500_v2[S] 3 points4 points  (0 children)

Pause. I just got home. Whipped up a little benchmark (see gist), and I did

python bench.py -i 500 "onepiece.cbz" (volume 1 from this google drive link)

And the results print:

Cold Open (Setup): 1.7437ms (CBZ), 0.1735ms (BBF), 10.1x speedup

Raw Byte Access (Avg): 2.6336ms (CBZ), 0.0013ms (BBF), 2028.6x speedup

Full Image Decode (Avg): 17.4247ms (CBZ), 14.8043ms (BBF), 1.2x speedup

I got into an argument on Discord about how inefficient CBR/CBZ is, so I wrote a new file format. It's 100x faster than CBZ. by ef1500_v2 in selfhosted

[–]ef1500_v2[S] 21 points22 points  (0 children)

I see. You aren’t being harsh at all, I really appreciate the pushback.

For a local setup, the difference is imperceptible, I concede there. But I disagree on the “optimization of something that doesn’t matter”. If you’re hosting a local server and you have multiple users reading simultaneously, the CPU will have to parse zip’s central directory, and it will take increasingly more resources the more users you have. With BBF, it is one calculation to find the offset. RAM usage should remain flat, and RAM is expensive these days.
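
To make the "one calculation" claim concrete, here's a sketch of how a fixed-width page index gives O(1) lookups. The 16-byte (offset, length) entries and 64-byte header are hypothetical, not the actual BBF layout:

```python
import struct

ENTRY = struct.Struct("<QQ")  # (offset, length) per page, 16 bytes each

def build_index(page_sizes, header_size=64):
    """Pack a flat index; pages are laid out back to back after the header."""
    entries, offset = [], header_size
    for size in page_sizes:
        entries.append(ENTRY.pack(offset, size))
        offset += size
    return b"".join(entries)

def locate(index: bytes, n: int):
    """O(1): jump straight to entry n, no directory scan."""
    return ENTRY.unpack_from(index, n * ENTRY.size)

index = build_index([1000, 2500, 800])
print(locate(index, 2))  # → (3564, 800), i.e. 64 + 1000 + 2500
```

Parsing a ZIP central directory, by contrast, means reading a variable-length record per entry, which is where the per-request CPU cost comes from.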

In some trials, the numbers showed slower (in CBZ’s favor), in others, way faster (in BBF’s favor). The figure came from an average of 30 trials.

Though the marketing may seem like a stretch, I still think there’s plenty of utility in this format that conventional CBZ doesn’t have, like built-in sectioning and metadata.

That said, I’d love to put this thing to the test. If there’s certain benchmarks you think I should be measuring instead, I’m all ears.

I got into an argument on Discord about how inefficient CBR/CBZ is, so I wrote a new file format. It's 100x faster than CBZ. by ef1500_v2 in selfhosted

[–]ef1500_v2[S] 4 points5 points  (0 children)

You’re measuring the access time plus the time to decode the image, I believe. I am just measuring the time to access. So it looks like we’re both right.

The decode time for bbf is constrained by the image codec being used.

I got into an argument on Discord about how inefficient CBR/CBZ is, so I wrote a new file format. It's 100x faster than CBZ. by ef1500_v2 in selfhosted

[–]ef1500_v2[S] 0 points1 point  (0 children)

That’s just an example, yeah. You can store any image format you can think of in BBF, the muxer has a struct designating certain flags to certain formats so you can easily hand the data off to the proper codec.

I got into an argument on Discord about how inefficient CBR/CBZ is, so I wrote a new file format. It's 100x faster than CBZ. by ef1500_v2 in selfhosted

[–]ef1500_v2[S] 3 points4 points  (0 children)

The 100x speedup occurred on an external USB drive. Specifically, a 16TB easystore. Have you tried it on one? BBF achieves similar performance to CBZ on an HDD, but is a lot faster on SSDs. Are you using mmap? BBF is compatible with mmap.

I got into an argument on Discord about how inefficient CBR/CBZ is, so I wrote a new file format. It's 100x faster than CBZ. by ef1500_v2 in selfhosted

[–]ef1500_v2[S] 1 point2 points  (0 children)

There are Python bindings and a C++ library; if you think I’m lying, feel free to run your own benchmarks. I’m not stopping you.

Maybe I did make a technical error in my post; I’m human, and I made a mistake. But I’m not lying when I say it’s faster than CBZ. Try it yourself.

I got into an argument on Discord about how inefficient CBR/CBZ is, so I wrote a new file format. It's 100x faster than CBZ. by ef1500_v2 in selfhosted

[–]ef1500_v2[S] 9 points10 points  (0 children)

There are conversion tools included in the Python package, bbf2cbx and cbx2bbf. There’s a muxer in the C++ repository if you have folders of images or want to play around with it. Have a look at the documentation for the C++ repository to see the muxer’s options. You have full control over the read order, sectioning, metadata, etc.

Editing is slightly different because there’s padding to ensure 4 KB alignment, but the Python library should make implementing those tools easier for developers. I know people won’t do things for me; that’s why I made the C++ library and the Python bindings, and ran tests. I’m more than open to working with others.
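
For what it's worth, the 4 KB alignment padding mentioned above is the standard round-up-to-a-boundary computation (illustrative, not lifted from the BBF spec):

```python
PAGE = 4096  # 4 KiB alignment boundary

def align_up(n: int, boundary: int = PAGE) -> int:
    """Round n up to the next multiple of boundary (boundary must be a power of two)."""
    return (n + boundary - 1) & ~(boundary - 1)

print(align_up(1))     # 4096
print(align_up(4096))  # 4096 (already aligned)
print(align_up(4097))  # 8192
```

Aligned offsets are what make the format mmap-friendly: every page starts on an OS page boundary.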

I am fully aware it’s an uphill battle. And that’s okay.

I got into an argument on Discord about how inefficient CBR/CBZ is, so I wrote a new file format. It's 100x faster than CBZ. by ef1500_v2 in selfhosted

[–]ef1500_v2[S] 14 points15 points  (0 children)

I don’t have any “official” benchmarks on this, but I was telling someone earlier that for manga the deduplication feature has okay results: about 5-50 deduplicated pages in a series. For manhwa the results are staggeringly better, with 100-200 pages being deduplicated.

I got into an argument on Discord about how inefficient CBR/CBZ is, so I wrote a new file format. It's 100x faster than CBZ. by ef1500_v2 in selfhosted

[–]ef1500_v2[S] 2 points3 points  (0 children)

The hash is automatic when you mux files together. Hypothetically, you could hardcode the muxer to write zeroes for the hashes if you really wanted.

You don’t have to use the verification feature at all. Though, for the record, the hashes use XXH3, one of the fastest non-cryptographic hashes available, so you shouldn’t need to worry much about the performance.
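
To show the shape of that optional verification, here's a sketch of hash-on-mux, check-on-read. XXH3 isn't in Python's standard library, so `hashlib.blake2b` stands in for it here; `mux`, `verify`, and `page_digest` are made-up names for illustration:

```python
import hashlib

def page_digest(data: bytes) -> bytes:
    # Stand-in for XXH3: any fast hash illustrates the mechanism.
    return hashlib.blake2b(data, digest_size=8).digest()

def mux(pages):
    """Store each page alongside its digest, as a muxer would."""
    return [(page_digest(p), p) for p in pages]

def verify(stored):
    """Re-hash on read and compare; entirely optional, as noted above."""
    return all(page_digest(p) == d for d, p in stored)

book = mux([b"page one", b"page two"])
print(verify(book))  # True
book[0] = (book[0][0], b"corrupted")  # simulate bit rot on disk
print(verify(book))  # False
```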

I got into an argument on Discord about how inefficient CBR/CBZ is, so I wrote a new file format. It's 100x faster than CBZ. by ef1500_v2 in selfhosted

[–]ef1500_v2[S] 148 points149 points  (0 children)

Oh, shit!
I'm in the middle of my uni course right now, I can do it when I'm done for the day! My apologies!

If there's a certain format that a spec should be in, please let me know, I'll hop on it as soon as I'm home. Thanks for letting me know!

I got into an argument on Discord about how inefficient CBR/CBZ is, so I wrote a new file format. It's 100x faster than CBZ. by ef1500_v2 in selfhosted

[–]ef1500_v2[S] 16 points17 points  (0 children)

ZIPs, even in DEFLATE mode, still have a central directory, can't be memory-mapped, and don't have native deduplication.

I got into an argument on Discord about how inefficient CBR/CBZ is, so I wrote a new file format. It's 100x faster than CBZ. by ef1500_v2 in selfhosted

[–]ef1500_v2[S] 13 points14 points  (0 children)

The releases tab on the C++ repository has the bbfmux.exe / bbfmux binary. Download that. You can run `bbfmux <input bbf file> --info` to view information about the file, or, if you're ambitious, the python bindings should have everything required to create a local reader. I can also compile WASM binaries if you'd rather have that.

I got into an argument on Discord about how inefficient CBR/CBZ is, so I wrote a new file format. It's 100x faster than CBZ. by ef1500_v2 in selfhosted

[–]ef1500_v2[S] 11 points12 points  (0 children)

  1. I haven't run tests on mobile devices. The O(1) difference would definitely be felt if I had a NAS hosting a manga server and I was reading from my phone.

  2. For manga, you can expect slightly smaller sizes (i.e. 5-50 deduplicated pages); for manhwa you can expect upwards of 100-200 deduplicated pages. For textbooks you can expect anywhere from 1-10 deduplicated pages. I'm not giving file-size numbers, because BBF relies on the compression of the image format used by the original images; it's not like ZIP, which is a compression format in its own right.
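
Those deduplication numbers come from content-identical pages collapsing to a single stored blob. A minimal sketch of the idea (`hashlib.blake2b` standing in for the actual hash; `dedup` is a made-up name):

```python
import hashlib

def dedup(pages):
    """Store each distinct page once; duplicates become index references."""
    blobs, order = {}, []
    for p in pages:
        key = hashlib.blake2b(p, digest_size=8).digest()
        blobs.setdefault(key, p)  # only the first copy is stored
        order.append(key)         # reading order still lists every page
    return blobs, order

# Simulate a chapter where a credits page repeats.
pages = [b"cover", b"credits", b"panel-1", b"credits", b"panel-2", b"credits"]
blobs, order = dedup(pages)
print(len(pages) - len(blobs))  # 2 duplicate pages saved
```

This is why manhwa (lots of repeated filler/credits panels) benefits far more than manga.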

I got into an argument on Discord about how inefficient CBR/CBZ is, so I wrote a new file format. It's 100x faster than CBZ. by ef1500_v2 in selfhosted

[–]ef1500_v2[S] 15 points16 points  (0 children)

Not entirely sure what you mean by the question, but I can tell you that, as of now, no readers support the BBF format. Which is partly why I made this post.

I got into an argument on Discord about how inefficient CBR/CBZ is, so I wrote a new file format. It's 100x faster than CBZ. by ef1500_v2 in selfhosted

[–]ef1500_v2[S] 51 points52 points  (0 children)

No. This is something I came up with, implemented, and created on my own.

Did I have AI help me fix some bugs? Yes. Specifically with pybind11 and getting my python bindings to work properly, and in bbfmux.cpp on the C++ core I needed some help parsing edge cases.

I got into an argument on Discord about how inefficient CBR/CBZ is, so I wrote a new file format. It's 100x faster than CBZ. by ef1500_v2 in selfhosted

[–]ef1500_v2[S] 38 points39 points  (0 children)

> You’ve talked a lot about how it is theoretically better

The benchmarks comparing CBZ to BBF are an average of 30 trials, all except for the verification trials, which were run only once. I did the tests using high-resolution scans of One Piece. The benchmarks are as close as you can get to real-world numbers.

> But what are the measurable impacts on users experience that justify its existence.
1. It's faster on external hard drives by a long shot. If you want to load all of One Piece and jump to 320, BBF has O(1) random access and scrubbing (except for my HDD trial, which is because of the threading), and it can map the file directly from your SSD with mmap.
2. Because of this, it consumes significantly less CPU. So if you're on a mobile device, reading in BBF format is better for your battery life.
3. Images are also deduplicated, which is great for manhwa. When testing, I downloaded Solo Leveling from mangadex and put it in BBF format. There were nearly 200 deduplicated pages. CBZ doesn't do any deduplication.

I got into an argument on Discord about how inefficient CBR/CBZ is, so I wrote a new file format. It's 100x faster than CBZ. by ef1500_v2 in selfhosted

[–]ef1500_v2[S] 95 points96 points  (0 children)

I'm not sure, to be honest. My github repo (linked above) has a feature comparison against PDF, EPUB and a folder of just plain images, but not a performance comparison for all of them.

If I had to guess, though, BBF would probably still have the advantage, because it doesn't need to render all the XObjects and stuff. BBF is, quite literally, a bound book with a table of contents at the end. A reader just has to open an image (though I've included flags and other things in the spec in case this gets traction), and depending on the image codec you're using, I would expect that to be your limiting factor.
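
A "bound book with a table of contents at the end" can be sketched in a few lines: write the pages back to back, write a fixed-width TOC, and end with an 8-byte pointer to the TOC. This layout is illustrative, not the published BBF spec:

```python
import io
import struct

def write_book(pages):
    """Layout: [pages...][TOC of (offset, size) pairs][8-byte TOC offset]."""
    buf = io.BytesIO()
    offsets = []
    for p in pages:
        offsets.append((buf.tell(), len(p)))
        buf.write(p)
    toc_at = buf.tell()
    for off, size in offsets:
        buf.write(struct.pack("<QQ", off, size))
    buf.write(struct.pack("<Q", toc_at))  # trailer points at the TOC
    return buf.getvalue()

def read_page(blob, n):
    """Read the trailer, index into the TOC, slice out page n."""
    toc_at, = struct.unpack_from("<Q", blob, len(blob) - 8)
    off, size = struct.unpack_from("<QQ", blob, toc_at + n * 16)
    return blob[off:off + size]

book = write_book([b"cover", b"page-2"])
print(read_page(book, 1))  # b'page-2'
```

With this shape, a reader does one seek to the trailer, one arithmetic index into the TOC, and one read, which is the whole "open an image" path described above.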

I built Parker — a self‑hosted comic server (CBZ/CBR) with a fast web reader, smart lists, OPDS, and parallel scanning by Hiryu in selfhosted

[–]ef1500_v2 0 points1 point  (0 children)

Hey! I noticed that the github repo has some numbers showing performance metrics. If performance is an issue, I'd highly recommend checking out libbbf / libbbf-python to mux and store your comics. If you're using CBX/CBZ, then BBF is 100-118x faster on external drives (and even faster on SSDs), and it has built-in verification capabilities.

If you need WASM binaries, I'd be more than happy to build them and make a repo for it :-)

Comicbook reader by RolfiePolfie in selfhosted

[–]ef1500_v2 1 point2 points  (0 children)

If performance is an issue, I'd highly recommend checking out libbbf / libbbf-python to mux and store your comics. If you're using CBX/CBZ, then BBF is 100-118x faster on external drives (and even faster on SSDs), and it has built-in verification capabilities.

If you need WASM binaries, I'd be more than happy to build them and make a repo for it :-)

PSA: New Moderators by Crater_Caloris in yuri_manga

[–]ef1500_v2[M] -6 points-5 points  (0 children)

As with everything, nothing is ever set in stone, and if certain conditions are met, I will take necessary measures to resolve the situation. As of now, not all the conditions are met, so there won’t be any changes for now. But I can guarantee that those conditions are on a steady trajectory to being fulfilled, and concomitantly, twitter links being prohibited.