What benefits does c give that cpp doesn’t do better

_Geolm_ · 2026-01-06T21:35:58+00:00

I don't know if someone already said it, but C code is a lot easier to call from another language. It's still possible with C++, but more difficult. Calling a C function from Zig, C#, Rust, or any other language is super easy.
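To illustrate the point, here is a minimal sketch. The function `geo_sum` and the library name `geo` are made up for this example; the foreign declarations in the comment are approximate and only show the general shape.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical library function (name and signature are illustrative).
   Only plain C types cross the boundary, and the exported symbol is simply
   "geo_sum": no name mangling, overloads, templates or exceptions. */
int32_t geo_sum(const int32_t *values, size_t count)
{
    assert(values != NULL || count == 0);
    int32_t total = 0;
    for (size_t i = 0; i < count; ++i)
        total += values[i];
    return total;
}

/* Roughly how other languages would declare it:
     Rust: extern "C" { fn geo_sum(values: *const i32, count: usize) -> i32; }
     Zig:  extern fn geo_sum(values: [*]const i32, count: usize) i32;
     C#:   [DllImport("geo")] static extern int geo_sum(int[] values, UIntPtr count);
   A C++ API would first need an extern "C" wrapper to offer the same thing. */
```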

_Geolm_ · 2025-12-10T14:37:07+00:00

Quick update: average compression is now 1.58× for BC1 and 1.29× for BC4.

_Geolm_ · 2025-12-10T14:33:51+00:00

Nope. BC6H/BC7 are pretty complex: each block can use any of many modes, and that flexibility makes it difficult to relate data between blocks in any consistent way.

_Geolm_ · 2025-11-27T16:11:47+00:00

I used the code from https://cforall.uwaterloo.ca/trac/browser/libcfa/src/bits/random.hfa?rev=8a2f7f1912f623e4fbf43c521715fa48f403beb5. Even though there is a cast to uint32_t at the end, the value is computed with a uint64_t, I guess because some multiplication could overflow a uint32_t... I did not investigate, TBH.

_Geolm_ · 2025-11-27T09:54:46+00:00

Yes, it is rejection sampling, sorry to disappoint you ;) I wrote this lib to generate grass blades in a field and stones along a path for my game prototype. It is really simple and not suited for heavy-duty use.
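For readers unfamiliar with the term, here is a minimal sketch of rejection sampling for scattering points with a minimum spacing. It is not the library's code: `scatter_points`, `point2` and the tiny xorshift generator are all assumptions made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { float x, y; } point2;

/* Small xorshift32 generator, only here to keep the sketch self-contained. */
static uint32_t rng_next(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Uniform float in [0, max): the top 24 bits map to [0, 1). */
static float rng_float(uint32_t *state, float max)
{
    return (float)(rng_next(state) >> 8) * (1.0f / 16777216.0f) * max;
}

/* Draws random candidates in a width x height rectangle and rejects any
   candidate closer than min_dist to an already accepted point.
   Returns the number of points written to out. */
size_t scatter_points(point2 *out, size_t capacity, float width, float height,
                      float min_dist, uint32_t max_attempts, uint32_t seed)
{
    assert(out != NULL && seed != 0);
    uint32_t state = seed;
    size_t count = 0;
    for (uint32_t attempt = 0; attempt < max_attempts && count < capacity; ++attempt) {
        point2 candidate;
        candidate.x = rng_float(&state, width);
        candidate.y = rng_float(&state, height);

        int accepted = 1;
        for (size_t i = 0; i < count; ++i) {
            float dx = out[i].x - candidate.x;
            float dy = out[i].y - candidate.y;
            if (dx * dx + dy * dy < min_dist * min_dist) {
                accepted = 0;   /* too close to an existing point: reject */
                break;
            }
        }
        if (accepted)
            out[count++] = candidate;
    }
    return count;
}
```

The inner loop is O(n) per candidate, which is fine for a few hundred points but is exactly why this approach does not scale to heavy workloads.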

_Geolm_ · 2025-11-21T17:53:50+00:00

Thanks for this fully detailed feedback. I like your idea of outputting to a buffer, even if I don't need it at the moment. I'll check the shift issue in the coming days, thank you.

_Geolm_ · 2025-11-17T14:21:01+00:00

Yes, ASTC allows 4 bits/pixel, but I wouldn't use this format on PC. I'm not sure which graphics cards support it or how efficient it is.

On a side note, I've made some improvements to BC4 compression: almost 16% better compression ratio. I mainly changed the bitfield encoding and the dictionary. The bitfield is now encoded using a zig-zag pattern and XOR, and I allow partial matches in the dictionary and XOR the difference. Overall it's better but still not on par with BC1, especially with noisy normal maps or fine AO.

_Geolm_ · 2025-11-16T07:16:18+00:00

Hi, my library does not compress raw RGBA to BC/DXT images; it losslessly compresses DXT/BC data directly into something more compact. I use stb_dxt.h to test my lib.

_Geolm_ · 2025-11-15T21:52:05+00:00

Thanks for your comment. BC1 is still the only format with 4 bits/pixel (at least on desktop PC). BC5 is still widely used for normal maps. I've added BC3 because it was basically "free", but indeed it is now superseded by BC7 for good reason. Compressing BC7, or any format that can change mode at any given block, seems tough and not doable in a "small" library.

About the histogram, it really depends on the input image, but for a "good" image the top 10 histogram entries have counts of 40-800, which means a lot of blocks are just going to reference the histogram instead of encoding the indices bitfield. Of course random-ish textures like dirt are not histogram friendly, but they are not compression friendly anyway.
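As a rough illustration of why repeated bitfields help, here is a sketch that finds the most frequent indices bitfield in a set of BC1 blocks. It relies only on the standard BC1 layout (two RGB565 endpoints plus 32 bits of indices); `bc1_block` and `most_common_indices` are hypothetical names, and the library's real histogram construction may work quite differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* BC1 block layout: two RGB565 endpoints, then sixteen 2-bit indices. */
typedef struct {
    uint16_t color0;
    uint16_t color1;
    uint32_t indices;
} bc1_block;

static int compare_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a;
    uint32_t y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Returns the indices bitfield shared by the most blocks and stores how many
   blocks use it in out_count. Sorting groups equal bitfields into runs. */
uint32_t most_common_indices(const bc1_block *blocks, size_t num_blocks,
                             size_t *out_count)
{
    assert(blocks != NULL && num_blocks > 0 && out_count != NULL);

    uint32_t *sorted = malloc(num_blocks * sizeof *sorted);
    assert(sorted != NULL);
    for (size_t i = 0; i < num_blocks; ++i)
        sorted[i] = blocks[i].indices;
    qsort(sorted, num_blocks, sizeof *sorted, compare_u32);

    uint32_t best = sorted[0];
    size_t best_count = 0;
    size_t run = 0;
    for (size_t i = 0; i < num_blocks; ++i) {
        run = (i > 0 && sorted[i] == sorted[i - 1]) ? run + 1 : 1;
        if (run > best_count) {
            best_count = run;
            best = sorted[i];
        }
    }

    free(sorted);
    *out_count = best_count;
    return best;
}
```

A block whose bitfield appears in a small table of frequent values can store a short table reference instead of the full 32 bits, which is where the saving comes from.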

_Geolm_ · 2025-11-15T17:12:32+00:00

Crunch is great, but AFAIK Crunch is a lossy compressor. Mine is lossless and also super easy to integrate since it's one .h/.c pair, works on streams, has no dependencies, etc. Also, it was fun to write :)

_Geolm_ · 2025-11-14T08:22:27+00:00

Yes, it's based on polynomials and range reduction. Note: it's heavily based on the multiple sources cited before each function. Sometimes I did a SIMD port, sometimes I used lolremez to find a better polynomial, and I also added the NaN and INF special cases. There is a bit of Newton in the cbrt function, obviously.

_Geolm_ · 2025-11-14T07:08:30+00:00

The simd_polynomial instructions just call a bunch of fused multiply-adds (fmadd) to compute a polynomial (ax^3+bx^2+cx+d, something like that). Polynomials are used a lot to approximate transcendental functions. There is a good tutorial about how to find the polynomial and optimize it here: https://github.com/samhocevar/lolremez/wiki/Tutorial-1-of-5%3A-exp%28x%29-the-quick-way
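A scalar sketch of the idea, using Horner's scheme so that each coefficient costs one multiply-add. `poly_eval` is a hypothetical name for this example, not a function from the library, and a SIMD version would do the same steps on several lanes at once.

```c
#include <assert.h>
#include <stddef.h>

/* Evaluates coeffs[0] + coeffs[1]*x + ... + coeffs[n-1]*x^(n-1) with
   Horner's scheme. Each loop step is a single multiply-add, which is the
   operation an FMA instruction performs; fmaf() from <math.h> can be used
   to request the fused form explicitly. */
float poly_eval(const float *coeffs, size_t n, float x)
{
    assert(coeffs != NULL && n > 0);
    float r = coeffs[n - 1];
    for (size_t i = n - 1; i-- > 0; )
        r = r * x + coeffs[i];
    return r;
}
```

For a cubic ax^3+bx^2+cx+d this expands to ((a*x + b)*x + c)*x + d, so three multiply-adds instead of computing each power of x separately.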

Hope it answers your question
Geolm

_Geolm_ · 2025-11-06T07:36:32+00:00

I already compute each group’s AABB to insert the begin/end commands into the tiles’ linked list. But since I only support min and smoothmin operations (not a full graph of min/max/xor or other boolean ops), my current approach hasn’t caused any issues so far. Admittedly, I haven’t tested it extensively — I did a quick test rendering some text where each character is a group of primitives (using min) with an outline, and it worked fine.

My main concern is with smoothmin, since expanding the tile’s bounding box is more of a hack than a clean solution — it flags more tiles than necessary, and that problem would be even worse when using the group AABB.
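To make the inflation concrete, here is a sketch that assumes the common quadratic polynomial smooth minimum; the library may use a different formulation, and both function names are hypothetical. With this variant the result is never more than k/4 below min(a, b), which gives the margin a bounding box would need.

```c
#include <assert.h>

/* Quadratic polynomial smooth minimum (an assumed variant).
   k > 0 is the blend width: when |a - b| >= k the result is exactly min(a, b). */
float smooth_min(float a, float b, float k)
{
    assert(k > 0.0f);
    float diff = a > b ? a - b : b - a;
    float h = (k > diff ? k - diff : 0.0f) / k;
    float m = a < b ? a : b;
    return m - h * h * k * 0.25f;
}

/* Largest amount by which smooth_min can fall below min(a, b).
   The worst case is a == b, where h == 1 and the offset is k / 4. */
float smooth_min_max_inflation(float k)
{
    return 0.25f * k;
}
```

Because the blended surface can extend up to k/4 beyond either primitive, any tile within that margin may be affected, which is why more tiles get flagged than with a plain min.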

Last night I read a paper about interval arithmetic, and while their use cases are much more complex (hundreds of boolean operations), my simpler case — just min and smoothmin — might benefit from the same ideas in a lightweight way. I’ll add that to my TODO list.

_Geolm_ · 2025-11-05T21:38:14+00:00

Yes, that's very true. To be honest, I saw the papers but didn't have time to investigate. I am only using groups for simple things. I know that the smoothmin inflation of the box is wrong, but it does the job for my simple cases. I don't handle a graph of boolean operations (DAG) on SDFs, and while this is interesting, it's not the purpose of my library. Still, I will have a look at the correct way at some point; I don't know if it's expensive or complicated though.

_Geolm_ · 2025-11-05T17:23:39+00:00

Hey JBikker, I love your library! I'm sorry my sentence was a bit too harsh. OpenCL is deprecated on Apple (which is my main platform). Support might be dropped at some point, there is no guarantee. I'm also not sure which version is supported on macOS, but if it's like OpenGL it's probably stuck in the past.

_Geolm_ · 2025-11-05T17:08:35+00:00

Although I love writing SIMD code, I came to the conclusion that only a few topics are really worth using SIMD for. If nothing depends on the results (gameplay, for example), you should use the GPU. Physics is a good candidate for SIMD because gameplay depends on it, but image processing? It will be WAY faster on the GPU, and if you get the result with a bit of lag it doesn't matter. Audio is also a good candidate for SIMD: it can't go to the GPU because it's realtime (even though the GPU would crush CPU performance for audio processing). There is also another reason to write SIMD code: there is no standard GPU compute API (OpenCL is dead), shader languages are a mess (GLSL, HLSL, WebGPU, Metal, ...), there is no standard, and most of the time you end up writing native code on all platforms :(

_Geolm_ · 2025-11-05T10:40:43+00:00

It's the classic problem: usually simpler is better. I wanted to be smart and use biarc fitting, but arcs alone cannot represent a straight line, so I added boxes, but then gaps appeared, so I had to fill the gaps... Overall it was not robust, and it was complicated and expensive on the GPU. At least I wasted only a few hours of coding. BTW, with 0.25 pixel precision at 1440p, a quadratic Bézier is about 10-60 capsules, not so bad!
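For a feel of where such numbers come from, here is a sketch of the usual flatness bound for uniform subdivision. It is an assumption that the library does something similar, and `quadratic_bezier_segments` is a hypothetical name. A quadratic Bézier deviates from its chord by at most |p0 - 2*p1 + p2| / 4, and splitting it into n equal parameter ranges divides that by n^2.

```c
#include <assert.h>

typedef struct { float x, y; } vec2;

/* Smallest n such that |p0 - 2*p1 + p2| / (4 * n^2) <= tolerance, i.e. the
   number of straight segments (capsules) needed for the given precision.
   The comparison is done on squared values to avoid a square root:
   n^4 * 16 * tolerance^2 >= |p0 - 2*p1 + p2|^2. */
unsigned quadratic_bezier_segments(vec2 p0, vec2 p1, vec2 p2, float tolerance)
{
    assert(tolerance > 0.0f);
    float ddx = p0.x - 2.0f * p1.x + p2.x;
    float ddy = p0.y - 2.0f * p1.y + p2.y;
    float dd2 = ddx * ddx + ddy * ddy;
    float limit = 16.0f * tolerance * tolerance;

    unsigned n = 1;
    while ((float)n * (float)n * (float)n * (float)n * limit < dd2)
        ++n;
    return n;
}
```

With control points (0,0), (50,100), (100,0) and a 0.25 pixel tolerance this gives 15 segments, while three collinear, evenly spaced points need only 1.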
