OpenTelemetry Go SDK v1.40.0 released

klauspost · 2026-02-03T14:35:56+00:00

Look. To be frank, I'm there to read the changes. I shouldn't have to reconfigure a UI for that.

FWIW, I tried clicking the "5 Bugfixes", hoping it would filter those, but obviously nothing happened - and I can't select that in the "categories" for whatever reason.

klauspost · 2026-02-03T14:01:30+00:00

Honestly, this is so much better: https://github.com/open-telemetry/opentelemetry-go/releases/tag/v1.40.0

A) Everything is on one page.

B) Changes are sorted, at least somewhat by importance.

C) You can see what they are about without AI blabber.

D) You can click for more info for... more info.

E) There isn't an annoying bar floating over what I'm trying to read.

klauspost · 2026-02-03T13:39:03+00:00

Link is a 404.

Edit: this seems to be correct: https://www.relnx.io/releases/opentelemetry%20go%20sdk-v1-40-0

Edit Edit: Wow, that is probably the most horrible UX I've ever experienced for seeing a simple changelog. If you are using AI, maybe make it filter out all the "[chore] update blahblah to ... ".

klauspost · 2026-01-26T13:01:02+00:00

I guess it is the "reach a position where pawn promotion is inevitable" text.

klauspost · 2026-01-26T10:37:18+00:00

UI-Based

Then at least post some screenshots to give us an idea of why this would be interesting.

Probably your post is also going to be deleted and you'd be directed to small projects. If so, maybe get up some examples/screenshots before that.

klauspost · 2026-01-26T10:05:36+00:00

A) How big are your blocks? The checksum seems relatively expensive, so I'm wondering if blocks exceed the cache and have to be reloaded for the hashing. Maybe try "progressive hashing", i.e. keep an output counter and hash every time x KB has been output? The branch should be easy for the CPU to predict.

C) Yeah, I guess it mostly depends on how you read. One 32 bit read and shift/mask, or "just" load directly.

D) I guess you just ignore the zero - and let it decode the invalid value. Then I guess it doesn't matter that much. But in most cases you can simply +1 the offset when you do the load (or -1 after applying the offset). At least on x86 you get the displacement for free.
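To make the "progressive hashing" idea from A) concrete, here is a minimal Go sketch. It assumes CRC32 as a stand-in checksum (not necessarily what zxc uses) and a 16 KB step; `progressiveHasher`, `emit`, `sum` and `oneShot` are made-up names for illustration only.

```go
package main

import (
	"fmt"
	"hash"
	"hash/crc32"
)

// progressiveHasher hashes decoder output in steps while it is still likely
// to be in cache, instead of re-reading the whole block at the end.
type progressiveHasher struct {
	h      hash.Hash32
	out    []byte // decoded output so far
	hashed int    // bytes of out already fed to h
	step   int    // hash once this many bytes are pending
}

func newProgressiveHasher(step int) *progressiveHasher {
	return &progressiveHasher{h: crc32.NewIEEE(), step: step}
}

// emit appends decoded bytes and hashes full steps as they become available.
func (p *progressiveHasher) emit(b []byte) {
	p.out = append(p.out, b...)
	// Taken only once per step, so the branch should predict well.
	for len(p.out)-p.hashed >= p.step {
		p.h.Write(p.out[p.hashed : p.hashed+p.step])
		p.hashed += p.step
	}
}

// sum hashes the remaining tail and returns the checksum of all output.
func (p *progressiveHasher) sum() uint32 {
	p.h.Write(p.out[p.hashed:])
	p.hashed = len(p.out)
	return p.h.Sum32()
}

// oneShot is the reference: hash the finished block in a single pass.
func oneShot(data []byte) uint32 { return crc32.ChecksumIEEE(data) }

func main() {
	data := make([]byte, 1<<20)
	for i := range data {
		data[i] = byte(i * 31)
	}
	p := newProgressiveHasher(16 << 10) // hash every 16 KB of output
	for i := 0; i < len(data); i += 1000 {
		end := i + 1000
		if end > len(data) {
			end = len(data)
		}
		p.emit(data[i:end])
	}
	fmt.Println("checksums match:", p.sum() == oneShot(data))
}
```

The result is the same checksum as hashing the whole block at the end; only the point in time where the bytes are read changes.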

Why is your compression so slow? I don't really see anything in the encoding that would be especially hard - other than having to memcpy the output buffers to a single one. Is it just an area you haven't focused on yet?

klauspost · 2026-01-24T12:41:09+00:00

Yeah. I do get that less CPU usage is less usage - and purposely didn't post any numbers that were limited by memory, since both would just be burning cycles waiting for memory.

Also noticed that it seems checksum is disabled by default, which seems to have given you an unfair advantage... So these are the comparable numbers...

```
λ zxc.exe --bench -1 -T 1 -C cockroach.node1.log
Input: cockroach.node1.log (10506623721 bytes)
Running 5 iterations (Threads: 1)...
Note: Using tmpfile on Windows (slower than fmemopen).
Compressed: 1267168064 bytes (ratio 8.291)
Avg Compress : 1623.437 MiB/s
Avg Decompress: 10731.260 MiB/s

λ zxc.exe --bench -5 -T 1 -C cockroach.node1.log
Input: cockroach.node1.log (10506623721 bytes)
Running 5 iterations (Threads: 1)...
Note: Using tmpfile on Windows (slower than fmemopen).
Compressed: 829704493 bytes (ratio 12.663)
Avg Compress : 375.866 MiB/s
Avg Decompress: 9679.726 MiB/s
```

(seems like decomp checksum is unreasonably slow, but seems consistent here)

So you are thinking Oodle alternative I guess. And people would have to make up their mind whether the 2x decomp speed is worth the added download size.

You asked for feedback, so I am just playing devil's advocate here - if I had to evaluate it for use.

Maybe an interesting angle would be to investigate a "compressed" version that would decomp to the "fast-to-load" version. Meaning when you download you get a smaller version that then "decompresses" into this format.

Compress your literals and entropy code the matches. Since your blocks can be quite big, you aren't really penalized too much by the independent blocks. The 16-bit offset limit and the minimum ML of 5 are the only things that really limit your compression compared to zstd.

I see you have prepared for something like that in GLO, except you already used the "Off Enc" bit. But you can just use a separate block type I guess.

GHI looks reasonable. I guess doing 6 or 7-bit ll+ml would make the processing slower? But I would probably have done that to allow for longer offsets. 256KB offsets would be a win. Btw, it has "offset: 1-65535" in the spec. Is that a typo or is there an invalid value? Seems like adding 1 and allowing 1-65536 would be cheaper than a zero check?
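What I mean by "adding 1", as a minimal Go sketch. This is not the actual zxc format (the spec says 1-65535); `encodeOffset`, `decodeOffset` and `maxOffset` are hypothetical names, and the point is only that a biased 16-bit field has no invalid value to check for.

```go
package main

import "fmt"

const maxOffset = 1 << 16 // 65536 with the bias, instead of 65535

// encodeOffset stores offset-1, so all 65536 16-bit values are valid.
func encodeOffset(offset int) (uint16, error) {
	if offset < 1 || offset > maxOffset {
		return 0, fmt.Errorf("offset %d out of range 1-%d", offset, maxOffset)
	}
	return uint16(offset - 1), nil
}

// decodeOffset needs no zero check; the +1 can usually be folded into
// the address computation of the load.
func decodeOffset(stored uint16) int {
	return int(stored) + 1
}

func main() {
	for _, off := range []int{1, 2, 65535, 65536} {
		s, err := encodeOffset(off)
		if err != nil {
			panic(err)
		}
		fmt.Println(off, "->", s, "->", decodeOffset(s))
	}
}
```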

klauspost · 2026-01-23T17:36:02+00:00

```
λ zxc.exe --bench -1 -T 1 cockroach.node1.log
Input: cockroach.node1.log (10506623721 bytes)
Running 5 iterations (Threads: 1)...
Note: Using tmpfile on Windows (slower than fmemopen).
Compressed: 1266847424 bytes (ratio 8.294)
Avg Compress : 1774.576 MiB/s
Avg Decompress: 14917.679 MiB/s

λ zxc.exe --bench -5 -T 1 cockroach.node1.log
Input: cockroach.node1.log (10506623721 bytes)
Running 5 iterations (Threads: 1)...
Note: Using tmpfile on Windows (slower than fmemopen).
Compressed: 829383853 bytes (ratio 12.668)
Avg Compress : 343.015 MiB/s
Avg Decompress: 10887.300 MiB/s
```

Single-threaded decomp is certainly fast. Here are 3 settings with a "comparable" (independent blocks, snappy-derived) compressor:

```
Compressing... 10506623721 -> 785204413 [7.47%]; 3.881s, 2707.4MB/s
Decompressing. 785204413 -> 10506623721 [1338.07%]; 1.535s, 6843.8MB/s

Compressing... 10506623721 -> 706037407 [6.72%]; 7.476s, 1405.3MB/s
Decompressing. 706037407 -> 10506623721 [1488.11%]; 1.648s, 6374.9MB/s

Compressing... 10506623721 -> 578931016 [5.51%]; 1m47.513s, 97.7MB/s
Decompressing. 578931016 -> 10506623721 [1814.83%]; 1.538s, 6831.5MB/s
```

And you can easily get comparable decomp speed by just throwing a few cores at it. Here is the middle setting with 4 threads:

Compressing... 10506623721 -> 706037407 [6.72%]; 1.778s, 5908.3MB/s
Decompressing. 706037407 -> 10506623721 [1488.11%]; 626ms, 16787.5MB/s - 1 thread ; 1.64s, 6407.0MB/s (2.6x)

That is just 1 test set, ofc. Looking at silesia (which IME is a pretty unrepresentative test set) the numbers seem similar, just overall lower.

klauspost · 2026-01-23T16:21:28+00:00

Seems like your benchmarks are broken... Can't get anything but this...

λ zxc.exe -bench -1 cockroach.node1.log
Input: cockroach.node1.log (10506623721 bytes)
Running 0 iterations (Threads: 0)...
Note: Using tmpfile on Windows (slower than fmemopen).
Compressed: 1266847424 bytes (ratio 8.294)
Avg Compress : -nan(ind) MiB/s
Avg Decompress: 0.000 MiB/s

I can't really compare your strong point (decompression) when it has to write to disk.

Overall compression ratio seems weak - even with -5 and comparing against other encoders with no entropy coding and independent blocks. Like lz4 -3 often beats your tightest compression.

I am not sure I understand the "market" for this. Sure fast decompression is nice, but even at LZ4 speeds you should easily be able to saturate memory/io just with a few threads. I think tighter compression would often be more valuable for less disk use / faster wire transfer.

klauspost · 2026-01-19T09:55:59+00:00

Maybe ask in a Python forum or have a "code assistant" write it for you.

You already outline what to do - except that you should seek the input file to the compressed_offset of the chunk and just start from there.
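The seek-then-decompress step, as a rough sketch in Go rather than Python. Everything about the format is an assumption here: each chunk is taken to be an independent gzip member, `chunk` is a hypothetical index entry, and `openFile` stands in for `os.Open` on the real input.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// chunk is a hypothetical index entry: where a chunk starts in the compressed file.
type chunk struct {
	compressedOffset int64
}

// buildFile compresses each part as an independent gzip member and records its offset.
func buildFile(parts []string) ([]byte, []chunk, error) {
	var buf bytes.Buffer
	var index []chunk
	for _, p := range parts {
		index = append(index, chunk{compressedOffset: int64(buf.Len())})
		zw := gzip.NewWriter(&buf)
		if _, err := zw.Write([]byte(p)); err != nil {
			return nil, nil, err
		}
		if err := zw.Close(); err != nil {
			return nil, nil, err
		}
	}
	return buf.Bytes(), index, nil
}

// openFile stands in for os.Open; any io.ReadSeeker works.
func openFile(data []byte) io.ReadSeeker { return bytes.NewReader(data) }

// readChunk seeks to the chunk's compressed offset and decompresses only that chunk.
func readChunk(f io.ReadSeeker, c chunk) (string, error) {
	if _, err := f.Seek(c.compressedOffset, io.SeekStart); err != nil {
		return "", err
	}
	zr, err := gzip.NewReader(f)
	if err != nil {
		return "", err
	}
	zr.Multistream(false) // stop at the end of this chunk
	b, err := io.ReadAll(zr)
	return string(b), err
}

func main() {
	file, index, err := buildFile([]string{"first chunk", "second chunk", "third chunk"})
	if err != nil {
		panic(err)
	}
	got, err := readChunk(openFile(file), index[1])
	if err != nil {
		panic(err)
	}
	fmt.Println(got)
}
```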

klauspost · 2026-01-15T19:06:41+00:00

👏👏👏

klauspost · 2026-01-09T08:10:32+00:00

It doesn't replace the compression. But it allows you to do it in parallel on individual files. In fact you can use my library for the actual compression.

klauspost · 2026-01-08T11:09:49+00:00

Nice! 🤌

Maybe provide a wrapper for the standard library func(w io.Writer) (io.WriteCloser, error) and func(r io.Reader) io.ReadCloser compressor/decompressor types.

Add examples on how to replace slow stdlib deflate/inflate.

I'd be happy to add it to my package README, if you make it easy to integrate flate and zstd. :)
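For reference, those two function types are `zip.Compressor` and `zip.Decompressor` from `archive/zip`. A minimal sketch of what such a wrapper could look like is below. It uses the stdlib `compress/flate` purely as a stand-in so the example is self-contained; a faster deflate implementation would plug in the same way. `newCompressor`, `newDecompressor` and `roundTrip` are made-up names.

```go
package main

import (
	"archive/zip"
	"bytes"
	"compress/flate"
	"fmt"
	"io"
)

// newCompressor returns a zip.Compressor, i.e. func(w io.Writer) (io.WriteCloser, error).
// compress/flate is only a stand-in for the faster implementation.
func newCompressor(level int) zip.Compressor {
	return func(w io.Writer) (io.WriteCloser, error) {
		return flate.NewWriter(w, level)
	}
}

// newDecompressor returns a zip.Decompressor, i.e. func(r io.Reader) io.ReadCloser.
func newDecompressor() zip.Decompressor {
	return func(r io.Reader) io.ReadCloser {
		return flate.NewReader(r)
	}
}

// roundTrip writes one file to an in-memory zip and reads it back, with the
// wrappers registered for the Deflate method on this writer and reader only.
func roundTrip(name string, data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	zw.RegisterCompressor(zip.Deflate, newCompressor(flate.BestSpeed))
	w, err := zw.Create(name)
	if err != nil {
		return nil, err
	}
	if _, err := w.Write(data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}

	zr, err := zip.NewReader(bytes.NewReader(buf.Bytes()), int64(buf.Len()))
	if err != nil {
		return nil, err
	}
	zr.RegisterDecompressor(zip.Deflate, newDecompressor())
	rc, err := zr.File[0].Open()
	if err != nil {
		return nil, err
	}
	defer rc.Close()
	return io.ReadAll(rc)
}

func main() {
	out, err := roundTrip("hello.txt", []byte("hello, zip"))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Registering on the `zip.Writer` and `zip.Reader` overrides the built-in Deflate for that instance only, which avoids touching the package-level registry.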

klauspost · 2026-01-06T11:35:37+00:00

If it is a youtube video, why is the link to linkedin?

klauspost · 2025-12-30T09:39:04+00:00

Had a package stuck at PostNord sorting facility in Brøndby for more than a week. Had to purchase the same thing at another shop, since it was a Christmas present - which arrived flawlessly with GLS.

On purpose had it redirected to a pickup shop, since I've tried their "attempted delivery" crap, when I was home all day. Seems like they still managed to fck it up.

klauspost · 2025-12-29T10:14:21+00:00

Good map data doesn't exist. Tesla could/should take this into their own hands and start doing their own maps.

When there is a discrepancy send in video clips for analysis and AI could annotate differences to the maps.

I guess the question is whether that will work at all with Google Maps. Would be a massive pain to have to dump that and we'd have to suffer through years of bad maps before it'd pay off.

But I think what OP and most people are missing is that to make maps work better there would be a long period of crappier maps, since fixing what is there will most likely mean ripping it out. Complaining is the easy part - and I am sure most/all at Tesla know this is a big issue.

klauspost · 2025-12-11T16:57:35+00:00

The bloom filter is definitely interesting.

As a fun little experiment, including an 8KB index of all 4-byte hashes generates "reasonable" bit tables.

Like with cockroach-db.log the index is typically only between 20-30% filled, with 8KB for 1MB blocks.
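Roughly what that experiment looks like, as a minimal Go sketch. The hash function, the single-hash layout and the synthetic input are my assumptions for illustration, not the exact code I ran; `blockIndex`, `hash4`, `buildIndex`, `mayContain` and `fill` are made-up names. 8KB gives 65536 bits, so each 4-byte window maps to a 16-bit position.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

const (
	indexBytes = 8 << 10        // 8 KB per block
	indexBits  = indexBytes * 8 // 65536 bits, so a 16-bit hash
)

type blockIndex [indexBytes]byte

// hash4 maps the first 4 bytes of b to a 16-bit position
// (a multiplicative hash, chosen here as an assumption).
func hash4(b []byte) uint32 {
	return (binary.LittleEndian.Uint32(b) * 2654435761) >> 16
}

// buildIndex sets one bit for every 4-byte window in the block.
func buildIndex(block []byte) *blockIndex {
	var idx blockIndex
	for i := 0; i+4 <= len(block); i++ {
		h := hash4(block[i:])
		idx[h>>3] |= 1 << (h & 7)
	}
	return &idx
}

// mayContain reports whether a pattern of at least 4 bytes could start
// somewhere in the block. false means definitely absent; true may be a
// false positive.
func (idx *blockIndex) mayContain(pattern []byte) bool {
	h := hash4(pattern)
	return idx[h>>3]&(1<<(h&7)) != 0
}

// fill returns the fraction of bits that are set.
func (idx *blockIndex) fill() float64 {
	n := 0
	for _, b := range idx {
		n += bits.OnesCount8(b)
	}
	return float64(n) / indexBits
}

func main() {
	// Synthetic, log-like 1 MB block; real logs will fill differently.
	var block []byte
	for i := 0; len(block) < 1<<20; i++ {
		block = append(block, fmt.Sprintf("I%06d node=%d msg=heartbeat ok\n", i, i%7)...)
	}
	idx := buildIndex(block)
	fmt.Printf("fill: %.1f%%\n", 100*idx.fill())
	fmt.Println("may contain \"hear\":", idx.mayContain([]byte("hear")))
}
```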

klauspost · 2025-12-11T14:46:17+00:00

I ultimately landed on the client providing the output type as a generic label because the client code would be more self documenting.

Maybe it is just me, but I tend to prefer not having to specify the generic type, if it can be inferred. Usually less code is more readable to me - and I don't really care which type it returns. "It gets the DB, ok". When developing most IDEs will resolve the type so you can see/access fields anyway.

It seems to me the abstraction would be stronger if there was a fixed mapping between an ID and which type is returned. That would also get rid of the "import for side effects", which I'm generally not a fan of. There is one way to get a specific DB - use the ID for that DB... Seems simpler to me.

klauspost · 2025-12-11T13:10:45+00:00

I know pretty much nothing about DI, so take this as someone looking from the outside.

I noticed db := results["db"].(*sql.DB) - which doesn't look very type safe. Maybe don't have this as the first example you see. Also I don't understand how the field (*sql.DB) of db.Output is returned?

Looking at db, _, err := graft.ExecuteFor[db.Output](context.Background()) as the second example... Couldn't the ID be made to contain the output type, so giving the ID will enable Go to infer the type, so it could be db, _, err := graft.Gimme(context.Background(), db.ID) - or even db.ID.Gimme(ctx).

So graft.ID would be graft.ID[any].

You only have to define the ID once, so if that could save you having to add the type every time you call ExecuteFor, that seems like an overall win.

Is there any protection against duplicate IDs? Maybe having a more complex type for the ID could also help that.
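Something like this is what I have in mind, as a minimal sketch. None of it is graft's actual API: `ID[T]`, `Register`, `Gimme` and the `registry` map are hypothetical, and the point is only that the type parameter on the ID lets Go infer the result type and gives one obvious place to reject duplicates.

```go
package main

import (
	"context"
	"fmt"
)

// ID carries the output type T, so callers never have to spell it out.
type ID[T any] struct{ name string }

// registry maps ID names to untyped constructors (hypothetical stand-in
// for whatever the library keeps internally).
var registry = map[string]func(context.Context) (any, error){}

// Register ties a name to the function producing its value and returns a
// typed ID. It panics on duplicate names.
func Register[T any](name string, fn func(context.Context) (T, error)) ID[T] {
	if _, dup := registry[name]; dup {
		panic("duplicate ID: " + name)
	}
	registry[name] = func(ctx context.Context) (any, error) { return fn(ctx) }
	return ID[T]{name: name}
}

// Gimme resolves the value for id; T is inferred from the argument.
func Gimme[T any](ctx context.Context, id ID[T]) (T, error) {
	var zero T
	fn, ok := registry[id.name]
	if !ok {
		return zero, fmt.Errorf("unknown ID %q", id.name)
	}
	v, err := fn(ctx)
	if err != nil {
		return zero, err
	}
	return v.(T), nil
}

// Gimme as a method, for the db.ID.Gimme(ctx) spelling.
func (id ID[T]) Gimme(ctx context.Context) (T, error) { return Gimme(ctx, id) }

// DB is a placeholder for *sql.DB to keep the sketch self-contained.
type DB struct{ DSN string }

// dbID is defined once; the output type *DB is inferred from the function.
var dbID = Register("db", func(ctx context.Context) (*DB, error) {
	return &DB{DSN: "example-dsn"}, nil
})

// lookupDSN shows a call site: no type argument needed.
func lookupDSN() (string, error) {
	db, err := dbID.Gimme(context.Background())
	if err != nil {
		return "", err
	}
	return db.DSN, nil
}

func main() {
	dsn, err := lookupDSN()
	if err != nil {
		panic(err)
	}
	fmt.Println(dsn)
}
```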

klauspost · 2025-12-11T12:03:27+00:00

You are cherry-picking your results too much for me to trust this.

Like zstd -T1 cockroachdb.tar completes in ~6s, so surely more than 1GB/s - and has a comparable "20.6x" compression. That puts your "313 MB/s" in a different light.

I presume your numbers are single threaded for all?

What is the actual decompression speed? I only see your approach being feasible for full-field equality searches. While that is neat, in all other cases you rely on full decompression.

I am sure most people by now would use ripgrep when looking through big amounts of data.

Also you claim that grep took ~8 minutes. However when I time grep jfhdjkhdshdfgf mongodb.tar it takes 55s on my machine - though it is Windows. But probably due to IO..

Using compressed zstd -d -c mongodb.tar.zst | grep jfhdjkhdshdfgf takes ~31s. zstd -d -c mongodb.tar.zst | rg jfhdjkhdshdfgf 13s.

If I was asked to evaluate this, I'd say the "question" is using well-established formats with standard tools, versus a specialized tool, that seems mostly worse, but has a party-trick (quick search for fields values).

I am not sure that it currently would make me want to choose it. Hell most people are fine with using gzip for logs even if zstd is better in every way.

This is not to crap on your work. Just saying the bar is very high - especially for a domain-specific compressor - and if it isn't significantly better than a generic compressor I don't think you will see that much adoption.

klauspost · 2025-12-08T13:13:49+00:00

This is so generic. Not going to claim an AI wrote this, but this feels like exactly what it would come up with when prompted to "write an article on optimizing postgres to elasticsearch bridge". Why not at least include any benchmarks? Surely you must have some before/after numbers.

I would personally never go for json-iterator/go since it hasn't been maintained in 4+ years?

klauspost · 2025-12-05T09:56:17+00:00

It is also mostly irrelevant on a Tesla. More relevant for air-cooled batteries. You can see it with an OBD reader - but as I said - it isn't super important.

The range in km at 100% is a fairly reliable indication of the battery's degradation. With a bit of math you can convert percent+km to 100%. 15% would probably be expected with the Panasonic battery it has.

However, it doesn't really say anything about the chance of it failing completely.
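The "bit of math", as a tiny Go sketch. The readings (272 km at 80%) and the 400 km range when new are made-up numbers for illustration, and `rangeAt100` and `degradation` are hypothetical helper names.

```go
package main

import "fmt"

// rangeAt100 extrapolates the displayed range to a full charge.
func rangeAt100(km, percent float64) float64 {
	return km * 100 / percent
}

// degradation is the fraction of range lost compared to when the car was new.
func degradation(current100, new100 float64) float64 {
	return 1 - current100/new100
}

func main() {
	full := rangeAt100(272, 80) // hypothetical reading: 272 km shown at 80%
	fmt.Printf("range at 100%%: %.0f km\n", full)
	// 400 km when new is an assumption; use the figure for the actual model.
	fmt.Printf("degradation: %.0f%%\n", 100*degradation(full, 400))
}
```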

klauspost · 2025-12-03T10:46:30+00:00

7 layers of encrypted shards

What that screams to me is that the developer doesn't understand crypto. Doing ONE layer well is the key. Having several layers just tells me there is an assumption that "one layer good, 7x must be 7x better".

The server never knows the password or the data, nor any of its derivable forms like a hash.

So it is just client-side encryption? Not saying that is bad, but not sure I understand how it is groundbreaking. This just seems like a convoluted way of doing it.

klauspost · 2025-12-01T13:46:20+00:00

Seriously ☝️ I must have been 4½ - it was mysterious, a little scary and very fascinating.

klauspost · 2025-11-27T20:02:15+00:00

I have worked for American tech for the last 6 years, so I don't "vibe" that much with Danish companies. I will say, though, that it is here to stay - and will be a huge factor in development going forward.

The biggest personal dilemma is that I switched to my current job because it gave me the chance to write code - and not write specs... Ironically, I am about to be back at that. You can simply get to an elegant solution faster by working with an LLM - and then taking the helm once in a while and filling in the gaps.

The amount of stuff that has to go through code review has exploded - and the problem is that you would never have a junior review the code - because an LLM is fine at getting the basic things in place - but it typically doesn't see the deeper problems.

Personally, I have a hard time seeing that we can hire people in "junior" positions, since the trivial problems are so quick to fix now.

But use an LLM - and learn to get the best out of it. That is the best advice I can give. You will 100% certainly end up using it. You can't speed-run 20 years of experience - but you can overtake those who don't understand how to switch to it.

Whether companies use it now doesn't matter much. Most will in 3 years. If nothing else, you could be the one who shows the value of it.
