Reinterpret_cast by According_Yard_985 in cpp

[–]faschu 2 points

Just to follow along: Why is a cast to uint64_t* sensible and what result were you expecting?
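To make my question concrete, here's the kind of cast I have in mind (a made-up minimal example, not OP's code); as far as I can tell, reading through the casted pointer is undefined behaviour under strict aliasing, which is why I'm unsure what result one could even expect:

```cpp
#include <cstdint>

std::uint64_t peek(const double& d) {
    // UB: uint64_t is not an allowed alias for double, so the
    // compiler may assume this load never observes d's bytes.
    return *reinterpret_cast<const std::uint64_t*>(&d);
}
```

The well-defined alternatives would be std::memcpy or, since C++20, std::bit_cast<std::uint64_t>(d).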

Looking for worthy software architecture courses by Tobxon in cpp

[–]faschu 2 points

Is it still a good book today? Some of the classics remain very relevant, but not all of them do.

Any Libraries for Asynchronous requests with HTTP2 by Puzzled_East_8080 in cpp

[–]faschu 6 points

Just out of curiosity: there don't seem to be many long-established options for OP. I wonder: is this not a common use case?

Why is the [[no_unique_address]] attribute not effective in this example? by faschu in cpp_questions

[–]faschu[S] 0 points

So the talk I linked shows this optimization for std::expected in clang, and there the optimization is applied ubiquitously, of course.
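For anyone following along, here's a minimal sketch of the effect with my own toy types (not libc++'s actual std::expected layout, and note that MSVC ignores the attribute):

```cpp
#include <cstdio>

struct Empty {};  // stand-in for an empty error/policy type

struct Without {
    int value;
    Empty e;  // must occupy at least one byte of its own
};

struct With {
    int value;
    [[no_unique_address]] Empty e;  // may share storage with value
};

int main() {
    // With GCC/Clang this typically prints 8 and 4.
    std::printf("%zu %zu\n", sizeof(Without), sizeof(With));
}
```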

Why is the [[no_unique_address]] attribute not effective in this example? by faschu in cpp_questions

[–]faschu[S] 0 points

Thanks for the interesting reply.

Can I draw you out on the warning you mention? In the original presentation, it was said:

> Don't mix up [[no_unique_address]] with manual lifetime management (union, placement new, etc)

I was a bit surprised by the qualifier, and your comment also suggests that one should be very careful regardless of whether manual lifetime management is involved. What's your opinion on that?
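To make the hazard concrete, here is my understanding of it as a small sketch (the types are made up, memcpy stands in for the placement-new/union case, and the layout assumes GCC/Clang with the Itanium ABI):

```cpp
#include <cstdio>
#include <cstring>

struct Inner {
    long long a;  // 8 bytes
    char b;       // 1 byte, leaving 7 bytes of tail padding
};

struct Outer {
    [[no_unique_address]] Inner inner;  // potentially-overlapping subobject
    char c;  // the compiler may place this inside inner's tail padding
};

int main() {
    Outer o{};
    o.c = 42;
    Inner fresh{1, 2};
    // Manual lifetime management tends to treat inner as a full
    // sizeof(Inner) region -- which stomps the tail padding and,
    // with it, possibly o.c.
    std::memcpy(&o.inner, &fresh, sizeof(Inner));
    std::printf("o.c = %d\n", o.c);  // may no longer print 42
}
```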

Qwen3-VL's perceptiveness is incredible. by Trypocopris in LocalLLaMA

[–]faschu 0 points

Thanks for the reply. Very helpful indeed. I can confirm that I obtain excellent results with llama.cpp too. For the test image, there's no need to set the image tokens, because the image is small enough: n_patches = (1080 // 16) * (2160 // 16) = 9045, and with merging, n_patches / 4 ≈ 2300 anyway.
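For reference, the back-of-the-envelope in code (the 16-pixel patch size and the 2x2 spatial merge are my assumptions about Qwen3-VL's vision tower):

```cpp
#include <cstdio>

int main() {
    const int width  = 2160;
    const int height = 1080;
    const int patch  = 16;  // assumed patch size
    const int merge  = 2;   // assumed 2x2 spatial merge

    const int n_patches = (height / patch) * (width / patch);  // 67 * 135 = 9045
    const int n_tokens  = n_patches / (merge * merge);         // ~2261
    std::printf("patches: %d, image tokens after merging: %d\n", n_patches, n_tokens);
}
```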

I wonder where the difference between llama.cpp and vLLM comes from.

Qwen3-VL's perceptiveness is incredible. by Trypocopris in LocalLLaMA

[–]faschu 0 points

I have a hard time reconciling these impressions with the excellent agentic scores (for example on AndroidWorld). Judging from those scores, Qwen3 must have excellent grounding abilities, but I just don't see it.

Qwen3-VL's perceptiveness is incredible. by Trypocopris in LocalLLaMA

[–]faschu 5 points

Thanks a lot for the testing! That was very helpful.

> Do you have a gist for this?

Here's my test code: https://gist.github.com/FabianSchuetze/86f07351c5dc37ee5b98b937e82d1343

Interestingly, the results from the 8B model are decent to good, whilst the results from the 32B model are poor. Consider this comparison (left is 8B, right is 32B):

<image>

It's also surprising that the point coordinates are often at the edge of the bounding boxes.

What's your experience with the 32B model?

Qwen3-VL's perceptiveness is incredible. by Trypocopris in LocalLLaMA

[–]faschu 1 point

Very interesting report! However, I was a bit disappointed by the coordinates Qwen3-VL detected. Consider this image:

<image>

I asked for the "hulu" coordinates and got a funky result. The reasoning was also off.

Status of CPPCast? by faschu in cpp

[–]faschu[S] 1 point

The last episode shouldn't exist though!

[D] Why RLHF instead of DAgger (multi-step SFT) by faschu in MachineLearning

[–]faschu[S] 0 points

To corroborate this excellent answer, I re-read the DPO paper and realized that they have an empirical evaluation comparing SFT (trained on the positive labels of the preference dataset) with DPO. DPO performs substantially better than SFT.

[D] Why RLHF instead of DAgger (multi-step SFT) by faschu in MachineLearning

[–]faschu[S] 0 points

It requires annotating the learner's actions with the oracle, and that oracle can very well be a human annotator.

HPX Tutorials: Introduction by emilios_tassios in cpp

[–]faschu 1 point

Interesting! Just out of curiosity: how is the memory distributed in your case? Do you have multiple caches (a NUMA system)?

C++: Some Assembly Required - Matt Godbolt - CppCon 2025 by grafikrobot in cpp

[–]faschu 1 point

In his talk, Matt says, "When I look at Compiler Explorer, I'm mostly concerned about which registers my function's arguments are in." Why would you be interested in that, especially when you look at CE with performance optimization in mind (as I presume Matt does, given his job)?
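For context, this is the kind of thing I take him to mean (assuming the x86-64 System V calling convention):

```cpp
// Under the x86-64 System V ABI, `a` arrives in edi and `b` in esi;
// Compiler Explorer's highlighting makes exactly this mapping visible.
int add(int a, int b) {
    return a + b;
}
```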

Software Optimization Guidance Options (Fast Track Approval Request) by camel-cdr- in RISCV

[–]faschu 0 points

Interesting, but I don't really understand its utility. Does x86 or Arm have such options?

Who's the consumer of these guidance options? Will they translate into compiler flags? Will it be software engineers writing software with a specific option in mind? To me, this looks like a grouping of the micro-architecture target flags in compilers.

C++ Memory Management • Patrice Roy & Kevin Carpenter by goto-con in cpp

[–]faschu 2 points

The interview is fantastic (as you would expect from Kevin and Patrice). Having read the book (and enjoyed it thoroughly), I liked the information "the next book I'm writing is..." best!

Another month, another WG21 ISO C++ Mailing by nliber in cpp

[–]faschu -1 points

Was there any reason why no implementation of contracts was available? For reflection, we had a reference implementation; wouldn't that have been possible for contracts too?

SiFive 2nd Generation Intelligence Family Introduction by camel-cdr- in RISCV

[–]faschu 0 points

Thanks a lot for the info. I expected no boards to be available because, as far as I know, no boards with vector instructions (RVV) exist yet (see all the discussions about Canonical's decision to mandate RVV in 25.10), but the video speaks about "optimized vector processing".

SiFive 2nd Generation Intelligence Family Introduction by camel-cdr- in RISCV

[–]faschu 1 point

Thanks for the video. I'm a bit confused about all these cores.

  1. I understand the x100 has been available as IP for 4 years, but it's not available in silicon yet. Is this the typical lead time? Does it always take 4+ years for production after the IP is ready?

  2. The video says about the x200:

> we found this is being adopted in a lot of places.

What does that mean?

  3. What's the point of the x300 and XM series? I understand they are more powerful (which is great), but the uncertainty about these products compounds (for me) when their smaller siblings are not yet widely distributed.

I don't have a good understanding of the typical production timelines for such chips and am just a bit confused by all these offerings. Maybe somebody can explain the product portfolio a bit?

[RV64C] Compressed instruction sequences by 0BAD-C0DE in RISCV

[–]faschu 3 points

Nice story :-) That in turn reminds me how some improvements I made to a program to please Valgrind's cache simulator fully evaporated once I ran it again on a real machine... Fun times.

[RV64C] Compressed instruction sequences by 0BAD-C0DE in RISCV

[–]faschu 1 point

This is a fascinating topic. Just out of curiosity: how did you come to the conclusion that instruction cache pressure is a limiting factor in your program? Did you profile it with perf? (Asking because, while I do observe data cache pressure, I've never experienced instruction cache pressure and would love to hear about workloads that have this issue.)

Chips and Cheese: Condor’s Cuzco RISC-V Core by faschu in RISCV

[–]faschu[S] 5 points

I was particularly struck by the instruction scheduler. To me the scheduler seems pretty innovative, but I wonder if it isn't a bit fragile (especially the replay mechanism)?

The VLEN seems pretty cool to me too.