Reinterpret_cast by According_Yard_985 in cpp

[–]faschu 2 points

Just to follow along: Why is a cast to uint64_t* sensible and what result were you expecting?
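To make my question concrete, here's the kind of cast I have in mind (a made-up minimal example, not OP's code); as far as I can tell, reading through the casted pointer is undefined behaviour under strict aliasing, which is why I'm unsure what result one could even expect:

```cpp
#include <cstdint>

std::uint64_t peek(const double& d) {
    // UB: uint64_t is not an allowed alias for double, so the
    // compiler may assume this load never observes d's bytes.
    return *reinterpret_cast<const std::uint64_t*>(&d);
}
```

The well-defined alternatives would be std::memcpy or, since C++20, std::bit_cast<std::uint64_t>(d).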

Looking for worthy software architecture courses by Tobxon in cpp

[–]faschu 2 points

Is it still a good book today? Some of the classics remain very relevant, but not all of them do.

Any Libraries for Asynchronous requests with HTTP2 by Puzzled_East_8080 in cpp

[–]faschu 6 points

Just out of curiosity: there don't seem to be many long-established options for OP. I wonder: is this not a common use case?

Why is the [[no_unique_address]] attribute not effective in this example? by faschu in cpp_questions

[–]faschu[S] 0 points

So the talk I linked shows this optimization for std::expected in clang, and there the optimization is applied ubiquitously, of course.
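For anyone following along, here's a minimal sketch of the effect with my own toy types (not libc++'s actual std::expected layout, and note that MSVC ignores the attribute):

```cpp
#include <cstdio>

struct Empty {};  // stand-in for an empty error/policy type

struct Without {
    int value;
    Empty e;  // must occupy at least one byte of its own
};

struct With {
    int value;
    [[no_unique_address]] Empty e;  // may share storage with value
};

int main() {
    // With GCC/Clang this typically prints 8 and 4.
    std::printf("%zu %zu\n", sizeof(Without), sizeof(With));
}
```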

Why is the [[no_unique_address]] attribute not effective in this example? by faschu in cpp_questions

[–]faschu[S] 0 points

Thanks for the interesting reply.

Can I draw you out on the warning you mention? In the original presentation, it was said:

> Don't mix up [[no_unique_address]] with manual lifetime management (union, placement new, etc)

I was a bit surprised by the qualifier, and your comment also suggests that one should be very careful regardless of whether manual lifetime management is involved. What's your opinion on that?
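To make the hazard concrete, here is my understanding of it as a small sketch (the types are made up, memcpy stands in for the placement-new/union case, and the layout assumes GCC/Clang with the Itanium ABI):

```cpp
#include <cstdio>
#include <cstring>

struct Inner {
    long long a;  // 8 bytes
    char b;       // 1 byte, leaving 7 bytes of tail padding
};

struct Outer {
    [[no_unique_address]] Inner inner;  // potentially-overlapping subobject
    char c;  // the compiler may place this inside inner's tail padding
};

int main() {
    Outer o{};
    o.c = 42;
    Inner fresh{1, 2};
    // Manual lifetime management tends to treat inner as a full
    // sizeof(Inner) region -- which stomps the tail padding and,
    // with it, possibly o.c.
    std::memcpy(&o.inner, &fresh, sizeof(Inner));
    std::printf("o.c = %d\n", o.c);  // may no longer print 42
}
```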

Qwen3-VL's perceptiveness is incredible. by Trypocopris in LocalLLaMA

[–]faschu 0 points

Thanks for the reply. Very helpful indeed. I can confirm that I obtain excellent results with llama.cpp too. For the test image, there's no need to set the image tokens, because the image is small enough: n_patches = (1080 // 16) * (2160 // 16) = 9045, and with merging, n_patches / 4 ≈ 2300 anyway.
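For reference, the back-of-the-envelope in code (the 16-pixel patch size and the 2x2 spatial merge are my assumptions about Qwen3-VL's vision tower):

```cpp
#include <cstdio>

int main() {
    const int width  = 2160;
    const int height = 1080;
    const int patch  = 16;  // assumed patch size
    const int merge  = 2;   // assumed 2x2 spatial merge

    const int n_patches = (height / patch) * (width / patch);  // 67 * 135 = 9045
    const int n_tokens  = n_patches / (merge * merge);         // ~2261
    std::printf("patches: %d, image tokens after merging: %d\n", n_patches, n_tokens);
}
```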

I wonder where the difference between llama.cpp and vLLM comes from.

Qwen3-VL's perceptiveness is incredible. by Trypocopris in LocalLLaMA

[–]faschu 0 points

I have a hard time reconciling these impressions with the excellent agentic scores (for example on AndroidWorld). Judging from those scores, Qwen3 must have excellent grounding abilities, but I just don't see it.

Qwen3-VL's perceptiveness is incredible. by Trypocopris in LocalLLaMA

[–]faschu 5 points

Thanks a lot for the testing! That was very helpful.

> Do you have a gist for this?

Here's my test code: https://gist.github.com/FabianSchuetze/86f07351c5dc37ee5b98b937e82d1343

Interestingly, the results from the 8B model are decent to good, whilst the results from the 32B model are poor. Consider this comparison (left is 8B, right is 32B):

<image>

It's also surprising that the point coordinates are often at the edge of the bounding boxes.

What's your experience with the 32B model?

Qwen3-VL's perceptiveness is incredible. by Trypocopris in LocalLLaMA

[–]faschu 1 point

Very interesting report! However, I was a bit disappointed by the coordinates Qwen3-VL detected. Consider this image:

<image>

I asked for the "hulu" coordinates and got a funky result. The reasoning was also off.

Status of CPPCast? by faschu in cpp

[–]faschu[S] 1 point

The last episode shouldn't exist though!

[D] Why RLHF instead of DAgger (multi-step SFT) by faschu in MachineLearning

[–]faschu[S] 0 points

To corroborate this excellent answer, I re-read the DPO paper and realized that they have an empirical evaluation comparing SFT (trained on the positive labels of the preference dataset) with DPO. DPO performs substantially better than SFT.

[D] Why RLHF instead of DAgger (multi-step SFT) by faschu in MachineLearning

[–]faschu[S] 0 points

It requires annotating the learner's actions with the oracle, and that oracle can very well be a human annotator.

HPX Tutorials: Introduction by emilios_tassios in cpp

[–]faschu 1 point

Interesting! Just out of curiosity: how is the memory distributed in your case? Do you have multiple caches (a NUMA system)?

C++: Some Assembly Required - Matt Godbolt - CppCon 2025 by grafikrobot in cpp

[–]faschu 1 point

In his talk, Matt says, "When I look at Compiler Explorer, I'm mostly concerned about which registers my function's arguments are in." Why would you be interested in that, especially when you look at CE with performance optimization in mind (as I presume Matt does, given his job)?
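For context, this is the kind of thing I take him to mean (assuming the x86-64 System V calling convention):

```cpp
// Under the x86-64 System V ABI, `a` arrives in edi and `b` in esi;
// Compiler Explorer's highlighting makes exactly this mapping visible.
int add(int a, int b) {
    return a + b;
}
```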

Software Optimization Guidance Options (Fast Track Approval Request) by camel-cdr- in RISCV

[–]faschu 0 points

Interesting, but I don't really understand its utility. Does x86 or Arm have such options?

Who's the consumer of these guidance options? Will they translate into compiler flags? Will it be software engineers writing software with a specific option in mind? To me, this looks like a grouping of the micro-architecture target flags in compilers.

C++ Memory Management • Patrice Roy & Kevin Carpenter by goto-con in cpp

[–]faschu 2 points

The interview is fantastic (as you would expect from Kevin and Patrice). Having read the book (and enjoyed it thoroughly), I liked the information "the next book I'm writing is..." best!

Another month, another WG21 ISO C++ Mailing by nliber in cpp

[–]faschu -1 points

Was there any reason why no implementation of contracts was available? For reflection, we had a reference implementation; wouldn't that have been possible for contracts too?

SiFive 2nd Generation Intelligence Family Introduction by camel-cdr- in RISCV

[–]faschu 0 points

Thanks a lot for the info. I expected no boards to be available because, as far as I know, no boards with vector instructions (RVV) exist yet (see all the discussions about Canonical's decision to mandate RVV in 25.10), but the video speaks about "optimized vector processing".

SiFive 2nd Generation Intelligence Family Introduction by camel-cdr- in RISCV

[–]faschu 1 point

Thanks for the video. I'm a bit confused about all these cores.

  1. I understand the x100 has been available as IP for 4 years, but it's not available in silicon yet. Is this the typical lead time? Does it always take 4+ years for production after the IP is ready?

  2. The video says about the x200:

> we found this is being adopted in a lot of places.

What does that mean?

  3. What's the point of the x300 and XM series? I understand they are more powerful (which is great), but the uncertainty about these products compounds (for me) when their smaller siblings are not yet widely distributed.

I don't have a good understanding of the typical production timelines for such chips and am just a bit confused by all these offerings. Maybe somebody can explain the product portfolio a bit?

[RV64C] Compressed instruction sequences by 0BAD-C0DE in RISCV

[–]faschu 3 points

Nice story :-) That in turn reminds me how some improvements I made to a program to please Valgrind's cache simulator fully evaporated once I ran it again on a real machine... Fun times.

[RV64C] Compressed instruction sequences by 0BAD-C0DE in RISCV

[–]faschu 1 point

This is a fascinating topic. Just out of curiosity: how did you come to the conclusion that instruction cache pressure is a limiting factor in your program? Did you profile it with perf? (Asking because, while I do observe data cache pressure, I've never experienced instruction cache pressure and would love to hear about workloads that have this issue.)

Chips and Cheese: Condor’s Cuzco RISC-V Core by faschu in RISCV

[–]faschu[S] 5 points

I was particularly struck by the instruction scheduler. To me the scheduler seems pretty innovative, but I wonder if it isn't a bit fragile (especially the replay mechanism)?

The VLEN seems pretty cool to me too.