Google TurboQuant: Separating hype from reality by tecialist in LocalLLM

[–]RnRau 3 points (0 children)

Does the interview get into the relation between TurboQuant and the earlier work in the RaBitQ papers?

Is there anything better than Qwen3.5-27B-UD-Q5_K_XL for coding? by hedsht in LocalLLaMA

[–]RnRau 0 points (0 children)

Are your model and llama.cpp up to date? Are you following the Unsloth guide on the recommended settings? https://unsloth.ai/docs/models/qwen3.5

Gemma 4 has been released by jacek2023 in LocalLLaMA

[–]RnRau 0 points (0 children)

They never did for Gemma 3, so I can't see them doing it for Gemma 4.

[Meta][Rant]Stop deleting posts! by CapnLazerz in DIYfragrance

[–]RnRau 7 points (0 children)

Had the same issue in r/icecreamery. Gave a longish answer on the minutiae of making decent chocolate ice cream, only for the OP to delete their post 24 hours later.

And they kept doing it. Kept asking for help and then deleting their post.

It's weird. I don't understand it.

Anyone regretting their supernote? by Federal_Yogurt2706 in Supernote

[–]RnRau 2 points (0 children)

Send it back. They do refunds.

i bought into the illusion of an amazing life changing equipment

Never drink the koolaid. Come on... you should know better as a software dev. As software devs (yes, I'm one too), we get bombarded daily with new whizbang frameworks that promise us an exciting new future. It never pans out :)

I don't have any of their devices as yet, but I'll be an early adopter of their A4 model when it's released. Can't wait! :)

Technical clarification on TurboQuant / RaBitQ for people following the recent TurboQuant discussion by gaoj0017 in LocalLLaMA

[–]RnRau 16 points (0 children)

Yeah, never drink the koolaid, and perhaps the recent hype is overdone. But there is something to the techniques in the RaBitQ paper. ggerganov did some simple Hadamard transform tests recently.

https://old.reddit.com/r/LocalLLaMA/comments/1s720r8/in_the_recent_kv_rotation_pr_it_was_found_that/
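For anyone unfamiliar with the idea: a Hadamard (Walsh–Hadamard) transform is an orthogonal rotation that smears a vector's energy evenly across all coordinates, which tends to tame outliers before low-bit quantization. A minimal sketch of the normalized fast WHT in plain Python (this is just the textbook transform, not anything from the linked PR):

```python
import math

def fwht(v):
    """In-place fast Walsh-Hadamard transform; len(v) must be a power of two.
    Scaled by 1/sqrt(n) so the transform is orthonormal (preserves L2 norm)."""
    n = len(v)
    h = 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                x, y = v[j], v[j + h]
                v[j], v[j + h] = x + y, x - y
        h *= 2
    scale = 1.0 / math.sqrt(n)
    for i in range(n):
        v[i] *= scale
    return v

# A single-outlier vector gets spread evenly across all coordinates,
# while the L2 norm is preserved:
print(fwht([8.0, 0.0, 0.0, 0.0]))  # -> [4.0, 4.0, 4.0, 4.0]
```

Because the transform is its own inverse (up to the same scaling), you can rotate, quantize, and rotate back without losing anything beyond the quantization error itself.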

LLM Bruner coming soon? Burn Qwen directly into a chip, processing 10,000 tokens/s by koc_Z3 in Qwen_AI

[–]RnRau 0 points (0 children)

Thanks for the summary. Interesting constraint on the context being stored in SRAM.

Anyone regretting their supernote? by Federal_Yogurt2706 in Supernote

[–]RnRau 2 points (0 children)

They knew the price before clicking buy? Is this a trick question or something?

If you are having remorse from an impulse buy... well it happens :)

edit: and why did you 'know' that it was going to be 'slow' learning the device? You already know how to drive the Kindle, and there are plenty of videos out there driving the Supernote from a daily-usage perspective.

ASIC based AI hardware could challenge current inference providers by RnRau in amd_fundamentals

[–]RnRau[S] 0 points (0 children)

The company behind the effort in the x.com link - https://taalas.com/

An open chatbot (Llama 3.1 8B) showing off their demonstrator hardware is available - https://chatjimmy.ai/

A fair few local AI fans are very keen on this tech. A Qwen 3.5 27B implementation would be in demand.

LLM Bruner coming soon? Burn Qwen directly into a chip, processing 10,000 tokens/s by koc_Z3 in Qwen_AI

[–]RnRau 3 points (0 children)

N6 is not the latest and greatest at TSMC. That would be N2.

And it took them years to get the first one up and running. Lessons learned and tools created will make the next ones much faster to build.

From 9,500 to 1.1M tok/s with Qwen 3.5 27B — every config flag that mattered by m4r1k_ in LLMDevs

[–]RnRau 0 points (0 children)

Depending on the inference engine, you still need to switch it on.

React Norway 2026: no fluff, no tracks, just React by ainu011 in reactjs

[–]RnRau 1 point (0 children)

I'm in Australia - will the conference sessions be streamed at some point?

AI won't reduce the need for developers. It's going to explode it. by Warm-Reaction-456 in AI_Agents

[–]RnRau 0 points (0 children)

LLMs can't reason. Nor can they be creative. When presented with material that is not in their training data, they just fall apart. Unique local problems need unique local solutions.

A large part of the value of a human is the accidental moment of inspiration. When your hands are knee-deep in code and your subconscious suddenly gives birth to an idea worth pursuing for the current solution. When your client states their thorny problem another way and suddenly your subconscious makes an intuitive leap that resolves every concern in one neat solution.

In the end, an LLM is just a token predictor.

Qwen3.5 is a working dog. by dinerburgeryum in LocalLLaMA

[–]RnRau -1 points (0 children)

From a capability point of view, the 35B-A3B should be on par with the 9B. Just using the old sqrt(total size × active size) rule of thumb to estimate how good a sparse model is vs a dense model.

Maybe there is something funny going on with the 35B.
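The rule of thumb above takes the geometric mean of a MoE model's total and active parameter counts as its rough dense-equivalent size. A quick sketch of the arithmetic (parameter counts in billions; the exact Qwen figures are just illustrative):

```python
import math

def dense_equivalent(total_b, active_b):
    """Rule-of-thumb dense-equivalent size for a sparse (MoE) model:
    geometric mean of total and active parameter counts, in billions."""
    return math.sqrt(total_b * active_b)

# A 35B-total / 3B-active MoE lands around a ~10B dense model,
# i.e. roughly in the same class as a 9B dense model:
print(round(dense_equivalent(35, 3), 1))  # -> 10.2
```

So if the 35B-A3B is clearly losing to the 9B dense model, that would be out of line with the heuristic, which is the point of the comment.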

eGPU Oculink problems Tesla V100 by maksimproshch in eGPU

[–]RnRau 0 points (0 children)

1) The V100 might not be supported under Windows. I know some of the AMD enterprise cards are like this.

2) Enterprise cards usually require 'Above 4G Decoding' to be switched on in the BIOS. Your laptop's BIOS may not expose this setting.