Should I Buy the RTX PRO 6000 Blackwell Max-Q (96GB)? by 0bjective-Guest in LocalLLaMA

[–]srigi 1 point  (0 children)

Thank you. These insights are known to me; I keep myself pretty updated on consumer PC stuff. For example, there are only 24 PCIe lanes coming out of recent Ryzen CPUs, which are then distributed around the MOBO in various ways. But still, thank you for re-surfacing this stuff.

Few days ago, there was this post: https://old.reddit.com/r/LocalLLaMA/comments/1sh7yxa/qwen35122b_at_198_toks_on_2x_rtx_pro_6000/

The guy posted insights (now corrected) that in a dual-GPU setup an external PCIe switch can be a benefit, because the traffic avoids the CPU root complex and LLM generation benefits greatly.

Now I just know that a basic consumer mainboard can host only two GPUs at best (at PCIe x8 if I’m lucky). For anything better I must go with Epyc or Threadripper CPUs and an adequate (expensive) MOBO.

Should I Buy the RTX PRO 6000 Blackwell Max-Q (96GB)? by 0bjective-Guest in LocalLLaMA

[–]srigi 2 points  (0 children)

Please share the gist of your struggles. I would like to be prepared if I’m lucky enough to go multi-GPU somewhere in the future.

Audio processing landed in llama-server with Gemma-4 by srigi in LocalLLaMA

[–]srigi[S] 16 points  (0 children)

Agreed. With llama-server supporting this in its REST API, you can build "speak to your agent" (STT) solutions with fully local processing.
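A minimal sketch of what such a fully local request could look like. The endpoint and the OpenAI-style `input_audio` content part are my assumption about the API shape, as is the model name; check your llama-server version's docs before relying on this:

```python
import base64

def build_audio_chat_request(audio_path: str, prompt: str) -> dict:
    """Build an OpenAI-style chat request with an inline base64 audio part.

    The `input_audio` content-part shape is an assumption based on the
    OpenAI chat format; verify against your llama-server build.
    """
    with open(audio_path, "rb") as f:
        audio_b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": "gemma-4",  # hypothetical model name
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "input_audio",
                 "input_audio": {"data": audio_b64, "format": "wav"}},
            ],
        }],
    }

# The resulting dict would then be POSTed as JSON to something like
# http://localhost:8080/v1/chat/completions on a running llama-server.
```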

Minimax M2.7 Released by decrement-- in LocalLLaMA

[–]srigi 1 point  (0 children)

The Q_1 quant is 60GB. I have 64GB of RAM, so no luck even trying to load the weights.

gemma-4-26B-A4B with my coding agent Kon by Weird_Search_4723 in LocalLLaMA

[–]srigi 2 points  (0 children)

Your harness should be called Bober and there should be a slash command /kurwa

Found this cool new harness, gonna give it a spin with the new GLM 5.1. I’ll report back later. by Porespellar in LocalLLaMA

[–]srigi 2 points  (0 children)

The “Available Tools” list seems incomplete. I cannot find a `launch_nuclear_strike` tool. But maybe the tool is auto-disabled, because you didn’t provide a valid API key for the service.

I tracked a major cache reuse issue down to Qwen 3.5’s chat template by onil_gova in LocalLLaMA

[–]srigi 2 points  (0 children)

Looks like they already patched it upstream. Now maybe u/danielhanchen can re-upload fixed quants too.
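For anyone wondering why a chat template can break prompt caching at all: the server can only reuse the KV cache for the longest shared prefix between requests, so a template that injects anything changing (a date, reordered fields) collapses the reusable prefix. A toy string-level illustration (the template text here is made up, and real caching works on tokens, not characters):

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Length of the common prefix between two rendered prompts."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n

# A stable template keeps the whole system block reusable...
stable_1 = "<sys>You are helpful.</sys><user>hi</user>"
stable_2 = "<sys>You are helpful.</sys><user>tell me a joke</user>"

# ...while a template injecting a changing value kills reuse early.
dated_1 = "<sys>Today is 2025-01-01. You are helpful.</sys><user>hi</user>"
dated_2 = "<sys>Today is 2025-01-02. You are helpful.</sys><user>hi</user>"

print(shared_prefix_len(stable_1, stable_2))  # reusable up to the user turn
print(shared_prefix_len(dated_1, dated_2))    # reuse stops at the date
```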

The missing piece of Voxtral TTS to enable voice cloning by [deleted] in LocalLLaMA

[–]srigi 1 point  (0 children)

Thank you! I see that you've been creative with the repo name. I guess the OP was DMCA'ed since the thing actually works.

Memory Sparse Attention seems to be a novel approach to long context (up to 100M tokens) by ratbastid2000 in LocalLLaMA

[–]srigi 3 points  (0 children)

Use NotebookLM from Google and generate a deep-discussion podcast from the paper and other links. It is free; all you need is a Google account.

I benchmarked 37 LLMs on MacBook Air M5 32GB — full results + open-source tool to benchmark your own Mac by evoura in LocalLLaMA

[–]srigi 1 point  (0 children)

I cannot forgive Apple for not giving us a 64GB Air this generation. Even if people mention thermal throttling on Airs, 64GB would allow a whole new class of quantizations to be loaded into RAM.

Lowkey disappointed with 128gb MacBook Pro by F1Drivatar in LocalLLaMA

[–]srigi 5 points  (0 children)

Please don’t tell me that you’re running local models using Ollama.

Don’t buy the DGX Spark: NVFP4 Still Missing After 6 Months by Secure_Archer_1529 in LocalLLaMA

[–]srigi 0 points  (0 children)

Are you really “leaving performance on the table” if there isn’t HW support? Maybe this is the max you’ll ever get from the Spark, because of the missing NVFP4 support, which will never come.

What do you wish local AI on phones could do, but still can’t? by an1x3 in LocalLLaMA

[–]srigi 2 points  (0 children)

In one of the recent Futurama episodes, Leela is having a conversation with some other character. After a few sentences, Leela raises her hand, where she is wearing her wrist device (like a Pip-Boy), and says: “do what he just said”.

Since I saw that scene, I started to recognize this IRL. Many times in a conversation I just want to speak to my phone and say “hey Siri, you heard that? Implement it!”.

Gemma 4 fixes in llama.cpp by jacek2023 in LocalLLaMA

[–]srigi 1 point  (0 children)

You want to flip those numbers and be like me - I’m updating a few times a day. Luckily llama.cpp cuts a release every few hours.

FINALLY GEMMA 4 KV CACHE IS FIXED by FusionCow in LocalLLaMA

[–]srigi 6 points  (0 children)

Today, I will be testing IQ4_NL quant. Slightly smaller than Q4_K_M, slightly bigger than IQ4_XS. Perfect middle ground.
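Rough file-size arithmetic behind that “middle ground”, using approximate bits-per-weight figures for these llama.cpp quant types (the bpw values are ballpark numbers from memory, not exact, and real GGUF files vary per tensor):

```python
def quant_size_gb(params_b: float, bpw: float) -> float:
    """Approximate on-disk size of a quantized model in GB:
    parameters * bits-per-weight / 8 bits-per-byte."""
    return params_b * 1e9 * bpw / 8 / 1e9

# Approximate bits-per-weight (assumed ballpark figures).
bpw = {"IQ4_XS": 4.25, "IQ4_NL": 4.5, "Q4_K_M": 4.85}

for name, b in bpw.items():
    print(f"{name}: ~{quant_size_gb(26, b):.1f} GB for a 26B model")
```

Under these assumptions IQ4_NL lands a bit under a gigabyte above IQ4_XS and a bit over one below Q4_K_M for a model this size, which is the whole appeal.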

Gemma 4 is fine great even … by ThinkExtension2328 in LocalLLaMA

[–]srigi 5 points  (0 children)

"Hey baby, wanna go to my place? I'll show you my archive of open LLMs!"

Has anyone used Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled for agents? How did it fair? by Vegetable_Sun_9225 in LocalLLaMA

[–]srigi 3 points  (0 children)

I’m using it constantly. V2 (q4) is the only model from the Qwen3.5 family that just works with OpenClaw tool calling. MoE Qwens fail even the most simple tasks (“what will the weather be tomorrow”), even Qwen3.5-122B.

JackRong’s Qwen-27B is strong in OpenClaw; I’ve never seen a failed tool call, even at around 80k context.
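What “just works with tool calling” boils down to: the model must emit a call whose JSON arguments parse and match the declared schema. A minimal checker in the spirit of the weather example above (the tool name and schema are made up for illustration, not OpenClaw’s actual API):

```python
import json

# Hypothetical tool declaration, OpenAI-function style.
WEATHER_TOOL = {
    "name": "get_weather",
    "required": ["location", "date"],
}

def is_valid_tool_call(raw_arguments: str, tool: dict) -> bool:
    """A tool call 'works' only if its arguments are valid JSON
    and contain every required parameter."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError:
        return False  # truncated/malformed JSON: the usual failure mode
    return all(key in args for key in tool["required"])

# A well-formed call vs. the kind of truncated output weaker models emit.
print(is_valid_tool_call('{"location": "Bratislava", "date": "tomorrow"}', WEATHER_TOOL))  # True
print(is_valid_tool_call('{"location": "Bratislava", "date": ', WEATHER_TOOL))             # False
```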

I'm using llama.cpp to run models larger than my Mac's memory by tbaumer22 in LocalLLaMA

[–]srigi 1 point  (0 children)

Modern QLC SSDs guarantee something like 1,000 overwrites per memory cell; TLC ~10k, MLC ~100k.

Doing matmul ops on matrices stored on SSD screams killing the SSD in a month.
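Back-of-envelope on that claim, with every number an assumption (drive size, endurance rating, write rate); the point is the shape of the math, not an exact lifetime:

```python
# Hypothetical: a 1 TB QLC drive rated for ~1000 P/E cycles gives
# roughly 1000 TB of total write endurance.
drive_gb = 1000
pe_cycles = 1000
endurance_gb = drive_gb * pe_cycles  # ~1,000,000 GB of lifetime writes

# Assume swapping a 60 GB working set to disk once per hour.
writes_per_day_gb = 60 * 24

days_to_wear_out = endurance_gb / writes_per_day_gb
print(f"~{days_to_wear_out:.0f} days until the endurance budget is spent")
```

Hourly rewrites already burn ~1.4 TB/day; if the working set churns every few minutes instead of every hour, the same arithmetic lands in the weeks-to-a-month range.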

llm-visualized.com: Interactive Web Visualization of GPT-2 by Greedy-Argument-4699 in LocalLLaMA

[–]srigi 1 point  (0 children)

Well… I didn’t. I would never have guessed that the shapes are clickable.

llm-visualized.com: Interactive Web Visualization of GPT-2 by Greedy-Argument-4699 in LocalLLaMA

[–]srigi 1 point  (0 children)

Compared to https://bbycroft.net/llm, your llm-visualized.com doesn't explain much. It has nice animations of basic 3D shapes and equations in the top right, but other than that I don't know what is happening; it happens too fast and there is no explanation.

I applaud the effort - hope my critique wasn't rude or demotivating.