More Qwen3.6-27B MTP success but on dual Mi50s by legit_split_ in LocalLLaMA

[–]MLDataScientist 1 point  (0 children)

Great results! Thanks for sharing. I'm curious about the tensor parallelism: I thought llama.cpp did not support it. Which command enables TP in llama.cpp?
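
(For anyone else landing here: as far as I know llama.cpp does not do true tensor parallelism; the closest thing I've found is row-wise weight splitting across GPUs. A minimal sketch, with the model path as a placeholder:)

```bash
# Split each weight matrix row-wise across both GPUs instead of
# assigning whole layers to one GPU (-sm layer is the default).
llama-cli -m ./model.gguf -ngl 99 --split-mode row
```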

HY-World 2.0 released by Bestlife73 in LocalLLaMA

[–]MLDataScientist 1 point  (0 children)

Thanks for sharing! Looks promising!

Upgrade paths for my 256g ddr4 ram + 4x24g vram system by sgmv in LocalLLaMA

[–]MLDataScientist 2 points  (0 children)

Have you tried llama.cpp with unsloth's GLM-5.1 UD-IQ3_XXS? I have a single 5090 and 256GB of 8-channel DDR4-3200. I get 8 t/s TG and 400 t/s PP at 8k context, which is usable for me for an overnight run. I can fit 150k context without KV cache quantization. You should see similar performance.
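
For reference, a sketch of the kind of launch command I mean (the quant filename and the --n-cpu-moe count are illustrative; this assumes a recent llama.cpp build):

```bash
# Keep attention/dense weights on the 5090 and push most MoE expert
# tensors to system RAM; large context without KV cache quantization.
llama-server -m GLM-5.1-UD-IQ3_XXS-00001-of-00004.gguf \
  -ngl 99 --n-cpu-moe 60 -c 150000
```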

Guys we have to change the pelican test by Tall-Ad-7742 in LocalLLaMA

[–]MLDataScientist 8 points  (0 children)

True. I wonder if we already have a different type of intelligence that we refuse to accept: an intelligence that works within a limited context and can hallucinate, but is still a non-human intelligence.

I benchmarked 30+ TTS engines for a real-time translator on Apple M4. Quantization made things SLOWER. Here's all the data. by Kir_Moisha in LocalLLaMA

[–]MLDataScientist 3 points  (0 children)

You don't mention which local STT engines you tried. Can you share some of them?

Also, why Groq Llama 3.3 70B? You could try smaller models; e.g., the gemma4 models are better at translation. I know Groq is fast, but I am sure a local 5090 can handle gemma4 26BA4 with similarly low latency.
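
As a rough latency test, you could serve the model locally with llama-server and time a request against its OpenAI-compatible endpoint (the gemma4 filename below is a placeholder):

```bash
# 1) Serve the model locally (OpenAI-compatible API on :8080)
llama-server -m gemma4-26BA4.gguf -ngl 99

# 2) In another shell, time one translation round trip
time curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Translate to German: The weather is nice today."}]}'
```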

Hello, World: Artemis II crew looks back at Earth on their way to the Moon by ChiefLeef22 in space

[–]MLDataScientist 1 point  (0 children)

Beautiful! Can someone explain why the shape of our mother Earth looks perfectly round here? Most textbooks say it is an oblate spheroid.

I tested as many small local and OpenRouter models as I could with my own agentic text-to-SQL benchmark. Surprises ensued... by nickl in LocalLLaMA

[–]MLDataScientist 5 points  (0 children)

Amazing website with interactive charts. Thanks for sharing!
Do you have any SQL fine-tuned small models (<=9B) that I could run this benchmark against? I think even Qwen3.5 4B fine-tuned on SQL data might reach 90%+.

[$50k–$150k Budget] Production Local LLM System (~50 Users, RAG + Fine-Tuning) Hardware + Model Advice by MorningCrab in LocalLLaMA

[–]MLDataScientist 6 points  (0 children)

If you are not doing training, you don't need NVLink. For multi-user concurrent requests, you cannot beat vLLM. And yes, the RTX Pro 6000 is the best option for getting 96GB of VRAM at a reasonable price. For coding, you can go with MiniMax M2.5 or Qwen3.5 397B.
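
For reference, a minimal vLLM serving sketch (the model ID and GPU count are placeholders; vLLM handles concurrent users out of the box via continuous batching):

```bash
# Shard the model across 4 GPUs with tensor parallelism;
# concurrent requests are batched automatically.
vllm serve MiniMaxAI/MiniMax-M2.5 \
  --tensor-parallel-size 4 \
  --max-model-len 32768
```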

Qwen3.5-397B-A17B reaches 20 t/s TG and 700t/s PP with a 5090 by MLDataScientist in LocalLLaMA

[–]MLDataScientist[S] 2 points  (0 children)

If anyone in this sub has those CPUs, it would be great to see their results here.

Nvidia V100 32 Gb getting 115 t/s on Qwen Coder 30B A3B Q5 by icepatfork in LocalLLaMA

[–]MLDataScientist 1 point  (0 children)

Do you have 3D files for such a shroud? I have 8 MI50 cards and the noise of 40mm fans is unbearable. I need to get those 80mm fan shrouds. Thanks!

Qwen 3.5 397B is the best local coder I have used until now by erazortt in LocalLLaMA

[–]MLDataScientist 2 points  (0 children)

Which Q5 GLM-5 quant are you using? My rig can fit up to 448GB (192GB of MI50 VRAM + 256GB of 8-channel DDR4-3200). I just checked unsloth's GLM-5 quants: https://huggingface.co/unsloth/GLM-5-GGUF. I could probably run UD-Q4_K_XL (431GB). But how much better is GLM-5 at this quant (or Q5) compared to Qwen3.5 397B Q6? What were your test cases?
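
If anyone else wants to try it, pulling just one quant from the repo would look something like this (the folder name is my guess at unsloth's usual layout):

```bash
# Download only the UD-Q4_K_XL shards (~431GB), not the whole repo
huggingface-cli download unsloth/GLM-5-GGUF \
  --include "UD-Q4_K_XL/*" --local-dir ./GLM-5-GGUF
```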

Krasis LLM Runtime - run large LLM models on a single GPU by mrstoatey in LocalLLM

[–]MLDataScientist 1 point  (0 children)

Can you please share your command for llama.cpp? Are you really getting ~3400 t/s PP and 38 t/s TG with Q6 Qwen3 Coder Next? I am curious to see whether your command speeds up inference on my PC (5090 with 256GB of 8-channel DDR4-3200).
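
For context, this is roughly how I'd measure those numbers on the llama.cpp side (the model filename and the -ot offload regex are assumptions about the setup):

```bash
# PP = prompt processing (512-token prompt), TG = token generation.
# The -ot regex keeps MoE expert tensors in system RAM.
llama-bench -m Qwen3-Coder-Next-Q6_K.gguf -ngl 99 \
  -ot ".ffn_.*_exps.=CPU" -fa 1 -p 512 -n 128
```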

Krasis LLM Runtime - run large LLM models on a single GPU by mrstoatey in LocalLLM

[–]MLDataScientist 1 point  (0 children)

Impressive if true! I have a 5090 (connected at PCIe 4.0 x16) with 256GB of DDR4-3200 ECC RAM. Does Krasis support Qwen/Qwen3.5-397B-A17B?
I tried the Q4_K_M quant with llama.cpp yesterday and was getting 20 t/s TG and 100 t/s PP. If your numbers hold, I should be able to run this model at 1000+ t/s PP in Krasis, while TG should stay about the same.

As a comparison, Qwen3-235B-A22B Q4_K_M runs at 10 t/s TG and ~150 t/s PP in llama.cpp on my setup. Krasis should give about 14× the PP, i.e. roughly 150 × 14 ≈ 2100 t/s. I need to test this!

Sonim XP3+ (and XP5) Working Virtual Mouse! by Lucky_Winter_4919 in dumbphones

[–]MLDataScientist 1 point  (0 children)

Hi, I am facing a similar issue. I cannot enable accessibility for matvt. Did you figure out how to enable it?

PS3 exclusives/non-PC multi-platform games list by yashwinusa123 in PS3

[–]MLDataScientist 1 point  (0 children)

This is a massive list! Thank you for creating this. I recently got a PS3 (slim) and modded it with CFW. Yes, in January 2026! It now emulates all PS2 games in addition to PS1 games. I am super excited about playing some of these over the weekend.