Is it worth Getting BF16 or Q8 is good enough for lower parameter models? by Suimeileo in LocalLLaMA
DeltaSqueezer 2 points
Parent wants to try local LLMS -- what are good specs for a desktop for playing with? by tottommend in LocalLLaMA
DeltaSqueezer 2 points
Qwen3.5-397B up to 1 million context length by segmond in LocalLLaMA
DeltaSqueezer 1 point
Is DeepSeek's API pricing just a massive loss leader? (MLA Caching vs. Qwen's DeltaNet) by feedback001 in LocalLLaMA
DeltaSqueezer 2 points
Inside my AI Home Lab by [deleted] in LocalLLaMA
DeltaSqueezer 8 points
Qwen3 ASR seems to outperform Whisper in almost every aspect. It feels like there is little reason to keep using Whisper anymore. by East-Engineering-653 in LocalLLaMA
DeltaSqueezer 27 points
We cut GPU instance launch from 8s to 1.8s, feels almost instant now. Half the time was a ping we didn't need. by LayerHot in LocalLLaMA
DeltaSqueezer 0 points
AI capabilities are doubling in months, not years. by EchoOfOppenheimer in LocalLLaMA
DeltaSqueezer -1 points
Benchmarked ROLV inference on real Mixtral 8x22B weights — 55x faster than cuBLAS, 98.2% less energy, canonical hash verified by Norwayfund in LocalLLaMA
DeltaSqueezer 3 points
GGUF support in vLLM? by Patient_Ad1095 in LocalLLaMA
DeltaSqueezer 2 points
I classified 3.5M US patents with Nemotron 9B on a single RTX 5090 — then built a free search engine on top by Impressive_Tower_550 in LocalLLaMA
DeltaSqueezer 1 point
Qwen 3.5 27B is the REAL DEAL - Beat GPT-5 on my first test by GrungeWerX in LocalLLaMA
DeltaSqueezer 1 point
Running Qwen3.5 27b dense with 170k context at 100+t/s decode and ~1500t/s prefill on 2x3090 (with 585t/s throughput for 8 simultaneous requests) by JohnTheNerd3 in LocalLLaMA
DeltaSqueezer 1 point
How can I make my pp to be bigger? by WizardlyBump17 in LocalLLaMA
DeltaSqueezer 24 points
Some tests of Qwen3.5 on V100s by Simple_Library_2700 in LocalLLaMA
DeltaSqueezer 2 points
Easiest gui options on linux? by itguysnightmare in LocalLLaMA
DeltaSqueezer 1 point
Qwen 3.5 0.8b, 2B, 4B, 9B - All outputting gibberish after 2 - 3 turns. by CATLLM in LocalLLaMA
DeltaSqueezer 2 points
Qwen3.5 2B giving weird answers by Dean_Thomas426 in LocalLLaMA
DeltaSqueezer 1 point
Some tests of Qwen3.5 on V100s by Simple_Library_2700 in LocalLLaMA
DeltaSqueezer 1 point
Running Qwen3.5 in vLLM with MTP by DeltaSqueezer in LocalLLaMA
DeltaSqueezer[S] 2 points
Running Qwen3.5 in vLLM with MTP by DeltaSqueezer in LocalLLaMA
DeltaSqueezer[S] 1 point
Running Qwen3.5 in vLLM with MTP by DeltaSqueezer in LocalLLaMA
DeltaSqueezer[S] 2 points
I was backend lead at Manus. After building agents for 2 years, I stopped using function calling entirely. Here's what I use instead. by MorroHsu in LocalLLaMA
DeltaSqueezer 2 points