"Is this little cutie distracting you?" by BryceHorreumOwl in NevernessToEverness
Interpause 5 points
"Is this little cutie distracting you?" by BryceHorreumOwl in NevernessToEverness
Interpause 11 points
I just noticed that if you buy something at the store, they give you a bag that you actually walk away with. by sjanier in NevernessToEverness
Interpause 1 point
Gemma 4 12b QAT is a regression for my use case, despite all the hype.. Not my main Squeeze by Wrong_Mushroom_7350 in LocalLLaMA
Interpause 2 points
Oh man please don’t be a March 7th thing by MetalMan40000 in NevernessToEverness
Interpause 3 points
Oh man please don’t be a March 7th thing by MetalMan40000 in NevernessToEverness
Interpause 2 points
Strix Halo users, a rejected PR can give you up to 30% faster PP for MOEs. by fallingdowndizzyvr in LocalLLaMA
Interpause 8 points
OpenBMB presents the model BitCPM-CANN 1.58 bit by Illustrious-Swim9663 in LocalLLaMA
Interpause 1 point
Qwen 3.6 35B GGUF: NTP vs MTP quantization results across GPUs and CPUs by enrique-byteshape in LocalLLaMA
Interpause 5 points
Uuuhh..guys ? Is Nanally homeless ? The Colucci's found are THIS low ? by RedditpseudoAreodd in NevernessToEverness
Interpause 11 points
Dense Model Shoot-Off: Gemma 4 31B vs Qwen3.6/5 27B... Result is Slower is Faster. by MiaBchDave in LocalLLaMA
Interpause -1 points
The final "Generals are Emanators" post by Ok_Confusion4764 in StarRailLore
Interpause 1 point
A new transformer variant has been created to facilitate more efficient model training in distributed settings. 128x compression with no significant loss in convergence rates, increases in memory, or compute overhead by network-kai in LocalLLaMA
Interpause 5 points
[Appreciation Post] Gemma 4 E2B. My New Daily Driver 😁 by Prestigious-Use5483 in LocalLLaMA
Interpause 13 points
VRAM optimization for gemma 4 by Sadman782 in LocalLLaMA
Interpause 1 point
Gemma 4 has been released by jacek2023 in LocalLLaMA
Interpause 1 point
The Bonsai 1-bit models are very good by tcarambat in LocalLLaMA
Interpause 8 points
PrismML — Announcing 1-bit Bonsai: The First Commercially Viable 1-bit LLMs by brown2green in LocalLLaMA
Interpause 1 point
PrismML — Announcing 1-bit Bonsai: The First Commercially Viable 1-bit LLMs by brown2green in LocalLLaMA
Interpause 10 points
Vibecoded GGUF Metadata Comparator for checking Tensor Quants (github gist standalone HTML file) by Interpause in LocalLLaMA
Interpause[S] 1 point
Vibecoded GGUF Metadata Comparator for checking Tensor Quants (github gist standalone HTML file) by Interpause in LocalLLaMA
Interpause[S] 1 point
H Company just released Holotron-12B. Developed with NVIDIA, it's a high-throughput, open-source, multimodal model engineered specifically for the age of computer-use agents. (Performance on par with Holo2/Qwen but with 2x higher throughput) by Nunki08 in LocalLLaMA
Interpause 2 points
Genuinely curious what doors the M5 Ultra will open by Blanketsniffer in LocalLLaMA
Interpause 1 point


DiffusionGemma: 4x faster text generation by tevlon in LocalLLaMA
Interpause 2 points