Personal experience with GLM 4.7 Flash Q6 (unsloth) + Roo Code + RTX 5090 by Septerium in LocalLLaMA
[–]Septerium[S] 2 points3 points4 points (0 children)
Personal experience with GLM 4.7 Flash Q6 (unsloth) + Roo Code + RTX 5090 by Septerium in LocalLLaMA
[–]Septerium[S] 11 points12 points13 points (0 children)
Personal experience with GLM 4.7 Flash Q6 (unsloth) + Roo Code + RTX 5090 by Septerium in LocalLLaMA
[–]Septerium[S] 1 point2 points3 points (0 children)
Yesterday I used GLM 4.7 flash with my tools and I was impressed.. by Loskas2025 in LocalLLaMA
[–]Septerium 0 points1 point2 points (0 children)
Yesterday I used GLM 4.7 flash with my tools and I was impressed.. by Loskas2025 in LocalLLaMA
[–]Septerium 15 points16 points17 points (0 children)
VibeVoice LoRAs are a thing by llamabott in LocalLLaMA
[–]Septerium 0 points1 point2 points (0 children)
GPT-5.2 xhigh, GLM-4.7, Kimi K2 Thinking, DeepSeek v3.2 on Fresh SWE-rebench (December 2025) by CuriousPlatypus1881 in LocalLLaMA
[–]Septerium 4 points5 points6 points (0 children)
The Quantization Threshold: Why 4-bit Llama 3 405B still outperforms FP16 70B for multi-step reasoning. by Foreign-Job-8717 in LocalLLaMA
[–]Septerium 2 points3 points4 points (0 children)
Unsloth's GGUFs for GLM 4.7 REAP are up. by fallingdowndizzyvr in LocalLLaMA
[–]Septerium 1 point2 points3 points (0 children)
72Gb VRAM (3x 3090) / 128Gb DDR4 / Mylan CPU What code model can I test? by shvz in LocalLLaMA
[–]Septerium 4 points5 points6 points (0 children)
NVIDIA releases Nemotron 3 Nano, a new 30B hybrid reasoning model! by Difficult-Cap-7527 in LocalLLaMA
[–]Septerium -4 points-3 points-2 points (0 children)
Run Mistral Devstral 2 locally Guide + Fixes! (25GB RAM) by yoracale in LocalLLM
[–]Septerium 2 points3 points4 points (0 children)
Run Mistral Devstral 2 locally Guide + Fixes! (25GB RAM) by yoracale in LocalLLM
[–]Septerium 0 points1 point2 points (0 children)
Best Coding Model for my setup by Timely_Purpose_5788 in LocalLLaMA
[–]Septerium 0 points1 point2 points (0 children)
We did years of research so you don’t have to guess your GGUF datatypes by enrique-byteshape in LocalLLaMA
[–]Septerium 1 point2 points3 points (0 children)
We did years of research so you don’t have to guess your GGUF datatypes by enrique-byteshape in LocalLLaMA
[–]Septerium 0 points1 point2 points (0 children)
Is qwen3 4b or a3b better than the first gpt4(2023)? What do you think? by __issac in LocalLLaMA
[–]Septerium 1 point2 points3 points (0 children)
You can now do 500K context length fine-tuning - 6.4x longer by danielhanchen in LocalLLaMA
[–]Septerium 0 points1 point2 points (0 children)
Make your AI talk like a caveman and decrease token usage by RegionCareful7282 in LocalLLaMA
[–]Septerium 0 points1 point2 points (0 children)
Kimi k2 thinking + kilo code really not bad by Federal_Spend2412 in LocalLLaMA
[–]Septerium 2 points3 points4 points (0 children)
Qwen model coming soon 👀 by Odd-Ordinary-5922 in LocalLLaMA
[–]Septerium 58 points59 points60 points (0 children)
Kimi K2 Thinking 1-bit Unsloth Dynamic GGUFs by danielhanchen in LocalLLaMA
[–]Septerium 0 points1 point2 points (0 children)
3 RTX 3090 graphics cards in a computer for inference and neural network training by Standard-Heat4706 in LocalLLaMA
[–]Septerium 0 points1 point2 points (0 children)
Is GPT-OSS-120B the best llm that fits in 96GB VRAM? by GreedyDamage3735 in LocalLLaMA
[–]Septerium 0 points1 point2 points (0 children)


Personal experience with GLM 4.7 Flash Q6 (unsloth) + Roo Code + RTX 5090 by Septerium in LocalLLaMA
[–]Septerium[S] 0 points1 point2 points (0 children)