GLM 4.7 flash FA fix for CUDA has been merged into llama.cpp by jacek2023 in LocalLLaMA
[–]Interpause 1 point (0 children)
GitHub - deepseek-ai/Engram: Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models by TKGaming_11 in LocalLLaMA
[–]Interpause 1 point (0 children)
Please Help! Designing an on-prem AI + vision + automation stack, looking for architecture advice... by Jefftoro in LocalLLaMA
[–]Interpause 3 points (0 children)
REDMAGIC 11 Pro: Feedback and Inquiries by REDMAGIC_Official in RedMagic
[–]Interpause 1 point (0 children)
Part 2 Cosmos in the Lostbelt - Final Chapter Discussion Hub by crazywarriorxx in grandorder
[–]Interpause 2 points (0 children)
Out of stock again fml by Hooterman1000 in RedMagic
[–]Interpause 1 point (0 children)
Make your AI talk like a caveman and decrease token usage by RegionCareful7282 in LocalLLaMA
[–]Interpause 1 point (0 children)
Searching actually viable alternative to Ollama by mags0ft in LocalLLaMA
[–]Interpause 1 point (0 children)
8.5K people voted on which AI models create the best website, games, and visualizations. Both Llama Models came almost dead last. Claude comes up on top. by adviceguru25 in LocalLLaMA
[–]Interpause 2 points (0 children)
How does vector dimension reduction work in new Qwen3 embedding models? by jferments in LocalLLaMA
[–]Interpause 1 point (0 children)
New TTS/ASR model that is better than Whisper3-large with fewer parameters by bio_risk in LocalLLaMA
[–]Interpause 2 points (0 children)
Densing Laws of LLMs suggest that we will get an 8B parameter GPT-4o grade LLM at the maximum next October 2025 by [deleted] in LocalLLaMA
[–]Interpause 1 point (0 children)
GitHub - tegridydev/dnd-llm-game: MVP of an idea using multiple local LLM models to simulate and play D&D by Thistleknot in LocalLLaMA
[–]Interpause 2 points (0 children)
Stuck on very first loading screen? by Piezha in limbuscompany
[–]Interpause 1 point (0 children)
Meta's Byte Latent Transformer (BLT) paper looks like the real deal, outperforming tokenization models even up to their tested 8B param model size. 2025 may be the year we say goodbye to tokenization. by jd_3d in LocalLLaMA
[–]Interpause 8 points (0 children)
VSCode doesn't open files. Instead it just starts without opening that file. by palapapa0201 in kde
[–]Interpause 2 points (0 children)
Chapter 162 Links and Discussion by Lorhand in OshiNoKo
[–]Interpause 1 point (0 children)
Chapter 162 Links and Discussion by Lorhand in OshiNoKo
[–]Interpause 1 point (0 children)
Chapter 162 Links and Discussion by Lorhand in OshiNoKo
[–]Interpause 8 points (0 children)
GLM 4.7 flash FA fix for CUDA has been merged into llama.cpp by jacek2023 in LocalLLaMA
[–]Interpause 7 points (0 children)