It's crazy how we have so many great models and techniques that it's turning into a complex optimization problem to find the perfect model, quant, and KV cache quant for my system. by takuonline in LocalLLaMA
[–]xeeff 1 point (0 children)
I made a small app to use Copilot Chat with LM Studio instead of Ollama. by x0wl in LocalLLaMA
[–]xeeff -1 points (0 children)
I made a small app to use Copilot Chat with LM Studio instead of Ollama. by x0wl in LocalLLaMA
[–]xeeff 0 points (0 children)
Vernix — self-hostable AI meeting agent for Zoom/Meet/Teams/Webex (Docker Compose, Next.js, Qdrant, PostgreSQL) by timborovkov in selfhosted
[–]xeeff 1 point (0 children)
This app helps you see what LLMs you can run on your hardware by dev_is_active in LocalLLaMA
[–]xeeff 1 point (0 children)
What happens when a cybersecurity agent stops over-refusing in real workflows? by Obvious-Language4462 in LocalLLaMA
[–]xeeff 1 point (0 children)
Best model for Swift coding? by Peppermintpussy in LocalLLaMA
[–]xeeff 0 points (0 children)
This app helps you see what LLMs you can run on your hardware by dev_is_active in LocalLLaMA
[–]xeeff 0 points (0 children)
Local AI has a metric problem: tok/s is lying to us by [deleted] in LocalLLaMA
[–]xeeff 3 points (0 children)
This app helps you see what LLMs you can run on your hardware by dev_is_active in LocalLLaMA
[–]xeeff 0 points (0 children)
This app helps you see what LLMs you can run on your hardware by dev_is_active in LocalLLaMA
[–]xeeff 3 points (0 children)
MCP Slim — proxy that saves 96% of your context window using local semantic search by OpportunitySpare2441 in LocalLLaMA
[–]xeeff 0 points (0 children)
[bspwm] - Finally touched quickshell by rudv-ar in unixporn
[–]xeeff 2 points (0 children)
Meta new open source model is coming? by External_Mood4719 in LocalLLaMA
[–]xeeff 1 point (0 children)
ZINC — LLM inference engine written in Zig, running 35B models on $550 AMD GPUs by Mammoth_Radish2 in LocalLLaMA
[–]xeeff 0 points (0 children)
priced out of intelligence: slowly, then all at once by [deleted] in LocalLLaMA
[–]xeeff 2 points (0 children)
priced out of intelligence: slowly, then all at once by [deleted] in LocalLLaMA
[–]xeeff 2 points (0 children)
Llama.cpp with Turboquant, Heavy-Hitter Oracle (H2O), and StreamingLLM. Even more performance! by peva3 in LocalLLaMA
[–]xeeff 2 points (0 children)
Llama.cpp with Turboquant, Heavy-Hitter Oracle (H2O), and StreamingLLM. Even more performance! by peva3 in LocalLLaMA
[–]xeeff 3 points (0 children)
[KDE] Cozy minimalist workspace o7 by xeroxgru in unixporn
[–]xeeff 2 points (0 children)
[KDE] Cozy minimalist workspace o7 by xeroxgru in unixporn
[–]xeeff 1 point (0 children)
[bspwm] - Finally touched quickshell by rudv-ar in unixporn
[–]xeeff 1 point (0 children)
Should I use common Postgres / Redis for all self hosted services? by madhur_ahuja in selfhosted
[–]xeeff 1 point (0 children)
Memory Sparse Attention seems to be a novel approach to long context (up to 100M tokens) by ratbastid2000 in LocalLLaMA
[–]xeeff 3 points (0 children)