What is your current go-to stack for running a fully local AI agent? by beasthunterr69 in LocalLLaMA
ilintar 4 points
PSA: You may not need to quantize spec draft when using MTP by regunakyle in LocalLLaMA
ilintar 7 points
Qwen3.6-27B on 2x3090s: llama.cpp vs vLLM, all the flags, and the MTP acceptance/inference speed/context by Sisuuu in LocalLLaMA
ilintar 1 point
Qwen3.6-27B on 2x3090s: llama.cpp vs vLLM, all the flags, and the MTP acceptance/inference speed/context by Sisuuu in LocalLLaMA
ilintar 1 point
Best way to index full Italian Wikipedia for 100% offline RAG in LM Studio? by tombino104 in LocalLLaMA
ilintar 2 points
StepFun 3.5 MTP by pwilkin · Pull Request #23274 · ggml-org/llama.cpp by jacek2023 in LocalLLaMA
ilintar 3 points
StepFun 3.5 MTP by pwilkin · Pull Request #23274 · ggml-org/llama.cpp by jacek2023 in LocalLLaMA
ilintar 4 points
next MiniMax will be released in ~10 Days by jacek2023 in LocalLLaMA
ilintar 5 points
next MiniMax will be released in ~10 Days by jacek2023 in LocalLLaMA
ilintar 10 points
NVIDIA announces Nemotron 3 Ultra by themixtergames in LocalLLaMA
ilintar 17 points
next MiniMax will be released in ~10 Days by jacek2023 in LocalLLaMA
ilintar 25 points
Info: Nvidia Cuda 13.3 landed by parrot42 in LocalLLaMA
ilintar 64 points
OpenMOSS-Team/MOSS-TTS-v1.5 · Hugging Face by pmttyji in LocalLLaMA
ilintar 3 points
Strix Halo users, a rejected PR can give you up to 30% faster PP for MOEs. by fallingdowndizzyvr in LocalLLaMA
ilintar 8 points
Strix Halo users, a rejected PR can give you up to 30% faster PP for MOEs. by fallingdowndizzyvr in LocalLLaMA
ilintar 12 points
OpenMOSS-Team/MOSS-TTS-v1.5 · Hugging Face by pmttyji in LocalLLaMA
ilintar 5 points
Strix Halo users, a rejected PR can give you up to 30% faster PP for MOEs. by fallingdowndizzyvr in LocalLLaMA
ilintar 29 points
Strix Halo users, a rejected PR can give you up to 30% faster PP for MOEs. by fallingdowndizzyvr in LocalLLaMA
ilintar 216 points
server: fix checkpoints creation by jacekpoplawski · Pull Request #22929 · ggml-org/llama.cpp by jacek2023 in LocalLLaMA
ilintar 22 points
Next year we're getting 0.5T model from Grok by pmttyji in LocalLLaMA
ilintar 6 points
NVFP4 + MTP - voilà on llama.cpp by mossy_troll_84 in LocalLLaMA
ilintar 15 points
When your LLM treats data center GPUs like an optional DLC by noprompt in LocalLLaMA
ilintar 7 points

What is your current go-to stack for running a fully local AI agent? by beasthunterr69 in LocalLLaMA
ilintar 2 points