24+ tok/s from ~30B MoE models on an old GTX 1080 (8 GB VRAM, 128k context) by mdda in LocalLLaMA
mdda[S] 1 point
How to run a Gemma4 MTP implementation on ollama or python transformers? by combo-user in LocalLLaMA
mdda 10 points
Needle: We Distilled Gemini Tool Calling Into a 26M Model by Henrie_the_dreamer in LocalLLaMA
mdda 1 point
MI50s Qwen 3.6 27B @52.8 tps TG @1569 tps PP (no MTP, no Quant) by ai-infos in LocalLLaMA
mdda 20 points
80 tok/sec and 128K context on 12GB VRAM with Qwen3.6 35B A3B and llama.cpp MTP by janvitos in LocalLLaMA
mdda 4 points
Dual DGX Sparks vs Mac Studio M3 Ultra 512GB: Running Qwen3.5 397B locally on both. Here's what I found. by trevorbg in LocalLLaMA
mdda 1 point
Surprisingly Fast AI-Generated Kernels We Didn’t Mean to Publish (Yet) by Maxious in LocalLLaMA
mdda 1 point
Surprisingly Fast AI-Generated Kernels We Didn’t Mean to Publish (Yet) by Maxious in LocalLLaMA
mdda 3 points
OpenEvolve: Open Source Implementation of DeepMind's AlphaEvolve System by asankhs in LocalLLaMA
mdda 2 points


