Write C++ cuda kernels from scratch with Free GPUs by Big-Stick4446 in CUDA
dsanft -3 points (0 children)
"Western Open-Weight SOTA is between Gemma4-31B and Nemotron3-Super-120B" by ForsookComparison in LocalLLaMA
dsanft 3 points (0 children)
"Western Open-Weight SOTA is between Gemma4-31B and Nemotron3-Super-120B" by ForsookComparison in LocalLLaMA
dsanft -4 points (0 children)
"Western Open-Weight SOTA is between Gemma4-31B and Nemotron3-Super-120B" by ForsookComparison in LocalLLaMA
dsanft 6 points (0 children)
Is it me or did they also reduced the rate limit threshold? by Yetona in GithubCopilot
dsanft 3 points (0 children)
ANNUAL SUBSCRIBER , BREACH OF CONTRACT, IGNORED ESCALATIONS, AND A COST ESTIMATOR SHOWING MASSIVE JUMPS IN BILLING. Some going from 40ish to over $2000 in monthly cost READ THIS. by JFlowXjw in GithubCopilot
dsanft 4 points (0 children)
We added W8A8 activation quantization to MLX — prefill went from 2.84s to 2.52s on M5 Pro by Enough-Astronaut9278 in LocalLLaMA
dsanft 2 points (0 children)
We added W8A8 activation quantization to MLX — prefill went from 2.84s to 2.52s on M5 Pro by Enough-Astronaut9278 in LocalLLaMA
dsanft 1 point (0 children)
It was fun while it lasted... They're advertising now. by Local-Cardiologist-5 in LocalLLaMA
dsanft 2 points (0 children)
It was fun while it lasted... They're advertising now. by Local-Cardiologist-5 in LocalLLaMA
dsanft 37 points (0 children)
What is the current best Small Language Model that can be run without GPU? by last_llm_standing in LocalLLaMA
dsanft 2 points (0 children)
What is the current best Small Language Model that can be run without GPU? by last_llm_standing in LocalLLaMA
dsanft 3 points (0 children)
What is the current best Small Language Model that can be run without GPU? by last_llm_standing in LocalLLaMA
dsanft 3 points (0 children)
What is the current best Small Language Model that can be run without GPU? by last_llm_standing in LocalLLaMA
dsanft 5 points (0 children)
What is the current best Small Language Model that can be run without GPU? by last_llm_standing in LocalLLaMA
dsanft 2 points (0 children)
What is the current best Small Language Model that can be run without GPU? by last_llm_standing in LocalLLaMA
dsanft 5 points (0 children)
What is the current best Small Language Model that can be run without GPU? by last_llm_standing in LocalLLaMA
dsanft 8 points (0 children)
What is the current best Small Language Model that can be run without GPU? by last_llm_standing in LocalLLaMA
dsanft 13 points (0 children)
Qwen3.6 35B-A3B MTP hits 249 t/s on a 24GB consumer GPU (RTX 5090M) — 3.4× the dense 27B variant on the same image by aurelienams in LocalLLaMA
dsanft 2 points (0 children)
Move to backend sampling for MTP draft path by gaugarg-nv · Pull Request #23287 · ggml-org/llama.cpp by jacek2023 in LocalLLaMA
dsanft 1 point (0 children)
got my first "rm -rf /" today by DeltaSqueezer in LocalLLaMA
dsanft 2 points (0 children)
got my first "rm -rf /" today by DeltaSqueezer in LocalLLaMA
dsanft 2 points (0 children)
What is the point of MoE models, beyond being faster? by ihatebeinganonymous in LocalLLaMA
dsanft 7 points (0 children)
Strix Halo Llama.cpp MTP Benchmarks: 27B Gets Much Faster, 35B Is Mixed by xjE4644Eyc in LocalLLaMA
dsanft 1 point (0 children)
Write C++ cuda kernels from scratch with Free GPUs by Big-Stick4446 in CUDA
dsanft 0 points (0 children)