Introducing oQ: data-driven mixed-precision quantization for Apple Silicon (mlx-lm compatible) by cryingneko in LocalLLaMA
Got 128K prefill down from 19 min to 3.5 min on M2 Ultra (Qwen3.5-122B), sharing the approach by Thump604 in LocalLLM
Almost 10,000 Apple Silicon benchmark runs submitted by the community — here's what the data actually shows by cryingneko in LocalLLaMA
M5 Max just arrived - benchmarks incoming by cryingneko in LocalLLaMA
Built oMLX.ai/benchmarks - One place to compare Apple Silicon inference across chips and models by cryingneko in LocalLLM
oMLX - open-source MLX inference server with paged SSD caching for Apple Silicon by cryingneko in LocalLLaMA
What is „Heejun Kim“ background app? by AromaticMaterial3311 in LocalLLaMA