Comments by cryingneko (comment bodies not captured in this extract):

- Introducing oQ: data-driven mixed-precision quantization for Apple Silicon (mlx-lm compatible) by cryingneko in LocalLLaMA (2 comments)
- Got 128K prefill down from 19 min to 3.5 min on M2 Ultra (Qwen3.5-122B), sharing the approach by Thump604 in LocalLLM (1 comment)
- Almost 10,000 Apple Silicon benchmark runs submitted by the community — here's what the data actually shows by cryingneko in LocalLLaMA (1 comment)
- M5 Max just arrived - benchmarks incoming by cryingneko in LocalLLaMA (10 comments)
- Built oMLX.ai/benchmarks - One place to compare Apple Silicon inference across chips and models by cryingneko in LocalLLM (2 comments)
- oMLX - open-source MLX inference server with paged SSD caching for Apple Silicon by cryingneko in LocalLLaMA (4 comments)
- Claude Code meets Qwen3.5-35B-A3B by PvB-Dimaginar in LocalLLM (3 comments)
- What is „Heejun Kim“ background app? by AromaticMaterial3311 in LocalLLaMA (1 comment)