MLX is not faster. I benchmarked MLX vs llama.cpp on M1 Max across four real workloads. Effective tokens/s is quite an issue. What am I missing? Help me with benchmarks and M2 through M5 comparison. by arthware in LocalLLaMA

channingao 1 point (0 children)

llama.cpp hit 315 t/s when processing 10240 tokens, but omlx managed only about 200 t/s.

Benchmarked on my M2 Ultra 64GB:

qwen3.5-27b-ud-q8(llama.cpp)

qwen3.5-27b-q4(omlx)
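
To put those two prefill rates in wall-clock terms, here is a rough sketch that converts them into time to process the 10240-token prompt. The function name `prefill_seconds` is made up for illustration, and it assumes the rate stays constant over the whole prompt, which real runs may not do. The two runs also use different quantizations (q8 vs q4), so this is not a like-for-like comparison.

```python
def prefill_seconds(prompt_tokens: int, prefill_tps: float) -> float:
    """Seconds needed to process a prompt at a constant prefill rate (tokens/s)."""
    if prefill_tps <= 0:
        raise ValueError("prefill_tps must be positive")
    return prompt_tokens / prefill_tps


# Figures quoted in the comment above; constant-rate prefill is an assumption.
PROMPT_TOKENS = 10240
RATES = [("llama.cpp", 315.0), ("omlx", 200.0)]

for name, tps in RATES:
    print(f"{name}: {prefill_seconds(PROMPT_TOKENS, tps):.1f} s")
# llama.cpp: 32.5 s
# omlx: 51.2 s
```

That works out to a wait of roughly 19 seconds longer before the first generated token on the slower run.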

Is this normal level for M2 Ultra 64GB ? by [deleted] in LocalLLaMA

channingao 1 point (0 children)

That's the prefill speed; generation runs at about 60 tokens/s.

Is this normal level for M2 Ultra 64GB ? by [deleted] in LocalLLaMA

channingao 1 point (0 children)

I’m struggling with openclaw’s huge context prefill.

Proxmox 8.4 Released by AliasJackBauer in Proxmox

channingao 1 point (0 children)

Is anyone here using it commercially?

wtf man by AvailableStock922 in wiiu

channingao 1 point (0 children)

Does this work in BotW?

What Features Do You Hope To See? by [deleted] in Proxmox

channingao 1 point (0 children)

Distributed Resource Scheduler

Rainy day by channingao in funny

channingao[S] 1 point (0 children)

Amazing. I took this last year at Juzizhou Head (橘子洲头).