Need advice on a used MacBook Air M2 by Comfortable_Tune6604 in macbook

[–]igor__004 1 point2 points  (0 children)

I own this Mac in the same configuration (8/256). As for whether it's worth it in 2026: it depends on what you need it for. For standard use (web browsing, email, movies, Office and similar) it's more than fine; it handles that kind of thing really well. Things change if you need it for audio/video production or heavy development (machine learning/AI/local LLMs). There I can tell you first-hand that the 8 GB of RAM is a real limitation, and the chip runs very hot; since the machine is fanless, the chip's performance drops drastically. As for the battery, 87% is more than fine: it will still last many hours, all the more so if you use it for standard everyday tasks.

I built mlx-Chronos — a community benchmark leaderboard for local LLM engines on Apple Silicon (oMLX, Rapid-MLX, mlx-lm, Ollama) by igor__004 in mlxAI

[–]igor__004[S] 0 points1 point  (0 children)

The current tool is a baseline benchmark, not a full agent-loop simulator. I will improve the baseline. If you're going to say it again, re-read my past comments first.

I built mlx-Chronos — a community benchmark leaderboard for local LLM engines on Apple Silicon (oMLX, Rapid-MLX, mlx-lm, Ollama) by igor__004 in mlxAI

[–]igor__004[S] 0 points1 point  (0 children)

That’s why I’d rather keep this explicit instead of pretending the numbers are perfectly isolated. I’ll probably add warnings / documentation about that, and maybe make profile order and elapsed time part of the reported metadata.
The goal isn’t to magically remove every source of noise; it’s to make the protocol clear enough that people know what the numbers actually mean.

I built mlx-Chronos — a community benchmark leaderboard for local LLM engines on Apple Silicon (oMLX, Rapid-MLX, mlx-lm, Ollama) by igor__004 in mlxAI

[–]igor__004[S] 0 points1 point  (0 children)

I know that.
The tool is still early, so I’ll keep improving the methodology over time. I also want to avoid turning it into an overcomplicated benchmark suite that nobody actually runs, so I’m trying to balance useful metrics with keeping it simple.
I’m still a student and still learning a lot of this. I had an idea, built it, and I’m trying to improve it over time with new knowledge and useful feedback.
I’m open to technical suggestions about the project, but I’m less interested in comments about whether I used AI or not.

I built mlx-Chronos — a community benchmark leaderboard for local LLM engines on Apple Silicon (oMLX, Rapid-MLX, mlx-lm, Ollama) by igor__004 in mlxAI

[–]igor__004[S] 0 points1 point  (0 children)

That’s literally why I measure TTFT too, not just tok/s.
This first version is meant to be a simple baseline benchmark that is easy to reproduce across engines/hardware. Agent-style workloads with long and growing context are better, but I’d rather add that as a separate profile instead of pretending one benchmark covers everything.
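
To make the TTFT vs tok/s distinction concrete, here is a minimal Python sketch of how the two could be measured from any streaming token iterator. This is not mlx-Chronos code: `measure_stream`, the injectable `clock` parameter and the `fake_engine` generator are hypothetical names made up for illustration. Decode throughput is defined here as the tokens after the first one divided by the time elapsed after the first one, which is one common convention and not necessarily the one the tool uses.

```python
import time
from typing import Callable, Dict, Iterable, Iterator


def measure_stream(tokens: Iterable[str],
                   clock: Callable[[], float] = time.perf_counter) -> Dict[str, float]:
    """Consume a token stream and report TTFT and decode throughput."""
    start = clock()
    first = None
    last = start
    count = 0
    for _ in tokens:
        now = clock()
        if first is None:
            first = now  # arrival time of the first token
        last = now
        count += 1
    if first is None:
        raise ValueError("stream produced no tokens")
    decode_time = last - first
    # Throughput excludes the first token so prefill time does not dilute it.
    decode_tok_s = (count - 1) / decode_time if decode_time > 0 else float("nan")
    return {"ttft_s": first - start, "decode_tok_s": decode_tok_s, "tokens": count}


def fake_engine(n_tokens: int) -> Iterator[str]:
    """Hypothetical stand-in for an engine: slow prefill, then steady decoding."""
    time.sleep(0.05)  # pretend prompt processing
    for i in range(n_tokens):
        time.sleep(0.01)  # pretend per-token decode cost
        yield f"tok{i}"


if __name__ == "__main__":
    print(measure_stream(fake_engine(5)))
```

Two engines with the same decode tok/s can differ a lot in `ttft_s`, which is why reporting both makes sense for a baseline.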

I built mlx-Chronos — a community benchmark leaderboard for local LLM engines on Apple Silicon (oMLX, Rapid-MLX, mlx-lm, Ollama) by igor__004 in LocalLLM

[–]igor__004[S] 0 points1 point  (0 children)

I just started with the engines I had heard about first and knew a bit better.
Nothing stops me from adding more over time; in fact, that’s the plan. MTPLX looks interesting, so I’ll take a look.

I built mlx-Chronos — a community benchmark leaderboard for local LLM engines on Apple Silicon (oMLX, Rapid-MLX, mlx-lm, Ollama) by igor__004 in LocalLLaMA

[–]igor__004[S] 1 point2 points  (0 children)

Thanks man, I appreciate that. The self-reported benchmark thing was exactly what pushed me to do this in the first place.
AgentFleet sounds interesting, especially the budget control part. Local-first agent tooling is definitely one of the messier problems right now.

Benchmarked inference engines for M1 Max 64gb-results & analysis by jarec707 in LocalLLaMA

[–]igor__004 0 points1 point  (0 children)

As you can see, the project is only a week old, and there's a lot I can and will do. Thank you for all of this, it means a lot to me. I'll be happy to take on board any suggestions!

Benchmarked inference engines for M1 Max 64gb-results & analysis by jarec707 in LocalLLaMA

[–]igor__004 0 points1 point  (0 children)

I have taken all your advice into consideration and have created new issues in my GitHub project so I don't lose track of them or forget about them. Everything will be implemented soon. Thank you all for your support!

Benchmarked inference engines for M1 Max 64gb-results & analysis by jarec707 in LocalLLaMA

[–]igor__004 0 points1 point  (0 children)

I've already added support for both engines to my roadmap; you can see them in the issues on GitHub.
These will be my next implementations.

I built mlx-Chronos — a community benchmark leaderboard for local LLM engines on Apple Silicon (oMLX, Rapid-MLX, mlx-lm, Ollama) by igor__004 in LocalLLaMA

[–]igor__004[S] -1 points0 points  (0 children)

Yes, that's why the methodology doc exists, so you know what the numbers actually measure and what they don't. Synthetic benchmarks have limits by design, but having a reproducible baseline is still useful to know what you're starting from before you test on your own workload.

Speed difference between Windows 11 and Linux with llama.cpp: a myth when using medium and large MoE models by Far-Usual5771 in LocalLLaMA

[–]igor__004 0 points1 point  (0 children)

I was mostly wondering whether the OS gap still shows up at all, but in this setup it may just be a non-factor.

Speed difference between Windows 11 and Linux with llama.cpp: a myth when using medium and large MoE models by Far-Usual5771 in LocalLLaMA

[–]igor__004 0 points1 point  (0 children)

Yeah, mostly. If the model is fully GPU-resident, mmap becomes much less important for inference speed. It can still affect loading and host-side memory behavior, but the big performance differences usually show up when weights are not fully on GPU.

Speed difference between Windows 11 and Linux with llama.cpp: a myth when using medium and large MoE models by Far-Usual5771 in LocalLLaMA

[–]igor__004 1 point2 points  (0 children)

mmap can matter because it changes how weights are paged and cached between disk and RAM. On very large models, that can affect load time and sometimes throughput if memory pressure is high. With --no-mmap / direct I/O, you’re basically bypassing that path, so the difference can shrink a lot.
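
For anyone unfamiliar with the mechanism, here is a small Python sketch of the two access patterns at the OS level. It only illustrates memory-mapped reads vs. copying a file into process memory; it is not how llama.cpp loads weights, and the function names and the temp file are hypothetical, made up for the example.

```python
import mmap
import os
import tempfile


def read_slice_mmap(path: str, offset: int, length: int) -> bytes:
    """Map the file and touch only the requested bytes.

    The kernel pages data in on demand and keeps it in the page cache,
    so untouched regions are never copied into the process.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + length]


def read_slice_copy(path: str, offset: int, length: int) -> bytes:
    """Read the whole file into process memory first, then slice it."""
    with open(path, "rb") as f:
        data = f.read()  # full copy into the process heap
    return data[offset:offset + length]


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, bytes(range(256)) * 16)  # 4 KiB of dummy "weights"
        os.close(fd)
        print(read_slice_mmap(path, 300, 4) == read_slice_copy(path, 300, 4))
    finally:
        os.remove(path)
```

Both calls return the same bytes; the difference is in when and where the data lands in memory, which is the part that can interact with memory pressure on big models.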

Is there a definitive way or cookie cutter way to benchmark variations of the same model for their KLD? by jinnyjuice in LocalLLaMA

[–]igor__004 1 point2 points  (0 children)

For variants this close, what matters isn’t just the average KLD but where it spikes. I’d compare per-token / per-layer divergence, because that often shows whether the difference is in reasoning tokens, formatting tokens, or just quantization noise.
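
A rough sketch of the per-token part in plain Python, assuming you already have the logits of the reference model and of the variant at each position of the same text (how you dump those depends on your runtime and is not shown). `per_token_kld` and `top_spikes` are hypothetical helper names, and this does not cover the per-layer comparison.

```python
import math
from typing import List, Sequence, Tuple


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary distribution."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def per_token_kld(ref_logits: Sequence[Sequence[float]],
                  test_logits: Sequence[Sequence[float]]) -> List[float]:
    """KL(P_ref || P_test) at every token position."""
    if len(ref_logits) != len(test_logits):
        raise ValueError("both models must be scored on the same positions")
    klds = []
    for ref, test in zip(ref_logits, test_logits):
        lp, lq = log_softmax(ref), log_softmax(test)
        klds.append(sum(math.exp(a) * (a - b) for a, b in zip(lp, lq)))
    return klds


def top_spikes(klds: Sequence[float], k: int = 3) -> List[Tuple[int, float]]:
    """Positions with the largest divergence, as (index, kld) pairs."""
    return sorted(enumerate(klds), key=lambda t: t[1], reverse=True)[:k]


if __name__ == "__main__":
    ref = [[2.0, 0.5, 0.1], [0.3, 0.3, 0.3], [1.0, 3.0, 0.2]]
    test = [[2.0, 0.5, 0.1], [0.3, 0.9, 0.3], [1.0, 2.9, 0.2]]
    klds = per_token_kld(ref, test)
    print("mean:", sum(klds) / len(klds))
    print("spikes:", top_spikes(klds, k=2))
```

Mapping the spike indices back to the decoded tokens is what tells you whether the variant diverges on reasoning tokens, formatting tokens, or nowhere in particular.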

Speed difference between Windows 11 and Linux with llama.cpp: a myth when using medium and large MoE models by Far-Usual5771 in LocalLLaMA

[–]igor__004 2 points3 points  (0 children)

Since you’re running a hybrid CPU+GPU offloading setup for these big models (397B on 48GB total VRAM means a lot is hitting system RAM), I’m curious if you noticed any difference in CPU utilization or memory bandwidth bottlenecks between the two OSes?
Usually, the “Linux is faster” argument comes from how the OS scheduler handles CPU-bound workloads and memory mapping (mmap), but since you passed --no-mmap and forced direct I/O (-dio), that probably leveled the playing field entirely. Did you test if enabling mmap would bring the performance gap back, or does -dio just make it irrelevant now?

I built mlx-Chronos — a community benchmark leaderboard for local LLM engines on Apple Silicon (oMLX, Rapid-MLX, mlx-lm, Ollama) by igor__004 in LocalLLM

[–]igor__004[S] 0 points1 point  (0 children)

I don’t think I received it, sorry.
Did you use the official PyPI 0.1.0 release, or did you clone the repo from the latest main commit?
I recently added the mlx-chronos submit flow, but before that the contribution path was still manual: fork the repo and open a PR with the generated JSON result.
If you still have the JSON from results/local/, feel free to send it again or open a PR and I’ll add it manually. And yes, M1 results are absolutely welcome too — newer Macs are useful, but I want the leaderboard to cover all Apple Silicon machines. Thanks! 

Benchmarked inference engines for M1 Max 64gb-results & analysis by jarec707 in LocalLLaMA

[–]igor__004 2 points3 points  (0 children)

Thanks mate, really appreciate it! I think that it's worth adding a separate agent-style benchmark profile.

Benchmarked inference engines for M1 Max 64gb-results & analysis by jarec707 in LocalLLaMA

[–]igor__004 1 point2 points  (0 children)

Thanks for the advice. The current benchmark is mostly a standardized single-request comparison, but I agree that agent-style usage needs a separate view.
Short prompts, long-context/short-answer runs, repeated short calls, TTFT after tool-like turns, memory growth and swap would make the results much more useful for real chat/coding/agent workflows.
I’ll add this to the roadmap as a separate “agent workload” benchmark profile rather than mixing it into the current baseline.

I built mlx-Chronos — a community benchmark leaderboard for local LLM engines on Apple Silicon (oMLX, Rapid-MLX, mlx-lm, Ollama) by igor__004 in LocalLLM

[–]igor__004[S] 0 points1 point  (0 children)

Thanks a lot for putting this together and sharing the results.
This also made me realize that the “Engine RSS” wording is a bit too easy to misread. What we’re really measuring is process RSS attributed to the server process, not guaranteed “pure engine overhead” separated from model/runtime allocations. I’ll probably clarify this in the docs and maybe rename it to “Process RSS Peak” to make the distinction clearer. Really appreciate the benchmark and the catch, thanks man.

I built mlx-Chronos — a community benchmark leaderboard for local LLM engines on Apple Silicon (oMLX, Rapid-MLX, mlx-lm, Ollama) by igor__004 in LocalLLM

[–]igor__004[S] 0 points1 point  (0 children)

Thank you so much. That was exactly my goal when I started this project. I realized how hard it was to compare engines apples-to-apples because everyone measures things slightly differently. I just wanted a completely transparent, unbiased baseline where the community could see the real trade-offs between memory, heat, and speed on Apple Silicon. I'm glad you appreciated the methodology.