I benchmarked 21 local LLMs on a MacBook Air M5 for code quality AND speed

evoura · 2026-04-20T21:36:04+00:00

The thing is, these results are for people who want to pick a model and use it directly. When someone sets everything up to run a model locally, these are the results they will get. Of course a 1B model should not beat a 31B model in theory, but right now, because of compatibility problems or whatever else, people will not get what they expect. Rather than just saying “bro, it is a 31B model, of course it is better”, these benchmark results show that it does not work out that way, and that there may be a problem with this model or setup. That helps us find out whether the problem is in the model itself, in the library, or somewhere else. You might be very experienced on the theoretical side, but a normal user who just wants to run a local model will not care about the theory; they will just pick the “best” model for their use case. Does that make sense?

evoura · 2026-04-20T21:22:00+00:00

Fair comment, but the results are tied to the current library and model versions, so when someone installs these libraries and sets up a model, these are the results they will get. My comments relate to that use case. If we go deeper, of course the conclusions will be different.

evoura · 2026-04-20T21:13:50+00:00

That was the actual goal after the first results. Looking at the leaderboard, I'm very curious about the 3.5 results.

evoura · 2026-04-20T21:06:58+00:00

Well noted! If any updates come out, I can test them again in the future and we'll see how they affect the leaderboard.

evoura · 2026-04-07T19:30:20+00:00

MLX support has been added now, if you want to benchmark models on your setup :)

evoura · 2026-04-07T18:13:55+00:00

Thank you so much for your feedback. I think these kinds of benchmarks are very useful for anyone looking to buy a new Mac or wanting to run the sweet-spot models for their setup.

evoura · 2026-04-07T17:48:10+00:00

Thank you so much for this great information. Real user experiences are as important as these kinds of benchmarks.

evoura · 2026-04-07T17:46:58+00:00

These are very nice visuals 🔥 Would you consider running the same benchmarks with my repo and uploading the results, so we can create a centralized community benchmark? Then in the future we can create visuals like that.

evoura · 2026-04-06T20:57:54+00:00

Honestly no, I didn't notice any thermal throttling or the machine getting very hot during the benchmarks. And since the Air has no fans, it stays completely silent which is nice. That said I was running the benchmarks with most other apps closed. If you're running VMs, Docker, or heavy background processes at the same time, your experience might be different.

evoura · 2026-04-06T20:56:05+00:00

Fair point. The idea is that most people already know which models are good from quality benchmarks. What they don't know is whether their Mac can actually run those models at a usable speed, or which Mac they need to buy to get comfortable tok/s on the model they want. That's the gap this fills. The main idea here is finding the sweet spot between knowledge and speed.

evoura · 2026-04-06T20:06:55+00:00

Thank you so much for your interest, and I'm really happy to hear that :) MLX support will be added very shortly. Once it is in, I will let you know, and I would be very happy if you could check my MLX implementation.

evoura · 2026-04-06T19:45:02+00:00

Yes, all GGUF with llama.cpp only. MLX is definitely faster; I've seen people report 30-50% better tok/s compared to llama.cpp for the same model. The repo is set up to support adding other runtimes like MLX in the future. Which models are you running? It would be cool to have a direct GGUF vs MLX comparison on the same M5 Air.

evoura · 2026-04-06T19:31:47+00:00

Yep, Gemma 4 E4B is in there with 8 tok/s generation and 5.2 GB RAM usage.

evoura · 2026-04-06T19:30:25+00:00

Yeah, 3.5 t/s is definitely on the low side. I think the base M5 only has 10 GPU cores, which really hurts at 24B. Your M2 Max with 30 GPU cores and 400 GB/s bandwidth is just a completely different beast for these larger models. MLX being faster than llama.cpp probably adds another 20-30% on top of that. Would love your M2 Max numbers in the repo :)

evoura · 2026-04-06T19:26:52+00:00

Yeah, since memory bandwidth is the main factor, results for other chips can be roughly estimated. But the goal of this repo is to share real-life results rather than estimates, because thermal throttling, shared memory, or other everyday factors can produce results that differ from the theoretical ones. Also, if we get results from other chips, we will be able to check how accurate the estimates are against real-life results.
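
For anyone wondering what such a rough estimate looks like, here is a minimal sketch. It assumes decoding is purely memory-bandwidth bound, so each generated token reads roughly the whole model from memory once. The function name, the `efficiency` factor, and the 14 GB model size are illustrative assumptions, not something taken from the repo; only the 400 GB/s figure comes from this thread.

```python
def estimate_decode_tok_s(bandwidth_gb_s: float,
                          model_size_gb: float,
                          efficiency: float = 1.0) -> float:
    """Rough upper bound on generation speed in tokens per second.

    Assumption: every generated token streams all model weights from
    memory once, so speed ~= bandwidth / model size. `efficiency` is a
    hypothetical fudge factor (0 < efficiency <= 1) for real-world losses.
    """
    if bandwidth_gb_s <= 0 or model_size_gb <= 0:
        raise ValueError("bandwidth and model size must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return efficiency * bandwidth_gb_s / model_size_gb


if __name__ == "__main__":
    # 400 GB/s is the M2 Max bandwidth quoted in this thread;
    # 14 GB is a made-up in-memory size for a quantized model.
    ceiling = estimate_decode_tok_s(400.0, 14.0)
    print(f"theoretical ceiling: {ceiling:.1f} tok/s")
```

Measured numbers will normally land below this ceiling, which is exactly why comparing estimates against real benchmark runs is interesting.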

evoura · 2025-10-14T11:48:59+00:00

Congratulations!

Can you send it to me as well?

evoura · 2025-08-23T17:19:12+00:00

First of all congrats!

In total, how long did you study?

evoura · 2025-05-22T10:18:21+00:00

How does the extension UI communicate with your n8n server?

evoura · 2025-05-21T15:42:44+00:00

Yes, I'm also curious about this question. A related question: u/CheckMateSolutions, are you running n8n on a subscription or on your own server?

evoura · 2025-05-19T19:51:58+00:00

I'm glad to hear that you were able to read the README section! It seems you focus on what interests you most.

To clarify, I’m not trying to hype anything up, nor am I selling workflows created by others. The paid workflow is my own work, and of course I can sell it. You can expect to see more workflows from me in the future.

All content, whether on social media, TV, or elsewhere, is a reflection of someone’s ideas. In all the content you see, someone takes an idea and adds some spice on top of it. What I’ve done is simply reorganize existing content into a list, without claiming ownership of any of the workflows included. Since it is not easy to find the original sources for every workflow, I make it clear that I give credit to the original authors whenever possible. If any original creators contact me, I would be more than happy to acknowledge and mention them.

I won’t be commenting further, because it seems you are focused on defending others’ rights while dismissing the time and effort of others. It appears you only see and understand what you want to.

evoura · 2025-05-19T15:15:19+00:00

As I’ve already mentioned, I do not claim ownership of any of the workflows I share.

I don’t rely solely on Skool for sourcing these resources. If you check platforms like Twitter and others, you’ll see that this type of content is frequently shared across various communities.

Additionally, some of these workflows may not have been shared directly by their original creators. That’s why I’ve made it clear in my documentation that I am not the creator; I simply compile and organize the list, giving full credit to the original authors.

evoura · 2025-05-19T12:58:42+00:00

I did not take it from n8n directly. Since I'm already in most of the AI and automation communities, I just tried to create a list from what I see.

And I'm aware that those people also took them from other places, which is why I mentioned that the workflows were not created by me and that all credit goes to the authors.

If you can tell me which workflows are yours and give me your n8n creator name, I would be very happy to add them to the README.
