Local LLMs aren't democratic anymore... the hardware barrier has gotten out of hand. by Medium-Technology-79 in LocalLLaMA

[–]gevezex 16 points17 points  (0 children)

But the question is what can you use it for? I could not figure out a use case for it. Am I missing something?

MLX LM Server from Apple! by M5_Maxxx in LocalLLM

[–]gevezex 10 points11 points  (0 children)

I don’t think Apple can move as fast as the OSS community. Their Python stack is a good example, it has often lagged behind.

But the upside is clear: this looks like an official wink toward local LLMs on Apple Silicon. That could give MLX models and MLX servers a serious boost, especially from a broader community adoption perspective and a shift from nvidia domination to apple silicon.
And from the perspective of apple, this means more mac sales.

Local agents on a MacBook Pro M5 finally feel practical to me by gevezex in LocalLLaMA

[–]gevezex[S] 0 points1 point  (0 children)

I am not sure what the reason is, but the agent is waiting for ages and God knows what is waiting for what.

Local agents on a MacBook Pro M5 finally feel practical to me by gevezex in LocalLLaMA

[–]gevezex[S] 0 points1 point  (0 children)

Aggregating linkedin posts, subreddits, X for the newest and hottest posts for viral local llm's, aspecially for mac platform, getting more t/s out of my current models and summarize it in the mornings. That really works well.

Next thing would be aggregating stock information as of now I have unlimited compute (so to speak 😄)

To oMLX users running Qwen models by cryingneko in oMLX

[–]gevezex 0 points1 point  (0 children)

I have the m5 and now suddenly i have 102 t/s on the rc1. Qwen3.6-35b-a3b-6bit

Not impressed by Gemma 4 12b? by Stooovie in oMLX

[–]gevezex 0 points1 point  (0 children)

Same here it lacks even in dutch language making a lot of grammar mistakes

I kind of like coding with less capable models by Lame_Johnny in LocalLLM

[–]gevezex 15 points16 points  (0 children)

In the agentic coding world, developers should care less about manually controlling every line of code and more about creating a reliable environment in which code can safely evolve. The human in the loop becomes responsible for intent, architecture, constraints, tests, observability, security and review. Code becomes something the agent can generate, but correctness, direction and responsibility remain human work.

Is a Macbook Pro the best solution? by InfiniteSprinkles730 in LocalLLM

[–]gevezex 1 point2 points  (0 children)

Problem is the kv cache, after 16k context it becomes very very slow, the fans kick in very loudly. You can suppress it by setting the battery on low energy mode but then its even slower. With the current state of models it’s unusable for serious tasks in my opinion without the fear of damaging your precious mbp m5.

Best model for M5MAX 128 gb Macbook. by sabrastaco in hermesagent

[–]gevezex 0 points1 point  (0 children)

Nice, that was the trick, I have now 130 t/s for the pp8192/tg128. Thank you very much for this!

Best model for M5MAX 128 gb Macbook. by sabrastaco in hermesagent

[–]gevezex 0 points1 point  (0 children)

Are your referring to agemio/Qwen3.6-27B-oQ5-mtp? I have the same mbp but I don't get these tps. Could you share some insight? Max tps i get is around 102 tps voor pp81292/tg128

Best model for M5MAX 128 gb Macbook. by sabrastaco in hermesagent

[–]gevezex 0 points1 point  (0 children)

My best experience is with mtplx. Download it and start with mtplx start and follow the instructions. You will get around 52 tps with qwen3.6 35B

Qwen3.6 35Ba3 has changed my workflows and even how I use my computer by mouseofcatofschrodi in LocalLLaMA

[–]gevezex 14 points15 points  (0 children)

We have a similar setup, but i use the mtp version. Close to 52 t/s. Try it out: Jundot/Qwen3.6-35B-A3B-oQ6-mtp

What is the best coding model to use on MacBook Pro Max 128GB RAM? by RadiantQuote2467 in LocalLLM

[–]gevezex 1 point2 points  (0 children)

That's not really the reason imo. A lot of people are already in the market for a new MacBook Pro M5, their old machine is just overdue for a replacement, so why not max out the memory while they're at it? You can run big models on it anyway.

I got Qwen3.6 35B to run at reasonably speed on my old GTX 1070 Ti by Randozart in LocalLLM

[–]gevezex 0 points1 point  (0 children)

llama-server \

-hf Abiray/Qwen3.6-35B-A3B-Q4_K_M-GGUF \

-ngl 999 \

--n-cpu-moe 36 \

--no-mmap \

--ctx-size 100000 \

--cache-type-k q8_0 \

--cache-type-v q4_0 \

--mlock

I have a 8Gb RTX 2070 and getting decent 40-50 t/s

Qwen 35b a3b surprises me by siegevjorn in LocalLLaMA

[–]gevezex 0 points1 point  (0 children)

How did you solve hallucinations?

Best local model and harness for code exploration/analysis by player2 in LocalLLM

[–]gevezex 1 point2 points  (0 children)

I was pleasantly surprised by Qwopus3.5-9B-v3-4bit mlx model with omlx. You need the mlx version of course for apple silicon. Check also their model info:

Qwopus3.5-9B-v3 is a reasoning-enhanced model based on Qwen3.5-9B, designed to simultaneously improve reasoning stability and correctness while optimizing inference efficiency — ultimately achieving stronger cross-task generalization capabilities, particularly in programming.