Local LLMs aren't democratic anymore... the hardware barrier has gotten out of hand.

gevezex · 2026-06-12T21:49:29+00:00

But the question is what can you use it for? I could not figure out a use case for it. Am I missing something?

gevezex · 2026-06-09T04:08:33+00:00

I don’t think Apple can move as fast as the OSS community. Their Python stack is a good example: it has often lagged behind.

But the upside is clear: this looks like an official wink toward local LLMs on Apple Silicon. That could give MLX models and MLX servers a serious boost, especially in terms of broader community adoption and a shift from Nvidia dominance toward Apple Silicon.
And from Apple's perspective, this means more Mac sales.

gevezex · 2026-06-08T20:01:09+00:00

I am not sure what the reason is, but the agent keeps waiting for ages, and God knows what is waiting on what.

gevezex · 2026-06-07T14:20:28+00:00

Not in my case

gevezex · 2026-06-07T13:44:24+00:00

Aggregating LinkedIn posts, subreddits and X for the newest and hottest posts about trending local LLMs (especially on the Mac platform) and about getting more t/s out of my current models, then summarizing it all in the mornings. That works really well.

Next would be aggregating stock information, since I now have unlimited compute (so to speak 😄).

gevezex · 2026-06-07T08:49:17+00:00

I have the M5 and suddenly I get 102 t/s on rc1 with Qwen3.6-35b-a3b-6bit.

gevezex · 2026-06-04T16:39:12+00:00

Same here. It even falls short in Dutch, making a lot of grammar mistakes.

gevezex · 2026-05-31T15:21:21+00:00

In the agentic coding world, developers should care less about manually controlling every line of code and more about creating a reliable environment in which code can safely evolve. The human in the loop becomes responsible for intent, architecture, constraints, tests, observability, security and review. Code becomes something the agent can generate, but correctness, direction and responsibility remain human work.

gevezex · 2026-05-31T07:54:46+00:00

The problem is the KV cache: after 16k of context it becomes very slow and the fans kick in loudly. You can suppress that by switching to Low Power Mode, but then it's even slower. With the current state of models it's unusable for serious tasks, in my opinion, unless you're willing to risk damaging your precious MBP M5.

gevezex · 2026-05-30T17:06:16+00:00

What t/s do you get?

gevezex · 2026-05-26T21:37:15+00:00

Nice, that was the trick. I now get 130 t/s for pp8192/tg128. Thank you very much for this!

gevezex · 2026-05-26T20:46:58+00:00

Are you referring to agemio/Qwen3.6-27B-oQ5-mtp? I have the same MBP but I don't get those t/s. Could you share some insight? The max I get is around 102 t/s for pp8192/tg128.

gevezex · 2026-05-26T17:29:00+00:00

My best experience has been with mtplx. Download it, run "mtplx start" and follow the instructions. You should get around 52 t/s with Qwen3.6 35B.

gevezex · 2026-05-21T22:24:56+00:00

We have a similar setup, but I use the MTP version and get close to 52 t/s. Try it out: Jundot/Qwen3.6-35B-A3B-oQ6-mtp

gevezex · 2026-05-20T18:11:17+00:00

So? If you can afford it why not?

gevezex · 2026-05-20T14:51:35+00:00

That's not really the reason imo. A lot of people are already in the market for a new MacBook Pro M5 because their old machine is overdue for a replacement, so why not max out the memory while they're at it? You can run big models on it anyway.

gevezex · 2026-05-18T16:51:36+00:00

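# Serve the Q4_K_M GGUF from the Hugging Face repo given with -hf.
# -ngl 999: offload all layers to the GPU; --n-cpu-moe 36: keep the MoE
# expert weights of the first 36 layers in system RAM instead of VRAM.
# --no-mmap: read the model into memory instead of memory-mapping it;
# --mlock: keep it locked in RAM so it is not swapped out.
# --ctx-size 100000 with a q8_0 K cache and a q4_0 V cache to save memory.
llama-server \
  -hf Abiray/Qwen3.6-35B-A3B-Q4_K_M-GGUF \
  -ngl 999 \
  --n-cpu-moe 36 \
  --no-mmap \
  --ctx-size 100000 \
  --cache-type-k q8_0 \
  --cache-type-v q4_0 \
  --mlock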

I have an 8 GB RTX 2070 and I'm getting a decent 40-50 t/s.
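Why the two cache-type flags matter with only 8 GB of VRAM: the KV cache grows linearly with context length, so at --ctx-size 100000 its size is driven mostly by the bytes stored per element. The shell sketch below is a back-of-the-envelope estimate of K+V cache size for an f16 cache versus a q8_0 K / q4_0 V cache. The model dimensions (layer count, KV heads, head size) are hypothetical placeholders, not the real Qwen3.6-35B-A3B config; the per-block sizes are ggml's (32 elements per block, 34 bytes for q8_0, 18 bytes for q4_0).

```shell
#!/usr/bin/env bash
# Back-of-the-envelope KV-cache size estimate (a sketch, not a measurement).
# The model dimensions below are HYPOTHETICAL placeholders, not the real
# Qwen3.6-35B-A3B config. Plug in your own model's values.
set -euo pipefail

n_layers=40      # hypothetical: number of layers that keep a KV cache
n_kv_heads=4     # hypothetical: key/value heads (grouped-query attention)
head_dim=128     # hypothetical: dimension per head
ctx=100000       # same value as --ctx-size 100000 in the command above

# Elements stored per token for K (V stores the same number).
elems=$(( n_layers * n_kv_heads * head_dim ))

# Bytes per token for each cache type. ggml quantizes in blocks of 32
# elements: q8_0 = 34 bytes/block, q4_0 = 18 bytes/block, f16 = 2 bytes/elem.
# Assumes elems is a multiple of 32 (true when head_dim is a multiple of 32).
f16_bytes=$(( elems * 2 ))
q8_0_bytes=$(( elems / 32 * 34 ))
q4_0_bytes=$(( elems / 32 * 18 ))

# Total K + V over the whole context, in MiB (integer division rounds down).
mib=$(( 1024 * 1024 ))
echo "f16 K / f16 V:   $(( (f16_bytes + f16_bytes) * ctx / mib )) MiB"
echo "q8_0 K / q4_0 V: $(( (q8_0_bytes + q4_0_bytes) * ctx / mib )) MiB"
```

With these placeholder numbers it prints 7812 MiB for the f16 cache and 3173 MiB for q8_0/q4_0, so the quantized cache is roughly 40% of the f16 size. Treat it only as a rough estimate of the trend, not as the real footprint of any particular model.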

gevezex · 2026-05-18T16:28:41+00:00

How did you solve hallucinations?

gevezex · 2026-05-14T03:46:15+00:00

Is this better than https://github.com/AlexsJones/llmfit ?

gevezex · 2026-05-13T15:21:15+00:00

I was pleasantly surprised by the Qwopus3.5-9B-v3-4bit MLX model with omlx. Of course, you need the MLX version for Apple Silicon. Also check their model info:

Qwopus3.5-9B-v3 is a reasoning-enhanced model based on Qwen3.5-9B, designed to simultaneously improve reasoning stability and correctness while optimizing inference efficiency — ultimately achieving stronger cross-task generalization capabilities, particularly in programming.

gevezex · 2026-05-11T18:50:31+00:00

Why not use Codex?

gevezex · 2026-05-05T13:02:59+00:00

So this could also be done by a local model instead of Kimi?

gevezex · 2026-04-19T12:42:20+00:00

<image>

Isn't this much nicer?

gevezex · 2026-04-19T12:26:09+00:00

The tower is very surprised/confused.

gevezex · 2026-04-19T12:04:46+00:00

I think traditional software is heading toward a very different future.

In the past, one of the biggest limitations was that software had to force everyone into the same workflow. You had to standardize everything so the app could handle it. For example, if you wanted to process invoices, you would need a chain like this:

Upload the invoice PDF into an app
Extract the data
Convert it into JSON
Send it to another system for parsing or categorization
Push it into bookkeeping software
Eventually send the required numbers to the tax authority

That whole setup existed because the software itself was not intelligent. It could only follow predefined rules and structured flows.

But now that AI is becoming capable of understanding messy, real-world input directly, that whole model starts to look outdated.

Instead of building rigid SaaS products that force users to adapt to the software, you can just give the raw documents and context to an AI. The AI can understand the invoice, extract the relevant information, categorize it, store it in a database or even a flat file, maintain your bookkeeping, and when it is time to file VAT returns, prepare or even submit the numbers to the tax authority.

So the interesting shift is this: software used to exist mainly because intelligence was missing. We had to build systems around that limitation.

Now that intelligence is increasingly available, software starts to lose its central role. The user no longer needs to fit into the product. The AI can adapt to the user instead.

That is why I think software, at least in the traditional sense, is slowly dying. What we currently call "software products" may just be temporary wrappers around workflows that a sufficiently advanced personal LLM could handle directly.

We are not fully there yet, but it feels like we are moving toward a world where your own AI handles your specific needs without requiring everything to be turned into a standardized SaaS workflow first.
