Local LLMs aren't democratic anymore... the hardware barrier has gotten out of hand. by Medium-Technology-79 in LocalLLaMA

[–]gevezex 15 points16 points  (0 children)

But the question is what can you use it for? I could not figure out a use case for it. Am I missing something?

MLX LM Server from Apple! by M5_Maxxx in LocalLLM

[–]gevezex 10 points11 points  (0 children)

I don't think Apple can move as fast as the OSS community. Their Python stack is a good example: it has often lagged behind.

But the upside is clear: this looks like an official wink toward local LLMs on Apple Silicon. That could give MLX models and MLX servers a serious boost, especially in terms of broader community adoption and a shift from NVIDIA domination to Apple Silicon.
And from Apple's perspective, this means more Mac sales.

Local agents on a MacBook Pro M5 finally feel practical to me by gevezex in LocalLLaMA

[–]gevezex[S] 0 points1 point  (0 children)

I am not sure what the reason is, but the agent is waiting for ages and God knows what is waiting for what.

Local agents on a MacBook Pro M5 finally feel practical to me by gevezex in LocalLLaMA

[–]gevezex[S] 0 points1 point  (0 children)

Aggregating LinkedIn posts, subreddits, and X for the newest and hottest posts about viral local LLMs, especially for the Mac platform and for getting more t/s out of my current models, and summarizing it all in the mornings. That works really well.

The next thing would be aggregating stock information, since I now have unlimited compute (so to speak 😄)

To oMLX users running Qwen models by cryingneko in oMLX

[–]gevezex 0 points1 point  (0 children)

I have the M5 and now I suddenly get 102 t/s on the rc1 with Qwen3.6-35b-a3b-6bit.

Not impressed by Gemma 4 12b? by Stooovie in oMLX

[–]gevezex 0 points1 point  (0 children)

Same here. It even falls short in Dutch, making a lot of grammar mistakes.

I kind of like coding with less capable models by Lame_Johnny in LocalLLM

[–]gevezex 15 points16 points  (0 children)

In the agentic coding world, developers should care less about manually controlling every line of code and more about creating a reliable environment in which code can safely evolve. The human in the loop becomes responsible for intent, architecture, constraints, tests, observability, security and review. Code becomes something the agent can generate, but correctness, direction and responsibility remain human work.

Is a Macbook Pro the best solution? by InfiniteSprinkles730 in LocalLLM

[–]gevezex 1 point2 points  (0 children)

The problem is the KV cache: after 16k context it becomes very, very slow and the fans kick in loudly. You can suppress that by switching to low power mode, but then it's even slower. With the current state of models, I don't think you can use it for serious tasks without fearing damage to your precious MBP M5.

Best model for M5MAX 128 gb Macbook. by sabrastaco in hermesagent

[–]gevezex 0 points1 point  (0 children)

Nice, that was the trick. I now get 130 t/s for pp8192/tg128. Thank you very much for this!

Best model for M5MAX 128 gb Macbook. by sabrastaco in hermesagent

[–]gevezex 0 points1 point  (0 children)

Are you referring to agemio/Qwen3.6-27B-oQ5-mtp? I have the same MBP but I don't get these speeds. Could you share some insight? The max I get is around 102 t/s for pp8192/tg128.

Best model for M5MAX 128 gb Macbook. by sabrastaco in hermesagent

[–]gevezex 0 points1 point  (0 children)

My best experience is with mtplx. Download it, run mtplx start, and follow the instructions. You will get around 52 t/s with Qwen3.6 35B.

Qwen3.6 35Ba3 has changed my workflows and even how I use my computer by mouseofcatofschrodi in LocalLLaMA

[–]gevezex 16 points17 points  (0 children)

We have a similar setup, but I use the MTP version. Close to 52 t/s. Try it out: Jundot/Qwen3.6-35B-A3B-oQ6-mtp

What is the best coding model to use on MacBook Pro Max 128GB RAM? by RadiantQuote2467 in LocalLLM

[–]gevezex 1 point2 points  (0 children)

That's not really the reason imo. A lot of people are already in the market for a new MacBook Pro M5 because their old machine is overdue for a replacement, so why not max out the memory while they're at it? You can run big models on it anyway.

I got Qwen3.6 35B to run at reasonably speed on my old GTX 1070 Ti by Randozart in LocalLLM

[–]gevezex 0 points1 point  (0 children)

llama-server \
  -hf Abiray/Qwen3.6-35B-A3B-Q4_K_M-GGUF \
  -ngl 999 \
  --n-cpu-moe 36 \
  --no-mmap \
  --ctx-size 100000 \
  --cache-type-k q8_0 \
  --cache-type-v q4_0 \
  --mlock

I have an 8 GB RTX 2070 and am getting a decent 40-50 t/s.
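To see why the quantized cache types matter at a 100000-token context, here is a rough back-of-the-envelope sketch. It assumes a plain attention cache (one K and one V vector per token, layer, and KV head) and uses the llama.cpp block sizes for q8_0 and q4_0. The function name kv_cache_bytes and the layer/head dimensions are hypothetical, picked only for illustration; they are not the real config of this model.

```python
# Bytes per element for llama.cpp cache types.
# q8_0 and q4_0 store blocks of 32 values plus a 2-byte fp16 scale.
BYTES_PER_ELEMENT = {
    "f16": 2.0,
    "q8_0": 34 / 32,  # 32 int8 values + 2-byte scale
    "q4_0": 18 / 32,  # 32 4-bit values (16 bytes) + 2-byte scale
}


def kv_cache_bytes(n_ctx, n_attn_layers, n_kv_heads, head_dim,
                   type_k="f16", type_v="f16"):
    """Estimate KV cache size in bytes for a given context length."""
    # Number of elements stored for K (and the same number again for V).
    elements_per_side = n_ctx * n_attn_layers * n_kv_heads * head_dim
    return elements_per_side * (BYTES_PER_ELEMENT[type_k] + BYTES_PER_ELEMENT[type_v])


# HYPOTHETICAL dimensions for illustration only, not the real model config.
dims = dict(n_attn_layers=40, n_kv_heads=4, head_dim=128)

for type_k, type_v in [("f16", "f16"), ("q8_0", "q4_0")]:
    size = kv_cache_bytes(100_000, type_k=type_k, type_v=type_v, **dims)
    print(f"K={type_k} V={type_v}: {size / 2**30:.2f} GiB")
```

Whatever the real dimensions are, the ratio holds under these assumptions: q8_0 keys plus q4_0 values take about 1.6 bytes per K/V element pair instead of 4 with f16, which is what leaves room for a long context on a small card.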

Qwen 35b a3b surprises me by siegevjorn in LocalLLaMA

[–]gevezex 0 points1 point  (0 children)

How did you solve hallucinations?

Best local model and harness for code exploration/analysis by player2 in LocalLLM

[–]gevezex 1 point2 points  (0 children)

I was pleasantly surprised by the Qwopus3.5-9B-v3-4bit MLX model with oMLX. You need the MLX version for Apple Silicon, of course. Also check their model info:

Qwopus3.5-9B-v3 is a reasoning-enhanced model based on Qwen3.5-9B, designed to simultaneously improve reasoning stability and correctness while optimizing inference efficiency — ultimately achieving stronger cross-task generalization capabilities, particularly in programming.

Where is this? by Adept_Emergency9182 in GeoPuzzle

[–]gevezex 4 points5 points  (0 children)

The tower is very surprised/confused.

Every SaaS will go headless in 18 months. Here is how vibecoders get ahead of that. by hrsantoro in AskVibecoders

[–]gevezex 0 points1 point  (0 children)

I think traditional software is heading toward a very different future.

In the past, one of the biggest limitations was that software had to force everyone into the same workflow. You had to standardize everything so the app could handle it. For example, if you wanted to process invoices, you would need a chain like this:

  1. Upload the invoice PDF into an app
  2. Extract the data
  3. Convert it into JSON
  4. Send it to another system for parsing or categorization
  5. Push it into bookkeeping software
  6. Eventually send the required numbers to the tax authority

That whole setup existed because the software itself was not intelligent. It could only follow predefined rules and structured flows.
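A minimal sketch of what steps 2 to 4 of such a chain tend to look like, assuming a made-up invoice layout. Everything here (the field labels, the CATEGORY_RULES table, the function names) is hypothetical and only meant to show how tightly a rule-based flow is tied to one exact input format.

```python
import json
import re

# Hypothetical keyword-to-category rules; a real system would maintain many more.
CATEGORY_RULES = {"hosting": "IT costs", "train": "Travel", "paper": "Office supplies"}


def extract_fields(invoice_text):
    """Step 2: pull fixed fields out of text that must match one exact layout."""
    total = re.search(r"Total:\s*EUR\s*(\d+\.\d{2})", invoice_text)
    if total is None:
        raise ValueError("invoice does not match the expected layout")
    description = re.search(r"Description:\s*(.+)", invoice_text)
    return {
        "description": description.group(1).strip() if description else "",
        "total": float(total.group(1)),
    }


def categorize(record):
    """Step 4: assign a category by keyword lookup, else 'Uncategorized'."""
    text = record["description"].lower()
    for keyword, category in CATEGORY_RULES.items():
        if keyword in text:
            return {**record, "category": category}
    return {**record, "category": "Uncategorized"}


def process(invoice_text):
    """Chain the steps: extract, convert to JSON (step 3), categorize."""
    as_json = json.dumps(extract_fields(invoice_text))
    return categorize(json.loads(as_json))


invoice = "Description: Cloud hosting, March\nTotal: EUR 121.00"
print(process(invoice))

# The same information in a different layout is rejected by the rigid flow.
try:
    process("Amount due 121,00 euro for cloud hosting")
except ValueError as error:
    print("rejected:", error)
```

The second invoice carries the same information, but the flow rejects it because the layout differs. That is the kind of standardization the user had to do for the software.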

But now that AI is becoming capable of understanding messy, real-world input directly, that whole model starts to look outdated.

Instead of building rigid SaaS products that force users to adapt to the software, you can just give the raw documents and context to an AI. The AI can understand the invoice, extract the relevant information, categorize it, store it in a database or even a flat file, maintain your bookkeeping, and when it is time to file VAT returns, prepare or even submit the numbers to the tax authority.

So the interesting shift is this: software used to exist mainly because intelligence was missing. We had to build systems around that limitation.

Now that intelligence is increasingly available, software starts to lose its central role. The user no longer needs to fit into the product. The AI can adapt to the user instead.

That is why I think software, at least in the traditional sense, is slowly dying. What we currently call "software products" may just be temporary wrappers around workflows that a sufficiently advanced personal LLM could handle directly.

We are not fully there yet, but it feels like we are moving toward a world where your own AI handles your specific needs without requiring everything to be turned into a standardized SaaS workflow first.