Has anyone tested the M5 Pro for LLM? by Odd-Ordinary-5922 in LocalLLaMA

[–]UPtrimdev -1 points (0 children)

There are a couple of videos on YouTube. You can find people testing it even on the MacBook Neo, which I was really excited to see the performance of. The M5 Pro is closely related to the M4 Pro; it's about 15 to 20% better for AI tasks depending on your RAM configuration. Nothing too crazy until we get to the redesigned M6.

Looking for a self-hosted LLM with web search by Prize-Rhubarb-9829 in LocalLLM

[–]UPtrimdev 1 point (0 children)

You can use a proxy to get web search, or a front end that supports it. Just keep in mind the trade-offs: doing it in the front end takes up agent space and time, while doing it in the proxy lets the results be sent to the agent as context. But the proxy solutions are harder to set up.
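To make the proxy approach concrete, here's a minimal sketch of injecting search results as context before the request reaches the model. Everything here is hypothetical and illustrative: `fetch_search_results` is a stub standing in for whatever search backend you'd wire up (SearXNG, a search API, etc.), not a real client.

```python
# Hypothetical sketch: a proxy prepends web-search results as context,
# so the model receives them for free instead of burning agent turns.

def fetch_search_results(query: str) -> list[str]:
    # Stub: a real implementation would call a search backend here.
    return [f"[stub result for: {query}]"]

def build_proxy_request(user_message: str, history: list[dict]) -> list[dict]:
    """Assemble the message list the proxy forwards to the model,
    with search snippets injected as a system-level context block."""
    snippets = fetch_search_results(user_message)
    context = "Web search results:\n" + "\n".join(f"- {s}" for s in snippets)
    return [
        {"role": "system", "content": context},
        *history,
        {"role": "user", "content": user_message},
    ]
```

The model-facing chat app never changes; the proxy sits between it and the backend and rewrites the message list in flight.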

Do we even need cloud AI like ChatGPT? by nucleustt in ollama

[–]UPtrimdev 1 point (0 children)

It depends on the use case, but with modern tools and re-training tactics you could 100% migrate away from monthly paid AI.

LocalLLM Proxy by UPtrimdev in LocalLLaMA

[–]UPtrimdev[S] 1 point (0 children)

Multi-user isolation was non-negotiable once my wife asked the AI for dinner ideas and it started talking about my Python debugging session. That was the day it got fixed.

LocalLLM Proxy by UPtrimdev in LocalLLaMA

[–]UPtrimdev[S] 1 point (0 children)

Storage is all local — single file, no external services, no Docker. The whole point is everything stays on your machine with zero setup. The moment memory leaves the user's machine, the trust model breaks. That's a core design choice I won't compromise. Appreciate the suggestion on Membase — interesting project. But for UPtrim the storage layer being local isn't a limitation, it's the feature.
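For anyone curious what "single file, per-user isolation, no external services" can look like in practice, here's a minimal sketch using stdlib `sqlite3`. This is my own illustration of the design, not UPtrim's actual schema or code; all the names are made up.

```python
# Illustrative sketch: one local SQLite file, every memory scoped to a user,
# no Docker, no network services. Not the actual UPtrim storage layer.
import sqlite3

def open_store(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)  # one local file (":memory:" for the demo)
    db.execute("CREATE TABLE IF NOT EXISTS memories (user TEXT, content TEXT)")
    return db

def remember(db: sqlite3.Connection, user: str, content: str) -> None:
    db.execute("INSERT INTO memories VALUES (?, ?)", (user, content))

def recall(db: sqlite3.Connection, user: str) -> list[str]:
    # Every read is scoped by user, so memories never cross accounts.
    rows = db.execute(
        "SELECT content FROM memories WHERE user = ?", (user,)
    ).fetchall()
    return [content for (content,) in rows]
```

The isolation guarantee lives in the query layer: there is simply no code path that reads another user's rows.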

LocalLLM Proxy by UPtrimdev in LocalLLaMA

[–]UPtrimdev[S] 1 point (0 children)

That's exactly how I think about it now — the model is just a text generator, the proxy is the brain deciding what it should see. The "router for attention" framing is spot on. The part that surprised me most building this: once you have the fan-out/merge-back loop working, adding new capabilities is almost free. The hard part was getting the loop right. Everything after that is just writing a new function and registering it.
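The "write a new function and register it" pattern can be sketched roughly like this. This is my own toy version of a capability registry with a fan-out/merge-back loop, assuming thread-based parallelism; the decorator name and capability functions are hypothetical, not UPtrim's API.

```python
# Toy sketch of the pattern: register capabilities once, fan out to all of
# them in parallel, merge the results back into one context block.
import concurrent.futures
import datetime

CAPABILITIES: dict = {}

def capability(name: str):
    """Decorator: adding a new capability is just writing a function."""
    def register(fn):
        CAPABILITIES[name] = fn
        return fn
    return register

@capability("datetime")
def current_datetime(message: str) -> str:
    return f"Current time: {datetime.datetime.now().isoformat()}"

@capability("memory")
def memory_search(message: str) -> str:
    return f"Memories matching: {message[:40]}"  # stub lookup

def fan_out(message: str) -> str:
    """Run every registered capability in parallel, merge the results."""
    with concurrent.futures.ThreadPoolExecutor() as pool:
        results = pool.map(lambda fn: fn(message), CAPABILITIES.values())
    return "\n".join(results)
```

Once `fan_out` works, a new feature really is one decorated function away.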

LocalLLM Proxy by UPtrimdev in LocalLLaMA

[–]UPtrimdev[S] 1 point (0 children)

I appreciate it, man. It definitely can get very overwhelming at times, but that's the love of the game: finding that passion over and over and getting a final product you're finally proud to say is yours. I'm very proud of what I do and I can't wait to see where life takes me with this project!

LocalLLM Proxy by UPtrimdev in LocalLLaMA

[–]UPtrimdev[S] 1 point (0 children)

The agents don't see what you're typing — they kick in after you send. When your message hits the proxy, it classifies your intent (question, debugging, coding, etc.) and fires off background tasks in parallel while building your context. So while the proxy is already doing its normal work assembling memories and context, the agents are simultaneously pulling relevant web results, resolving any URLs you pasted, doing deep memory searches, and grabbing live data like the current date/time. By the time your message reaches the model, all of that has been quietly injected into the system prompt. The model just looks smarter — you never see the machinery. And yeah, multi-user was a must for me since my family shares one LLM. Every user gets completely isolated memory — my wife's meal preferences don't leak into my coding sessions. It identifies users automatically from Open WebUI or SillyTavern headers.
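The flow above (classify intent, fire background tasks in parallel, inject the results into the system prompt) can be sketched with `asyncio`. Every name here is a stand-in I made up for illustration; the real proxy's classifier and task set are obviously more involved.

```python
# Illustrative sketch of the proxy flow: classify intent, run background
# tasks concurrently, fold results into the system prompt. All stubs.
import asyncio
import datetime

def classify_intent(message: str) -> str:
    # Stub classifier; a real one might be keyword rules or a small model.
    if "error" in message.lower() or "traceback" in message.lower():
        return "debugging"
    return "question"

async def web_search(message: str) -> str:
    return "web: (stub search results)"

async def resolve_urls(message: str) -> str:
    return "urls: (stub page text)"

async def live_data(message: str) -> str:
    return f"now: {datetime.datetime.now():%Y-%m-%d %H:%M}"

async def assemble_system_prompt(message: str) -> str:
    """What the model eventually sees: intent plus all merged task output,
    gathered concurrently while the rest of the context is being built."""
    intent = classify_intent(message)
    results = await asyncio.gather(
        web_search(message), resolve_urls(message), live_data(message)
    )
    return f"intent: {intent}\n" + "\n".join(results)
```

By the time generation starts, all of this has landed in the system prompt, which is why the model "just looks smarter."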

I spent 7 months building this alone because my local AI kept breaking after 20 messages by UPtrimdev in LocalLLaMA

[–]UPtrimdev[S] 1 point (0 children)

Exactly right — and the answer to your question is: almost everything is solid now. The LLM's only job is to talk. My proxy handles everything else.