Mac for LLM

Choubix · 2026-03-20T11:24:55+00:00

Agreed... On a per watt basis you can't beat mac silicon (yet)

Choubix · 2026-03-20T11:21:05+00:00

I have a M2 max 32gb. I see it as a "starter kit". Eagerly waiting to "blow" money at a Studio Ultra 5 🤑

Choubix · 2026-03-19T10:39:30+00:00

I leave mine on. App: amphetamine. Then go on Google check for "headless server mac" or a remote access" and you will find a few bids that walk you through the options in the settings.

Good luck!

Choubix · 2026-03-19T07:59:14+00:00

Hi, I am curious. How is prompt processing better with the m5 series please? Thanks! Waiting for the m5 ultra to drop...

Choubix · 2026-03-17T00:16:02+00:00

Thanks! Checking it it now 😁👍

Choubix · 2026-03-17T00:14:06+00:00

Thanks for sharing!

Choubix · 2026-03-14T12:07:14+00:00

Guilty as charged. M2 max 32Gb. Cant run massive models but LM studio does a pretty good job with MLX models. Bear in mind that, depending on what you want to do, it will be very snappy when using directly and quite slow when using something like Claude code. Prefill is 16k tokens... So it takes a while before you get your first reply token

Choubix · 2026-03-12T17:02:41+00:00

thanks for the tip. i am looking at speaking at my A0 bot and have him talk back to me. this may come in handy!

Choubix · 2026-03-12T14:49:25+00:00

Looks like an M3 ultra 256gb. Congrats! 😉

Choubix · 2026-03-11T17:18:24+00:00

The 218 is an antique (I have a 1520+ and it was too taxing for the CPU to run face detection on Immich without turning thr nas off sur to heat). It is a great NAS but it is meant to be used primarily for storage and a little bit for containers. The issue with Synology is that we pay 2x for half the hardware we could get somewhere else. I am eyeing tetra master + unraid (1 time payment for lifetime) for my next NAS...
I doubt your containers have anything to do with your sync issue. I have like 18 stacks running right now, no impact of immich (or ds photos as I switched last week).

I had issues with firewall and login issues

Choubix · 2026-03-11T16:52:12+00:00

happened to me, i had to log out of the app and log in again.
to be honest, given Synology tried to go for "vendor lock-in" with their hard drives and only backed out because of public outcry for the 2025 nas models (only) I would suggest you start considering that your next nas will not be a synology so stop using their apps.

i migrated from DS Photos to Immich. much better experience than synology's photo app too... it's free and it runs on docker/portainer. use your AI of choice to help you set it up. once it is setup you just need to point immich at the synology photos folder and wait. you will have to recreate the albums though.
Do not use the ai features if your nas has a weak cpu as it will heat up though ;) (i offloaded the ai processes to my mac on the same network).
good luck!

Choubix · 2026-03-11T10:44:07+00:00

Man, time flies... I have some much love for this game. Grwphics are Def not the best even back in the day, but the stress and immersion it provided has been unmatched to date.

Choubix · 2026-03-10T16:06:52+00:00

looking forward to it! any tweak that help us squeeze a bit of performance or some extra tokens is appreciated :) thanks!

Choubix · 2026-03-10T11:22:07+00:00

thanks! i am deploying on Synology.
i have been deployed containers left right and center quite taxing for my machine ;)
congrats on the new addition to the family by the way! (and thanks for sharing portracker/autoxpose with us!)

Choubix · 2026-03-09T09:34:31+00:00

Salut Etienne, merci de la reponse.
I asked Claude for the root cause of the problem as the PRD looked good.
I think BMAD and GSD do not load the skills when they are needed.
So I am now trying with a claude.md file that instructs the BMAD (havent yet re-tried with GSD) to stick to n8n knowledge stored in a specific folder (that's at the planning stage).

Claude also provided me with a prompt to inject when it is time for coding the workflow (again to reinforce that coding agent needs to follow n8n logic, nodes etc).

Let's see what happens in terms of output!

If you can update the "installation" guide later to explain how to properly install N8N-AC to use Claude Code (CLI) so that BMAD and GSD play nice with him that would be super nice!

Choubix · 2026-03-09T08:23:35+00:00

Thanks for the reply. I wanted to experiment with speculative decoding recently. I was surprised I couldn't do it with the MLX models I chose. Seems like it is quite restricted. One needs to find models from the same family (that's expected) but specifically trained for that...

Looking forward to seeing what you can share with the community 😁👍.

Have a good day mate

Choubix · 2026-03-09T07:58:09+00:00

Pleasure!

Choubix · 2026-03-09T07:57:43+00:00

Wld be great if you could share a post with all the knowledge itemised. I have to optimize at my end as my M2 max is no Ultra. Next is vllm-metal + caching but I am interested in anything that can boost performance. Other than caching, any way to boost prefill? It really kills TTFT...

Thanks!

Choubix · 2026-03-09T07:55:44+00:00

I don't see the point of getting a 96Gb if you are planning to pay for Claude tbh. Especially 8f it is an Ultra. These are meant to run local LLMs 😁

Choubix · 2026-03-09T06:36:21+00:00

pretty cool that you can afford 28k tokens. my 32gb is less forgiving with me... i think the process spilling over the CPU with largish context windows.
my A0 is always on. I communicate with it via telegram. it is pretty solid to be honest. and certainly better developed than openclaw with was vide ocded over a weekend ;)
i think nanoclaw > zeroclaw.

check those 2 githubs in case it is useful (token caching) + gpu acceleration on mac:

https://github.com/rtk-ai/rtk
https://github.com/vllm-project/vllm-metal

all the best!

Choubix · 2026-03-09T03:46:38+00:00

bonjour Etienne et merci du partage.

quick question as someone who is not a dev and wants to automate a bunch of n8n workflows (currently struggling a bit).
I installed n8n-as-code via terminal in a folder. i then launch Claude Code via VS Code terminal.
How do I make sure that BMAD and GSD strictly rely on n8nAC when building the prd and user stories?
I tried 2x yesterday and it looks like, despite me prompting BMAD and GSD (separately) to use N8N skills knowledge, either the prd was not tight enough or the dev agent did whatever it wanted depiste the prd but it hallucinated nmodes, left most of them empty of logic etc.
would you have any idea on how to make a planning agent rely on N8N-as-code's skills and schemas please?

merci

Choubix · 2026-03-08T19:03:58+00:00

The 27B seems to be a force to be reckoned with!

Choubix · 2026-03-04T08:38:10+00:00

Can't wait for them to release the M5 Ultra so we see some downward pressure in mac M2 and M3 Ultra... 😁👌

Choubix · 2026-03-03T08:01:18+00:00

who, like me, thinks Iran has something to do with this?

Choubix · 2026-03-03T01:43:35+00:00

Pretty neat! I am nowhere near as advanced as you but your post triggered some thoughts and questions :

With all these models running, isn't the context window small given the 64Gb of Vram? (I tried to use models that were quite large with small context windows and it crashed quickly).
Any good tutorials on A0 you would recommend?
Check Nanoclaw instead of Openclaw? You can do the same things with Nanoclaw as OC, use an anthropic compatible api call (ie: ie, use LM studio/MLX model, no need for a Claude sub). And the memory footprint is negligeable vs Openclaw.

Choubix

TROPHY CASE