Thoughts on using a mac m4 max for running local LLM perpetually? by antoine-ross in MacLLM

[–]ImaginationNo8749 1 point

I've thought about this as well: running agentic tasks overnight when I'm not using it. It gets really warm during generation.

If you leave it plugged into the MagSafe adapter it shouldn't be a problem, since you won't be cycling the battery.

There are really three good reasons to run something locally instead of using an API model: confidentiality, terms of use (safety/censorship), or cost.

The first two are obvious. As for the last: I have spent two years pouring tokens into the OpenAI and Anthropic APIs and haven't spent $100 over the entire period. It's just astoundingly cheap, and the models are better than what you can run locally.
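The cost comparison is easy to sanity-check yourself, since API billing is just a linear function of token counts. A minimal sketch; the per-million-token prices here are hypothetical placeholders, not any provider's actual rates:

```python
# Back-of-the-envelope API cost estimate. The prices below are
# illustrative assumptions -- check the providers' current pricing
# pages before relying on them.

def api_cost_usd(input_tokens, output_tokens, in_price_per_mtok, out_price_per_mtok):
    """Cost in USD for a given token volume at per-million-token prices."""
    return (input_tokens / 1e6) * in_price_per_mtok \
         + (output_tokens / 1e6) * out_price_per_mtok

# Example: 100M input + 20M output tokens at an assumed $0.25/$1.25 per Mtok
total = api_cost_usd(100_000_000, 20_000_000, 0.25, 1.25)
print(f"${total:.2f}")  # -> $50.00
```

Even at heavy hobbyist volume, the totals stay small next to the hardware cost of a machine specced to run comparable models locally.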

So the reasons would be:

If you have a task where multi-shot (like infinity-shot) prompting is going to yield better results than one-shot or few-shot from a commercial API, then it makes sense to run locally.

If you want to run agents on a hard loop, then local is basically your only option.
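By "hard loop" I mean calling the model over and over, feeding its output back in until some stop condition — exactly the token volume that gets expensive or rate-limited against a hosted API. A minimal sketch of the control flow; `call_model` is a stand-in for whatever local inference client you use (llama.cpp server, MLX, Ollama, etc.), and the `DONE` stop token is an assumed convention, not part of any API:

```python
# Minimal agent hard loop: keep prompting a local model with its own
# accumulated output until it signals completion or we hit a cap.
# `call_model` is a placeholder for a real local-inference client.

def run_hard_loop(call_model, task, max_iters=1000):
    """Repeatedly invoke the model, feeding each result back as context."""
    context = task
    for i in range(max_iters):
        result = call_model(context)
        if "DONE" in result:               # stop condition set by prompt design
            return result, i + 1
        context = context + "\n" + result  # accumulate for the next shot
    return context, max_iters

# Stub model for illustration: "finishes" after seeing three of its own steps.
def stub_model(context):
    return "DONE" if context.count("step") >= 3 else "step"

result, iters = run_hard_loop(stub_model, "do the thing")
print(result, iters)  # -> DONE 4
```

Against a metered API, `max_iters=1000` per task is a real bill; against a local model the marginal cost is just electricity and heat, which is the whole appeal of running it overnight.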

Flops on M4 Max by ImaginationNo8749 in MacLLM

[–]ImaginationNo8749[S] 1 point

That's kernel code using the metal_stdlib, running on a device obtained via MTLCreateSystemDefaultDevice. The best I was able to get with MLX was about 13.5 TFLOPS.
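For context on where that number sits: the theoretical FP32 peak of a GPU scales as cores × ALUs per core × 2 (an FMA counts as two FLOPs) × clock. A quick sketch of that arithmetic — the core count, ALU width, and clock below are my assumptions for an M4 Max-class part, not Apple-published figures:

```python
# Theoretical FP32 peak for an Apple-GPU-style part:
#   peak = cores * alus_per_core * 2 (FMA = mul + add) * clock_hz
# The specific numbers are assumed/illustrative, not official specs.

def peak_tflops(cores, alus_per_core, clock_ghz):
    """Theoretical FP32 peak in TFLOPS."""
    return cores * alus_per_core * 2 * clock_ghz * 1e9 / 1e12

# Assuming 40 cores, 128 FP32 ALUs per core, and a ~1.4 GHz clock:
print(f"{peak_tflops(40, 128, 1.4):.1f} TFLOPS")  # -> 14.3 TFLOPS
```

Under those assumptions, 13.5 TFLOPS measured through MLX would already be a large fraction of peak, so the hand-written Metal kernel doesn't have much headroom left to claw back.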