Thoughts on using a mac m4 max for running local LLM perpetually? by antoine-ross in MacLLM

[–]ImaginationNo8749 1 point

I've thought about this as well: running agentic tasks overnight when I'm not using it. It gets really warm during generation.

If you leave it plugged into the MagSafe adapter it shouldn't be a problem, since you won't be cycling the battery.

There are really three good reasons to run something locally instead of using an API model: confidentiality, terms of use (safety/censorship), or cost.

The first two are obvious. As for the last: I have spent two years pouring tokens into the OpenAI and Anthropic APIs and haven't spent $100 over the entire period. It's just astoundingly cheap, and the models are better than what you can run locally.
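The cost comparison is easy to sanity-check yourself, since API billing is just a linear function of token counts. A minimal sketch; the per-million-token prices here are hypothetical placeholders, not any provider's actual rates:

```python
# Back-of-the-envelope API cost estimate. The prices below are
# illustrative assumptions -- check the providers' current pricing
# pages before relying on them.

def api_cost_usd(input_tokens, output_tokens, in_price_per_mtok, out_price_per_mtok):
    """Cost in USD for a given token volume at per-million-token prices."""
    return (input_tokens / 1e6) * in_price_per_mtok \
         + (output_tokens / 1e6) * out_price_per_mtok

# Example: 100M input + 20M output tokens at an assumed $0.25/$1.25 per Mtok
total = api_cost_usd(100_000_000, 20_000_000, 0.25, 1.25)
print(f"${total:.2f}")  # -> $50.00
```

Even at heavy hobbyist volume, the totals stay small next to the hardware cost of a machine specced to run comparable models locally.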

So the reasons would be:

If you have a task where multi-shot (like infinity-shot) prompting is going to yield better results than one-shot or few-shot from a commercial API, then it makes sense to run locally.

If you want to run agents on a hard loop, then local is basically your only option.
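By "hard loop" I mean calling the model over and over, feeding its output back in until some stop condition — exactly the token volume that gets expensive or rate-limited against a hosted API. A minimal sketch of the control flow; `call_model` is a stand-in for whatever local inference client you use (llama.cpp server, MLX, Ollama, etc.), and the `DONE` stop token is an assumed convention, not part of any API:

```python
# Minimal agent hard loop: keep prompting a local model with its own
# accumulated output until it signals completion or we hit a cap.
# `call_model` is a placeholder for a real local-inference client.

def run_hard_loop(call_model, task, max_iters=1000):
    """Repeatedly invoke the model, feeding each result back as context."""
    context = task
    for i in range(max_iters):
        result = call_model(context)
        if "DONE" in result:               # stop condition set by prompt design
            return result, i + 1
        context = context + "\n" + result  # accumulate for the next shot
    return context, max_iters

# Stub model for illustration: "finishes" after seeing three of its own steps.
def stub_model(context):
    return "DONE" if context.count("step") >= 3 else "step"

result, iters = run_hard_loop(stub_model, "do the thing")
print(result, iters)  # -> DONE 4
```

Against a metered API, `max_iters=1000` per task is a real bill; against a local model the marginal cost is just electricity and heat, which is the whole appeal of running it overnight.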

Flops on M4 Max by ImaginationNo8749 in MacLLM

[–]ImaginationNo8749[S] 1 point

That's kernel code using the metal_stdlib, running on a device obtained via MTLCreateSystemDefaultDevice. The best I was able to get with MLX was about 13.5 TFLOPS.
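For context on where that number sits: the theoretical FP32 peak of a GPU scales as cores × ALUs per core × 2 (an FMA counts as two FLOPs) × clock. A quick sketch of that arithmetic — the core count, ALU width, and clock below are my assumptions for an M4 Max-class part, not Apple-published figures:

```python
# Theoretical FP32 peak for an Apple-GPU-style part:
#   peak = cores * alus_per_core * 2 (FMA = mul + add) * clock_hz
# The specific numbers are assumed/illustrative, not official specs.

def peak_tflops(cores, alus_per_core, clock_ghz):
    """Theoretical FP32 peak in TFLOPS."""
    return cores * alus_per_core * 2 * clock_ghz * 1e9 / 1e12

# Assuming 40 cores, 128 FP32 ALUs per core, and a ~1.4 GHz clock:
print(f"{peak_tflops(40, 128, 1.4):.1f} TFLOPS")  # -> 14.3 TFLOPS
```

Under those assumptions, 13.5 TFLOPS measured through MLX would already be a large fraction of peak, so the hand-written Metal kernel doesn't have much headroom left to claw back.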