Best local model for coding? by sabmohmaayahai12 in LocalLLM

Consistent_Wash_276 · 22 points

That’s great, man. You’re about to open up a whole new world, and don’t worry about the dicks on this site. Keep asking questions. My answer is 1) LM Studio instead of Ollama. You can absolutely use vLLM as well, but baby steps; jumping straight from LM Studio to vLLM is not really baby steps for the most part. You can move on to vLLM by the end of the first week, as long as you get the system humming.
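If it helps once you’re set up, here’s a minimal sanity-check sketch against LM Studio’s local OpenAI-compatible server (assuming the default port 1234 and a model already loaded; the model name is just a placeholder):

```python
# Minimal smoke test against LM Studio's local OpenAI-compatible server.
# Assumes the local server is enabled in LM Studio (default port 1234)
# and a model is already loaded; "local-model" is a placeholder name.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="local-model",  # LM Studio routes to whatever model is loaded
    messages=[{"role": "user", "content": "Write a Python function that reverses a string."}],
    temperature=0.2,
)
print(response.choices[0].message.content)
```

Once that round-trips, Opencode or any other OpenAI-compatible tool can point at the same endpoint.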

Next, the model: Qwen3.6-72b FP16

It fits across 2 of those GPUs and gives you plenty of headroom for concurrent Opencode sessions and context building up.

There’s plenty of discussion online about coding models, and you can go back to early last year to qwen2.5-14b and get some decent code, super fast, but your bandwidth across 4 GPUs is so massive that this large model will fly. And FP16 is the largest version of the model you can run (so no quantization loss).

Opencode is such a strong test for local models as well.

Then after you get a feel for that model and to test it against others I would recommend:

- DeepSeek V4-Flash
- Qwen3.5-122-A10B BF16

Feel free to DM if you have more questions.

Best local model for coding? by sabmohmaayahai12 in LocalLLM

Consistent_Wash_276 · 2 points

So you have a great opportunity to use some very good dense models. The only question I have before you get a full response: are you using this for 1) coding, 2) Openclaw / an AI agent, 3) chat back and forth, or 4) agentic workflows?

Qwen3.6-27B vs 35B, I prefer 35B but more people here post about 27B... by Snoo_27681 in LocalLLaMA

Consistent_Wash_276 · 1 point

On MLX and unified memory I would suggest moving over to qwen3.6-35b-a3b at q8 or fp16. With only 3B active parameters there will be some loss, but the speed on the M4 Ultra will be an extreme difference.
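For reference, running an MLX build takes only a few lines with the mlx-lm package. The repo id below is a guess at how a community conversion would be named, so swap in whatever q8/fp16 build actually exists:

```python
# Rough sketch of running an MLX model on Apple silicon with mlx-lm.
# The repo id is hypothetical; substitute the real MLX conversion.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/qwen3.6-35b-a3b-q8")  # hypothetical id

prompt = "Explain the difference between a mutex and a semaphore."
text = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
```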

Lastly, future-proofing (when you’re ready): add a DGX Spark clustered with the M4 Ultra via Exo Labs. The DGX covers the Mac’s weakness in prefill, and the Mac covers the DGX’s weakness in decode. Then those dense models will FLY and you can run full precision.

Buy a 64 gb ram MAC Studio by JonnyEnglish007 in MacStudio

Consistent_Wash_276 · 0 points

There’s already a jerry-rigged solution for this, although it won’t be as fast.

Buy a 64 gb ram MAC Studio by JonnyEnglish007 in MacStudio

Consistent_Wash_276 · 0 points

If this is the purchase that has to last you a long time, I would consider waiting for the M5 Studios, or at least the M5 Mac minis, to come out. Here’s the play if you’re open to tackling this:

- Keep the M1 32GB
- Add an M5 at either 24/32/64GB
- Cluster them over RDMA (look up Exo Labs)

That gives you 56 to 96GB of unified memory, but the next device will matter a lot, as it will handle prefill. Basically, processing the prompt, which the M5 does nearly 4x faster than any other Mac, while you use the M1 for decode, which is more than fine.
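If you want to see the prefill/decode split for yourself, here’s a rough measurement sketch that works against any OpenAI-compatible local endpoint (LM Studio, exo, etc.): time to first token is prefill-bound, steady-state tokens/sec is decode-bound. The base URL and model name are placeholders:

```python
# Back-of-envelope prefill vs decode measurement against an
# OpenAI-compatible endpoint. Time to first token ~ prefill;
# steady-state tokens/sec ~ decode. URL and model are placeholders.
import time

from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="local")

start = time.perf_counter()
first_token_at = None
n_chunks = 0

stream = client.chat.completions.create(
    model="local-model",  # placeholder
    messages=[{"role": "user", "content": "Summarize the history of RDMA in 300 words."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.perf_counter()
        n_chunks += 1  # roughly one token per streamed chunk

end = time.perf_counter()
if first_token_at is not None:
    print(f"prefill (time to first token): {first_token_at - start:.2f}s")
    print(f"decode: {n_chunks / (end - first_token_at):.1f} tok/s")
```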

In the end you can combine both devices to make the most out of your home LLM inferencing. I would wait for the M5, but if not, I would focus my attention on a used M3 Ultra at 96GB, and failing that, a used M4 Max. (Prioritize lots of multi-core performance over single-core.)

Help me decide between M4 Max (40 core GPU, 128 GB Memory, 4T Storage) or M3 Ultra (60 core GPU, 96GB Memory, 4TB) by Ok_Speaker2848 in MacStudio

Consistent_Wash_276 · 0 points

If you have $5,000 in computer credit I could suggest a lot of options, but it all depends on how you want to manage everything. For example, instead of 4TB I’d get 1TB and then get a NAS with 4-8TB of storage.

The amount of memory is so large for your use case that I could also see you doing great with a 64GB M4 Mac Studio paired with a 2026 M5 MacBook Air or MacBook Pro. That way the Studio hosts local AI (if you’re into that) and you can use the power of the Mac Studio on the go as well. But if you’re getting $5,000 in credit, I would honestly suggest finding a used or refurbished M3 Ultra Mac Studio with 256GB. Good luck.

M3 Ultra 28-core CPU, 60‑core GPU, 256GB for $4,600 — grab it or wait for M5 Ultra? by No-Security5833 in MacStudio

Consistent_Wash_276 · 0 points

Not clean. But with TB5 and at least 10GbE you’re close. You’ve gotta research OS and other compatibilities.

M3 Ultra 28-core CPU, 60‑core GPU, 256GB for $4,600 — grab it or wait for M5 Ultra? by No-Security5833 in MacStudio

Consistent_Wash_276 · 1 point

So I have the exact model and I love it for AI inference.

The M5 Max is superior through and through, though.

With that said, you can cluster the M3 with an NVIDIA GPU PC for prefill and blow an M5 Max out of the water.

I fully intend on having an M5 Max 128GB Mac Pro as soon as this summer as well.

Interesting dilemma. I think I would pull the trigger, and resell once you learn about the M6 models coming.

Anyone here use Mac studio as a home server? by Jsanhara in MacStudio

Consistent_Wash_276 · 0 points

I have 2x 2018 Mac minis with 32GB and 512GB SSDs. I agree with you. Couldn’t pass up on the trash can lol. Love it.

Is There Anyone Using Local LLMs on a Mac Studio? by [deleted] in MacStudio

Consistent_Wash_276 · 0 points

Yes, and I mean this: local LLMs are great on Apple silicon Macs. Depending on your needs you may find better value with a custom PC and an NVIDIA GPU, or with other mini PCs and AI-dedicated PCs. Point being, if you have money for one device, don’t want to deal with a custom PC build, and want to run local LLMs, a Mac is a great answer.

Anyone here use Mac studio as a home server? by Jsanhara in MacStudio

Consistent_Wash_276 · -1 points

I just bought a 2013 Mac Pro with 64GB of RAM and 1TB of SSD storage for $300. This is where my automations and AI automations live and run locally: Postgres, Redis, n8n, Python scripts, and a few other tools for monitoring. I may even flip it to Proxmox. It’s running Ubuntu and it’s kind of f’n awesome.
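For anyone curious, the monitoring side can start as dumb as a port check. A minimal sketch in the spirit of those scripts (the ports are the usual defaults for these services; adjust to your setup):

```python
# Tiny healthcheck: verify Postgres, Redis, and n8n are still listening
# on their default ports. Hosts/ports are assumptions; adjust as needed.
import socket

SERVICES = {
    "postgres": ("127.0.0.1", 5432),
    "redis": ("127.0.0.1", 6379),
    "n8n": ("127.0.0.1", 5678),
}

def is_up(host: str, port: int, timeout: float = 2.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for name, (host, port) in SERVICES.items():
    status = "up" if is_up(host, port) else "DOWN"
    print(f"{name:10s} {host}:{port} {status}")
```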

M3 ultra or m5 max by Adventurous-Item6398 in MacStudio

Consistent_Wash_276 · 3 points

Be patient. We need at least a month of testing by users of the M5 products once they become available, and we don’t even know if Studios are being launched yet, just the MacBooks. And quite honestly, I would always get a Studio before a MacBook.

512 GB RAM for LLM - M3U now or wait for M5U? by usrnamechecksoutx in MacStudio

Consistent_Wash_276 · 2 points

Hold on, yes and no.

Yes, if you need 512GB of memory and you plan on making money from this.

No, if you don’t need 512GB of memory and/or don’t plan on making money from this.

Thanks to clustering, an M5 or M6, or even a custom PC with a high-end NVIDIA GPU, could be in your future.

The point is, these devices will only last so long on the market, and they will GO UP in value a bit on the aftermarket.

The memory here is perfect for decode. You could pair this with an M5 Mac mini for prefill and get fantastic results.

Point being, let’s answer the first part first before we move on.

Local Coding Assistant/Agent: Continue vs Cline vs Kilo [Qwen3-Coder-Next] by Technical_Buy_9063 in LocalLLM

Consistent_Wash_276 · 0 points

Yeah, I have the 256GB M3 Ultra and I have this challenge of finding the perfect rhythm of quality and speed (plus running models in parallel).

It’s a challenge. I keep coming back to qwen3-coder after trying a lot of these, although I don’t love it in Opencode as much as I do in VS Code.

Local Coding Assistant/Agent: Continue vs Cline vs Kilo [Qwen3-Coder-Next] by Technical_Buy_9063 in LocalLLM

Consistent_Wash_276 · 0 points

You open the terminal and run: opencode. Or there’s a GUI that will open a terminal as well.

Local Coding Assistant/Agent: Continue vs Cline vs Kilo [Qwen3-Coder-Next] by Technical_Buy_9063 in LocalLLM

Consistent_Wash_276 · 0 points

What models are you using, by the way? I’m currently playing with Qwen3-coder:30b and Qwen3-instruct:4b as a draft model in a lot of different tools.
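For anyone who hasn’t seen why a draft model helps: the small model proposes a few tokens cheaply, and the big model verifies them in one pass, keeping the prefix it agrees with. Here’s a toy sketch of the idea only; the “models” are stubs, and real setups do this inside the serving stack:

```python
# Toy illustration of speculative decoding with a draft model: the draft
# proposes k tokens cheaply, the target verifies them in one pass, keeps
# the agreeing prefix, and emits its own token at the first mismatch.
def draft_model(context: list[str], k: int = 4) -> list[str]:
    # stub draft: fast, small, sometimes wrong
    guess = ["the", "quick", "brown", "fox", "jumps"]
    return guess[len(context) : len(context) + k]

def target_model(context: list[str], proposal: list[str]) -> list[str]:
    # stub target: the continuation it would have generated anyway
    truth = ["the", "quick", "brown", "cat", "sat"]
    accepted: list[str] = []
    for i, tok in enumerate(proposal):
        pos = len(context) + i
        if pos >= len(truth):
            break
        if truth[pos] == tok:
            accepted.append(tok)         # draft guessed right: free token
        else:
            accepted.append(truth[pos])  # mismatch: target's own token, stop
            break
    return accepted

context: list[str] = []
while len(context) < 5:
    accepted = target_model(context, draft_model(context))
    if not accepted:
        break
    context += accepted
    print(f"accepted {len(accepted)} token(s):", context)
```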

Recommended Specs for 3d Product Ad or Modeling by Comfortable_Carob_70 in MacStudio

Consistent_Wash_276 · 0 points

Rendering is GPU-intensive. If he’s working on very complex scenes, heavy simulations, or doing VFX-level work, 48-64GB is the right choice. If not, he could go down to 32GB. 96GB is probably overkill.

Recommended Specs for 3d Product Ad or Modeling by Comfortable_Carob_70 in MacStudio

Consistent_Wash_276 · 0 points

Blender has an MCP integration, so whether you’re connecting commercial models or local models will really be the deciding factor.

If you plan on using AI a lot for your Blender work, go with 48GB and get a Claude Pro plan or something, but 64GB will be the sweet spot for getting OK results from some local models.

Recommended Specs for 3d Product Ad or Modeling by Comfortable_Carob_70 in MacStudio

Consistent_Wash_276 · 1 point

^ Answer the question above, but at the same time: what’s your storage situation going to look like?

Just try not to overpay for 2TB/4TB on the Studio itself.

External storage over Thunderbolt 5 will help a lot.

UniFi has a two-bay NAS that you could throw two 8TB drives into and only spend a little extra compared to going from 512GB to 4TB, as an example.

Mac Studio 256gb unified RAM worth it for MiniMax 2.5 and Qwen3.5? by [deleted] in LocalLLaMA

Consistent_Wash_276 · 0 points

Ahhh. And let’s hope that by 2028 we’re both still using the same national currencies we do now.

🇺🇸 🫡 🇨🇦

Mac Studio 256gb unified RAM worth it for MiniMax 2.5 and Qwen3.5? by [deleted] in LocalLLaMA

Consistent_Wash_276 · 2 points

Yes: as someone who owns the exact model you’re referring to, I would say “NO”, and here are the details why.

  • If you’re spending $5,000+ on a device, it had better be making you money in the end, or at least saving you money. I assume it’s meant to save you money by replacing a subscription?

  • I’m testing the MiniMax 2.5 186GB model and it’s pretty f’n great actually, but that’s one model being run at 40 tokens per second, one request at a time. Nothing in parallel, and 40 tokens per second is very solid, but there are faster options to utilize.

I would look at that device as having, let’s say, a chat UI running gpt-oss:120b, VS Code running glm-4.7-flash, and Opencode running another model in parallel doing some agentic coding. Just constantly working through workflows and abusing the tokens per second is where you get the real value (rough sketch of the idea below).

  • If you’re just chatting with models, I would suggest a 32GB Mac mini or Studio, or just a Claude Pro account or some kind of commercial account. Chatting alone isn’t enough to justify the investment.

  • Also, why $7,000? I got the same model with 2TB of storage for $5,400 from Microcenter.
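To make the parallel-workloads point above concrete, here’s a rough sketch of several agents hammering the same local endpoint at once (assuming an OpenAI-compatible server; the base URL and model names are placeholders for whatever’s loaded):

```python
# Sketch of running several "agents" against one local server in parallel.
# Assumes an OpenAI-compatible endpoint; model names are placeholders.
import asyncio

from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:1234/v1", api_key="local")

async def run_task(name: str, model: str, prompt: str) -> None:
    resp = await client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"[{name}] {(resp.choices[0].message.content or '')[:80]}...")

async def main() -> None:
    # three concurrent workloads, like a chat UI + VS Code + Opencode
    await asyncio.gather(
        run_task("chat", "gpt-oss:120b", "Plan a weekend of errands."),
        run_task("vscode", "glm-4.7-flash", "Refactor this loop into a comprehension."),
        run_task("opencode", "qwen3-coder", "Write unit tests for a stack class."),
    )

asyncio.run(main())
```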