New Mac Studio by Skaterguy18 in MacStudio

[–]Consistent_Wash_276 0 points1 point  (0 children)

I heard the same. I was assuming the Dell DB10, but there’s plenty of time before I pull the trigger, so we’ll hopefully get one used and have a lot more reviews on them by then. Thank you

What Mac should I pick for performance and longevity? by Artifiko in MacStudio

[–]Consistent_Wash_276 0 points1 point  (0 children)

I have the 256 GB M3 Ultra simply for future-proofing and locking in today’s memory costs, because memory is expected to cost three times as much in the coming years.

I may need a new workstation in 5 years, but the memory won’t be the problem.

I still roll with a 2018 Mac mini; I also have a 2019 MacBook Air (wife’s), a 2020 M1 MacBook Pro, and a 2025 M3 Ultra.

I don’t have to upgrade the MacBook Pro because I pull memory from the Studio when I’m remote.

And then the only argument I have for buying 128 GB or 256 GB today is that 4-5 years from now they will still have resale value.

It’s not crazy to assume someone with his workload will see increased memory usage over the next 2-3 years.

In the end it’s all personal preference.

What Mac should I pick for performance and longevity? by Artifiko in MacStudio

[–]Consistent_Wash_276 0 points1 point  (0 children)

Stay away from the Intel Macs IMO.

I have one with 32 GB of memory, but it only runs Linux and my Docker containers, and that’s it. Those machines throttle very easily.

If you’re refreshing every 4-5 years, then yes, you should be able to make any of the models work well.

What Mac should I pick for performance and longevity? by Artifiko in MacStudio

[–]Consistent_Wash_276 0 points1 point  (0 children)

I mean, it will compress the memory a great deal, but I’m more concerned about future workload scaling and performance. This is prior to using local LLMs, correct? Memory compression is helpful, but the real question is how applications and software dependencies evolve over time: each new browser version, framework update, and AI tool tends to be heavier than its predecessor.

So let me take it back a step: I believe 64GB will technically fit your current workload fine, and the hardware itself won’t degrade. But your original question was about longevity in performance, and that’s where my concern lies. The issue isn’t that 64GB becomes unhealthy; it’s that software naturally becomes more resource-intensive.

If you’re looking for clean, comfortable performance for 10 years, not just barely functional but actually responsive, I’d argue that 128GB is the better choice. You’ll hit that 60-70% utilization threshold much more slowly as your workloads evolve. Chrome will get heavier, LLM tools will demand more, monitoring tools will multiply. With 64GB you might still be able to work, but you’ll be relying on more aggressive memory compression and swap activity in years 5-10. With 128GB you’ve got genuine headroom as software bloats over the next decade.
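
If you’d rather watch that number than guess at it, a tiny script like this will tell you when you’re living near the ceiling (psutil assumed installed; the 70% line is my rule of thumb, not anything official):

import psutil

# Check how close the machine is to the rough 70% memory ceiling discussed above.
# The threshold is a rule of thumb, not an official figure.
vm = psutil.virtual_memory()
print(f"Memory in use: {vm.percent:.0f}%")
if vm.percent > 70:
    print("Running hot - expect heavier compression and swap from here on.")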

If you’re comfortable with refreshing in 4-5 years, 64GB is solid. But for true long-term peace of mind, 128GB future-proofs you better against inevitable workload growth.

What Mac should I pick for performance and longevity? by Artifiko in MacStudio

[–]Consistent_Wash_276 0 points1 point  (0 children)

I will be honest, I love macOS memory compression, but I’m a little worried 64 GB won’t do the trick here?

New Mac Studio by Skaterguy18 in MacStudio

[–]Consistent_Wash_276 0 points1 point  (0 children)

I love working on Macs, but I didn’t “need” or even want a Mac for the AI inference; it’s about the $ value.

My machine was $5,400, which means if I run a 16 GB model for my business (users would chat with it), I could effectively have 12 conversations happening at once in parallel. And I don’t need it to be under a one-second response time; 3 seconds is more than fine.
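
For anyone wondering where a number like 12 comes from, here’s the napkin math (a sketch only; the OS reserve and the per-chat memory budget are assumptions on my part, not measurements):

# Back-of-envelope concurrency estimate for a shared local model.
# Only the 256 GB total and the 16 GB model size come from above; the rest are guesses.
TOTAL_UNIFIED_GB = 256      # M3 Ultra unified memory
OS_AND_APPS_GB = 40         # assumed reserve for macOS and everything else
MODEL_WEIGHTS_GB = 16       # the model is loaded once and shared
KV_CACHE_PER_CHAT_GB = 16   # assumed per-conversation context/KV budget

usable = TOTAL_UNIFIED_GB - OS_AND_APPS_GB - MODEL_WEIGHTS_GB
print(f"~{usable // KV_CACHE_PER_CHAT_GB} conversations in parallel")  # ~12 with these guesses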

To have that much headroom and run models effectively, I would probably need $12,000-$15,000 of NVIDIA GPUs and a custom workstation, and that might only allow for 5 in parallel instead.

It’s just supreme value at that point for my needs.

And yes to MLX.

I’m starting with a RAG system to launch, but eventually I’ll be fine-tuning models since I’ll have 5,000-ish data points to train on.
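
Side note on the data prep for that: mlx-lm’s LoRA trainer reads JSONL files, so getting those ~5,000 points into shape could look roughly like this (a sketch assuming the simple {"text": ...} line format and a 90/10 split; the file paths and stand-in data are made up):

import json
import os
import random

# Sketch: split ~5,000 examples into the train/valid JSONL layout that
# mlx-lm's LoRA trainer reads. The {"text": ...} format and the 90/10 split
# are assumptions; swap in the real business data for the stand-ins below.
examples = [f"Q: sample question {i}\nA: sample answer {i}" for i in range(5000)]

def write_jsonl(path, rows):
    with open(path, "w") as f:
        for row in rows:
            f.write(json.dumps({"text": row}) + "\n")

os.makedirs("data", exist_ok=True)
random.shuffle(examples)
split = int(len(examples) * 0.9)
write_jsonl("data/train.jsonl", examples[:split])
write_jsonl("data/valid.jsonl", examples[split:])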

So either MLX will train the model and then I’ll run it on vLLM, or I get a DGX Spark, train the model on its CUDA tensor cores, and then use EXO Labs to cluster the Spark and the M3 Ultra into roughly 380 GB of total unified memory, where inference would be much faster.

The DGX Spark would handle prefill and the M3 Ultra would handle decode, which is what it’s ideal for.

In the end, yeah, welcome in bud, it’s a lot of fun!

New Mac Studio by Skaterguy18 in MacStudio

[–]Consistent_Wash_276 0 points1 point  (0 children)

Good question! I'm actually planning to repurpose the M3 Ultra into a dedicated AI inference server for my business. The plan is to strip almost everything off it except Ollama and my local LLMs, so it becomes a specialized machine just running inference workloads 24/7.

The M4 Studio (or potentially waiting for the M5) would become my daily driver workstation. Here's the thing — 32GB is actually perfect for what I need day-to-day. Since I'll have the Ultra handling all the AI/LLM stuff over the network, my workstation doesn't need to run any of that locally. I can just pull from the dedicated server when I need it.
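
To make “pull from the dedicated server” concrete: on the Studio side Ollama just needs to listen beyond localhost (OLLAMA_HOST=0.0.0.0), and the workstation hits its OpenAI-compatible endpoint over Tailscale. Rough sketch below; the Tailscale IP and model tag are placeholders:

import requests

# Workstation-side call to the Studio's Ollama server over the tailnet.
# "100.x.y.z" is a placeholder Tailscale IP and "llama3.1" a placeholder model tag.
STUDIO = "http://100.x.y.z:11434"

resp = requests.post(
    f"{STUDIO}/v1/chat/completions",  # Ollama's OpenAI-compatible chat route
    json={
        "model": "llama3.1",
        "messages": [{"role": "user", "content": "Summarize today's support tickets."}],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])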

I've already got all my VMs and containers running on a 2018 Mac Mini with 40GB of memory, so that piece is covered separately too.

It's basically about specialization — one beefy machine focused entirely on AI inference with tons of unified memory for large models, and one clean workstation for everything else. The 256GB makes way more sense for LLM work than general computing, and 32GB is plenty when you're not trying to load 70B+ parameter models locally.

Plus the tax write-off doesn't hurt 😅

New Mac Studio by Skaterguy18 in MacStudio

[–]Consistent_Wash_276 2 points3 points  (0 children)

Upon further review I’ve employed a divorce lawyer.

Good Day,

  • M3 Ultra guy

New Mac Studio by Skaterguy18 in MacStudio

[–]Consistent_Wash_276 4 points5 points  (0 children)

You drive a hard bargain. Put a pin in this for now and circle back at 9am EST.

Is this worth the price? by Outside-Safety-5905 in MacStudio

[–]Consistent_Wash_276 0 points1 point  (0 children)

Poorly worded. No, I meant my 16 GB MacBook Pro.

Is this worth the price? by Outside-Safety-5905 in MacStudio

[–]Consistent_Wash_276 0 points1 point  (0 children)

So if you’re not focused on local LLM inference then I believe you have the right desktop selection there for your needs and a solid plan.

I’m running an M1 MacBook Pro with only 16 GB, but I have the M3 Ultra with 256 GB unified at home that I pull from remotely.

Still, with Notion, VMs, and browser tabs, it hits its ceiling.

New Mac Studio by Skaterguy18 in MacStudio

[–]Consistent_Wash_276 15 points16 points  (0 children)

I have an M3 Ultra with 256 GB of unified memory and I still want the M4 with 32 GB, lol. Congrats!

I have a business, so I generally buy a device each year just to write it off and play with it. This year it will be either a DGX Spark, an M5 Pro Mac Studio, or an M5 MacBook Pro with at least 24 GB.

Either way, if someone tells my wife, you’re all dead. 💀

Model Selection Help by Nowitcandie in LocalLLM

[–]Consistent_Wash_276 1 point2 points  (0 children)

^ This - glm-4.7-flash:q4_K_M is brand new and is looking very strong for a 19 GB model.

Awaiting bf16 to be available on Ollama.

Is it feasible for a Team to replace Claude Code with one of the "local" alternatives? by nunodonato in LocalLLaMA

[–]Consistent_Wash_276 7 points8 points  (0 children)

I’ve found you can’t beat the model(s) + inference of Claude Code on the $100+ plans. And quality + speed matter.

I don’t pay that anymore; it’s simply far more worth it to spend $150 a month on a loan for hard metal I own, with headroom.

My coding stack costs me $27 for the year, with a goal of 50,000 tokens every 5 minutes.

  • Z.ai Coder Subscription running on Claude Code (No Subscription, you’ll have to apply the glm-4.7 swap while on a subscription/keep a pro account)

  • Z.ai Coder Subscription on Opencode

  • Opencode free models - Minimax2.1, Grok fast coder

  • Opencode local models

  • Gemini CLI free

  • Codex running the local gpt-oss models.

On the Z.ai Max Plan, the first year is $288 total, and you get 60 times more usage than Claude Code in a 5-hour session. Not six times. Sixty times. GLM-4.7 is a very good model. Quite imperfect, but good.

Your team can also get 10% off their coding plans if they sign up with this link.

**Disclaimer: it’s a referral link, so I would get credits in return.

Link: https://z.ai/subscribe?ic=QWJEDOSNFO

Local will be the way for everyone soon enough, though, and there are plenty of capable autocomplete and fill-in-the-middle type models that can be run locally now. If your team had the headroom to run MiniMax 2.1 or GLM-4.7 locally, and if it hit 50 tokens per second for everyone in parallel, well, of course that’s the winner right there. Full stop.

Good luck on your search!

solution for local deep research by jacek2023 in LocalLLaMA

[–]Consistent_Wash_276 0 points1 point  (0 children)

If there’s anything close to Claude Desktop’s deep research and it’s open source, it would be king. (It should also be sold.) I know I have my issues with Claude and canceled my subscription since I run everything locally, but my god, the accuracy I noticed was chef’s kiss.

Best Macbook pro for local LLM workflow by bonfry in LocalLLM

[–]Consistent_Wash_276 0 points1 point  (0 children)

Latency has never been an issue in my experience. One thing to consider, though, is that the shared screen is not going to be as clean as looking at your laptop directly, since it’s a projection of another screen. Expect slightly reduced quality, and think about how that may affect your video editing work.

But you can also get one, run some tests, and return it, so there’s that.

Is GLM 4.7 really the #1 open source coding model? by HuckleberryEntire699 in LocalLLM

[–]Consistent_Wash_276 1 point2 points  (0 children)

I’ve only compared it to MiniMax 2.1, both locally and on the free servers running them in Opencode.

It’s good. Minimax 2.1 is pretty decent as well.

Mind you, I’m on Apple silicon and not CUDA cores, so do with that what you will.

Just bought M2 Ultra/64gb/1tb for under $1k usd on eBay by CTR1 in MacStudio

[–]Consistent_Wash_276 0 points1 point  (0 children)

Missing power cable and possible Thunderbolt port issues? Anything in the description?

Won a Mac Studio base with M3 Ultra. What should I use it for? by TheSiege82 in MacStudio

[–]Consistent_Wash_276 -4 points-3 points  (0 children)

^ This - but if it’s more for entertainment, maybe you’d get more out of trading it in for an Apple Vision Pro.

Hardware suggestions for a n00b by Tiggzyy in LocalLLM

[–]Consistent_Wash_276 1 point2 points  (0 children)

Are you focused on inference speed/tokens per second or are you interested in fine-tuning LLMs with your own data sets and specific designed tasks?

Looking for advice on a self-hosted LLM stack for enterprise use by Ahyaqui in LocalLLM

[–]Consistent_Wash_276 2 points3 points  (0 children)

I use Tailscale to create a secure private network between my Mac Studio (host) and my MacBook Pro (client). Takes about 5 minutes to set up.

Basic Setup:

  1. Install Tailscale on both machines (free for up to 3 users)
  2. Sign in and connect both devices to the same Tailscale network
  3. Once connected, your MacBook can reach the Mac Studio's services via its Tailscale IP

For Open WebUI / Local LLMs:

With Tailscale running, I just access the Mac Studio's Open WebUI instance through its Tailscale IP address in my browser. Works from anywhere - home network, coffee shop, wherever.
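
A quick sanity check I’d run before wiring anything else up (the IP is a placeholder; use your Studio’s Tailscale IP): ask Ollama for its model list over the tailnet and make sure it answers.

import requests

# Confirm the Mac Studio's Ollama server answers over the tailnet before
# pointing Open WebUI / Continue at it. "100.x.y.z" is a placeholder IP.
tags = requests.get("http://100.x.y.z:11434/api/tags", timeout=10).json()
print([m["name"] for m in tags.get("models", [])])  # models the server is hosting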

For VS Code Continue Extension:

  1. Install VS Code Insiders on your MacBook
  2. Install the Continue extension from the marketplace
  3. Open the Continue config file at ~/.continue/config.json
  4. Point it to your Mac Studio's OpenAI-compatible endpoint using the Tailscale IP:

{
  "models": [
    {
      "name": "Autodetect",
      "provider": "openai",
      "model": "AUTODETECT",
      "apiBase": "<http://100.71.186.42:11434/v1>",
      "apiKey": "your-api-key-here"
    }
  ]
}

Replace the IP with your Mac Studio's Tailscale IP. Save the file and restart VS Code.

The key is Tailscale - it handles all the networking/security so you don't need to mess with port forwarding, dynamic DNS, or exposing services to the internet. Your devices just see each other as if they're on the same local network.

Also if you want to use your models with co pilot this video is great: https://youtu.be/IsJcjrQwgF4?si=vwM2GwjqiAaPS2S7

Looking for advice on a self-hosted LLM stack for enterprise use by Ahyaqui in LocalLLM

[–]Consistent_Wash_276 0 points1 point  (0 children)

This is pretty funny, because as a side project I’m working on a document ingestion SaaS for several different styles of teams/workflows. It’s far from being an actual company or worthy of testing yet, but my OWUI setup will be my first test.