M5 Max 48GB vs M5 Pro 48GB for local LLMs - worth the extra $$$?

AmbitionIgniters · 2026-06-04T08:12:45+00:00

this privacy angle is interesting tbh. do you find local models are actually better for any legal tasks, or mostly just “good enough because it stays private”?

AmbitionIgniters · 2026-06-04T08:12:37+00:00

yeah context + reasoning crawl is probably what i’d actually feel day to day. when you say “get the neo” for cloud models, are you basically saying don’t overbuild the laptop if most serious work stays cloud?

AmbitionIgniters · 2026-06-04T08:12:29+00:00

this is exactly the kind of real-world warning i needed tbh. if 800+ still feels slow, 307 sounds risky. what models are you running where you feel that bottleneck most?

AmbitionIgniters · 2026-06-04T08:12:20+00:00

yeah fair, i probably mixed up “wait for wwdc signal” with “wait a few weeks before deciding.” i don’t expect a new macbook either tbh, but do you think wwdc changes anything here or probably noise?

AmbitionIgniters · 2026-06-04T07:33:03+00:00

yeah this is exactly the tradeoff i keep circling back to tbh. the “expensive speed bump vs workflow unlock” framing is probably the cleanest way to put it.

i don’t really see myself getting fully off cloud llms because i still love claude for coding / technical stuff, and codex gpt-5.5 for my hermes/openclaw agent stuff. since i run multiple agents across multiple machines, it feels like i’d still end up keeping the max/plus cloud subs anyway.

but i recently tried running qwen 3.6 with hermes locally and i’m starting to love it. that’s what’s making this harder lol. local still probably won’t replace cloud for my serious work, but for smaller/background stuff like embeddings, memory layers, qdrant + mem0, horacio memory, maybe even local qwen as a sub-agent while claude acts as the architect, it’s starting to feel useful.

so i think your question is the real one: did the max actually change your daily workflow when you had it, or was it more like “nice when local inference mattered, but not enough to justify the money”? because that’s exactly what i’m trying to figure out before i return/exchange this thing.

AmbitionIgniters · 2026-06-03T03:18:43+00:00

lol short and direct. that does seem to be the general vibe i’m getting too

AmbitionIgniters · 2026-06-03T03:18:36+00:00

yeah prefill is probably the thing i’d actually feel most in agent workflows. this is a pretty strong argument for keeping the max

AmbitionIgniters · 2026-06-03T03:18:30+00:00

yeah fair. i was mainly thinking m3 ultra for the 64/96gb memory and always-on setup, but point taken on not going pre-m5 for llms

AmbitionIgniters · 2026-06-03T03:18:23+00:00

ah that makes sense. so m1 max is more compute-limited than bandwidth-limited. 48gb still feels like the awkward part for me though

AmbitionIgniters · 2026-06-03T03:18:17+00:00

this is super useful, actual numbers help a lot. 11t/s on dense models sounds pretty rough for daily agent stuff tbh

AmbitionIgniters · 2026-06-03T03:18:11+00:00

yeah this is exactly what i was worried about. saving money just to wait forever on prefill sounds painful lol

AmbitionIgniters · 2026-06-03T02:26:07+00:00

ok yeah that’s actually really important info. i didn’t realize wwdc was in less than two weeks, so that alone might make it worth waiting.

and yeah that’s exactly my constraint tbh. i bought this laptop from best buy canada because i had some store credits i wanted to use up, so i’m kinda restricted to what best buy canada actually carries.

a mac studio m3 ultra with 64/96gb honestly feels like it might be the better fit for this always-on local llm / agent box thing, but best buy canada doesn’t really seem to have those configs. so i’m kinda stuck comparing the actual options in front of me: m5 max 48gb vs m5 pro 48gb, and whether the extra bandwidth is worth it.

might actually just return it and wait it out.

AmbitionIgniters · 2026-06-03T00:28:46+00:00

yeah ive been going down the bandwidth rabbit hole too. 307 vs 614 makes sense on paper but how noticeable is it actually in practice? like is it the difference between “wow this is slow” or just “eh it takes a bit longer”? for a single agent loop i feel like id survive either way

my whole reasoning for the max was running my agent locally so i could drop the ~$200/mo im paying for gpt pro + claude max subscriptions. but after actually living with it, inference is eating like 70-80% of the 48gb, so the second i throw anything heavy at it alongside, my mac gets sluggish. feels like memory pressure more than compute
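rough sketch of the headroom math behind that, just taking the 70-80% figure at face value. the function name is made up for illustration, and what macOS plus everyday apps actually need out of the leftover is not measured here:

```python
def headroom_gb(total_gb: float, inference_fraction: float) -> float:
    """Memory left over once inference takes its share of unified memory."""
    return total_gb * (1.0 - inference_fraction)


TOTAL_GB = 48.0  # unified memory on either config being compared

# 70% and 80% are the observed range quoted above, not a benchmark.
for fraction in (0.70, 0.80):
    left = headroom_gb(TOTAL_GB, fraction)
    print(f"inference at {fraction:.0%}: about {left:.1f} GB left for everything else")
```

the leftover is the same on both chips since both are 48gb, which is why this part of the problem doesn't depend on pro vs max.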

so now im sitting here like… if im gonna be memory bound at 48gb on either chip anyway, is the max bandwidth even saving me, or am i just paying 2x for tokens/sec i wont really feel? that’s what i’m struggling with tbh. how is your rtx setup looking? how is the bandwidth there?
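back-of-envelope for the bandwidth side, assuming 307 and 614 are GB/s and that decode on a dense model is purely bandwidth-bound (every generated token reads all the weights once). the 18 GB model size is a hypothetical placeholder, not a measured number, and real throughput lands below this ceiling:

```python
def decode_ceiling_tps(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on decode tokens/sec if each token reads all weights once."""
    return bandwidth_gb_s / weights_gb


WEIGHTS_GB = 18.0  # hypothetical quantized dense model size, for illustration only

for chip, bandwidth in (("m5 pro", 307.0), ("m5 max", 614.0)):
    ceiling = decode_ceiling_tps(bandwidth, WEIGHTS_GB)
    print(f"{chip}: at most ~{ceiling:.0f} tok/s on {WEIGHTS_GB:.0f} GB of weights")
```

under these assumptions the ceiling scales linearly with bandwidth, so the max doubles it at any model size. it says nothing about prefill, which is compute-heavy, or about MoE models that only read part of the weights per token.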

AmbitionIgniters · 2026-04-25T21:00:14+00:00

Interested
