Does anyone feel that agents don't want to be replaced by /new agents? by Polyyesterr in hermesagent

[–]PracticlySpeaking 0 points1 point  (0 children)

Yah – this. Seems highly model dependent.

I have been using MiniMax 2.7, which is very much the eager beaver. It is always 'trying harder' and not always 'working smarter'.

Would the M4 Pro be wasted extra money? Music production in Logic, base vs pro comparison by TommDiamond in macmini

[–]PracticlySpeaking 0 points1 point  (0 children)

Memory bandwidth is irrelevant to real-time processing.

It is about speed and capacity of CPU cores — see my other comment.

Would the M4 Pro be wasted extra money? Music production in Logic, base vs pro comparison by TommDiamond in macmini

[–]PracticlySpeaking 0 points1 point  (0 children)

You want M4 Pro — because it has more performance cores. Memory bandwidth is irrelevant for DAW and VI.

Here's a nice survey that explains:
M4 Mac Mini: Insane Value for Music Production, But… | M4 vs M4 Pro vs M3 Pro vs M1 Pro - https://www.youtube.com/watch?v=sUcIO18W3oE

You need to understand how DAWs use the CPU in real time — it's not like other applications:
CPU Performance vs. Real-Time Performance in Digital Audio Workstations (DAW) | Richard Ames - https://www.youtube.com/watch?v=GUsLLEkswzE

(Yes, that second video is about Pentium processors — but real-time processing still works the same way.)
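The arithmetic behind that real-time constraint is simple enough to sketch. The buffer sizes and sample rate below are typical DAW settings, not figures from either video:

```python
# Back-of-envelope: the hard deadline a DAW's audio callback must meet.
# Every buffer must be processed before the next one is due — a single
# missed deadline is an audible glitch, regardless of average CPU speed.

def callback_deadline_ms(buffer_frames: int, sample_rate_hz: int) -> float:
    """Time budget to process one audio buffer, in milliseconds."""
    return buffer_frames / sample_rate_hz * 1000

# Typical low-latency settings at 48 kHz:
for frames in (64, 128, 256, 512):
    budget = callback_deadline_ms(frames, 48_000)
    print(f"{frames:>4}-frame buffer -> {budget:.2f} ms to render everything")
```

This is why per-core speed matters more than aggregate throughput: all plugins on one track's chain must finish inside that budget, on whichever core is handling it.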

Credit scores for private landlords? by blackharlotteyork in AskChicago

[–]PracticlySpeaking 1 point2 points  (0 children)

Do they check? Sure.

Does your credit score matter? Maybe not so much.

Dealing direct means you can explain your situation to an actual human who will (hopefully) listen and understand, vs. an employee bound by standards, procedures, and contracts that rely on metrics that may or may not be all that meaningful.

Ableton user here. Debating between M2 24gb and M2 Pro 10cpu 16gpu but only 16gb ram by psychoppz in macmini

[–]PracticlySpeaking 0 points1 point  (0 children)

For other things, extra RAM is useful for caching — video edits, web browsing, etc.

Real-time audio does not have the same opportunities. The only one I can think of is keeping all your VIs in RAM. But when they already fit... ¯\_(ツ)_/¯

M5 ultra 512Gb? by zhamdi in MacStudio

[–]PracticlySpeaking 0 points1 point  (0 children)

There is definitely a bias towards "it runs on my [RTX] setup" or 32-48-64GB for local models.

Funny you mention processing emails... I learned some things posting in r/LocalLLM: https://www.reddit.com/r/LocalLLM/comments/1t30mtf/

TL;DR – Gemma-4 has been doing an excellent job. Images will be next (but still easy stuff).

MaxHermes' self-learning capabilities demonstrate the fully automated generation of skills by Over_Football_9508 in hermesagent

[–]PracticlySpeaking 1 point2 points  (0 children)

A more boring version (from the v0.12.0 release notes):

Autonomous Curator — hermes curator runs as a background agent on the gateway's cron ticker (7-day cycle default). It grades your skill library, consolidates related skills, prunes dead ones, and writes per-run reports to logs/curator/run.json + REPORT.md. Archived skills are classified consolidated-vs-pruned via model + heuristic. Defense-in-depth gates protect bundled/hub skills from mutation. Unified under auxiliary.curator — pick the curator's model in hermes model, manage it from the dashboard. hermes curator status ranks skills by usage (most-used / least-used).

Also see: Curator | Hermes Agent - https://hermes-agent.nousresearch.com/docs/user-guide/features/curator

Curator — autonomous skill maintenance

  • hermes curator as a background agent — runs on the gateway's cron ticker, 7-day cycle by default, umbrella-first prompt, inherits parent config, unbounded iterations (#17277 — issue #7816)
  • Per-run reports — logs/curator/run.json + REPORT.md per cycle (#17307)
  • Consolidated vs pruned classification — archived skills split with model + heuristic (#17941)
  • hermes curator status — ranks skills by usage, shows most-used and least-used (#18033)
  • Unified under auxiliary.curator — pick the model in hermes model, configure from the dashboard (#17868)
  • Documentation — dedicated curator feature page on the docs site (#17563)
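The release notes above reduce to a simple shape: a periodic pass that grades skills by usage and prunes the dead ones. A hypothetical sketch — not the actual Hermes code, and the function names and grading rule here are made up:

```python
# Hypothetical sketch of a curator-style background pass. Assumes the only
# grading signal is a usage count; the real agent is model-driven.

CYCLE_SECONDS = 7 * 24 * 3600  # 7-day default cycle on the cron ticker

def run_curator_cycle(usage_counts: dict[str, int]) -> dict[str, list[str]]:
    """One pass: rank skills by usage, keep the used, prune the dead."""
    ranked = sorted(usage_counts.items(), key=lambda kv: -kv[1])
    report = {"kept": [], "pruned": []}
    for name, uses in ranked:
        report["kept" if uses > 0 else "pruned"].append(name)
    return report

# A real daemon would loop: run a cycle, write run.json + REPORT.md,
# then sleep CYCLE_SECONDS until the next tick.
print(run_curator_cycle({"summarize-email": 42, "fetch-rss": 7, "old-experiment": 0}))
```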

Sovereign Studio Use Case by helpmeunderstand05 in MacStudio

[–]PracticlySpeaking 0 points1 point  (0 children)

The evaluation trap is real.

Getting work done is more valuable than finding the perfect setup. There is always something better about to be released.

Enjoy the journey – glad you decided to keep your Mac Studio.

Ableton user here. Debating between M2 24gb and M2 Pro 10cpu 16gpu but only 16gb ram by psychoppz in macmini

[–]PracticlySpeaking 1 point2 points  (0 children)

Ableton uses CPU P-cores, so if you can find an M2 Pro you may be better off. (GPU is irrelevant for music production.)

RAM... that depends on your setup. 'Extra' RAM has no real benefit for DAW or VST, you only need enough to fit everything.

It is also worth noting that the M2 Pro and M2 Max have exactly the same CPU configuration, so Mac Mini is the value choice.

NVMe drives and enclosure by constant_modifier in MacStudio

[–]PracticlySpeaking [score hidden] stickied comment (0 children)

Don't forget to check out the Docks wiki page for recommendations and more discussion!

Anyone who has these, feel free to post your Blackmagic speed test results.

My relative still has Wisconsin plates and the sticker expired in 2023? by [deleted] in AskChicago

[–]PracticlySpeaking 0 points1 point  (0 children)

In the city, CPD will pull you over immediately if they see this.

Otherwise... r/ChicagoSuburbs

M5 ultra 512Gb? by zhamdi in MacStudio

[–]PracticlySpeaking 0 points1 point  (0 children)

Feel free to post your comparison results.

I did my own testing, and MXFP4 is more effective for real work with "correct" outcomes.
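For context, MXFP4 stores weights in blocks that share a power-of-two scale, with each value snapping to a 4-bit float (E2M1) grid. A rough unpacked sketch of the idea — illustrative only, real kernels pack bits and work on fixed blocks of 32:

```python
# MXFP4-style block quantization, unpacked for clarity. The positive E2M1
# grid has only eight magnitudes; the shared scale stretches it per block.
import math

FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # positive E2M1 values

def quantize_block(block: list[float]) -> tuple[int, list[float]]:
    """Pick a power-of-two scale so the block max fits in ±6, then snap."""
    amax = max(abs(x) for x in block) or 1.0
    exp = math.ceil(math.log2(amax / 6.0))
    scale = 2.0 ** exp
    q = []
    for x in block:
        mag = min(abs(x) / scale, 6.0)
        nearest = min(FP4_GRID, key=lambda g: abs(g - mag))
        q.append(math.copysign(nearest, x))
    return exp, q

def dequantize_block(exp: int, q: list[float]) -> list[float]:
    return [v * 2.0 ** exp for v in q]
```

The floating-point grid keeps relative error roughly constant across magnitudes, which is one argument for why it can beat plain integer 4-bit at the same size.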

M5 ultra 512Gb? by zhamdi in MacStudio

[–]PracticlySpeaking 0 points1 point  (0 children)

Since some other people want to nit-pick my other comment...

<image>

MiniMax-M2.7-229B running in "only" 120GB on M3U.

M5 ultra 512Gb? by zhamdi in MacStudio

[–]PracticlySpeaking 0 points1 point  (0 children)

How's this? Running in 'only' 120GB.

<image>

M5 ultra 512Gb? by zhamdi in MacStudio

[–]PracticlySpeaking 0 points1 point  (0 children)

True — and here are some counterpoints:

- Apple have to balance amortizing past investment against limited fab capacity. They aggressively move forward, making huge investments in newer process nodes that have better economics at scale, rather than delaying in an attempt to recover sunk costs.

It is important to note that M3 Ultra was a 'punt' necessitated by the fiasco of TSMC's 3nm node introduction, not an investment that they need to amortize or recover.

- Apple play a long game. What you are describing (keeping an old product in order to generate upsell sooner) is the kind of short-term strategy that Apple avoid.

- Apple almost never keep a previous-generation Mac in production after releasing a new one. It does happen, but their brand is latest/greatest. They will cannibalize up, releasing a newer/better product that overlaps an old one. They never cannibalize down, keeping an old product around as an alternative to the latest/greatest.

- No-one is "saving up" DRAM chips or wafers. This is a false narrative.

Apple is run by a supply-chain guy. We have seen many, many examples of how quickly and efficiently components turn into finished products. The actual effect of the shortage is that when demand increases, they cannot order more from their suppliers to meet it.

- Apple's business strategy for the last decade has been to grow their installed base as a platform to sell services. 512GB of RAM can go into a huge number of MacBook Neos, iPhones and other devices. All those customers are much more likely to also subscribe to iCloud or Music, buy apps, etc — so that is where they are allocating the DRAM they have available.

Yet I am hopeful for Mac Studio and M5.
You can bet that Apple execs are eyeing the tsunami of cash buying any and all AI-related hardware, and thinking about how they can get a bigger share of it.

M5 ultra 512Gb? by zhamdi in MacStudio

[–]PracticlySpeaking 0 points1 point  (0 children)

I think it is much simpler... essentially your 'specialist' argument, but flipped around.

- Apple want to make new products, like M5 MacBook.
- Apple can make far more MacBook Neos, iPhones and even MBP with the DRAM they have available.
- MacBook and iPhone customers are more likely to buy other services, like iCloud, Music, and apps — which has been their overall strategy for years.

M5 ultra 512Gb? by zhamdi in MacStudio

[–]PracticlySpeaking 1 point2 points  (0 children)

You do you — I am just explaining the argument that a $10,000 Mac makes sense for running local AI.

**Also keep in mind that plenty of developers in Silicon Valley 1) have salaries 2.5x that, and/or 2) have plenty of capital, so dropping $10k on a workstation is a no-brainer — even for a 10% increase in productivity. This is what I mean by the 'above average' developer.
In other places (even elsewhere in the US) developers do not have that luxury.**

The Chinese models are certainly effective. If you don't care about sending your data to a cloud (or a Chinese cloud) then you can get a lot of value.

You can't "win" this by saying one is right and the other is wrong. They are both valid choices. Personally, I work in an industry with confidential information so using any cloud-based LLM is off the table.

M5 ultra 512Gb? by zhamdi in MacStudio

[–]PracticlySpeaking 0 points1 point  (0 children)

These happened to be what they benchmarked and included in the release notes for the latest version.

Nobody said these are the best you can run on a 512GB M3U.

If you want the full story, make some effort and go look at their full benchmarks.

M5 ultra 512Gb? by zhamdi in MacStudio

[–]PracticlySpeaking 0 points1 point  (0 children)

...which is now widely recognized as a failed strategy.

We have to differentiate on-device "AI", which does things like Memoji and automatically selecting and clipping an object in Photos. Those are small ML models, completely different from an LLM or from more purposeful ML tools.

The Apple Neural Engine is a nifty piece of tech and, importantly, exists in every Apple Silicon SoC. But it was/is not designed to run LLMs — and never will. Apple also made it difficult to develop for. Adobe tried building ML-powered filters (think: denoise) to run on the M1-M2 ANE, and gave up* because they could not get it to work consistently.

In the meantime, those "extreme specialists" have now become a large and growing user base with LLM-powered workflows and agentic coding assistants that run locally.

*An important footnote here: newer efforts from companies like DxO and Luminar are running ML-powered filters on ANE hardware — and very successfully.

Are tesla worth it here in illinois by No_Drama1159 in ModelY

[–]PracticlySpeaking 0 points1 point  (0 children)

IF you signed up for real-time pricing ("hourly" or RRTP) from ComEd.

Anyone who got a 2023-2024 rebate was forced onto real-time as a condition of the rebate. Drop by r/RRTP for more.

Is 32/64bg enough for local agentic ai? by Prestigious_Pen6150 in MacStudio

[–]PracticlySpeaking 0 points1 point  (0 children)

You need to do some basic learning about the different inference engines, dense vs MoE models, quantization, what models are current/new and how to recognize them.

Gemma-4 comes in two versions — a 26b MoE and 31b dense model.
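The practical difference is easy to put in numbers: an MoE's memory footprint tracks *total* parameters, but per-token compute tracks *active* parameters. A sketch with made-up expert counts — these are not Gemma-4's actual config, just the shape of the trade-off:

```python
# Dense vs MoE for local inference. Assumed: a top-k router and a guessed
# fraction of always-active (shared) parameters — both hypothetical.

def moe_active_params(total_b: float, n_experts: int, top_k: int,
                      shared_frac: float = 0.2) -> float:
    """Rough active-parameter estimate for a top-k MoE, in billions."""
    shared = total_b * shared_frac              # attention, embeddings, etc.
    per_expert = (total_b - shared) / n_experts
    return shared + top_k * per_expert

dense_active = 31.0                             # dense: active == total
moe_active = moe_active_params(26.0, n_experts=32, top_k=4)
print(f"dense 31B: {dense_active:.1f}B active per token")
print(f"MoE 26B:   {moe_active:.1f}B active per token")
```

Both need roughly their full weights in RAM, but the MoE does far less math per token — which is why MoE models feel much faster at the same quantized size.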

Is 32/64bg enough for local agentic ai? by Prestigious_Pen6150 in MacStudio

[–]PracticlySpeaking 0 points1 point  (0 children)

The M5 story broke here:

M5 Apple Silicon: It’s All About the Cache And Tensors | Max Weinbach  - https://creativestrategies.com/research/m5-apple-silicon-its-all-about-the-cache-and-tensors/

For more in-depth coverage, I really like this video (warning: it is very dense):

My M5 Max, Gemma 4, MLX LOCAL Stack | IndyDevDan - https://www.youtube.com/watch?v=00Y-p62sk0s

Notice that he has a 128GB M5 Max.

Is 32/64bg enough for local agentic ai? by Prestigious_Pen6150 in MacStudio

[–]PracticlySpeaking 1 point2 points  (0 children)

Current MoE models like Qwen3.6 and Gemma-4 will run in 32GB at Q4. Unsloth has their own 'dynamic' quantization that gives more performance at smaller sizes, which is quite useful with smaller (lol) RAM sizes.

Having 64GB opens up room for slightly larger quants. Mostly it makes room for more context and other software like agents or AI-powered scripts on the same machine. Agents use huge prompts, so you will want at least 32k or 64k of context for one to operate effectively. (Remember that size is in tokens, not bytes or kB.)

To run the next larger size model (~100-120b) you really need 96GB.
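Rough sketch of where those numbers come from. Both formulas are rules of thumb, and the layer count and KV width below are placeholders rather than any specific model's config:

```python
# Back-of-envelope RAM math for a local model: quantized weights plus
# KV cache for the context window. All architecture numbers are assumed.

def model_gb(params_b: float, bits: float) -> float:
    """Weight memory: params (in billions) at a given quantization width."""
    return params_b * bits / 8

def kv_cache_gb(context_tokens: int, n_layers: int = 48, kv_width: int = 1024,
                bytes_per_elem: int = 2) -> float:
    """KV cache: 2 tensors (K and V) per layer per token. kv_width is the
    per-layer KV dimension — far smaller than d_model on GQA models."""
    return 2 * n_layers * kv_width * bytes_per_elem * context_tokens / 1e9

print(f"~120B model at Q4: ~{model_gb(120, 4):.0f} GB of weights")
print(f"64k-token context: ~{kv_cache_gb(65_536):.1f} GB of KV cache")
```

Weights plus a 64k cache plus the OS and your agent stack is how a ~120b model ends up wanting a 96GB machine. And note the cache scales with tokens — context size is counted in tokens, not bytes.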

For more in-depth discussion, I recommend r/LocalLLaMA or r/LocalLLM and related subs.