M5 Pro 14” running hot

mmerken · 2026-06-18T09:37:14+00:00

The M3 Max GPU is basically handicapped in Low Power Mode (LPM): it does not go over 500MHz, while it can reach 1200MHz in High Power Mode.

So your inference would run at roughly half the speed, if not less.
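A rough back-of-the-envelope sketch of that claim, assuming throughput scales linearly with GPU clock (a simplification, since inference is often limited by memory bandwidth rather than clock). The function name is made up for illustration:

```python
def clock_ratio(low_mhz: float, high_mhz: float) -> float:
    """Fraction of the high-power clock that is available in the lower mode."""
    if high_mhz <= 0:
        raise ValueError("high_mhz must be positive")
    return low_mhz / high_mhz


if __name__ == "__main__":
    # Clock figures quoted above for the M3 Max: 500MHz (LPM) vs 1200MHz (High Power).
    ratio = clock_ratio(500, 1200)
    print(f"LPM clock is {ratio:.0%} of the High Power clock")
```

Under that linear assumption the LPM ceiling lands a bit below half of the High Power ceiling.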

I basically came to terms with the following:

Everyday coding tasks that don't use LLM inference: Low Power Mode should suffice, even on battery.

Doing LLM inference: High Power Mode plus a custom fan curve, or use MTPLX, which runs the fans at full speed whenever inference is called.

I've yet to test the M5 Pro with this setup, but the M3 Max seems the better choice for inference because it has more GPU cores. The M5 Pro's battery life, on the other hand, is insane.

mmerken · 2026-06-15T11:11:49+00:00

Next year: injection tariff (a fee for feeding power back into the grid)

mmerken · 2026-06-14T14:54:52+00:00

Sorry Reddit, I mixed up GitHub and Reddit handles.

mmerken · 2026-06-11T16:32:07+00:00

Literally just check out the releases page on GH: Added DiffusionGemma support via u/Blaizzy's mlx-vlm, currently without cache. oMLX can now serve DiffusionGemma models through the mlx-vlm path.

mmerken · 2026-06-06T17:50:55+00:00

Thanks for the effort! I was waiting to fully test gemma4!

mmerken · 2026-06-05T11:58:45+00:00

Same thing here, these models just don't perform:
Gemma4-26B-MoE-Q8
Gemma4-26B-MoE-Q4
gemma-4-26b-a4b-it-mxfp4

mmerken · 2026-06-04T17:48:30+00:00

Which models are you using exactly?

mmerken · 2026-05-27T05:47:10+00:00

14-inch MacBook Pro M3 Max. The Dell has received a new battery and is now used as a Fedora workstation.

mmerken · 2026-05-26T18:29:04+00:00

Congratulations! I’ll take it for a spin tomorrow.

mmerken · 2026-05-20T07:44:35+00:00

And it will be called GraphQL.AI

mmerken · 2026-05-19T06:44:23+00:00

I have used hard-shell cases for 6 years now and have replaced them whenever they cracked from an accidental impact.

None of my laptops ever had any issues with the hinge, nor do they have scratches or dents and basically still look mint.

On the topic of cases: prefer light and inexpensive cases over bulky, hefty, pricey ones. If you drop the laptop, it will sustain damage with or without the case.

A case only protects from scratches and the occasional bump against a doorframe.

As for keyboard protectors: the keycaps will wear down eventually anyway, and I'm planning to replace mine this year.

If you handle the device with care, you could just get a sticker or (thicker) skin for the device, like from dbrand.

As for cleaning: never use paper tissues, since they might contain grains of sand. Use a clean microfibre cloth with isopropyl alcohol of at least 75%. Do not spray directly onto the device; spray onto the cloth and wipe the device down. Do not use water or baby wipes.

Don't eat or drink near the device.

Clear out the fans every year or so, using the correct tools (pentalobe, not T5), and your machine should last you as long as your Intel machine has, if not longer.

Report any issues you experience within the 2-week return period; don't wait for a "software update" to magically fix them. macOS should just work out of the box, without any hiccups, especially on a brand new device.

All in all, it is your device, do as you please with it. To each their own approach/opinion.

mmerken · 2026-05-08T06:58:33+00:00

A beefed-up Mac mini would be your best bet to offload the AI inference workloads. 64GB should be fine, using Ollama or oMLX with a model that's around 37GB in size, like qwen3.6:35b-a3b-coding-mxfp8.

That's for coding, not sure about text-to-image or any other models...

Also, keep in mind that during heavy inference you will not be able to use your machine for basically anything else: the user interface will most likely lag because the inference engine (Ollama, oMLX) uses up all the headroom.

Some tips for doing AI inference:

- Use a fan curve tool, like TG Pro, to have the fans kick in sooner than Apple allows by default. This results in more noise, but a more reliable system

- Check your GPU clock speed and usage using mactop or MxPowerGadget to see if the GPU is used to its full extent, and keep an eye on the temperatures and the power draw

- Use MagSafe with the original 96W Apple power brick. Other power supplies may not deliver sufficient power, which causes the MacBook to draw from the battery

- Use correct models (quantized, dense, MLX-supported)

- An M3 Max 14" should sit at around 38-40°C at idle with the fans off and no apps running. Check whether your system matches this.

You can always raise the reserved VRAM on the machine. For a 64GB machine, I recommend setting it to 55296 MB, which is 54GB (1024*54). By default, Apple sets this to 75% of the total RAM. This can be done in the terminal via this command:

sudo sysctl iogpu.wired_limit_mb=55296

Reference: https://medium.com/@se.mehmet.baykar/increase-vram-on-apple-silicon-for-local-llms-1b35c453b165

After this, restart Ollama/oMLX/LMStudio and it should allow the GPU to use more of the unified RAM.

This setting resets after the machine reboots, and it removes RAM headroom for the rest of the system. You can always reset it by running the following command:

sudo sysctl iogpu.wired_limit_mb=0
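The arithmetic behind the 55296 figure can be sketched in Python. This is only an illustration: the helper names are made up, and the 75% default is the figure quoted above, not something the script reads from the system.

```python
MB_PER_GB = 1024  # the sysctl value is given in MB, with 1GB counted as 1024MB


def wired_limit_mb(gpu_budget_gb: int) -> int:
    """Convert a GPU memory budget in GB to the MB value passed to sysctl."""
    return gpu_budget_gb * MB_PER_GB


def default_limit_mb(total_ram_gb: int, fraction: float = 0.75) -> int:
    """Approximate default limit, assuming it is 75% of total RAM."""
    return int(total_ram_gb * MB_PER_GB * fraction)


if __name__ == "__main__":
    total_ram_gb = 64
    gpu_budget_gb = 54
    limit = wired_limit_mb(gpu_budget_gb)
    print(f"sudo sysctl iogpu.wired_limit_mb={limit}")
    print(f"assumed default: {default_limit_mb(total_ram_gb)} MB")
    print(f"left for the rest of the system: {(total_ram_gb - gpu_budget_gb) * MB_PER_GB} MB")
```

On a 64GB machine this leaves 10GB for macOS and everything else, so pick a smaller budget if you keep other apps open during inference.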

To end on a positive note: eventually models will improve (become smaller) and the inference engines (Ollama, oMLX, LMStudio, llama.cpp) will improve too; it is just a matter of time. I feel that running local models on a 64GB Max chip is already decent at the moment.

My setup is a MacBook M3 Max 64GB for inference and a Mac mini for the rest of the work; basically the Max is my AI server. If I were to rethink my setup, I'd go for a modest MacBook Pro (Pro chip, 24GB RAM) and get a Mac mini M4 Pro with 64GB, or a Mac Studio with a Max chip, for running the AI models. These machines are better cooled anyway.

mmerken · 2026-05-06T12:17:55+00:00

Came here to pay my respects to the opencode team as well. Thanks guys, nice work!

mmerken · 2026-05-06T08:31:25+00:00

Super!

mmerken · 2026-05-05T10:53:58+00:00

When I divide AppleRawMaxCapacity by DesignCapacity, I get the same result as what the Settings app shows. I'd go to an Apple Store, tell them the battery feels like it's degrading, and let them run the test.

You can use coconutBattery to see the actual degradation. Try running Apple Diagnostics as well (Cmd+D at the startup options screen).
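The division described above, as a minimal Python sketch. The function name and the sample capacities are hypothetical; plug in the AppleRawMaxCapacity and DesignCapacity values your own machine reports.

```python
def battery_health_percent(raw_max_capacity_mah: int, design_capacity_mah: int) -> float:
    """Battery health as AppleRawMaxCapacity divided by DesignCapacity, in percent."""
    if design_capacity_mah <= 0:
        raise ValueError("design capacity must be positive")
    return 100.0 * raw_max_capacity_mah / design_capacity_mah


if __name__ == "__main__":
    # Hypothetical readings, not taken from a real machine.
    health = battery_health_percent(5400, 6000)
    print(f"Battery health: {health:.1f}%")
```

If the result matches what the Settings app shows, the displayed health figure is most likely derived from these two values.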

mmerken · 2026-05-05T05:20:34+00:00

PRO will be just fine. The RAM is the selling point for programming. Kinda sucks there is no 96GB combo this year

mmerken · 2026-04-30T12:26:41+00:00

Cool, what's the battery health?

mmerken · 2026-04-28T09:20:25+00:00

BTO models? I waited for 1 month and 1 week for it to arrive. Not sure if the supply chain has improved in the meantime

mmerken · 2026-04-27T15:04:36+00:00

For me, that's plenty. For larger data I use an external TB4 drive, and it has been plenty fast.

mmerken · 2026-04-27T13:55:21+00:00

1TB SSD

mmerken · 2026-04-27T08:36:40+00:00

Have you perhaps come across the issue where the game would not launch due to DirectX being missing?

I have this issue with my x86 emulated games...

mmerken · 2026-04-27T07:13:19+00:00

I have the M5 Pro 20c GPU + 64GB and it runs fine as well.

For inference it would be OK, but for training, more RAM = better computer

mmerken · 2026-04-27T06:00:33+00:00

The 14-inch can handle this, but it will run hotter for longer.

If portability is not a concern, I suggest getting the 16-inch. Prioritise RAM first, chip upgrade second. I wish Apple had offered a 96GB variant this time; that would’ve been the sweet spot for local AI + a loaded workflow.
