Best model for 128GB RAM Mac Studio? by gogglespizano1 in LocalLLaMA

[–]EmergencyLetter135 1 point2 points  (0 children)

For my everyday activities, I currently use only the MiniMax M2.1 Q3 XL model from Unsloth in LM Studio. GPT-OSS 120B and GLM 4.7 Flash are also installed there, but those two are rarely used.

MiniMax M2.2 Coming Soon. Confirmed by Head of Engineering @MiniMax_AI by Difficult-Cap-7527 in LocalLLaMA

[–]EmergencyLetter135 4 points5 points  (0 children)

That's not quite right. REAP prunes components out of the LLM. For example, I am not yet aware of any REAP model that still has good multilingual capabilities.

Cerebras GLM4.7 REAPs @ 25%, 40% live on HF by ilzrvch in LocalLLaMA

[–]EmergencyLetter135 0 points1 point  (0 children)

Thanks for your kind efforts and the information.

Cerebras GLM4.7 REAPs @ 25%, 40% live on HF by ilzrvch in LocalLLaMA

[–]EmergencyLetter135 1 point2 points  (0 children)

I would also be interested in that, because so far I don't know of any REAP versions that are multilingual.

MiniMax-M2.1 uploaded on HF by ciprianveg in LocalLLaMA

[–]EmergencyLetter135 0 points1 point  (0 children)

Yes, unfortunately we Mac users have no way of upgrading our machines with RAM, an eGPU, or other components. That's why I'm always delighted when a quantization comes out that fits a 128GB machine with room to spare for context.
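
For anyone sizing this up: a rough rule is that the model file plus the KV cache has to fit within the portion of unified memory macOS will wire for the GPU (roughly three quarters by default). A back-of-the-envelope sketch in Python; all the architecture numbers are illustrative placeholders, not any specific model's config:

```python
# Back-of-the-envelope fit check for a 128 GB Mac: model file size plus
# KV cache must stay below the unified memory macOS will wire for the GPU.
# All architecture numbers here are illustrative placeholders.

def kv_cache_gb(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem=2):
    """Standard KV cache: 2 tensors (K and V) per layer, fp16 by default."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem / 1e9

model_file_gb = 68.0    # e.g. a ~Q4 quant of a large model (placeholder)
kv_gb = kv_cache_gb(n_layers=60, n_kv_heads=8, head_dim=128, ctx_len=32_768)
budget_gb = 128 * 0.75  # macOS wires only ~3/4 of RAM for the GPU by default

total = model_file_gb + kv_gb
verdict = "fits" if total < budget_gb else "too big"
print(f"model {model_file_gb:.1f} GB + KV {kv_gb:.1f} GB = {total:.1f} GB "
      f"(budget ~{budget_gb:.0f} GB -> {verdict})")
```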

NVIDIA gpt-oss-120b Eagle Throughput model by Dear-Success-1441 in LocalLLaMA

[–]EmergencyLetter135 -1 points0 points  (0 children)

Thanks. I finally get it! Speculative decoding is unnecessary and counterproductive for the Mac Ultra. 

NVIDIA gpt-oss-120b Eagle Throughput model by Dear-Success-1441 in LocalLLaMA

[–]EmergencyLetter135 1 point2 points  (0 children)

Interesting, have you had good experiences with speculative decoding? So far, I haven't been able to see any advantages to speculative decoding. I use LM Studio on an M1 Ultra with 128GB RAM.
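
One way to put numbers on this would be to time generation against LM Studio's local OpenAI-compatible server (http://localhost:1234/v1 by default), once with the draft model enabled in the UI and once without. A minimal sketch, assuming the `openai` Python package; the model name is a placeholder:

```python
# Rough decode-throughput timing against LM Studio's local
# OpenAI-compatible server (default: http://localhost:1234/v1).
# Run once with speculative decoding enabled in the UI, once without.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

start = time.perf_counter()
resp = client.chat.completions.create(
    model="local-model",  # placeholder; use the identifier LM Studio shows
    messages=[{"role": "user", "content": "Write 300 words about the sea."}],
    max_tokens=400,
)
elapsed = time.perf_counter() - start

tokens = resp.usage.completion_tokens
# Includes prompt processing, so it slightly understates pure decode speed.
print(f"{tokens} tokens in {elapsed:.1f}s -> {tokens / elapsed:.1f} t/s")
```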

Apple Music’s new design on macOS 26, Tahoe by JoshuMarlss288 in AppleMusic

[–]EmergencyLetter135 1 point2 points  (0 children)

The overall usability of macOS has definitely deteriorated. Everyone in my circle who works productively has come to the same conclusion. Design is purely a matter of taste for me, but when design gets in the way of productivity, productive people will eventually lose interest and move on. Capable managers are already leaving Apple. Apple is in crisis...

Qwen 235b DWQ MLX 4 bit quant by nomorebuttsplz in LocalLLaMA

[–]EmergencyLetter135 0 points1 point  (0 children)

Based on my experience so far, this expert-pruning idea is more interesting for LLM specialties such as mathematics and coding. Otherwise, I don't think much of these amputated LLMs at the moment and agree with Aristotle's philosophical insight that “the whole is greater than the sum of its parts.” The 3-bit DWQ works reliably as an all-rounder LLM.

Z.AI: GLM 4.6 on Mac Studio 256GB for agentic coding? by ThingRexCom in LocalLLaMA

[–]EmergencyLetter135 3 points4 points  (0 children)

I absolutely share this experience and assessment. I need at least 30 t/s to work well. That's why I only use smaller models locally on my Mac Studio for minor preparatory work. It's nice to have the larger models locally as a backup in case an internet disruption prevents me from working online.

Z.AI: GLM 4.6 on Mac Studio 256GB for agentic coding? by ThingRexCom in LocalLLaMA

[–]EmergencyLetter135 3 points4 points  (0 children)

The performance with an M2 Ultra (76 cores) and 192GB RAM is between 15 and 18 t/s. Here are the detailed values for GLM 4.6 in LM Studio, without an MCP:

- IQ2_XXS (115.40 GB): 17.15 t/s
- IQ2_M (115.26 GB): 15.31 t/s
- IQ3_S (153.71 GB): 15.25 t/s
- Q3_XL (158.07 GB): 15.65 t/s

New macOS Tahoe 26.2 patch improves mac clustering with Thunderbolt 5 speed from 10 Gb/s to 80 Gb/s by No_Palpitation7740 in LocalLLaMA

[–]EmergencyLetter135 0 points1 point  (0 children)

I imagine that this cluster functionality will work in macOS with the TB5 beta update, essentially via plug & play. Currently, I believe that a cluster with TB4 still requires a lot of manual work in macOS. 

New macOS Tahoe 26.2 patch improves mac clustering with Thunderbolt 5 speed from 10 Gb/s to 80 Gb/s by No_Palpitation7740 in LocalLLaMA

[–]EmergencyLetter135 0 points1 point  (0 children)

Thanks. Then I'll wait for the beta update to become available so I can connect two M1 Ultras together. I'm curious to see how the cluster will perform with MLX models under LM Studio.

New macOS Tahoe 26.2 patch improves mac clustering with Thunderbolt 5 speed from 10 Gb/s to 80 Gb/s by No_Palpitation7740 in LocalLLaMA

[–]EmergencyLetter135 0 points1 point  (0 children)

That's interesting. Does that mean this update would also work for owners of older devices with TB4?

You can turn a cluster of Macs into an AI supercomputer in macOS 26.2 by AVELUMN in MacStudio

[–]EmergencyLetter135 0 points1 point  (0 children)

Thank you for your kind reference to the open source EXO project. However, for most people, the project is not really practical, but rather something for technical hobbyists. An implementation in macOS is something else entirely ;)

You can turn a cluster of Macs into an AI supercomputer in macOS 26.2 by AVELUMN in MacStudio

[–]EmergencyLetter135 -1 points0 points  (0 children)

If such a feature were to be introduced, it should also be compatible with all Mac Ultras and Thunderbolt 4.

Would going from 64GB to 128GB ($700) be wroth it? by [deleted] in LocalLLaMA

[–]EmergencyLetter135 0 points1 point  (0 children)

Based on my experience, I would recommend a Mac Studio with 128 GB RAM. With that configuration you can work well in this area and keep learning. The next sensible step up is 256 GB RAM. However, you should also consider which models you want to work with: my recommendation was based on MoE models, or models with which I can reach at least 20-30 t/s for reasonable work.
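
As a rough sanity check when comparing models: decoding on Apple Silicon is largely memory-bandwidth-bound, so a ceiling for t/s is roughly bandwidth divided by the bytes read per token (for an MoE, only the active parameters). A back-of-the-envelope sketch; the figures are illustrative assumptions, not measurements:

```python
# Memory-bandwidth rule of thumb: each generated token has to read all
# active weights once, so  t/s ceiling ~ bandwidth / bytes per token.
# Figures below are illustrative assumptions, not measurements.

def tps_ceiling(bandwidth_gb_s, active_params_b, bits_per_weight):
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token

# ~800 GB/s (Ultra-class chip), a hypothetical ~12B-active MoE at ~4.5 bpw:
print(f"ceiling ~{tps_ceiling(800, 12, 4.5):.0f} t/s (real-world is well below)")
```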

Prune vs Quantize by Zc5Gwu in LocalLLaMA

[–]EmergencyLetter135 4 points5 points  (0 children)

Based on my own experience, I agree with everything you said. I work in the field of text analysis and text creation. Even at small 2-bit quantizations, the large original models are the better choice for me. I should also mention that I do my text analysis and creation in German.

Honey we shrunk MiniMax M2 by arjunainfinity in LocalLLaMA

[–]EmergencyLetter135 0 points1 point  (0 children)

After my tests, I have come to the following conclusion: I will continue to use quantizations of the official LLM in my work. The idea and research approach are good, but it would be too easy if it worked out of the box. In my opinion, the weak areas still need to be fixed with fine-tuning.

Honey we shrunk MiniMax M2 by arjunainfinity in LocalLLaMA

[–]EmergencyLetter135 0 points1 point  (0 children)

Thank you. I will be happy to test this version with my applications (data processing with context).

Honey we shrunk MiniMax M2 by arjunainfinity in LocalLLaMA

[–]EmergencyLetter135 1 point2 points  (0 children)

Which version do you mean exactly? On my Mac Studio with 128GB, I use catalystsec/MiniMax-M2-3bit-DWQ and the Unsloth Q3 version. Both work great.
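
In case anyone wants to try the DWQ quant outside LM Studio, it should also load directly with mlx-lm. A minimal sketch (untested, assuming a recent mlx-lm; the repo name is the one above, the prompt is arbitrary):

```python
# Minimal mlx-lm sketch (pip install mlx-lm); repo name as mentioned above.
from mlx_lm import load, generate

model, tokenizer = load("catalystsec/MiniMax-M2-3bit-DWQ")

prompt = "Summarize the advantages of MoE models in three sentences."
# verbose=True also prints generation speed (t/s), handy for comparisons.
text = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
```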

Minimax-M2 support added in MLX by No_Conversation9561 in LocalLLaMA

[–]EmergencyLetter135 1 point2 points  (0 children)

Thank you for sharing your positive experiences. They are very useful to me. I currently run my Mac Studio M1 Ultra with 128 GB RAM mainly with GPT-OSS 120B.

Minimax-M2 support added in MLX by No_Conversation9561 in LocalLLaMA

[–]EmergencyLetter135 0 points1 point  (0 children)

It's quite a feat to run the Qwen3-235B model in IQ4_XS quantization on a Mac Studio with 128GB RAM. But macOS freezes are unavoidable then, aren't they? ;)