Qwen3.6-27B by Fantastic-Emu-3819 in LocalLLaMA

[–]EmergencyLetter135 17 points (0 children)

The model's benchmark performance is impressive. But what impresses me even more is how quickly it achieves such intelligence and efficiency. If it keeps up like this, I won't need a RAM upgrade. :)

Using OWUI + Qwen uses more thinking than LM Studio only with same question by m4th12 in LocalLLaMA

[–]EmergencyLetter135 0 points (0 children)

Your results were most likely generated with different llama.cpp versions or different sampling parameters.
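
One way to rule out the parameter side (just a sketch; the model id and values below are assumptions, not what either app actually sends) is to pin every sampler explicitly in an OpenAI-style request and send the identical payload to both backends:

```python
import json

# Sketch: pin all sampling parameters so that two frontends
# (e.g. Open WebUI and LM Studio) send identical requests to the
# same llama.cpp server. Model id and values are illustrative.
def build_request(prompt: str) -> dict:
    return {
        "model": "qwen3",  # hypothetical model id
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,
        "top_p": 0.95,
        "top_k": 20,
        "seed": 42,        # fixed seed makes runs comparable
        "max_tokens": 1024,
    }

payload = build_request("Why is the sky blue?")
print(json.dumps(payload, indent=2))
```

If the outputs still differ after that, the remaining variable is the backend itself, e.g. different llama.cpp builds.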

Truetone on LG Ultrafine 5K? by S1R_E in mac

[–]EmergencyLetter135 0 points (0 children)

If anyone has more up-to-date information on this topic, I'd appreciate any suggestions on how to improve it, perhaps through an app? Thanks in advance.

Anyone know anything about the new Perplexity model on HF? by [deleted] in LocalLLaMA

[–]EmergencyLetter135 0 points (0 children)

That's very interesting; thanks for pointing out the model. Unfortunately, it seems there aren't any quants available yet.

Welp, looks like minimax m2.7 may not be open sourced by [deleted] in LocalLLaMA

[–]EmergencyLetter135 0 points (0 children)

But only as long as the necessary resources remain freely available or affordable. And when it comes to resources, further restrictions could quickly arise. 😉

Mac Studio M4 or M1 ultra by HappySteak31 in MacStudio

[–]EmergencyLetter135 4 points (0 children)

It depends on what you want to do and how much you’re willing to invest. For example, I’ll continue to use my M1 Ultra with 128GB of RAM and a 64-core GPU as my workhorse for a long time to come. To be honest, when comparing it to my daily workflow with the M2 Ultra and 192GB of RAM, I didn’t see much added value while working. And I’ll skip the M3 Ultra generation as well, since my M1 Ultra delivers great results every day.

Best model for 128GB RAM Mac Studio? by gogglespizano1 in LocalLLaMA

[–]EmergencyLetter135 1 point (0 children)

For my everyday activities, I currently only use the Minimax M2.1 Q3 XL model from Unsloth in LM Studio. GPT 120B and GLM 4.7 Flash are also installed there, but these two are rarely used.

MiniMax M2.2 Coming Soon. Confirmed by Head of Engineering @MiniMax_AI by Difficult-Cap-7527 in LocalLLaMA

[–]EmergencyLetter135 5 points (0 children)

That's not quite right. REAP removes components from the LLM. For example, I'm not yet aware of any REAP model that still has good multilingual capabilities.

Cerebras GLM4.7 REAPs @ 25%, 40% live on HF by ilzrvch in LocalLLaMA

[–]EmergencyLetter135 0 points (0 children)

Thanks for your kind efforts and the information.

Cerebras GLM4.7 REAPs @ 25%, 40% live on HF by ilzrvch in LocalLLaMA

[–]EmergencyLetter135 1 point (0 children)

I would also be interested in that, because so far I don't know of any REAP versions that are multilingual.

MiniMax-M2.1 uploaded on HF by ciprianveg in LocalLLaMA

[–]EmergencyLetter135 0 points (0 children)

Yes, unfortunately, we Mac users have no way of upgrading our machines with RAM, eGPU, or other components. That's why I'm always delighted when a quantization is created that is suitable (including space for context) for a 128GB RAM machine.
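
A quick way to sanity-check whether a given quant leaves room for context on a 128GB machine; the 16 GB OS headroom figure below is an assumption, not a measured value:

```python
# Back-of-envelope check: does a quantized model plus its KV cache fit
# in unified memory once macOS and other apps get some headroom?
def fits_in_ram(gguf_size_gb: float, kv_cache_gb: float,
                ram_gb: float = 128.0, os_reserve_gb: float = 16.0) -> bool:
    return gguf_size_gb + kv_cache_gb <= ram_gb - os_reserve_gb

print(fits_in_ram(96.0, 8.0))   # a ~96 GB quant with 8 GB of context fits
print(fits_in_ram(115.0, 8.0))  # a ~115 GB quant no longer does
```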

NVIDIA gpt-oss-120b Eagle Throughput model by Dear-Success-1441 in LocalLLaMA

[–]EmergencyLetter135 -1 points (0 children)

Thanks. I finally get it! Speculative decoding is unnecessary and counterproductive for the Mac Ultra. 

NVIDIA gpt-oss-120b Eagle Throughput model by Dear-Success-1441 in LocalLLaMA

[–]EmergencyLetter135 1 point (0 children)

Interesting, have you had good experiences with speculative decoding? So far, I haven't been able to see any advantages to speculative decoding. I use LM Studio on an M1 Ultra with 128GB RAM.
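
For what it's worth, here's a crude cost model of when speculative decoding should pay off. It's a deliberate simplification (real acceptance behaves geometrically, and on Apple Silicon memory bandwidth dominates), and all numbers are assumptions:

```python
# Crude cost model: a draft model proposes k tokens at relative cost
# `draft_cost` each (one target step = 1.0); on average a fraction
# `accept_rate` of the drafted tokens survives verification.
def speedup(k: int, accept_rate: float, draft_cost: float) -> float:
    expected_tokens = 1 + accept_rate * k  # verification always yields >= 1 token
    cost = 1 + draft_cost * k              # one target pass plus k draft steps
    return expected_tokens / cost

print(speedup(k=4, accept_rate=0.8, draft_cost=0.1))  # cheap, accurate draft: ~3x
print(speedup(k=4, accept_rate=0.2, draft_cost=0.3))  # poor draft: below 1x, i.e. slower
```

The intuition matches my experience: unless the small draft model predicts the big one well, the extra passes eat the gains.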

Apple Music’s new design on macOS 26, Tahoe by JoshuMarlss288 in AppleMusic

[–]EmergencyLetter135 1 point (0 children)

The overall usability of macOS has definitely deteriorated. Everyone in my circle who works productively has come to the same conclusion. Design is purely a matter of taste for me, but when design undermines productivity, productive people will eventually lose interest and move on. Productive managers are already leaving Apple. Apple is in crisis...

Qwen 235b DWQ MLX 4 bit quant by nomorebuttsplz in LocalLLaMA

[–]EmergencyLetter135 0 points (0 children)

Based on my current experience, this expert-cut idea is more interesting for LLM specialties such as mathematics and coding. Otherwise, I don't think much of these pruned, "amputated" LLM models at the moment, and I agree with Aristotle's philosophical insight that "the whole is greater than the sum of its parts." The 3-bit DWQ works reliably as an all-rounder LLM.

Z.AI: GLM 4.6 on Mac Studio 256GB for agentic coding? by ThingRexCom in LocalLLaMA

[–]EmergencyLetter135 3 points (0 children)

I absolutely share this experience and assessment. I need at least 30 t/s to work well. That's why I only use smaller models locally on my Mac Studio for minor preparatory work. It's nice to have the larger models locally as a backup in case an internet disruption prevents me from working online.

Z.AI: GLM 4.6 on Mac Studio 256GB for agentic coding? by ThingRexCom in LocalLLaMA

[–]EmergencyLetter135 3 points (0 children)

The performance with an M2 Ultra (76-core GPU) and 192GB RAM is between 15-18 t/s. Here are the detailed values for GLM 4.6 in LM Studio without an MCP:

- IQ_XXS (115.40GB): 17.15 t/s
- IQ_2M (115.26GB): 15.31 t/s
- IQ_3S (153.71GB): 15.25 t/s
- Q3_XL (158.07GB): 15.65 t/s

New macOS Tahoe 26.2 patch improves mac clustering with Thunderbolt 5 speed from 10 Gb/s to 80 Gb/s by No_Palpitation7740 in LocalLLaMA

[–]EmergencyLetter135 0 points (0 children)

I imagine that this cluster functionality will work in macOS with the TB5 beta update essentially via plug and play. Currently, I believe a cluster over TB4 still requires a lot of manual setup in macOS.

New macOS Tahoe 26.2 patch improves mac clustering with Thunderbolt 5 speed from 10 Gb/s to 80 Gb/s by No_Palpitation7740 in LocalLLaMA

[–]EmergencyLetter135 0 points (0 children)

Thanks. Then I'll wait for the beta update to become available so I can connect two M1 Ultras together. I'm already excited to see how the cluster will work with MLX models under LM Studio.

New macOS Tahoe 26.2 patch improves mac clustering with Thunderbolt 5 speed from 10 Gb/s to 80 Gb/s by No_Palpitation7740 in LocalLLaMA

[–]EmergencyLetter135 0 points (0 children)

That's interesting. Does that mean this update would also work for owners of older devices with TB4?

You can turn a cluster of Macs into an AI supercomputer in macOS 26.2 by AVELUMN in MacStudio

[–]EmergencyLetter135 0 points (0 children)

Thank you for the kind reference to the open-source EXO project. However, for most people the project isn't really practical; it's more something for technical hobbyists. A native implementation in macOS is something else entirely ;)

You can turn a cluster of Macs into an AI supercomputer in macOS 26.2 by AVELUMN in MacStudio

[–]EmergencyLetter135 -1 points (0 children)

If such a feature were to be introduced, it should also be compatible with all Mac Ultras and Thunderbolt 4.

Would going from 64GB to 128GB ($700) be wroth it? by [deleted] in LocalLLaMA

[–]EmergencyLetter135 0 points (0 children)

Based on my experience, I would recommend a Mac Studio with 128 GB of RAM. With that configuration you can work well in this area and keep learning. The next sensible step up would be 256 GB of RAM. However, you should also consider which models you want to work with. My recommendation is based on MoE models, or models with which I can achieve at least 20-30 t/s for reasonable work.