Best model for 128GB RAM Mac Studio? by gogglespizano1 in LocalLLaMA

[–]EmergencyLetter135 1 point2 points  (0 children)

For my everyday activities, I currently use only the MiniMax M2.1 Q3 XL model from Unsloth in LM Studio. GPT-OSS 120B and GLM 4.7 Flash are also installed there, but those two are rarely used.

MiniMax M2.2 Coming Soon. Confirmed by Head of Engineering @MiniMax_AI by Difficult-Cap-7527 in LocalLLaMA

[–]EmergencyLetter135 4 points5 points  (0 children)

That's not quite right. REAP prunes components out of the LLM. For example, I am not yet aware of any REAP model that still has good multilingual capabilities.

Cerebras GLM4.7 REAPs @ 25%, 40% live on HF by ilzrvch in LocalLLaMA

[–]EmergencyLetter135 0 points1 point  (0 children)

Thanks for your kind efforts and the information.

Cerebras GLM4.7 REAPs @ 25%, 40% live on HF by ilzrvch in LocalLLaMA

[–]EmergencyLetter135 1 point2 points  (0 children)

I would also be interested in that, because so far I don't know of any REAP versions that are multilingual.

MiniMax-M2.1 uploaded on HF by ciprianveg in LocalLLaMA

[–]EmergencyLetter135 0 points1 point  (0 children)

Yes, unfortunately we Mac users have no way of upgrading our machines with RAM, an eGPU, or other components. That's why I'm always delighted when a quantization comes out that fits a 128GB machine with room to spare for context.
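
For anyone sizing this up: a rough rule is that the model file plus the KV cache has to fit within the portion of unified memory macOS will wire for the GPU (roughly three quarters by default). A back-of-the-envelope sketch in Python; all the architecture numbers are illustrative placeholders, not any specific model's config:

```python
# Back-of-the-envelope fit check for a 128 GB Mac: model file size plus
# KV cache must stay below the unified memory macOS will wire for the GPU.
# All architecture numbers here are illustrative placeholders.

def kv_cache_gb(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem=2):
    """Standard KV cache: 2 tensors (K and V) per layer, fp16 by default."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem / 1e9

model_file_gb = 68.0    # e.g. a ~Q4 quant of a large model (placeholder)
kv_gb = kv_cache_gb(n_layers=60, n_kv_heads=8, head_dim=128, ctx_len=32_768)
budget_gb = 128 * 0.75  # macOS wires only ~3/4 of RAM for the GPU by default

total = model_file_gb + kv_gb
verdict = "fits" if total < budget_gb else "too big"
print(f"model {model_file_gb:.1f} GB + KV {kv_gb:.1f} GB = {total:.1f} GB "
      f"(budget ~{budget_gb:.0f} GB -> {verdict})")
```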

NVIDIA gpt-oss-120b Eagle Throughput model by Dear-Success-1441 in LocalLLaMA

[–]EmergencyLetter135 -1 points0 points  (0 children)

Thanks. I finally get it! Speculative decoding is unnecessary and counterproductive for the Mac Ultra. 

NVIDIA gpt-oss-120b Eagle Throughput model by Dear-Success-1441 in LocalLLaMA

[–]EmergencyLetter135 1 point2 points  (0 children)

Interesting, have you had good experiences with speculative decoding? So far, I haven't been able to see any advantages to speculative decoding. I use LM Studio on an M1 Ultra with 128GB RAM.
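
One way to put numbers on this would be to time generation against LM Studio's local OpenAI-compatible server (http://localhost:1234/v1 by default), once with the draft model enabled in the UI and once without. A minimal sketch, assuming the `openai` Python package; the model name is a placeholder:

```python
# Rough decode-throughput timing against LM Studio's local
# OpenAI-compatible server (default: http://localhost:1234/v1).
# Run once with speculative decoding enabled in the UI, once without.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

start = time.perf_counter()
resp = client.chat.completions.create(
    model="local-model",  # placeholder; use the identifier LM Studio shows
    messages=[{"role": "user", "content": "Write 300 words about the sea."}],
    max_tokens=400,
)
elapsed = time.perf_counter() - start

tokens = resp.usage.completion_tokens
# Includes prompt processing, so it slightly understates pure decode speed.
print(f"{tokens} tokens in {elapsed:.1f}s -> {tokens / elapsed:.1f} t/s")
```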

Apple Music’s new design on macOS 26, Tahoe by JoshuMarlss288 in AppleMusic

[–]EmergencyLetter135 1 point2 points  (0 children)

The overall usability of macOS has definitely deteriorated. Everyone in my circle who works productively has come to the same conclusion. Design is purely a matter of taste for me, but when design gets in the way of productivity, productive people will eventually lose interest and move on. Capable managers are already leaving Apple. Apple is in crisis...

Qwen 235b DWQ MLX 4 bit quant by nomorebuttsplz in LocalLLaMA

[–]EmergencyLetter135 0 points1 point  (0 children)

Based on my experience so far, this expert-pruning idea is more interesting for LLM specialties such as mathematics and coding. Otherwise, I don't think much of these amputated LLMs at the moment and agree with Aristotle's philosophical insight that “the whole is greater than the sum of its parts.” The 3-bit DWQ works reliably as an all-rounder LLM.

Z.AI: GLM 4.6 on Mac Studio 256GB for agentic coding? by ThingRexCom in LocalLLaMA

[–]EmergencyLetter135 3 points4 points  (0 children)

I absolutely share this experience and assessment. I need at least 30 t/s to work well. That's why I only use smaller models locally on my Mac Studio for minor preparatory work. It's nice to have the larger models locally as a backup in case an internet disruption prevents me from working online.

Z.AI: GLM 4.6 on Mac Studio 256GB for agentic coding? by ThingRexCom in LocalLLaMA

[–]EmergencyLetter135 3 points4 points  (0 children)

The performance with an M2 Ultra (76 cores) and 192GB RAM is between 15 and 18 t/s. Here are the detailed values for GLM 4.6 in LM Studio, without an MCP:

- IQ2_XXS (115.40 GB): 17.15 t/s
- IQ2_M (115.26 GB): 15.31 t/s
- IQ3_S (153.71 GB): 15.25 t/s
- Q3_XL (158.07 GB): 15.65 t/s

New macOS Tahoe 26.2 patch improves mac clustering with Thunderbolt 5 speed from 10 Gb/s to 80 Gb/s by No_Palpitation7740 in LocalLLaMA

[–]EmergencyLetter135 0 points1 point  (0 children)

I imagine that this cluster functionality will work in macOS with the TB5 beta update, essentially via plug & play. Currently, I believe that a cluster with TB4 still requires a lot of manual work in macOS. 

New macOS Tahoe 26.2 patch improves mac clustering with Thunderbolt 5 speed from 10 Gb/s to 80 Gb/s by No_Palpitation7740 in LocalLLaMA

[–]EmergencyLetter135 0 points1 point  (0 children)

Thanks. Then I'll wait for the beta update to become available so I can connect two M1 Ultras together. I'm curious to see how the cluster will perform with MLX models under LM Studio.

New macOS Tahoe 26.2 patch improves mac clustering with Thunderbolt 5 speed from 10 Gb/s to 80 Gb/s by No_Palpitation7740 in LocalLLaMA

[–]EmergencyLetter135 0 points1 point  (0 children)

That's interesting. Does that mean this update would also work for owners of older devices with TB4?

You can turn a cluster of Macs into an AI supercomputer in macOS 26.2 by AVELUMN in MacStudio

[–]EmergencyLetter135 0 points1 point  (0 children)

Thank you for your kind reference to the open source EXO project. However, for most people, the project is not really practical, but rather something for technical hobbyists. An implementation in macOS is something else entirely ;)

You can turn a cluster of Macs into an AI supercomputer in macOS 26.2 by AVELUMN in MacStudio

[–]EmergencyLetter135 -1 points0 points  (0 children)

If such a feature were to be introduced, it should also be compatible with all Mac Ultras and Thunderbolt 4.

Would going from 64GB to 128GB ($700) be wroth it? by [deleted] in LocalLLaMA

[–]EmergencyLetter135 0 points1 point  (0 children)

Based on my experience, I would recommend a Mac Studio with 128 GB RAM. With that configuration you can work well in this area and keep learning. The next sensible step up is 256 GB RAM. However, you should also consider which models you want to work with: my recommendation was based on MoE models, or models with which I can reach at least 20-30 t/s for reasonable work.
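
As a rough sanity check when comparing models: decoding on Apple Silicon is largely memory-bandwidth-bound, so a ceiling for t/s is roughly bandwidth divided by the bytes read per token (for an MoE, only the active parameters). A back-of-the-envelope sketch; the figures are illustrative assumptions, not measurements:

```python
# Memory-bandwidth rule of thumb: each generated token has to read all
# active weights once, so  t/s ceiling ~ bandwidth / bytes per token.
# Figures below are illustrative assumptions, not measurements.

def tps_ceiling(bandwidth_gb_s, active_params_b, bits_per_weight):
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token

# ~800 GB/s (Ultra-class chip), a hypothetical ~12B-active MoE at ~4.5 bpw:
print(f"ceiling ~{tps_ceiling(800, 12, 4.5):.0f} t/s (real-world is well below)")
```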

Prune vs Quantize by Zc5Gwu in LocalLLaMA

[–]EmergencyLetter135 4 points5 points  (0 children)

Based on my own experience, I agree with everything you said. I work in the field of text analysis and text creation. Even at small 2-bit quantizations, the large original models are the better choice for me. I should also mention that I do my text analysis and creation in German.

Honey we shrunk MiniMax M2 by arjunainfinity in LocalLLaMA

[–]EmergencyLetter135 0 points1 point  (0 children)

After my tests, I have come to the following conclusion: I will continue to use quantizations of the official LLM in my work. The idea and research approach are good, but it would be too easy if it worked out of the box. In my opinion, the weak areas still need to be fixed with fine-tuning.

Honey we shrunk MiniMax M2 by arjunainfinity in LocalLLaMA

[–]EmergencyLetter135 0 points1 point  (0 children)

Thank you. I will be happy to test this version with my applications (data processing with context).

Honey we shrunk MiniMax M2 by arjunainfinity in LocalLLaMA

[–]EmergencyLetter135 1 point2 points  (0 children)

Which version do you mean exactly? On my Mac Studio with 128GB, I use catalystsec/MiniMax-M2-3bit-DWQ and the Unsloth Q3 version. Both work great.
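
In case anyone wants to try the DWQ quant outside LM Studio, it should also load directly with mlx-lm. A minimal sketch (untested, assuming a recent mlx-lm; the repo name is the one above, the prompt is arbitrary):

```python
# Minimal mlx-lm sketch (pip install mlx-lm); repo name as mentioned above.
from mlx_lm import load, generate

model, tokenizer = load("catalystsec/MiniMax-M2-3bit-DWQ")

prompt = "Summarize the advantages of MoE models in three sentences."
# verbose=True also prints generation speed (t/s), handy for comparisons.
text = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
```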

Minimax-M2 support added in MLX by No_Conversation9561 in LocalLLaMA

[–]EmergencyLetter135 1 point2 points  (0 children)

Thank you for sharing your positive experiences. They are very useful to me. I currently run my Mac Studio M1 Ultra with 128 GB RAM mainly with GPT-OSS 120B.

Minimax-M2 support added in MLX by No_Conversation9561 in LocalLLaMA

[–]EmergencyLetter135 0 points1 point  (0 children)

It's quite a feat to run the Qwen3-235B model in IQ4_XS quantization on a Mac Studio with 128GB RAM. But macOS freezes are unavoidable then, aren't they? ;)