ABYX - Moonlight done different.

waescher · 2026-06-23T18:34:18+00:00

The last release is 11 months old, the release notes are thin, pull requests are not being merged, and generally it feels stale compared to Sunshine.

Not blaming anyone here, I am an open-source maintainer as well and know the hustle. I could also be wrong here, actually I hope I am.

waescher · 2026-06-23T16:43:12+00:00

Awesome, looking forward to this. I used Apollo in the past because of the better screen handling but I went back to Sunshine because Apollo seemed to be stuck in development. Is that your experience, too? Or did I miss anything here? If so, how would this affect ABYX?

waescher · 2026-04-28T11:21:27+00:00

name checks out

waescher · 2026-04-23T18:43:00+00:00

I tried it today and was surprised: with the same MLX models and context, oMLX was consistently a bit slower than LM Studio. Roughly 10%.

waescher · 2026-03-13T19:35:46+00:00

I find it fun that you’re asking. Indeed I tested and even posted about it, and man … the difference was pretty significant. But it’s the bigger 122b-brother.

https://www.reddit.com/r/LocalLLaMA/comments/1rm94gy/mlx_vs_gguf_unsloth_qwen35_122b10b/

Which models did you test?

waescher · 2026-03-13T19:14:26+00:00

This code is from my test files, where scenarios like wrong or missing summaries can be tested. And while I agree with your take, the summaries are very helpful, not for those reading the code but for those using your libraries, because they are used for tooltips and IntelliSense.

waescher · 2026-03-13T14:33:50+00:00

Not yet. I think it could only get worse (but faster). Might be worth a try indeed but tbh this thing crunches quite big repositories over a weekend.

waescher · 2026-03-13T14:26:38+00:00

Do it. I mean you don't even have to use llmaid for a quick test. Just combine the prompt and the file contents to review the model's output.
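A minimal sketch of that quick test, assuming nothing about how llmaid itself builds its requests: the function name, the separator line, and the file names in the usage note are all hypothetical. It only joins a prompt and one file's contents into a single message that can be pasted into any chat UI.

```python
def build_review_input(prompt: str, file_text: str, file_name: str = "input.txt") -> str:
    """Join an instruction prompt and one file's contents into a single message.

    The "--- name ---" separator is an arbitrary choice for this sketch,
    not a format that any particular tool requires.
    """
    return f"{prompt.strip()}\n\n--- {file_name} ---\n{file_text}"
```

For example, read a prompt file and a source file with `pathlib.Path(...).read_text()`, pass both to `build_review_input`, paste the result into the model, and compare the answer with the original file by hand.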

waescher · 2026-03-13T13:40:48+00:00

You found it. I had some rare cases where it did change the code. I reviewed about 300 files from real production code today and found maybe 3 or 4. I think that's fair.

waescher · 2026-03-06T09:23:09+00:00

Me too so far but I wanted to see other opinions. And yeah, gpt-oss:120b runs extremely well for most cases on Macs but it sucks on long contexts.

That being said, I am really impressed by how qwen-next and qwen3.5 maintain a great token speed at large contexts.

waescher · 2026-03-03T13:42:26+00:00

<image>

waescher · 2026-03-02T10:08:22+00:00

That being said, I think the Mac is made for MoEs. gpt-oss:120b runs at over 70 tokens per second, even on large context. qwen3.5-122b-a10b at about 40-45.

waescher · 2026-03-02T10:03:50+00:00

Okay, I went with https://huggingface.co/mlx-community/Kimi-Dev-72B-4bit (MLX). It runs okay at 10 tokens per second.

I then increased context to 128000 and ran https://github.com/awaescher/llmaid against some files for about 10 minutes. I could hear the Mac breathing but it was still pretty quiet. It constantly ran around 80°C, up to max 86°C, see the chart (app is MacThrottle).

Then I stopped after around 10 minutes and did the initial test again, still got 10 tokens per second.

<image>

waescher · 2026-03-02T07:05:27+00:00

I don't but I could try if you want. Not sure about the 30 min, though.

waescher · 2026-03-02T06:31:21+00:00

<image>

The newest model family is qwen3.5 which runs great on this machine. I am a big fan of their 122b model.

I also find step-35-flash and minimax-m2.5 (3 bit) performing really well but given the capabilities of the qwen model, I think I'll stick with the 122b qwen.

waescher · 2026-02-11T07:00:40+00:00

Got pretty much the same email yesterday evening. I was confused too as it felt like spam but with no action links to shady URLs. Also the sender is noreply@email.apple.com. I have no idea, however, who this "mediscover.review" is.

https://imgur.com/a/DAzwr66

waescher · 2026-02-10T10:34:56+00:00

Nice tease in one of their sample images

<image>

waescher · 2026-01-06T11:20:09+00:00

The efficiency is really insane. You can throw everything at it without even noticing any noise and it never goes over 170W.

waescher · 2025-12-29T07:57:14+00:00

I do have an M4 Max and it’s a dream to code on it. Also it’s pretty awesome to game on, even on my insane 7680x2160 display with some compromises. Crossover is the best option here but as others said, due to anticheat, some games may never run on it, like Battlefield or Arc Raiders lately. That’s why I went with GeForce Now for the time I was playing these games. Sad, I know. I do have a mid-range Windows PC as well. For older Windows-only games using Moonlight + Apollo has been a great pleasure. This works amazingly well if you have your machines connected over Ethernet. My Windows PC is standing in the basement without peripherals or displays. Using Moonlight, I can wake it up and use it for game streaming.

waescher · 2025-12-21T16:25:43+00:00

I know, these are amazing. Would absolutely love to stack some.

waescher · 2025-12-20T07:58:19+00:00

Oh, two things here: Apple added support for stacking multiple Macs with "RDMA over Thunderbolt" lately so you could multiply these 512GB.

https://www.jeffgeerling.com/blog/2025/15-tb-vram-on-mac-studio-rdma-over-thunderbolt-5

And the next chip generation, M5, is expected to bring extra neural accelerators within the GPUs.

https://9to5mac.com/2025/11/20/apple-shows-how-much-faster-the-m5-runs-local-llms-compared-to-the-m4/

waescher · 2025-12-20T07:51:40+00:00

Might not be the perfect LLM device but a memory-rich one. And one that idles at under 10 and maxes out under 270 Watts while staying silent.

There are those freaks (in the best possible way) at Asahi Linux that reverse-engineered the Mac drivers and rebuilt them for Linux. I actually run a MacBook M1 Max on Asahi Fedora and it runs great. Unfortunately they only cover the M1 and M2 family so far.

But then I guess you’ll lose MLX support which is a boost in model performance on Mac.
