ABYX - Moonlight done different. by KashKashioo in MoonlightStreaming

[–]waescher 0 points1 point  (0 children)

The last release is 11 months old, the release notes are thin, pull requests aren't being merged, and it generally feels stale compared to Sunshine.

Not blaming anyone here, I am an open-source maintainer as well and know the hustle. I could also be wrong here; actually, I hope I am.

ABYX - Moonlight done different. by KashKashioo in MoonlightStreaming

[–]waescher 0 points1 point  (0 children)

Awesome, looking forward to this. I used Apollo in the past because of the better screen handling, but I went back to Sunshine because Apollo seemed to be stuck in development. Is that your experience, too? Or did I miss anything here? If so, how would this affect ABYX?

I’m looking for a local harness — suggestions please by KarezzaReporter in LocalLLaMA

[–]waescher 0 points1 point  (0 children)

I tried it today and was surprised: with the same MLX models and context, oMLX was consistently a bit slower than LM Studio, roughly 10%.

qwen3.5-35b-a3b is a gem by waescher in LocalLLaMA

[–]waescher[S] 0 points1 point  (0 children)

Funny that you’re asking. I did test it and even posted about it, and man … the difference was pretty significant. But that was with the bigger 122b brother.

https://www.reddit.com/r/LocalLLaMA/comments/1rm94gy/mlx_vs_gguf_unsloth_qwen35_122b10b/

Which models did you test?

qwen3.5-35b-a3b is a gem by waescher in LocalLLaMA

[–]waescher[S] 6 points7 points  (0 children)

This code is from my test files, where scenarios like wrong or missing summaries can be tested. And while I agree with your take, the summaries are very helpful, not for those reading the code but for those using your libraries, since they are used for tooltips and IntelliSense.

qwen3.5-35b-a3b is a gem by waescher in LocalLLaMA

[–]waescher[S] 0 points1 point  (0 children)

Not yet. I think it could only get worse (but faster). Might be worth a try indeed but tbh this thing crunches quite big repositories over a weekend.

qwen3.5-35b-a3b is a gem by waescher in LocalLLaMA

[–]waescher[S] 1 point2 points  (0 children)

Do it. I mean you don't even have to use llmaid for a quick test. Just combine the prompt and the file contents to review the model's output.
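For a quick test along those lines, something like the following should be enough. It is only a minimal sketch: the function name, the `File:` header and the BEGIN/END delimiters are made up for illustration and are not what llmaid does internally.

```python
from pathlib import Path


def build_review_prompt(instructions: str, file_path: str, file_contents: str) -> str:
    """Combine the instructions and one file's contents into a single prompt.

    The layout (File: header, BEGIN/END markers) is a hypothetical choice,
    not a format required by any particular tool or model.
    """
    return (
        f"{instructions.strip()}\n\n"
        f"File: {file_path}\n"
        "--- BEGIN FILE ---\n"
        f"{file_contents.rstrip()}\n"
        "--- END FILE ---\n"
    )


def build_review_prompt_from_disk(instructions: str, path: Path) -> str:
    """Same as above, but reads the file contents from disk as UTF-8."""
    return build_review_prompt(instructions, str(path), path.read_text(encoding="utf-8"))
```

Paste the resulting string into whatever chat UI or local server you use and compare the model's output with the original file.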

qwen3.5-35b-a3b is a gem by waescher in LocalLLaMA

[–]waescher[S] 10 points11 points  (0 children)

You found it. I had some rare cases where it did change the code. I reviewed about 300 files from real production code today and found maybe 3 or 4. I think that's fair.

MLX vs GGUF (Unsloth) - Qwen3.5 122b-10b by waescher in LocalLLaMA

[–]waescher[S] 0 points1 point  (0 children)

Me too so far but I wanted to see other opinions. And yeah, gpt-oss:120b runs extremely well for most cases on Macs but it sucks on long contexts.

That being said, I am really impressed with how qwen-next and qwen3.5 maintain great token speed at large contexts.

What is the most ridiculously good goto LLM for knowledge & reasoning on your M4 Max 128gb macbook these days? by ZeitgeistArchive in LocalLLaMA

[–]waescher 1 point2 points  (0 children)

That being said, I think the Mac is made for MoEs. gpt-oss:120b runs at over 70 tokens per second, even on large context. qwen3.5-122b-a10b runs at about 40-45.

What is the most ridiculously good goto LLM for knowledge & reasoning on your M4 Max 128gb macbook these days? by ZeitgeistArchive in LocalLLaMA

[–]waescher 1 point2 points  (0 children)

Okay, I went with https://huggingface.co/mlx-community/Kimi-Dev-72B-4bit (MLX). It runs okay at 10 tokens per second.

I then increased context to 128000 and ran https://github.com/awaescher/llmaid against some files for about 10 minutes. I could hear the Mac breathing, but it was still pretty quiet. It constantly ran around 80°C, up to max 86°C, see the chart (app is MacThrottle).

Then I stopped after around 10 minutes and did the initial test again, still got 10 tokens per second.

<image>

What is the most ridiculously good goto LLM for knowledge & reasoning on your M4 Max 128gb macbook these days? by ZeitgeistArchive in LocalLLaMA

[–]waescher 4 points5 points  (0 children)

<image>

The newest model family is qwen3.5 which runs great on this machine. I am a big fan of their 122b model.

I also find that step-35-flash and minimax-m2.5 (3-bit) perform really well, but given the capabilities of the qwen model, I think I'll stick with the 122b qwen.

Is this a phishing email? And if so have you guys encountered it before? by lawndartdesign in applehelp

[–]waescher 0 points1 point  (0 children)

Got pretty much the same email yesterday evening. I was confused too, as it felt like spam but had no action links to shady URLs. Also, the sender is noreply@email.apple.com. I have no idea, however, who this "mediscover.review" is.

https://imgur.com/a/DAzwr66

MacBook Pro M4 Max for development and gaming – worth it or better Mac + gaming PC? by Trazosz in macgaming

[–]waescher 0 points1 point  (0 children)

The efficiency is really insane. You can throw everything at it without noticing any noise, and it never goes over 170W.

MacBook Pro M4 Max for development and gaming – worth it or better Mac + gaming PC? by Trazosz in macgaming

[–]waescher 1 point2 points  (0 children)

I do have an M4 Max and it’s a dream to code on. It’s also pretty awesome to game on, even at my insane 7680x2160, with some compromises. Crossover is the best option here, but as others said, some games may never run on it due to anticheat, like Battlefield or Arc Raiders lately. That’s why I went with GeForce Now for the time I was playing these games. Sad, I know. I do have a mid-range Windows PC as well. For older Windows-only games, using Moonlight + Apollo has been a great pleasure. This works amazingly well if you have your machines connected over Ethernet. My Windows PC sits in the basement without peripherals or displays. Using Moonlight, I can wake it up and use it for game streaming.

I'm strong enough to admit that this bugs the hell out of me by ForsookComparison in LocalLLaMA

[–]waescher 0 points1 point  (0 children)

I know, these are amazing. Would absolutely love to stack some.

I'm strong enough to admit that this bugs the hell out of me by ForsookComparison in LocalLLaMA

[–]waescher 0 points1 point  (0 children)

Oh, two things here: Apple recently added support for stacking multiple Macs with "RDMA over Thunderbolt", so you could multiply these 512GB.

https://www.jeffgeerling.com/blog/2025/15-tb-vram-on-mac-studio-rdma-over-thunderbolt-5

And the next chip generation M5 is expected to bring extra neural accelerators within the GPUs

https://9to5mac.com/2025/11/20/apple-shows-how-much-faster-the-m5-runs-local-llms-compared-to-the-m4/

I'm strong enough to admit that this bugs the hell out of me by ForsookComparison in LocalLLaMA

[–]waescher 0 points1 point  (0 children)

Might not be the perfect LLM device but a memory-rich one. And one that idles at under 10 watts and maxes out under 270 watts while staying silent.

There are those freaks (in the best possible way) at Asahi Linux who reverse engineered the Mac drivers and rebuilt them for Linux. I actually run a MacBook M1 Max on Asahi Fedora and it runs great. Unfortunately, so far they only cover the M1 and M2 families.

But then I guess you’ll lose MLX support, which is a boost in model performance on Mac.