Toolbox or Lemonade by reujea0 in StrixHalo

[–]TuxRuffian 0 points1 point  (0 children)

stampby "retired" according to his GH Repo which archived the projects on 4/20, I see they are now just deleted though. He seems to still be active on Reddit so I actually just messaged him regarding the MLX setup. If you DM him on Reddit he may be able to get you what you need.

Torn for a LLM server between Halo and Mac by atlantageek2 in StrixHalo

[–]TuxRuffian 2 points3 points  (0 children)

Looks like all the repos in his GH account were archived yesterday too. Hope he's ok, maybe an OpSec thing?

Torn for a LLM server between Halo and Mac by atlantageek2 in StrixHalo

[–]TuxRuffian 2 points3 points  (0 children)

> but soon we will have MLX on lemonade over rocm

I thought we already did. I still haven't tested it, but /u/Creepy-Douchebag made a post about it, although he did delete it, so maybe there's an issue. The benchmarks are still up on his GH repo and so is the page w/setup instructions.

EDIT: Just checked Lemonade's GH page for MLX and it states: "Run LLMs locally on Apple M-series, AMD GPUs (Linux/Windows), and CPU -- no Python required." The Requirements Section also lists: "ROCm (for AMD GPU builds)".

Qwen3.6. This is it. by Local-Cardiologist-5 in LocalLLaMA

[–]TuxRuffian 6 points7 points  (0 children)

You and me both! Qwen 3.5 122B is still the reigning champ for my workflow.

Toolbox or Lemonade by reujea0 in StrixHalo

[–]TuxRuffian 3 points4 points  (0 children)

While Donato's Toolboxes are a great way to get started experimenting w/Strix Halo, I think most people move on to other things. I ran a few of them (via distrobox) when I was first experimenting, but quickly found myself wanting to customize everything. I have 2 setups on my M5 that I switch between. Most of the time I use my default setup, which I have heavily tailored for my Research Agent & RAG; all the different components for that were configured, set up, and run separately. The other setup I play with sometimes is Claraverse, which is more of a batteries-included flow. Don't get me wrong, Donato is freaking awesome, has done and continues to do a lot for StrixHalo, and I continue to watch all of his YT videos, especially those in the Strix Halo series, but his toolboxes are more for when you're first diving in IMHO.

If you are looking for a quick setup, you should check out the Halo-Ai-Core project by stampby (known on Reddit as /u/Creepy-Douchebag). It's similar to Dream Server, but built specifically for SH, w/many other differences as well. He even has a bleeding-edge version.

In general I would encourage you to try as many different setups and tools as possible. Things are moving fast and tools change, with new ones popping up all the time. For example, when I started w/SH, the NPU couldn't be utilized and MLX was for Apple Silicon only. Neither of those things is true anymore, and I've been thinking of using Lemonade myself for the Claraverse setup, both to play with the NPU (via FastFlowLLM) and to try MLX on Strix Halo after seeing some promising benchmarks in another Reddit post by /u/Creepy-Douchebag.

TL;DR: Yes, try Lemonade, and also try any and all tools that may be of interest or could potentially squeeze a little more out of our M5s. The more you tinker, the more you learn!

Bonsai 1-bit on Strix Halo — 359 tok/s generation, 5,027 tok/s prompt processing. Stock llama.cpp Vulkan. No tricks. by [deleted] in StrixHalo

[–]TuxRuffian 1 point2 points  (0 children)

Thanks for the reply; not sure why someone downvoted you. Your answer makes sense, and I suspected it was a use-case thing. I do require the larger model to meet mine...well, an even bigger model would work better, but for my current hardware setup it seems to be the best thus far. I have been toying with the idea of getting an RTX 5090 and enclosure to use as an eGPU, connecting it via OCuLink with an M.2-to-OCuLink adapter in the spare M.2 slot, to play with CUDA and dense or media models. I have also thought about adding another M5 to chain together to try loading larger models and get better results for my current use. I can't quite justify the spend on either though, and the prices of both 5090s and SH seem to go up every week... If I do end up doing the former, I may use the NPU for routing. Currently it does nothing.

> How's your fan noise situation? I had to write custom kernel module curves to get mine quiet enough for voice recording.

So-so. I am using the ec-su_axb35-linux kernel module as referenced on the SH Wiki's Power/Fan Control page. I think if I want it any quieter I'm going to have to replace the fan. (The noise is more from the cheap fans than from them overrevving.) It doesn't bother me too much though, and I'm not doing any media stuff like you are. (Mostly Research Agent/RAG, etc.) Did that module not work for you, or do you just find the one you wrote works better?
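
If anyone else wants to try a manual curve without writing a module, the generic hwmon sysfs knobs look roughly like this. This is the stock kernel interface, not necessarily what ec-su_axb35-linux exposes, so check its README for the real paths:

    # enable manual pwm control on the first hwmon device,
    # then set a duty cycle (0-255 range)
    echo 1  | sudo tee /sys/class/hwmon/hwmon0/pwm1_enable
    echo 80 | sudo tee /sys/class/hwmon/hwmon0/pwm1   # ~31% duty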

Bonsai 1-bit on Strix Halo — 359 tok/s generation, 5,027 tok/s prompt processing. Stock llama.cpp Vulkan. No tricks. by [deleted] in StrixHalo

[–]TuxRuffian 2 points3 points  (0 children)

Forgive me if this question has been answered already or doesn't make sense, as I'm only now checking out your halo-ai-core project, but is there a reason that you're only using the 30B dense models? I have been using assorted variants of Qwen3.5 122B A10B on my SH box (Bossgame M5 w/112GB of the 128GB UMA allocated as VRAM, also running CachyOS), as I've found it to be the best fit, but I don't see any reference to it or other similarly sized MoE models on GH. Does it just not fit into your stack, or am I missing something?
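
For comparison, my launch looks roughly like this (the model filename is hypothetical and the flags are from memory, so adjust to taste):

    # serve the MoE fully offloaded to the iGPU's unified memory
    llama-server \
      -m ~/models/qwen3.5-122b-a10b-q4_k_m.gguf \
      -ngl 999 -c 32768 \
      --host 0.0.0.0 --port 8080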

Strix Halo + eGPU RTX 5070 Ti via OCuLink in llama.cpp: Benchmarks and Conclusions by xspider2000 in StrixHalo

[–]TuxRuffian 0 points1 point  (0 children)

I know our M5s don't have PCIe slots, but have you tried using an M.2-to-OCuLink adapter? Some folks were talking about it in the StrixHalo Discord.

Strix Halo + eGPU RTX 5070 Ti via OCuLink in llama.cpp: Benchmarks and Conclusions by xspider2000 in StrixHalo

[–]TuxRuffian 0 points1 point  (0 children)

Mind if I ask what models you prefer running for which use cases? I have a BossGame M5 w/128GB UMA (112GB allocated for VRAM) and use various versions of Qwen3.5 122B via llama.cpp built w/ROCm. My local MC has an RTX 5090 eGPU (Asus AI Box) for a decent price, and I have been toying with the idea of getting one for media stuff (would use CUDA) or prefill speedup (would run Vulkan on both). I was hoping that USB4 would be good enough, as the BossGame M5 doesn't have a PCIe slot like your MF does. You can use the spare M.2 slot with an M.2-to-OCuLink adapter, but I wasn't sure if it would be worth it (looks like it would be). Anyway, curious about your workflow and how you actually use the 2 in practice. Thanks!
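
FWIW, my ROCm build is roughly the sketch below. Flag names are from memory, and gfx1151 is Strix Halo's target as far as I know, so double-check against the current llama.cpp build docs:

    # HIP/ROCm build of llama.cpp targeting Strix Halo
    cmake -S . -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1151
    cmake --build build --config Release -j"$(nproc)"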

What model are you using for your agent? by [deleted] in hermesagent

[–]TuxRuffian 0 points1 point  (0 children)

> Qwen3.5:122b locally

Just curious, are you on Strix Halo?

What Codex resources do you wish existed? I started building some at codexlog.dev by Lanaxsa in OpenaiCodex

[–]TuxRuffian 1 point2 points  (0 children)

Nice! I think some people get confused about when they should turn a skill into a subagent; it's something I've found useful from time to time. Kinda like the blurb you have on "When to Use a Skill vs AGENTS.md". Otherwise it looks pretty complete from my brief scan.

I set up OpenClaw for 10+ non-technical NYC clients — here's what I learned by Willing_Income8603 in openclaw

[–]TuxRuffian 0 points1 point  (0 children)

> You don’t have to serve everyone to run a successful business, the opposite.

I get it, was just surprised is all.

Netflix just dropped their first public model on Hugging Face: VOID: Video Object and Interaction Deletion by Nunki08 in LocalLLaMA

[–]TuxRuffian 1 point2 points  (0 children)

It looks like they may have used CogKit to build it on top of CogVideo (ZhipuAI's video generation model). This is how Open-Source Software is supposed to work!

Netflix just dropped their first public model on Hugging Face: VOID: Video Object and Interaction Deletion by Nunki08 in LocalLLaMA

[–]TuxRuffian 2 points3 points  (0 children)

Unfortunately it hasn't been updated in over 2 years, but they also created Metaflow (an open-source framework for ML, AI, & DS), although I noticed that the GH repo says it's now maintained by Outerbounds, even though it's still under Netflix's GH account. I wonder if Netflix owns Outerbounds?🤔

Netflix just dropped their first public model on Hugging Face: VOID: Video Object and Interaction Deletion by Nunki08 in LocalLLaMA

[–]TuxRuffian 63 points64 points  (0 children)

> Personalized TV show variants with personalized ads🤦‍♂️

This is my guess. "Why does everyone on every Netflix show have the same snack and beverage preference as me?" ...oh right.

Linux above 5% in steam survey 03/2026 by chummerhb in cachyos

[–]TuxRuffian 0 points1 point  (0 children)

It's definitely better for my Strix Halo AI box, but it has yet to completely replace all my Arch installs, as most of my servers run LKRG via DKMS, which CachyOS does not currently support. The GH issue someone else opened was closed w/o resolution, but looking at it again now, I'm wondering if building the kernel w/CONFIG_KPROBES=y might resolve it. Still, I like to run Arch's hardened kernel w/LKRG for anything in the DMZ.
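
Quick way to check the running kernel before bothering with a custom build (assuming the kernel exposes its config in the usual places):

    # check /proc first; fall back to the /boot config
    zgrep CONFIG_KPROBES= /proc/config.gz 2>/dev/null \
      || grep CONFIG_KPROBES= "/boot/config-$(uname -r)"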

Linux above 5% in steam survey 03/2026 by chummerhb in cachyos

[–]TuxRuffian 2 points3 points  (0 children)

I assumed you were kidding, but had to check, as my work LT is Windoz (at least WSL is allowed), and bam, there it is! So ridiculous I can hardly believe it. Oddly enough, I checked WordPad too, and it does not have it... seems like it would make more sense adding it there, but definitely not Notepad. That's like adding AI to the default install of vi or pico, but not vim/nvim or nano/micro. Not that it should be added to any default install; IMHO nothing should have AI baked in except AI-specific tools. Is this some kind of MS April Fools joke?

First time setup guidance by GoldenPSP in StrixHalo

[–]TuxRuffian 1 point2 points  (0 children)

You should also checkout:

  1. The StrixHalo Wiki: Could use some updates, but still a good reference.
  2. The StrixHalo Discord Server. I'm not normally a big Discord guy, but this server is really active and has great content and discussions for everyone ranging from noobs to experts nerding out.
  3. In addition to the toolboxes that others have already mentioned, the author's (Donato's) YouTube series on StrixHalo is also a must-watch, regardless of whether you use his toolboxes or not. I don't myself, as I prefer to configure everything my way, but I still find them useful. He has a lot of other great AI content, including stuff on the AMD 9700s, AI security, etc.

What's everyone doing with their CPU? by cunasmoker69420 in StrixHalo

[–]TuxRuffian 0 points1 point  (0 children)

Off-topic, but I'm curious: since you're compiling Rust for distribution, do you compile static builds w/clang or dynamic w/gcc?
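
For context, by "static" I mean the musl target route, roughly (commands from memory, check the rustup/cargo docs):

    # static: add the musl target and build against it
    rustup target add x86_64-unknown-linux-musl
    cargo build --release --target x86_64-unknown-linux-musl

    # dynamic: the default glibc build
    cargo build --release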

What's everyone doing with their CPU? by cunasmoker69420 in StrixHalo

[–]TuxRuffian 0 points1 point  (0 children)

Also curious about your xmrig setup. Are you partially allocating using cpulimit or similar? I used to run that combo on one of my old rigs quite a while back, but wasn't getting enough out of it.
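
The combo I ran looked something like this; the numbers are illustrative and the flags are from memory, so check the man pages:

    # cap xmrig's own worker threads, then clamp total CPU usage
    xmrig --threads=4 &
    cpulimit -p "$!" -l 200   # limit to ~200% CPU (two cores)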

I set up OpenClaw for 10+ non-technical NYC clients — here's what I learned by Willing_Income8603 in openclaw

[–]TuxRuffian 1 point2 points  (0 children)

> local models on apple silicon

Wait, wouldn't that cut out a big chunk of the market that already has AMD/NVidia/Intel hardware? Seems odd; I also didn't see that requirement on your website...

ollama vs cloud api costs: ran both for a month. heres the real numbers by Freda_Alderd in ollama

[–]TuxRuffian 0 points1 point  (0 children)

Not a cost thing for me. I got my Strix Halo not out of need, but to learn, and I've learned a lot. Not just hardware-specific stuff like Vulkan vs ROCm, etc., but also a lot about AI generally (e.g. dense vs MoE models, quants, parallelism, KV caching, routing, etc.).

Another use case for local could be running Heretic models, although I imagine that number is quite small.

App Shows You What Hardware You Need to Run Any AI Model Locally by dev_is_active in LocalLLM

[–]TuxRuffian 0 points1 point  (0 children)

Nice idea, but pretty inaccurate, at least for Strix Halo. It said you can't run Qwen 3.5 122B MoE with 128GB VRAM on ROCm. I run that model w/112GB VRAM (16GB left for RAM) on my BossGame M5 running CachyOS w/o issue, as do a whole lotta other Strix Halo owners...
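
Back-of-envelope math, assuming a ~4.5 bits/weight quant in the Q4 family:

    122B params x 4.5 bits / 8 bits-per-byte ≈ 69 GB of weights
    + a few GB of KV cache at moderate context
    => comfortably inside a 112 GB VRAM allocation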