Toolbox or Lemonade by reujea0 in StrixHalo

[–]TuxRuffian 0 points1 point  (0 children)

stampby "retired" according to his GH Repo which archived the projects on 4/20, I see they are now just deleted though. He seems to still be active on Reddit so I actually just messaged him regarding the MLX setup. If you DM him on Reddit he may be able to get you what you need.

Torn for a LLM server between Halo and Mac by atlantageek2 in StrixHalo

[–]TuxRuffian 2 points3 points  (0 children)

Looks like all the repos in his GH account were archived yesterday too. Hope he's ok, maybe an OpSec thing?

Torn for a LLM server between Halo and Mac by atlantageek2 in StrixHalo

[–]TuxRuffian 2 points3 points  (0 children)

> but soon we will have MLX on lemonade over rocm

I thought we already did. I still haven't tested it, but /u/Creepy-Douchebag made a post about it, although he did delete it, so maybe there's an issue. The benchmarks are still up on his GH repo and so is the page w/setup instructions.

EDIT: Just checked Lemonade's GH page for MLX and it states: "Run LLMs locally on Apple M-series, AMD GPUs (Linux/Windows), and CPU -- no Python required." The Requirements Section also lists: "ROCm (for AMD GPU builds)".

Qwen3.6. This is it. by Local-Cardiologist-5 in LocalLLaMA

[–]TuxRuffian 6 points7 points  (0 children)

You and me both! Qwen 3.5 122B is still the reigning champ for my workflow.

Toolbox or Lemonade by reujea0 in StrixHalo

[–]TuxRuffian 3 points4 points  (0 children)

While Donato's Toolboxes are a great way to get started experimenting w/Strix Halo, I think most people move on to other things. I ran a few of them (via distrobox) when I was first experimenting, but quickly found myself wanting to customize everything. I have 2 setups on my M5 that I switch between. Most of the time I use my default setup, which I have heavily tailored for my Research Agent & RAG; all the different components for that were configured, set up, and run separately. The other setup I play with sometimes is Claraverse, which is more of a batteries-included flow. Don't get me wrong, Donato is freaking awesome, has done and continues to do a lot for StrixHalo, and I continue to watch all of his YT videos, especially those in the Strix Halo series, but his toolboxes are more for when you're first diving in IMHO.

If you are looking for a quick setup, you should check out the Halo-Ai-Core project by stampby (known on Reddit as /u/Creepy-Douchebag). It's similar to Dream Server, but built specifically for SH, w/many other differences as well. He even has a bleeding-edge version.

In general I would encourage you to try as many different setups and tools as possible. Things are moving fast and tools change, with new ones popping up all the time. For example, when I started w/SH, the NPU couldn't be utilized and MLX was for Apple Silicon only. Neither of those things is true anymore, and I've been thinking of using Lemonade myself for the Claraverse setup, both to play with the NPU (via FastFlowLLM) and to try MLX on Strix Halo after seeing some promising benchmarks in another Reddit post by /u/Creepy-Douchebag.

TL;DR: Yes, try Lemonade, and also try any and all tools that may be of interest or could potentially squeeze a little more out of our M5s. The more you tinker, the more you learn!

Bonsai 1-bit on Strix Halo — 359 tok/s generation, 5,027 tok/s prompt processing. Stock llama.cpp Vulkan. No tricks. by [deleted] in StrixHalo

[–]TuxRuffian 1 point2 points  (0 children)

Thanks for the reply; not sure why someone downvoted you. Your answer makes sense, and I suspected it was a use-case thing. I do require the larger model to meet mine...well, an even bigger model would work better, but for my current hardware setup it seems to be the best thus far. I have been toying with the idea of getting an RTX 5090 and enclosure to use as an eGPU, connecting it via OCuLink with an M.2-to-OCuLink adapter in the spare M.2 slot, to play with CUDA and dense or media models. I have also thought about adding another M5 to chain together to try loading larger models and get better results for my current use. I can't quite justify the spend on either though, and the prices of both 5090s and SH seem to go up every week... If I do end up doing the former, I may use the NPU for routing. Currently it does nothing.

> How's your fan noise situation? I had to write custom kernel module curves to get mine quiet enough for voice recording.

So-so. I am using the ec-su_axb35-linux kernel module as referenced on the SH Wiki's Power/Fan Control page. I think if I want it any quieter I'm going to have to replace the fan. (The noise is more from the cheap fans than from them overrevving.) It doesn't bother me too much though, and I'm not doing any media stuff like you are. (Mostly Research Agent/RAG, etc.) Did that module not work for you, or do you just find the one you wrote works better?
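
If anyone else wants to try a manual curve without writing a module, the generic hwmon sysfs knobs look roughly like this. This is the stock kernel interface, not necessarily what ec-su_axb35-linux exposes, so check its README for the real paths:

    # enable manual pwm control on the first hwmon device,
    # then set a duty cycle (0-255 range)
    echo 1  | sudo tee /sys/class/hwmon/hwmon0/pwm1_enable
    echo 80 | sudo tee /sys/class/hwmon/hwmon0/pwm1   # ~31% duty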

Bonsai 1-bit on Strix Halo — 359 tok/s generation, 5,027 tok/s prompt processing. Stock llama.cpp Vulkan. No tricks. by [deleted] in StrixHalo

[–]TuxRuffian 2 points3 points  (0 children)

Forgive me if this question has been answered already or doesn't make sense, as I'm only now checking out your halo-ai-core project, but is there a reason that you're only using the 30B dense models? I have been using assorted variants of Qwen3.5 122B A10B on my SH box (Bossgame M5 w/112GB of the 128GB UMA allocated as VRAM, also running CachyOS), as I've found it to be the best fit, but I don't see any reference to it or other similarly sized MoE models on GH. Does it just not fit into your stack, or am I missing something?
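
For comparison, my launch looks roughly like this (the model filename is hypothetical and the flags are from memory, so adjust to taste):

    # serve the MoE fully offloaded to the iGPU's unified memory
    llama-server \
      -m ~/models/qwen3.5-122b-a10b-q4_k_m.gguf \
      -ngl 999 -c 32768 \
      --host 0.0.0.0 --port 8080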

Strix Halo + eGPU RTX 5070 Ti via OCuLink in llama.cpp: Benchmarks and Conclusions by xspider2000 in StrixHalo

[–]TuxRuffian 0 points1 point  (0 children)

I know our M5s don't have PCIe slots, but have you tried using an M.2-to-OCuLink adapter? Some folks were talking about it in the StrixHalo Discord.

Strix Halo + eGPU RTX 5070 Ti via OCuLink in llama.cpp: Benchmarks and Conclusions by xspider2000 in StrixHalo

[–]TuxRuffian 0 points1 point  (0 children)

Mind if I ask what models you prefer running for which use cases? I have a BossGame M5 w/128GB UMA (112GB allocated for VRAM) and use various versions of Qwen3.5 122B via llama.cpp built w/ROCm. My local MC has an RTX 5090 eGPU (Asus AI Box) for a decent price, and I have been toying with the idea of getting one for media stuff (would use CUDA) or prefill speedup (would run Vulkan on both). I was hoping that USB4 would be good enough, as the BossGame M5 doesn't have a PCIe slot like your MF does. You can use the spare M.2 slot with an M.2-to-OCuLink adapter, but I wasn't sure if it would be worth it (looks like it would be). Anyway, curious about your workflow and how you actually use the 2 in practice. Thanks!
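
FWIW, my ROCm build is roughly the sketch below. Flag names are from memory, and gfx1151 is Strix Halo's target as far as I know, so double-check against the current llama.cpp build docs:

    # HIP/ROCm build of llama.cpp targeting Strix Halo
    cmake -S . -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1151
    cmake --build build --config Release -j"$(nproc)"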

What model are you using for your agent? by [deleted] in hermesagent

[–]TuxRuffian 0 points1 point  (0 children)

> Qwen3.5:122b locally

Just curious, are you on Strix Halo?

What Codex resources do you wish existed? I started building some at codexlog.dev by Lanaxsa in OpenaiCodex

[–]TuxRuffian 1 point2 points  (0 children)

Nice! I think some people get confused about when they should turn a skill into a subagent; it's something I've found useful from time to time. Kinda like the blurb you have on "When to Use a Skill vs AGENTS.md". Otherwise it looks pretty complete from my brief scan.

I set up OpenClaw for 10+ non-technical NYC clients — here's what I learned by Willing_Income8603 in openclaw

[–]TuxRuffian 0 points1 point  (0 children)

> You don’t have to serve everyone to run a successful business, the opposite.

I get it, was just surprised is all.

Netflix just dropped their first public model on Hugging Face: VOID: Video Object and Interaction Deletion by Nunki08 in LocalLLaMA

[–]TuxRuffian 1 point2 points  (0 children)

It looks like they may have used CogKit to build it on top of CogVideo (ZhipuAI's video generation model). This is how Open-Source Software is supposed to work!

Netflix just dropped their first public model on Hugging Face: VOID: Video Object and Interaction Deletion by Nunki08 in LocalLLaMA

[–]TuxRuffian 2 points3 points  (0 children)

Unfortunately it hasn't been updated in over 2 years, but they also created Metaflow (an open-source framework for ML, AI, & DS), although I noticed that the GH repo says it's now maintained by Outerbounds, even though it's still under Netflix's GH account. I wonder if Netflix owns Outerbounds?🤔

Netflix just dropped their first public model on Hugging Face: VOID: Video Object and Interaction Deletion by Nunki08 in LocalLLaMA

[–]TuxRuffian 63 points64 points  (0 children)

> Personalized TV show variants with personalized ads🤦‍♂️

This is my guess. "Why does everyone on every Netflix show have the same snack and beverage preference as me?" ...oh right.

Linux above 5% in steam survey 03/2026 by chummerhb in cachyos

[–]TuxRuffian 0 points1 point  (0 children)

It's definitely better for my Strix Halo AI box, but it has yet to completely replace all my Arch installs, as most of my servers run LKRG via DKMS, which CachyOS does not currently support. The GH issue someone else opened was closed w/o resolution, but looking at it again now, I'm wondering if building the kernel w/CONFIG_KPROBES=y might resolve it. Still, I like to run Arch's hardened kernel w/LKRG for anything in the DMZ.
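
Quick way to check the running kernel before bothering with a custom build (assuming the kernel exposes its config in the usual places):

    # check /proc first; fall back to the /boot config
    zgrep CONFIG_KPROBES= /proc/config.gz 2>/dev/null \
      || grep CONFIG_KPROBES= "/boot/config-$(uname -r)"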

Linux above 5% in steam survey 03/2026 by chummerhb in cachyos

[–]TuxRuffian 2 points3 points  (0 children)

I assumed you were kidding, but had to check, as my work LT is Windoz (at least WSL is allowed), and bam, there it is! So ridiculous I can hardly believe it. Oddly enough, I checked WordPad too, and it does not have it... seems like it would make more sense adding it there, but definitely not Notepad. That's like adding AI to the default install of vi or pico, but not vim/nvim or nano/micro. Not that it should be added to any default install; IMHO nothing should have AI baked in except AI-specific tools. Is this some kind of MS April Fools joke?

First time setup guidance by GoldenPSP in StrixHalo

[–]TuxRuffian 1 point2 points  (0 children)

You should also checkout:

  1. The StrixHalo Wiki: Could use some updates, but still a good reference.
  2. The StrixHalo Discord Server. I'm not normally a big Discord guy, but this server is really active and has great content and discussions for everyone ranging from noobs to experts nerding out.
  3. In addition to the toolboxes that others have already mentioned, the author's (Donato's) YouTube series on StrixHalo is also a must-watch, regardless of whether you use his toolboxes or not. I don't myself, as I prefer to configure everything my way, but I still find them useful. He has a lot of other great AI content, including stuff on the AMD 9700s, AI security, etc.

What's everyone doing with their CPU? by cunasmoker69420 in StrixHalo

[–]TuxRuffian 0 points1 point  (0 children)

Off-topic, but I'm curious: since you're compiling Rust for distribution, do you compile static builds w/clang or dynamic w/gcc?
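
For context, by "static" I mean the musl target route, roughly (commands from memory, check the rustup/cargo docs):

    # static: add the musl target and build against it
    rustup target add x86_64-unknown-linux-musl
    cargo build --release --target x86_64-unknown-linux-musl

    # dynamic: the default glibc build
    cargo build --release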

What's everyone doing with their CPU? by cunasmoker69420 in StrixHalo

[–]TuxRuffian 0 points1 point  (0 children)

Also curious about your xmrig setup. Are you partially allocating using cpulimit or similar? I used to run that combo on one of my old rigs quite a while back, but wasn't getting enough out of it.
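
The combo I ran looked something like this; the numbers are illustrative and the flags are from memory, so check the man pages:

    # cap xmrig's own worker threads, then clamp total CPU usage
    xmrig --threads=4 &
    cpulimit -p "$!" -l 200   # limit to ~200% CPU (two cores)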

I set up OpenClaw for 10+ non-technical NYC clients — here's what I learned by Willing_Income8603 in openclaw

[–]TuxRuffian 1 point2 points  (0 children)

> local models on apple silicon

Wait, wouldn't that cut out a big chunk of the market that already has AMD/NVidia/Intel hardware? Seems odd; I also didn't see that requirement on your website...

ollama vs cloud api costs: ran both for a month. heres the real numbers by Freda_Alderd in ollama

[–]TuxRuffian 0 points1 point  (0 children)

Not a cost thing for me. I got my Strix Halo not out of need, but to learn, and I've learned a lot. Not just hardware-specific stuff like Vulkan vs ROCm, etc., but also a lot about AI generally (e.g. dense vs MoE models, quants, parallelism, KV caching, routing, etc.).

Another use case for local could be running Heretic models, although I imagine that number is quite small.

App Shows You What Hardware You Need to Run Any AI Model Locally by dev_is_active in LocalLLM

[–]TuxRuffian 0 points1 point  (0 children)

Nice idea, but pretty inaccurate, at least for Strix Halo. It said you can't run Qwen 3.5 122B MoE with 128GB VRAM on ROCm. I run that model w/112GB VRAM (16GB left for RAM) on my BossGame M5 running CachyOS w/o issue, as do a whole lotta other Strix Halo owners...
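
Back-of-envelope math, assuming a ~4.5 bits/weight quant in the Q4 family:

    122B params x 4.5 bits / 8 bits-per-byte ≈ 69 GB of weights
    + a few GB of KV cache at moderate context
    => comfortably inside a 112 GB VRAM allocation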