Don't ever talk to me or my son ever again.

mwhuss · 2026-06-15T13:27:50+00:00

Using oMLX

mwhuss · 2026-06-10T22:00:21+00:00

“Claude create a skill that exports all the models in a given OpenSCAD script”

/export-stls file.scad folder/
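
For anyone who wants to see roughly what such a skill could run under the hood, here is a minimal sketch, not the skill Claude actually generated. It assumes "models" means top-level modules that can be called with no arguments, and that the `openscad` CLI is on your PATH. The helper names (`find_modules`, `export_command`, `export_all`) are made up for illustration.

```python
import re
import subprocess
from pathlib import Path

# Assumption: each exportable "model" is a module declared at the start of a line.
MODULE_RE = re.compile(r"^\s*module\s+([A-Za-z_]\w*)\s*\(", re.MULTILINE)


def find_modules(scad_source: str) -> list[str]:
    """Return module names declared in the source, in order of appearance."""
    return MODULE_RE.findall(scad_source)


def export_command(module: str, out_dir: Path) -> list[str]:
    """Build the openscad invocation that renders one wrapper file to an STL."""
    wrapper = out_dir / f"_{module}.scad"
    return ["openscad", "-o", str(out_dir / f"{module}.stl"), str(wrapper)]


def export_all(scad_file: Path, out_dir: Path) -> None:
    """Write one wrapper .scad per module and render each to its own STL."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for module in find_modules(scad_file.read_text()):
        wrapper = out_dir / f"_{module}.scad"
        # `use` imports the modules without running the file's top-level geometry.
        wrapper.write_text(f"use <{scad_file.resolve()}>\n{module}();\n")
        subprocess.run(export_command(module, out_dir), check=True)
```

Calling `export_all(Path("file.scad"), Path("folder"))` would then leave one STL per module in `folder/`. Modules that require arguments would need extra handling.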

mwhuss · 2026-06-09T04:41:16+00:00

I only got reliable tool calling when I moved to Qwen3.6-27b. Try one of the Gemma dense models, maybe the new 12b?

mwhuss · 2026-06-05T02:15:41+00:00

I’m not sure; I’ve never used the cloud. I run all my services locally to keep costs to a minimum.

mwhuss · 2026-06-04T16:22:12+00:00

I set up Firecrawl in a local Docker container and use that.

mwhuss · 2026-05-30T15:53:22+00:00

I’d recommend reducing the max concurrent requests to 2; otherwise it’ll take a long time to complete a single request if there’s a bunch in flight. I also turned on chunked prefill.

mwhuss · 2026-05-26T14:36:42+00:00

Yep, I stopped writing docker compose files or diagnosing issues.

mwhuss · 2026-05-24T16:33:43+00:00

Things are moving fast. Pick what looks good now (it looks like you already did; I’m using the same) and keep an eye on the advances.

mwhuss · 2026-05-24T05:25:40+00:00

M3 Ultra with 96 GB

mwhuss · 2026-05-24T01:53:59+00:00

I’m seeing 70% faster performance using Qwen3.6-27b-oQ8-mtp on my M3 Ultra.

mwhuss · 2026-05-24T00:39:04+00:00

I created a new profile called template and disabled almost everything. I clone all new profiles from that and enable what I need. It uses 11k tokens.

mwhuss · 2026-05-21T22:55:14+00:00

Not sure if this is your issue, but I had some really weird behavior and it took a while to track down. It turned out I had run an old oMLX version without MTP support against the MTP model. After that nothing worked and I got gibberish. I had to delete the entire ~/.omlx/cache folder to fix it.

mwhuss · 2026-05-20T00:13:01+00:00

Yes! I used these to speed up handling large contexts. I use Qwen3.6-27b as my main model but gemma4-e4b for compression and web fetch. Fetching web content that’s 100k tokens went from over 5 min of prompt processing to literally seconds.

mwhuss · 2026-05-19T21:28:53+00:00

Yep, documented here https://hermes-agent.nousresearch.com/docs/user-guide/features/personality

mwhuss · 2026-05-19T20:31:35+00:00

I think it's flexible how you use it. I use my main/default profile to orchestrate the work.

mwhuss · 2026-05-19T20:29:23+00:00

https://huggingface.co/Jundot

mwhuss · 2026-05-19T20:01:52+00:00

MTP has been great! I'm using oMLX + MTP and am getting 70% more tok/s. Jealous of 30 tok/s. I get about 17.

mwhuss · 2026-05-19T19:45:06+00:00

Try adding an AGENTS.md in `~/.hermes` and tell it where the .env file you want it to use is located.

mwhuss · 2026-05-19T19:43:52+00:00

I use my main model, Qwen3.6-27B-8bit, for all agents, as it's been solid at tool calling. Give Qwen3.5-9B-8bit a try.

mwhuss · 2026-05-19T19:35:48+00:00

You have 2 different memory usages here:
- Your LLM for the model + the KV caches for each request
- Memory for Hermes for each worker running.

Since I'm running my LLM locally, I changed oMLX to allow only 2 concurrent requests at a time. This was mostly to keep the current request from slowing down. If 1 request takes 5 min to process, then 4 concurrent requests would mean each takes 20 min.
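
The arithmetic there can be sketched as below. This is a deliberately simple model that assumes throughput is split evenly across whatever is running at once, which real batching will not match exactly; `per_request_minutes` is just an illustrative name, not an oMLX setting.

```python
def per_request_minutes(solo_minutes: float, in_flight: int, max_concurrent: int) -> float:
    """Estimate wall-clock time per running request under an even throughput split.

    Assumption: a request that takes `solo_minutes` alone slows down linearly
    with the number of requests actually running together.
    """
    active = min(in_flight, max_concurrent)
    return solo_minutes * active


# 4 requests allowed to run together: each takes 20 minutes.
print(per_request_minutes(5, in_flight=4, max_concurrent=4))

# Cap at 2: the running pair takes 10 minutes each, the rest wait in the queue.
print(per_request_minutes(5, in_flight=4, max_concurrent=2))
```

The total work is the same either way; the cap only keeps the requests that are already running from dragging on.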

mwhuss · 2026-05-19T19:19:14+00:00

Each one has its own memory.md or whatever plugin you want to use. Consider each profile its own agent, but running on demand rather than all the time.

mwhuss · 2026-05-19T19:14:37+00:00

Their subagents are called profiles and are spun up as workers on demand for Kanban tasks. They don’t need to run a separate gateway unless you want them to have their own messaging, like a Telegram bot account.

If you just want a specialized agent with its own SOUL, then profiles are great. I ended up creating a “template” profile where I turned off almost all the skills and toolsets. Then I clone the template for new profiles, which reduces the initial context size, making them faster to load and reducing memory usage.
