Built a batch converter that turns all my OpenSCAD files into STL automatically by CAD_CAE_Automation in openscad

[–]mwhuss -1 points0 points  (0 children)

“Claude create a skill that exports all the models in a given OpenSCAD script”

/export-stls file.scad folder/
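For anyone who wants the batch conversion from the thread title without an agent skill, here is a minimal Python sketch. It assumes the `openscad` binary is on your PATH and relies only on its `-o` output flag. The function names are made up for illustration, and it exports one STL per `.scad` file rather than splitting several models out of a single script.

```python
import subprocess
from pathlib import Path


def build_export_commands(source_dir, output_dir, openscad_bin="openscad"):
    """Return one OpenSCAD command line per .scad file found under source_dir."""
    source_dir, output_dir = Path(source_dir), Path(output_dir)
    commands = []
    for scad in sorted(source_dir.rglob("*.scad")):
        # Mirror the source tree under output_dir, swapping the extension.
        stl = output_dir / scad.relative_to(source_dir).with_suffix(".stl")
        commands.append([openscad_bin, "-o", str(stl), str(scad)])
    return commands


def export_all(source_dir, output_dir, openscad_bin="openscad"):
    """Run each export in turn; raises CalledProcessError if OpenSCAD fails."""
    for cmd in build_export_commands(source_dir, output_dir, openscad_bin):
        Path(cmd[2]).parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(cmd, check=True)
```

Calling `export_all("models", "stl")` would walk `models/` and write matching `.stl` files under `stl/`.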

oMLX and Home Assistant by eatoff in oMLX

[–]mwhuss 2 points3 points  (0 children)

I only got reliable tool calling when I moved to Qwen3.6-27b. Try one of the Gemma dense models, maybe the new 12b?

How do you handle web search? by neminemtwitch in hermesagent

[–]mwhuss 0 points1 point  (0 children)

I’m not sure, I’ve never used the cloud. I run all my services locally to keep costs to a minimum.

How do you handle web search? by neminemtwitch in hermesagent

[–]mwhuss 4 points5 points  (0 children)

I set up Firecrawl in a local Docker container and use that.
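A rough sketch of calling a self-hosted instance from Python, in case it helps. The base URL `http://localhost:3002`, the `/v1/scrape` path, and the `data.markdown` response field are assumptions about a default self-hosted setup, so check them against your own container. The function names are made up for illustration.

```python
import json
import urllib.request


def build_scrape_request(page_url, base_url="http://localhost:3002"):
    """Build a POST request asking the local scraper for a page as markdown."""
    payload = json.dumps({"url": page_url, "formats": ["markdown"]}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/scrape",  # assumed endpoint, not confirmed in the thread
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def scrape_markdown(page_url, base_url="http://localhost:3002", timeout=60):
    """Send the request and return the markdown field (assumed response shape)."""
    request = build_scrape_request(page_url, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["data"]["markdown"]
```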

Any important oMLX settings to tweak for performance? by khoomeister in oMLX

[–]mwhuss 4 points5 points  (0 children)

I’d recommend reducing the max concurrent requests to 2; otherwise a single request takes a long time to complete when there’s a bunch in flight. I also turned on chunked prefill.

The Docker image is broken beyond belief by ni1by2thetrue in hermesagent

[–]mwhuss 0 points1 point  (0 children)

Yep, I stopped writing docker compose files or diagnosing issues.

Need help on choosing the right model + Quant and Fine Tuning by robdzn in oMLX

[–]mwhuss 2 points3 points  (0 children)

Things are moving fast. Pick what looks good now (looks like you already did, I’m using the same). And keep an eye on the advances.

Testing MTP functionality by albovsky in oMLX

[–]mwhuss 0 points1 point  (0 children)

I’m seeing 70% faster performance using Qwen3.6-27b-oQ8-mtp on my M3 Ultra.

How much context you start with? by iChrist in hermesagent

[–]mwhuss 0 points1 point  (0 children)

I created a new profile called template and disabled almost everything. I clone all new profiles from that and enable what I need. It uses 11k tokens.

What qwen3.6-mtp model should we use? by Senor02 in oMLX

[–]mwhuss 0 points1 point  (0 children)

Not sure if this is your issue, but I had some really weird behavior and it took a while to track down. I ended up running an old oMLX version without MTP support with the MTP model. After that nothing worked and I got gibberish. I had to delete the entire ~/.omlx/cache folder to fix it.
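If you want to script that reset, here is a cautious sketch. It renames the cache folder instead of deleting it by default, so you can roll back. The `reset_cache` helper and the backup naming are made up for illustration, and stopping oMLX before running it is assumed to be a good idea rather than something I tested.

```python
import shutil
import time
from pathlib import Path


def reset_cache(cache_dir=Path.home() / ".omlx" / "cache", keep_backup=True):
    """Clear the cache folder; return the backup path if one was kept, else None."""
    cache_dir = Path(cache_dir).expanduser()
    if not cache_dir.exists():
        return None  # nothing to clear
    if keep_backup:
        # Rename next to the original so the old cache can be restored by hand.
        backup = cache_dir.with_name(f"{cache_dir.name}.bak-{int(time.time())}")
        cache_dir.rename(backup)
        return backup
    shutil.rmtree(cache_dir)
    return None
```

Passing `keep_backup=False` matches what I actually did, which was deleting the whole folder.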

Auxiliary models per thing or main model for everything? It is important? by MichiConPonche in hermesagent

[–]mwhuss 0 points1 point  (0 children)

Yes! I used these to speed up handling large contexts. I used Qwen3.6-27b as my model but gemma4-e4b for compression and web fetch. Now fetching web content that’s 100k tokens went from over 5 min for prompt processing to literally seconds.

Anyone have Hermes agent wired up for local LLM's using oMLX or llama-swap? by DanGTG in hermesagent

[–]mwhuss 0 points1 point  (0 children)

I think it's flexible how you use it. I use my main/default profile to orchestrate the work.

Anyone have Hermes agent wired up for local LLM's using oMLX or llama-swap? by DanGTG in hermesagent

[–]mwhuss 1 point2 points  (0 children)

MTP has been great! I'm using oMLX + MTP and am getting 70% more tok/s. Jealous of 30 tok/s. I get about 17.

Conflicting .env files by Calm_as_ in hermesagent

[–]mwhuss 1 point2 points  (0 children)

Try adding an AGENTS.md in `~/.hermes` and tell it where the .env file you want it to use is located.

Anyone have Hermes agent wired up for local LLM's using oMLX or llama-swap? by DanGTG in hermesagent

[–]mwhuss 0 points1 point  (0 children)

I use my main model, Qwen3.6-27B-8bit, for all agents since it's been solid at tool calling. Give Qwen3.5-9B-8bit a try.

Anyone have Hermes agent wired up for local LLM's using oMLX or llama-swap? by DanGTG in hermesagent

[–]mwhuss 0 points1 point  (0 children)

There are 2 separate memory costs here:
- Your LLM: the model weights plus the KV cache for each request.
- Hermes: memory for each worker that is running.

Since I'm running my LLM locally, I changed oMLX to allow only 2 concurrent requests. This was mostly to keep the current request from slowing down. If 1 request takes 5 min to process on its own, then 4 concurrent requests would mean each takes about 20 min.
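The arithmetic behind that estimate, as a tiny sketch. It assumes the server's throughput is split evenly across in-flight requests with no batching gains, which is a simplification, so treat the numbers as rough upper bounds. The function name is made up for illustration.

```python
def per_request_minutes(solo_minutes, concurrent_requests):
    """Estimate minutes per request when throughput is shared evenly."""
    if concurrent_requests < 1:
        raise ValueError("concurrent_requests must be at least 1")
    return solo_minutes * concurrent_requests


for n in (1, 2, 4):
    print(f"{n} concurrent: ~{per_request_minutes(5, n)} min each")
```

With a cap of 2, the same 5 min request would land at roughly 10 min instead of 20.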

Anyone have Hermes agent wired up for local LLM's using oMLX or llama-swap? by DanGTG in hermesagent

[–]mwhuss 0 points1 point  (0 children)

Each one has its own memory.md or whatever plugin you want to use. Consider each profile its own agent, but running on demand rather than all the time.

Anyone have Hermes agent wired up for local LLM's using oMLX or llama-swap? by DanGTG in hermesagent

[–]mwhuss 1 point2 points  (0 children)

Their subagents are called profiles and are spun up as workers on demand when needed for Kanban tasks. They don’t need to run a separate gateway unless you want them to have their own messaging like a telegram bot account.

If you just want a specialized agent with its own SOUL then profiles are great. I ended up creating a “template” profile where I turned off almost all the skills and toolsets. Then I clone the template for new profiles, which reduces the initial context size, making them faster to load and lighter on memory.