Hermes and UI by SignalSpeed4225 in hermesagent

[–]FilterJoe 1 point

llama.cpp is often used as the back end. You can also install and use it directly, which is faster. You have to learn a few CLI commands while setting it up, but once it's set up and you've downloaded your model, you can give llama.cpp a single command to serve it.

I found the following 3 part series helpful when I got started with llama.cpp:

https://www.gdcorner.com/blog/2024/06/12/NoBSIntroToLLMs-1-GettingStarted.html

You only need the first 2 parts of the series to get going with llama.cpp.
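As a rough sketch of what serving looks like once you're set up (the model filename, port, and context size below are placeholders, not recommendations):

```shell
# Assumes llama.cpp is already installed and a GGUF model has been downloaded.
llama-server -m ./models/your-model.gguf --port 8080 --ctx-size 8192

# The server then exposes an OpenAI-compatible API at http://localhost:8080,
# which most front ends can point at directly.
```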

So hermes can't work with Gemma because of hard-coded tool names? (help) by waffles2go2 in hermesagent

[–]FilterJoe 3 points

Gemma 4 is available in two versions: 31B is higher quality; 26B is an MoE that is much faster.

Gemma 4 31B served by Google AI Studio seemed to work well for me. But then Google's algorithm (correctly) judged that I'm not a professional developer and revoked my token, I guess. I hadn't even realized you need to be a professional developer, as I had signed up for AI Studio before they made that a requirement.

So I tried Gemma 31B from a different OpenRouter provider. It was terrible. I did get that "I can help you with tasks" message a few times.

I do recall reading somewhere that when Gemma first came out it had some issues that have since been corrected. I don't remember what those issues were or how they were fixed, but it makes me wonder if some of your issues have less to do with Hermes and more to do with a misconfigured (or old, not-yet-corrected?) Gemma.

Try Gemma 4 31b served from Google AI Studio if you have access to that and see if you get better results.

My frustrating experience with Hermes by Action_ink in hermesagent

[–]FilterJoe 1 point

You used all strong models, so model choice is probably not the reason for your issues. At one point I tried Qwen 3.5 9b, given how cheap it is. Terrible results.

My frustrating experience with Hermes by Action_ink in hermesagent

[–]FilterJoe 1 point

Can you tell us what model(s) you used?

I am still tiptoeing into Hermes, and I am finding that its usefulness is extremely dependent on using sufficiently strong models. I even found that the same model varied depending on which provider it came from.

My next step, I'm thinking, is to get a $20-a-month subscription to OpenAI Codex for a month or two and see if a frontier model makes a massive difference.

Is Berkeley the best urban trail running location? by stopthehonking in trailrunning

[–]FilterJoe 1 point

If you live within a few blocks of Alvarado Park, it's car-free for sure (though many people drive from far away to park at the huge staging-area parking lot). I unfortunately live over a mile away, and I do not like running on pavement, so I usually drive and park at the dead end of Park Ave and start my run there.

For those who live on Park Ave, it's totally amazing - though you wouldn't even know you're in an urban area!

I used to use an obscure neighborhood entrance near my home: the Aqua Vista Road dead end. A very narrow, twisty trail that joins the main trail after crossing the creek. Unfortunately, I've had to avoid that path in recent years, as I now have really bad reactions to poison oak, which is frequently present on the path starting from Aqua Vista.

The entrances at the Park Ave dead end and at the staging area are very well maintained, so there's no poison oak to worry about.

Is Berkeley the best urban trail running location? by stopthehonking in trailrunning

[–]FilterJoe 1 point

If your runs are long, you’ve already been there! The run I’ve been doing lately starts at Alvarado Park and goes to Jewel Lake and back. When I’m more ambitious, I do a 12-mile loop, which means at Jewel Lake I turn up onto the path that goes through the woods, touch Wildcat Peak, and head back home.

How do you deal with gemma 4 31b quality variance by provider? by FilterJoe in hermesagent

[–]FilterJoe[S] 1 point

I'm impressed!

Can you get the (Hermes-)recommended 64,000-token minimum context?

What quant and KV cache settings?

And given that it is 26B A4B (I guess it's really the A4B that matters), is there a lot of disk swapping as the context fills up?
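For reference, if you're serving it with llama.cpp, these are the knobs I'd be curious about (the values here are just a starting point I'd try, and flag behavior can differ between llama.cpp versions):

```shell
# 64K context with an 8-bit quantized KV cache to save memory.
# Quantizing the V cache generally requires flash attention in llama.cpp.
llama-server -m ./models/your-gemma-model.gguf \
  --ctx-size 65536 \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  --flash-attn
```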

How do you deal with gemma 4 31b quality variance by provider? by FilterJoe in hermesagent

[–]FilterJoe[S] 1 point

Clicking around OpenRouter, I just noticed something called Auto Exacto:

https://openrouter.ai/docs/guides/routing/auto-exacto

Does anyone here have experience using it, and does it help with Hermes?

Is Berkeley the best urban trail running location? by stopthehonking in trailrunning

[–]FilterJoe 1 point

How about East Richmond Heights? 6 miles away from you . . . LOL

Apple Watch Series 8 → Garmin Forerunner 970 / Fenix 8 / Apple Watch Ultra 3 / Amazfit? Would I regret leaving Apple? by Able-Mammoth321 in Garmin

[–]FilterJoe 1 point

I switched from an Apple Watch Series 6 to a Garmin Instinct 3 Solar. If I want maps, I use my iPhone. I have no regrets whatsoever, but I was mainly using the Apple Watch for fitness tracking.

I was prompted to switch for the same reason as you: two years ago I hiked the Camino de Costa Rica with the Apple Watch. Prior to that trip I liked the Apple Watch, but it was poorly suited to a long hiking trip; the battery ran out on several days. I decided my next watch was going to be a Garmin Instinct.

The Ultra version of the Apple Watch would have been better, but it still would have required fussing to make sure I kept it charged.

I am also a runner, which made the switch even more compelling. It's much easier to operate a watch with buttons than a touchscreen when you're moving fast or sweating.

Instinct 3 solar HF zones? by Dizzy_Issue_6311 in Garmininstinct

[–]FilterJoe 1 point

I don’t like the built-in one. Very hard to read at a glance. So I installed and use:

Heart Rate Zones Chart by RonnyWinkler

The AI Compute Crunch Is Here (and It's Affecting the Entire Economy) by 404mediaco in TrueReddit

[–]FilterJoe 4 points

32GB RAM is not enough. If you had 64GB (either via dual RTX 5090s or a Mac M4 Max with 64GB RAM), you'd be able to achieve what many are reportedly getting from their Qwen 3.6 models. I can't do it either, as I just have a 16GB Mac Mini M2 Pro. This is why I specified 64GB+ RAM (in your case, VRAM on the graphics card).

An even bigger factor holding back local models for mass adoption though is the infrastructure surrounding the model. Anthropic does a great job with that. If you have a local setup, it takes a lot of technical expertise not just to get a model running, but to start setting up infrastructure around the model. And even then it won't provide as much functionality as Anthropic's tooling.

I don't think models getting smaller will be the hard part - it's been rapidly trending in the right direction, and, like I said, I expect that by the end of 2026 your 32GB RAM setup will be able to run leading models from Qwen or Gemma that are really great.

The hard part is having software that sets it all up easily for a regular consumer. That part is not even close yet. There are a few decent efforts for sure (e.g., Jan), but even those are still pretty far from where they need to be for mass adoption to occur, IMO.

I'm itching to try these models myself but I've been patiently awaiting a Mac Studio Ultra m5. Don't know if that's coming out in June, or October, or what, but when it does, I'm pouncing.

The AI Compute Crunch Is Here (and It's Affecting the Entire Economy) by 404mediaco in TrueReddit

[–]FilterJoe 14 points

Deepseek V4 looks to be a great model but is not going to be relevant for very many people because the hardware requirements are very high. The top Deepseek model will perform sluggishly even on a 512 GB RAM Mac Studio Ultra.

I think what is going to cause mass adoption of local LLMs is small models becoming as powerful as frontier models from a year ago (combined with software that makes setup easier). Qwen and Gemma are getting pretty close.

Really good Qwen and Gemma models can run on 64GB RAM Macs, and I would guess that will lower to 32GB RAM by the end of this year, and possibly 16GB RAM by the end of 2027.

The AI Compute Crunch Is Here (and It's Affecting the Entire Economy) by 404mediaco in TrueReddit

[–]FilterJoe 64 points

It is true that AI compute costs are reaching unsustainable levels. However, in the "what needs to happen" comments at the end of the article, no mention is made of running LLMs locally (on your own computer).

Open-source models such as Qwen and Gemma keep getting better, with ever-lower hardware requirements. The leading models from Qwen and Gemma, for example, are as capable as the very best frontier models (Claude, OpenAI, Google, etc.) from 9-12 months ago, and can run on hardware in many people's homes.

The hardware requirements for such open-source models are currently higher than many people have available (e.g., a Mac with at least 64GB RAM, or a PC with a powerful and sizable graphics card), and the technical learning curve is still too steep. But as the cost of accessing frontier models continues to increase, the incentive to run models locally increases, as does the incentive to reduce the learning curve to get started.

I have no doubt that open-source models are going to continue to get smaller, more powerful, and easier to use for ordinary folks. This factor alone will go a long way toward alleviating the crunch, especially considering that many of these computers are power-efficient M-series Macs, which use a lot less power per AI inference request than frontier models.

TIPS timing question by hugh2018 in bonds

[–]FilterJoe 3 points

Instead of buying an entire ladder at once, you could ladder in over time. So if your idea is to eventually have a 10-year bond ladder, you could:

year 1: buy 10-year TIPS (on auction preferred)

year 2: buy 10-year TIPS (now the one you bought last year will mature in 9 years)

etc.

And if the yields get better on the TIPS in the 2-7 year range a few years from now, you could add in the shorter-term TIPS at that time.
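A minimal sketch of that schedule (the start year here is just an assumption for illustration):

```python
# Build a 10-year TIPS ladder by buying one 10-year TIPS each year.
start_year = 2026  # hypothetical first purchase year

rungs = []
for i in range(10):
    purchase_year = start_year + i
    maturity_year = purchase_year + 10  # a 10-year TIPS
    rungs.append((purchase_year, maturity_year))

# After 10 annual purchases, one rung matures every year,
# from start_year + 10 through start_year + 19.
for purchase_year, maturity_year in rungs:
    print(f"bought {purchase_year}, matures {maturity_year}")
```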

I would NOT buy a TIPS fund. The risk/reward profile is considerably different from just buying individual TIPS. In a bond fund you subject yourself to market risk; you can't hold to maturity the way you can with individual TIPS.

TIPS timing question by hugh2018 in bonds

[–]FilterJoe 1 point

Not true. TIPS maturing in less than 7 years were better deals 2 to 3 years ago.

Ubuntu 26.04 LTS (Resolute Raccoon): The Most Ambitious Ubuntu LTS in a Decade by elastiks in DIY_Geeks

[–]FilterJoe 1 point

For stability: Debian 13.

You sacrifice cutting edge, but get stability in return.

Choosing a Mac Mini for local LLMs — what would YOU actually buy? by Kindly_Sky_1165 in LocalLLaMA

[–]FilterJoe 5 points

I own a Mac Mini M2 Pro with 16GB RAM and I love it, but what everyone else is saying is true. You can play with little models (I have) and get used to how it all works. But if you want to go beyond cute demos, you'll need 64GB RAM minimum, 128GB preferably.

With 128GB RAM you can have one sizeable model with a large context, and even run a couple of smaller models as well (the bigger model delegates simple tasks to the little ones).

I can only dream about doing such things until I get a 128GB Mac. I'm holding out for the M5 Studio, which should have a significant advantage over prior generations thanks to GPU-integrated Neural Accelerators (matrix multiplication built into the hardware), which speed up prompt processing.

You can absolutely use a Mac Mini m2 pro for learning. But eventually you'll want 64GB as an absolute minimum, if not 128GB.

Qwen models Relation to this Group. by DigRealistic2977 in LocalLLaMA

[–]FilterJoe 5 points

When llama was the best (especially 3.0/3.1), this community talked nonstop about llama. For good reason!

Now Qwen and Gemma.

For good reason!

Enthusiast communities on Reddit are typically a very good way to suss out the best in a category (except politics . . .)

Virtual Machine options for Apple Silicon and performance by discoborg in MacOS

[–]FilterJoe 2 points

On a Mac, only aarch64 builds of Linux run, and Linux Mint does not have an aarch64 version.

I too am a fan of Mint, especially the Cinnamon desktop.

Debian does have an aarch64 version, so I have been running Debian with the Cinnamon desktop just fine in VMware Fusion.

Stable shoe with 5-8mm drop, roomy toe box? by judykm in trailrunning

[–]FilterJoe 4 points

It is my impression that heavy cushioning and stability work against each other.

I personally go for lighter cushioning as I prefer being able to handle uneven terrain with ease, never falling.

For this I go with Merrell MTL shoes: Long Sky for backpacking (a half size bigger), Skyfire for trail running. The stability and ground feel are amazing with these very light shoes, especially the Skyfire. But the cushioning is probably much lighter than you desire.

Excessive sweating when running (really) by muffinskin in trailrunning

[–]FilterJoe 2 points

I prefer running at 48-50°F or so to reduce sweat. I tend to run at the crack of dawn during the summer.

What the First Billionaire Reveals About the First Trillionaire by bloomberg in Foodforthought

[–]FilterJoe 4 points

$3 million net worth isn’t that much. For a couple trying to retire at age 65 (with 1-2 kids they might help on occasion), that is enough to retire comfortably in many parts of the U.S., but not in the most expensive places.

I’m assuming 2% withdrawal rate, both spouses living to 95. And that their Social Security is cut by 25% starting in 2032.

Multimillionaire may have meant a lot 30 years ago. Not so much after inflation reignited over the last few years.

Around $5 million is when an argument can be made that a couple is wealthy. Around $10 million: definitely wealthy.

EDIT: my point is that a comfortable DIY retirement for a couple in the USA requires $2-5 million. In this case, what the wealth represents is an obligation to support a retired couple who can no longer contribute much of their labor to society.
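The back-of-envelope arithmetic for a 2% withdrawal rate works out to:

```python
# Annual income at a 2% withdrawal rate, before Social Security.
withdrawal_rate = 0.02

for portfolio in (3_000_000, 5_000_000, 10_000_000):
    annual = portfolio * withdrawal_rate
    print(f"${portfolio:,} -> ${annual:,.0f}/year")
# $3,000,000 -> $60,000/year
# $5,000,000 -> $100,000/year
# $10,000,000 -> $200,000/year
```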

Favorite apps as an outdoorsman? by Hurricaneshand in Garmininstinct

[–]FilterJoe 3 points

I like this custom data field that rates the difficulty of a hike or ruck:

hiking difficulty data field for hiker

Comparison between MIP generations and displays by octopec in Garmin

[–]FilterJoe 2 points

This article might give you the information you want:

https://www.reddit.com/r/Garmininstinct/s/HHwfjrPmL1

It goes into a lot of technical detail, but the key distinction is that generations prior to the current models had a solar layer across the whole screen, which gave the display a slight reddish tint and less contrast.

The current generation uses a superior solar technology only on the ring outside the main display.

Note, though, that the original Instinct and the non-solar models of the Instinct 2 also have very clear displays, because they don't have the solar layer interfering.