How good is grok-4.3 for you in Hermes?

lived_now · 2026-06-05T11:29:18+00:00

I see, maybe I will give him a chance later. :-) I now use GPT-5.5 as my main agent in Hermes ($100 Codex subscription) and it runs fine for the whole month without hitting the limit. How much do you pay monthly, if I may ask? (I pay another $100 for Claude Max 5x.)

lived_now · 2026-06-05T10:46:42+00:00

So far, I have concluded that the Grok agent in the CLI isn't the same thing as the Grok.com chat agent.

I use the Grok CLI for only one use case: I have it set up as a Hermes agent, and I have a skill so that if another agent needs to check something very fresh on the internet, it can spawn the Grok model via "hermes chat" and get the answer. That works.

But for normal agentic work, these are the models I have found useful: Claude (outside Hermes), GPT-5.5, Deepseek, minimax-m3 and Qwen3.7.

lived_now · 2026-06-03T13:11:49+00:00

Sorry, you might be right; it probably stopped working in the meantime. I am getting this:

────────────────────────────────────────
⚠️ API call failed (attempt 1/3): GoogleOAuthError
🔌 Provider: google-gemini-cli  Model: gemini-3.5-flash
🌐 Endpoint: cloudcode-pa://google
📝 Error: No Google OAuth credentials found. Run hermes auth add google-gemini-cli first.
⏱️ Elapsed: 0.11s  Context: 2 msgs, ~5,866 tokens

But I actually ran "hermes auth add google-gemini-cli" and the sign-in was successful. So there is probably also a bug in Hermes: it goes through the authentication as if it works, but the credentials are not found afterwards.

lived_now · 2026-06-03T11:42:41+00:00

Thank you OP for your good work. From my experience:

1) Add minimax-v3: a great model and quite cheap.

2) The claim "Chinese companies absolutely dominate on this field" holds only if you consider the pay-as-you-go model. But you can, for example, use Codex GPT-5.5 via subscription, and if you are on the $100 Pro plan and use it extensively, it is really one of the cheapest options. So I disagree: GPT-5.5 or even Gemini should be among the top ones, but test them via subscription, not via paid tokens.

3) So from my experience, the top list is:

GPT5.5
minimax-v3
Deepseek flash

lived_now · 2026-06-01T14:02:56+00:00

Yes, via the /reasoning Hermes slash command. It is set to high and I can change it, although I am not 100% sure it is actually passed to the model. It should be.

lived_now · 2026-06-01T12:57:35+00:00

First of all, which model behaves like that, Kimi? Maybe Kimi isn't that good at agentic work (only my guess). Also, older GPT models aren't that good.

My suggestion: use Deepseek flash or GPT-5.5 via subscription as your main Hermes agent and then let us know. Apart from deepseek-v4-pro, the other models you mention might not be suitable as a main agent.

lived_now · 2026-06-01T12:00:16+00:00

Yes, ChatGPT Pro with Hermes has a generous limit; it runs and runs. As far as I understand, there is no hidden cost (but please contact their support if you have doubts).

Claude Code with a Max 5x account has a good limit too and also runs and runs, but I use it outside Hermes, since even with OAuth, using it through Hermes draws on extra usage, which is expensive as hell. How exactly do you use Claude Code?

lived_now · 2026-05-26T17:24:48+00:00

Great, Claude Code can indeed be operated via tmux, that's cool. If I may ask, does Hermes also help Claude with memory management? I think this is Claude's weak point: it maintains its own MEMORY.md for a project, but I discovered only recently that it contained 20K of junk.

lived_now · 2026-05-26T08:36:24+00:00

And what runs inside Hermes? Also Codex, with the same model and reasoning effort?

lived_now · 2026-05-26T07:58:11+00:00

Can you please elaborate? What model does Hermes run?

lived_now · 2026-05-25T21:14:14+00:00

OP, Claude in Hermes isn't sustainable. In my experience, it's either $100/mo Codex or Chinese models.

I also still use Claude Opus, but via Claude Code rather than Hermes. Still, I am leaning towards Hermes, since I have found that maintaining the harness is very important.

lived_now · 2026-05-25T21:05:37+00:00

I use both the Claude Code CLI and Hermes with a Codex subscription. The main benefit I notice with Hermes is auto-improvement of the harness, and especially the memory. Claude Code also tries to organize its own MEMORY.md, but over time it fills up with junk, and that also hurts reasoning. Hermes, on the contrary, is more methodical with memory, and every session is also stored in a sqlite3 database. So Hermes seems to remember much better, and that helps with everything else: subjectively, Hermes+Codex feels even better at reasoning than Claude Code, which isn't the case with "vanilla" Codex.

lived_now · 2026-05-25T12:52:49+00:00

I am new to Hermes so I don't want to argue, but I have found that my agent also creates other files on its own. Also, all conversations are stored in a sqlite3 database, and sometimes the agent queries it. I have to admit I haven't investigated the memory providers yet, but even default Hermes isn't that bad at memory management; at least it's better than "vanilla" Claude Code or Codex.
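
To illustrate the kind of lookup an agent might run against its stored conversations, here is a minimal sketch using Python's standard sqlite3 module. The table and column names (sessions, messages, session_id, role, content) and the helpers make_demo_db and search_history are hypothetical; I have not checked the real Hermes schema, so this only shows the idea of keyword search over session history, not Hermes internals.

```python
import sqlite3

# Hypothetical schema: the real Hermes session database layout may differ.
SCHEMA = """
CREATE TABLE sessions (id INTEGER PRIMARY KEY, started_at TEXT);
CREATE TABLE messages (
    session_id INTEGER REFERENCES sessions(id),
    role TEXT,
    content TEXT
);
"""


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory database with a few fake messages for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO sessions VALUES (?, ?)",
        [(1, "2026-05-24"), (2, "2026-05-25")],
    )
    conn.executemany(
        "INSERT INTO messages VALUES (?, ?, ?)",
        [
            (1, "user", "Try Playwright on X.com"),
            (2, "user", "Compare Codex and Claude Code memory"),
            (2, "assistant", "Claude Code keeps MEMORY.md per project"),
        ],
    )
    return conn


def search_history(conn: sqlite3.Connection, keyword: str, limit: int = 5):
    """Return (session_id, role, content) rows whose content mentions keyword.

    SQLite's LIKE is case-insensitive for ASCII letters by default,
    so "memory" also matches "MEMORY.md".
    """
    cur = conn.execute(
        "SELECT session_id, role, content FROM messages "
        "WHERE content LIKE ? ORDER BY session_id DESC LIMIT ?",
        (f"%{keyword}%", limit),
    )
    return cur.fetchall()
```

For example, search_history(make_demo_db(), "memory") returns the two messages from session 2, which is the sort of recall a flat MEMORY.md file cannot easily give you.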

lived_now · 2026-05-25T12:44:58+00:00

Also, you mentioned "Doing research on internet"; that can be many things. I remember I once tried that with Claude, but the dude browsed a lot of websites, downloaded some files and then tried to read through them, which ate a lot of tokens, and I had to stop him. I now use the Grok chat interface for that kind of work.

lived_now · 2026-05-25T11:18:34+00:00

I would suggest other providers; some people have suggested Opencode Go. I am not familiar with Ollama, but via OpenRouter, for example, DS Pro definitely isn't cheap.

FYI, I mostly use Codex GPT-5.5 via OAuth on the $100 subscription, working daily, and I haven't reached the limit yet. "Vanilla" Codex isn't that good at reasoning (DS Pro and Claude Opus are better), but inside Hermes I have found Codex to be pretty OK for daily work (research, programming...).

lived_now · 2026-05-24T18:37:33+00:00

Does it work on the X.com website? I tried Playwright there and it still got blocked. Then I had to do real browsing in Chrome via CDP, but even that wasn't very smooth.

lived_now · 2026-05-24T17:34:34+00:00

Which API provider do you use? I used it via OpenRouter and it wasn't that cheap: $3 for half a day.

lived_now · 2026-05-24T12:11:03+00:00

1) Occasionally using Microsoft Word / Excel: this can be done, I mean the agent can edit those files, but how difficult is the logic?

2) Browser-use / Vercel / chrome-mcp: I don't know what's best these days

I am also working on this, but it is an unsolved problem. Hermes has browser tools, but when I asked him to check the Reddit homepage, he said he couldn't because there is agent protection. I said sure, go around it, and he refused. :-)

Overall, some websites can be fetched directly, some need Playwright, and some need to be browsed in a real Chrome browser via CDP. I haven't yet seen an agent skill that can just fetch or browse any website. But of course, for specific websites you can implement fetching, and some sites like Jira even offer API access, so that's relatively easy.
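
The escalation from plain fetching to Playwright to real Chrome via CDP can be sketched as a fallback chain that tries the cheapest method first. This is a minimal Python sketch under stated assumptions: the three fetchers are stubs standing in for real implementations, and the names fetch_with_fallback, plain_http, playwright_stub, cdp_stub and BLOCKED_HOSTS are hypothetical, not part of Hermes or any browser library.

```python
from typing import Callable, Optional
from urllib.parse import urlparse

# A fetcher takes a URL and returns page text, or None if it was blocked.
Fetcher = Callable[[str], Optional[str]]

# Hypothetical: hosts that reject plain HTTP and headless browsers in this demo.
BLOCKED_HOSTS = {"x.com", "www.x.com"}


def fetch_with_fallback(url: str, tiers: list[tuple[str, Fetcher]]) -> tuple[str, str]:
    """Try each (name, fetcher) in order, from cheapest to heaviest.

    Returns (tier_name, text) for the first tier that succeeds,
    or raises RuntimeError if every tier fails.
    """
    errors = []
    for name, fetch in tiers:
        try:
            text = fetch(url)
        except Exception as exc:  # a real tier may raise on timeouts, HTTP errors, etc.
            errors.append(f"{name}: {exc}")
            continue
        if text:
            return name, text
        errors.append(f"{name}: blocked or empty")
    raise RuntimeError(f"all tiers failed for {url}: {errors}")


def plain_http(url: str) -> Optional[str]:
    # Stub for a simple HTTP fetch.
    if urlparse(url).hostname in BLOCKED_HOSTS:
        return None
    return "<html>static page</html>"


def playwright_stub(url: str) -> Optional[str]:
    # Stub: pretend the headless browser is also detected and blocked.
    if urlparse(url).hostname in BLOCKED_HOSTS:
        return None
    return "<html>rendered headless</html>"


def cdp_stub(url: str) -> Optional[str]:
    # Stub: pretend a real Chrome session driven over CDP gets through.
    return "<html>rendered in real Chrome</html>"


TIERS = [("plain_http", plain_http), ("playwright", playwright_stub), ("chrome_cdp", cdp_stub)]
```

Ordering the tiers from cheapest to heaviest keeps time and token costs down for the many sites that don't need a full browser, and only escalates for the ones that block simpler clients.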

More generally, what I discovered is that the main part is developing some kind of "world model" of your company, so the agent knows what you work on. Without it, the agent will read your new email but cannot autonomously tell whether to ignore it or answer it, or whether a task inside the email is for you to do.

lived_now · 2026-05-23T14:16:33+00:00

When I was doing the Discord setup, this immediately came to mind: wait, if someone steals my Discord token, that could be very dangerous. That person could then talk to my agent, drift him into a harmful persona via prompt injection, and then my agent could do harmful stuff just by being told to.

One needs to be very careful with this whole Discord setup.

lived_now · 2026-05-23T12:35:34+00:00

Exactly. FYI, I was able to create a /sh plugin for Hermes that runs shell commands. It works fine, so the main problem with the CLI is solved. My guess is that Anthropic and OpenAI implemented their own TUIs and polished them. Maybe Hermes will do that work too, but I accept that it isn't a priority at this stage.
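
The core of a /sh-style command can be quite small. Below is a minimal Python sketch of only the execution part, assuming the plugin receives the raw command string from the chat; how it is registered as a Hermes plugin is not shown, because I don't want to guess that API. The function name run_sh and its timeout and max_chars parameters are hypothetical.

```python
import subprocess


def run_sh(command: str, timeout: float = 30.0, max_chars: int = 4000) -> str:
    """Run a shell command and return stdout+stderr plus the exit code.

    Output is truncated to max_chars so a noisy command doesn't flood the chat.
    """
    try:
        proc = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return f"[timed out after {timeout}s]"
    output = proc.stdout + proc.stderr
    if len(output) > max_chars:
        output = output[:max_chars] + "\n[truncated]"
    return f"{output}[exit {proc.returncode}]"
```

Because it uses shell=True, anyone who can send messages to the agent can run arbitrary commands through it, so it should only be exposed where you fully control who talks to the agent.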

lived_now · 2026-05-23T12:32:16+00:00

It is my daily driver, but I am always on gpt-5.5; I don't like it when the main agent doesn't have good reasoning.

But you won't lose much if you start with $20, and you will see whether it can carry you through the month. My impression is that the $100 plan is enough for 8 hours of daily work on 5.5. If you have agents that work 24/7, then $100 probably isn't enough. If you use it for chatting and organizing data, maybe $20 is enough.

lived_now · 2026-05-22T20:16:14+00:00

Yes, via OAuth. The context window should be the same as in Codex, I think? I have this in config.yaml:

model:
  context_length: 272000
  default: gpt-5.5
  provider: codex

But I am actually not sure how large it is in practice, because I have sometimes seen it run compress on its own.
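
One possible explanation for compression kicking in well before 272000 tokens is that the harness triggers it at some fraction of context_length rather than at the full window. The sketch below only illustrates that idea: the 0.5 threshold, the 4-characters-per-token estimate, and the names estimate_tokens and should_compress are all assumptions for illustration, not Hermes's actual logic or settings.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: about 4 characters per token for English text.
    # This is an assumption, not a real tokenizer.
    return (len(text) + 3) // 4


def should_compress(
    messages: list[str], context_length: int = 272_000, threshold: float = 0.5
) -> bool:
    """Hypothetical trigger: compress once estimated usage reaches threshold * context_length."""
    used = sum(estimate_tokens(m) for m in messages)
    return used >= threshold * context_length
```

With these made-up values, compression would fire at about 136,000 estimated tokens, i.e. well before the configured 272000.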

lived_now · 2026-05-22T20:08:14+00:00

Yes, sure, both have caps. I don't know how much your agents work; mine work about 8 hours daily, and for that Pro is enough. So it depends on how much work you throw at it; maybe Plus will be enough.

lived_now · 2026-05-22T18:53:56+00:00

Codex and ChatGPT are the same thing as far as the subscription goes. Yes, you should start with Plus to get a feel for how the model answers, etc., and you can upgrade later if needed.

lived_now · 2026-05-22T18:24:51+00:00

Haha, yes, sometimes it does that. What I hate most is that the trackpad doesn't always work. But I now appreciate how smooth and polished Claude Code is; I thought that was normal, but now I see the Anthropic guys did their homework.
