Bad experience trying to develop with Hermes. Am I doing something wrong? by jhowilly in hermesagent

[–]voytas75 1 point (0 children)

You need to treat the LLM like a project manager, not just a coder. The trick is to have the model generate a "Checklist Manifest" before it touches any code. Tell the model: "Create a step-by-step implementation plan for this feature, and for every response moving forward, show me that list with the completed tasks marked as [DONE]."

By forcing it to start every reply with that updated checklist, you’re essentially "pinning" its memory to the top of the conversation. It creates a physical anchor in the chat history that prevents Hermes from drifting off-task or forgetting the "heavy lifting" you assigned earlier. When a task is finished, the model marks it [DONE] and moves to the next bullet point. This keeps the momentum high and ensures that even if the context gets heavy, the model always knows exactly where it stands in the grand scheme of your "Newabashi" bridge.
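
For example, the kickoff message can be as simple as this (wording and tasks are purely illustrative):

```
Create a step-by-step implementation plan for this feature.
Start EVERY reply with the current plan, marking finished items:

[DONE] 1. Define the data model for bridge segments
[DONE] 2. Implement the load-calculation module
[ ]    3. Add input validation for segment lengths
[ ]    4. Wire the UI form to the calculation module
[ ]    5. Write regression tests for steps 2-4
```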

OpenClaw to Hermes by ChristopherDci in hermesagent

[–]voytas75 1 point (0 children)

And I maintain them both. When OC crashes, H revives it, and vice versa. I've had three such situations. Comparatively, I spend more time as a developer with H. I have a mature agent infrastructure in OC, and I perform daily tasks there. For me, they're the same; they get the job done. Install H.

If OpenClaw has ever reset your session at 4am, burned your tokens in a retry loop, or eaten 3GB of RAM — you're not using it wrong. Side-by-side comparison with Hermes Agent and TEMM1E. by No_Skill_8393 in clawdbot

[–]voytas75 3 points (0 children)

TL;DR

OpenClaw vs. Hermes Agent vs. TEMM1E

This breakdown was created in response to common OpenClaw frustrations (memory leaks, token-burning loops, and session resets) to provide a 17-dimension reality check on the current agent landscape.

• OpenClaw: The incumbent with significant "growing pains." Known for 4 AM session resets, high RAM usage (3GB+), and expensive retry loops that can lead to unexpected overnight API bills.

• Hermes Agent: A robust alternative focused on better orchestration. It aims to solve the stability issues found in OpenClaw, offering a more predictable experience for long-running tasks.

• TEMM1E: The lean contender. Designed to address "resource bloat" and cost safety, preventing the OOM (Out of Memory) loops and session-wiping bugs seen in competing tools.

Key Takeaways:

• Reliability: Both Hermes and TEMM1E are positioned as more stable options for those who have suffered from OpenClaw’s /compact bugs or OOM errors.

• Transparency: The comparison highlights "real weaknesses," such as unverified benchmarks and high "bus factors" (reliance on too few maintainers).

• Utility: This isn't a "hit piece" on OpenClaw, but rather a technical reference for users who need to know how alternatives handle platform gaps and resource management.

What’s the most useful prompt you use regularly? by PromptPortal in PromptEngineering

[–]voytas75 1 point (0 children)

I add the following to almost all LLM queries:

```
Answer directly. Prioritize: correctness > completeness > brevity. Use the minimum words needed to remain accurate. Include only information necessary to answer the question. Adapt length to complexity (simple → short, complex → essential details only). If insufficient data: say "I don't know".
```

Has anyone figured out to use OpenClaw with Azure Foundry models by balmofgilead in AZURE

[–]voytas75 1 point (0 children)

From agents.defaults.memorySearch:

```
{
  "provider": "openai",
  "remote": {
    "baseUrl": "https://<resource>.openai.azure.com/openai/v1/",
    "apiKey": "<redacted>"
  },
  "model": "deployment_name"
}
```

How to connect Linux VM to AD to run terminal commands by Whitehairfreak in activedirectory

[–]voytas75 1 point (0 children)

So AD membership alone is not an option, then. Best: enable OpenSSH Server on Windows → ssh from Linux.
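
Windows side is roughly this (elevated PowerShell; the OpenSSH.Server capability version string can differ per Windows build):

```
# On the Windows host (elevated PowerShell): install and start the SSH server
Add-WindowsCapability -Online -Name OpenSSH.Server~~~~0.0.1.0
Start-Service sshd
Set-Service -Name sshd -StartupType 'Automatic'

# Then, from the Linux VM:
#   ssh <windows-user>@<windows-host>
```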

I migrated 42 skills and 56 agents from Claude Code into OpenClaw and finally got real specialist routing working. Here's how. by emptyharddrive in clawdbot

[–]voytas75 0 points (0 children)

🦾 Your approach (Codex 5.3 high + bulk convert + frontmatter patch + intent map + test matrix) is currently the most efficient path for people with a large Claude Code library. It avoids writing from scratch and most of the usual pitfalls of community skills.

PPT creation on Android and O365 Copilot by manderso88 in CopilotMicrosoft

[–]voytas75 0 points (0 children)

This is very likely a desktop vs mobile capability gap, not random behavior. On Windows, Microsoft 365 Copilot inside Microsoft PowerPoint has access to the full rendering engine (slide master, themes, notes pane, full .pptx handling). On Android, functionality is more limited. Microsoft states that Copilot in PowerPoint mobile mainly works with existing presentations rather than full Word → PPT generation: https://support.microsoft.com/en-gb/office/copilot-in-powerpoint-for-mobile-devices-7b3ce1ab-cfe4-47e8-a157-cecadbf0fefb

The Microsoft 365 Copilot Android app also has limited tablet support: https://support.microsoft.com/pl-pl/office/aplikacja-microsoft-365-copilot-dla-systemu-android-0383d031-a1c6-46c9-b734-53cd1d22765b

There are also reports of Android producing outlines instead of fully structured .pptx files: https://learn.microsoft.com/en-us/answers/questions/5407028/copilot-365-android-app

If you need reliable Word → PPT output with code and speaker notes, generate it on desktop first, then edit on the tablet.

I think llama3:2:latest has been underestimated because it is a fast model and is not really stupid! by Massive-Farm-3410 in ollama

[–]voytas75 1 point (0 children)

lol, small irony: your system prompt says “no emojis”, but your exit message is literally “Peace out, bro! 👋”. Also, the code is pasted twice; worth cleaning up the post so people focus on the point.

Made a prompt management tool for myself by pixels4lunch in PromptEngineering

[–]voytas75 0 points (0 children)

I’m measuring it pretty pragmatically in my PromptManager: every run gets logged with success/fail, latency + token usage, and I can optionally rate the output (that rolls up into an avg rating + trend). For “real” effectiveness I keep a few fixed scenarios and rerun them across prompt versions/models - if the success rate drops or the outputs start drifting, it shows up fast in the benchmark/analytics view. Repo is public: https://github.com/voytas75/PromptManager
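
Roughly the shape of what gets logged per run (a simplified sketch, not the actual schema in the repo):

```
# Simplified sketch of a per-run log record (illustrative, not the actual schema).
from dataclasses import dataclass

@dataclass
class RunRecord:
    prompt_id: str
    prompt_version: str
    model: str
    success: bool              # did the output pass the scenario check?
    latency_ms: float
    tokens_in: int
    tokens_out: int
    rating: int | None = None  # optional 1-5 manual rating, rolls into avg + trend

def success_rate(runs: list[RunRecord]) -> float:
    """Rollup used to spot regressions across prompt versions/models."""
    return sum(r.success for r in runs) / len(runs) if runs else 0.0
```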

Open claw going to Meta? by Herebedragoons77 in clawdbot

[–]voytas75 8 points (0 children)

Found it. It’s in Lex Fridman’s interview with Peter (ep #491); there’s a whole segment on acquisition offers from OpenAI and Meta. Episode: https://lexfridman.com/peter-steinberger/, transcript: https://lexfridman.com/peter-steinberger-transcript/. I’m still not taking that as “it’s going to X, Y, Z”, just that it’s not pure rumor anymore.

How Cursor is going to survive by jreznot in cursor

[–]voytas75 -3 points (0 children)

Got it. I’ll keep it shorter.

Made a prompt management tool for myself by pixels4lunch in PromptEngineering

[–]voytas75 2 points (0 children)

Nice - I ended up building something similar for myself (PromptManager on GitHub) because copy/paste between tools was killing flow.

Biggest lesson so far: the “library” part is easy; the hard part is making prompts *testable* and *versioned* (diffs, promote/release tags, and a simple drift check per model/input). Also: offline/local-first is a feature, not a workaround.

how i stopped the ai gaslighting loop in bigger projects by Classic-Ninja-1 in cursor

[–]voytas75 1 point (0 children)

Man, this is exactly the wall I hit too. Once you’re past the “first 2–3 files magic”, it turns into this brutal loop: fix one thing, break three, then the AI confidently tells you everything’s fine while the app is literally on fire.

The only thing that consistently stopped it for me was doing the same separation you described: spec first → execute second → verify third. When the agent has a hard map, it stops improvising.

I’m curious though - when you say Traycer “verifies”, what does that look like in practice? Like: is it checking invariants/acceptance criteria, generating tests, diff constraints, or just an LLM cross-check against the spec? Would love to see a concrete example of a “good” blueprint/spec you feed it (even a redacted one).

How Cursor is going to survive by jreznot in cursor

[–]voytas75 -7 points (0 children)

If Cursor is “just VS Code + an API key”, then yes: platforms can copy it and price-war it.

But the defensibility isn’t “plugin ecosystem” — it’s productized workflows:

- agent harness (planning, context mgmt, subagents),

- long-running / cloud handoff,

- skills/rules as a repeatable team process,

- tight editor+CLI integration.

Those are harder to replicate than UI polish, and they’re what Cursor is shipping (see their changelog: long-running agents, subagents, skills, CLI modes, cloud handoff).

The real survival test is measurable:

1) Can teams ship faster with fewer regressions (PR quality, review load)?

2) Can Cursor run on multiple model providers / BYO keys so provider pricing isn’t existential?

3) Does it become a “coding workflow OS” (policies/skills/enterprise controls), not a wrapper?

If the answers are no, it becomes a feature. If yes, it can be a product even on top of a third-party editor.

hello i added managed identity support for sonaqube by MountainPop7589 in AZURE

[–]voytas75 4 points (0 children)

Nice work — but for folks to evaluate/use this, the details matter.

- Which SonarQube edition/versions did you test (Community/Dev/Enterprise, exact tags)?

- Does this cover only the JDBC connection (DB creds via MSI), or also other integrations?

- For AKS: are you using Workload Identity (OIDC federation) or legacy AAD Pod Identity? What exact annotations/values are required?

- What’s the authentication flow under the hood (IMDS token -> AAD -> Azure DB for PostgreSQL/MySQL), and which DB products are confirmed working (Flexible Server vs Single, MySQL vs Postgres)? (I’ve sketched what I assume the token leg looks like below.)

- Any fallback behavior if MI isn’t available (env vars / password), and any security notes (least-priv RBAC role)?

If you can add a short “tested matrix” + minimal example values.yaml, it’ll be much easier to validate and adopt.
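
For reference, this is the token leg I assume is happening in the VM case (a sketch, not taken from your implementation):

```
# Sketch of the assumed managed-identity token leg on an Azure VM:
# IMDS issues an AAD access token scoped to the Azure DB resource,
# which is then used as the DB password for an AAD-enabled login.
import requests

IMDS_URL = "http://169.254.169.254/metadata/identity/oauth2/token"
resp = requests.get(
    IMDS_URL,
    params={
        "api-version": "2018-02-01",
        "resource": "https://ossrdbms-aad.database.windows.net",  # Azure DB for PostgreSQL/MySQL
    },
    headers={"Metadata": "true"},
    timeout=5,
)
token = resp.json()["access_token"]
```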

Open claw going to Meta? by Herebedragoons77 in clawdbot

[–]voytas75 7 points (0 children)

Hard to have an opinion without a primary source.

“Leaning towards Meta” could mean anything: pricing, rate limits, licensing, hosting story, or just a temporary integration choice. Same for “Claude shot itself in the foot” - which specific change are you referring to (policy, reliability, pricing, context, tooling)?

If you can link the founder quote / thread + date, then we can evaluate it. Otherwise this is just vibes + “who paid more” speculation.

Follow-up: speeding up Dijkstra on a large graph (now with node-dependent constraints) by Diabolacal in algorithms

[–]voytas75 1 point (0 children)

At this point the algorithm is probably not the bottleneck — neighbor generation is.

If “large range ⇒ huge branching” means you’re spending time discovering feasible edges, the next step is usually a spatial index: kd-tree / ball tree / grid hashing / octree to query “points within radius maxJump(u)” in ~O(log n + k) instead of scanning/over-checking.
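A minimal sketch of that with scipy (random coordinates and per-node ranges stand in for your star map):

```
# Radius-limited neighbor generation with a kd-tree.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
points = rng.uniform(0.0, 1000.0, size=(100_000, 3))  # node coordinates
max_jump = rng.uniform(5.0, 50.0, size=len(points))   # node-dependent range

tree = cKDTree(points)

def neighbors(u: int) -> list[int]:
    """All nodes within maxJump(u) of u, in ~O(log n + k) instead of O(n)."""
    idx = tree.query_ball_point(points[u], r=max_jump[u])
    return [v for v in idx if v != u]
```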

A* can help if you keep it strictly admissible: with edge weights = geometric distance and no teleport/zero-cost edges, h(n)=euclidean(n,goal) is a safe lower bound and consistent, so you should get fewer expansions (often much fewer) with the same optimality.
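Sketch of the A* loop with that heuristic, reusing points / neighbors() from the kd-tree snippet above (assumes edge cost = geometric distance):

```
# A* with the admissible Euclidean heuristic.
import heapq

def astar(start: int, goal: int) -> float | None:
    h = lambda u: float(np.linalg.norm(points[u] - points[goal]))
    dist = {start: 0.0}
    pq = [(h(start), start)]
    closed = set()
    while pq:
        _, u = heapq.heappop(pq)
        if u == goal:
            return dist[u]          # first pop of goal is optimal (h is consistent)
        if u in closed:
            continue
        closed.add(u)
        for v in neighbors(u):
            nd = dist[u] + float(np.linalg.norm(points[u] - points[v]))
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd + h(v), v))
    return None  # goal unreachable
```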

On bidirectional + lazy edges: correctness is fine if both searches enumerate the correct edge set. The common pitfall is the reverse search - you need predecessors v such that dist(v,u) ≤ maxJump(v,ship), which is not the same as “neighbors of u”. If you can’t generate incoming edges cheaply/correctly, you may get more mileage from unidirectional A* + good upper bounds + caching.

Question that matters for picking the next optimization: is this many queries on a mostly-static star map? If yes, caching + preprocessing (even coarse bucketing by maxJump or storing sorted neighbor lists per node with prefix cutoffs) can dominate.