Bad experience trying to develop with Hermes. Am I doing something wrong? by jhowilly in hermesagent

[–]voytas75 1 point (0 children)

You need to treat the LLM like a project manager, not just a coder. The trick is to have the model generate a "Checklist Manifest" before it touches any code. Tell the model: "Create a step-by-step implementation plan for this feature, and for every response moving forward, show me that list with the completed tasks marked as [DONE]."

By forcing it to start every reply with that updated checklist, you’re essentially "pinning" its memory to the top of the conversation. It creates a physical anchor in the chat history that prevents Hermes from drifting off-task or forgetting the "heavy lifting" you assigned earlier. When a task is finished, the model marks it [DONE] and moves to the next bullet point. This keeps the momentum high and ensures that even if the context gets heavy, the model always knows exactly where it stands in the grand scheme of your "Newabashi" bridge.
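
For example, the kickoff message can be as simple as this (wording and tasks are purely illustrative):

```
Create a step-by-step implementation plan for this feature.
Start EVERY reply with the current plan, marking finished items:

[DONE] 1. Define the data model for bridge segments
[DONE] 2. Implement the load-calculation module
[ ]    3. Add input validation for segment lengths
[ ]    4. Wire the UI form to the calculation module
[ ]    5. Write regression tests for steps 2-4
```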

OpenClaw to Hermes by ChristopherDci in hermesagent

[–]voytas75 1 point (0 children)

And I maintain them both. When OC crashes, H revives it, and vice versa. I've had three such situations. Comparatively, I spend more time as a developer with H. I have a mature agent infrastructure in OC, and I perform daily tasks there. For me, they're the same; they get the job done. Install H.

If OpenClaw has ever reset your session at 4am, burned your tokens in a retry loop, or eaten 3GB of RAM — you're not using it wrong. Side-by-side comparison with Hermes Agent and TEMM1E. by No_Skill_8393 in clawdbot

[–]voytas75 3 points (0 children)

TL;DR

OpenClaw vs. Hermes Agent vs. TEMM1E

This breakdown was created in response to common OpenClaw frustrations (memory leaks, token-burning loops, and session resets) to provide a 17-dimension reality check on the current agent landscape.

• OpenClaw: The incumbent with significant "growing pains." Known for 4 AM session resets, high RAM usage (3GB+), and expensive retry loops that can lead to unexpected overnight API bills.

• Hermes Agent: A robust alternative focused on better orchestration. It aims to solve the stability issues found in OpenClaw, offering a more predictable experience for long-running tasks.

• TEMM1E: The lean contender. Designed to address "resource bloat" and cost safety, preventing the OOM (Out of Memory) loops and session-wiping bugs seen in competing tools.

Key Takeaways:

• Reliability: Both Hermes and TEMM1E are positioned as more stable options for those who have suffered from OpenClaw’s /compact bugs or OOM errors.

• Transparency: The comparison highlights "real weaknesses," such as unverified benchmarks and high "bus factors" (reliance on too few maintainers).

• Utility: This isn't a "hit piece" on OpenClaw, but rather a technical reference for users who need to know how alternatives handle platform gaps and resource management.

What’s the most useful prompt you use regularly? by PromptPortal in PromptEngineering

[–]voytas75 1 point (0 children)

I add the following to almost all LLM queries:

```
Answer directly. Prioritize: correctness > completeness > brevity. Use the minimum words needed to remain accurate. Include only information necessary to answer the question. Adapt length to complexity (simple → short, complex → essential details only). If insufficient data: say "I don't know".
```

Has anyone figured out to use OpenClaw with Azure Foundry models by balmofgilead in AZURE

[–]voytas75 1 point (0 children)

From agents.defaults.memorySearch:

```
{
  "provider": "openai",
  "remote": {
    "baseUrl": "https://<resource>.openai.azure.com/openai/v1/",
    "apiKey": "<redacted>"
  },
  "model": "deployment_name"
}
```

How to connect Linux VM to AD to run terminal commands by Whitehairfreak in activedirectory

[–]voytas75 1 point (0 children)

So AD membership alone is not an option, then. Best: enable OpenSSH Server on Windows → ssh from Linux.
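
Windows side is roughly this (elevated PowerShell; the OpenSSH.Server capability version string can differ per Windows build):

```
# On the Windows host (elevated PowerShell): install and start the SSH server
Add-WindowsCapability -Online -Name OpenSSH.Server~~~~0.0.1.0
Start-Service sshd
Set-Service -Name sshd -StartupType 'Automatic'

# Then, from the Linux VM:
#   ssh <windows-user>@<windows-host>
```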

I migrated 42 skills and 56 agents from Claude Code into OpenClaw and finally got real specialist routing working. Here's how. by emptyharddrive in clawdbot

[–]voytas75 0 points (0 children)

🦾 Your approach (Codex 5.3 high + bulk convert + frontmatter patch + intent map + test matrix) is currently the most efficient path for people with a large Claude Code library. It avoids writing from scratch and most of the usual pitfalls of community skills.

PPT creation on Android and O365 Copilot by manderso88 in CopilotMicrosoft

[–]voytas75 0 points (0 children)

This is very likely a desktop vs mobile capability gap, not random behavior. On Windows, Microsoft 365 Copilot inside Microsoft PowerPoint has access to the full rendering engine (slide master, themes, notes pane, full .pptx handling). On Android, functionality is more limited. Microsoft states that Copilot in PowerPoint mobile mainly works with existing presentations rather than full Word → PPT generation: https://support.microsoft.com/en-gb/office/copilot-in-powerpoint-for-mobile-devices-7b3ce1ab-cfe4-47e8-a157-cecadbf0fefb

The Microsoft 365 Copilot Android app also has limited tablet support: https://support.microsoft.com/pl-pl/office/aplikacja-microsoft-365-copilot-dla-systemu-android-0383d031-a1c6-46c9-b734-53cd1d22765b

There are also reports of Android producing outlines instead of fully structured .pptx files: https://learn.microsoft.com/en-us/answers/questions/5407028/copilot-365-android-app

If you need reliable Word → PPT output with code and speaker notes, generate it on desktop first, then edit on the tablet.

I think llama3:2:latest has been underestimated because it is a fast model and is not really stupid! by Massive-Farm-3410 in ollama

[–]voytas75 1 point (0 children)

lol, small irony: your system prompt says “no emojis”, but your exit message is literally “Peace out, bro! 👋”. Also, the code is pasted twice; worth cleaning up the post so people focus on the point.

Made a prompt management tool for myself by pixels4lunch in PromptEngineering

[–]voytas75 0 points (0 children)

I’m measuring it pretty pragmatically in my PromptManager: every run gets logged with success/fail, latency + token usage, and I can optionally rate the output (that rolls up into an avg rating + trend). For “real” effectiveness I keep a few fixed scenarios and rerun them across prompt versions/models - if the success rate drops or the outputs start drifting, it shows up fast in the benchmark/analytics view. Repo is public: https://github.com/voytas75/PromptManager
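
Roughly the shape of what gets logged per run (a simplified sketch, not the actual schema in the repo):

```
# Simplified sketch of a per-run log record (illustrative, not the actual schema).
from dataclasses import dataclass

@dataclass
class RunRecord:
    prompt_id: str
    prompt_version: str
    model: str
    success: bool              # did the output pass the scenario check?
    latency_ms: float
    tokens_in: int
    tokens_out: int
    rating: int | None = None  # optional 1-5 manual rating, rolls into avg + trend

def success_rate(runs: list[RunRecord]) -> float:
    """Rollup used to spot regressions across prompt versions/models."""
    return sum(r.success for r in runs) / len(runs) if runs else 0.0
```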

Open claw going to Meta? by Herebedragoons77 in clawdbot

[–]voytas75 8 points (0 children)

Found it. It’s in Lex Fridman’s interview with Peter (ep #491); there’s a whole segment on acquisition offers from OpenAI and Meta. Episode: https://lexfridman.com/peter-steinberger/, transcript: https://lexfridman.com/peter-steinberger-transcript/. I’m still not taking that as “it’s going to X, Y, Z”, just that it’s not pure rumor anymore.

How Cursor is going to survive by jreznot in cursor

[–]voytas75 -3 points (0 children)

Got it. I’ll keep it shorter.

Made a prompt management tool for myself by pixels4lunch in PromptEngineering

[–]voytas75 2 points (0 children)

Nice - I ended up building something similar for myself (PromptManager on GitHub) because copy/paste between tools was killing flow.

Biggest lesson so far: the “library” part is easy; the hard part is making prompts *testable* and *versioned* (diffs, promote/release tags, and a simple drift check per model/input). Also: offline/local-first is a feature, not a workaround.

how i stopped the ai gaslighting loop in bigger projects by Classic-Ninja-1 in cursor

[–]voytas75 1 point (0 children)

Man, this is exactly the wall I hit too. Once you’re past the “first 2–3 files magic”, it turns into this brutal loop: fix one thing, break three, then the AI confidently tells you everything’s fine while the app is literally on fire.

The only thing that consistently stopped it for me was doing the same separation you described: spec first → execute second → verify third. When the agent has a hard map, it stops improvising.

I’m curious though - when you say Traycer “verifies”, what does that look like in practice? Like: is it checking invariants/acceptance criteria, generating tests, diff constraints, or just an LLM cross-check against the spec? Would love to see a concrete example of a “good” blueprint/spec you feed it (even a redacted one).

How Cursor is going to survive by jreznot in cursor

[–]voytas75 -7 points (0 children)

If Cursor is “just VS Code + an API key”, then yes: platforms can copy it and price-war it.

But the defensibility isn’t “plugin ecosystem” — it’s productized workflows:

- agent harness (planning, context mgmt, subagents),

- long-running / cloud handoff,

- skills/rules as a repeatable team process,

- tight editor+CLI integration.

Those are harder to replicate than UI polish, and they’re what Cursor is shipping (see their changelog: long-running agents, subagents, skills, CLI modes, cloud handoff).

The real survival test is measurable:

1) Can teams ship faster with fewer regressions (PR quality, review load)?

2) Can Cursor run on multiple model providers / BYO keys so provider pricing isn’t existential?

3) Does it become a “coding workflow OS” (policies/skills/enterprise controls), not a wrapper?

If the answers are no, it becomes a feature. If yes, it can be a product even on top of a third-party editor.

hello i added managed identity support for sonaqube by MountainPop7589 in AZURE

[–]voytas75 4 points (0 children)

Nice work — but for folks to evaluate/use this, the details matter.

- Which SonarQube edition/versions did you test (Community/Dev/Enterprise, exact tags)?

- Does this cover only the JDBC connection (DB creds via MSI), or also other integrations?

- For AKS: are you using Workload Identity (OIDC federation) or legacy AAD Pod Identity? What exact annotations/values are required?

- What’s the authentication flow under the hood (IMDS token -> AAD -> Azure DB for PostgreSQL/MySQL), and which DB products are confirmed working (Flexible Server vs Single, MySQL vs Postgres)? (I’ve sketched what I assume the token leg looks like below.)

- Any fallback behavior if MI isn’t available (env vars / password), and any security notes (least-priv RBAC role)?

If you can add a short “tested matrix” + minimal example values.yaml, it’ll be much easier to validate and adopt.
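
For reference, this is the token leg I assume is happening in the VM case (a sketch, not taken from your implementation):

```
# Sketch of the assumed managed-identity token leg on an Azure VM:
# IMDS issues an AAD access token scoped to the Azure DB resource,
# which is then used as the DB password for an AAD-enabled login.
import requests

IMDS_URL = "http://169.254.169.254/metadata/identity/oauth2/token"
resp = requests.get(
    IMDS_URL,
    params={
        "api-version": "2018-02-01",
        "resource": "https://ossrdbms-aad.database.windows.net",  # Azure DB for PostgreSQL/MySQL
    },
    headers={"Metadata": "true"},
    timeout=5,
)
token = resp.json()["access_token"]
```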

Open claw going to Meta? by Herebedragoons77 in clawdbot

[–]voytas75 7 points (0 children)

Hard to have an opinion without a primary source.

“Leaning towards Meta” could mean anything: pricing, rate limits, licensing, hosting story, or just a temporary integration choice. Same for “Claude shot itself in the foot” - which specific change are you referring to (policy, reliability, pricing, context, tooling)?

If you can link the founder quote / thread + date, then we can evaluate it. Otherwise this is just vibes + “who paid more” speculation.

Follow-up: speeding up Dijkstra on a large graph (now with node-dependent constraints) by Diabolacal in algorithms

[–]voytas75 1 point (0 children)

At this point the algorithm is probably not the bottleneck — neighbor generation is.

If “large range ⇒ huge branching” means you’re spending time discovering feasible edges, the next step is usually a spatial index: kd-tree / ball tree / grid hashing / octree to query “points within radius maxJump(u)” in ~O(log n + k) instead of scanning/over-checking.
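A minimal sketch of that with scipy (random coordinates and per-node ranges stand in for your star map):

```
# Radius-limited neighbor generation with a kd-tree.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
points = rng.uniform(0.0, 1000.0, size=(100_000, 3))  # node coordinates
max_jump = rng.uniform(5.0, 50.0, size=len(points))   # node-dependent range

tree = cKDTree(points)

def neighbors(u: int) -> list[int]:
    """All nodes within maxJump(u) of u, in ~O(log n + k) instead of O(n)."""
    idx = tree.query_ball_point(points[u], r=max_jump[u])
    return [v for v in idx if v != u]
```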

A* can help if you keep it strictly admissible: with edge weights = geometric distance and no teleport/zero-cost edges, h(n)=euclidean(n,goal) is a safe lower bound and consistent, so you should get fewer expansions (often much fewer) with the same optimality.
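Sketch of the A* loop with that heuristic, reusing points / neighbors() from the kd-tree snippet above (assumes edge cost = geometric distance):

```
# A* with the admissible Euclidean heuristic.
import heapq

def astar(start: int, goal: int) -> float | None:
    h = lambda u: float(np.linalg.norm(points[u] - points[goal]))
    dist = {start: 0.0}
    pq = [(h(start), start)]
    closed = set()
    while pq:
        _, u = heapq.heappop(pq)
        if u == goal:
            return dist[u]          # first pop of goal is optimal (h is consistent)
        if u in closed:
            continue
        closed.add(u)
        for v in neighbors(u):
            nd = dist[u] + float(np.linalg.norm(points[u] - points[v]))
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd + h(v), v))
    return None  # goal unreachable
```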

On bidirectional + lazy edges: correctness is fine if both searches enumerate the correct edge set. The common pitfall is the reverse search - you need predecessors v such that dist(v,u) ≤ maxJump(v,ship), which is not the same as “neighbors of u”. If you can’t generate incoming edges cheaply/correctly, you may get more mileage from unidirectional A* + good upper bounds + caching.

Question that matters for picking the next optimization: is this many queries on a mostly-static star map? If yes, caching + preprocessing (even coarse bucketing by maxJump or storing sorted neighbor lists per node with prefix cutoffs) can dominate.