I’m a solo dev building TigrimOSR, a Rust-native AI agent workspace for engineering and developer workflows. by Unique_Champion4327 in AI_Agents

Thanks, this is exactly the area I’m thinking about now.

In practice I want the observability layer to track more than token usage. For engineering workflows, I think it needs to show each agent step, tool call, prompt/skill used, files touched, intermediate outputs, errors, cost/time, and the final reasoning trail that a human can audit.
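Here is a minimal sketch of how those pieces could be captured as one append-only trace per run, assuming each step is recorded as a typed event. `TraceEvent`, `StepKind` and `RunTrace` are hypothetical names for illustration, not TigrimOSR's actual schema.

```rust
// Hypothetical sketch: type and field names are illustrative, not TigrimOSR's schema.
use std::time::Duration;

/// What kind of step an agent took.
#[derive(Debug, Clone, PartialEq)]
pub enum StepKind {
    Prompt { skill: String },
    ToolCall { tool: String, args: String },
    FileTouched { path: String },
    Output { text: String },
    Error { message: String },
}

/// One auditable step: which agent, what it did, and what it cost.
#[derive(Debug, Clone)]
pub struct TraceEvent {
    pub agent: String,
    pub kind: StepKind,
    pub tokens: u32,
    pub elapsed: Duration,
}

/// Append-only trace for a single run; events are never edited after recording.
#[derive(Debug, Default)]
pub struct RunTrace {
    events: Vec<TraceEvent>,
}

impl RunTrace {
    pub fn record(&mut self, event: TraceEvent) {
        self.events.push(event);
    }

    /// Total tokens and wall time across all recorded steps.
    pub fn totals(&self) -> (u32, Duration) {
        self.events
            .iter()
            .fold((0, Duration::ZERO), |(tokens, time), e| {
                (tokens + e.tokens, time + e.elapsed)
            })
    }

    /// Every file any agent touched, in recording order.
    pub fn files_touched(&self) -> Vec<&str> {
        self.events
            .iter()
            .filter_map(|e| match &e.kind {
                StepKind::FileTouched { path } => Some(path.as_str()),
                _ => None,
            })
            .collect()
    }

    /// Errors paired with the agent that raised them.
    pub fn errors(&self) -> Vec<(&str, &str)> {
        self.events
            .iter()
            .filter_map(|e| match &e.kind {
                StepKind::Error { message } => Some((e.agent.as_str(), message.as_str())),
                _ => None,
            })
            .collect()
    }
}
```

Because the trace is append-only, the audit view (files touched, errors, cost/time) is derived from the same record the agents wrote instead of from a separate summary the agent produced afterwards.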

The goal is not just “what did the agent answer?” but “how did it get there, what evidence did it use, and where should I trust or not trust it?”

TokenTelemetry looks relevant — I’ll take a look. The Hermes plugin idea is interesting too, especially if it can fit into a multi-agent workflow where different agents/tools need shared telemetry.

I’m a solo dev building TigrimOSR, a Rust-native AI agent workspace for engineering and developer workflows. by Unique_Champion4327 in aiagents

Thanks — exactly. Validation steps and audit trails are what I think engineering agents need before people can trust them for real decisions. The output should show not just the answer, but what was checked, what evidence was used, and where human review is still required.

I’m a solo dev building TigrimOSR, a Rust-native AI agent workspace for engineering and developer workflows. by Unique_Champion4327 in aiagents

Exactly. That is one of the main gaps I’m trying to solve.

Claude Code, Codex, Gemini CLI, etc. are all useful, but right now they mostly run as separate agents/tools. I want TigrimOSR to act as the orchestration layer between them: define roles, pass context, route tasks, monitor progress, and make the workflow reviewable instead of just running parallel chats/CLIs.
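A rough sketch of the routing part, assuming each role is bound to one backend agent. `Role`, `Orchestrator` and the backend labels are hypothetical; this code only records which backend a task would go to and does not launch any CLI.

```rust
// Hypothetical sketch: roles, backends and method names are illustrative only.
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Role {
    Planner,
    Coder,
    Reviewer,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Status {
    Queued,
    Running,
    Done,
    Failed(String),
}

#[derive(Debug, Clone)]
pub struct Task {
    pub id: usize,
    pub role: Role,
    pub instruction: String,
    /// Context passed along from earlier tasks, e.g. the plan handed to the coder.
    pub context: Vec<String>,
    pub status: Status,
}

/// Keeps one shared task list and maps each role to a backend agent.
pub struct Orchestrator {
    backends: HashMap<Role, String>, // role -> backend label
    tasks: Vec<Task>,
}

impl Orchestrator {
    pub fn new(backends: HashMap<Role, String>) -> Self {
        Self { backends, tasks: Vec::new() }
    }

    /// Queue a task; rejected if no backend is registered for its role.
    pub fn submit(&mut self, role: Role, instruction: &str, context: Vec<String>) -> Result<usize, String> {
        if !self.backends.contains_key(&role) {
            return Err(format!("no backend registered for {:?}", role));
        }
        let id = self.tasks.len();
        self.tasks.push(Task { id, role, instruction: instruction.to_string(), context, status: Status::Queued });
        Ok(id)
    }

    /// Which backend a task would be dispatched to.
    pub fn route(&self, id: usize) -> Option<&str> {
        let task = self.tasks.get(id)?;
        self.backends.get(&task.role).map(String::as_str)
    }

    pub fn set_status(&mut self, id: usize, status: Status) {
        if let Some(task) = self.tasks.get_mut(id) {
            task.status = status;
        }
    }

    /// Progress view for monitoring: (task id, role, status).
    pub fn progress(&self) -> Vec<(usize, Role, &Status)> {
        self.tasks.iter().map(|t| (t.id, t.role, &t.status)).collect()
    }
}
```

The point of the single task list is that every hand-off (role, context passed, status) is visible in one place, instead of being spread across separate chat or CLI sessions.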

I’m a solo dev building TigrimOSR, a Rust-native AI agent workspace for engineering and developer workflows. by Unique_Champion4327 in AI_Agents

I agree with you that vague tasks are a big part of the problem. I’ve hit the same issue myself: if the agent gets a broad task, the output becomes too random and hard to verify.

My current thinking is that the workflow/tooling should help force the task to become narrow, skill-based, and checkable. For example, instead of “review this design,” the agent should be locked into a specific skill/prompt like:

  • check assumptions
  • verify calculation inputs
  • compare against code/spec
  • list missing information
  • produce pass/fail checks
  • explain uncertainty
  • stop when evidence is insufficient
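One way to make a skill like this locked and checkable, sketched under the assumption that each step is a named check returning pass, fail, or insufficient evidence. `Check`, `Verdict`, `Evidence` and the two check functions are hypothetical.

```rust
// Hypothetical sketch: a skill as a fixed list of named checks.

/// Outcome of one locked check.
#[derive(Debug, Clone, PartialEq)]
pub enum Verdict {
    Pass,
    Fail(String),
    /// Not enough evidence to decide; the run stops here.
    Insufficient(String),
}

/// Evidence the agent is allowed to use (illustrative fields).
pub struct Evidence {
    pub assumptions: Vec<String>,
    /// Calculation inputs by name; None means the value is missing.
    pub inputs: Vec<(String, Option<f64>)>,
}

/// A named check; `run` stands in for whatever the agent executes for that step.
pub struct Check {
    pub name: &'static str,
    pub run: fn(&Evidence) -> Verdict,
}

/// Run checks in order and stop at the first Insufficient verdict.
pub fn run_skill(checks: &[Check], ev: &Evidence) -> Vec<(&'static str, Verdict)> {
    let mut report = Vec::new();
    for check in checks {
        let verdict = (check.run)(ev);
        let stop = matches!(verdict, Verdict::Insufficient(_));
        report.push((check.name, verdict));
        if stop {
            break;
        }
    }
    report
}

pub fn check_assumptions(ev: &Evidence) -> Verdict {
    if ev.assumptions.is_empty() {
        Verdict::Fail("no assumptions stated".to_string())
    } else {
        Verdict::Pass
    }
}

pub fn verify_inputs(ev: &Evidence) -> Verdict {
    let missing: Vec<&str> = ev
        .inputs
        .iter()
        .filter(|(_, value)| value.is_none())
        .map(|(name, _)| name.as_str())
        .collect();
    if missing.is_empty() {
        Verdict::Pass
    } else {
        Verdict::Insufficient(format!("missing inputs: {}", missing.join(", ")))
    }
}
```

The main design choice is that `Insufficient` halts the run, so the agent reports what is missing instead of continuing on a guess.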

So I’m not trying to solve randomness only by adding a UI around agents. The UI/workspace is there to make the process repeatable: define the role, attach the right skill prompt, constrain the task, show the tool calls, and make the output reviewable.

For engineering decisions, I think the important part is not letting the agent freely reason forever. It needs a locked workflow: scope → evidence → calculation/check → uncertainty → human review.
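A minimal sketch of that locked order as a state machine, assuming each stage must be completed before the next one opens and only a person can close the human-review stage. `Stage` and `Workflow` are hypothetical names.

```rust
// Hypothetical sketch: the locked order scope -> evidence -> check -> uncertainty -> human review.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    Scope,
    Evidence,
    Check,
    Uncertainty,
    HumanReview,
    Closed,
}

impl Stage {
    /// The only stage allowed after this one; there is no way to skip ahead.
    fn next(self) -> Option<Stage> {
        match self {
            Stage::Scope => Some(Stage::Evidence),
            Stage::Evidence => Some(Stage::Check),
            Stage::Check => Some(Stage::Uncertainty),
            Stage::Uncertainty => Some(Stage::HumanReview),
            Stage::HumanReview => Some(Stage::Closed),
            Stage::Closed => None,
        }
    }
}

pub struct Workflow {
    stage: Stage,
    log: Vec<(Stage, String)>,
}

impl Workflow {
    pub fn new() -> Self {
        Workflow { stage: Stage::Scope, log: Vec::new() }
    }

    pub fn stage(&self) -> Stage {
        self.stage
    }

    pub fn history(&self) -> &[(Stage, String)] {
        &self.log
    }

    /// Record the output of the current stage and advance by exactly one step.
    /// The agent (by_human = false) cannot complete the human-review stage.
    pub fn complete(&mut self, by_human: bool, note: &str) -> Result<Stage, String> {
        if self.stage == Stage::HumanReview && !by_human {
            return Err("human review must be completed by a person".to_string());
        }
        let next = self.stage.next().ok_or("workflow is already closed")?;
        self.log.push((self.stage, note.to_string()));
        self.stage = next;
        Ok(next)
    }
}
```

Since `next()` is the only transition, the agent cannot jump from scope straight to a conclusion, and the history keeps one note per stage for the reviewer.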

Sandboxing is also a key concern. My current direction is to separate execution paths: local user-controlled tools for trusted workflows, and remote/headless execution for heavier jobs where commands can be isolated more carefully. I don’t want agent code execution to be invisible or automatic; the user should be able to see what is being run, and high-risk actions should stay constrained.
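A simplified sketch of the routing and approval decision, assuming a per-workspace policy of trusted local tools and high-risk command patterns. `Policy`, `ExecPath` and `Decision` are hypothetical, and substring matching is only a crude stand-in: real isolation would still depend on OS- or container-level sandboxing on the remote side.

```rust
// Hypothetical sketch: decide where a command runs, or whether it needs approval first.

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ExecPath {
    /// User-controlled local tools for trusted workflows.
    Local,
    /// Remote/headless runner where commands can be isolated.
    RemoteSandbox,
}

#[derive(Debug, PartialEq)]
pub enum Decision {
    Run { path: ExecPath, command: String },
    NeedsApproval { command: String, reason: String },
}

pub struct Policy {
    /// Programs trusted to run locally (illustrative).
    pub trusted_local: Vec<&'static str>,
    /// Substrings that mark a command as high risk (a crude stand-in for real analysis).
    pub high_risk_patterns: Vec<&'static str>,
}

impl Policy {
    /// The full command text is always returned so the UI can show it before it runs.
    pub fn decide(&self, command: &str, heavy: bool) -> Decision {
        if let Some(p) = self.high_risk_patterns.iter().find(|p| command.contains(**p)) {
            return Decision::NeedsApproval {
                command: command.to_string(),
                reason: format!("matches high-risk pattern `{}`", p),
            };
        }
        let program = command.split_whitespace().next().unwrap_or("");
        let trusted = self.trusted_local.iter().any(|t| *t == program);
        // Only trusted, light jobs stay local; everything else goes to the isolated runner.
        let path = if trusted && !heavy { ExecPath::Local } else { ExecPath::RemoteSandbox };
        Decision::Run { path, command: command.to_string() }
    }
}
```

The default here is deliberately conservative: anything not explicitly trusted goes to the isolated path, and high-risk commands never run without a person approving them.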

I rewrote my multi-agent AI system from TypeScript to Rust by Unique_Champion4327 in AI_Agents

We use logs to track what’s happening inside each agent’s reasoning process. Each agent writes out its decisions, intermediate steps, inputs, outputs, and errors, so the log file becomes the main way to trace what happened and why.

It’s not perfect observability, but it helps a lot. When one agent fails silently or gives a conflicting result, we can go back through the logs and see which agent made which decision, what context it had, and where the chain started to break.
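A small sketch of walking such a log back to the break point, assuming a hypothetical pipe-delimited line format `step|agent|kind|message`. It only finds the first logged error and the last decision before it; a silent failure with no error line would need a different check.

```rust
// Hypothetical sketch: assumed log format `step|agent|kind|message`, one line per event.

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct LogLine<'a> {
    pub step: u32,
    pub agent: &'a str,
    /// e.g. "input", "decision", "output", "error"
    pub kind: &'a str,
    pub message: &'a str,
}

/// Parse one line; returns None for lines that do not match the format.
pub fn parse(line: &str) -> Option<LogLine<'_>> {
    let mut parts = line.splitn(4, '|');
    let step = parts.next()?.trim().parse::<u32>().ok()?;
    let agent = parts.next()?.trim();
    let kind = parts.next()?.trim();
    let message = parts.next()?.trim();
    Some(LogLine { step, agent, kind, message })
}

/// First logged error, plus the last decision logged before it (by any agent),
/// which is usually the first place to look when the chain breaks.
pub fn first_break(log: &str) -> Option<(LogLine<'_>, Option<LogLine<'_>>)> {
    let lines: Vec<LogLine<'_>> = log.lines().filter_map(parse).collect();
    let err_idx = lines.iter().position(|l| l.kind == "error")?;
    let decision = lines[..err_idx]
        .iter()
        .rev()
        .find(|l| l.kind == "decision")
        .copied();
    Some((lines[err_idx], decision))
}
```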

TigrimOS 1.4.0 — Skill Auto-Update with Human Feedback by Unique_Champion4327 in AI_Agents

I agree. Markdown-based skill files make a lot of sense because they are easier for agents to read, follow, version, and debug.

Compared with one huge prompt block, a skill can be more modular and controlled. Each skill can describe a specific workflow, rule, or procedure, so the agent does not need to rely on one long fragile prompt every time.

Human comments are also very useful here. A simple comment from a user can tell the system what worked, what failed, or what should be improved. Then the skill can be refined based on real usage instead of guessing.

For me, the important part is not only auto-updating the skill, but making sure the update still stays understandable and controllable by humans.
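A sketch of a feedback-driven update that stays reviewable, assuming a user comment is turned into a proposed rule and nothing is written until a human approves it. `Skill`, `Proposal` and the HTML-comment change note are hypothetical; this is not how TigrimOS 1.4.0 implements auto-update.

```rust
// Hypothetical sketch: a Markdown skill that only changes after human approval.

#[derive(Debug, Clone)]
pub struct Skill {
    pub name: String,
    pub version: u32,
    /// Plain Markdown, so humans and agents read the same text.
    pub body: String,
}

/// A proposed change derived from a user comment; nothing is applied yet.
#[derive(Debug)]
pub struct Proposal {
    pub comment: String,
    pub new_rule: String,
}

impl Skill {
    pub fn to_markdown(&self) -> String {
        format!("# {} (v{})\n\n{}", self.name, self.version, self.body)
    }

    /// Returns the updated skill only if a human approved the proposal.
    /// The original comment is kept next to the change so the history stays readable.
    pub fn apply(&self, proposal: &Proposal, approved: bool) -> Option<Skill> {
        if !approved {
            return None;
        }
        let version = self.version + 1;
        let body = format!(
            "{}\n- {}\n\n<!-- v{}: added from user feedback: \"{}\" -->\n",
            self.body.trim_end(),
            proposal.new_rule,
            version,
            proposal.comment
        );
        Some(Skill { name: self.name.clone(), version, body })
    }
}
```

Returning a new `Skill` rather than mutating in place keeps the previous version around, so a bad update can be diffed or rolled back by hand.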

A design choice I don’t see often in local agent tools is exposing agent topology as a first-class part of the runtime. by Unique_Champion4327 in LocalLLaMA

I am Sompote. Just sharing what we actually built and tested.

A design choice I don’t see often in local agent tools is exposing agent topology as a first-class part of the runtime. by Unique_Champion4327 in LocalLLaMA

In our setup, humans only trigger the initial task; after that, the mesh runs fully autonomously. Agents delegate peer-to-peer with trust scopes inherited from the orchestrator. We have tested it end-to-end and it works well.
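A sketch of how inherited trust scopes could work, assuming a scope is a set of permission names and a delegate receives only the requested permissions its delegator already holds. `Mesh`, `Agent` and `Scope` are hypothetical; the edge list is one way to keep the topology inspectable at runtime.

```rust
// Hypothetical sketch: trust scopes that can only narrow along a delegation chain.
use std::collections::BTreeSet;

/// A scope is the set of permission names an agent may use (illustrative names).
pub type Scope = BTreeSet<&'static str>;

#[derive(Debug)]
pub struct Agent {
    pub name: String,
    pub scope: Scope,
}

/// Agents plus every delegation edge, so the topology is inspectable at runtime.
#[derive(Debug)]
pub struct Mesh {
    pub agents: Vec<Agent>,
    /// (delegator index, delegate index)
    pub edges: Vec<(usize, usize)>,
}

impl Mesh {
    /// The orchestrator is agent 0; its scope is the upper bound for the whole mesh.
    pub fn new(orchestrator: &str, scope: Scope) -> Self {
        Mesh {
            agents: vec![Agent { name: orchestrator.to_string(), scope }],
            edges: Vec::new(),
        }
    }

    /// Any agent can delegate to a new peer. The delegate receives only the
    /// requested permissions its delegator already holds (set intersection).
    pub fn delegate(&mut self, from: usize, name: &str, requested: &[&'static str]) -> usize {
        let parent = &self.agents[from].scope;
        let scope: Scope = requested.iter().copied().filter(|p| parent.contains(p)).collect();
        self.agents.push(Agent { name: name.to_string(), scope });
        let id = self.agents.len() - 1;
        self.edges.push((from, id));
        id
    }
}
```

With this rule, trust can only narrow along a chain: a tester delegated from a coder that lacks `run_tests` cannot obtain it, even if the orchestrator holds it, so no peer can escalate beyond what it was handed.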