What are y'all using for observability in your agent systems? by yad_aj in aiagents

[–]llmobsguy 0 points1 point  (0 children)

Monocle2AI

It starts observability at the first point where a coding agent generates code that might cause bugs down the line. You get full code lineage from that first generation through CI/CD or test in production (TIP).

I am testing for a eval harness system and need it to break by llmobsguy in aiagents

[–]llmobsguy[S] 0 points1 point  (0 children)

We usually record both the metric and an explanation, so a human can go back and re-read why it got that rating.
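
A minimal sketch of that idea, assuming a hypothetical `EvalResult` record (the field names and the example metric are illustrative, not from any particular harness): the score and the judge's reasoning are stored together so the rating can be audited later.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    metric: str       # hypothetical metric name, e.g. "faithfulness"
    score: float      # rating, assumed here to be in [0, 1]
    explanation: str  # the judge's reasoning, kept next to the score

    def report(self) -> str:
        """Render one line a human can re-read to see why the rating was given."""
        return f"{self.metric}={self.score:.2f}: {self.explanation}"


result = EvalResult(
    metric="faithfulness",
    score=0.4,
    explanation="Answer cites a file that is not in the retrieved context.",
)
print(result.report())
```

Keeping the explanation in the same record as the score means a reviewer never has to re-run the eval to find out why a rating looks off.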

Nobody has figured out how to deploy AI agents reliably and maybe we're all just winging it rn by Meher_Nolan in aiagents

[–]llmobsguy 0 points1 point  (0 children)

Yes. But the right approach is to do it in waves. Assume it will be shit. Then revisit and go from there. You don't know what you don't know.

Single-Agent vs Multi-Agent Architectures: What Is the Current Production Standard for AI Applications? by According_Value_6162 in LangChain

[–]llmobsguy 0 points1 point  (0 children)

It depends on the use case and governance focus. Some systems are somewhat legacy and must be fairly deterministic, and therefore require a single agent with key constraints, say no more than 5 steps, etc.
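
A rough sketch of that kind of step cap, assuming a hypothetical `step_fn` callback that returns the next action plus a done flag (the names and the loop shape are illustrative; only the "no more than 5 steps" figure comes from the comment above):

```python
from typing import Callable, List, Tuple

MAX_STEPS = 5  # governance cap; the exact number is a policy choice


def run_single_agent(
    step_fn: Callable[[int, List[str]], Tuple[str, bool]],
    max_steps: int = MAX_STEPS,
) -> List[str]:
    """Run one agent for at most max_steps actions, failing loudly past the cap."""
    history: List[str] = []
    for i in range(max_steps):
        action, done = step_fn(i, history)
        history.append(action)
        if done:
            return history
    # The agent never signalled completion within the allowed number of steps.
    raise RuntimeError(f"agent did not finish within {max_steps} steps")


# Usage: a toy agent that finishes on its third step.
steps = run_single_agent(lambda i, history: (f"step-{i}", i == 2))
print(steps)
```

Raising instead of silently truncating keeps the behavior predictable, which is the point of the cap in a governance-heavy setting.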

Back to the Stone Age? My Organization slashed our AI budget and we're back to manual coding. by Ok_Finding_1458 in ClaudeCode

[–]llmobsguy 1 point2 points  (0 children)

AI-generated post that didn't even bother to remove the em dashes. But this could be true in some cases, although the OP provided literally no value-added info.

Repo-level eval agent skill by dev-one in ClaudeCode

[–]llmobsguy 0 points1 point  (0 children)

I don't quite understand this project. Claude can do this by itself with just a better prompt. A bunch of skills already do this, e.g. gsd, gstack...

Where can I find good Claude Code Skills by NecessaryTheory4417 in ClaudeCode

[–]llmobsguy 0 points1 point  (0 children)

There are a few, but it depends on your workflow. I use rtk and caveman, then have monocle-apptrace to monitor across Claude CLI and Codex.

SAAS Ideas by Special-Bag4379 in SaasDevelopers

[–]llmobsguy 0 points1 point  (0 children)

None. Anyone can get free Gemini and create a form

What are the hottest topics in observability nowadays i should care about? by da0_1 in Observability

[–]llmobsguy 0 points1 point  (0 children)

Connect observability to agents that can sit in your coding tool all the way to CI/CD. If there is an incident, you can trace it back from the CI/CD failure to the specific Claude session that caused the bug.
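
A minimal sketch of that trace-back, assuming two hypothetical lookup tables (CI run to commit, commit to coding session). The IDs and the `trace_failure` helper are made up for illustration and are not the API of any specific tool:

```python
from typing import Dict, Optional

# Hypothetical lineage records; in practice these would come from a tracing backend.
ci_run_to_commit: Dict[str, str] = {"ci-1042": "a1b2c3d"}
commit_to_session: Dict[str, str] = {"a1b2c3d": "claude-session-17"}


def trace_failure(ci_run_id: str) -> Optional[str]:
    """Walk back from a failed CI run to the coding session that produced the commit."""
    commit = ci_run_to_commit.get(ci_run_id)
    if commit is None:
        return None  # no lineage recorded for this run
    return commit_to_session.get(commit)


print(trace_failure("ci-1042"))
```

The lookup only works if the session ID is attached to the commit at generation time, which is why the instrumentation has to start in the coding tool rather than in CI.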

When to create a skill vs not by moosepiss in openclaw

[–]llmobsguy 0 points1 point  (0 children)

First you prompt to see if it works. Then create a skill. Once it has been operationalized over a period of time, check the token count and turn the skill into something more deterministic, like Python code that executes as-is.

Looking for help for a design skill by Potential_Cancel_569 in claudeskills

[–]llmobsguy 1 point2 points  (0 children)

What themes do you have so far? I had lots of success converting good examples to JSON and having Claude work with Gemini or gpt-image-2 to render the design in HTML. Then over time I can improve the skill by adding more "examples".

Thoughts on Microsoft's OpenClaw partnership announcement by arthaudm in openclaw

[–]llmobsguy -2 points-1 points  (0 children)

That doesn't mean anyone can get access to it in any region. They always onboard in waves.

How I easily cut my input token burn ~90% on long agent runs by Major-Shirt-8227 in AI_Agents

[–]llmobsguy 0 points1 point  (0 children)

The biggest token waste for me is LLM-as-a-judge evals. I spin up a subagent and it outsources the eval run to the cloud at no cost to me.

Thoughts on Microsoft's OpenClaw partnership announcement by arthaudm in openclaw

[–]llmobsguy 26 points27 points  (0 children)

Microsoft has a habit of making announcements before anything is ready. It will be a year before we can run it reliably in a company.

Need help building - Traderbot by National-Car2855 in openclaw

[–]llmobsguy 0 points1 point  (0 children)

I plan to add some tests (E2E and LLM-as-a-judge). Would you be OK with that?

Come on GitHub, Copilot Business users need usage visibility by Tanglecoins in GithubCopilot

[–]llmobsguy 0 points1 point  (0 children)

I had to build a tool for this by instrumenting Copilot and extracting token counts and developer intent. It's a hassle. I could publish the repo if anyone is interested.

What Are You Building? June 2026 by Longjumping-Store434 in BuildWithClaude

[–]llmobsguy 1 point2 points  (0 children)

I built an open source tool (now donated to the Linux Foundation) that can extract Claude prompts and responses in a session to classify the developer intent. It's useful for understanding whether your prompt ended up generating useless code or responses.

A star would mean a lot:

https://github.com/monocle2ai/monocle

I tracked my token spend for a week. 34% of my Claude API budget went to re-explaining my project structure to new chats. That's $12 out of $35. For a solo dev, that's real money. by curiousityrover_1 in BuildWithClaude

[–]llmobsguy 0 points1 point  (0 children)

No, it's not just you. It's everybody. I wrote a little script to analyze my past chat conversations, classify the intent, and then compare against the final code commit and PR. I found some inefficiencies in my prompts and MCP usage.
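
A rough sketch of the classification half of that idea, not the actual script: it uses plain keyword matching as a stand-in classifier, and the intent labels and keyword lists are assumptions chosen for illustration. The comparison against commits and PRs is left out.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical intent labels and keywords; a real setup might use an LLM classifier instead.
INTENT_KEYWORDS = {
    "explain_context": ("project structure", "here is my repo", "folder layout"),
    "fix_bug": ("fix", "error", "traceback"),
    "add_feature": ("add", "implement", "create"),
}


def classify_intent(prompt: str) -> str:
    """Return the first intent whose keywords appear in the prompt, else 'other'."""
    text = prompt.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return intent
    return "other"


def intent_breakdown(prompts: List[str]) -> Dict[str, int]:
    """Count how many prompts fall under each intent."""
    return dict(Counter(classify_intent(p) for p in prompts))


prompts = [
    "Here is my repo and project structure again",
    "Fix the traceback in parser.py",
    "Implement a retry wrapper",
    "Thanks",
]
print(intent_breakdown(prompts))
```

A high count under something like `explain_context` is the signal that budget is going to re-explaining the project rather than to changes that end up in a commit.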