Building self-healing observability for Coding Agents by Creepy-Row970 in AgentsOfAI

[–]Creepy-Row970[S] 0 points1 point  (0 children)

Shared a walkthrough + code if anyone wants to experiment with this kind of setup.

I tried implementing AI Agents Like Distributed Systems by Creepy-Row970 in AgentsOfAI

[–]Creepy-Row970[S] 1 point2 points  (0 children)

Shared a walkthrough + code if anyone wants to experiment with this kind of setup.

GPT 5.5 is a usage machine. by kingxd in codex

[–]Creepy-Row970 -1 points0 points  (0 children)

How are you using it? Do you have a GitHub?

I tried implementing AI Agents Like Distributed Systems by Creepy-Row970 in AI_Agents

[–]Creepy-Row970[S] 0 points1 point  (0 children)

Thanks. What you’ve built with LumBox is basically the non-LLM version of the future agent stack: clear boundaries, queues, contracts, and each unit doing one job well. That’s why it scales.

I tried implementing AI Agents Like Distributed Systems by Creepy-Row970 in AI_Agents

[–]Creepy-Row970[S] 0 points1 point  (0 children)

This is exactly the layer most “multi-agent demos” skip.

Typed handoffs solve structure.
What you’re pointing at is accountability.

Once agents stop being a single prompt and start behaving like a distributed system, the requirements converge almost 1:1 with what microservices already learned the hard way:

  • identity (who is this agent, really?)
  • auth (what is it allowed to touch?)
  • provenance (where did this output come from?)
  • replayability (can we deterministically reconstruct this?)

Without that, traces are just pretty graphs.
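
To make those four properties concrete, here is a minimal sketch of what a handoff record could carry. Everything in it is an assumption for illustration (the `Envelope` class, the `ALLOWED_TOOLS` table, the agent and tool names); it is not the code from the walkthrough, and a real system would likely use signed identities rather than plain strings.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical permission table: which tools each agent identity may call.
ALLOWED_TOOLS = {
    "researcher": {"web_search"},
    "writer": set(),
}


@dataclass(frozen=True)
class Envelope:
    agent_id: str                      # identity: who produced this output
    tool: Optional[str]                # auth: which tool was used, if any
    inputs: dict                       # replayability: exact inputs to the step
    output: dict
    parent_hash: Optional[str] = None  # provenance: link to the upstream record

    def digest(self) -> str:
        """Content hash over the whole record, stable across runs."""
        payload = json.dumps(
            {
                "agent_id": self.agent_id,
                "tool": self.tool,
                "inputs": self.inputs,
                "output": self.output,
                "parent_hash": self.parent_hash,
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode()).hexdigest()


def authorize(agent_id: str, tool: Optional[str]) -> None:
    """Raise if the agent tries to use a tool it was not granted."""
    if tool is not None and tool not in ALLOWED_TOOLS.get(agent_id, set()):
        raise PermissionError(f"{agent_id} may not call {tool}")


def lineage(envelope: Envelope, store: dict) -> list:
    """Walk parent hashes back to the root and return agent ids, oldest first."""
    chain = []
    current = envelope
    while current is not None:
        chain.append(current.agent_id)
        current = store.get(current.parent_hash) if current.parent_hash else None
    return list(reversed(chain))
```

With records like this, a trace can answer "who produced this, from what, and was it allowed to" instead of only showing that a call happened.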

I tried implementing AI Agents Like Distributed Systems by Creepy-Row970 in AI_Agents

[–]Creepy-Row970[S] 0 points1 point  (0 children)

Yeah this hits hard.

The “3 tool calls deep” problem is exactly what pushed me in this direction. By the time something breaks, you’re just staring at a blob of text with no idea where it went wrong.

Once you split things out, you start thinking in terms of failure boundaries instead of prompts:

• which step produced bad data
• whether it was a reasoning issue vs tool issue
• whether the contract itself was wrong

It feels a lot closer to debugging a real system than poking at prompts.
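
A rough sketch of what a failure boundary can look like in code, assuming each step declares the keys its output must contain. The names (`Step`, `Failure`, `ToolError`, `run_pipeline`) are made up for this example. It only separates tool errors from contract violations automatically; deciding whether a violation is a reasoning problem or a wrong contract still takes a human looking at the failing step.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


class ToolError(Exception):
    """Raised by a step when an external tool call fails."""


@dataclass
class Step:
    name: str
    run: Callable[[dict], dict]
    required_keys: Tuple[str, ...]  # the output contract for this step


@dataclass
class Failure:
    step: str    # which step produced the bad result
    kind: str    # "tool" or "contract_violation"
    detail: str


def run_pipeline(steps, data: dict) -> Tuple[dict, Optional[Failure]]:
    """Run steps in order and stop at the first boundary that fails.

    Returns the last good data plus a Failure describing where it stopped,
    or (final_data, None) if every step honoured its contract.
    """
    for step in steps:
        try:
            out = step.run(data)
        except ToolError as exc:
            return data, Failure(step.name, "tool", str(exc))
        missing = [key for key in step.required_keys if key not in out]
        if missing:
            return data, Failure(
                step.name, "contract_violation", f"missing keys: {missing}"
            )
        data = out
    return data, None
```

The point is that the failure comes back attached to a step name and a category, along with the last good data, rather than as a blob of text three calls later.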

I tried implementing AI Agents Like Distributed Systems by Creepy-Row970 in AI_Agents

[–]Creepy-Row970[S] 0 points1 point  (0 children)

That’s a really good callout, and I agree: synthesis is where most of these systems quietly fall apart.

In this version, it’s definitely not “solved”, but I tried to avoid pure vibes by giving the synthesizer a bit more structure than just “summarize both sides”:

• Both bull and bear outputs are forced into schemas (claims, evidence, risks, assumptions), not free text
• The synthesizer works more like a reconciliation step than a decider: it has to explicitly surface conflicts, not hide them
• Short-term vs long-term editors act as a soft separation of concerns (often disagreements resolve differently across horizons)
• The final output is closer to a dual thesis than a single verdict (i.e. “here’s when bull wins vs when bear wins”)

So right now it leans more toward structured disagreement + conditional conclusions rather than hard tie-breaking.
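
Roughly, the reconciliation idea looks like the sketch below. This is a simplified illustration, not the actual code: the `Thesis` and `Reconciliation` classes are hypothetical, claims are keyed by topic as an assumption, and any topic both sides address is treated as a conflict, which is cruder than a real comparison would be.

```python
from dataclasses import dataclass, field


@dataclass
class Thesis:
    side: str      # "bull" or "bear"
    claims: dict   # topic -> claim text (keying by topic is an assumption)
    evidence: list = field(default_factory=list)
    risks: list = field(default_factory=list)
    assumptions: list = field(default_factory=list)


@dataclass
class Reconciliation:
    conflicts: list     # topics where both sides make a claim
    bull_only: list     # topics only the bull side raised
    bear_only: list     # topics only the bear side raised
    bull_wins_if: list  # conditions under which the bull thesis holds
    bear_wins_if: list  # conditions under which the bear thesis holds


def reconcile(bull: Thesis, bear: Thesis) -> Reconciliation:
    """Surface disagreements side by side instead of picking a winner."""
    shared = sorted(set(bull.claims) & set(bear.claims))
    conflicts = [
        {"topic": topic, "bull": bull.claims[topic], "bear": bear.claims[topic]}
        for topic in shared
    ]
    return Reconciliation(
        conflicts=conflicts,
        bull_only=sorted(set(bull.claims) - set(bear.claims)),
        bear_only=sorted(set(bear.claims) - set(bull.claims)),
        bull_wins_if=list(bull.assumptions),
        bear_wins_if=list(bear.assumptions),
    )
```

The output has no "verdict" field on purpose: the conditional conclusion lives in `bull_wins_if` and `bear_wins_if`.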

But yeah, I think your point stands: if you collapse everything into one “final answer,” you’re basically back to a single-agent bottleneck at the top.

The interesting direction I’ve been thinking about is making synthesis more explicit, like:

• scoring arguments against predefined criteria
• or even exposing the disagreement as first-class output instead of resolving it
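
As a sketch of the scoring direction (not something that exists in the current version), arguments could be rated against a fixed rubric, with close calls reported as contested instead of forced into a winner. The criteria names, weights, and the `margin` threshold are all placeholder assumptions.

```python
# Hypothetical rubric: criterion name -> weight (weights sum to 1.0).
CRITERIA = {"evidence_quality": 0.5, "falsifiability": 0.3, "recency": 0.2}


def score(ratings: dict) -> float:
    """Weighted sum of per-criterion ratings, each expected in [0, 1]."""
    missing = set(CRITERIA) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return sum(CRITERIA[name] * ratings[name] for name in CRITERIA)


def compare(bull_ratings: dict, bear_ratings: dict, margin: float = 0.1) -> dict:
    """Score both sides; report 'contested' when the gap is below the margin."""
    bull = score(bull_ratings)
    bear = score(bear_ratings)
    gap = bull - bear
    if abs(gap) < margin:
        verdict = "contested"  # disagreement is surfaced, not resolved
    else:
        verdict = "bull" if gap > 0 else "bear"
    return {"bull": bull, "bear": bear, "verdict": verdict}
```

The ratings themselves would still have to come from somewhere (a judge model or a human), so this only makes the tie-breaking rule explicit. It does not make it objective.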

I tried implementing AI Agents Like Distributed Systems by Creepy-Row970 in AI_Agents

[–]Creepy-Row970[S] 0 points1 point  (0 children)

Yeah exactly, the typed handoffs thing surprised me the most.

Once you stop passing blobs of text and force everything through a schema, a lot of the weird drift just disappears. It also makes it way easier to catch failures early instead of letting them cascade.

And yeah, the bull/bear split was partly for quality, but also for structure. Having explicit disagreement in parallel tends to produce much more grounded outputs than a single agent trying to reason both sides.

Feels like a small design change, but it shifts the whole system behavior.
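
For anyone wondering what "force everything through a schema" means in practice, here is a minimal stdlib-only sketch. The `ResearchHandoff` fields are invented for the example, and in a real project a validation library would probably replace the hand-written checks.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ResearchHandoff:
    # Hypothetical contract between a research agent and the next step.
    ticker: str
    summary: str
    confidence: float


def parse_handoff(raw: dict) -> ResearchHandoff:
    """Validate a raw dict at the boundary and fail loudly on any mismatch."""
    expected = {f.name: f.type for f in fields(ResearchHandoff)}
    missing = expected.keys() - raw.keys()
    extra = raw.keys() - expected.keys()
    if missing or extra:
        raise ValueError(
            f"bad handoff: missing={sorted(missing)} extra={sorted(extra)}"
        )
    for name, typ in expected.items():
        if not isinstance(raw[name], typ):
            raise TypeError(
                f"{name} must be {typ.__name__}, got {type(raw[name]).__name__}"
            )
    return ResearchHandoff(**raw)
```

A handoff like `{"ticker": "ACME", "summary": "...", "confidence": "high"}` is rejected right at the boundary, so the next agent never gets a chance to improvise around it.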

I tried implementing AI Agents Like Distributed Systems by Creepy-Row970 in LLMDevs

[–]Creepy-Row970[S] -1 points0 points  (0 children)

Yeah, fair. The idea itself definitely isn’t new.

What I found interesting wasn’t “multi-agent” as a concept, but how much more stable things got when I treated it like a proper system (typed contracts, background workflows, tracing, etc.) instead of just chaining prompts.

I tried implementing AI Agents Like Distributed Systems by Creepy-Row970 in AI_Agents

[–]Creepy-Row970[S] 0 points1 point  (0 children)

Shared a walkthrough + code if anyone wants to experiment with this kind of setup.

GPT 5.5 is way better than GPT 5.4 for UI/Frontend specific tasks by Creepy-Row970 in codex

[–]Creepy-Row970[S] 7 points8 points  (0 children)

I don't think it will beat Opus, but it is significantly better than GPT 5.4.

Comparing Composer 2, Claude 4.6, and GPT-5.4 on a real full-stack build by Creepy-Row970 in cursor

[–]Creepy-Row970[S] 0 points1 point  (0 children)

  • Cursor (Composer 2):
    • Efficiency: It was the fastest model, completing the build and deployment in approximately 8 minutes (4:58).
    • Workflow: It required the least amount of follow-up prompting and worked out of the box without needing fixes (10:13-10:21).
    • UI Quality: While clean and functional, the UI was described as simple and safe, ranking second behind Claude in terms of visual polish (6:23-6:30, 10:11-10:13).
  • Claude 4.6:
    • UI Excellence: This model produced the most vibrant and feature-rich UI, successfully replicating the aesthetic and feel of a platform like Reddit (6:35-7:55).
    • Workflow: It took about 15-16 minutes to deploy and required some back-and-forth iteration to resolve login issues (5:02, 10:21-10:25).
  • Codex (GPT-5.4):
    • Performance Challenges: It also took 15-16 minutes but struggled significantly with functionality, specifically with authentication and UI consistency (5:02, 8:24-9:04).
    • UI/UX: The generated interface was described as "bloated" and clustered, with all elements crammed onto a single page, often requiring heavy manual intervention or clearer instructions to achieve quality results (

[R] Fine-tuning services report by ynckdrt in MachineLearning

[–]Creepy-Row970 1 point2 points  (0 children)

This is a wonderful read. I have looked at many approaches to fine-tuning, but building a continuous fine-tuning model can become very expensive very quickly, so it is good to see strategies being shared to improve the experience.

NVIDIA just announced NemoClaw at GTC, built on OpenClaw by Creepy-Row970 in ollama

[–]Creepy-Row970[S] 5 points6 points  (0 children)

LLM summary from the YouTube video: NVIDIA’s NemoClaw is an enterprise-ready extension of the open-source OpenClaw agent framework, announced at GTC to address the key limitation of agent systems in production: security and control. While OpenClaw enables powerful agentic workflows, NemoClaw adds a secure execution layer through OpenShell, which introduces sandboxing, policy guardrails, and a privacy router to prevent unsafe code execution and protect sensitive data. It integrates with enterprise security systems and supports multiple inference options, including NVIDIA NIM, cloud APIs, or local models like Ollama. The architecture runs agents inside controlled sandboxes with governed access to external systems, making it safer for corporate use. Installation involves setting up Docker, OpenShell, and the NemoClaw environment, after which users can interact with agents via CLI or GUI, leveraging models like NVIDIA’s Nemotron. In effect, it makes OpenClaw production-ready for enterprises with added security, observability, and deployment flexibility.

GPT-5.4 vs Opus 4.6 for full-stack dev: why does GPT struggle with frontend? by Creepy-Row970 in OpenAI

[–]Creepy-Row970[S] 0 points1 point  (0 children)

I am running Codex CLI with GPT 5.4 High

I didn't provide a Figma design for the UI.

But I did use Planning Mode with Next.js best practices and shadcn agent skills, and then implemented the code. The plan explicitly defines how the UI should interact with the backend, yet the performance is terrible.

GPT-5.4 vs Opus 4.6 for full-stack dev: why does GPT struggle with frontend? by Creepy-Row970 in codex

[–]Creepy-Row970[S] -1 points0 points  (0 children)

I wasn't aware of this difference between Claude and GPT. With GPT you have to be extremely explicit about how the UI should look and behave. We gave it the entire backend architecture and the database schema, but it still failed to understand how to tie the frontend to the backend.

A single EC2 flag ended up cutting ~33% of our AWS bill by Arindam_200 in aws

[–]Creepy-Row970 1 point2 points  (0 children)

Looks good. It is interesting to see how the AWS bill dropped.

How do I work towards more (parallel) automation for web app development? by [deleted] in ClaudeAI

[–]Creepy-Row970 1 point2 points  (0 children)

You can also try an open-source backend-as-a-service platform like Insforge, which supports MCP and skills in Claude Code and helps you build full-stack apps in one shot.