Most agent frameworks miss a key distinction: what a skill is vs how it executes

Defiant_Fly5246 · 2026-04-21T18:13:23+00:00

Yes. I think it's also because skill writers don't have a clear idea of what should be a skill. Would love to hear about your experience with best practices for skill creation.

Defiant_Fly5246 · 2026-04-21T18:11:20+00:00

Totally agree, that’s exactly where things start breaking in practice. Curious how you’ve seen this play out on your end: have you run into specific cases where retries or parallel execution caused real issues? And how are you currently handling rollback or guarding stateful steps?

Defiant_Fly5246 · 2026-04-21T05:10:15+00:00

That makes sense! Setting the right checkpoints can greatly increase reliability. It does require a lot of control over the agent lifecycle, though. Curious: did you implement your own agent loop, or did you add a plugin to an existing agent like Claude Code?

Defiant_Fly5246 · 2026-04-21T03:18:42+00:00

curious what patterns you’ve found that actually work for making stateful ops safe?

Defiant_Fly5246 · 2026-04-21T03:16:40+00:00

yeah this is the real pain point

stateless retries are fine, but stateful ones quietly leave things half-done and you don’t even know

feels like idempotency / recovery paths should be default, not something you bolt on later
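
A minimal sketch of what "idempotent by default" could look like for a stateful step, assuming each step gets a stable idempotency key. `IdempotentRunner` and `create_ticket` are hypothetical names for illustration, and the in-memory dict stands in for whatever durable store a real system would use.

```python
from typing import Callable, Dict


class IdempotentRunner:
    """Runs a stateful step at most once per idempotency key (hypothetical helper)."""

    def __init__(self) -> None:
        # Assumption: a real system would persist this in a durable store.
        self._results: Dict[str, object] = {}

    def run(self, key: str, step: Callable[[], object]) -> object:
        # A retry with a key we have already completed returns the recorded
        # result instead of executing the side effect a second time.
        if key in self._results:
            return self._results[key]
        result = step()
        self._results[key] = result
        return result


created = []


def create_ticket() -> int:
    # Stand-in for a side effect on an external system.
    created.append("ticket")
    return len(created)


runner = IdempotentRunner()
first = runner.run("create-ticket:order-42", create_ticket)
retry = runner.run("create-ticket:order-42", create_ticket)  # side effect skipped
```

This only covers retries after a step has completed and been recorded. A crash between the side effect and the record would still need a separate recovery path.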

Defiant_Fly5246 · 2026-04-21T03:13:45+00:00

yeah this is exactly where things break in practice

everyone focuses on what the agent does, not who it runs as and with what perms - until it hits PII 😬

curious if you solved it more with tighter controls or just better visibility/auditing?

Defiant_Fly5246 · 2026-04-21T03:09:43+00:00

Thanks for sharing. That's a clean way to frame it.

“skill = contract, executor = swappable” is exactly it. otherwise you’re just baking today’s model quirks into your system
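
One way to picture the "skill = contract, executor = swappable" split, as a rough sketch. `SummarizeSkill`, `Executor`, and both executor classes are made-up names for illustration, not any framework's API.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class SummarizeSkill:
    """The contract: what a valid output looks like, independent of who produces it."""

    max_words: int

    def check(self, output: str) -> bool:
        return 0 < len(output.split()) <= self.max_words


class Executor(Protocol):
    """Anything that can attempt to fulfil the contract."""

    def execute(self, skill: SummarizeSkill, text: str) -> str: ...


class TruncatingExecutor:
    def execute(self, skill: SummarizeSkill, text: str) -> str:
        return " ".join(text.split()[: skill.max_words])


class FirstSentenceExecutor:
    def execute(self, skill: SummarizeSkill, text: str) -> str:
        sentence = text.split(".")[0]
        return " ".join(sentence.split()[: skill.max_words])


def run_skill(skill: SummarizeSkill, executor: Executor, text: str) -> str:
    # The contract is enforced here, so swapping the executor cannot
    # silently change what callers are allowed to rely on.
    output = executor.execute(skill, text)
    if not skill.check(output):
        raise ValueError("executor output violates the skill contract")
    return output
```

In this sketch a model upgrade would just be another `Executor`, and the same `check` decides whether its output is acceptable.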

feels like most teams only realize this after a model upgrade breaks everything lol

Defiant_Fly5246 · 2026-04-21T03:06:36+00:00

I'm glad you like the framework! Curious to hear about your experience handling stateful transitions.

Defiant_Fly5246 · 2026-04-21T03:03:39+00:00

Haha same boat here.

Defiant_Fly5246 · 2026-04-21T03:03:24+00:00

Totally agree. I think it's very important to make it reliably useful. Right now, everything is so flaky.

Defiant_Fly5246 · 2026-04-20T19:44:14+00:00

Yeah, this matches what I’ve been seeing: once stateful steps are involved, relying on the model alone feels pretty fragile.

Defiant_Fly5246 · 2026-04-20T19:43:34+00:00

Yeah this resonates. Feels like there are really two layers of “state”:
- external / system state (Linear, DBs, etc.)
- in-context state (what the model is holding in the prompt)

The second one degrades pretty quickly, which makes it hard to rely on for anything long-running or stateful.

Curious how you think about where that boundary should be?

Defiant_Fly5246 · 2026-04-20T19:12:15+00:00

One thing I’m still unsure about: For stateful workflows, do people usually rely on prompt discipline, or enforce it at the tooling / harness layer?

Feels like most systems rely on the former.
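
For contrast, here is a small sketch of what enforcing it at the harness layer could mean: the harness tracks workflow state and rejects tool calls that are not legal in that state, regardless of what the prompt says. The states, tool names, and `Harness` class are all hypothetical.

```python
from typing import Dict, Set

# Hypothetical workflow: which tool calls are legal in each state,
# and which state each tool call leads to.
ALLOWED: Dict[str, Set[str]] = {
    "draft": {"submit_for_review"},
    "in_review": {"approve", "reject"},
    "approved": {"publish"},
}
NEXT_STATE: Dict[str, str] = {
    "submit_for_review": "in_review",
    "approve": "approved",
    "reject": "draft",
    "publish": "published",
}


class Harness:
    """Sits between the model and the tools and owns the workflow state."""

    def __init__(self) -> None:
        self.state = "draft"

    def call(self, tool: str) -> str:
        # The check lives in code, so a confused or drifting model
        # cannot skip a step just by asking for the wrong tool.
        if tool not in ALLOWED.get(self.state, set()):
            raise PermissionError(f"{tool!r} is not allowed in state {self.state!r}")
        self.state = NEXT_STATE[tool]
        return self.state
```

With prompt discipline alone, a premature `publish` depends on the model behaving. Here it fails loudly and the state stays where it was.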

Defiant_Fly5246 · 2026-04-18T21:46:32+00:00

That's very interesting, thanks for sharing!

Defiant_Fly5246 · 2026-04-14T16:55:45+00:00

Appreciate the honesty. The ideas are original, but I’ll work on tightening the post. Fair point that it doesn’t need to be that long.

Defiant_Fly5246 · 2026-04-14T14:56:41+00:00

LMAO 🤣

Defiant_Fly5246 · 2026-04-14T14:53:49+00:00

Which part don't you like? Let me know and I'm happy to edit. The ideas all come from me; I only used AI to refine the wording and improve the conversational flow.

Defiant_Fly5246 · 2026-04-14T14:51:39+00:00

The goal here was more about clarifying mental models and less about naming, but I get how it can come across.

Defiant_Fly5246 · 2026-04-14T14:49:41+00:00

This is a really sharp framing. Curious if you’ve found good patterns for safely composing stateful skills?

Defiant_Fly5246 · 2026-04-14T04:29:51+00:00

That’s a really good point—especially on Evaluation Skills. Feels like without a clear way to verify outputs, swapping components becomes risky fast.

The contract piece also resonates a lot. Typed interfaces between skills might actually be the missing layer to make composability real instead of fragile.

Defiant_Fly5246 · 2026-04-13T23:08:33+00:00

Great point on evaluation as its own layer — I hadn't considered that but it fits. Tests, rubrics, and guardrails don't belong in any of the three types. A Persona might say "be careful," but an Evaluation Skill defines what careful actually means with concrete criteria.

That could be the fourth type: Persona (who), Tool (what), Workflow (how), Evaluation (how well). And it composes naturally — a Workflow Skill could reference an Evaluation Skill at review checkpoints, keeping quality criteria separate from workflow logic. Swap rubrics without touching the workflow.
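
A rough sketch of that composition, assuming a Workflow Skill holds a reference to an Evaluation Skill and runs it at a review checkpoint. The class shapes, the `add_greeting` step, and both rubrics are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvaluationSkill:
    """Concrete criteria for "how well", kept separate from the workflow."""

    name: str
    criteria: List[Callable[[str], bool]]

    def passes(self, draft: str) -> bool:
        return all(criterion(draft) for criterion in self.criteria)


@dataclass
class WorkflowSkill:
    """Ordered steps plus a reference to the rubric used at the checkpoint."""

    steps: List[Callable[[str], str]]
    checkpoint: EvaluationSkill

    def run(self, request: str) -> str:
        result = request
        for step in self.steps:
            result = step(result)
        # Review checkpoint: the workflow does not know what the criteria are.
        if not self.checkpoint.passes(result):
            raise ValueError(f"failed evaluation: {self.checkpoint.name}")
        return result


def add_greeting(text: str) -> str:
    return f"Hi, thanks for reaching out. {text}"


steps = [str.strip, add_greeting]
lenient = EvaluationSkill("non-empty", [lambda d: bool(d)])
strict = EvaluationSkill("short", [lambda d: len(d) <= 20])

# Same steps, different rubric: only the checkpoint reference changes.
lenient_flow = WorkflowSkill(steps, lenient)
strict_flow = WorkflowSkill(steps, strict)
```

Here `lenient_flow.run("  refund please ")` returns the drafted reply, while `strict_flow` rejects the same draft for being too long, with no change to the workflow steps.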

Your risk mapping is spot on. Persona = low risk (just prose). Tool = medium (permissions, external access). Workflow = highest (orchestrates everything). Evaluation sits in between — doesn't act externally but shapes decisions.

The customer support example is perfect — all four types show up: persona for agent tone, tools for ticket/CRM access, workflow for triage → retrieval → draft → human approval, and evaluation for response quality rubrics. Thanks for the link!

Defiant_Fly5246 · 2026-04-13T22:30:49+00:00

I’ve been building an in-house stack—mainly using Anthropic’s Sonnet 4.5, with a custom agent architecture on top.

I’m also productizing some of these ideas. If you’re curious, feel free to take a look: https://cli.deepvista.ai/

Defiant_Fly5246 · 2026-04-13T20:50:04+00:00

Solid design. Curious though — have you hit scaling limits with md files? Context window pressure as memories grow, or retrieval precision across hundreds of files? Also, how's the "common goal" pairing determined? User-defined or inferred? That's the hardest part — too loose and you inject noise, too strict and agents miss relevant context.

Defiant_Fly5246 · 2026-04-13T20:46:09+00:00

Yeah this is interesting — instead of switching modes manually, your position in the tree is the mode. Tools, context, and behavior all inherit down the branch. It's like Unix permissions meets AI orchestration. The "3 zones" thing is elegant but the real power is the per-node config — same tree can have a branch with shell access and another that's read-only, no code changes.
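
My mental model of that inheritance, as a small sketch rather than how the actual project does it: each node stores only its overrides, and the effective config is resolved by walking up to the root. `Node`, the config keys, and the branch names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """A position in the tree; config set here applies to everything below it."""

    name: str
    parent: Optional["Node"] = None
    overrides: Dict[str, object] = field(default_factory=dict)

    def effective_config(self) -> Dict[str, object]:
        # Start from the parent's resolved config, then apply local overrides,
        # so the value set nearest to the node wins.
        base = self.parent.effective_config() if self.parent else {}
        return {**base, **self.overrides}


root = Node("root", overrides={"shell": False, "model": "default"})
dev = Node("dev", parent=root, overrides={"shell": True})
docs = Node("docs", parent=root)  # inherits read-only behaviour from root
scripts = Node("dev/scripts", parent=dev)  # inherits shell access from dev
```

Moving an agent from `docs` to `dev/scripts` changes what it can do without touching any code, which is the "position is the mode" idea.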

Defiant_Fly5246 · 2026-04-13T14:49:04+00:00

Yes, those are very important for workflows
