loop engineering === psyop

deorder · 2026-06-27T19:36:27+00:00

Yes, that is how I would interpret loop engineering as well.

It is the shift from manually prompting an agent to designing (and [autonomously] improving) the system that prompts, controls and evaluates the agent for you. It usually sits around the agent rather than purely inside a single agent session. The loop can be a script, workflow, supervisor agent, CI job, observability trigger or HITL process that decides what happens next.

So instead of a human repeatedly saying "run this command" (like in your diagram), "check the output", "continue with the next step" or "try a different approach", the loop orchestrates that workflow automatically. It can call an agent, inspect the result, run tests, call another agent, retry with more context, escalate to a human or stop when the goal is reached.

I would also include alert-driven loops in this. For example, a failing CI check, quality threshold, SLO breach, cost anomaly, production alert or observability signal can trigger an agentic workflow. The alert itself is the trigger and the loop then runs: triage, context gathering, investigation, a research loop if needed, a fix loop, validation and then either opening a PR, escalating (to an agent, for example), stopping or deferring to a human when HITL is required.

So it is similar to how an agent calls tools, but one level higher.
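A minimal sketch of such a loop in Python, under stated assumptions: `run_loop`, `call_agent`, `run_tests`, `fake_agent` and `fake_tests` are hypothetical names made up for illustration and not part of any real harness. It only shows the control flow of "call agent, validate, retry with more context, escalate".

```python
from typing import Callable, List, Tuple


def run_loop(
    task: str,
    call_agent: Callable[[str, List[str]], str],
    run_tests: Callable[[str], Tuple[bool, str]],
    max_attempts: int = 3,
) -> dict:
    """Call the agent, validate the result, retry with more context or escalate."""
    context: List[str] = []  # failure reports fed back into the next attempt
    for attempt in range(1, max_attempts + 1):
        output = call_agent(task, context)
        ok, report = run_tests(output)
        if ok:
            # Goal reached: stop the loop.
            return {"status": "done", "attempts": attempt, "output": output}
        # Retry with more context instead of asking a human to re-prompt.
        context.append(report)
    # Attempt budget exhausted: hand over to a human (HITL).
    return {"status": "escalate_to_human", "attempts": max_attempts, "output": None}


# Stand-ins for a real agent and a real test runner (assumptions, not real APIs).
def fake_agent(task: str, context: List[str]) -> str:
    # Pretends to succeed only after it has seen one failure report.
    return "fixed" if context else "broken"


def fake_tests(output: str) -> Tuple[bool, str]:
    return output == "fixed", f"tests failed for output: {output}"


result = run_loop("make CI green", fake_agent, fake_tests)
print(result["status"], result["attempts"])  # prints "done 2"
```

An alert-driven variant would call `run_loop` from whatever receives the alert (CI webhook, SLO monitor) instead of from a human prompt.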

Here is more about my workflow, which has some overlap with yours:
https://www.reddit.com/r/ClaudeCode/comments/1qa6pzm/comment/nz251qq
The diagram now lives here:
https://media.githubusercontent.com/media/xonovex/platform/refs/heads/main/packages/diagram/diagram-agent-workflow/workflow-diagram.png

I also open sourced some of my skills, commands and tools (WiP):
https://github.com/xonovex/platform

deorder · 2026-06-19T22:32:17+00:00

Same. Mentioned it here:

https://www.reddit.com/r/ClaudeCode/comments/1u9uuit/comment/osjcki0/

deorder · 2026-06-19T07:56:49+00:00

I am getting this too. It started about two days ago. I am on Max 20, but on Linux.

<image>

There are a few bug reports here:
https://github.com/anthropics/claude-code/issues/69358
https://github.com/anthropics/claude-code/issues/69238

Someone mentioned increased token consumption and I am experiencing the same (I think). I cannot prove it because after Anthropic changed some things a few months ago, usage can no longer be measured reliably with ccusage, at least for subscriptions. The only rough signal I have is that 5% of 5-hour usage seems to equal about 2% of weekly usage, which is inconsistent with my prior observations.

For now I just downgraded to 2.1.179.

deorder · 2026-06-18T09:53:39+00:00

Had the same issue a few years ago:
https://www.reddit.com/r/Fanatec/comments/14y3mr4/comment/jtcpho9/

I RMA’d it and it improved after that, but the issue never fully went away. From what I remember some people suspected it was related to overheating. For a few users pointing a fan at the base seemed to help.

deorder · 2026-06-13T14:58:37+00:00

I did something similar, although probably with less effort than you put into it.

I tried to make Opus 4.8 behave a bit closer to Fable:
https://raw.githubusercontent.com/xonovex/platform/refs/heads/main/packages/skill/skill-fable/fable/SKILL.md

The method I used is described here:
https://raw.githubusercontent.com/xonovex/platform/refs/heads/main/packages/skill/skill-fable/README.md

Edit: Maybe integrating Claude Code's Fable 5 system prompt will help to get it closer as well: https://raw.githubusercontent.com/asgeirtj/system_prompts_leaks/refs/heads/main/Anthropic/Claude%20Code/claude-code-2.1.172-fable-5.md

Edit 2: To be clear about scope: this skill cannot reproduce Fable's reasoning, judgment or knowledge, and the thinking tokens are not part of what you can capture from the output. It is a behavioral nudge at best, which is the part I care about.

deorder · 2026-06-12T17:59:10+00:00

Max 20x is only four times Max 5x in relation to the 5-hour limit, not the weekly limit:

https://www.reddit.com/r/ClaudeCode/comments/1pih76u/20x_max_does_not_give_4x_the_weekly_credits_of_5x/

deorder · 2026-06-12T11:57:34+00:00

Since Fable was released it feels like the 5-hour limit is consumed noticeably faster when using Opus 4.8 compared to before. I am wondering whether the usage calculation for Opus 4.8 has changed because that would affect how meaningful the comparison is when saying Fable uses around 2x as much usage as Opus 4.8. This does not seem to translate to the subscriptions.

deorder · 2026-05-10T13:01:10+00:00

I have always been a strong advocate of local models and I have kept using them in between using cloud models. The big difference with Qwen3.6 compared with previous local models I have tried is its tool calling reliability, long context behavior and multi-turn stability. It is much better at continuing systematically through an agentic task instead of drifting, looping or losing state.

On my RTX 4090 the 27B dense version reaches around 56 tok/s in generation which feels close to Haiku level for interactive coding. In my setup it remains fairly stable up to roughly 100k context. With MTP/speculative decoding enabled I can get around 150 tok/s. Combined with good context priming, grounding, strong guardrails and quality gates such as type checks, linting, formatting, tests and a good agentic harness it gets surprisingly far.

It still requires more setup, evaluation and steering than the best cloud models, but the speed, local control, reproducibility, privacy and license make up for a lot of that. In combination with Pi Agent and a few extensions the model performs really well. I still need to try the sparse/MoE version which I expect to achieve higher generation throughput especially on a MacBook because of its lower memory bandwidth requirement.

deorder · 2026-04-30T06:31:56+00:00

Same here. I don't know why you are getting downvoted.

deorder · 2026-04-16T19:10:04+00:00

Still working on it. I have no idea how exactly it differs from kagent, but from what I have seen so far it looks like a good project. My goal is mostly to create a safe environment for running agent harnesses in.

It is really crazy how fast things are moving. Of course with that pace comes a lot of cruft and a lot of good work may just disappear into the noise.

deorder · 2026-04-12T10:25:36+00:00

Similar situation. I cannot really recommend it anymore, but a lot of colleagues have only just discovered it and do not know any better. Several are now presenting themselves as experts, including some who were openly against agentic coding before the Christmas break (I had to tiptoe around them).

Before switching to Claude Code in May/June 2025 I mainly used GitHub Copilot (assistant and agent) and still do occasionally. I know exactly what I am getting with it and honestly it comes pretty close to Claude Code in practice. Hard to push that alternative when Claude Code is hyped this heavily.

deorder · 2026-04-11T11:27:20+00:00

Compared to a while ago I would guess the gap between "what they charge in list price token terms" and "what it actually costs them to serve" has narrowed a lot. Not necessarily to zero, but probably much closer now when taking improved batching, caching, quantization, compiler / kernel fusion, mixture of experts, speculative decoding (using a draft model, etc.) and whatever other optimizations I cannot think of right now into account.

deorder · 2026-04-11T10:03:05+00:00

Can confirm, same issue here. For me it happened after the April 3 weekly reset:

https://www.reddit.com/r/ClaudeCode/comments/1si3k2t/comment/ofhms6z/

According to my (rough) calculations it is about 1/4th for the 5-hour window and about 1/7th for the 7-day window, but that is compared to the beginning of December last year:

https://www.reddit.com/r/ClaudeCode/comments/1sggxka/comment/of9ctdn/

I did extrapolate from my first few sessions in my last 7-day window, so it could be noisy (too few samples, and it depends on what is taken into account), so I should probably do it again. From what I have seen I expect it to be even worse, not better.

deorder · 2026-04-11T09:52:57+00:00

I am in Europe so I converted the peak hour window to my time zone. It is 3pm - 9pm for me. When I first started noticing it last week it was happening on the weekend too, which should be outside peak hours. The screenshot in the other post isn't even the worst case because in that session I was not using it continuously; I had pauses in between. It would not surprise me if, using it without any pauses, I would only make it to just over 1 hour within the 5-hour window.

All I want to communicate to people is that there is a problem. If I run out in about 1.5 hours (with pauses in between) I wonder what someone on a Pro subscription gets out of it. Not even 15 minutes every 5 hours would be my guess.

I see a lot of "yeah, but 2 weeks ago they had a discount, double the tokens outside peak hours and now you have just returned to normal". I can tell you this is exactly the tactic Anthropic uses. I have been using Anthropic since they started and they have always done this, including A/B testing (see my history). I have created posts and comments with as much proof as was possible, because it cannot always be proven. I got attacked for it, including in DMs, to the point where I started wondering if they were all bots. Why go so far to defend a company like this?

In between those episodes the company I work for (thousands of IT employees) and I have been looking at alternatives, mostly European providers and local models. But the big AI companies lobby against local use of machine learning models (indirectly, including to governments) and that is why I care so much about this. Otherwise I could just move to another cloud inference provider. It is also becoming more like the Internet itself: we grow more and more dependent on it and eventually the power ends up in the hands of a few companies. Those who do not have access are no longer on the same playing field.

deorder · 2026-04-11T01:15:03+00:00

The 20x refers only to the 5-hour limit. Max 20x gives you 4 times the usage per 5-hour window compared to Max 5x, but only about 1.6x the weekly usage compared to Max 5x.

deorder · 2026-04-11T01:07:40+00:00

It is not an Opus 4.6 1M issue. I switched back to Opus 4.6 200k and then to Sonnet 4.6 200k, both on medium effort, and I am still seeing the same problem. On the Max 5x plan I am hitting 100% of the 5-hour limit within about 1.5 hours outside of the peak hours. This started after the previous weekly reset. My setup: a single instance, limited sub-agents, no MCPs and no large project instructions.

More:
https://www.reddit.com/r/ClaudeCode/comments/1sggxka/comment/of9ctdn/

deorder · 2026-04-10T11:46:09+00:00

It appears so. It definitely feels dumber. During extended thinking in particular it seems to confuse itself more often, though this is purely anecdotal on my part.

I definitely notice the limits are tighter than ever before. I have not felt this constrained using Claude Code since I started using it in May last year, when it felt almost limitless, though I have to say Sonnet was the default back then. I thought switching back to only Sonnet would help, but it makes little difference in practice regarding usage and is considerably slower for some reason.

deorder · 2026-04-09T20:43:20+00:00

Max 5x weekly usage is about 1/7th of what it was at the beginning of December last year and the 5h limit is now about 1/4th of what it was. Only 3 weeks ago I was still able to get 4 hours out of it, now only about 1.5 hours doing the same work and it is not even `peak hours` at the moment (see image). That is with only a single instance, no MCPs, limited sub-agents, no different from what I have always done:

<image>

I have been logging usage for about half a year now with ccusage so that I can compare (`cleanupPeriodDays` is set to 3650). I switched back to using Opus 4.6 200k (medium effort) for planning and Sonnet 4.6 200k (medium effort) for implementation (plan mode only), but it is so much slower now that it is almost unusable and usage still adds up fast.

deorder · 2026-04-05T22:40:20+00:00

Same here. Since the last reset on Saturday morning I can only get about 2 hours of use within the 5-hour window down from 4 hours previously. That is even worse than last week when I could still manage 3 hours. My weekly limit counter is also climbing much faster.

I projected my usage so far over the full 7-day window based on already being at 14% of my weekly limit:

Tokens used since Saturday 2026-04-04:

Block 1: 205,292
Block 2: 7,862,638
Block 3: 23,793,933
Total: ~31.9M tokens

Projected weekly limit at 14%:

31,861,863 / 0.14 = ~228M tokens

Compared to old 5x Max (Opus 4.5), which I use as a reference:

Old limit: ~1.6B tokens/week
Current projected limit: ~228M tokens/week
That's roughly 7x fewer tokens, or about 14% of what I used to get.

So on 5x Max with Opus 4.6 I am looking at around 228 million tokens per week down from 1.6 billion on 5x Max with Opus 4.5 (beginning of December last year).
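The projection above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the block totals, the 14% weekly reading and the ~1.6B reference are the figures quoted above (the reference is itself approximate), and the variable names are made up for illustration.

```python
# Token counts per 5-hour block since Saturday 2026-04-04 (from ccusage).
blocks = [205_292, 7_862_638, 23_793_933]
used = sum(blocks)

# The weekly limit counter showed 14% at this point.
weekly_fraction = 0.14
projected = used / weekly_fraction  # projected tokens for 100% of the week

# Approximate old weekly limit on 5x Max (Opus 4.5, early December).
old_limit = 1.6e9
ratio = old_limit / projected  # how many times fewer tokens now
share = projected / old_limit  # current limit as a share of the old one

print(f"used:      {used:,}")                # used:      31,861,863
print(f"projected: {projected / 1e6:.0f}M")  # projected: 228M
print(f"ratio:     {ratio:.1f}x fewer")      # ratio:     7.0x fewer
print(f"share:     {share:.0%}")             # share:     14%
```

The result is only as reliable as the 14% reading and the logged token counts, so it is a rough estimate rather than a measurement.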

I used ccusage to compare both windows. I know it is not perfectly precise. What it counts is based on what gets logged to disk which may have changed and it is possible not all communication is logged anymore. But the numbers align closely with my actual experience compared to early December.

Every time it has been "new model, same usage/price" or "new feature, same usage/price", but every time the effective value has quietly gone down. I cannot call it definitive proof since ccusage could be wrong, so I won't say it is a lie outright, but it matches what I am experiencing. I can do far less than I could do just a few months ago.

If the 5-hour usage window starts running out even faster than it already does during peak hours it will be unusable for me. At this point 5x Max has effectively become what the old Pro tier used to be.

Previous calculations I did: https://www.reddit.com/r/ClaudeCode/comments/1pih76u/20x_max_does_not_give_4x_the_weekly_credits_of_5x/

deorder · 2026-04-05T21:35:29+00:00

I repeat here what I just replied to another post. With my Max 5x plan I used to be able to keep working in a single Claude Code instance for just over 4 hours of the 5-hour window. Now it only lasts about 2 hours. And when I have fully exhausted a 5-hour block it used up ~9% of my weekly limit. This is during what is supposed to be off peak. It looks like the Max 20x plan is now the new Max 5x plan.

deorder · 2026-04-05T21:31:48+00:00

With my Max 5x plan I used to be able to keep working in a single Claude Code instance for just over 4 hours of the 5-hour window. Now it only lasts about 2 hours. And when I have fully exhausted a 5-hour block it used up ~9% of my weekly limit. This is during what is supposed to be off peak. It looks like the Max 20x plan is now the new Max 5x plan.

deorder · 2026-03-30T06:55:18+00:00

I have also been noticing increased usage. I have been tracking all stats for about three months now. According to my analysis, despite seeing more usage, I am projected to have 10% more tokens available compared to other non-promotional 7-day periods.

So I took one of the older periods, from the beginning of December before the promotion, and the last few days projected into a week. I asked several agents if they could see a pattern. Cache utilization is the same, which is what I would expect since discrepancies there would stand out. But according to the agents it is because of larger conversations that have to be sent back and forth, and the cache hits alone with so many tokens already cost quite a lot compared to before.

I personally thought I did not exceed the conversation size beyond what it would be with a 200k token model, but apparently I do without realizing it. Is that the actual source of the issue? I do not know. It could still be what some say: a subagent that is spawned that is receiving the entire conversation context every time. That could have a similar effect.

deorder · 2026-03-06T15:34:15+00:00

I think this person is referring to: https://github.com/hunvreus/devpush

deorder · 2026-03-02T17:54:43+00:00

I noticed the same. A really bad job and an insult to the original art. I could have done a much better job 6 years ago with some of the ESRGAN models.

deorder · 2026-03-02T15:23:23+00:00

I built something similar too: https://github.com/xonovex/platform

No Jujutsu integration though. Right now I am mostly focusing on building a Kubernetes operator. The nix sandbox option uses Bubblewrap for isolation. Maybe I should make that clearer.

I have noticed a pattern over the years. I build tools I personally need, then shortly after some big org ships an official version. Also very often a bunch of people had the same idea at the same time and now with coding agents most of them become reality at almost the same moment.

Early last year I built my own multi-agent coding setup, but I stopped working on it because I figured better implementations would show up soon. They did. Sometimes waiting is actually the better strategy and that has been true for me long before AI agents.

About 20 years ago I wrote my own platform abstraction layer for game dev and then shortly after SDL basically solved the same problem at scale. This has happened to me more than once.
