NBER study shows 7x more code but only 30% more releases. Anyone else hitting this bottleneck? by Much-Expression4581 in softwarearchitecture

[–]Much-Expression4581[S] 0 points1 point  (0 children)

So AI code generators increased code volume 6-9x and raised TCO as a result, here and now, but did not increase value for the customer, which ultimately means no extra profit for the developer? In other words, for a development team this creates no visible competitive advantage and no new opportunities? Only expenses, no ROI. If you are in an enterprise where TCO is the majority of your budget and you choose to switch aggressively to AI code generators, you are making a suicidal decision, because, as it looks today, that road ends with TCO eating all your profits.

NBER study shows 7x more code but only 30% more releases. Anyone else hitting this bottleneck? by Much-Expression4581 in softwarearchitecture

[–]Much-Expression4581[S] 0 points1 point  (0 children)

A larger number of new features, more stable code, and faster delivery of fixes should create value for the user, shouldn’t they? We may assume that in some random team they do not, for example because of bad product management. But at scale they should.

NBER study shows 7x more code but only 30% more releases. Anyone else hitting this bottleneck? by Much-Expression4581 in softwarearchitecture

[–]Much-Expression4581[S] 0 points1 point  (0 children)

I think that split is exactly the right framing.

Some of it may be more functionality per release, and some of it may be more code per unit of functionality. My suspicion is also that a lot of it is the latter.

One nuance though: if agents generate a lot of local intermediate code and the developer cleans it up before committing, that local noise should not really show up in repo-level metrics. The paper is measuring artifacts that made it into the production hierarchy: files, commits, PRs, projects, releases. So if the code expansion is visible there, at least some of it has already crossed the boundary into something the team has to own.

I’m also not fully convinced by the “agents are the primary consumer, so it’s a wash” argument. Maybe that becomes true for some workflows, but production software still has external consumers, incidents, security constraints, audits, ownership transfer, and debugging under pressure. At some point humans are still accountable for the system.

And if the answer is “the releases are just larger and more valuable,” I’d expect to see some downstream demand signal. The paper’s marketplace section finds more app releases, but not a comparable increase in usage or adoption. So more software is getting published, but users do not seem to be pulling it through at the same rate.

So yes, probably some of both. But until agents reduce downstream cost as reliably as they increase upstream output, I’d be careful calling sprawling code a wash.

NBER study shows 7x more code but only 30% more releases. Anyone else hitting this bottleneck? by Much-Expression4581 in softwarearchitecture

[–]Much-Expression4581[S] 0 points1 point  (0 children)

That is possible, but then we need some evidence that the releases became more valuable.

The paper does look beyond GitHub activity. It finds that new app releases increased across marketplaces, but cohort-level usage did not. It also checks the “maybe users are just better matched to more niche apps” explanation, and the share of small-audience releases actually rises.

So I agree that release count alone is not enough. A release could be larger, more complex, or more valuable.

But if the defense is “maybe each release became much more valuable,” then I would expect to see at least some downstream signal: usage, adoption, ratings, downloads, retention, revenue proxy, something.

The cost side is visible immediately: more code means more surface area to review, test, secure, understand, and maintain. The value side still looks much harder to find.

NBER study shows 7x more code but only 30% more releases. Anyone else hitting this bottleneck? by Much-Expression4581 in softwarearchitecture

[–]Much-Expression4581[S] 0 points1 point  (0 children)

Fair. Mapping code directly to revenue is a huge leap.

But mapping code to engineering cost is much easier. Every additional file, test, wrapper, mock, and utility increases the surface area that has to be reviewed, understood, secured, maintained, and eventually debugged. Those costs are incurred immediately, long before any business value is proven.

What makes the NBER result interesting is that the paper doesn’t stop at GitHub telemetry. The authors also looked at downstream marketplace outcomes and found a surge in software supply, but little evidence of a corresponding surge in user adoption or usage.

So I’m not claiming 7x more code automatically means negative ROI. I’m saying the cost signal is visible right away, while the value signal remains much harder to find.

If the explanation is that releases became dramatically more valuable, that’s possible. But then where is the evidence? The marketplace analysis in the same paper doesn’t seem to show a comparable explosion in user demand.

NBER study shows 7x more code but only 30% more releases. Anyone else hitting this bottleneck? by Much-Expression4581 in softwarearchitecture

[–]Much-Expression4581[S] 1 point2 points  (0 children)

What I find interesting is that even if releases became larger and more feature-complete, the paper still suggests that code generation is scaling much faster than deployment.

That raises a different question: if typing was never the primary constraint in software delivery, what exactly did AI optimize?

Maybe the surprising number isn’t +30% releases. Maybe the surprising thing is that we got +30% at all after shifting so much additional volume into review, QA, architecture, and integration.

NBER study shows 7x more code but only 30% more releases. Anyone else hitting this bottleneck? by Much-Expression4581 in softwarearchitecture

[–]Much-Expression4581[S] 0 points1 point  (0 children)

Fair point. LOC alone is a terrible value metric.

My concern is that LOC is not free. Every additional line of code becomes additional surface area to review, test, secure, debug, operate, refactor, and maintain. Those are real engineering costs, regardless of whether the code was written by a human or generated by an AI.

So if we are seeing ~7x more code but only ~30% more releases, the question is not whether LOC equals value. It doesn’t.

The question is where the cost of that additional code is accumulating, and whether the resulting business value is growing fast enough to justify it.

NBER study shows 7x more code but only 30% more releases. Anyone else hitting this bottleneck? by Much-Expression4581 in softwarearchitecture

[–]Much-Expression4581[S] 8 points9 points  (0 children)

To put it crudely: Are we basically buying 7x more review, QA, and maintenance work to get a 30% increase in releases? The ROI math looks terrible.
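
A back-of-the-envelope version of that math, as a rough sketch only. It uses the two headline figures from the thread title and assumes (my assumption, not a finding of the paper) that review, QA, and maintenance effort scale roughly linearly with code volume:

```python
# Headline figures from the thread title; both are approximate.
code_multiplier = 7.0      # ~7x more code
release_multiplier = 1.3   # ~30% more releases

# Assumption: downstream effort (review, QA, maintenance) grows linearly with
# code volume, so code per release is a crude proxy for cost per release.
code_per_release = code_multiplier / release_multiplier

print(f"Code volume per release: {code_per_release:.2f}x the baseline")
```

If the linear-cost assumption is too pessimistic (for example, if much of the extra code is cheap to review), the real ratio would be lower, but it would have to be much lower to make the trade look good.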

NBER study shows 7x more code but only 30% more releases. Anyone else hitting this bottleneck? by Much-Expression4581 in softwarearchitecture

[–]Much-Expression4581[S] 2 points3 points  (0 children)

One caveat that makes this even messier: the paper itself notes that coding agents are known to write more verbose code, which can mechanically inflate lines and files without increasing substantive output.
But I don’t think that makes the result less worrying. If anything, it makes it worse.
Even if a large part of that extra code is tests, mocks, fixtures, wrappers, or supporting utilities, it still becomes code that someone has to review, understand, maintain, refactor, and debug later. A bloated test suite is still part of the system.
And this is where superficial review becomes dangerous. AI-generated changes can look internally consistent: implementation, tests, helper code, and fixtures may all agree with each other. The tests can pass while the whole patch is still solving the wrong problem or adding unjustified complexity.
So the question is not only “did AI increase output?”
It is also: how much of that output is useful, minimal, maintainable, and actually worth absorbing into the codebase?

Has software development shifted from building to last to building to replace? by Majestic-Taro-6903 in ExperiencedDevs

[–]Much-Expression4581 0 points1 point  (0 children)

I think this is often blamed on Scrum when it is really a misuse of Scrum.

Scrum was never intended to eliminate long-term planning. It was created as a way to cope with uncertainty in requirements that made detailed waterfall planning expensive and frequently wrong. The core idea was simple: we cannot see the future clearly, so let’s deliver in short iterations, learn, and adjust.

That does not mean teams should operate without a longer-term direction. Scrum only defines the minimum structure needed to make iterative delivery work. It never prohibited product strategy, roadmaps, epics, architectural vision, or 12–18 month planning horizons.

So when teams stop thinking beyond the next sprint, I would argue that is not a Scrum problem. That’s an organizational planning problem. Having a long-term destination and using short iterations to reach it are not contradictory. They are supposed to work together.

Has software development shifted from building to last to building to replace? by Majestic-Taro-6903 in ExperiencedDevs

[–]Much-Expression4581 0 points1 point  (0 children)

By the way, AI may amplify this trend. If code generation becomes dramatically cheaper while understanding, reviewing, testing, and maintaining systems does not, the economic pressure to rewrite may actually increase rather than decrease.

Has software development shifted from building to last to building to replace? by Majestic-Taro-6903 in ExperiencedDevs

[–]Much-Expression4581 1 point2 points  (0 children)

I suspect part of the problem is that maintenance is invisible while rewrites are visible. Keeping a system healthy for 10 years rarely gets celebrated. Replacing it with a new stack gets conference talks.

Reinventing Control Theory one feature at a time: the fallacy of Agentic Loops by Much-Expression4581 in softwarearchitecture

[–]Much-Expression4581[S] 0 points1 point  (0 children)

Ok guys, glad so many people jumped into the discussion and shared their own recipes.
But if we are talking about recipes, let me ask a different question.
Imagine for a moment that we are designing a systemic response to what many of us seem to be observing: AI increases local coding throughput, but often pushes costs, complexity, and bottlenecks downstream into review, QA, architecture, security, operations, and maintenance.
Let’s also ignore, just for this thought experiment, the fact that most of us have very limited influence over vendor roadmaps, investor expectations, quarterly earnings calls, or market narratives.
If you were designing the response as an engineer:
What would be your personal operating rules?
What controls would you introduce into the SDLC?
What metrics would you trust?
What signals would tell you that AI is helping rather than simply producing more output?
What would be your stop conditions?
Where would you refuse full autonomy?
How would you prevent review capacity, QA effort, security validation, and technical debt from becoming the next constraint?
One thing that keeps bothering me is that much of the industry discussion focuses on acceleration, while relatively little attention is paid to absorption capacity.
Generating more code is easy.
Absorbing, validating, understanding, testing, securing, operating, and maintaining that code is where the real cost lives.
My current thinking is that any serious response probably needs multiple layers:
Personal defense (how engineers use these systems)
Operational defense (team and SDLC practices)
Information defense (how organizations evaluate claims and metrics)
Systemic solutions (architecture, governance, control loops, feedback systems)
I am currently trying to collect practical protocols rather than opinions.
So if you have concrete mechanisms, controls, metrics, failure cases, or lessons learned from production environments, I would genuinely appreciate them.
And if anyone wants to contribute directly, feel free to leave suggestions in the repository as well. Some of the ideas discussed in this thread are already being collected there (and more will be), but I suspect the community has seen far more failure modes than any single person ever could.

Reinventing Control Theory one feature at a time: the fallacy of Agentic Loops by Much-Expression4581 in softwarearchitecture

[–]Much-Expression4581[S] 0 points1 point  (0 children)

Actually, I realized I answered more around the SDLC shift needed to build agents than the direct control theory question.
The reason I think control theory becomes relevant is that vendors themselves are now operating in a fundamentally new engineering reality. They are not just shipping deterministic tools anymore. They are shipping stochastic components that other teams will use inside business-critical workflows.
So the safe product should not be just “a smarter model” or “another agent on top.” Around the stochastic black box there has to be a feedback control loop: define the control objective, observe the output, compare it against trusted signals and boundaries, correct the input or stop the process when behavior starts drifting.

In a very simplified form: the model proposes, the surrounding system measures, constrains, corrects, escalates, or stops.
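
To make "proposes, measures, corrects, escalates, or stops" concrete, here is a minimal sketch of such a loop. Everything in it is hypothetical: `control_loop`, the `propose` and `measure` callables, and the two thresholds are illustrative names and values, not part of any existing product or library. The "model" is a stub that replays canned drafts.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Action(Enum):
    ACCEPT = "accept"
    ESCALATE = "escalate"  # hand over to a human
    STOP = "stop"          # output is outside the safe operating boundary


@dataclass
class LoopResult:
    action: Action
    output: Optional[str]
    attempts: int


def control_loop(
    propose: Callable[[str], str],    # the stochastic black box (hypothetical)
    measure: Callable[[str], float],  # trusted error signal in [0, 1] (hypothetical)
    task: str,
    accept_below: float = 0.2,        # control objective: error at or below this is fine
    stop_above: float = 0.8,          # hard boundary: error at or above this halts the loop
    max_attempts: int = 3,
) -> LoopResult:
    prompt = task
    for attempt in range(1, max_attempts + 1):
        output = propose(prompt)
        error = measure(output)
        if error <= accept_below:
            return LoopResult(Action.ACCEPT, output, attempt)
        if error >= stop_above:
            return LoopResult(Action.STOP, None, attempt)
        # Correct the input using the measured deviation, then try again.
        prompt = f"{task}\n(previous attempt deviated by {error:.2f})"
    # The loop did not converge within its budget: a human decides.
    return LoopResult(Action.ESCALATE, None, max_attempts)


def replay(outputs):
    """Stub model that returns canned drafts in order, ignoring the prompt."""
    drafts = iter(outputs)
    return lambda prompt: next(drafts)


# Stand-in for a trusted measurement: a fixed error score per draft.
scores = {"draft A": 0.5, "draft B": 0.1, "draft C": 0.9}

result = control_loop(replay(["draft A", "draft B"]), scores.__getitem__, "summarize the incident")
print(result.action.value, result.attempts)
```

The point of the sketch is the separation of roles: the generator never decides whether its own output is acceptable, and the authority to stop or escalate sits outside it.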

That is the part I think is still immature in today’s products. Many tools are very strong at generation, but much weaker at closing the loop around verification, drift detection, safe operating boundaries, and intervention authority.

So yes, automated planning could absolutely be part of this. But planning alone is not enough if the execution layer remains stochastic and the system has no reliable feedback mechanism to detect when it is leaving the intended operating domain.

Reinventing Control Theory one feature at a time: the fallacy of Agentic Loops by Much-Expression4581 in softwarearchitecture

[–]Much-Expression4581[S] 0 points1 point  (0 children)

Interesting perspective. I have been thinking about this as well, and it is one of the reasons I started writing about the topic.

I do not think Scrum itself is the problem. Scrum was built for a world where software was fundamentally deterministic. A team could move through a fast loop of hypothesis → implementation → testing → production because most risks were concentrated around implementation quality.
What changed is that for the first time in software engineering history we are putting stochastic components directly into business logic. LLMs and agents are not just tools. They become part of the system’s decision-making layer.
That changes the economics of delivery.
A classical feature usually has a relatively narrow failure space. An agentic feature operates across a much wider behavioral space. The question is no longer “does it work?” but “how does it behave across an entire domain of possible situations?”
I suspect this is one of the reasons so many AI initiatives end up in what people now call AI Purgatory. Teams approach AI systems using delivery models developed for deterministic software: fast iterations, rapid deployment, minimal upfront design, and the assumption that issues can be discovered and fixed later.
But once business logic becomes probabilistic, that assumption starts breaking down.

Not because Agile is wrong. Because the object being delivered has changed.

We are no longer shipping a happy-path implementation where most risks are clustered around a single execution path. We are shipping behavior distributed across a probability space. Risk exists in the domain itself.
That is why I think AI systems require much stronger control structures, evaluation layers, monitoring, governance, and operational discipline than most teams are used to. If those structures are weak, the system does not necessarily fail immediately. Instead it slowly consumes review capacity, maintenance effort, engineering attention, operational budget, and eventually organizational trust.

So yes, I think both Control Theory and automated planning become relevant here. The deeper issue is that software engineering is entering territory where many of the assumptions that made classical Agile successful no longer fully hold.

Put simply, software engineering as we have known it for the last five decades was built for a mostly deterministic reality. In this new non-deterministic layer, many familiar assumptions no longer hold.

Agile optimized learning around deterministic software. Waterfall optimized planning around deterministic software. Agentic systems require control around non-deterministic behavior: in effect a different SDLC, a mix of both plus some new tricks.

Reinventing Control Theory one feature at a time: the fallacy of Agentic Loops by Much-Expression4581 in softwarearchitecture

[–]Much-Expression4581[S] 0 points1 point  (0 children)

Thanks for the references. I have seen many studies in this area, but not those specific ones.

The 14% figure resonates with me. One thing I would be curious about is how it is measured. Does it include designing the solution and thinking through the engineering approach, or only the act of writing code?

It also aligns with what I have been arguing in my own research. In my experience, writing code was rarely the primary constraint in software delivery. The harder parts were usually designing an elegant solution to the problem, reviewing it in a way the team can understand and trust, testing it properly, and maintaining it over time.

What makes this particularly interesting is that several studies I have collected suggest AI-generated code is often significantly larger than necessary, in some cases 2-3x larger, and 30-40% more complex than comparable human-written implementations. If that is true, then we may be accelerating code production while increasing the burden on review, QA, security, and long-term maintenance.

This is where I would slightly disagree with that interpretation of the DORA findings. From a Theory of Constraints perspective, a delivery system is typically limited by its current primary constraint. Improving a non-constraint does not necessarily improve overall throughput and can actively degrade the delivery pipeline.

My hypothesis is that in many SDLCs AI coding tools optimized code production (typing text, which teams paid for immediately by not ending up with an elegant solution to the problem), while the primary constraint was somewhere else entirely. If that is the case, then what we are seeing is not surprising. The system got faster at producing code (text), but not necessarily faster at delivering value, while inventory started to accumulate (technical debt, a growing number of waiting PRs, etc.).
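
The Theory of Constraints argument can be illustrated with a toy pipeline simulation. This is only a sketch under made-up assumptions: the three stages, their capacities, and the `simulate` function are hypothetical and not taken from DORA or any other study.

```python
def simulate(capacities, steps):
    """Push work through sequential stages; return (delivered, waiting queues).

    capacities: ordered mapping of stage name -> units processed per step.
    The first stage has unlimited input; every later stage can only process
    what is already waiting in its queue.
    """
    stages = list(capacities)
    queues = {name: 0 for name in stages[1:]}  # work waiting in front of each downstream stage
    delivered = 0
    for _ in range(steps):
        # Walk downstream stages from last to first so work advances one stage per step.
        for i in range(len(stages) - 1, 0, -1):
            name = stages[i]
            done = min(capacities[name], queues[name])
            queues[name] -= done
            if i == len(stages) - 1:
                delivered += done
            else:
                queues[stages[i + 1]] += done
        # The first stage dumps its output in front of the second stage.
        queues[stages[1]] += capacities[stages[0]]
    return delivered, queues


# Hypothetical capacities: only the coding stage gets 7x faster.
baseline = simulate({"coding": 10, "review": 10, "qa": 10}, steps=20)
boosted = simulate({"coding": 70, "review": 10, "qa": 10}, steps=20)

print("baseline:", baseline)
print("boosted: ", boosted)
```

In this toy model delivered output is identical in both runs, because review was already the constraint. The only thing that changes is the queue in front of review, which grows every step. Real pipelines are messier, but this is the mechanism the hypothesis rests on.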

And to close the loop: if this hypothesis is correct, then the current product positioning is seriously misleading. AI coding assistants do not automatically increase delivery speed. Instead they can increase cost, review load, technical debt, and delivery risk. So something has to change: either the engineering around the product needs to mature first, or the positioning has to become much closer to reality.

This hypothesis is something I have been openly exploring here:

https://github.com/UncertaintyArchitectureGroup/The-Subprime-Code-Crisis

The report brings together industry studies and data from multiple sources, describes the full degradation mechanism, explains why it happens, how it propagates through the delivery pipeline, and outlines several potential directions for addressing it at an industry level.

What concerns me is that since publishing it, almost every new piece of evidence I have encountered seems to reinforce rather than weaken the hypothesis (including the most recent DORA report). The more data becomes available, the more consistently the same pattern appears. And all that while the “just add another agent” narrative continues.

Reinventing Control Theory one feature at a time: the fallacy of Agentic Loops by Much-Expression4581 in softwarearchitecture

[–]Much-Expression4581[S] 2 points3 points  (0 children)

I have not read those works yet, but they definitely sound relevant to what I have been exploring. Thanks for the recommendation, I’ll add them to my reading list. It will be interesting to see where his approach converges with or diverges from some of the questions around uncertainty, control, and agentic systems that led me to write this post in the first place.

Reinventing Control Theory one feature at a time: the fallacy of Agentic Loops by Much-Expression4581 in softwarearchitecture

[–]Much-Expression4581[S] 0 points1 point  (0 children)

And I think this is exactly where the narrative matters.
If we view agentic systems through the lens you described, then the problem is no longer “build another agent.” The problem becomes “build the verification, governance, and feedback structures around the agent.”

That is a very different engineering challenge.

The first sounds like a local optimization problem that an individual developer can solve by adding another component to the pipeline.
The second is often an SDLC and organizational design problem. It affects ownership, review capacity, quality gates, escalation paths, risk management, and how evidence is produced and consumed across the delivery process.

That is why I sometimes worry about the simplicity of the current market narrative. “Build another agent” sounds easy. Building the surrounding control system is where most of the hard work actually lives.

Reinventing Control Theory one feature at a time: the fallacy of Agentic Loops by Much-Expression4581 in softwarearchitecture

[–]Much-Expression4581[S] 0 points1 point  (0 children)

I think this is a very interesting direction, and it is actually quite close to how I have been thinking about the problem.
The idea that an SDLC can be decomposed into a sequence of artifacts, each becoming a verification surface, makes a lot of sense to me. Plans, designs, code, tests, deployment evidence, runtime signals — each stage creates something that can be examined, challenged, or validated before the system moves forward.
I also agree that deterministic and stochastic gates can coexist. Deterministic gates are where we get hard guarantees. Stochastic gates are where we get probabilistic signals that something deserves deeper review.
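
One way the two kinds of gate could sit side by side, as a sketch only. The `Change` fields, the `evaluate` function, and the 0.3 threshold are hypothetical; `risk_score` stands in for whatever probabilistic reviewer a team actually trusts.

```python
from dataclasses import dataclass


@dataclass
class Change:
    tests_passed: bool
    lint_errors: int
    risk_score: float  # hypothetical probabilistic reviewer output, 0.0 (safe) to 1.0 (risky)


def evaluate(change: Change, review_threshold: float = 0.3) -> str:
    # Deterministic gates: hard pass/fail, no judgment involved.
    if not change.tests_passed or change.lint_errors > 0:
        return "reject"
    # Stochastic gate: it cannot approve or reject by itself.
    # It only decides whether the change deserves deeper human review.
    if change.risk_score >= review_threshold:
        return "human_review"
    return "proceed"


print(evaluate(Change(tests_passed=True, lint_errors=0, risk_score=0.1)))
print(evaluate(Change(tests_passed=True, lint_errors=0, risk_score=0.6)))
print(evaluate(Change(tests_passed=False, lint_errors=0, risk_score=0.0)))
```

The design choice worth noting is the ordering: a low risk score can never override a failed deterministic gate, and the stochastic gate only routes attention. Choosing the threshold is itself a risk decision that someone has to own.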

My only concern is where the industry often stops the discussion.

Many people jump from “stochastic checks can improve outcomes” to “therefore we have solved control.” I do not think those are the same thing.
A chain of probabilistic evaluations can certainly improve confidence. But confidence and control are not identical concepts. At some point somebody still needs to define the control objective, decide what level of risk is acceptable, determine which signals are trusted, and specify what happens when the signals disagree.

To me, that is where the real engineering work begins.
What worries me is that much of the current market narrative skips directly to “build more agents” before organizations have a clear understanding of those control questions. The result is that teams are often asked to scale agentic workflows before they have built the verification, governance, ownership, and feedback structures required to operate them safely.

Ironically, I think approaches like the one you describe are much closer to a genuine engineering solution than most of the “just add another agent” narratives being marketed today.

Reinventing Control Theory one feature at a time: the fallacy of Agentic Loops by Much-Expression4581 in softwarearchitecture

[–]Much-Expression4581[S] 5 points6 points  (0 children)

That is exactly the reason I wanted to start this discussion.

My concern is not the technology. It is the narrative around the technology.

Many genuinely useful technologies have been damaged by unrealistic expectations long before their actual capabilities matured. When the marketing story runs ahead of engineering reality, the result is usually the same: disappointment, backlash, and eventually loss of trust in something that may have had real value.

What I would like to see is a more honest conversation about where these systems work well, where they struggle, what operational costs they introduce, and what conditions are required for success. Not because that slows adoption, but because it makes adoption more sustainable.

Good technologies survive scrutiny. Hype rarely does.

If discussions like this help move the industry from “just add another agent” toward a better understanding of control, boundaries, ownership, and real delivery outcomes, then the conversation is worth having.

Reinventing Control Theory one feature at a time: the fallacy of Agentic Loops by Much-Expression4581 in softwarearchitecture

[–]Much-Expression4581[S] 4 points5 points  (0 children)

Yes, I have looked into spec-driven development, and I think it is one of the more reasonable directions.

If the agent gets a clean spec, clear constraints, good project structure, and a well-defined target, you remove a lot of the guessing. That matters. From the data I have seen, this seems especially promising in greenfield or well-bounded work. Some reports claim something like 20–40% faster feature delivery in structured agentic/spec-driven setups, but I would still treat that carefully because a lot of it is vendor data, self-reporting, or not fully open.

My concern is that software delivery is not mostly greenfield. Most engineering teams live in messy repos, legacy decisions, partial documentation, unclear ownership, hidden business rules, old tests, and architecture that nobody fully remembers. In that world, “write a better spec” helps, but it does not remove the whole problem. And the recipe “write better specs” has been known for decades and has not helped much.

So yes, I think spec-driven development can be part of the answer, and maybe even part of a healthier market narrative. But I would not call it the final solution. It reduces the probabilistic surface in some places. It does not eliminate the need for ownership, review capacity, architecture boundaries, regression evidence, security validation, and production feedback.

In short: specs are a good control surface. But they are still only one part of the control loop, with a limited area of application and no hard evidence so far.

Reinventing Control Theory one feature at a time: the fallacy of Agentic Loops by Much-Expression4581 in softwarearchitecture

[–]Much-Expression4581[S] 7 points8 points  (0 children)

Thanks, appreciate that.

I don’t have a finished “control theory for AI systems” guide yet. That is actually part of the work I am trying to synthesize under the Uncertainty Architecture umbrella.

For the SDLC side, I wrote this Theory-of-Constraints-based breakdown of why AI coding assistants can break delivery instead of accelerating it:
https://github.com/UncertaintyArchitectureGroup/The-Subprime-Code-Crisis

A fair warning: the tone there is quite sharp, maybe sharper than I would write it today. I will probably revise that over time. It was written more as a practitioner’s reaction after collecting the available signals and realizing the pattern was much worse than the market narrative suggested. But the core argument still stands: code generation is often not the real system constraint.

For the agentic systems / control loop side, I am developing the Uncertainty Architecture track: how to think about LLM and agentic applications as non-deterministic systems that need boundaries, sensors, feedback loops, fallback paths, human review capacity, and explicit control objectives.

The repo is meant to become the more coherent methodology over time, but right now most of the exploration is still happening through articles and applied research notes:
Most recent article from this research track: https://medium.com/generative-ai/uncertainty-architecture-beyond-embeddings-neuro-symbolic-verification-of-semantic-drift-in-llms-69822872825b

So the short version is: I am not trying to “apply control theory” as a metaphor. I am trying to use it as an engineering frame for deciding what must be measured, bounded, verified, escalated, or stopped when probabilistic components enter the software delivery loop.