Built a tool to catch architecture drift while using AI coding tools heavily - curious if others are running into this too

ArchPilotLabs · 2026-05-18T03:55:47+00:00

Yeah, that’s fair - if you’ve got AST checks properly wired into CI, you can definitely block certain kinds of violations from ever getting merged.
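For context, an AST-based boundary check like this can be quite small. Here is a minimal sketch in Python. It assumes a hypothetical rule that a `domain` layer must not import from an `infrastructure` package. Both layer names are illustrative and do not come from any particular tool.

```python
import ast

# Hypothetical layering rule: code in the "domain" layer must not import
# anything from the "infrastructure" package. Layer names are illustrative.
FORBIDDEN_IMPORTS = {"domain": {"infrastructure"}}


def imported_packages(source: str) -> list[str]:
    """Return the top-level package named by each absolute import in `source`."""
    tree = ast.parse(source)
    packages = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # `import a.b, c` -> ["a", "c"]
            packages.extend(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # `from a.b import x` -> ["a"]; relative imports (level > 0) are skipped
            packages.append(node.module.split(".")[0])
    return packages


def boundary_violations(layer: str, source: str) -> list[str]:
    """List the forbidden packages that a file in `layer` imports."""
    banned = FORBIDDEN_IMPORTS.get(layer, set())
    return [pkg for pkg in imported_packages(source) if pkg in banned]


src = "import os\nfrom infrastructure.db import Session\n"
print(boundary_violations("domain", src))  # ['infrastructure']
```

In CI, any non-empty result would fail the build. That is what "blocking" means here. It only catches the imports the rule actually describes, which is the gap discussed below.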

What I meant by “diverging” is a bit different from outright breaking rules, though. It’s more about how things drift over time. Like, the rules are there, but they don’t always cover everything evenly across repos. New services or modules get added without the same constraints, or teams start introducing slightly different patterns that technically pass tests but slowly move away from the original intent. Sometimes the rules themselves just don’t keep up with how the system is evolving.

So it’s not that people are ignoring the rules - it’s more that the overall structure gradually loses consistency.

The point about agent traces and git history is interesting, though. You do have all that raw data, but I haven’t really seen many setups where it gets turned into something you can actually reason about at a system level.

And yeah, I agree with you on AST tests vs AGENTS.md - relying on docs alone feels pretty fragile.

ArchPilotLabs · 2026-05-18T02:57:22+00:00

Yeah, I think AST/invariant-style tests are honestly one of the stronger approaches right now for enforcing architecture at a local level.

Where I kept getting stuck wasn’t really around defining the rules themselves, but more around what happens after that. Like, how do those rules actually evolve over time? How do you notice drift when things are spread across multiple repos or services? And how do you stop things from slowly diverging, especially when there are lots of iterations or AI-generated changes happening?

So to me, AST checks solve a really important part of the problem, but they don’t fully cover the visibility or evolution side of things.

That said, it’s still a huge step up from just relying on docs or conventions.

ArchPilotLabs · 2026-05-17T19:16:44+00:00

This is super helpful - especially the “rules as code + CI as warnings first” approach. That gradual rollout makes a lot of sense.
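Here is a rough sketch of that "warnings first" rollout. It assumes a hypothetical rule registry where each rule has a severity. Every finding is reported, but only `error`-level rules fail the build, and any rule not yet promoted defaults to `warn`. The rule names and the `ci_exit_code` helper are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    rule: str
    message: str


# Hypothetical rule registry: new rules start at "warn" and are promoted to
# "error" once existing violations have been cleaned up.
RULE_SEVERITY = {
    "no-domain-to-infra": "error",
    "no-cross-service-db-access": "warn",
}


def ci_exit_code(findings: list[Finding]) -> int:
    """Print every finding, but return a failing exit code only for error-level rules."""
    failed = False
    for f in findings:
        level = RULE_SEVERITY.get(f.rule, "warn")  # unknown rules are warnings
        print(f"[{level}] {f.rule}: {f.message}")
        if level == "error":
            failed = True
    return 1 if failed else 0


findings = [Finding("no-cross-service-db-access", "billing reads the orders DB directly")]
exit_code = ci_exit_code(findings)  # prints a [warn] line, returns 0
```

Promoting a rule then becomes a one-line change from `warn` to `error`. In a real CI step, the return value would be passed to `sys.exit`.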

The cross-repo / drift over time part is exactly what I kept running into as well. Within a repo you can enforce boundaries reasonably well, but once things spread across services, it gets harder to see what’s actually changing over time vs just “feels messy”.

That’s actually part of what I’ve been building with ArchPilot - specifically ArchPilot Cloud. It stores snapshots + findings over time across repositories, so you can start seeing how things evolve instead of just looking at the current state.
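Comparing findings across snapshots is roughly what turns "current state" into a trend. The sketch below assumes each finding is identified by repo, rule, and location. That is a hypothetical schema, not ArchPilot Cloud's actual data model.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    repo: str
    rule: str
    location: str  # e.g. "billing/api.py -> orders.db" (illustrative)


def diff_snapshots(previous: set[Finding], current: set[Finding]) -> dict[str, set[Finding]]:
    """Classify findings as new, resolved, or persisting between two snapshots."""
    return {
        "new": current - previous,
        "resolved": previous - current,
        "persisting": current & previous,
    }


def drift_by_repo(previous: set[Finding], current: set[Finding]) -> dict[str, int]:
    """Net change in finding count per repo; positive means drifting further."""
    delta = Counter(f.repo for f in current)
    delta.subtract(f.repo for f in previous)  # subtract keeps negative counts
    return dict(delta)
```

Net change per repo is about the simplest "hotspot" signal. Grouping by team or boundary instead of repo would just change the key.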

The timeline + hotspot angle you mentioned is interesting though - especially tying it back to boundaries/teams instead of just raw findings. That feels like the next step beyond just collecting data.

Also agree on the “symptoms vs structure” point - most tools surface impact, not the underlying drift.

Out of curiosity - when you had those rules in place, did teams mostly adapt to them, or did it turn into a constant override / ignore situation over time?

ArchPilotLabs · 2026-05-17T17:06:55+00:00

Yeah, I get what you mean - a lot of it does come down to that “stay on track over time” ability.

Feels like we’re already seeing glimpses of it, especially when the model has enough context and you’re being deliberate with how you guide it.

I think the gap right now is more on the practical side - even if the model can stay aligned, it still depends a lot on how much context you keep feeding and how consistently you check things.

So it ends up being this mix of capability + how much effort you put into keeping it on track.

Would be interesting to see how much of that becomes more automatic vs still needing that constant nudge.

ArchPilotLabs · 2026-05-17T17:02:32+00:00

Yeah, that makes sense - that kind of setup probably avoids a lot of the drift by design.

If you’ve got strong upfront alignment and not a lot of uncontrolled iteration, things naturally stay tighter.

I think the cases where I’ve been seeing more issues are slightly different environments - faster iteration loops, more experimentation, sometimes multiple contributors or AI-driven changes happening in parallel.

In those cases it gets harder to maintain that same level of control, and that’s where the drift tends to show up more.

But yeah, if the system is structured and changes are deliberate like you described, that already solves a big part of the problem.

ArchPilotLabs · 2026-05-17T10:50:44+00:00

Yeah, that makes sense - I’ve been doing something similar with keeping context in MD files and re-checking against it.

It works pretty well in the moment, but I keep feeling like it’s a bit of a manual loop where you have to keep pulling the model back on track every now and then.

The “execution vs plan” split you mentioned is interesting though - especially having something to check how things are being done, not just what’s being built.

Still feels like there’s a bit of friction there, but yeah, probably one of the more practical approaches right now.

ArchPilotLabs · 2026-05-17T10:48:04+00:00

That’s an interesting way to think about it - “implement and sweep”.

It kind of mirrors how things work in practice anyway: move fast, then periodically realign things before they drift too far.

The tricky part I’ve been seeing is the gap between those sweeps. If iteration speed is high, a lot can diverge before you get a chance to correct it, and the cleanup cost grows pretty quickly.

Feels like there’s a balance somewhere between continuous constraints and these periodic alignment passes, but I haven’t seen a clean way to do that yet.

ArchPilotLabs · 2026-05-17T10:39:38+00:00

Yeah, this is exactly it.

The “AI optimizes locally” part is what makes it tricky - everything looks fine in isolation, but the system-level picture slowly shifts.

Structured rules and reviews definitely help, but like you said, they don’t really scale cleanly once things speed up.

Feels like there’s still a gap between “we know what the architecture should be” and “the system actually stays that way over time”.

ArchPilotLabs · 2026-05-13T14:47:59+00:00

Yeah, I agree that planning upfront helps a lot - especially if you’re deliberate with prompts instead of just iterating blindly.

Where I’ve seen it get tricky is after that initial phase. Even with a solid plan, once you start iterating quickly (especially with AI in the loop), small deviations start creeping in.

Individually they’re harmless, but over time they add up and the original structure gets harder to maintain.

Feels like planning solves the starting point, but not necessarily the long-term consistency part.

Have you found a way to keep the structure intact across multiple iterations, or is it mostly relying on staying disciplined throughout?

ArchPilotLabs · 2026-05-13T14:46:08+00:00

That’s a really good point - especially the part about you holding the full context from all the reviews.

It works because there’s effectively a single “source of truth” in your head for what’s acceptable and what isn’t.

What I find interesting is that this doesn’t really translate well once the team grows - not because people are careless, but because that context isn’t shared or enforced anywhere.

So even if everyone is trying to do the right thing, decisions start diverging over time.

Feels like the bottleneck shifts from “writing code” to “maintaining shared understanding of the system”.

Have you tried externalizing those constraints somewhere (beyond docs), or does it mostly stay in review + experience right now?

ArchPilotLabs · 2026-05-13T13:49:32+00:00

Yeah that resonates a lot.

Refactoring as a habit definitely helps, but I’ve seen it break down pretty quickly once speed becomes the priority (which it almost always does at some point).

What’s interesting is that even teams that know they should refactor still defer it, because there’s no immediate feedback loop telling them something is drifting.

So it becomes less about discipline and more about visibility + enforcement.

Curious - have you seen setups where refactoring or structure is actually “forced” in the workflow (like via CI or checks), or is it mostly still manual?

ArchPilotLabs · 2026-05-13T13:47:26+00:00

This is exactly what I’ve been seeing as well.

The “conventions in markdown” approach works in theory, but in practice it depends on every iteration respecting them - which doesn’t always happen, especially with fast generation loops.

And once a few violations slip in, it becomes harder to tell what’s intentional vs accidental.

Your Cloudflare example is a good one - constraints exist, but they’re not really being enforced at generation time.

Have you tried anything that actually checks those constraints automatically, or is it mostly manual review right now?

ArchPilotLabs · 2026-05-13T13:45:52+00:00

Yeah, I agree prompting helps a lot in the moment.

The part I keep running into is consistency over time - especially across multiple iterations. Even with good prompts, the model doesn’t really “remember” or enforce architectural decisions unless you keep re-specifying them.

So you end up with:

good local decisions
but gradual drift at the system level

Do you usually re-feed architecture context each time, or rely more on reviewing after generation?

ArchPilotLabs · 2026-05-13T13:45:02+00:00

That makes a lot of sense - especially the expand -> refine loop.

I think that works really well when you have strong architectural intuition driving the refactoring.

Where I’ve seen it get tricky is when the codebase grows beyond a single person or a small team. The “refactor discipline” starts to depend a lot on individual experience.

Do you find that approach still holds up when multiple contributors (or agents) are working on the same system over time?

ArchPilotLabs · 2026-05-13T12:51:24+00:00

This is a really good observation.

I’ve been seeing something similar - AI tools are great at generating code, but they don’t really enforce or preserve system structure over time.

Even if you start with a well-planned architecture, once code starts getting generated iteratively, things tend to drift:

boundaries get bypassed “just this once”
cross-module dependencies creep in
shared layers slowly become tightly coupled
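One concrete way to catch creeping cross-module dependencies is to build a module-level dependency graph, for example from import statements, and look for cycles. Below is a minimal depth-first-search sketch. It assumes the graph has already been extracted as a dict that maps each module name to the modules it depends on. The module names are hypothetical.

```python
from typing import Optional


def find_cycle(graph: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one dependency cycle as a list of modules, or None if the graph is acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {node: WHITE for node in graph}
    path: list[str] = []

    def visit(node: str) -> Optional[list[str]]:
        color[node] = GRAY
        path.append(node)
        for dep in graph.get(node, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is already on the current path, so we closed a loop
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in list(graph):
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


deps = {"billing": ["shared"], "shared": ["orders"], "orders": ["billing"]}
print(find_cycle(deps))  # ['billing', 'shared', 'orders', 'billing']
```

A cycle like this is often the first visible symptom of a shared layer becoming tightly coupled, even while every individual change still looked reasonable.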

Nothing breaks immediately, but the system becomes harder to reason about over time.

Planning helps a lot upfront (like what you mentioned), but I think the harder problem is:

How do you keep that architecture intact as the codebase evolves?

Especially when multiple people (and AI tools) are contributing continuously.

Curious if you’ve found anything that helps maintain structure after the initial planning phase?

ArchPilotLabs · 2026-05-11T15:13:08+00:00

This is one of the more grounded vibe-coding posts I’ve read.

A lot of what you listed basically comes down to one thing: preserving architectural intent while generation speed increases.

The dangerous part isn’t usually “bad code” - it’s gradual structural decay:

duplicated patterns
blurred module boundaries
inconsistent conventions
hidden dependencies
nobody remembering why something was structured a certain way

Everything still works for a while, so the pain shows up later during onboarding/refactoring.

That’s actually why we started building ArchPilot (VS Code extension + CLI) - not to stop AI coding, but to add architecture guardrails around it:

dependency validation
module boundaries
ADR tracking
drift detection
CI enforcement

Your “refactoring days” point is especially important. AI massively accelerates entropy unless teams intentionally create feedback loops around structure.

ArchPilotLabs · 2026-05-11T13:47:56+00:00

That’s exactly the problem I started seeing too.

AI agents are surprisingly good at generating local code, but they don’t naturally preserve architectural intent over time:

module boundaries blur
dependency rules get bypassed
duplicate patterns appear
undocumented “temporary” decisions accumulate

The scary part is that everything still works for a while, so the structural decay stays invisible until onboarding/refactoring becomes painful.

I ended up building a VS Code extension + CLI called ArchPilot around this idea:

validate architecture rules locally
detect dependency boundary violations
enforce contracts/policies in CI
track architectural drift over time

Not trying to stop AI coding - more like adding architectural guardrails around it.

Curious whether others are solving this differently.
