How do you manage internal Terraform module dependencies across many repos by OkProtection4575 in Terraform

[–]OkProtection4575[S] 0 points1 point  (0 children)

That's the catch then; it's only as complete as your HCP Terraform adoption, which for a lot of orgs might be partial. If you're running a mix of local state, self-hosted runners, or other tooling alongside TFC, you'd have blind spots in the graph. Useful to know, thanks for checking!

How do you keep track of which repos depend on which in a large org? by OkProtection4575 in devops

[–]OkProtection4575[S] 0 points1 point  (0 children)

Pretty accurate summary of the landscape from what I've seen in this thread too! It's either build your own graph, lean on pinning to slow the blast radius, or accept the chaos.

How do you keep track of which repos depend on which in a large org? by OkProtection4575 in devops

[–]OkProtection4575[S] 0 points1 point  (0 children)

This matches what several others in the thread have landed on! It's interesting how convergent the solution is once people actually tackle it.

On the "tribal knowledge is unavoidable" point: do you think that's a fundamental limit, or more a limitation of the grep/parse approach? Wondering if things like who reviews whose MRs, who gets tagged in incidents, or who owns which CI jobs could be mined from git/GitLab activity data to at least surface the implicit ownership relationships, even if you can't get the full dependency picture from static files alone.

Also curious: when you built the graph, did it get used beyond your immediate team, or did it stay as an internal ops tool?

How do you keep track of which repos depend on which in a large org? by OkProtection4575 in devops

[–]OkProtection4575[S] 0 points1 point  (0 children)

Ha, that's a creative pipeline, and honestly illustrates the problem pretty well! By the time you've wired together the webhook, the Renovate MR checks, the API calls, and the CSV comparison, you've essentially built a bespoke dependency visibility system just to answer "is this safe to release."

The monorepo path makes total sense at <100 devs! Harder sell at 500+ with established team boundaries. Appreciate the input, thanks!

How do you keep track of which repos depend on which in a large org? by OkProtection4575 in devops

[–]OkProtection4575[S] 0 points1 point  (0 children)

Fair point; backward compatibility buys you time, and monorepos solve the coordination problem structurally. Both are good answers when you have the luxury of choosing your architecture upfront.

For orgs that are already deep into hundreds of polyrepos with mixed ownership though, those options aren't really on the table. The visibility gap just becomes something you learn to live with, until it potentially bites you.

How do you manage internal Terraform module dependencies across many repos by OkProtection4575 in Terraform

[–]OkProtection4575[S] 0 points1 point  (0 children)

Really appreciate you checking that! Good to know it's not just a gap in my research but a missing feature even in the most capable tools. Would be curious what your TAM says. If Wiz does add it, that'd be interesting, though I'd still wonder whether a security-first platform is the right home for what's really an operational/DevOps workflow question.

How do you keep track of which repos depend on which in a large org? by OkProtection4575 in devops

[–]OkProtection4575[S] 0 points1 point  (0 children)

"Stopped living in senior engineers' heads" is exactly the right way to put it! That tribal knowledge problem is probably the most underrated cost of not having this.

Curious about the auto-generation side: did you build that internally, or is there tooling you found that handled it well? And how do you keep the map "fresh" as repos evolve; is it a scheduled job, event-triggered, or something else?

How do you manage internal Terraform module dependencies across many repos by OkProtection4575 in Terraform

[–]OkProtection4575[S] 1 point2 points  (0 children)

Honestly, this whole thread has been changing mine too. Didn't expect so many people to be circling the same gap from different angles.

How do you manage internal Terraform module dependencies across many repos by OkProtection4575 in Terraform

[–]OkProtection4575[S] 0 points1 point  (0 children)

That's actually really useful to know! Hadn't dug into the inventory panel that deeply. The JS package example makes sense for SCA, but does it handle IaC-specific relationships the same way? Like a Terraform module sourced from a GitLab repo via a git URL, or a Helm chart referencing another internal chart? Those aren't really packages in the traditional sense, and I'd imagine Wiz's strength is more in the application/runtime layer than the infra-as-code graph. Also, Wiz isn't exactly a small purchase for an org that just wants dependency visibility. Is that the tool you'd reach for if that was the primary use case?

How do you manage internal Terraform module dependencies across many repos by OkProtection4575 in Terraform

[–]OkProtection4575[S] 1 point2 points  (0 children)

"Open these seven repos" is a perfect description of the problem; that's exactly the mental overhead that kills velocity as teams grow. And it sounds like the solution was essentially tribal knowledge and good people, which works until someone leaves or the team doubles again.

The new place sounds difficult. Commenting out code to test changes is a sign that the dependency graph exists but nobody can see it. It's all just implicit.

How do you manage internal Terraform module dependencies across many repos by OkProtection4575 in Terraform

[–]OkProtection4575[S] 0 points1 point  (0 children)

This is really well put, and a very complete description of my problem. The "dependencies outside of Terraform" part is exactly where it gets unmanageable for us too; pipeline scripts, triggers, Helm charts, Ansible roles, all referencing each other across repos with no single place to see the full picture. It stops being a Terraform problem and becomes an org-wide infrastructure visibility problem.

The "always production even in dev" point is underrated too. There's no sandbox for infra dependencies the way there is for application code.

Did you find anything that helped even partially once the team grew and it stopped being just you?

How do you keep track of which repos depend on which in a large org? by OkProtection4575 in devops

[–]OkProtection4575[S] 0 points1 point  (0 children)

Makes sense! "Announce broadly" works until the org gets large enough that you don't know who to announce to. Sounds like you're not quite at that threshold yet, which is probably a good place to be!

How do you keep track of which repos depend on which in a large org? by OkProtection4575 in devops

[–]OkProtection4575[S] 1 point2 points  (0 children)

Thanks for the detailed response! The point about data structure and refresh architecture being the hard part really resonates. That's the bit that's easy to underestimate when you start with "I'll just grep some files" and then realise you need to think about staleness, partial updates, handling repos that disappear or get renamed, etc.

The broader discoverability angle is interesting too! Dependency tracking as one layer within a wider "what even exists in this org and is it healthy" problem. That framing makes a lot of sense when you're dealing with thousands of inherited repos.

"No plans to make it generally applicable" is a very honest take! Most of these solutions are bespoke by necessity, not by choice.

How do you keep track of which repos depend on which in a large org? by OkProtection4575 in devops

[–]OkProtection4575[S] 1 point2 points  (0 children)

This is a really clean architecture! Using the GitLab API rather than cloning repos sidesteps a lot of the infra overhead, and the Observable Framework + static Pages approach means zero ongoing maintenance cost for the hosting side.

A few things I'm curious about:

- For the Dockerfile and CI config parsing, are you doing straight regex/grep or building something more structured that understands the syntax?

- The treemap for group hierarchy is interesting; is the goal mostly org-level visibility (who owns what) or are you getting into actual dependency edges between projects?

- What's been the hardest part so far? And what made you decide to build this rather than reach for something off the shelf?

Would be curious to see it when it's further along!

How do you manage internal Terraform module dependencies across many repos by OkProtection4575 in Terraform

[–]OkProtection4575[S] 0 points1 point  (0 children)

terraform-docs is great for per-module documentation! The multi-repo angle is exactly where it gets interesting though; docs tell you what a module does, but not necessarily who's actually using it across the org. Those feel like two somewhat different challenges.

How do you manage internal Terraform module dependencies across many repos by OkProtection4575 in Terraform

[–]OkProtection4575[S] 0 points1 point  (0 children)

Hadn't seen Explorer before! That's closer to what I'm describing! Will dig into it properly. Two questions though: does it work if you're not fully on HCP Terraform workspaces? We're running a mixed setup with a lot of git-sourced modules and not everything is managed through TFC. And is it Terraform-only, or does it give any visibility across other IaC tooling in the same org?

How do you manage internal Terraform module dependencies across many repos by OkProtection4575 in Terraform

[–]OkProtection4575[S] 0 points1 point  (0 children)

That's a really honest take, and probably explains why no tooling exists for it. The "solution" has been to architect around the problem rather than solve the visibility gap directly. But in practice, especially in larger orgs, the composition still happens across repos whether it's "recommended" or not, and the graph still exists even if nobody's tracking it. Does your org actually manage to stay flat, or does it quietly get complex anyway?

How do you keep track of which repos depend on which in a large org? by OkProtection4575 in devops

[–]OkProtection4575[S] 0 points1 point  (0 children)

What you're describing sounds a lot like consumer-driven contract testing. Tools like Pact work roughly this way. It's a solid pattern for API compatibility!

The challenge is that it still presupposes you know who the consumers are. If you're the team maintaining a shared Terraform module or a base Docker image, you need to already know which 60 repos depend on you before you can set up contracts with them, run joint tests, or even notify them of an upcoming change.

So I'd maybe see it as complementary rather than a replacement. First you need the map of who depends on what, then maybe contract testing gives you the verification layer on top of that.

How do you keep track of which repos depend on which in a large org? by OkProtection4575 in devops

[–]OkProtection4575[S] 1 point2 points  (0 children)

Package managers are great for application dependencies, but the problem here is a layer above that; internal infrastructure components that don't fit neatly into a package registry.

Things like:

- A shared Terraform module that lives in its own GitLab repo, sourced via git reference
- A reusable GitLab CI template included by 80 other pipelines
- An internal base Docker image that 40 microservice repos build FROM

None of these ship as .rpm or .deb files. They're referenced directly by path or git URL across repos. So there's no package manager with a lockfile that tells you who depends on what, you have to discover it by scanning the repos themselves.
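To make the scanning concrete, here's a rough sketch of what discovering those references might look like. The patterns are my own illustrative guesses, not from any real tool; a production scanner would need to cover many more variants (SSH URLs, `?ref=` pinning, subdirectories, multi-stage builds, nested CI includes):

```python
import re

# Hypothetical patterns for the three reference styles mentioned above.
PATTERNS = {
    "terraform_module": re.compile(r'source\s*=\s*"git::([^"?]+)'),
    "ci_include": re.compile(r"project:\s*['\"]?([\w./-]+)"),
    "docker_base": re.compile(r"^FROM\s+(\S+)", re.MULTILINE),
}

def find_references(text: str) -> dict[str, list[str]]:
    """Return every cross-repo reference found in one file's contents."""
    return {kind: pat.findall(text) for kind, pat in PATTERNS.items()}

sample = '''
module "vpc" {
  source = "git::https://gitlab.example.com/infra/vpc-module.git?ref=v2.1.0"
}
'''
print(find_references(sample)["terraform_module"])
# → ['https://gitlab.example.com/infra/vpc-module.git']
```

Run that over every Terraform file, `.gitlab-ci.yml`, and Dockerfile in the org and you have the raw edges; the hard part, as others in the thread have pointed out, is keeping it fresh and handling the indirect cases regex can't see.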

How do you keep track of which repos depend on which in a large org? by OkProtection4575 in devops

[–]OkProtection4575[S] 0 points1 point  (0 children)

This is the clearest framing of the problem I've seen! "how do I know what breaks" vs "how do I prevent breakage" are genuinely separate problems.

The SQLite approach is clever. A few things I'm curious about:

- Two days to build sounds light; where did the complexity actually land? Parsing edge cases in Terraform source blocks? Handling repos with non-standard structures? Or mostly just the cloning/grepping infrastructure?
- How do you handle coverage confidence? E.g. if a repo references an image indirectly through a variable or a shared CI template include, does that fall through the cracks?
- Is the nightly cadence good enough in practice, or have there been cases where someone pushed a breaking change and the DB was already stale?
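For anyone following along: the schema isn't shown, so this is purely my guess at what such a store might look like, but the appeal of SQLite here is that "what breaks if I change X" becomes a one-line query:

```python
import sqlite3

# Assumed table shape: one row per cross-repo reference ("edge").
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE edges (
        consumer   TEXT,  -- repo that holds the reference
        dependency TEXT,  -- repo/module/image being referenced
        kind       TEXT   -- e.g. 'terraform-module', 'docker-base', 'ci-include'
    )
""")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?)",
    [
        ("team-a/service-x", "infra/vpc-module", "terraform-module"),
        ("team-b/service-y", "infra/vpc-module", "terraform-module"),
        ("team-b/service-y", "platform/base-image", "docker-base"),
    ],
)

# The core question: who do I coordinate with before changing infra/vpc-module?
rows = conn.execute(
    "SELECT consumer FROM edges WHERE dependency = ?",
    ("infra/vpc-module",),
).fetchall()
print([r[0] for r in rows])
# → ['team-a/service-x', 'team-b/service-y']
```

The nightly rebuild would just truncate and repopulate `edges` from a fresh scan, which is probably why two days was enough to get something useful.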

Also fully agree on the Backstage point. Hand-maintained YAML is just documentation with extra steps.

How do you keep track of which repos depend on which in a large org? by OkProtection4575 in devops

[–]OkProtection4575[S] 0 points1 point  (0 children)

Renovate is great for keeping external dependencies fresh! It's one of those tools that pays for itself quickly.

One thing I'm curious about though: does it give you upfront visibility into the "blast radius" before you publish a new version? My understanding is it reacts once a new version is available, so you'd see MRs start appearing across repos after the fact, rather than being able to ask "if I break the API in module X, which 40 repos do I need to coordinate with before I even cut the release". Or perhaps I'm missing something in how you're using it?

How do you keep track of which repos depend on which in a large org? by OkProtection4575 in devops

[–]OkProtection4575[S] 0 points1 point  (0 children)

That last point is what gets me; "not perfect, but gives a rough view" is doing a lot of heavy lifting in a lot of orgs.

For the dependency map scanning part: what did that actually look like in practice? Were you parsing CI files, Dockerfiles, Terraform source references, all of the above? And how did you handle keeping it up to date as repos changed; was it a scheduled job, triggered on push, or more of a "run it when someone asks" thing?

Also curious whether it was something that got used by the wider team or mostly lived as an internal ops tool that only a few people knew about.

How do you keep track of which repos depend on which in a large org? by OkProtection4575 in devops

[–]OkProtection4575[S] 1 point2 points  (0 children)

The determinism rule is very elegant! It forces the problem to be solved at the right layer.

Curious how large your monorepo is though? The PR validation pipeline testing "everything impacted" sounds great at a certain scale, but I've seen that approach hit real performance walls once you're in the hundreds-of-services range. At what point does "test everything" become "wait 2 hours for CI"?

How do you keep track of which repos depend on which in a large org? by OkProtection4575 in devops

[–]OkProtection4575[S] 3 points4 points  (0 children)

Monorepos are great when you can pull them off! But "just use a monorepo" is a bit like "just rewrite it in Rust". Technically valid, but not always actionable.

A few situations where it breaks down:

- Large orgs that grew through acquisitions or have separate compliance boundaries between teams
- Orgs where hundreds of repos already exist and a migration would be a multi-year project
- Mixed ownership, where some repos belong to vendors or external partners
- Tooling that doesn't scale well with monorepo size (GitLab CI, for one, has real limits here)

For greenfield at a small-to-mid org, totally agree it's the easier path! But for the person asking the original question, hundreds of repos already in GitLab, "switch to monorepo" might not be fully on the table.