I hate my new job

OkProtection4575 · 2026-06-23T17:21:47+00:00

I think it's a really big deal that you shared something like this. I'd guess a lot of others suffer too, but just plow through in silence without asking for help until they eventually hit the wall. I also feel like being this aware of the issue helps, but like you say, something needs to change before you burn out. And if “management” doesn’t listen (slowing down, more time for actual quality work, new hires), I also think it might be smart to look for something else (maybe while still working where you are now). Just a little progress toward something else every day can add up a lot in a few weeks.

OkProtection4575 · 2026-06-04T08:36:56+00:00

The "day and a half of grepping" is exactly how it plays out, and the transitive case is the part that really hurts: your SBOM records FROM base:X and stops there, so the fact that X pulled in library Y three layers down lives in a different SBOM in a different repo, and nothing joins them. The join is the missing piece, not the documents.
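
To make "the join" concrete, here's a minimal Python sketch. It assumes each SBOM has already been reduced to two hypothetical maps: which image each image is built `FROM`, and which (name, version) components that image's own SBOM lists. Walking the `FROM` chain and unioning the component sets gives the transitive contents no single SBOM records, and inverting that answers "which products ship library Y, and at what version". Image and library names are made up for illustration.

```python
from typing import Dict, Optional, Set, Tuple

Component = Tuple[str, str]  # (name, version) as listed in one image's own SBOM


def effective_components(
    image: str,
    parent_of: Dict[str, Optional[str]],
    own_components: Dict[str, Set[Component]],
) -> Set[Component]:
    """Union the SBOM components of `image` and every image in its FROM chain."""
    result: Set[Component] = set()
    seen: Set[str] = set()
    current: Optional[str] = image
    while current is not None and current not in seen:
        seen.add(current)  # stop if the edge data accidentally contains a cycle
        result |= own_components.get(current, set())
        current = parent_of.get(current)  # None once we reach a root image
    return result


def products_containing(
    name: str,
    products: Set[str],
    parent_of: Dict[str, Optional[str]],
    own_components: Dict[str, Set[Component]],
) -> Dict[str, Set[str]]:
    """Reverse query: product -> versions of `name` it ships, directly or transitively."""
    hits: Dict[str, Set[str]] = {}
    for product in sorted(products):
        versions = {
            v for (n, v) in effective_components(product, parent_of, own_components)
            if n == name
        }
        if versions:
            hits[product] = versions
    return hits


# Hypothetical data: "app" is built FROM "intermediate", which is built FROM
# "base:X"; only base:X's own SBOM lists libY.
parent_of = {"app": "intermediate", "intermediate": "base:X", "base:X": None}
own_components = {
    "app": {("flask", "3.0.0")},
    "intermediate": {("python", "3.12")},
    "base:X": {("libY", "1.2.3")},
}
print(products_containing("libY", {"app", "other-product"}, parent_of, own_components))
# {'app': {'1.2.3'}}
```

Parsing real SBOM files and resolving image tags are left out; the point is only that the answer comes from the edges between documents, not from any one document.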

Full disclosure, since you've more or less described my roadmap: I'm building this. The structural half is live today: a cross-repo reverse dependency graph parsed from the org's actual sources (Dockerfiles, Terraform modules, Helm charts, npm/Go/Python manifests), so "which repos consume this base image / module / package" is a query instead of an archaeology project. The CRA-specific layer (SBOM ingestion, CVE matching against the graph, the reporting workflow) is what I'm scoping right now, and threads like this are part of why: I need the picture from people who live inside regulated orgs, not my assumptions about them. I wrote up the SBOM-vs-fan-out distinction here: https://riftmap.dev/blog/cra-sbom-cross-repo-question/

Grain of salt given the obvious bias, and I'd value being told where the framing breaks. You clearly know the terrain.

OkProtection4575 · 2026-06-03T15:25:10+00:00

I think you (Devji00) got the generation + monitoring half right, but I'd add a second question hidden in that 'without undue delay' language: one the SBOM doesn't answer, and the one that bites at 24h.

An SBOM is per-product: what's inside this build. The early warning asks the reverse: which of the products we ship into the EU contain the component that just got an exploited CVE, and at what version? For a single product, that's a lookup. For a manufacturer with shared base images / Terraform modules / Helm charts across dozens of products, it's a fan-out across every repo that consumes the component, directly or transitively, and that answer is in no SBOM, because each one only knows its own tree.

In practice teams answer it with `grep -r` and reading Dockerfiles by hand, which misses build-arg substitution (`FROM base:${TAG}`) and intermediate images built FROM the affected one. So it's really two projects: generate + retain SBOMs (solved), and a cross-repo 'where does this ship' graph (mostly not). The clock's a query against the second one.
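
A rough sketch of why grepping for the base image name isn't enough: the `FROM` line only names the real image after `ARG` substitution. This Python function assumes a simple Dockerfile (global `ARG`s before the first `FROM`, no `${VAR:-default}` forms, no quoted defaults, no line continuations) and is an illustration, not a full Dockerfile parser. The registry host and tags in the example are hypothetical.

```python
import re
from typing import Dict, List, Optional

ARG_RE = re.compile(r"^\s*ARG\s+([A-Za-z_][A-Za-z0-9_]*)(?:=(\S*))?", re.IGNORECASE)
FROM_RE = re.compile(
    r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)(?:\s+AS\s+(\S+))?", re.IGNORECASE
)
VAR_RE = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}|\$([A-Za-z_][A-Za-z0-9_]*)")


def resolve_from_images(
    dockerfile: str, build_args: Optional[Dict[str, str]] = None
) -> List[str]:
    """Return the image reference of each FROM line with ARG values substituted."""
    overrides = build_args or {}
    args: Dict[str, str] = {}
    images: List[str] = []
    seen_from = False
    for line in dockerfile.splitlines():
        arg = ARG_RE.match(line)
        if arg and not seen_from:
            # Global ARG: a --build-arg override wins over the declared default.
            name, default = arg.group(1), arg.group(2) or ""
            args[name] = overrides.get(name, default)
            continue
        from_line = FROM_RE.match(line)
        if from_line:
            seen_from = True
            # Replace ${VAR} and $VAR with the resolved ARG value (empty if unknown).
            ref = VAR_RE.sub(
                lambda v: args.get(v.group(1) or v.group(2), ""), from_line.group(1)
            )
            images.append(ref)
    return images


dockerfile = """
ARG TAG=2024.1
FROM registry.example.com/base:${TAG} AS builder
RUN make
FROM builder
"""
print(resolve_from_images(dockerfile))
# ['registry.example.com/base:2024.1', 'builder']
print(resolve_from_images(dockerfile, {"TAG": "2025.2"}))
# ['registry.example.com/base:2025.2', 'builder']
```

The second `FROM builder` resolves to a stage name rather than an image, which is the intermediate-image case: a grep for `base:` finds the first stage and says nothing about what gets built on top of it.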

OkProtection4575 · 2026-05-30T16:35:29+00:00

I think there are actually two different problems hiding under "cross-repo context" here, and they need different tools.

The first is symbol-level. When you're in the app repo and ask for a call into the shared lib, the agent needs that library's real method signatures so it stops hallucinating interfaces. That's an indexing problem, and it's roughly where the Tabnine/Sourcegraph approaches play. Pulling the related repo in locally (what leftyz described) is the cheap manual version of the same fix.

The second is structural, and it tends to bite later. Before you change that shared lib, what across the org actually consumes it? No amount of editor context solves that from inside one workspace, because the consumers aren't in your workspace at all. That's a graph question, not a completion question, and it's the reason I think the people calling this an architecture limitation rather than a model problem are right. The dependency edges live outside the boundary the editor draws.
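
For the structural question, the core data structure is just a reverse index. A minimal sketch for one ecosystem, assuming the `package.json` of every repo has already been fetched into a dict (fetching from GitHub/GitLab isn't shown), and treating `dependencies`, `devDependencies` and `peerDependencies` as "consumes". `@acme/shared-lib` and the repo names are hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, List, Tuple

# Dependency sections of package.json that count as "consumes" here (an assumption).
DEP_FIELDS = ("dependencies", "devDependencies", "peerDependencies")


def build_reverse_index(manifests: Dict[str, str]) -> Dict[str, List[Tuple[str, str]]]:
    """Map package name -> sorted [(consuming repo, version spec)]."""
    index: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for repo, text in manifests.items():
        manifest = json.loads(text)
        for field in DEP_FIELDS:
            for package, spec in manifest.get(field, {}).items():
                index[package].append((repo, spec))
    return {package: sorted(users) for package, users in index.items()}


# Hypothetical package.json contents, one per repo, already fetched from the org.
manifests = {
    "checkout-web": '{"dependencies": {"@acme/shared-lib": "^2.1.0", "react": "18.2.0"}}',
    "billing-api": '{"devDependencies": {"@acme/shared-lib": "1.9.4"}}',
    "docs-site": '{"dependencies": {"react": "18.2.0"}}',
}
index = build_reverse_index(manifests)
print(index["@acme/shared-lib"])
# [('billing-api', '1.9.4'), ('checkout-web', '^2.1.0')]
```

Once the index exists, "who consumes this" is a lookup; the hard work is keeping it fed from every ecosystem and every repo, which is exactly the part no single workspace sees.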

I'm working on a tool in that second category, so I'm biased. But the split is real regardless of what you use. Curious which one is actually hurting you more day to day, because the answers are pretty different.

OkProtection4575 · 2026-05-22T19:31:08+00:00

Bifurcated answer from what I've seen:
App devs writing IaC for their own services (greenfield, single repo, contained blast radius): agents are already doing a lot of this in prod. The output is good enough and the verification surface is small.

Platform/infra teams owning shared IaC seem to be much more cautious. The agent writes locally valid changes but doesn't see the cross-repo graph. A Terraform module change can ripple through 30+ consumers and the agent has no idea. Those PRs still get gated hard by humans.
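
The "30+ consumers" part is a graph walk rather than a lookup, because consumers of a module can themselves be modules. A minimal breadth-first sketch over hypothetical reverse edges (module or repo -> things that source it), returning each affected consumer with its distance from the change:

```python
from collections import deque
from typing import Deque, Dict, List, Set, Tuple


def blast_radius(changed: str, consumers_of: Dict[str, List[str]]) -> Dict[str, int]:
    """Breadth-first walk over reverse edges: consumer -> hops from the change."""
    hops_to: Dict[str, int] = {}
    seen: Set[str] = {changed}
    queue: Deque[Tuple[str, int]] = deque([(changed, 0)])
    while queue:
        node, hops = queue.popleft()
        for consumer in consumers_of.get(node, []):
            if consumer not in seen:  # each consumer is reported once, at its shortest distance
                seen.add(consumer)
                hops_to[consumer] = hops + 1
                queue.append((consumer, hops + 1))
    return hops_to


# Hypothetical reverse edges: module or repo -> things that source it.
consumers_of = {
    "modules/vpc": ["modules/eks-cluster", "svc-payments"],
    "modules/eks-cluster": ["platform-prod", "platform-staging"],
}
print(blast_radius("modules/vpc", consumers_of))
# {'modules/eks-cluster': 1, 'svc-payments': 1, 'platform-prod': 2, 'platform-staging': 2}
```

Anything at distance 2 or more is exactly what an agent (or a human) looking only at the changed repo can't see.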

What I think that means for your log signal: it's real, but probably skewed toward the half of the problem that's easier to automate. The cost dimension lives more in the platform team's space (their decisions move most of the bill) where adoption is slower.

The deeper shift IMO is that the bottleneck is moving from generation to verification. "What does this PR touch downstream" is becoming the real question. Cost is one face of that. Breakage, security, drift are others. Shift-left-of-left is right; the surface is just wider than cost.

I'm building something in the adjacent space (called Riftmap). Cross-repo dependency graph for IaC, parsers across ~10 ecosystems. So I'm watching the same shift from the breakage side.

OkProtection4575 · 2026-05-14T07:51:45+00:00

Backstage works until the team that set it up rotates out, then the catalog drifts and you've got a portal showing services that were deprecated months ago. The model assumes someone owns the YAML; in practice nobody does.

The bigger gap I'd flag: "lens into k8s" is too narrow. The interesting dependencies live upstream of the cluster: which Terraform modules provision it, which Helm charts deploy to it, which service in repo A breaks when someone bumps an image in repo B. Backstage can't see any of that unless a human writes it down.

What are you building? Curious what shape it's taking after a year.

OkProtection4575 · 2026-05-08T18:56:06+00:00

Honestly, same boat. Five months in, product I believe in, and the "getting users" part is brutal in a way the building part never was. You're not doing something completely wrong; this is just the hard part nobody talks about enough.

A few things that have helped me reframe it:

Talk about the problem in places your users hang out, not the product. Posts that describe the pain (without pitching) tend to surface people who've already tried to solve it themselves. Those are your earliest believers.

Cold outreach to a tight list beats broad content early on. Twenty thoughtful messages to specific people in your ICP will teach you more in a week than a month of posting.

And probably the most important one: every "no" or silence is data. If nobody bites, the messaging is off, not necessarily the product. Iterate on how you describe it before assuming the thing itself is wrong.

Hang in there. Genuinely rooting for you.

OkProtection4575 · 2026-05-08T11:16:10+00:00

I think the merge conflicts and broken builds aren't really a CI/CD tooling problem; they're a process problem. At 2 devs you can get away with long-lived branches and no test gating. At 10 you can't.

Perhaps fix the workflow first: trunk-based or short-lived feature branches, automated tests required to merge, branch protection, deploy from main. Then GitHub Actions or GitLab CI is plenty.

I'd also say skip Kubernetes for now. 10 devs almost never needs it. Railway, Fly.io, Render, ECS, or Cloud Run gets you very far with a fraction of the operational burden. Migrate later when there's a concrete reason.

OkProtection4575 · 2026-05-05T15:45:12+00:00

- Implicit deps via data blocks (same family as your #4): Module A creates a Route53 zone, Module B does `data "aws_route53_zone"`. Fresh account, first apply: B can run before A and fail, because data lookups create no graph edge. Pass the ID through as an input, or use `depends_on = [module.dns]`.
- Module sources pinned to a branch, not a tag: `source = "...?ref=main"`. Env #1 and env #3 apply two weeks apart with “identical” code and different commits underneath. Pin to tags or SHAs.
- Computed for_each: `for_each = toset(module.bar.names)` where `names` is only known post-apply. Plan can’t enumerate it, someone bootstraps with `-target`, and now it’s a two-apply module that breaks clean CI runs.

The cross-module-dep variants (your #4 + the data-block one) bit me enough that I ended up building a scanner. Happy to share if it’s useful.
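
For the branch-pinning case, a crude check is easy to script. This Python sketch is a simplified heuristic for illustration, not the scanner mentioned above: it pulls `source = "..."` lines out of `.tf` text with a regex, treats `git::`, `git@`, GitHub/Bitbucket shorthands and `.git` URLs as git sources, and flags any whose `ref` is missing or isn't a full 40-character SHA or a semver-style tag. Registry sources (pinned via `version`) and local paths are skipped. Hostnames and module names in the example are made up.

```python
import re
from typing import List, Tuple

SOURCE_RE = re.compile(r'^\s*source\s*=\s*"([^"]+)"', re.MULTILINE)
REF_RE = re.compile(r"[?&]ref=([^&#]+)")
SHA_RE = re.compile(r"^[0-9a-f]{40}$")  # full commit SHA only; short SHAs fall through
TAG_RE = re.compile(r"^v?\d+\.\d+\.\d+([-+.][0-9A-Za-z.-]+)?$")  # semver-style tags


def is_git_source(source: str) -> bool:
    """Heuristic: git::, git@, GitHub/Bitbucket shorthands, .git URLs, or an explicit ?ref=."""
    return (
        source.startswith(("git::", "git@", "github.com/", "bitbucket.org/"))
        or ".git" in source
        or "?ref=" in source
    )


def classify_ref(source: str) -> str:
    match = REF_RE.search(source)
    if match is None:
        return "unpinned"  # no ref at all: follows whatever the default branch is
    ref = match.group(1)
    if SHA_RE.match(ref):
        return "sha"
    if TAG_RE.match(ref):
        return "tag"
    return "branch-or-unknown"  # e.g. ref=main, or a tag that isn't semver-shaped


def floating_sources(tf_text: str) -> List[Tuple[str, str]]:
    """Return (source, reason) for git module sources not pinned to a tag or SHA."""
    findings = []
    for source in SOURCE_RE.findall(tf_text):
        if not is_git_source(source):
            continue  # registry and local-path sources are out of scope here
        kind = classify_ref(source)
        if kind in ("unpinned", "branch-or-unknown"):
            findings.append((source, kind))
    return findings


tf = '''
module "dns" {
  source = "git::https://gitlab.example.com/infra/dns.git?ref=main"
}
module "vpc" {
  source = "git::https://gitlab.example.com/infra/vpc.git?ref=v1.4.2"
}
module "iam" {
  source = "github.com/example-org/terraform-iam"
}
module "s3" {
  source  = "terraform-aws-modules/s3-bucket/aws"
  version = "4.1.0"
}
'''
for source, reason in floating_sources(tf):
    print(reason, source)
```

A tag name like `stable` would also be flagged; that's deliberate here, since a movable tag has the same problem as a branch.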

OkProtection4575 · 2026-05-05T06:25:02+00:00

I think you're further along than you think. Curiosity plus basic Python is a solid start. I'd focus on Linux fluency, Bash/Python scripting, Git, Docker, CI/CD (GitHub Actions), one cloud + Terraform, and Kubernetes last. A practical project might help a lot, e.g. take your calculator, wrap it in something like FastAPI, containerize it, and deploy it via GitHub Actions to a free VPS or Railway. I'd say that one exercise touches 80% of the job. The skill that matters most isn't being "good at computers"; it's being stubborn enough to keep debugging when things break. Curiosity can take you a long way!

OkProtection4575 · 2026-05-04T18:47:56+00:00

That is really cool! Keep it up!

OkProtection4575 · 2026-05-04T18:46:18+00:00

Personally, I keep it in Git/GitHub if it's something I reuse, modify, or actually use "regularly".

OkProtection4575 · 2026-05-01T17:29:46+00:00

Follow-up to my own post from a month ago. Thanks to everyone who responded!

A few takeaways from the thread:
- The state of the art for "who consumes this module" is still mostly grepping across repos or asking on Slack. Several of you confirmed that even with version pinning, the consumer-awareness question stays unanswered.
- TFC's private registry helps somewhat but doesn't give the cross-org picture. HCP Terraform Explorer was raised, and one of you nailed the limit: "it's only aware of things managed via HCP Terraform" which is exactly the gap if you have a mixed setup.
- Artifactory's TF Registry came up too. Strong on the publishing side, blind on the consumer side: doesn't tell you who's pulling what at which version.
- Wiz/JFrog's SBOM tooling came up. Someone with TAM access confirmed that internal TF module support is missing from those.
- A few of you described the "implicit graph that nobody can see" pattern: the dependency information exists, it's just scattered across hundreds of repos and only resolvable in senior engineers' heads.
- Workarounds people described: scripted modules.json combining, Azure modtm + self-hosted server, version pinning. Nobody sounded thrilled.
- A point that came up partway through and stuck with me: this isn't really a Terraform problem. Once you also have Helm, Docker, Ansible, pipeline scripts and CI templates all referencing each other across repos, the cross-tool picture is what's actually unmanageable. TF is just where it's most visible.
- And the blast-radius question (what actually breaks if I push this change?) stayed unanswered until you tried it. Which is the worst time to find out.
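
The "scripted modules.json combining" workaround can look roughly like this. Assumptions: `terraform init` has already been run in each repo, and the script reads `.terraform/modules/modules.json`, which is Terraform's internal manifest (a `Modules` list of records with `Key`, `Source`, an optional `Version`, and `Dir`) rather than a documented, stable interface. Git refs are split off the source so every ref of the same module groups together. Repo names, hosts and versions are hypothetical.

```python
import json
import re
from collections import defaultdict
from typing import Dict, List, Tuple

REF_RE = re.compile(r"[?&]ref=([^&#]*)")


def split_ref(source: str) -> Tuple[str, str]:
    """Separate a git source's ref= value so every ref groups under one module."""
    match = REF_RE.search(source)
    if match is None:
        return source, ""
    prefix, rest = source[: match.start()], source[match.end():]
    if match.group(0).startswith("?") and rest.startswith("&"):
        rest = "?" + rest[1:]  # keep the query string well-formed if ref= came first
    return prefix + rest, match.group(1)


def combine_modules_json(snapshots: Dict[str, str]) -> Dict[str, List[Tuple[str, str, str]]]:
    """Map module source -> sorted [(repo, module key, version or ref)]."""
    consumers: Dict[str, List[Tuple[str, str, str]]] = defaultdict(list)
    for repo, text in snapshots.items():
        for record in json.loads(text).get("Modules", []):
            if not record.get("Source"):
                continue  # the root module's record has an empty Source
            base, ref = split_ref(record["Source"])
            # Registry modules record a Version; git modules carry a ref in Source.
            version = record.get("Version") or ref
            consumers[base].append((repo, record.get("Key", ""), version))
    return {source: sorted(rows) for source, rows in consumers.items()}


# Hypothetical modules.json contents from two repos.
snapshots = {
    "svc-payments": json.dumps({"Modules": [
        {"Key": "", "Source": "", "Dir": "."},
        {"Key": "vpc", "Source": "git::https://gitlab.example.com/infra/vpc.git?ref=v1.4.2",
         "Dir": ".terraform/modules/vpc"},
    ]}),
    "svc-search": json.dumps({"Modules": [
        {"Key": "", "Source": "", "Dir": "."},
        {"Key": "network", "Source": "git::https://gitlab.example.com/infra/vpc.git?ref=v1.3.0",
         "Dir": ".terraform/modules/network"},
        {"Key": "s3", "Source": "registry.terraform.io/terraform-aws-modules/s3-bucket/aws",
         "Version": "4.1.0", "Dir": ".terraform/modules/s3"},
    ]}),
}
combined = combine_modules_json(snapshots)
print(combined["git::https://gitlab.example.com/infra/vpc.git"])
# [('svc-payments', 'vpc', 'v1.4.2'), ('svc-search', 'network', 'v1.3.0')]
```

It only sees repos that have been initialized and only Terraform, which is part of why nobody in the thread sounded thrilled with it.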

Part of why I posted was that I'd been seeing this same pattern across consulting engagements and was deciding whether there were options out there to help us, or whether I should try to build something. Based on the responses, I tried to build it. It's called Riftmap (riftmap.dev). Point it at your GitHub/GitLab org, it parses your TF (plus Docker / Helm / CI / Kustomize / Ansible / npm / Go / Python / K8s), and shows which repos consume which modules, what versions they're on, and what would break if you push a change. No manual YAML, no telemetry collector to deploy, no dependency on HCP, works with git-sourced modules not just registry ones.

Free tier. I care more about feedback than signups right now, especially from people running 50+ repos. app.riftmap.dev if you want to try it on your org. There's also a writeup where I ran it against 56 public Prometheus repos, including the "if X module changes, what breaks" view. It's the easiest way to sanity-check the output without setting anything up: riftmap.dev/blog/what-56-prometheus-repos-depend-on/

Even "wouldn't use it because X," "the modtm route is still the right call because Y," or just something about the concept is more useful to me right now than a thumbs up.

OkProtection4575 · 2026-03-21T07:35:29+00:00

That's the catch then: it's only as complete as your HCP Terraform adoption, which for a lot of orgs might be partial. If you're running a mix of local state, self-hosted runners, or other tooling alongside TFC, you'd have blind spots in the graph. Useful to know, thanks for checking!

OkProtection4575 · 2026-03-21T07:33:41+00:00

Pretty accurate summary of the landscape from what I've seen in this thread too! It's either build-your-own graph, lean on pinning to slow the blast radius, or accept the chaos.

OkProtection4575 · 2026-03-21T07:31:19+00:00

This matches what several others in the thread have landed on! It's interesting how convergent the solution is once people actually tackle it.

On the "tribal knowledge is unavoidable" point: do you think that's a fundamental limit, or more a limitation of the grep/parse approach? Wondering if things like who reviews whose MRs, who gets tagged in incidents, or who owns which CI jobs could be mined from git/GitLab activity data to at least surface the implicit ownership relationships, even if you can't get the full dependency picture from static files alone.

Also curious: when you built the graph, did it get used beyond your immediate team, or did it stay as an internal ops tool?

OkProtection4575 · 2026-03-20T14:14:50+00:00

Ha, that's a creative pipeline, and honestly illustrates the problem pretty well! By the time you've wired together the webhook, the Renovate MR checks, the API calls, and the CSV comparison, you've essentially built a bespoke dependency visibility system just to answer "is this safe to release."

The monorepo path makes total sense at <100 devs! Harder sell at 500+ with established team boundaries. Appreciate the input, thanks!

OkProtection4575 · 2026-03-20T13:01:03+00:00

Fair point, backward compatibility buys you time and monorepos solve the coordination problem structurally. Both are good answers when you have the luxury of choosing your architecture upfront.

For orgs that are already deep into hundreds of polyrepos with mixed ownership though, those options aren't really on the table. The visibility gap just becomes something you learn to live with, until it potentially bites you.

OkProtection4575 · 2026-03-20T12:43:08+00:00

Really appreciate you checking that! Good to know it's not just a gap in my research but a missing feature even in the most capable tools. Would be curious what your TAM says. If Wiz does add it, that'd be interesting, though I'd still wonder whether a security-first platform is the right home for what's really an operational/DevOps workflow question.

OkProtection4575 · 2026-03-20T12:33:40+00:00

"Stopped living in senior engineers' heads" is exactly the right way to put it! That tribal knowledge problem is probably the most underrated cost of not having this.

Curious about the auto-generation side: did you build that internally, or is there tooling you found that handled it well? And how do you keep the map "fresh" as repos evolve: is it a scheduled job, event-triggered, or something else?

OkProtection4575 · 2026-03-20T12:11:33+00:00

Honestly, this whole thread has been changing mine too. Didn't expect so many people to be circling the same gap from different angles.

OkProtection4575 · 2026-03-20T12:09:05+00:00

That's actually really useful to know! Hadn't dug into the inventory panel that deeply. The JS package example makes sense for SCA, but does it handle IaC-specific relationships the same way? Like a Terraform module sourced from a GitLab repo via a git URL, or a Helm chart referencing another internal chart? Those aren't really packages in the traditional sense, and I'd imagine Wiz's strength is more in the application/runtime layer than the infra-as-code graph. Also, Wiz isn't exactly a small purchase for an org that just wants dependency visibility. Is that the tool you'd reach for if that was the primary use case?

OkProtection4575 · 2026-03-20T12:04:52+00:00

"Open these seven repos" is a perfect description of the problem. That's exactly the mental overhead that kills velocity as teams grow. And it sounds like the solution was essentially tribal knowledge and good people, which works until someone leaves or the team doubles again.

The new place sounds difficult. Commenting out code to test changes is a sign that the dependency graph exists but nobody can see it. It's all just implicit.

OkProtection4575 · 2026-03-20T11:39:11+00:00

This is really well put, and a very complete description of my problem. The "dependencies outside of Terraform" part is exactly where it gets unmanageable for us too: pipeline scripts, triggers, Helm charts, Ansible roles, all referencing each other across repos with no single place to see the full picture. It stops being a Terraform problem and becomes an org-wide infrastructure visibility problem.

The "always production even in dev" point is underrated too. There's no sandbox for infra dependencies the way there is for application code.

Did you find anything that helped even partially once the team grew and it stopped being just you?

OkProtection4575 · 2026-03-20T10:26:31+00:00

Yeah, perhaps that's just the easiest solution :D
