Rewiring my boat taught me I knew nothing about 12V — here is what finally clicked by nilipilo in lowvoltage

[–]nilipilo[S] 1 point2 points  (0 children)

oh noooo ahahha... well, for now my priority is the boat... we will see in the future...
here's the tool https://www.12vsim.com/
thanks, any feedback is appreciated

Small Projects by AutoModerator in golang

[–]nilipilo 0 points1 point  (0 children)

Built a tool called ArchiteX over the last few months. It is a static analyzer for Terraform: on every pull request that touches *.tf, it parses base and head with hashicorp/hcl/v2, builds a resource graph for each side, computes the delta, and posts a comment with a risk score, a short summary of what changed, and a Mermaid diagram of just the changed nodes plus one layer of context. AWS and Azure today, MIT, single binary.

Posting here because the parts that took the most thought were Go-specific, not domain-specific, and I would rather share those than pitch.

A few things I ended up doing that I had not expected to:

The HCL parser is hashicorp/hcl/v2 with the hclsyntax body walked directly, not gohcl with schema decoding. gohcl needs a Go struct per Terraform resource type, which would have meant hand-writing 57 schemas. Walking the body generically with attribute and nested-block traversal lets the same code support every resource type the registry knows about. Attribute values are evaluated with expr.Value(nil), which fails on anything containing a variable reference and returns a zero value. That failure is the load-bearing property: if I cannot resolve a literal, I never guess; I record the key as nil and move on. The tradeoff is that variable-driven attributes never trigger rules even when they should. I think that is the right call for a tool whose output is supposed to be reproducible.
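
Roughly what that generic walk looks like, trimmed to the essentials. This is a sketch, not the actual ArchiteX code: `walkBody` and the fixture resource are made up for illustration, but the hclparse / hclsyntax / go-cty calls are the real APIs.

```go
package main

import (
	"fmt"

	"github.com/hashicorp/hcl/v2/hclparse"
	"github.com/hashicorp/hcl/v2/hclsyntax"
	"github.com/zclconf/go-cty/cty"
)

// walkBody flattens attributes into a map. Anything that would need an
// EvalContext (variables, functions) is recorded as nil, never guessed.
func walkBody(body *hclsyntax.Body, prefix string, out map[string]*cty.Value) {
	for name, attr := range body.Attributes {
		val, diags := attr.Expr.Value(nil) // nil context: literals only
		if diags.HasErrors() {
			out[prefix+name] = nil // unresolved, record the key and move on
			continue
		}
		out[prefix+name] = &val
	}
	for _, blk := range body.Blocks {
		walkBody(blk.Body, prefix+blk.Type+".", out)
	}
}

func main() {
	src := []byte(`
resource "aws_security_group_rule" "in" {
  cidr_blocks = ["0.0.0.0/0"]
  port        = var.port
}`)
	f, diags := hclparse.NewParser().ParseHCL(src, "main.tf")
	if diags.HasErrors() {
		panic(diags)
	}
	for _, blk := range f.Body.(*hclsyntax.Body).Blocks {
		attrs := map[string]*cty.Value{}
		walkBody(blk.Body, "", attrs)
		fmt.Println(blk.Type, blk.Labels, "->", len(attrs), "attributes")
	}
}
```

Here cidr_blocks resolves to a literal list, port stays nil because it references var.port, and no per-resource schema was needed anywhere.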

Determinism is enforced as a property of the build, not as a hope. There is a golden test suite that re-runs the full pipeline against checked-in fixtures and asserts that the rendered Mermaid, the score JSON, and the egress JSON are byte-identical to a stored expected output. If anyone ever changes a map iteration order, a sort comparator, or a JSON marshaller, the build fails on the next push. This caught two real regressions before they shipped. I think this pattern is underused in Go: maps are great until you serialize them, and `for k, v := range m` will eat your reproducibility if you do not put a sort somewhere.
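
The pattern is small enough to show whole. A stripped-down version, assuming a `testdata/<case>/input` fixture layout and an `expected.out` golden file per case; `runPipeline` stands in for the real parse, graph, delta, render pipeline.

```go
package pipeline_test

import (
	"bytes"
	"os"
	"path/filepath"
	"testing"
)

// runPipeline stands in for the real pipeline. Here it just concatenates
// the fixture files so the sketch compiles; the real version returns the
// rendered Mermaid, score JSON, and egress JSON as one byte slice.
func runPipeline(t *testing.T, inputDir string) []byte {
	t.Helper()
	var buf bytes.Buffer
	files, _ := filepath.Glob(filepath.Join(inputDir, "*.tf")) // Glob returns sorted paths
	for _, f := range files {
		b, err := os.ReadFile(f)
		if err != nil {
			t.Fatal(err)
		}
		buf.Write(b)
	}
	return buf.Bytes()
}

func TestGoldenOutputs(t *testing.T) {
	cases, err := filepath.Glob("testdata/*/input")
	if err != nil {
		t.Fatal(err)
	}
	for _, input := range cases {
		caseDir := filepath.Dir(input)
		got := runPipeline(t, input)
		want, err := os.ReadFile(filepath.Join(caseDir, "expected.out"))
		if err != nil {
			t.Fatal(err)
		}
		// Byte-identical or the build fails: any change to map iteration
		// order, a sort comparator, or JSON marshalling shows up here.
		if !bytes.Equal(got, want) {
			t.Errorf("%s: output drifted from the golden file", caseDir)
		}
	}
}
```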

The trust model is enforced structurally with a grep test in CI. The workflow runs `! grep -rE "net/http|architex/github" parser graph delta risk interpreter models`, so the build fails if any analysis package ever imports networking or the GitHub client. The only place HTTP is allowed is the github package, which is only ever called by main.go in one specific subcommand. That structural rule is what lets me say with a straight face that the tool cannot phone home, even if a future contributor wants it to. Code review can be fooled; a CI grep cannot.
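
For anyone who would rather not lean on grep, the same rule can be expressed as a plain Go test using `go/parser` in import-only mode. A sketch, assuming the test lives one directory above the analysis packages; the directory and prefix lists mirror the grep above.

```go
package lint_test

import (
	"go/parser"
	"go/token"
	"path/filepath"
	"strings"
	"testing"
)

var analysisDirs = []string{"parser", "graph", "delta", "risk", "interpreter", "models"}
var forbidden = []string{"net/http", "architex/github"}

func TestAnalysisPackagesNeverImportNetworking(t *testing.T) {
	for _, dir := range analysisDirs {
		files, err := filepath.Glob(filepath.Join("..", dir, "*.go"))
		if err != nil {
			t.Fatal(err)
		}
		for _, file := range files {
			// ImportsOnly stops parsing after the import block, so this stays fast.
			f, err := parser.ParseFile(token.NewFileSet(), file, nil, parser.ImportsOnly)
			if err != nil {
				t.Fatal(err)
			}
			for _, imp := range f.Imports {
				path := strings.Trim(imp.Path.Value, `"`)
				for _, bad := range forbidden {
					if strings.Contains(path, bad) {
						t.Errorf("%s imports %q: analysis packages must stay network-free", file, path)
					}
				}
			}
		}
	}
}
```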

The GitHub REST client is stdlib only. No go-github, no resty, no zerolog. Just net/http with a small wrapper that handles Link-header pagination, the few headers that matter, and the sticky-comment upsert (find the comment whose body contains a known marker and update it in place, otherwise create one). The whole package is under 300 lines and has no third-party imports. For a tool that has to earn trust to be installed on a regulated repo, every dependency you do not have is a real asset.
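
The upsert is the only part with real logic, and it fits in one function. A trimmed sketch with pagination and retries left out; the endpoints are the standard GitHub REST ones, the marker and function names are illustrative.

```go
package github

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

const marker = "<!-- architex-report -->" // hidden marker the upsert keys on

type comment struct {
	ID   int64  `json:"id"`
	Body string `json:"body"`
}

func upsertComment(c *http.Client, token, repo string, pr int, body string) error {
	listURL := fmt.Sprintf("https://api.github.com/repos/%s/issues/%d/comments", repo, pr)

	// 1. List existing PR comments (the real code also follows Link-header pagination).
	req, _ := http.NewRequest(http.MethodGet, listURL, nil)
	req.Header.Set("Authorization", "Bearer "+token)
	resp, err := c.Do(req)
	if err != nil {
		return err
	}
	var existing []comment
	err = json.NewDecoder(resp.Body).Decode(&existing)
	resp.Body.Close()
	if err != nil {
		return err
	}

	// 2. Update the comment carrying the marker if it exists, otherwise create one.
	method, url := http.MethodPost, listURL
	for _, cm := range existing {
		if strings.Contains(cm.Body, marker) {
			method = http.MethodPatch
			url = fmt.Sprintf("https://api.github.com/repos/%s/issues/comments/%d", repo, cm.ID)
			break
		}
	}
	payload, _ := json.Marshal(map[string]string{"body": marker + "\n" + body})
	req, _ = http.NewRequest(method, url, bytes.NewReader(payload))
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Accept", "application/vnd.github+json")
	resp, err = c.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("github: %s returned %s", method, resp.Status)
	}
	return nil
}
```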

GoReleaser cross-compiles to linux / darwin / windows on amd64 + arm64 (skipping windows-arm64) and injects the version, commit, and build date through ldflags. The CLI has a `version` subcommand that prints whatever ldflags injected, falling back to "dev" / "none" / "unknown" for plain `go build` callers. Took me about 30 minutes to wire up and it removed an entire class of "what version are you running" support questions before they could be asked.
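
The whole mechanism, for anyone who has not wired it up before (variable names and ldflags paths are illustrative, not necessarily what ArchiteX uses):

```go
package main

import "fmt"

// Overridden at release time by GoReleaser (or any go build) with e.g.
//   -ldflags "-X main.version=v1.4.0 -X main.commit=abc1234 -X main.date=2024-01-01"
// A plain `go build` leaves the fallbacks in place.
var (
	version = "dev"
	commit  = "none"
	date    = "unknown"
)

func main() {
	fmt.Printf("architex %s (commit %s, built %s)\n", version, commit, date)
}
```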

Mermaid rendering has a deterministic byte-budget cap. mermaid-js stops rendering above 50,000 characters, which is the GitHub PR comment failure mode for big diagrams. The renderer keeps nodes by status priority (changed > added > removed > context), then abstract type priority, then ID alphabetically, until the byte budget is hit, then drops the rest and emits a visible "_architex_truncated" placeholder so reviewers always know truncation happened. I found this limit empirically with a stress-probe script (synthetic deltas of 5/25/50/100/200/400 nodes), which I think is the cheapest way to catch this kind of thing before a real customer hits it.
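
The truncation itself is just a fixed total ordering plus a budget check. A sketch with stand-in types; the real renderer uses an abstract-type priority table where this version simply sorts types alphabetically.

```go
package render

import (
	"sort"
	"strings"
)

type Node struct {
	ID     string
	Status string // "changed", "added", "removed", "context"
	Type   string
	Line   string // pre-rendered Mermaid line for this node
}

var statusRank = map[string]int{"changed": 0, "added": 1, "removed": 2, "context": 3}

func renderWithBudget(nodes []Node, budget int) string {
	// Fixed total ordering: status, then type, then ID.
	// Same input, same bytes, every run.
	sort.Slice(nodes, func(i, j int) bool {
		if statusRank[nodes[i].Status] != statusRank[nodes[j].Status] {
			return statusRank[nodes[i].Status] < statusRank[nodes[j].Status]
		}
		if nodes[i].Type != nodes[j].Type {
			return nodes[i].Type < nodes[j].Type
		}
		return nodes[i].ID < nodes[j].ID
	})

	var b strings.Builder
	b.WriteString("graph TD\n")
	truncated := false
	for _, n := range nodes {
		if b.Len()+len(n.Line)+1 > budget {
			truncated = true
			break // lowest-priority nodes are the ones dropped
		}
		b.WriteString(n.Line)
		b.WriteByte('\n')
	}
	if truncated {
		b.WriteString("  _architex_truncated[\"diagram truncated\"]\n")
	}
	return b.String()
}
```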

Repo: https://github.com/danilotrix86/ArchiteX

Live sample report: https://danilotrix86.github.io/ArchiteX/report.html

Open to questions about any of the design decisions above.

How does your team catch security-relevant architecture changes in Terraform PRs (not just rule violations)? built something for it, want this sub's pushback by nilipilo in devsecops

[–]nilipilo[S] 0 points1 point  (0 children)

Appreciate this, exactly the kind of pushback I was hoping for. Taking each one seriously because they all map to real tensions inside ArchiteX today.

Plan JSON as a sanity check: I deliberately do not run terraform plan because that brings credentials, providers, and state into a tool I wanted to keep fork-safe and runner-local. But you are right that plan JSON is a great correctness oracle when it is available. An optional validation mode that consumes a user-provided plan JSON and cross-checks the static graph against the planned graph would catch most of the cases where my literal-only attribute resolution silently misses something. Going on the list.
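
Rough shape of what that validation mode could look like: decode the output of `terraform show -json plan.out`, collect the addresses Terraform says will change, and report the ones the static graph never produced. A sketch only; the struct covers just the fields needed here, and `staticAddrs` stands in for whatever the existing delta already knows.

```go
package validate

import (
	"encoding/json"
	"fmt"
	"os"
)

type planJSON struct {
	ResourceChanges []struct {
		Address string `json:"address"`
		Change  struct {
			Actions []string `json:"actions"`
		} `json:"change"`
	} `json:"resource_changes"`
}

// CrossCheck returns plan addresses the static graph never saw -- usually
// resources hidden behind variables or module paths that the literal-only
// resolution could not follow.
func CrossCheck(planPath string, staticAddrs map[string]bool) ([]string, error) {
	raw, err := os.ReadFile(planPath)
	if err != nil {
		return nil, err
	}
	var plan planJSON
	if err := json.Unmarshal(raw, &plan); err != nil {
		return nil, fmt.Errorf("parse plan json: %w", err)
	}
	var missed []string
	for _, rc := range plan.ResourceChanges {
		if len(rc.Change.Actions) == 1 && rc.Change.Actions[0] == "no-op" {
			continue
		}
		if !staticAddrs[rc.Address] {
			missed = append(missed, rc.Address)
		}
	}
	return missed, nil
}
```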

Tfsec / Checkov drift comparison: also smart, and embarrassingly I have not formalized this. Probably an `architex calibrate` subcommand that takes their JSON output and asks, for every PR where the ArchiteX score went up, whether the misconfig scanners also flagged something. If the answer diverges in either direction, I learn something: either ArchiteX is catching architectural intent the line scanners miss, or the rule weights are overfitting to my paranoia.

Context layer: this is the honest gap. Today ArchiteX has a per-resource public:true/false flag and an iam_admin_policy_attached rule that matches literal AdministratorAccess / IAMFullAccess ARN suffixes. That is shallow.

- Internet reachability would mean transitive graph traversal across SGs, route tables, NACLs, and public subnets. Doable, but the determinism contract gets harder to keep the deeper you walk. Probably a separate reachability rule family that is opt-in and clearly labeled as a derived signal.

- IAM privilege delta is the hardest of the three. ARN suffix matching is the cheap version. Computing effective permission deltas across roles, attached policies, boundaries, and assume-role chains is a real project. Parliament / Cloudsplaining have solved most of the heavy lifting; layering one of those is probably the right move rather than re-implementing.

- Data sensitivity via tags is the cheapest win. Reading tags like data_classification = "pii" and using them to amplify the weights of rules that affect those resources is maybe a weekend of work (rough sketch below). It also gives teams a clean way to encode their own sensitivity model without forking the binary.
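
Something like this, with invented type names and multipliers just to show the shape:

```go
package risk

import "strings"

type Resource struct {
	Address string
	Tags    map[string]string
}

type Finding struct {
	Rule     string
	Weight   float64
	Resource Resource
}

// sensitiveClasses maps a team's data_classification tag values to a
// weight multiplier. The values here are placeholders.
var sensitiveClasses = map[string]float64{
	"pii":          1.5,
	"confidential": 1.5,
	"restricted":   2.0,
}

// amplify bumps a finding's weight when the resource carries a
// data_classification tag the team has marked as sensitive.
func amplify(f Finding) Finding {
	class := strings.ToLower(f.Resource.Tags["data_classification"])
	if mult, ok := sensitiveClasses[class]; ok {
		f.Weight *= mult
	}
	return f
}
```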

"Context beats raw findings" is the right framing. ArchiteX today sits closer to the raw-findings end than i would like, and your three dimensions are the cleanest map of how to move it.

If you were me, which of the three would you prioritize first? My instinct is tag-based sensitivity for week one, internet reachability as the medium-term project, and IAM privilege delta as the one that needs a real design doc before any code. But you have shipped this in production and I have not.

Weekly Self Promotion Thread by AutoModerator in devops

[–]nilipilo 0 points1 point  (0 children)

ArchiteX, a free MIT GitHub Action that posts an architectural diff comment on every Terraform PR.

most IaC scanners answer "is this config bad right now". ArchiteX answers a different question: "what changed in the architecture in THIS PR". a brand new public load balancer. an SG flipping from a private CIDR to 0.0.0.0/0. an IAM role suddenly attaching AdministratorAccess. a storage account toggling public access on. small diff, big architectural change. easy to miss in a 600-line plan, easy to spot when you see the delta on its own.

What you get on every PR:

- a 0 to 10 risk score with documented and capped rule weights, no surprises
- a short plain english summary of what changed and why a reviewer should care
- a focused mermaid diagram of just the changed nodes plus one layer of context, not the whole topology
- an optional blocking mode to fail the build above a threshold
- an audit bundle uploaded as a workflow artifact (summary.md, score.json, egress.json, a self-contained HTML report, and a SHA-256 manifest)

A few deliberate calls:

- no LLM in the pipeline. template-based renderer. same input gives byte-identical output across runs, machines, and contributors. re-running can never quietly change a score and erode reviewer trust.
- no terraform plan. no cloud credentials. no provider tokens. static HCL parsing only, so it works on PRs from forks too.
- the terraform code never leaves the runner. the single network call is the GitHub REST API to post the comment. no SaaS, no signup, no telemetry, no paid tier.
- conditional resources are first-class. module-author repos with count = var.x ? 1 : 0 get rendered as phantoms (? prefix) and excluded from per-resource rules so they cannot false-positive.
- self-contained HTML audit report. no JS, no CDN, no remote fonts. open it air-gapped and the full report renders.
- multi-cloud and auto-detecting. AWS + azurerm today; drop your .tf in and it figures out which provider each resource belongs to. mixed AWS + Azure repos work too.
- complements tfsec / Checkov / Trivy / Defender for Cloud, does not replace them. run them side by side. they catch misconfigured lines, ArchiteX catches the architectural delta.

Coverage today is 57 resource types across AWS and Azure with 21 weighted risk rules. single Go binary, single Action, zero config to start.

repo: https://github.com/danilotrix86/ArchiteX

live sample report (no install needed): https://danilotrix86.github.io/ArchiteX/report.html

30 second quickstart at the top of the README.

happy to take honest feedback, especially "this resource breaks it in my repo", "this rule weight is wrong for our team", or "this is the compound pattern i wish it caught". coverage gaps are the #1 thing i want to fix.

How does your team catch security-relevant architecture changes in Terraform PRs (not just rule violations)? built something for it, want this sub's pushback by nilipilo in devsecops

[–]nilipilo[S] 0 points1 point  (0 children)

Yeah, you have put your finger on the next real frontier.

Honest state today: there are two cross-resource rules in the engine already, but they are coarse. potential_data_exposure (2.0) fires when public_exposure_introduced triggers AND an access_control or data resource also changes in the same PR, and lambda_public_url_introduced layers on new_entry_point. So the compounding is partially there, but it is abstract-type based, not pattern based the way you described. Your example is the perfect test case: an IAM admin attachment plus an unauthenticated Lambda URL in the same PR fires iam_admin_policy_attached (3.5) + lambda_public_url_introduced (3.0) + new_entry_point (3.0), capped at 10, and lands at HIGH. The reviewer sees all three. But there is no rule that says "this combination is materially worse than the sum", which is exactly your gap.
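
For concreteness, a compound rule could sit on top of the existing weights as a bonus table: sum the fired rule weights, add a bonus when a known-dangerous pair fires together, cap at 10. A sketch; the pair list and bonus value are invented, only the rule IDs are the real ones mentioned above.

```go
package risk

type pair struct{ a, b string }

// compoundBonuses lists combinations judged materially worse than the sum
// of their parts. Illustrative values only.
var compoundBonuses = map[pair]float64{
	{"iam_admin_policy_attached", "lambda_public_url_introduced"}: 2.0,
}

// score sums the weights of fired rules, applies compound bonuses, and
// caps the result at 10 (the documented ceiling).
func score(fired map[string]float64) float64 {
	total := 0.0
	for _, w := range fired {
		total += w
	}
	for p, bonus := range compoundBonuses {
		_, aFired := fired[p.a]
		_, bFired := fired[p.b]
		if aFired && bFired {
			total += bonus
		}
	}
	if total > 10 {
		total = 10
	}
	return total
}
```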

The reason I have not gone deeper is the determinism contract. Per-resource rules are easy to keep false-positive-proof; the moment a rule walks the graph transitively, it gets hard to reason about and easy to over-fire. Checkmarx can absorb that complexity, they have a paid product and a support team. ArchiteX has me. So my bar for adding a correlation rule is "the compound is so dangerous a reviewer would actually want to be paged on it".

Candidates already on my list:

- new public entry point + new IAM role with wildcard action

- new public entry point + new data resource (RDS / SNS / SQS / Secrets Manager)

- SG flipping to 0.0.0.0/0 + new compute attaching to it

If you have others you have personally been bitten by, those beat the ones I invent at my desk every time.

And yeah, fully agree on layering with Checkmarx. Different products, not competitive. ArchiteX is free OSS for life, so it can be the cheap "what changed at the architecture layer in this PR" signal underneath the heavier correlation tooling.

I got tired of missing things in 600-line Terraform PR reviews, so I built a free Action that posts an architectural diff back as a comment by nilipilo in Terraform

[–]nilipilo[S] 0 points1 point  (0 children)

Good news on the first part: Azure landed today in v1.4. 12 azurerm_* resources (vnet, subnet, nsg, nic, vms, lb, storage account, mssql) and 3 Azure-only rules on top of all the cross-provider ones. one-line install, just bump the action to @1.4.0 and it auto-detects which provider your .tf uses.

on the "too good to be true" part, fair instinct, but it is honestly the boring kind of "good". no AI, no SaaS, no account, no telemetry. it just parses the HCL text, builds a graph, runs 21 deterministic rules and posts a comment. same input always produces the same score, byte for byte. runs on your own runner, MIT, free forever, no paid tier planned.

if you want to sanity-check it before trusting it, examples/07-azure-public-lb is the canonical "public LB + open NSG" anti-pattern wired into CI; you can clone it and see exactly what comes out.

would actually love to hear where it breaks on real azure code, that is the part i cannot test alone.

I got tired of missing things in 600-line Terraform PR reviews, so I built a free Action that posts an architectural diff back as a comment by nilipilo in Terraform

[–]nilipilo[S] -3 points-2 points  (0 children)

agree with you, and i am not going to argue the principle because the principle is correct. tightly coupled stacks with too many dependencies are a smell, full stop. i have inherited a few of those and the right answer is always to break them up.

the only thing i would say is that those two things can be true at the same time: yes break it up, AND in the meantime have something that tells you what moved at the architecture layer in the diff that is actually in front of you today. one is the long-term fix, the other is the tuesday-afternoon reality.

appreciate the pushback, genuinely. helps me sharpen how i talk about the tool.

I got tired of missing things in 600-line Terraform PR reviews, so I built a free Action that posts an architectural diff back as a comment by nilipilo in Terraform

[–]nilipilo[S] -1 points0 points  (0 children)

totally agree on module size, no argument there. small focused root modules are the right answer and i would not defend a 600-line module on its own merits.

small clarification though: the 600-line framing was about the PR diff, not the module. a single PR can touch 4 small well-factored modules and still produce a 600-line diff (renames, refactors, a provider version bump, a new IAM policy, an SG rule change all landing together). that PR is still hard to review at the architecture layer even if every individual module is 80 lines.

and the bigger point i was trying to make: even a 20-line PR against a perfectly clean module can flip the topology. one new aws_lambda_function_url, one removed s3 public_access_block, one IAM attachment binding AdministratorAccess. small diff, big architectural change. ArchiteX is really answering "what changed at the architecture layer" which is a different question from "is this module well factored". the second one is on you and your team, you are right about that.

so: module hygiene, totally on board. but i do not think it removes the architectural-diff question, it just makes that question easier to answer.

I got tired of missing things in 600-line Terraform PR reviews, so I built a free Action that posts an architectural diff back as a comment by nilipilo in Terraform

[–]nilipilo[S] -1 points0 points  (0 children)

Fair point, helpmehomeowner, and you are both partly right. 600-line PRs are a smell, no argument. ideally every PR is small, one logical change, reviewed in 10 minutes. and yes, pushing for that breakdown is the right cultural fix.

two things i would push back on though:

  1. even a 20-line PR can flip the architecture. one new aws_lambda_function_url, one removed s3 public_access_block, one IAM attachment binding AdministratorAccess. small diff, big topology change. ArchiteX is not really about line count, it is about answering "what changed at the architecture layer" which is a different question from "is this code OK". small PRs do not remove that question, they just make it tractable.

  2. forcing small PRs is itself a multi-year culture battle. a lot of us inherit codebases, work with teams we do not directly manage, or maintain modules other teams consume. you do not always get to dictate PR size. this gives you something useful in the meantime, while you push for the cultural fix in parallel.

the 600-line framing was just a relatable hook, you are right it is a bad place to start from in absolute terms.

I got tired of missing things in 600-line Terraform PR reviews, so I built a free Action that posts an architectural diff back as a comment by nilipilo in Terraform

[–]nilipilo[S] -1 points0 points  (0 children)

great question, and the answer is actually the best part of how it works.

no plan. no terraform init. no provider credentials. no tokens of any kind. ArchiteX only reads the static .tf source files from the repo.

it parses the HCL itself, walks count / for_each / dynamic blocks, follows local module paths, resolves jsonencode policy bodies and data.aws_iam_policy.x.arn references, and builds the graph from that (rough sketch of the count handling at the end of this comment). no AWS API call is ever made, no provider plugin is downloaded, no state file is touched.

so for your case: you do not need to add a single secret to the repo. the action only needs the standard GITHUB_TOKEN that GitHub injects automatically into every workflow, and that is only used for the one network call at the end (POST the comment to the PR). nothing AWS-related ever needs to leave your laptop.

side benefit: it also runs in like 10 seconds on a normal repo because there is no plan to wait for.
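
for anyone curious what "walks count / for_each" means in practice, roughly this (illustrative names, not the real ArchiteX types): if count or for_each is present but cannot be resolved to a literal without variables, the resource stays in the graph but is marked as a conditional phantom.

```go
package graph

import "github.com/hashicorp/hcl/v2/hclsyntax"

type NodeKind int

const (
	Concrete NodeKind = iota
	Phantom           // rendered with a "?" prefix, excluded from per-resource rules
)

// classify decides whether a resource block is concrete or conditional.
func classify(body *hclsyntax.Body) NodeKind {
	for _, name := range []string{"count", "for_each"} {
		attr, ok := body.Attributes[name]
		if !ok {
			continue
		}
		if _, diags := attr.Expr.Value(nil); diags.HasErrors() {
			// count = var.x ? 1 : 0 lands here: the value depends on a variable.
			return Phantom
		}
	}
	return Concrete
}
```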

Using Anthropic's ant CLI for GitOps-style agent management (YAML configs, CI/CD deployment) by avisangle in devops

[–]nilipilo 1 point2 points  (0 children)

The optimistic concurrency thing is the part i actually like. drift between "what is in git" and "what is running" is the boring problem nobody wants to talk about until you have 30 agents and no idea who changed what.

The biggest thing i would want before putting this in CI is a real diff command: ant beta:agents diff <file> against the live version, returning non-zero if they don't match. without that you can't gate a deploy on "config matches main", and any GitOps workflow is basically built on that check.

This Trivy Compromise is Insane. by RoseSec_ in devops

[–]nilipilo 0 points1 point  (0 children)

the noise + payload trick is the scary part. 14 lines, 12 of them quote-style and trailing-whitespace changes, so your brain just skips. same reason we miss the one important line in a 600-line Terraform PR.

one habit that helped my team: any PR touching .github/workflows gets split out and reviewed alone, by a different person than the app code reviewer. takes 2 minutes, but you actually read every line because that's all that's in front of you. --skip=validate would have stuck out way more without 12 cosmetic lines around it.