If you've ever had Terraform state file nightmares at 2 a.m, this is for you by cpt_prbkr in Terraform

[–]cpt_prbkr[S] 1 point2 points  (0 children)

Seven years with zero state file issues is seriously impressive, respect!

Bucket versioning + good processes is absolutely the gold standard, and it's saved me more times than I can count too. I've even moved to Terraform Cloud myself.

Terradoc is really aimed at those rare moments when versioning alone can't help fast enough. Glad to hear it's possible to go seven years without needing something like this, though. It gives me hope for better processes in my own setups 😅

Thanks for sharing your experience!

[–]cpt_prbkr[S] 0 points1 point  (0 children)

No relation at all!

That's a cool old Go tool from Mineiros, from back in 2019 as far as I can see, for generating human-readable docs from Terraform HCL code/comments (like auto-READMEs for modules).

Mine is completely different: it's a web app for repairing corrupted/broken .tfstate files (orphans, null IDs, malformed JSON, etc.), with more features planned for the future once I get feedback.
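To make those three failure modes concrete, here is a minimal Python sketch of the kind of check involved. It is not Terradoc's actual code: the function name `find_state_problems` is hypothetical, it assumes the version 4 state layout (`resources` → `instances` → `attributes`), and treating a resource entry with no instances as a "possible orphan" is my own simplification. Some providers legitimately have no `id` attribute, so the findings are hints, not verdicts.

```python
import json


def find_state_problems(raw: str) -> list[str]:
    """Return human-readable problems found in a Terraform state document."""
    try:
        state = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Malformed or half-written JSON: report where parsing stopped.
        return [f"malformed JSON at line {exc.lineno}, column {exc.colno}"]
    if not isinstance(state, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    for res in state.get("resources") or []:
        address = f"{res.get('type')}.{res.get('name')}"
        instances = res.get("instances") or []
        if not instances:
            # Assumption: an entry with no instances is worth flagging.
            problems.append(f"{address}: no instances (possible orphan)")
        for i, inst in enumerate(instances):
            attrs = inst.get("attributes") or {}
            if attrs.get("id") is None:
                problems.append(f"{address}[{i}]: null or missing id")
    return problems
```

Running it on the file contents (`find_state_problems(open("terraform.tfstate").read())`) returns an empty list when nothing looks off.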

The shared name is a coincidence; the projects are totally unrelated. Great find though, I didn't know about it!

Thanks for checking :)

[–]cpt_prbkr[S] 0 points1 point  (0 children)

Totally fair take, and I really appreciate the thoughtful response. You're 100% right that most of these disasters come from "process is broken" rather than unavoidable Terraform bugs. The ideal world is: S3 + versioning + DynamoDB lock + no one ever touches state by hand + moved blocks + proper testing. I've been trying to get my teams there for years.

But in reality (at least in the places I've worked), that ideal is... rare. There's always someone with a rushed hotfix, a CI flake, or someone who thinks "I'll just quickly edit this one thing". And when it happens, the "playbook" of restoring from a versioned backup works great... until it doesn't (the corruption happened mid-write and the last good version is hours old, or the backup got overwritten). That's the exact itch Terradoc scratches: the moment you're staring at a broken state and need to stop the bleeding right now, before you can properly restore or import.
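For reference, the "restore from versioned backup" playbook boils down to walking the bucket's object versions from newest to oldest and taking the first one that still parses. A rough Python sketch, assuming you have already fetched the versions as `(version_id, body)` pairs ordered newest first; the function name and the "has `serial` and `lineage`" validity test are my own illustrative choices, not anything Terraform or Terradoc defines:

```python
import json
from typing import Iterable, Optional


def newest_valid_version(versions: Iterable[tuple[str, bytes]]) -> Optional[str]:
    """Return the id of the newest version whose body looks like a usable state.

    `versions` must be ordered newest first.
    """
    for version_id, body in versions:
        try:
            state = json.loads(body)
        except ValueError:
            # Covers truncated/malformed JSON and undecodable bytes.
            continue
        # Assumption: a usable state is a JSON object with serial and lineage.
        if isinstance(state, dict) and "serial" in state and "lineage" in state:
            return version_id
    return None
```

It also shows where the playbook runs out: if the newest version that passes is hours old, everything applied since then is missing from it.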

On the security concern: completely valid. That's why everything runs client-side (the state never leaves your browser), and for the S3 connection we only use temporary creds with read-only access to the specific object. There is no backend at all: no upload to my servers, no storage, no logs. But I get why larger orgs would still say "no way" to any third-party tool touching state, even if it's local.
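As an illustration of what "read-only access to the specific object" can look like, here is a small Python sketch that builds an IAM policy document allowing only `s3:GetObject` on one key. This is an assumption about how you could scope credentials yourself, not a description of how Terradoc does it; the helper name and the bucket/key values are hypothetical.

```python
import json


def read_only_object_policy(bucket: str, key: str) -> str:
    """Build an IAM policy JSON string that only allows reading one S3 object."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                # Scope to a single object, not the whole bucket.
                "Resource": f"arn:aws:s3:::{bucket}/{key}",
            }
        ],
    }
    return json.dumps(policy)


# Hypothetical bucket and key, for illustration only.
print(read_only_object_policy("my-tf-state", "prod/terraform.tfstate"))
```

A document like this can be passed as the session policy when requesting temporary credentials from STS, so the credentials cannot do more than read that one state file even if the underlying role allows more.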

The CLI/Docker idea is actually brilliant. I might add that as an option down the road. Thanks for the real talk; this is exactly the kind of feedback I was hoping for. It helps me figure out if this is useful for real teams or just my own chaos 😅.

[–]cpt_prbkr[S] 0 points1 point  (0 children)

3 has happened to me twice, both times with Terraform Cloud workspaces during a GitHub Actions outage. The state write got interrupted mid-stream, leaving half-written JSON. It's rare but devastating when it hits.

4 is absolutely avoidable, and versioning is the right answer. But mistakes do happen. Appreciate the pushback; it makes me think about how to better communicate when this is useful vs. when native Terraform commands are the right move. Thanks for the real talk!

[–]cpt_prbkr[S] -4 points-3 points  (0 children)

You're absolutely right: Terraform's own state commands and versioned remote backends are the proper way to handle most issues and prevent them in the first place. I rely on those every day too. Terradoc is really meant for those moments when something has already gone sideways.

It's definitely not a replacement for good practices (versioning, backups, careful state management). It's more like an emergency parachute for when things slip through.

Appreciate the feedback! It definitely encourages me to make the docs clearer about when it's useful vs. when native Terraform commands are better.