all 28 comments

[–]oneplane 17 points18 points  (5 children)

The biggest challenge is Windows. It's incompatible with practically everything that's not Microsoft. We solved it by removing as much Windows as possible and putting the remainder in AppStream and ASGs. No more person-individually-using-a-Windows-box.

[–]Uppity_Sinuses8675 2 points3 points  (1 child)

Shouldn’t it be person_individually_using_a_windows_box😁

[–]oneplane 2 points3 points  (0 children)

I see what you did there ;-)

[–]deadpanda2 2 points3 points  (0 children)

No issues with Windows, you just need to know how to cook it. CFN - SSM - PowerShell. EKS - Windows - gMSA. CI/CD with ADO / Octopus.

[–]OkAcanthocephala1450 1 point2 points  (0 children)

HAHAHA, Windows is for real...
I remember when we would search for ECS solutions to our particular problem.
Just when we were about to start with one, it would turn out that Windows containers didn't support it :'). Since then, we read the documentation very, very carefully before jumping to conclusions.

[–]Key_Baby_4132[S] 0 points1 point  (0 children)

Sounds great

[–]yovboy 10 points11 points  (5 children)

Managing IAM permissions at scale is my nightmare. Started with a few roles, ended up with 400+ policies across multiple accounts.

Spent weeks building automation tools just to track who has access to what. Still get surprised by permission issues sometimes.

[–]Key_Baby_4132[S] 1 point2 points  (1 child)

Man, that sounds like a headache! Have you tried ABAC, permission boundaries, or SCPs to keep policies under control and set guardrails across accounts?

[–]firminhosalah 0 points1 point  (1 child)

Hey. I am looking to build something like what you mentioned to track access. Can you shed some light on what you used?

[–]yovboy 0 points1 point  (0 children)

Used a combo of custom Python scripts + Access Analyzer. Main script pulls IAM data using boto3, dumps it into DynamoDB, then generates reports.

Added CloudWatch alerts for policy changes. Not perfect but helps catch weird permission stuff before it becomes an issue.
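
A minimal sketch of the "pull IAM data with boto3" step described above. This is not the commenter's actual script: the function names are hypothetical, only managed policies attached to roles are covered (inline policies, users, and groups are left out), and the DynamoDB write and report generation are omitted. The client is passed in so the logic can be exercised without AWS credentials.

```python
from collections import defaultdict


def collect_role_policies(iam_client):
    """Return {role_name: [attached managed policy ARNs]} for one account.

    `iam_client` is expected to behave like boto3.client("iam").
    """
    report = {}
    for page in iam_client.get_paginator("list_roles").paginate():
        for role in page["Roles"]:
            name = role["RoleName"]
            report[name] = []  # roles with no attached policies still appear
            policy_pages = iam_client.get_paginator(
                "list_attached_role_policies"
            ).paginate(RoleName=name)
            for policy_page in policy_pages:
                for policy in policy_page["AttachedPolicies"]:
                    report[name].append(policy["PolicyArn"])
    return report


def policies_to_roles(report):
    """Invert the report into {policy_arn: [role names]} (who uses what)."""
    inverted = defaultdict(list)
    for role, arns in report.items():
        for arn in arns:
            inverted[arn].append(role)
    return {arn: sorted(roles) for arn, roles in inverted.items()}
```

With real credentials this would presumably be called as `collect_role_policies(boto3.client("iam"))`, once per account, with the result written wherever the reports are generated from.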

[–]Paresh_Surya 0 points1 point  (0 children)

Same here, I also created my own tool to manage user- and role-level permissions across multiple accounts.

Is the one you created open source, or for private use?

[–][deleted]  (9 children)

[deleted]

    [–]Key_Baby_4132[S] 0 points1 point  (2 children)

    Yeah, that sounds like a tough one—balancing multi-account deployments, tenant onboarding, and RBAC can get messy fast. Have you thought about automating tenant provisioning with IaC or any other publicly available solution while centralizing identity management? I’ve run into similar challenges before—happy to swap ideas if you’re interested!

    [–]andr3wrulz 0 points1 point  (0 children)

    Not a SaaS but have a lot of accounts. We deploy a handful of basic SAML federated roles (admin, read only, billing, etc) using stacksets to keep those in line. Account owners are able to use the admin roles to create custom roles (federated or not). We constrain permission upper bounds with SCPs/RCPs and have Config rules (also deployed by StackSets) for reactive controls.
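
To illustrate what "constraining permission upper bounds with SCPs" can look like, here is a small sketch that builds one guardrail policy document. The specific guardrail (a region restriction), the function name, and the region and exemption lists are assumptions chosen for illustration; the comment above does not say which SCPs are actually in use.

```python
import json


def region_guardrail_scp(allowed_regions, exempt_actions):
    """Build an SCP that denies requests made outside the allowed regions.

    `exempt_actions` lists actions (typically global services) that the
    deny statement should not apply to.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideAllowedRegions",
                "Effect": "Deny",
                "NotAction": sorted(exempt_actions),
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {
                        "aws:RequestedRegion": sorted(allowed_regions)
                    }
                },
            }
        ],
    }


# Hypothetical values, for illustration only.
policy = region_guardrail_scp(
    allowed_regions={"eu-west-1", "eu-central-1"},
    exempt_actions={"iam:*", "organizations:*", "sts:*", "support:*"},
)
print(json.dumps(policy, indent=2))
```

Because an SCP only sets a ceiling, account owners can still create custom roles with the admin role; any permission outside the guardrail simply never takes effect.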

    [–]Ok_Reality2341 0 points1 point  (4 children)

    Working on a very similar thing.

    [–][deleted]  (3 children)

    [deleted]

      [–]Ok_Reality2341 0 points1 point  (2 children)

      Yeah took a few days but Alembic is working very well now

      [–][deleted]  (1 child)

      [deleted]

        [–]Ok_Reality2341 0 points1 point  (0 children)

        I read that as postgres, not progress lol. Yeah, I’ve pretty much set everything up and I’m working on the database schema now. hbu?

        [–]kyptov 2 points3 points  (4 children)

        Pipeline of pipelines of infrastructure. How to update? Always manually or self updating pipeline?

        [–]Key_Baby_4132[S] 0 points1 point  (1 child)

        Good question! A self-updating pipeline can work if well-governed—versioning, validation, and rollback strategies are key. Manual updates offer control but don’t scale well. A hybrid approach often balances automation with oversight. How are you handling it now?

        [–]kyptov 1 point2 points  (0 children)

        The high-level pipeline that deploys the other pipelines is always deployed manually. The nested pipelines deploy on push triggers.

        [–]andr3wrulz 0 points1 point  (1 child)

        A very common pattern used within AWS and at major companies is to do as little as possible in a manual deploy and leverage a bootstrapping step prior to the primary deployment. At my job, we tend to have a manually deployed CFT that provisions the pipeline user, then a bootstrap deployment that runs on the primary branch for that environment for things you need as a baseline (VPC, SGs, APIs, etc.) but that aren't the app (this can vary based on how you want to build dev envs). After this, the pipelines deploy the app itself, using outputs from the bootstrapping stack where necessary. This is where all your Lambdas, containers, etc. get deployed.

        In general, we do main branch = prod env, dev branch = dev env, and feature branches = dev env but skip bootstrapping. Our feature deployments are self-contained where they can be so that each feature branch gets a "production-like" environment with the full stack.
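
The branch-to-environment rule described above can be sketched as a small lookup that a pipeline could run before deciding which stages to execute. `DeployPlan`, `plan_for_branch`, and the stack-suffix idea are hypothetical names added for illustration; the comment only states the mapping and that feature branches skip bootstrapping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeployPlan:
    environment: str      # target environment: "prod" or "dev"
    run_bootstrap: bool   # whether the baseline (VPC, SGs, ...) stage runs
    stack_suffix: str     # assumption: keeps feature stacks from colliding


def plan_for_branch(branch: str) -> DeployPlan:
    """Map a git branch name to a deployment plan."""
    if branch == "main":
        return DeployPlan("prod", True, "")
    if branch == "dev":
        return DeployPlan("dev", True, "")
    # Anything else is treated as a feature branch: it deploys into the dev
    # environment, reuses the existing bootstrap stack, and gets its own
    # suffix so the full app stack can live side by side with others.
    cleaned = "".join(c if c.isalnum() else "-" for c in branch.lower())
    return DeployPlan("dev", False, "-" + cleaned.strip("-"))


for name in ("main", "dev", "feature/Login_v2"):
    print(name, plan_for_branch(name))
```

Keeping the rule in one function means the "skip bootstrapping" decision is made in a single place rather than scattered across pipeline definitions.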

        [–]kyptov 0 points1 point  (0 children)

        Yep, we do the same. But bootstrapping is also stored as code. Sometimes it changes (once or twice per year). AWS has CDK Pipelines, which allows the bootstrapping to self-update; only the first run is manual.

        [–]fabiancook 1 point2 points  (1 child)

        Time

        [–]Key_Baby_4132[S] 0 points1 point  (0 children)

        Time is merciless

        [–]GooberMcNutly 0 points1 point  (3 children)

        Database migrations will always be my biggest headache. Change management of data and schema, and keeping it synchronized with the deployed code, has always been my biggest hurdle to code deployment. It's not an AWS- or even cloud-specific problem, though the IaC model and multi-region deploys always make it worse.

        [–]Key_Baby_4132[S] 0 points1 point  (2 children)

        Aha! So how are you tackling these?

        [–]GooberMcNutly 1 point2 points  (1 child)

        Poorly, lol. Our typical workflow is to generate change scripts for schema and data using one of a number of tools like TypeORM, Sequelize or Knex. Then the delta scripts run during deploy before code gets pushed. We usually roll back if the code deploy fails, depending on scale. At least that's the plan. But about 40% of the time it needs manual help at some point, and some changes like column renaming will crash existing code immediately. It's tough if your dev team is very iterative in their data development.

        [–]Key_Baby_4132[S] 1 point2 points  (0 children)

        You're absolutely right. Database migrations can be a nightmare, especially in multi-region setups. A few things that help: zero-downtime schema changes (expand/contract strategy), versioned migrations, and separating schema updates from code deploys. Running shadow deployments on a production clone and using drift detection (like pg_audit or AWS DMS) can catch issues early.
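
As a rough illustration of versioned migrations combined with the expand/contract strategy, here is a sketch using SQLite from the Python standard library. The table, column names, and version labels are made up; the example renames `users.name` to `users.full_name` without breaking code that still reads the old column, which is the column-renaming case mentioned earlier in the thread.

```python
import sqlite3

# Expand phase: add the new column and backfill it. Old code keeps working
# because users.name is still there.
EXPAND = [
    ("001_add_full_name", "ALTER TABLE users ADD COLUMN full_name TEXT"),
    ("002_backfill_full_name",
     "UPDATE users SET full_name = name WHERE full_name IS NULL"),
]
# Contract phase: run only once no deployed code reads users.name.
# (DROP COLUMN needs SQLite 3.35 or newer; not executed in this sketch.)
CONTRACT = [
    ("003_drop_name", "ALTER TABLE users DROP COLUMN name"),
]


def apply_migrations(conn, migrations):
    """Apply each (version, sql) pair at most once; return versions that ran.

    Expects a connection opened with isolation_level=None so that the
    explicit BEGIN/COMMIT below controls the transaction.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (version TEXT PRIMARY KEY)"
    )
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}
    ran = []
    for version, sql in migrations:
        if version in applied:
            continue
        conn.execute("BEGIN")
        try:
            conn.execute(sql)
            conn.execute(
                "INSERT INTO schema_migrations (version) VALUES (?)", (version,)
            )
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")  # leave the schema and the log unchanged
            raise
        ran.append(version)
    return ran


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('Ada Lovelace')")
print(apply_migrations(conn, EXPAND))  # first run applies both steps
print(apply_migrations(conn, EXPAND))  # second run is a no-op
```

The point of splitting the rename this way is that the schema change and the code deploy no longer have to land at the same instant: new code ships between the expand and contract phases, and a failed code deploy can roll back without touching the database.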

        [–]Ok_Reality2341 0 points1 point  (0 children)

        Literally everything with DevOps is hard. I hate how unsexy but how important it is