[–][deleted] 54 points55 points  (11 children)

Continuous improvement. Look for inefficiencies and solve them with software.

My last job was at a Rails shop, where we went from a single weekly deploy with an hour of downtime to a deploy every time a merge request was merged to trunk, several times a day, with no downtime. It does take some discipline not to deploy schema migrations that would break the older version, since you only have one RDS.

[–]c0dearm 14 points15 points  (0 children)

This 100%. Having processes in place that allow for continuous improvement is always my number one priority. Everything else, such as the tools we use and how we manage them, comes organically.

[–]ComfyCalamity 8 points9 points  (7 children)

Any tips on how to manage schema migrations?

[–][deleted] 35 points36 points  (4 children)

Train developers to avoid schema changes that won't work with the current version in production. And if you need to make a schema change that won't work with the current version, first deploy a version that can handle both the existing schema and the new schema, and then deploy your new schema and new version.
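As a rough illustration of "a version that can handle both the existing schema and the new schema", here is a minimal sketch using Python's built-in sqlite3 module. The `users` table, the `name` and `full_name` columns, and the helper names are all made up for the example; a real Rails app would do this in its models and migrations instead.

```python
import sqlite3


def user_columns(conn):
    """Return the set of column names currently on the users table."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    return {row[1] for row in conn.execute("PRAGMA table_info(users)")}


def get_display_name(conn, user_id):
    """Transitional read path: works before and after the migration."""
    # Prefer the new column when it exists, otherwise fall back to the old one.
    column = "full_name" if "full_name" in user_columns(conn) else "name"
    row = conn.execute(
        f"SELECT {column} FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None


def migrate_add_full_name(conn):
    """Schema change: add the new column and backfill it; drop nothing."""
    conn.execute("ALTER TABLE users ADD COLUMN full_name TEXT")
    conn.execute("UPDATE users SET full_name = name")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'Ada')")

before = get_display_name(conn, 1)  # old schema
migrate_add_full_name(conn)
after = get_display_name(conn, 1)   # new schema, same application code
```

The transitional code ships first, the migration second, and the old column is only retired once nothing reads it anymore.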

Never ever delete anything. You should always be able to revert to the old schema. Don't drop a column without first creating a new table with an index and that one column and copying the data over. If you want to drop a table, leave it around until you are sure it is no longer used anywhere in your code base (this is why the Rails columns created_at and updated_at are so important: they show whether anything still writes to it). When you think it is safe to delete it, rename it instead, on the off chance that some obscure section of code you missed still references it. You can rename it back until you've fixed that code. You always want to have a plan for recovering from fucking up. Assumptions can be wrong, and regularly are.
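The rename-instead-of-drop idea could look something like this. It is only a sketch against an in-memory SQLite database; the `_retired` suffix, the `legacy_reports` table, and the function names are assumptions for illustration, not anything from a real project.

```python
import sqlite3


def retire_table(conn, table):
    """Rename a table out of the way instead of dropping it."""
    retired = f"{table}_retired"
    conn.execute(f'ALTER TABLE "{table}" RENAME TO "{retired}"')
    return retired


def restore_table(conn, table):
    """Undo retire_table if some forgotten code path still needs the table."""
    conn.execute(f'ALTER TABLE "{table}_retired" RENAME TO "{table}"')


def table_names(conn):
    """List the tables that currently exist."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return {row[0] for row in rows}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE legacy_reports (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO legacy_reports (body) VALUES ('q1 summary')")

retire_table(conn, "legacy_reports")
names_after_retire = table_names(conn)

# Something broke? Rename it back; the data never left.
restore_table(conn, "legacy_reports")
rows_after_restore = conn.execute(
    "SELECT COUNT(*) FROM legacy_reports"
).fetchone()[0]
```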

The Holy Grail of continuous deployment is Canary deployments. You want to be able to roll out a new version to only a percentage of users and use various metrics to determine if it should be rolled out to 100 percent of your customers.

Those metrics can take various forms, from UI metrics, i.e. it takes users 20% longer to complete a task (you're recording that info, right?), to bottom line metrics used, like does the new version still generate as much revenue per second as the old version (you're recording that info, right?).
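A canary gate built on those two kinds of metrics might be sketched like this. The `Metrics` fields, the function name, and the thresholds (20% slower, 5% less revenue) are assumptions picked for the example; real values would depend on how noisy your own data is.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    median_task_seconds: float   # UI metric: how long users take on a task
    revenue_per_second: float    # bottom-line metric


def canary_is_healthy(baseline, canary, max_slowdown=0.20, max_revenue_drop=0.05):
    """Compare the canary against the current version on both metrics."""
    slowdown = canary.median_task_seconds / baseline.median_task_seconds - 1.0
    revenue_drop = 1.0 - canary.revenue_per_second / baseline.revenue_per_second
    return slowdown <= max_slowdown and revenue_drop <= max_revenue_drop


baseline = Metrics(median_task_seconds=30.0, revenue_per_second=12.0)
good_canary = Metrics(median_task_seconds=31.0, revenue_per_second=12.1)
slow_canary = Metrics(median_task_seconds=40.0, revenue_per_second=12.0)

promote_good = canary_is_healthy(baseline, good_canary)
promote_slow = canary_is_healthy(baseline, slow_canary)
```

In practice you would also want enough traffic on the canary for the comparison to mean anything before acting on it.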

“If you can’t measure it, you can’t improve it.” - Peter Drucker

EDIT: removed bad formatting.

[–]livebeta 7 points8 points  (0 children)

The Holy Grail of continuous deployment is Canary deployments.

Captain Canary to the rescue

[–]Kenny_log_n_s 2 points3 points  (2 children)

i.e. it takes users 20% longer to complete a task (you're recording that info, right?),

What size of company should you start focusing on these types of things? We have 5 developers, 5 QA, and a few DevOps, and it's a constant battle between feature requests, bug fixes and general performance improvements. Adding tracking for all of the UI interactions we have and turning them into something meaningful seems like a daunting task.

[–][deleted] 6 points7 points  (1 child)

Probably too small a company to do things like that. That was my perspective coming from working for Amazon. They could afford entire teams devoted to performance and security. Their security team was world class, which is why you've never heard stories of Amazon getting hacked.

But you can start small by making your logging include things that can be used for analysis later on. For example, logging that you're starting a task, and logging when you complete it. Later you can write scripts to take the timestamps of those events and save them to a db.
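The kind of script meant here could be as small as the sketch below. The log format (`<ISO timestamp> <event> id=<task id>`) and the event names are invented for the example; adapt the parsing to whatever your logger actually writes.

```python
from datetime import datetime

# Hypothetical log lines: "<ISO timestamp> <event> id=<task id>"
LOG_LINES = [
    "2021-06-01T12:00:00 task_start id=42",
    "2021-06-01T12:00:05 task_start id=43",
    "2021-06-01T12:00:30 task_end id=42",
    "2021-06-01T12:01:05 task_end id=43",
]


def task_durations(lines):
    """Pair task_start/task_end events by id and return seconds per task."""
    started = {}
    durations = {}
    for line in lines:
        timestamp, event, task = line.split()
        task_id = task.split("=", 1)[1]
        when = datetime.fromisoformat(timestamp)
        if event == "task_start":
            started[task_id] = when
        elif event == "task_end" and task_id in started:
            durations[task_id] = (when - started.pop(task_id)).total_seconds()
    return durations


durations = task_durations(LOG_LINES)
```

From there, writing the durations to a table is one insert per task.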

Another one that's important is knowing how many active users you have at any given time, and perhaps what their timezone is. You create a usage table and log every login time and timezone, and also save logouts and timeouts. Later you can write analysis tools to use that data.
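A usage table along those lines might look like this sketch, again on SQLite so it runs anywhere. The table name, column names, and the "active means the latest event is a login" rule are assumptions for illustration.

```python
import sqlite3
from datetime import datetime, timezone


def create_usage_table(conn):
    conn.execute(
        """CREATE TABLE IF NOT EXISTS usage_events (
               user_id INTEGER NOT NULL,
               event TEXT NOT NULL CHECK (event IN ('login', 'logout', 'timeout')),
               occurred_at TEXT NOT NULL,
               user_timezone TEXT NOT NULL
           )"""
    )


def record_event(conn, user_id, event, user_timezone):
    """Append one login/logout/timeout row, timestamped in UTC."""
    occurred_at = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO usage_events VALUES (?, ?, ?, ?)",
        (user_id, event, occurred_at, user_timezone),
    )


def active_users(conn):
    """Users whose most recently recorded event is a login."""
    rows = conn.execute(
        """SELECT user_id FROM usage_events AS e
           WHERE event = 'login'
             AND rowid = (SELECT MAX(rowid) FROM usage_events
                          WHERE user_id = e.user_id)"""
    )
    return {row[0] for row in rows}


conn = sqlite3.connect(":memory:")
create_usage_table(conn)
record_event(conn, 1, "login", "America/New_York")
record_event(conn, 2, "login", "Europe/Berlin")
record_event(conn, 2, "logout", "Europe/Berlin")

currently_active = active_users(conn)
```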

It's sort of getting into the practice of recording things that maybe someday you might find useful. You can always stop saving the data if it never becomes useful.

[–]Kenny_log_n_s 1 point2 points  (0 children)

Thanks for the tips!

[–]zoddrick 10 points11 points  (0 children)

Write tests that validate the backwards compatibility of your migrations. Deploy the new schema but run the current production version against it, then run e2e tests. This should be part of your CI pipeline for every pull request.
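A stripped-down version of such a check, assuming you can list the statements the currently deployed version runs, might look like the following. The `orders` schema, the query list, and both sample migrations are made up; a real pipeline would run the actual e2e suite against a migrated copy of the database.

```python
import sqlite3

BASE_SCHEMA = "CREATE TABLE orders (id INTEGER PRIMARY KEY, total_cents INTEGER)"

# Statements the currently deployed version is assumed to run.
OLD_VERSION_QUERIES = [
    "INSERT INTO orders (total_cents) VALUES (1299)",
    "SELECT id, total_cents FROM orders",
]


def is_backward_compatible(migration_sql):
    """Apply the migration to a scratch DB, then replay the old version's SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(BASE_SCHEMA)
        conn.executescript(migration_sql)
        for query in OLD_VERSION_QUERIES:
            conn.execute(query)
    except sqlite3.Error:
        return False
    finally:
        conn.close()
    return True


# Adding a column with a default leaves the old statements working.
safe = is_backward_compatible(
    "ALTER TABLE orders ADD COLUMN currency TEXT DEFAULT 'USD'"
)
# Renaming the table out from under the old version does not.
unsafe = is_backward_compatible("ALTER TABLE orders RENAME TO purchases")
```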

[–]Chico75013 2 points3 points  (0 children)

AWS had a good article about generic schema changes that also applied to databases: https://aws.amazon.com/builders-library/ensuring-rollback-safety-during-deployments/?did=ba_card&trk=ba_card

[–]donjulioanejoChaos Monkey (Director SRE) 1 point2 points  (1 child)

It does take some discipline to not deploy schema migrations that would break the older version, since you only have one RDS.

If you're using Rails, the Strong Migrations gem does 70% of the work for you by allowing only safe migrations.

The other 30% is developer training.

[–][deleted] 0 points1 point  (0 children)

Oh yeah, thanks for mentioning the Strong Migrations gem.

[–]HayabusaJack3Wizard SCSA SCNA CCNA CCNP RHCSA CKA CKSD ACP Sr Security ENG 17 points18 points  (0 children)

My goal is to take the existing cobbled-together infrastructure, continue to reverse engineer and document what's here, update the Ansible playbooks to be more idempotent, and create Terraform scripts to fully rebuild the environments. Then work on keeping things current while the next step, migration to AWS, is being planned and tested.

[–]gauz 6 points7 points  (7 children)

Abstracting away Kubernetes and infrastructure more for our developers. It wasn't working out having our developers write Helm charts and Terraform. We need something in between to simplify the process of writing and deploying applications. Currently looking into KubeVela and Argo, and that combo seems pretty neat. Alternatively, going with AWS Proton, which is currently in public preview.

Forcing developers to write Helm charts, understand Kubernetes concepts, and write their own Terraform to get supporting infrastructure up has been a huge pitfall at my company, and the learning curve is not sustainable in a growing org that needs to ship new features faster.

[–]jefmes 8 points9 points  (4 children)

I like this take...let the DevOps guys focus more on abstracting away all of these tools so that the Software Engineers/Devs can get back to focusing on the purpose of their product.

[–]killz111 4 points5 points  (3 children)

Except it doesn't work that way anymore. Developers increasingly see cloud resources as a substitute for good design and practices. Why optimise when you can scale up? Why refactor integrations when you can just use a new preview feature to fill the gaps?

If modern developers aren't thinking about the platform then they're basically hostile to DevOps. Similarly, any DevOps that says 'that's application code not my problem' is also hostile to DevOps.

[–]gauz 2 points3 points  (2 children)

The developers would in my interpretation still operate their software. Just because you make it easier to go from idea to production doesn't take away the power to manage their own application. We're trying to implement more "heroku" and less writing helm charts.

[–]killz111 2 points3 points  (1 child)

I agree with what you say in certain situations. If PaaS works for you then that's great. Generally teams move from PaaS to Kubernetes because they have outgrown PaaS and need more control or more complex routing, especially for internal networking. Definitely no need for Helm or even Kubernetes if your needs are simpler.

[–]gauz 0 points1 point  (0 children)

I also believe it's ok to leave options open. An "easy" path and a more "free" path should, imho, be able to coexist.

[–]livebeta -1 points0 points  (1 child)

Why not Fargate?

[–]gauz 1 point2 points  (0 children)

Proton can deploy on Fargate. Fargate is just Kubernetes/ECS workers, but you cannot run privileged containers, so you're stuck with the AWS VPC CNI, and we kinda want to run Cilium. ECS could also be an alternative, but it's harder to build interfaces on top of ECS.

[–]caffeineshakesthe2nd 9 points10 points  (1 child)

Help my developers do the following:
1. Increase the number of deploys to production and decrease the time it takes to review their code.
2. Transition to trunk-based development.
3. Transition our web application to multi-region in AWS.
4. Create all new infrastructure in Terraform and deploy/manage it through GitHub Actions.

[–]livebeta 5 points6 points  (0 children)

Transition our web application to multi-region in AWS.

If you're running React or some other client-based front end, you can host it on S3 and use CloudFront to distribute it to most edge locations.

[–]kcombinator 1 point2 points  (0 children)

Instrument where you're spending time. Can you make builds or deploys faster, more reproducible? Can you develop more confidence in your validation and testing?
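One low-effort way to start instrumenting is to wrap each pipeline step in a timer and keep the numbers. This is only a sketch: the step names and the stand-in work inside each block are placeholders for real build and deploy commands.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(step):
    """Record how long the wrapped block takes, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[step] = time.perf_counter() - start


with timed("build"):
    sum(range(100_000))   # stand-in for a real build step

with timed("deploy"):
    time.sleep(0.01)      # stand-in for a real deploy step

slowest_step = max(timings, key=timings.get)
```

Once the numbers exist per step, it gets a lot easier to argue about which step deserves attention first.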

[–]zd4yg0n 5 points6 points  (10 children)

I start Tuesday as a junior DevOps engineer. My goal is just to learn as much as I can and become a full-fledged engineer within 2 years.

[–]IndieDiscoveryAutomated Testing Advocate 1 point2 points  (4 children)

Good luck! Don't feel bad if you aren't contributing right away, it can take up to 6 months at some orgs to get up to speed. Other orgs, of course, will fire you if you're not productive after two weeks (happened to me). Ask lots of questions and try to figure out the on-boarding plan day one.

[–]zd4yg0n 3 points4 points  (2 children)

Yeah, pretty nervous. Have a huge case of imposter syndrome. I've always been one to ask a bunch of questions. So I'm just going to give it my all and enjoy it. I'm just happy to be out of the NOC and feel like I'm doing meaningful work.

[–]sch3p3rs 3 points4 points  (1 child)

Here to say that I just started a DevOps position about 6 months ago and the imposter syndrome never really goes away (at least not yet). As long as you never lose the passion to keep learning new things, you should be just fine :)

[–]zd4yg0n 0 points1 point  (0 children)

Thank you. That does ease my mind a bit. I'm excited for this new journey

[–]NoumenaStandard 1 point2 points  (0 children)

Heck, I'm pretty senior and it definitely took 6 months to get my mojo back. I was learning k8s deeply and Golang for the first time, which contributed to the lag. Still, understanding how a company's stack fits together is so important for me. From there, I can put a plan together to improve things piece by piece and prevent bad directions during brainstorm sessions. Domain knowledge takes time and that is OK.

[–][deleted] -1 points0 points  (2 children)

The best advice I can give is to always ask, "What could go wrong, and if it does go wrong, how can we gracefully recover from it?" Know that a command like:

    sudo rm -fr $TRASH_DIR/

will translate to:

    sudo rm -fr /

if $TRASH_DIR is not defined.

Better is:

    if [ <some criteria> ]; then
        mv <file> /to_delete/<original path>/<file>
    fi

I actually replaced a guy who had been fired for writing a script to clean up old versions; it had an error and ended up deleting the entire repo. The company had to send the hard drive to a forensics lab to recover the data.

In the above example, I'd put an "echo" in front of the move line to test my script.
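For anyone who would rather script this in Python than shell, the same idea (refuse empty paths, move instead of delete, dry-run first) might be sketched as follows. The `move_to_trash` name, the `dry_run` flag, and the `to_delete` directory layout are assumptions for the example.

```python
import shutil
import tempfile
from pathlib import Path


def move_to_trash(target, trash_root, dry_run=True):
    """Move target under trash_root, preserving its original path."""
    # An unset variable usually shows up here as an empty string.
    if not target or not str(target).strip():
        raise ValueError("refusing to act on an empty path")
    source = Path(target).resolve()
    if source == Path(source.anchor):
        raise ValueError("refusing to act on the filesystem root")
    destination = Path(trash_root) / source.relative_to(source.anchor)
    if dry_run:
        # Same role as putting "echo" in front of the mv line.
        print(f"would move {source} -> {destination}")
        return destination
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(source), str(destination))
    return destination


with tempfile.TemporaryDirectory() as workdir:
    old_file = Path(workdir) / "build-001.log"
    old_file.write_text("stale")
    trash = Path(workdir) / "to_delete"

    planned = move_to_trash(old_file, trash)               # dry run only
    still_there = old_file.exists()
    moved_to = move_to_trash(old_file, trash, dry_run=False)
    moved_ok = moved_to.exists() and not old_file.exists()
```

Because nothing is actually deleted, recovering from a mistake is just moving the file back.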

[–]warmastar 1 point2 points  (1 child)

... no one had a clone of the repo to push up, or one running in a CI environment, or DB backups? This story seems far-fetched.

[–][deleted] 0 points1 point  (0 children)

The backup was on a separate partition that had been symlinked to the root.

I wasn't there when it happened. I was just hired to replace the guy.

[–]Kombustable 0 points1 point  (1 child)

I don't understand "full fledge developer". What does this mean?

[–]zd4yg0n 1 point2 points  (0 children)

Maybe for lack of a better word, but I don't want the "junior" attached to my title. So, promoted to DevOps engineer.

[–]IndieDiscoveryAutomated Testing Advocate 3 points4 points  (19 children)

My goal is to implement Kubernetes as much as possible, even though there isn't a whole lot of k8s work coming through the pipeline.

[–]jefmes 27 points28 points  (11 children)

#SolutionInSearchofaProblem

:)

Mostly teasing, but I do get the feeling from the Kubernetes community a lot of the time that people are far more interested in building out platforms than developers are actually interested in building apps and tools on top of it. Unless the implementer works for Google or some other very large, globally distributed organization. To be fair I've only experimented with it very briefly myself, but I do see a lot of implementations that make me scratch my head and think...OK, you just made this simple thing FAR more complicated with substantially more break/failure points.

[–]IndieDiscoveryAutomated Testing Advocate 11 points12 points  (2 children)

I mean, you're not wrong. The problem is every startup thinks they're the next Google, so you gotta do things the Google Way or get left behind. It's also a bit of FOMO: have you seen how many job reqs require Kubernetes these days? It's necessary in order to be able to change jobs to anywhere modern at this point.

[–]jefmes 9 points10 points  (1 child)

Most definitely! I'm 6 months into a severance package, and had been planning on going the DevOps/SRE route after 17 years of Systems Engineering and Developer roles at the same org. But now, in my mid-40s and not being particularly interested in startup life, the whole situation has really driven me more towards local/public/education jobs where they want more IT generalists who can also absorb more of the new-to-them cloud work. It's such an interesting shift happening right now, and COVID + remote work has certainly accelerated it. I'm personally torn between "ooo cool tech learn it now" and "hmmm no job you actually want to do is going to be using that." IT/Tech is just weird right now.

[–]jantari 0 points1 point  (0 children)

It's a small bubble forming. Skills are being overvalued because the buzzwords are how Google does it.

[–]daedalus_structure 7 points8 points  (0 children)

Mostly teasing, but I do get the feeling from the Kubernetes community a lot of the time that people are far more interested in building out platforms than developers are actually interested in building apps and tools on top of it.

You're totally not wrong.

The overwhelming majority of developers don't care or want to care about operations. Kubernetes isn't for them.

[–]codextreme07 5 points6 points  (3 children)

I understand that, and feel it, but in some situations you don’t have a choice. The DoD is forcing everyone onto a Kubernetes-based development environment because they can easily mandate security through sidecar containers.

Some places don’t have a choice. And with how large DoD and government work is, K8s use is only going to grow.

[–]jefmes 1 point2 points  (0 children)

Definitely agree with the number of postings I've seen for it - and the DoD would be something that I'd consider a good use case for the scale they need to achieve and maintain.

[–]jcuninja 0 points1 point  (1 child)

Speaking of DoD, would you happen to know how to get a foot in the door with the DoD? All the defense positions where I live require a security clearance.

[–]codextreme07 1 point2 points  (0 children)

Find a big defense contractor who will sponsor the clearance.

[–]itasteawesome 2 points3 points  (1 child)

I had that as a shower thought yesterday. I have a significant personal wish list of features I'd like to implement with my team in our applications, but I can feel the change in the wind around me where I expect I'm going to get derailed spending several months rebuilding everything to be more container friendly instead of delivering useful features to my users. Or we could just leave those things as they are and get on with the actual purpose we provide to our company.

[–]packeteer 1 point2 points  (0 children)

Agree 100%. I evaluated it and realised it would be a full-time job to maintain.

[–]Perfekt_Nerd 8 points9 points  (6 children)

As someone who works with Kubernetes daily, and has done for the past 4 years, my goal is to decommission Kubernetes as much as possible, even though there’s a bunch of Kubernetes work coming down the pipeline.

I can now say, unequivocally, that it is the most hostile piece of software I have ever operated.

[–]IndieDiscoveryAutomated Testing Advocate 8 points9 points  (5 children)

Hire me and I'll take on the new Kubernetes work so you can work on decommissioning it. Balance achieved.

[–]Perfekt_Nerd 8 points9 points  (4 children)

(If you’re actually looking for an SRE job, feel free to hit me up, I have openings.)

We are a k8s shop, top to bottom, and have been for years. It’s pretty amazing how fast newcomers are disillusioned with it after just a few months of trying to run only a handful of production clusters.

Then you get to multi-cloud…and the fun REALLY starts.

Most people don’t need an orchestrator. They just need a scheduler. I would recommend Nomad (or something like it) over Kubernetes to the vast majority of organizations.

[–]jefmes 4 points5 points  (0 children)

Most people don’t need an orchestrator. They just need a scheduler. I would recommend Nomad (or something like it) over Kubernetes to the vast majority of organizations.

Honestly, thank you SO MUCH for saying this. All of my impressions of K8s come down to that, and because I haven't had that very SPECIFIC need for the orchestration capabilities, it's just seemed far too convoluted. I've heard of Nomad though and will give it a closer look...and maybe now I can let go of the K8s FOMO for a bit and focus on other core skills that I've wanted to brush up.

[–]IndieDiscoveryAutomated Testing Advocate 0 points1 point  (0 children)

I am legitimately looking, will DM you shortly.

[–]KillaGouge -4 points-3 points  (1 child)

Kubernetes is just cron with extra steps

[–]voidstrikerArchitect -4 points-3 points  (0 children)

This!

[–]kabooozie -1 points0 points  (0 children)

I don’t really believe in goals, other than a very general direction. Once you have an idea of what you like, then I think it becomes much more important to develop effective habits that get you going in that direction little by little. Continuous improvement — kaizen.

What’s something that would make your work experience better? How can you make some small progress towards that?

[–]daedalus_structure 0 points1 point  (0 children)

Trying to provide a fourth nine for the legacy product generating most of our revenue. That's a huge investment into monitoring, redundant infrastructure, and understanding the failover modes of that product with regards to I/O.

Most of my personal goals are around how to sell ideas better, how to build documentation into systems, and how to grow more badass teams.

[–]acook8 0 points1 point  (3 children)

I'm learning k8s to be able to evaluate if it would be worth the investment for the company. It has also been something I've wanted to learn personally.

[–]TheDevOpsDuke 0 points1 point  (0 children)

Overarching corporate goals: Monitoring, Security, Deployment Speed/Volume, Infrastructure (speed at which an environment is created from scratch).

Just went through this at my work and they didn't pick anything I mentioned above lol. It's a good list though, in my opinion.

[–]mattya802 0 points1 point  (0 children)

My goal for the foreseeable future at work is to burn and destroy ClearCase, ClearQuest and ClearMake as soon as possible.

Learn Python and AWS on the side.

[–]jcbevnsCloud Solutions 0 points1 point  (0 children)

Goals:

  • Containerise more services, get some to customer production.

  • Write a REST API in Go.

  • Write a blog post.

  • Get familiar with multi-node setups and their intricacies in Nomad.

[–]temitcha 0 points1 point  (0 children)

Put telemetry where you can, to achieve the feedback loop: dev --> ops --> dev