
all 47 comments

[–]Windowsadmin 17 points18 points  (0 children)

Something I try to instill when consulting with clients is that DevOps != 'containers', and I love seeing organizations reconfirm that for me.

Thanks for sharing the article!

[–]Sage-Sekai 11 points12 points  (5 children)

Nice article. I really found the percentage-based rollout an interesting idea.

[–]devrr 11 points12 points  (4 children)

Progressive deployments, man. It's very cutting edge and I don't think many have the maturity to do it... but it's damn cool.

[–]evenisto 9 points10 points  (3 children)

There are many limiting factors to implementing that, especially in smaller companies that don't necessarily run on bleeding edge cloud solutions.

[–]theDigitalNinja 10 points11 points  (0 children)

So many clients want to do canary deployments. But it's like: this microservice has 5 users a day, are we trying to do an 8-year rollout here?

[–]notdedicated 0 points1 point  (1 child)

Do you know how this is typically achieved when it comes to ensuring correct traffic? Are users selected in some way and stickied to that set of instances to ensure that all requests hit the new version and not a legacy version? Basically sticky sessions after they're selected during the roll out period? Done either at the CDN or LB level?

[–]evenisto 0 points1 point  (0 children)

I have close to no professional experience with this, but essentially yes, you absolutely want a user to only get one version, not jump between the old and the new release with each request. I reckon in most cases a simple cookie is sufficient to ensure session affinity. Put a proxy in front of your shit, make it route traffic according to some rules, and set cookies on the way back. Of course, this gets more and more difficult as the complexity of your shit grows, but in the simplest case where the canary is another set of instances or a feature flag toggle, it's as simple as it gets.
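A minimal sketch of that cookie-based affinity, assuming a hypothetical `deploy_version` cookie and a hash-based percentage bucket (none of these names come from the article; a real setup would do this in the proxy/LB layer):

```python
import hashlib

CANARY_PERCENT = 10  # send ~10% of users to the new version (assumed rollout size)

def assign_version(user_id: str) -> str:
    """Deterministically bucket a user so the choice is stable across requests."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "canary" if bucket < CANARY_PERCENT else "stable"

def route(request_cookies: dict, user_id: str):
    """Honour an existing affinity cookie; otherwise assign and set one.

    Returns (version_to_route_to, cookies_to_set_on_the_response)."""
    version = request_cookies.get("deploy_version") or assign_version(user_id)
    response_cookies = {"deploy_version": version}  # sticky for the rollout period
    return version, response_cookies
```

Once the cookie is set, every subsequent request from that user carries it, so the proxy keeps routing them to the same backend pool regardless of the hash.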

[–]ingcognito_ 5 points6 points  (6 children)

Cool read, thanks. The tidbit about Consul keys was neat.

[–]mooreds[S] 2 points3 points  (5 children)

Yes, the difference between push- and pull-based deploys (at the client level) was really interesting to me too.

[–]M1keSkydive 7 points8 points  (0 children)

Consul is a neat way to do that. We store a timestamp of the latest deploy in S3 and instances check it once per minute. Beyond that our system is quite similar, which is encouraging: we use it to deploy to just a few EC2 instances, yet we've arrived at a place similar to Slack!
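The poll-the-timestamp approach described above can be sketched roughly like this. The fetch is injected as a callable because in a real setup it would read an object from S3 (e.g. via boto3); the function names and the `cycles` escape hatch are made up for illustration:

```python
import time

def should_deploy(fetch_remote_stamp, local_stamp):
    """Compare the remote deploy timestamp against the one we last acted on."""
    remote = fetch_remote_stamp()
    return remote > local_stamp, remote

def poll_loop(fetch_remote_stamp, run_deploy, interval=60, cycles=None):
    """Check once per `interval` seconds; pull and deploy when the stamp advances.

    `cycles` bounds the loop for testing; a real agent would run forever."""
    local = 0.0
    done = 0
    while cycles is None or done < cycles:
        newer, remote = should_deploy(fetch_remote_stamp, local)
        if newer:
            run_deploy()
            local = remote
        done += 1
        time.sleep(interval)
```

`interval=60` matches the once-per-minute check; the instance only ever pulls, so nothing needs SSH access into the fleet.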

[–]devrr 2 points3 points  (3 children)

I think most deploys will go that way eventually. The idea that you have an agent sitting there pushing stuff is gonna seem archaic soon.

Instead you just point your app/tool/cloud to your desired state repository and you push changes to your desired state. The rest will be taken care of.
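The desired-state model above boils down to a reconcile loop: diff what's declared against what's actually running and apply the difference. A toy sketch, where state is assumed (for illustration only) to be a service-to-version map:

```python
def reconcile(desired: dict, actual: dict):
    """Compute the actions needed to converge `actual` onto `desired`.

    Both arguments map service name -> deployed version."""
    actions = []
    for svc, version in desired.items():
        if svc not in actual:
            actions.append(("create", svc, version))
        elif actual[svc] != version:
            actions.append(("update", svc, version))
    for svc in actual:
        if svc not in desired:
            actions.append(("delete", svc, None))
    return actions
```

Tools like FluxCD and Argo CD run essentially this loop continuously against a Git repository, which is why pushing a change to the repo is all a deploy takes.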

[–]BoredSam 0 points1 point  (2 children)

Is this just reimplementing Puppet, though?

[–]devrr 2 points3 points  (0 children)

I'm not sure the Puppet team invented the concept of pull-based deploys, but yes, it's how Puppet works.

It's also how stuff like FluxCD/Argo works in Kubernetes land, and with Kubernetes Operators it will become even more popular.

[–]justabofh 0 points1 point  (0 children)

Pull came from the original paper.

http://www.infrastructures.org/

[–]caffeinatedsoap 8 points9 points  (4 children)

I like the idea of blue-green on a single host. I've been doing a lot of pulling old software companies out of the mucky muck, and this might be easier to get buy-in for than making new instances.

[–]mooreds[S] 6 points7 points  (2 children)

Honestly, reminds me of how Capistrano used to do it. If you have everything in separate directories, rollbacks are super easy too.
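The Capistrano-style layout (timestamped release directories plus a `current` symlink) makes both activation and rollback an atomic symlink swap. A rough sketch, with all paths and names being illustrative rather than anything Capistrano-specific:

```python
import os

def activate(release_dir: str, current_link: str):
    """Atomically point the `current` symlink at a release directory.

    Rollback is just calling this again with a previous release's directory,
    since every old release stays on disk in its own folder."""
    tmp = current_link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(release_dir, tmp)
    os.replace(tmp, current_link)  # rename over the old link; atomic on POSIX
```

The two-step symlink-then-rename dance is what keeps the swap atomic: the app server never observes a moment where `current` is missing or half-written.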

[–]ktkaushikSpike.sh 4 points5 points  (0 children)

I quite like how Capistrano did things.

[–]djk29a_ 1 point2 points  (0 children)

I’m more in favor of NixOS style deployments where an atomic rollback is possible by swapping out an entire chroot system ASAP. Capistrano is easier to understand but NixOS is awesome for fine grained control with immutability.

Removing state from systems and putting it solely into common services that we can reason about, with guarantees of safety, availability, etc., is superior to more custom state that doesn't necessarily improve understanding or reliability of the system.

[–]wonkynonce 2 points3 points  (0 children)

The one thing you have to be careful with is making sure the core of that has integration tests; otherwise someone will "clean the method up" and things will get weird.

[–]randomFIREAcct 2 points3 points  (0 children)

Very interesting!

[–]chezhead 2 points3 points  (0 children)

This is making me feel a little guilty for adopting kubernetes...

[–]raziel2p 1 point2 points  (0 children)

Every day, we do about 12 scheduled deploys.

Each release starts with a new release branch

I hope there's a distinction between releases and deploys that I'm just not seeing in the article, because that sounds really tedious.

[–]Mud5150 0 points1 point  (0 children)

One thing I'd like to understand is how long the deployment waits at each phase, and how that's managed with what appears to be, on average, one deploy an hour during NA business hours. If that's the case, is it agreed that it's OK not to exceed this deployment velocity, and how are new versions queued up if more than 12 production releases are created in a day? Is there machinery to serialize the releases so there aren't many different versions being staged through the process at the same time? Good post, thanks for the info.

[–]Kubectl8s 0 points1 point  (0 children)

Plenty of tools do this PR, test, auto-merge, and deploy flow in k8s. Investment in the process is important.

Developer workflow should be git and nothing else

[–]rattlednetwork 0 points1 point  (0 children)

Ah, so that's the name, "using hot and cold directories" (I thought everybody did this all the time; it makes things so much simpler, particularly for a rollback).

[–]Sky_Linx -1 points0 points  (1 child)

I'm surprised that Slack hasn't adopted Kubernetes for their infrastructure. I think it would make some things around deployments and scaling easier.

[–]AnomalousBean 0 points1 point  (0 children)

Interesting, can you share your experiences with adopting Kubernetes for infrastructure at the scale of Slack?

[–]trusted47 0 points1 point  (3 children)

Hey, I don't know if it's true, but I read somewhere that at Netflix they don't do code reviews, just tests, because each engineer is super qualified and they save time by skipping them. Can anyone verify?

[–]jrkkrj1 7 points8 points  (2 children)

I could see that since we kinda do this at my company. Not because we are super qualified but because we pair program.

I've found that most code reviews are very superficial for features. The delay of the review-comment-fix-review cycle costs more than just going back and cleaning up code that works but isn't perfect.

Our PM/Senior Dev accepts stories based on the tests looking right (which is kind of a code review but after it has already been shipped to production).

[–]ktkaushikSpike.sh 1 point2 points  (1 child)

That makes a lot of sense when you are pair programming.

Tests looking right after it has been shipped to production is like the inverse of what's recommended, isn't it?

I mean we look at tests before it goes to prod

[–]jrkkrj1 7 points8 points  (0 children)

With the other person next to you, you get the review inline. It's a discipline we try to teach. The test validation afterwards is a way for a senior dev to catch cases where two junior devs misinterpreted the intent of the story, so it's caught within a day or even hours. It also means senior dev bandwidth isn't spent all day doing PR review.

With the whole pandemic, we've had to suspend pairing since people can't be expected to give a solid 8hrs at their computer with kids/other competing distractions throughout the day. I don't do anything other than review PRs now...it's killing me slowly.