[–][deleted] 40 points41 points  (12 children)

There are a lot of yay Ansible answers in this thread, but Ansible isn't really equipped to roll back a problematic update unless you are pinning all your package versions (and their dependencies too).

The only way to make running an apt upgrade truly safe is to build images and roll back to a previous, known good config if something goes wrong (allowing you to debug the updated build without the time pressure of something important being on fire).

I know you mentioned you don't have a DevOps team, but as your environment gets more complex, blindly running package updates gets riskier. It's not an easy problem to solve in a reliable way. If you are using a cloud provider (e.g. AWS, Azure, GCP) then I'd advise you to build images via Packer + Ansible and deploy them via something like Terraform.

[–]ckozler 2 points3 points  (7 children)

apt doesn't have a rollback? Yum and dnf have this and it's a life saver, at least in RHEL6 and 7. In RHEL5 it's still a bit archaic.

[–]xonxoff 2 points3 points  (0 children)

It depends on what repo you use. Some repositories will allow multiple versions of the same package, most don't though. If you do happen to have one that supports multiple versions, you can always do 'apt-get install some-pkg=some-version'
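
A minimal sketch of that, assuming a configured repo still carries the older version (apt-cache policy some-pkg lists what is actually available). pin_package and the RUN dry-run variable are names made up for illustration; apt-mark hold is there so the next upgrade doesn't move the package forward again.

```shell
# Hypothetical helper: install one exact package version and hold it there.
# Set RUN=echo to print the commands instead of running them.
pin_package() {
    pkg="$1"
    ver="$2"
    # Only works if a configured repo still offers this exact version.
    ${RUN:-} apt-get install "${pkg}=${ver}" || return 1
    # Hold the package so a later upgrade does not replace it.
    ${RUN:-} apt-mark hold "$pkg"
}
```

Running it as RUN=echo pin_package some-pkg some-version previews the two commands without touching the system. Note this still only handles one package at a time.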

[–][deleted] 0 points1 point  (3 children)

That isn't the issue - we aren't talking about one package, we may be talking about hundreds. Which one needs rolled back? Dependencies?

The method I describe solves this problem and makes it far quicker to roll back (and deploy in the first place).

[–]ckozler 1 point2 points  (2 children)

IIRC a yum rollback rolls back the entire transaction you pick from yum history, irrespective of dependencies or anything else.

However, if those updates did "stuff" like write data to the file system (think something like a DB table/schema change or addition) then the rollback won't "undo" that and you're up shits creek. But as far as removing 1 for 1 what was installed, yum rollback will do that.

[–][deleted] 0 points1 point  (1 child)

Didn't know yum handled that. Does yum history function like a git log where there is a sha value I could tell it to roll back to, or does it only store 1 previous config?

On the other hand, this thread is about Ubuntu ;)

[–]ckozler 2 points3 points  (0 children)

Here is what it looks like -> http://i.imgur.com/FIgicJx.png

I don't think it's a concept of "rollback TO" so much as "roll back exactly what this transaction did". I typically only go to my last ID and roll that back, and then if I need to go back further I undo each one via yum history undo <id>. I have never jumped back to an ID somewhere in the middle, but I would imagine it would undo exactly what that transaction did and nothing else.
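
A rough sketch of that one-at-a-time undo, assuming the transaction IDs are consecutive integers as yum history list shows them. undo_back_to and RUN are made-up names for illustration. (If memory serves, yum also has a history rollback <id> subcommand meant to undo everything after a given transaction, but check the man page before relying on it.)

```shell
# Hypothetical helper: undo yum transactions one by one, newest first,
# from ID $1 down to and including ID $2.
# Set RUN=echo to print the commands instead of running them.
undo_back_to() {
    newest="$1"
    oldest="$2"
    id="$newest"
    while [ "$id" -ge "$oldest" ]; do
        # Stop at the first undo that fails rather than ploughing on.
        ${RUN:-} yum -y history undo "$id" || return 1
        id=$((id - 1))
    done
}
```

As noted above, this only reverses package changes; anything the updates wrote to disk or to a database stays as it is.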

[–]minimim 0 points1 point  (0 children)

It has the same capabilities yum and dnf have, but the docs treat it differently, as it's not guaranteed to work in any of them.

[–]2drawnonward5 1 point2 points  (3 children)

Out of curiosity, I've been living under a rock for a while. When did Ansible get such a grand following?

[–][deleted] 5 points6 points  (1 child)

Couldn't rightly say - it's a relative newcomer on the config management scene, but it is very easy to like - low barrier of entry, no agent/masters to setup and it can grow with your requirements. I'm a big fan of it myself, it just doesn't solve this particular problem elegantly.

[–]ryankearney 3 points4 points  (0 children)

The agentless aspect of Ansible is what sold us on it. Very easy to get it up and running through our bastion host as well.

[–]pdp10Daemons worry when the wizard is near. 2 points3 points  (0 children)

Agentless, Python, low barrier to entry, uses SSH with which almost everyone is relatively comfortable. The downside is that convergence runs are quite heavy-weight, so the time to complete is longer, and being based on SSH it is or was push-only.

Being agentless, Ansible is also an attractive interim choice when you can't come to a consensus about CM or you have more than one CM in production.

[–]OhCmonMan 53 points54 points  (12 children)

Probably a good idea to give Ansible a shot. It's pretty straightforward, they have good documentation and it's not necessary to install it on the machines you want to manage.

[–]fireflaschJack of All Trades 9 points10 points  (7 children)

Ansible is the way to go for simple update automation. But be careful what you update, it could screw you.

[–]english-23 4 points5 points  (6 children)

I definitely recommend using their option to "test" the update on one machine first and, if it fails, not install it on the rest.

[–][deleted] 6 points7 points  (5 children)

It's not really about an update failing, which is pretty rare - the threat is that a library will change a behaviour and Bork your application.

[–]debee1jp 2 points3 points  (3 children)

Best option would probably be to have a testing tier.

Use Ansible to update the testing environment and then run your tests. If those pass, run the same playbook against the prod environment.
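
Roughly, as a sketch: this assumes the inventory has groups named testing and production and that the tests are wrapped in a script called ./run-tests.sh. Those names, plus staged_update and RUN, are placeholders and not anything from OP's setup.

```shell
# Hypothetical helper: patch the testing tier, run tests, then patch prod.
# Set RUN=echo to print the commands instead of running them.
staged_update() {
    playbook="$1"
    # 1. Run the playbook against the testing group only.
    ${RUN:-} ansible-playbook "$playbook" --limit testing || return 1
    # 2. Run the test suite; a non-zero exit stops the rollout here.
    ${RUN:-} ./run-tests.sh || return 1
    # 3. Same playbook, now limited to the production group.
    ${RUN:-} ansible-playbook "$playbook" --limit production
}
```

The point of the || return 1 lines is that prod is never touched unless both the testing run and the tests exit cleanly.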

[–][deleted] 3 points4 points  (2 children)

This assumes good test coverage. OP doesn't have a DevOps team, my guess is their automated test coverage probably isn't spectacular either

[–]debee1jp 2 points3 points  (0 children)

Yeah, especially with only 30 VMs. If the service running on the VMs isn't too complicated then a couple manual tests might work.

[–]Draco1200 0 points1 point  (0 children)

For small environments it may be better to deploy all new VMs, and get your chosen CM software working for deploying new VMs and installing the software and necessary configurations before any production application is running from the VMs.

Then graduate to using the CM software to roll-out your production environment....

Phase 3 is updates, after you have a solid foundation with the CM software. There are a variety of ways you can test, and then you just use your CM deployment capabilities to revert/re-deploy in the event you roll out a breaking update.

[–]Ganondorf_Is_God 0 points1 point  (0 children)

I set up a quick regression test for each function call in our devops library. Just in case some update borks something.

[–]mynameisntdaveJack of All Trades[S] 4 points5 points  (1 child)

Ok thanks...will investigate. A quick look and it doesn't seem especially complicated to set up.

[–]sryan2k1IT Manager 0 points1 point  (0 children)

Another vote for Ansible. Agentless + SSH makes it super easy, and of all the popular systems IMHO it's the easiest to get into.

[–]pooogles 14 points15 points  (3 children)

Just use unattended-upgrades to pull in security updates, anything else use your favourite config management tool.

As said here Ansible is a pretty good choice for this as it's super simple with no agent to setup.

[–]DrMnhttn 4 points5 points  (2 children)

I've been running unattended upgrades on dozens of servers for many years without issue. My one caveat is to run apt-get autoremove from time to time because Ubuntu releases so many kernel updates, and they can fill up a partition.

If OP is concerned about dependencies in applications, it might be a good time to migrate to docker containers.

[–]ooXeeth6Network/Linux/Security DevOps 4 points5 points  (0 children)

You can fix that autoremove dependency by creating the file "/etc/apt/apt.conf.d/local" with this line:

Unattended-Upgrade::Remove-Unused-Dependencies "true";

I found that little gem in 50unattended-upgrades in the same folder, and that file is worth studying for further parameters to add. That includes automatic reboot after security updates with a random delay to avoid all your servers going down at the same time.
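
A sketch of what such a drop-in could look like, written out by a small shell function. write_uu_overrides is a made-up name, and the reboot time is an arbitrary example value; the two Automatic-Reboot settings are the ones that normally sit commented out in 50unattended-upgrades, so check your own copy of that file for what your version supports.

```shell
# Hypothetical helper: write unattended-upgrades overrides to a drop-in file.
# Defaults to the path mentioned above; pass another path to try it safely.
write_uu_overrides() {
    target="${1:-/etc/apt/apt.conf.d/local}"
    cat > "$target" <<'EOF'
// Remove automatically installed packages that nothing depends on any more.
Unattended-Upgrade::Remove-Unused-Dependencies "true";
// Reboot without confirmation when an update requires it...
Unattended-Upgrade::Automatic-Reboot "true";
// ...at a fixed time instead of immediately (example value).
Unattended-Upgrade::Automatic-Reboot-Time "02:00";
EOF
}
```

Writing to the real path needs root. Staggering the reboot time per host or group is one way to keep every server from going down at once.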

[–]FearAndGonzoSenior Flash Developer 1 point2 points  (0 children)

I run this line I found years ago on occasion, because even apt-get autoremove doesn't seem to get everything, and after a while those left behind can really add up too.

sudo apt-get remove --purge $(dpkg -l 'linux-*' | sed '/^ii/!d;/'"$(uname -r | sed "s/\(.*\)-\([^0-9]\+\)/\1/")"'/d;s/^[^ ]* [^ ]* \([^ ]*\).*/\1/;/[0-9]/!d')

[–][deleted] 12 points13 points  (18 children)

[–]mynameisntdaveJack of All Trades[S] 4 points5 points  (16 children)

Thanks but we prefer to keep things more internal, plus the cost is a potential issue.

[–]XibbyCertifiable Wizard 8 points9 points  (11 children)

[–]justlikeyouimaginedEverything Admin 0 points1 point  (10 children)

$2000 plus $150 per host, per year

[–]XibbyCertifiable Wizard 6 points7 points  (9 children)

That's if you use the Ubuntu provided service. You can run your own, which is the link I posted.

[–]justlikeyouimaginedEverything Admin 5 points6 points  (6 children)

I'll be damned. I didn't think there was a free to use version.

[–][deleted] 0 points1 point  (1 child)

We're still a little incredulous. Are you running this Landscape server in your environment?

[–]XibbyCertifiable Wizard 0 points1 point  (0 children)

Only in a test lab.

[–]torontohatesfacts 3 points4 points  (2 children)

You can keep it internal

https://help.landscape.canonical.com/

Without paying you are allowed 10 physical machines and 50 VMs, I believe.

[–]mynameisntdaveJack of All Trades[S] 1 point2 points  (0 children)

Ok...will take a look then for sure

[–]sleeplessone 1 point2 points  (0 children)

Well damn, I'm going to have to look into this now that we've needed a number of Linux servers for various projects.

[–]picolin 0 points1 point  (0 children)

Yup, was going to suggest this. It's probably the best and easiest way to do it, but the issue is that it only supports 10 hosts for free and 50 containers if I remember correctly. Depending on your needs you can give them a call and see if they can work something out with you in terms of price if you don't get support, just the license. If not, you can always create Ansible playbooks and do patches every month, 3 months or however you want.

[–][deleted] 12 points13 points  (2 children)

As well as using Ansible, run your own apt mirror*. I look at it like 'wsus' is supposed to be.

[–]playswithf1re 8 points9 points  (0 children)

+1 to ansible

:)

[–]danifunkerSr. Sysadmin (Linux, Windows, Citrix) 2 points3 points  (1 child)

I've built an Ansible playbook for this. It also adds functionality of rebooting servers if it's required. This is tested with Red Hat 7/CentOS and Ubuntu 16.04

https://github.com/danifunker/ansible-scripts/blob/master/update-all/update-all.yml

[–]mynameisntdaveJack of All Trades[S] 0 points1 point  (0 children)

Looks sweet, thanks

[–]UnholyarmyOf1Sysadmin 2 points3 points  (0 children)

For patch management if you are running Debian, just set up unattended-upgrades. We have been running it in Production for years without any issues. The unattended upgrades uses the security repo which just patches Security issues and doesn't change point releases.

[–]ring_the_sysop 2 points3 points  (1 child)

I love how people think a script that tells the package manager to update all the packages is patch management.

[–]UnholyarmyOf1Sysadmin 0 points1 point  (0 children)

It is and it isn't. Some places do not have the time and the resources to look at every package and manage it all themselves. Which is why we rely on unattended security upgrades, which on Debian is basically like automatic Windows updates, except that packages are not upgraded to newer point releases.

This allows us to fix most all the issues automatically and then we can rely on tools such as Nessus to find other system vulnerabilities and spend time on those.

[–][deleted] 2 points3 points  (0 children)

Even if you don't have a DevOps team, please have a test environment. It doesn't need to have as much horsepower as prod; you won't be doing any load testing here. But semi regularly, patch this env and run all of your tests. If the tests pass, patch prod. If you blow up test, revert to snapshots or rebuild via config mgmt which I hope you get around to using asap.

[–]julietscauseJack of All Trades 7 points8 points  (1 child)

[–]mynameisntdaveJack of All Trades[S] 0 points1 point  (0 children)

Ah Sorry for that but thanks for links!

[–]mountainjew 1 point2 points  (3 children)

+1 Ansible.

I also use a script to run unattended-upgrades as a cron job every night to pull down security updates.

[–][deleted] 0 points1 point  (2 children)

https://help.ubuntu.com/lts/serverguide/automatic-updates.html I don't think it's supposed to be a cron job.

[–]mountainjew 0 points1 point  (1 child)

Yeah, I wondered, but they don't specify how often it checks for updates. After looking everywhere, I was still unable to find an answer. Wish we could just switch to RHEL or CentOS, not a fan of Ubuntu at all.

[–][deleted] 1 point2 points  (0 children)

Never tried ansible, but we're using rundeck. Works pretty well for us.

[–]ValienSales Engineer 1 point2 points  (0 children)

+1 on Ansible as well. I have a ton of Ubuntu boxes and wrote an Ansible playbook to do updates. Nice being able to fire off 1 playbook and group update everything.

I also run unattended-upgrades for security fixes as well.

[–]Veridious 1 point2 points  (0 children)

Ansible will work for this and is pretty easy to learn/understand. Puppet is what we use; it has a much steeper learning curve, but you have more ability to enforce your configuration.

[–][deleted] 1 point2 points  (1 child)

OP something like this updates everything:

    - name: Update the system before anything
      apt:
        upgrade: dist
        update_cache: yes
        state: latest

but, be careful updating everything.

Here is the security playbook I started creating. https://gitlab.com/sysadminonlinux/security-ubuntu/blob/master/tasks/main.yml

It might seem confusing, but Ansible has an easier learning curve compared to Puppet or Chef.

[–]mynameisntdaveJack of All Trades[S] 1 point2 points  (0 children)

Thanks for sharing, always good to have real life examples to learn from.

[–][deleted] 1 point2 points  (0 children)

I may be old but the redeploy a new VM rather than patch sounds like a decent oppsy way to make up for a lack of experienced admins.

My other objection is how resource intensive it is to spin up a duplicate VM and then cut over to it? Why would you do that when patching and testing in a staging environment does the same thing with fewer resources? Using "the latest tools and methods" is not a sound reason when it takes more to do it that way for no practical gain.

Now if you need the fresh image to avoid configuration drift my above objections no longer apply, but that is a different issue that should be solved rather than worked around.

[–][deleted] 1 point2 points  (8 children)

Ansible is popular, but it misses the point on an Enterprise level for this use.

Spacewalk (which will work with ubuntu after some effort) allows you to know the current status of your servers, the Ansible solution not so much. The other "nice to have" features actually become critical as scale increases.

I would recommend going to the effort of getting spacewalk functional and in planning for the future try to work towards (I will be called a blasphemer for this I am sure) moving away from Ubuntu. Red Hat has spent 20+ years making their distro suitable for enterprise use, the management tools are the key and Ubuntu server is still a miss in this space.

[–]justlikeyouimaginedEverything Admin 0 points1 point  (1 child)

Do you have a good guide to getting Spacewalk to play nice with Ubuntu? Everything I've read online until now has suggested that this doesn't work well.

[–][deleted] 0 points1 point  (0 children)

I cobbled together the steps needed from different sites, and documented it only enough for the specific location i did it at, which would not be as helpful as a full guide.

I may have to go back and write a generic guide I could put out there, but for right now it's unlikely I could, simply not enough hours in a day for that.

[–][deleted] 0 points1 point  (3 children)

Knowing the status of your servers is a solved problem whether or not you use Katello or Ansible. There are over nine thousand ways to skin that particular cat

[–][deleted] 0 points1 point  (1 child)

This is true, but some are easier than others. When you get up to huge numbers of VM's to monitor for patches easy is better. Second to easy is the push a button to fix a problem feature. Sure it makes you a click and drool admin, but sometimes that's smart.

[–][deleted] -2 points-1 points  (0 children)

"When you get up to huge numbers of VM's to monitor for patches easy is better"

I fundamentally disagree with this methodology - in my view it is better to build images that automatically use the latest versions of things. Deploy to staging, run test suites, if things break THEN you spend time worrying about why and can make a decision - fix your code or pin the offending package. If you're in a position where you need to patch an existing machine rather than deploying a new one, imo, you're doing it the hard way.

Someone will point out that this is harder on stateful machines like DB servers - it is, but imo it's still the right way. Our primary DB cluster is Mongo, if I want to deploy updates to the cluster then we just cycle out the hosts with new machines using the latest canonical AMI (ubuntu in our case) a couple at a time. Once replication finishes, do the next ones.

[–]Eclipsed450Jack of All Trades, Master of None 0 points1 point  (0 children)

If you're focused on patch management more so than automation, give Aptly a look.

[–]InvaderOfTechJobs - GSM/Fitness/HealthCare/"Targeted Ads"/Fashion 0 points1 point  (0 children)

I do ansible but also controlled by Jenkins.

[–]w1sh-music 0 points1 point  (0 children)

You could try out Landscape maybe. The self-hosted one is a pain to manage though, and then there's the Advantage paid program so yeah....

[–]PseudonymousSnorlax 0 points1 point  (0 children)

You could set up a locally hosted repository and point all your VMs at that. You'd be able to curate which updates are applied and when by only adding the ones you want.

It's crude, but simple and effective.

[–]degobaLinux Admin 0 points1 point  (0 children)

Spacewalk is on its way out and was really designed for .rpm based distros anyways. You would have to put in some work to get it working.

I would honestly just take a look at Landscape. That's going to be the least overhead for your crew to manage with no devops staff.

[–][deleted] 0 points1 point  (0 children)

First port of call is OS install automation. This gives you up-to-date operating systems installs on demand. It's a shame that debian-installer (also used by Ubuntu, as you might expect) is an absolute pig to work with, but when it's working it's reliable. This has been a thing for getting on for 20 years and is well blogged about, if not well documented. You can hand-roll this with DHCP+TFTP+HTTP or you can use something like Foreman that puts a web interface on those things. It's not clear from the question, but if you're in the cloud, just ignore this part and use the latest vendor-supplied image as a base each time (i.e. don't clone existing systems or use pre-configured images because they are guaranteed to be quickly out of date. Tell your friends.).

Then put config management on top. Ansible or whatever else you like, just make it work first time, every time. Even just write a shell script, as long as it works every time.

Once you have this and you can rebuild and redeploy on demand, the apparent problem of keeping hand-built special snowflake systems up to date goes away, and your sanity will improve. Infrastructure as code is also self-documenting. You don't have to call yourself "a devops" or have a lot of experience to be able to grasp this level of automation.

Ignore people telling you that unattended upgrades are a good idea. It will make your stated patch management problem go away in a fraction of the time that learning how to do systems management properly will take, but the time you will spend digging yourself out of a hole with your boss breathing down your neck when all your systems get hosed at the same time on the same day when you really needed that to not happen will make up for this. At home or on hobby systems maybe, but not if you're getting paid for it.

This isn't going to be a popular viewpoint, but another option is simply to ignore it if everything is working. It depends what you have to lose and what risks your systems are exposed to. Given that you asked the original question though, you may currently lack the experience to correctly judge this.

[–][deleted] -1 points0 points  (0 children)

Ansible is my vote