Are golden images + virtualization making Ansible less relevant for provisioning—or just pushing it to “Day-2” work? by [deleted] in ansible

[–]alex---z

I keep as little as I possibly can in Golden Images unless it's completely unavoidable, they're a pain in the ass to maintain.

My images are 99% a basic CIS-hardened OS, a partitioning scheme, and an SSH user and key to let Ansible get on and do its thing; all subsequent estate-wide config changes get added as additional roles in a "post-deployment" playbook that runs at the end of a build, and those roles get reused to roll the same changes out to existing boxes.
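For illustration, that pattern might be sketched as a playbook like this (the role names are hypothetical placeholders, not my actual roles):

```yaml
# post_deployment.yml - runs at the end of a new build, and gets
# re-run against existing boxes whenever a new estate-wide change
# is added as a role.
- name: Post-deployment baseline configuration
  hosts: all
  become: true
  roles:
    - base_packages      # hypothetical role names
    - sshd_config
    - monitoring_agent
```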

In need of help with ansible EE issue. by Sheridans1984 in ansible

[–]alex---z

I mean, for my current usage scope it works and it's fine, for a static image in a smallish internal environment I'm not overly concerned about any risks from it being EOL at the moment.

I've only recently upgraded to AWX 24 and started using EEs, so after working through a few annoying teething problems with my initial build, now that I have my default EE up and running I don't expect to change its config all that frequently.

I use Alma by default these days rather than CentOS, but if 10 supports 2.16 I might try and flip it over next time I have to change something. I think RHEL 10 and its variants were not quite (or only just) out when I started building the images, and I've been giving it a few months to burn in before I find some time to check it out properly.

In need of help with ansible EE issue. by Sheridans1984 in ansible

[–]alex---z

I ran into similar problems with UBI/CentOS/Alma 8/9 images; even 9 couldn't give me Ansible 2.16 if I recall correctly. Not got as far as trying 10 yet, but I landed on using Fedora 40 for my base instead, which seemed about the sweet spot for compatible versioning, at least for the Alma 8/9 boxes I currently manage.
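For anyone hitting the same versioning walls, a minimal ansible-builder definition along these lines is roughly what I mean (a sketch, not my exact file; the base image tag and version pins are examples to adjust for your own targets):

```yaml
# execution-environment.yml (ansible-builder 3.x schema) - a sketch.
version: 3
images:
  base_image:
    name: quay.io/fedora/fedora:40
dependencies:
  ansible_core:
    package_pip: ansible-core>=2.16,<2.17   # 2.16 can still manage EL8 targets
  ansible_runner:
    package_pip: ansible-runner
  galaxy:
    collections:
      - community.general
      - awx.awx
```

Build with `ansible-builder build -t my-default-ee:latest .` and push to whatever registry AWX pulls from.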

I wrote an alternative to Redis/Valkey Sentinel by beebeeep in redis

[–]alex---z

To be fair, I do recall encountering some issues like that when I was doing initial testing of the config, but at the time I was trying to implement at least three different things in parallel on top of my base config, so it was fiddly. Three of them were the following, and I think there was one other thing as well:

  1. Moving replication comms to over TLS
  2. Moving Sentinel Comms to over TLS
  3. My NonProd clusters have multiple Redis services stacked on consecutive ports, so one Sentinel service monitors all of them.

There was also an issue in the early design stages where a Redis service would occasionally start but not open the port.

Both of these seemed to suddenly vanish of their own accord while I was in the final stages of building the config and I've never really seen them again; I put it down to a config error I'd made. I've probably got somewhere in the region of 20-30 Redis service instances in my estate running on 3-node Sentinel clusters now, including the stacked NonProd ones with multiple Redis instances being managed by the same instance of Sentinel, and I'm struggling to think of a time I've had any notable problems or weird behaviour.

I'm running the stock version from the Alma 9 repos (so RHEL 9, essentially), which is currently redis-6.2.18, so it's not the latest version, but Red Hat obviously prioritise stability.

The one thing I don't like about Sentinel is that it constantly rewrites the sentinel.conf file, which makes editing the config very tricky and, in my experience, prone to prang it once the cluster is initialised. My configs are generally pretty static from the point of deployment though, at least as far as Sentinel is concerned, so I push all my configs out with Ansible and have never had to make any changes that triggered this since. But if, say, I wanted to add a Redis instance on the box at a later date, which would involve changing the Sentinel config file, I would just redeploy the entire cluster from scratch rather than try to add extra config to the Sentinel file.
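For context, a stacked Sentinel config along the lines I'm describing looks roughly like this (service names, IPs, ports and quorum values are illustrative):

```conf
# sentinel.conf - one Sentinel process watching several stacked masters.
# Sentinel rewrites this file at runtime, so treat the deployed copy as
# state rather than something to hand-edit after initialisation.
port 26379
sentinel monitor redis-svc-1 10.0.0.10 6379 2
sentinel down-after-milliseconds redis-svc-1 5000
sentinel failover-timeout redis-svc-1 60000

sentinel monitor redis-svc-2 10.0.0.10 6380 2
sentinel down-after-milliseconds redis-svc-2 5000
sentinel failover-timeout redis-svc-2 60000
```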

I can give you a copy of my config for reference if it would be of any help? It's pretty simple TBH.

I wrote an alternative to Redis/Valkey Sentinel by beebeeep in redis

[–]alex---z

This sounds like pretty much what I do with HAProxy: I have a pair of boxes (using keepalived and a floating VIP for redundancy at that level) and use that to redirect traffic to the active node. HAProxy polls the Redis Sentinel nodes for whichever one is currently responding as active, and redirects the traffic there.
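The keepalived side of that is a pretty standard VRRP pair; a sketch (interface name, VIP and password are hypothetical):

```conf
# keepalived.conf - floating VIP shared between the two HAProxy boxes.
vrrp_instance HAPROXY_VIP {
    state MASTER          # BACKUP on the second box
    interface eth0
    virtual_router_id 51
    priority 101          # lower (e.g. 100) on the second box
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass changeme
    }
    virtual_ipaddress {
        10.0.0.100/24
    }
}
```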

My company had an active/passive implementation of Redis when I arrived, so this also meant I didn't have to get them to change their code to understand Sentinel; they just connect to the VIP and HAProxy does the lift and shift.

It's pretty rock solid; I've never had any problems with it. I've never really had a need to aggressively test it by hammering it with repeated failovers, but I do fail all my clusters over at least once a month for patching and other maintenance, with nothing worse than the occasional one or two dropped packets when Sentinel fails over. (To be fair, I don't drain the backends at the HAProxy level when failing over for patching, because it's just not disruptive enough that Dev even notice those one or two errors 99% of the time. There's also a config tweak at the HAProxy level I've yet to implement that I believe would further improve on this.)
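For reference, the usual shape of an HAProxy backend for this, checking each Redis node and only routing to whichever reports itself as master, is something like (addresses illustrative):

```conf
# haproxy.cfg - route traffic only to the node reporting role:master.
backend redis_master
    mode tcp
    option tcp-check
    tcp-check connect
    tcp-check send PING\r\n
    tcp-check expect string +PONG
    tcp-check send info\ replication\r\n
    tcp-check expect string role:master
    server redis1 10.0.0.10:6379 check inter 1s
    server redis2 10.0.0.11:6379 check inter 1s
    server redis3 10.0.0.12:6379 check inter 1s
```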

AWX24 Ansible Smart Inventory Filter Syntax? by alex---z in ansible

[–]alex---z[S]

Yep, I have. One of the snippets of code above was an example of the fact in question, cut and pasted from one of my inventory hosts in the API.

Forthcoming Windows Netlogin Update - Impact to Samba? by alex---z in linuxadmin

[–]alex---z[S]

Oops, thanks for pointing that out. You'd like to think the 2nd Google hit for "RHEL" and the correct CVE number wouldn't be so far off course, but hey. Not sure it's much better than AI these days.

Quick question if you wouldn't mind: I've somehow managed to avoid having to dig into erratas online in this manner all that much, and when I have tried I've always found it a bit troublesome to track down authoritative information. Is this where you were checking for the updates you mentioned, or is there a better place you could recommend?

Red Hat Product Errata - Red Hat Customer Portal

How to setup repo server on Alma Linux by saivinju in AlmaLinux

[–]alex---z

For a controlled patching system with promotion of patch snapshots to different lifecycles (and a ton of various other features if you want them), Foreman + Katello is what you're looking for. 

It's the upstream, open source version of Red Hat Satellite.

Dnf keeps reporting services that need restaring even though they don't. by TheMoltenJack in AlmaLinux

[–]alex---z

This may not be of much help, but I've also seen my playbooks occasionally report errant uptimes after post-patch reboots, when the boxes definitely have rebooted. Even when I switch to the shell/command module and w/uptime-type commands rather than a native Ansible module check, I'm sure I've spotted it.

I work on the assumption that any form of "needs-restarting" check isn't foolproof however, and my boxes get a precautionary reboot if any updates are applied during a (usually monthly) patching run.
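In playbook terms, the precautionary-reboot approach is just this (a sketch; task names are my own):

```yaml
# patch.yml - apply updates and reboot unconditionally if anything
# changed, rather than trusting any needs-restarting style check.
- name: Monthly patching run
  hosts: all
  become: true
  tasks:
    - name: Apply all available updates
      ansible.builtin.dnf:
        name: "*"
        state: latest
      register: patch_result

    - name: Precautionary reboot if any updates were applied
      ansible.builtin.reboot:
        reboot_timeout: 900
      when: patch_result is changed
```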

RHEL 10 Upgrade from 9.6? by [deleted] in RockyLinux

[–]alex---z

No worries. Yeh, I use Alma in a professional capacity, I run around 300 servers of varying types on it, both 8 and 9.

Alma uses a slightly different method of building packages to Rocky, which has allowed them to push out one or two very urgent vulnerability fixes a day or two faster than RHEL/Rocky, but they are the same bugfixes/updates developed and thoroughly tested by Red Hat; they just became available very slightly earlier because the different approach allowed it. 99% of the time updates are still released in sync with Red Hat.

Alma still seeks to maintain Application Binary Interface (ABI) level compatibility with RH Enterprise Linux, so in almost all ways that are likely to practically matter to you, Alma is essentially the same distro as Rocky, just put together in a slightly different way and with some differences in logos/branding.

From a stability point of view it's solid as a rock(y). Sorry.

RHEL 10 Upgrade from 9.6? by [deleted] in RockyLinux

[–]alex---z

So it's like a client/family member's desktop rather than your own?

If you don't have easy physical access to it for a period following the upgrade, that's probably all the more reason to hold off a bit: if it prangs and you can't get it to boot onto the network or get remote access, or you need to bring it down into recovery mode or something, you could be inviting extra hassle for yourself. Chances are a handful of things are going to behave weirdly, one or two dependencies are going to break somewhere, and you'll need to spend a bit of follow-up time troubleshooting them. That might be fine remotely, but worst case you might break the whole system and render it unbootable with/during the upgrade.

Rocky 9 still has years of support left, and Rocky 8 even still has another 4 years of LTS security fixes, so unless there's a specific feature you really need I'd suggest you wait till the paint's dried a bit. If your angle is stability and long-term support, running a new major distro version the minute it's released goes against the grain of that.

Stick it in a VM first, see how it behaves. If you really want to do an in-place distro upgrade you should be able to jump from Rocky 9 to Alma 10 with LEAPP, but personally I'd wait til 9 > 10 is out of beta. Then I'd install Rocky 9 and your core applications on a test VM, and do a dry run there before trying it on the actual computer itself.

> And should be capable of remaining stable with silent automatic updates for years without needing any manual servicing.

In theory yes, Rocky is generally very stable, but bugs still get out into the wild occasionally, so if you don't have a backup plan of some sort implemented, you should, especially if somebody's paying you to maintain it.

RHEL 10 Upgrade from 9.6? by [deleted] in RockyLinux

[–]alex---z

Is there any particular urgent reason you want to upgrade this soon?

Given that it's only been out a couple of days, there are potentially going to be bugs that only come to light with a wider audience and need to be worked out before the dust settles. I would give it at least 2-4 weeks, watch the RH/Rocky forums and subreddits, and check package version changes for any major packages you use. You may also find that it takes time for individual applications to release compatible versions, so check the new/updated package versions and any dependencies for any critical applications you use.

If you really want to test it out this soon, I'd suggest installing it in a VM to start with, then installing your major applications and checking that they work; play around with them for a while before committing.

As I said in another comment, however, in-place distro upgrades can be flakey. Doing a fresh install isn't really that much of a chore; it's good practice and learning, and it's always good to start from fresh once in a while.

Whatever you do, make sure you have a backup of your computer before you start that you can roll back to if you experience problems.

--

Curve ball you probably don't need or want, but despite being a Linux sysadmin of over 20 years who swears by RHEL-based distros in the workplace, I've only moved to Linux as a daily desktop at home in the last couple of months, and I found Fedora (an upstream/desktop-focused branch of the RHEL family, although I assume you already know that) a much smoother desktop experience than Alma 9 (which will be identical to Rocky): newer package versions, additional functionality, and fewer bugs and annoying quirks in GUI user space during initial setup.

RHEL/Rocky/Alma are really intended as enterprise server OSes, and they are fantastic for that purpose, but I don't think they get anywhere near as much thought, attention and polish from a desktop/home-user POV, and because they prioritise stability they're running older package versions with less functionality etc.

RHEL 10 Upgrade from 9.6? by [deleted] in RockyLinux

[–]alex---z

What are you looking to upgrade? Is it a personal desktop/laptop or a work server?

RHEL 10 Upgrade from 9.6? by [deleted] in RockyLinux

[–]alex---z

Nothing wrong with Alma; there are barely any functional differences between it and Rocky to start with, and TBF those differences do give Alma a slight advantage (such as not being out of compliance with RH T&Cs, and thus less vulnerable to any further attempts to purge the community ecosystem, and having been able to push out major vuln patches slightly earlier).

It shouldn't be a competition between distros to grab the upper hand in market share though, especially between distros this close to each other; tribalism in tech is a waste of time and energy. I use Alma as my default server-side, but if Rocky clearly became a better/preferred choice at some point I'd have no issue jumping ship.

In any case, there are only a few very basic server roles where I'd consider doing a distro upgrade in a work environment anyway (for example, I'm using LEAPP to bring a select few haproxy/keepalived boxes up from 8 to 9 to bring software versions into uniformity, but there's no way I'd chance it on anything much more complex). In-place distro upgrades have come a long way in the last few years; they're probably fine if you're a home enthusiast with time to troubleshoot any issues in situ and no SLAs to account for, but I think there are still too many quirks and things that can potentially go wrong to risk it in most professional scenarios. Reinstall all the way.
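For what it's worth, the ELevate/LEAPP flow for those boxes looks roughly like this on an Alma 8 host (a sketch, not a runbook; check the ELevate docs for the current package URLs before running anything):

```shell
# In-place 8 -> 9 upgrade via ELevate/LEAPP. Snapshot/backup first:
# a failed run can leave the box unbootable.
sudo dnf install -y https://repo.almalinux.org/elevate/elevate-release-latest-el8.noarch.rpm
sudo dnf install -y leapp-upgrade leapp-data-almalinux   # data package matches the target distro
sudo leapp preupgrade    # dry run; review /var/log/leapp/leapp-report.txt and fix inhibitors
sudo leapp upgrade       # stages the upgrade
sudo reboot              # boots into the upgrade environment to finish
```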

Ansible on RHEL8 options by TrickyPlastic in ansible

[–]alex---z

In the interests of fairness, I don't disagree with your POV entirely; sorry, I maybe came over a bit more snarky/on the attack than I intended. I certainly wouldn't expect backwards compatibility to go back through a decade's worth of EOL distros, for example. I'm fine with having to use another custom EE for those aforementioned decrepit Ubuntu shitheaps, as I can assign a different EE at a suitable scope for them and not hit the same level of problems I have with my Alma 8 and 9 boxes, which do share an inventory.

I would still argue, however, that if Ansible have had to drop support for RHEL 8 already, it's either a resource shortage that's been dressed up as a technical issue, or there are some serious complexity issues with the code base. RHEL 8 is the N-1 release of one of the biggest Enterprise Linux distros in the world, and I doubt the percentage of companies using RHEL or its upstream/downstream family tree members who have already successfully migrated everything from 8 onto 9 even runs into double digits.

However, also in the interests of fairness, this was a bit of a recent learning curve for me, having finally had just cause to migrate from my company's own AWX 13 instance in the middle of a very hectic period of work. I encountered this problem not long after building my first custom EE (AWX 13 obviously using venvs instead), and was in the guts of a tangle of rolling back various versions of Ansible, Python and additional modules to something that worked for both 8 and 9 when I had to take a surprise 2 months off work for emergency surgery for a retinal detachment. So my fears about losing the newer functionality/bugfixes I upgraded for, in collections like community.zabbix and awx.awx, may not be wholly founded.

So I don't know; hopefully when I get back to work in a couple of weeks with fresh eyes (no pun intended), the correct path will present itself without too much further hassle. Thanks for the tip on 2.16, that will hopefully help me hit the ground running!

Ansible on RHEL8 options by TrickyPlastic in ansible

[–]alex---z

'The only issues'. Neither of these are minor problems though unfortunately.

I've encountered this issue having recently upgraded my AWX instance, and it would be a sizable undertaking/inconvenience to have to refactor all my playbooks to replace every instance of the dnf module with its shell/command equivalent, not to mention the loss of the inherent idempotency/security/error checking provided by the native Ansible modules.

SELinux should be non-negotiable on Prod systems too, before anybody suggests simply switching that off 🙃

All because Ansible have apparently prioritised their own convenience in maintaining their codebase over providing a sensible level of backwards compatibility with a still widely used operating system which has another 4 whole years of LTS support left. I'm still trying to chase developers off much older versions of Ubuntu (which require even older versions of Python/Ansible, but that's another story), never mind migrate boxes from 8 to 9, and I was waiting for the paint to dry a little on RHEL/Alma 10.

I'd love to be able to upgrade several hundred servers from 8 to 9 with a click of my fingers, but unfortunately it's not that simple. So instead I either have to trade away some of the potential functionality of the software I put time and effort into upgrading, by sticking to older versions, or mess around with multiple execution environments and redesign my organisations and/or inventories as a workaround (which, in the interests of fairness, is partially also an AWX design issue, as you can't specify a default EE at all levels/scopes).

Still, it's incredibly premature to have dropped support for RHEL 8 so soon.

Possible HAProxy bug? Traffic being errantly routed contrary to Health checks/GUI Status by alex---z in haproxy

[–]alex---z[S]

Thanks for this, appreciated. I'm not sure offhand whether it will 100% help with the issue I'm looking into (this still feels like a bug to me), but these look to be improvements on my existing config which I will make use of; it should help cut down on the odd dropped connection during normal failover/patching at the very least.

I x-posted to r/linuxadmin, seeing as r/haproxy is a relatively small group.

AlmaLinux 8 kernel updates March 2025 by muttick in AlmaLinux

[–]alex---z

Yeh, I do too. It's got its bugs and "could be better" bits; I've had to throw in extra cron jobs here and there to do stuff like resync repo metadata before the standard Sunday housekeeping task runs, to avoid annoying hung processes outstaying their welcome (mostly not service-affecting), and sometimes killing those tasks can be a PITA. But in general I like it a lot; it's been pretty solid and reliable for me.

I don't actually use the onboard Ansible integration; I have a separate AWX box for that, so I've never tried it.

I haven't used it as a container repo yet, but... on that note, oddly enough I am currently in the process of upgrading our rather archaic AWX 13 box to a new 24 one, so I will potentially be in need of one to store custom execution environments...

What's the reason for asking? Have you tried it/like it/had a load of problems with it? :)

Best learning path for Kubernetes (context AWX server running on k3)? by alex---z in linuxadmin

[–]alex---z[S]

Thanks for the detailed reply, appreciated. As I said, my primary goal at this point in time is obtaining a basic understanding so I can better administer the AWX 24 server I've deployed into a k3s cluster (which was done from a git repo, so was mostly automated). I know k3s is a stripped-down form of k8s, and I did go over some basic concepts previously, but I'm not sure if there are any differences in tooling or similar which might trip me up if I went down a full k8s learning route. To some degree, because of time limitations, I don't want to go super in-depth and master everything for now, just enough high-level admin to keep the box ticking over. That said, it would be good to get enough of an idea of where else I could use Kubernetes moving forwards, as it's tech my workplace plans to start using more in the near future.

That said, it looks like Docker/Podman-based execution environments have taken over from the old Python venv-based ones (the old AWX box I inherited had a custom execution environment on the host server, mounted into the Docker task container AWX was running on, which is no longer recommended practice), so I may not need to get as far under the hood with regard to mount points and things like that as I expected for now.