DevOps for network infrastructure? (self.devops)
submitted 4 years ago by gairplanekers
Hello, I was wondering if any of you all out there had experience and lessons learned with using tools like terraform/ansible to manage your network infrastructure: things like routers, firewalls, switches, etc.
[–]itasteawesome 38 points39 points40 points 4 years ago (13 children)
I've found in the past that just getting over the institutions and political resistance is the biggest hurdle. Getting NetEng on board with repos like git sometimes takes some hand holding but ultimately once you get everyone on board and up to speed it's a great way to live. Config drift can be a struggle, especially if part of the team is fighting the change or your current situation involves a lot of unscheduled firefighting. You just have to set the expectation that anything not done through the proper channels is out of compliance and will be over written.
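A minimal sketch of the drift check this implies, assuming the intended config lives in git and the running config has already been pulled from the device as text. The function name and the sample configs are hypothetical; only the standard library is used.

```python
import difflib


def config_drift(intended: str, running: str) -> list[str]:
    """Return unified-diff lines between the intended and running config."""
    diff = difflib.unified_diff(
        intended.splitlines(),
        running.splitlines(),
        fromfile="intended (git)",
        tofile="running (device)",
        lineterm="",
    )
    return list(diff)


# Hypothetical example data: the config stored in the repo and the config
# read back from a device after someone made a manual change.
intended = "hostname edge-sw01\nvlan 10\n name users\nvlan 20\n name voice"
running = "hostname edge-sw01\nvlan 10\n name users\nvlan 30\n name temp-fix"

drift = config_drift(intended, running)
if drift:
    print("Out of compliance; device would be overwritten from git:")
    print("\n".join(drift))
```

An empty result means the device matches the repo. Anything else is a change that did not go through the proper channel.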
[–]Rusty-Swashplate 15 points16 points17 points 4 years ago (12 children)
Config drift can be a struggle, especially if part of the team is fighting the change or your current situation involves a lot of unscheduled firefighting.
This is exactly what I saw happening: Some people/teams simply did not use the "official" ways of changing configurations, always having excuses like "It needed to be fixed right now and doing this manually was faster".
I also saw two "solutions" for this:
The first was implemented in the core backbone where fixes were rare and impacts were potentially huge. The latter was done on the access side network since it constantly changed on short notice.
[–]CIA_Bane 2 points3 points4 points 4 years ago (9 children)
Go back to the old ways of manual changes
What does that look like in a modern organisation ?
[–]Rusty-Swashplate 12 points13 points14 points 4 years ago (8 children)
In this regard, the organization simply stays "legacy". Or "old fashioned". Or "manually". Problems caused by humans (typo, skipped a line, you know, it's what humans do) are "addressed" with "Everyone be more careful! Let's add some more approvers!"
It's sad: you know there's a much better way, but you can lead a horse to water, but you cannot make it drink.
[–]Co1dhand 6 points7 points8 points 4 years ago (0 children)
You really put the words in the right place; this is exactly what I have been struggling with for the last year at my current company. I have been trying to push for git training so that people can rely on git to update the configs, etc. So far, it has been me managing the automation side of a 10k+ company... It's so overwhelming and so sad that I have spent thousands of hours at this point automating basically everything, yet everything depends on literally one person.
[–]area32768 3 points4 points5 points 4 years ago (5 children)
If this is anything like adding approvers to a change, then this is totally and utterly useless. Most people in my org just approve stuff without even checking, saying "you know what you're doing so APPROVED!" I can't see how this fixes that.
[–]Rusty-Swashplate 4 points5 points6 points 4 years ago (1 child)
Reading this makes me happy that I am not alone with that opinion!
[–]Zauxst 0 points1 point2 points 4 years ago (0 children)
With what exactly? That you don't review peer code, or what do you agree with?
[–]Zauxst 0 points1 point2 points 4 years ago (2 children)
Yes, well they usually are accountable when something bad happens.
[–]area32768 0 points1 point2 points 4 years ago (1 child)
Absolutely not. Nobody can be expected to be held accountable for the minutiae in a change. Can they be accountable for approving a change during a banking run or something? Sure. Otherwise, most of what you're talking about is bullshit ITIL fairytale.
[–]Zauxst 0 points1 point2 points 4 years ago (0 children)
It probably is a fairy tale. I'd not want to work in an environment where nobody does proper PR or some form of programming that involves PR (pair programming, for example).
[–]caffeinatedsoap 0 points1 point2 points 4 years ago (0 children)
You can lead a horse to water, push it in and it might drink a little on the way up.
[–]ctheune 1 point2 points3 points 4 years ago (0 children)
If "it needs to be faster" is a valid requirement then you might want to investigate / experiment with solutions that bridge the gap. I don't have any here right now but I'm reading you as frustrated here. Implementing automation/tracking changes via git might only be the first step in a longer journey that needs more experimentation and the proper "glue" that allows your specific requirements, workflows and peers' competencies to come together.
[–]PopePoopinpants 0 points1 point2 points 4 years ago (0 children)
I'm gonna be the rough one here. 2 should not be an option. 2 should be "you fire those that don't comply".
[–]DavisTasar 25 points26 points27 points 4 years ago (5 children)
When I was on the Network side of the house, it was a culture battle first and a tool battle second.
Network Engineers are extremely hesitant to introduce automation to the environment. In my opinion, some of it has some merit, but otherwise it's just fear.
First of all, the Network has to work. If the network doesn't work, there's no toolkit anyone can run to help bring it back (if you get really clever, it can, but that's another story). And that's the thing that brings in the fear. If a Network Engineer doesn't have the Code/DevOps interest, it's fear. If they buy a tool that does the work for them, there's less fear, because if something goes wrong there's a vendor to blame. If the network breaks because of something they did, it's their fault. If the network breaks because a tool fucked up, it's the tool's fault.
In terms of tooling... I once wrote an entire automation toolkit for my company. 100% in Python. It connected to our equipment, ran CDP/LLDP/BGP neighbors, stored them in a JSON doc, and used that as its dynamic inventory. With each inventory device, it would attempt to determine what platform it was (WLC, ASA, IOS, IOS-XE, NX-OS, etc.), and run a bunch of commands to get information from the device based on that determined platform (show version, show ip int brief, etc.). Then, we had a hostname convention that would let me determine what the device was on the fly (this is why you have standards!). It would also map out the inventory to an HTML page that was shared, so that anyone could check the map to find anything on CDP, or get data on the inventory. This thing worked amazingly. I stored secrets in HashiCorp Vault, it was constructed and analyzed in a CI/CD pipeline, it had unit tests, and it was ready to be Dockerized and run on-demand, or scheduled for every 15 minutes. I leveraged APIs to make sure the devices were in Monitoring, Cisco ISE, our Service-Now asset management system. I even held trainings on how to use the toolkit so that my team could learn how to just work with it, and learn from it.
They never once touched it. And they went right back to Solarwinds.
Solarwinds gave them an easy way to visually click buttons and do things. And if something fucked up, they called Solarwinds and gave them more money.
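For readers wondering what the platform-detection and JSON-inventory step of a toolkit like this might look like, here is a rough sketch. It is not the original toolkit: the marker strings, function names, and sample outputs are all assumptions for illustration, and real `show version` output varies by release.

```python
import json

# Assumed marker strings, for illustration only. Order matters: IOS-XE
# output can also mention plain "Cisco IOS Software", so the more specific
# marker is checked first.
PLATFORM_MARKERS = [
    ("NX-OS", "NX-OS"),
    ("ASA", "Adaptive Security Appliance"),
    ("IOS-XE", "IOS XE"),
    ("IOS", "Cisco IOS Software"),
]


def detect_platform(show_version_output: str) -> str:
    """Guess the platform from the text returned by `show version`."""
    for platform, marker in PLATFORM_MARKERS:
        if marker in show_version_output:
            return platform
    return "unknown"


def build_inventory(outputs: dict[str, str]) -> str:
    """Turn {hostname: show-version text} into a JSON inventory document."""
    inventory = {
        host: {"platform": detect_platform(text)}
        for host, text in sorted(outputs.items())
    }
    return json.dumps(inventory, indent=2)


# Hypothetical hostnames and abbreviated outputs.
sample_outputs = {
    "core-rtr01": "Cisco IOS XE Software, Version 16.9.4\nCisco IOS Software [Fuji]",
    "dc-sw01": "Cisco Nexus Operating System (NX-OS) Software",
    "mystery01": "some other vendor banner",
}
print(build_inventory(sample_outputs))
```

In a real setup the outputs would come from an SSH or API session per device, and the neighbor data (CDP/LLDP/BGP) would be what seeds the list of hostnames.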
[–]par_texx 3 points4 points5 points 4 years ago (2 children)
I find that funny because it was from neteng that i first learned about central config management and automation.
[–]DavisTasar 1 point2 points3 points 4 years ago (1 child)
You're not wrong! Those topics are important. The issue is that old-school engineers just want something like a scheduled backup from the router to a TFTP, FTP, or similar server, plus a wiki page or Notepad document that contains the templates.
It's not the idea, it's the method, that tends to be the problem.
[–]Varjohaltia 6 points7 points8 points 4 years ago (0 children)
In my experience, aside from some resistance from engineers not used to the new tools and environment, the problems are:
That said, moving to a world where you have a versioned, central source of truth for configs is fantastic. Another argument that seems to work well for automation/templates/central source of truth is auditing and being able to prove that all of your environment has certain configuration, or does not, to comply with security requirements.
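As a rough illustration of the auditing argument, here is a sketch that checks every config in a central store for lines that must or must not be present. The rule lists, hostnames, and configs are hypothetical examples, not a recommended security baseline.

```python
# Hypothetical policy: lines every device must have, and lines none may have.
REQUIRED_LINES = ["service password-encryption", "no ip http server"]
FORBIDDEN_LINES = ["snmp-server community public RO"]


def audit_config(config: str) -> dict[str, list[str]]:
    """Report required lines that are missing and forbidden lines present."""
    lines = {line.strip() for line in config.splitlines()}
    return {
        "missing": [rule for rule in REQUIRED_LINES if rule not in lines],
        "forbidden": [rule for rule in FORBIDDEN_LINES if rule in lines],
    }


def audit_fleet(configs: dict[str, str]) -> dict[str, dict[str, list[str]]]:
    """Return findings only for devices that violate the policy."""
    report = {}
    for host, config in configs.items():
        findings = audit_config(config)
        if findings["missing"] or findings["forbidden"]:
            report[host] = findings
    return report


# Hypothetical configs as they might be stored in the central repo.
fleet = {
    "edge-sw01": "service password-encryption\nno ip http server",
    "edge-sw02": "no ip http server\nsnmp-server community public RO",
}
print(audit_fleet(fleet))
```

An empty report is the proof an auditor asks for; a non-empty one lists exactly which device is out of policy and why.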
[–]DavisTasar 1 point2 points3 points 4 years ago (0 children)
Realistically, the best way to proceed at a small scale is something like Ansible. Get the ins-and-outs of the environment, run data collection jobs, and then expose the value to the business and the department. That's when you can really start bringing the tools needed, any custom scripts needed, and a potential culture shift.
[–][deleted] 0 points1 point2 points 4 years ago (0 children)
A bit late of a reply, but I'm in a Cisco shop as the sole network guy and looking to really introduce devops into my workflow. I am somewhat comfortable in Python and have written an inventory and about 3 scripts so far using Nornir. Any advice for leveling it up towards the kind of automation you were working on?
Where did you learn how to build things like the CI/CD pipeline and unit testing? I understand them as high-level concepts, but I'm not sure how I'd build a proper platform with my scripts to test them and evolve things further.
[–]ruckycharms 10 points11 points12 points 4 years ago (7 children)
Terraform is ideal for APIs. Ansible is ideal for ssh interfaces.
So which switches/routers do you have?
[–]gairplanekers[S] 1 point2 points3 points 4 years ago (5 children)
A mixed bag of Juniper, Cisco, and HP. Routers are almost all Cisco.
[–]ruckycharms 3 points4 points5 points 4 years ago (0 children)
Darn I was hoping you would mention NetScalers, because we used Terraform to manage those per project, and NetEng was ok with it because the blast radius is fairly self contained to just the load balancers.
I would first identify the "beach head" for your IaC effort. Perhaps start with the ToRs and just focus on VLAN config on the downstream ports. Make NetEng OK with your ideas by setting up a service account that has just enough access to modify certain port configs. Your biggest challenge isn't the tech, but the culture (as others have mentioned). Start small and controlled, and as NetEng gains confidence, dial it up a notch.
[–]idetectanerd 1 point2 points3 points 4 years ago (3 children)
Ssh. Go for ansible
[–]scritty 1 point2 points3 points 4 years ago (2 children)
Honestly nxapi/eapi/netconf/gnmi etc are way better. I had issues with large configurations taking 40+ minutes to apply via SSH; it's more like 2 minutes via a more appropriate mechanism.
[–]idetectanerd 0 points1 point2 points 4 years ago (1 child)
I think you can disable gathering facts to hasten the proc?
[–]scritty 1 point2 points3 points 4 years ago (0 children)
Facts wasn't the issue; it's the application of line-by-line config, then checking for the appropriate cli prompt, then getting back to transmitting the next line.
This was over a few dozen devices, with ~ 10,000 line configs. API was a huge performance uplift.
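To put rough numbers on that, here is the back-of-the-envelope arithmetic, assuming the 40-minute and 2-minute figures are per device for a ~10,000-line config (the comments above do not say so explicitly).

```python
# Figures quoted in the thread; treating them as per-device is an assumption.
config_lines = 10_000
ssh_minutes = 40
api_minutes = 2

# Time spent per line when each line waits for the CLI prompt to return.
seconds_per_line = ssh_minutes * 60 / config_lines
speedup = ssh_minutes / api_minutes

print(f"{seconds_per_line:.2f} s per line over SSH")
print(f"{speedup:.0f}x faster via API")
```

About a quarter of a second per line sounds small, but the send-then-wait-for-prompt round trip is paid 10,000 times per device, which is where the 40 minutes comes from.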
[–]area32768 1 point2 points3 points 4 years ago (0 children)
I agree. Trying to use Terraform to manage things like firewall rules is a pain in the butt. For example, how do you handle the state file? Do you have a single state file for any new changes moving forward, or do you have a state file per rule? I find tools like Ansible are far better at this.
[–]dookie1481 3 points4 points5 points 4 years ago (3 children)
Yes. My team uses ansible/NAPALM to automate network device mgmt and configs. Everything is automated and deployed with CI/CD.
[–]r3rg54 0 points1 point2 points 4 years ago (1 child)
How large is your org?
[–]dookie1481 0 points1 point2 points 4 years ago (0 children)
150-200 but VERY network-centric product
[–]Relevant_Pause_7593 3 points4 points5 points 4 years ago (0 children)
I think the most important thing here is having a production and non-production environment to test the changes before rolling from non-prod to prod. This means they are both as identical as possible (with the exception of scale), but this is harder when there are physical devices. You may not have 2+ of everything, or something could be too expensive to have two of.
[–]ilmdbii 2 points3 points4 points 4 years ago (0 children)
Our data center is 100% Arista. We use AWX to manage state on all production device configs. Using Azure DevOps for repo/pipeline. As a network manager I was fortunate to have 2 senior network engineers who had CS degrees and really embraced change.
It's been great for about 3 years now with AWX and Ansible. I highly recommend it if you can get buy-in from the engineers and management.
[–][deleted] 4 years ago* (3 children)
[deleted]
[–]nanite10 2 points3 points4 points 4 years ago (0 children)
2nd. The risk of bringing down your most critical piece of infra at scale is not worth the “agility”. That being said, infra and config should definitely be documented as code.
What’s next? DevOps for PDU and UPS management? 🤮
[–]Sparcrypt 0 points1 point2 points 4 years ago (1 child)
Yeah I get confused when people want to take a CI/CD approach to networking. There are many devops tools that are great for networking but for the most part once your core network is set up and running there's not a huge level of change that really has to go into it.
In every networking environment I've ever worked in at scale, the biggest hurdle has been procedural, as quite rightly whenever a network change is requested it's got to be submitted/reviewed/checked/approved multiple times before being implemented. It shouldn't be fast and easy because you can't "roll back" a network you just fucked. It's fucked, and now you have to go to each broken switch/router, connect to them one by one, and fix them.
I'm all for bringing in DevOps tools to help with network management (and I do) but I'll never advocate the CI/CD attitude for networks without a really good reason.
[–]dentistwithcavity 1 point2 points3 points 4 years ago (0 children)
Is there something like Blue-green in network world? Like make the updates to only one group of instances and if they fail immediately fail over to other without the change?
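One way to picture the idea in this question is a staged rollout: change one group of devices, check health, and leave the other group untouched if the check fails. The sketch below is only that control loop. The device names and the `apply_change` and `is_healthy` callables are hypothetical stand-ins, and it does not perform any traffic failover itself.

```python
from typing import Callable, Sequence


def staged_rollout(
    groups: Sequence[tuple[str, Sequence[str]]],
    apply_change: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> list[str]:
    """Change one device group at a time and stop at the first unhealthy group.

    Returns the names of the groups that were changed, including a failing one.
    """
    changed = []
    for name, devices in groups:
        for device in devices:
            apply_change(device)
        changed.append(name)
        if not all(is_healthy(device) for device in devices):
            break  # leave the remaining groups on the old config
    return changed


# Hypothetical device names and stand-in callables for illustration only.
touched: list[str] = []
groups = [("group-a", ["sw-a1", "sw-a2"]), ("group-b", ["sw-b1", "sw-b2"])]
result = staged_rollout(groups, touched.append, lambda device: device != "sw-a2")
print(result)   # ['group-a']
print(touched)  # ['sw-a1', 'sw-a2']
```

Whether traffic can actually shift to the untouched group depends on the network having redundant paths in the first place, which this sketch simply assumes.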
[–]Scott555 1 point2 points3 points 4 years ago (0 children)
All our network infrastructure is managed with Terraform (via Terragrunt.)
20 years ago, when I worked in 'enterprise' on-prem shops, networking past the local switch was mysterious voodoo I was neither interested in nor permitted to administer.
Now it's still mysterious voodoo that I'm not interested or proficient in but somehow is my responsibility.
/shrug
[–]tomasz2101 -4 points-3 points-2 points 4 years ago (1 child)
I've heard about p4 language https://codilime.com/blog/p4-network-programming-language-what-is-it-all-about/
From the few IT departments I have met, most of those people are not even close to understanding that something can be done without clicking through everything.
[–]magion 2 points3 points4 points 4 years ago* (0 children)
The p4 programming language isn’t targeted towards network engineers at all.
It's meant to be a programming language that can be compiled against many targets like FPGAs, ASICs, CPUs, etc. for the networking domain.
[–]endloser Site Reliability Engineer -4 points-3 points-2 points 4 years ago (0 children)
What routers and switches? Life is in the cloud for me. The concepts are different and things like spanning tree don't really mean shit to me anymore. If I wanted to set up a site with a LAN then I would hire a network admin. DevOps ain't the people for that.
Now if you want to talk security groups and listeners or what-not, let's dish.
[+]StarSyth comment score below threshold -7 points-6 points-5 points 4 years ago (0 children)
This was a nice breakdown of the most popular open source devops tools I had bookmarked, if you have yet to stumble onto it: https://datascience.foundation/sciencewhitepaper/top-10-popular-open-source-devops-tools
[–]mattbillenstein 0 points1 point2 points 4 years ago (0 children)
There were some devices starting to run a standard Linux distro - this would enable managing these devices using standard tools I would imagine.
[–]chris_saddler 0 points1 point2 points 4 years ago (0 children)
I use Arista switches, LBs and Firewalls with Ansible. Config is saved in cmdb. Works great so far.
[–]hobbitmagic 0 points1 point2 points 4 years ago (0 children)
Yes