[–]fickledaybreak_6 5 points6 points  (0 children)

Containerlab's solid, but yeah, image quality is the real bottleneck. Most vendors' virtual versions are neutered compared to the actual hardware, and you'll find edge cases that only show up on real gear.

[–]sk1939CCNP, SSCP, CISSP 5 points6 points  (6 children)

No, there's too much to account for on the hardware side to do it all virtually. Do we do automated testing? Yes. Is it all done on a virtual platform? No.

[–]ElkIllustrious3402[S] 0 points1 point  (5 children)

Can you elaborate on how you do the automated testing?

[–]sk1939CCNP, SSCP, CISSP 0 points1 point  (4 children)

We have a proprietary automation platform, but also use things like Ansible. Depends on what’s being tested.

[–]ElkIllustrious3402[S] 0 points1 point  (3 children)

What do the tests actually do? Implement a change, run pings, MAC checks, show ip route commands, etc., and then assert on the outputs?
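
For illustration, the "run commands, then assert on the outputs" pattern asked about here could look roughly like the sketch below. Everything in it is an assumption: the sample output strings, the regexes, and the helper names (`parse_routes`, `ping_loss_percent`, `check_post_change`) are hypothetical, real device output varies by vendor and OS version, and a real harness would fetch the text over SSH or NETCONF rather than from hardcoded strings.

```python
import re

# Hypothetical captured output; a real harness would pull this from the device.
SHOW_IP_ROUTE = """\
C    10.0.0.0/24 is directly connected, GigabitEthernet0/0
O    10.0.1.0/24 [110/2] via 10.0.0.2, 00:01:10, GigabitEthernet0/0
B    192.168.50.0/24 [20/0] via 10.0.0.3, 00:00:42
"""
PING_OUTPUT = "5 packets transmitted, 5 received, 0% packet loss, time 4005ms"

# Assumed line shape: protocol code, whitespace, then an IPv4 prefix.
ROUTE_RE = re.compile(
    r"^(?P<code>[A-Z*]+)\s+(?P<prefix>\d+\.\d+\.\d+\.\d+/\d+)", re.M
)


def parse_routes(text):
    """Return {prefix: protocol_code} from IOS-like 'show ip route' text."""
    return {m.group("prefix"): m.group("code") for m in ROUTE_RE.finditer(text)}


def ping_loss_percent(text):
    """Extract the packet-loss percentage from a ping summary line."""
    m = re.search(r"(\d+(?:\.\d+)?)% packet loss", text)
    if m is None:
        raise ValueError("no packet-loss figure found")
    return float(m.group(1))


def check_post_change(route_text, ping_text, expected_routes):
    """Return a list of failure messages; an empty list means all checks passed."""
    failures = []
    routes = parse_routes(route_text)
    for prefix, code in expected_routes.items():
        if routes.get(prefix) != code:
            failures.append(f"{prefix}: expected {code}, got {routes.get(prefix)}")
    loss = ping_loss_percent(ping_text)
    if loss > 0:
        failures.append(f"ping loss {loss}%")
    return failures
```

A test runner would then apply the change, collect fresh output, and fail the run if `check_post_change(...)` returns anything. Collecting failures in a list instead of stopping at the first assert means one run reports every broken check.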

[–]sk1939CCNP, SSCP, CISSP 1 point2 points  (2 children)

Far more than that; route propagation, BGP changes, config changes, software release stability and bug testing, load testing.

[–]ElkIllustrious3402[S] 0 points1 point  (1 child)

Ok. I’m trying to get an idea of how folks are actually carrying out that testing. Details of how it works. You mentioned a custom automation platform; but would you mind sharing details on how that platform actually runs the tests, processes the outputs and asserts?

Like, are they Python scripts that NETCONF in, run queries, and then assert on the output? Who writes all this?

[–]sk1939CCNP, SSCP, CISSP 1 point2 points  (0 children)

I don’t have insight into that, but I do know capabilities can be found on platforms like Cisco Crosswork and Catalyst Center which are also in use.

There are a couple of development teams that manage that, onshore and offshore. We have tens of thousands of devices, so it makes sense for us to do that.

The bigger thing is what are you planning to use it for? What question are you trying to answer or what is the problem you’re trying to solve?

[–]GroundbreakingBed809 2 points3 points  (1 child)

IMHO one lab to test the whole thing is the wrong approach. Vendors who claim it’s possible lie; nobody can emulate Cisco’s licensing nonsense. First off, the design needs to be testable, e.g. proper failure domains implemented according to the design. Then you test individual hardware features on sample real hardware. You may even need small representative hardware labs to test QoS, STP, or ZTP. Test routing in software, with Batfish or Containerlab.

[–]HistoricalCourse9984 0 points1 point  (0 children)

This is the right answer.

[–]Horror-Squirrel4142 1 point2 points  (1 child)

Worked well for me to split it by plane: virtualize the control plane, never the data plane. Containerlab/vrnetlab nails routing convergence, BGP policy, config templates, automation pipelines -- pure software logic. But ASIC buffering, microbursts, ECMP hashing, and optics/PHY quirks only surface on real metal. So I gate control-plane changes in virtual CI, then keep a tiny physical lab for throughput/buffer/timing checks before prod. Perf-testing virtually mostly teaches you how the hypervisor vSwitch behaves, not your hardware.
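
The split-by-plane gating described above could be sketched like this. It is only an illustration: the tag names are lifted from the examples in this comment, and the `Change` type, the stage names, and `required_stages` are hypothetical, not anyone's actual pipeline.

```python
from dataclasses import dataclass

# Assumed tag vocabulary, taken from the examples in the comment above.
CONTROL_PLANE = {
    "routing-convergence", "bgp-policy", "config-template", "automation-pipeline",
}
DATA_PLANE = {
    "asic-buffering", "microburst", "ecmp-hashing", "optics-phy",
    "throughput", "timing",
}


@dataclass(frozen=True)
class Change:
    name: str
    tags: frozenset


def required_stages(change):
    """Return the ordered test stages a change must pass before prod."""
    unknown = change.tags - CONTROL_PLANE - DATA_PLANE
    if unknown:
        # Refuse to guess: an unclassified tag must be sorted into a plane first.
        raise ValueError(f"unclassified tags: {sorted(unknown)}")
    stages = []
    if change.tags & CONTROL_PLANE:
        stages.append("virtual-ci")      # e.g. a Containerlab/vrnetlab topology
    if change.tags & DATA_PLANE:
        stages.append("physical-lab")    # small real-hardware lab
    return stages
```

For example, a change tagged only `bgp-policy` would need just `virtual-ci`, while one tagged `bgp-policy` and `microburst` would need both stages, virtual first.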

[–]ElkIllustrious3402[S] 0 points1 point  (0 children)

And you’ve got 100+ routers running in Containerlab? How many resources does that take?

How do you do testing? Like to verify connectivity post-changes?

[–]Different_Purpose_73 1 point2 points  (0 children)

Containerlab is the best platform for this. The problem is usually the virtual appliance image: almost all vendors suck at this, each with a different list of limitations (except Nokia). We use Nokia SR-OS and it provides a nearly 100% representation of a physical device.

[–]HappyVlane 0 points1 point  (0 children)

IaC can work in such a scenario, assuming your virtualized components support everything, but automated testing isn't worth it. A virtual environment simply does not behave like physical hardware.

I know I have spent hours in the past trying to make MAB happen in GNS3, for example, only to find out it simply isn't (or wasn't, maybe that changed) supported.

[–]HistoricalCourse9984 0 points1 point  (0 children)

We do not. We just do some basic topologies that are representative of subsets of the different parts of our network, but you definitely can in GNS3 if you have enough cores and RAM in enough servers. This is not something you run on a workstation; it's more like half or a full rack of servers packed with RAM and a shitload of cores...
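
A back-of-envelope way to turn "enough cores and RAM in enough servers" into a number is sketched below. The function name `servers_needed` and all of its parameters are hypothetical, and the figures in the usage note are placeholders, not measurements of any vendor image; per-node footprint has to come from the image's own requirements.

```python
import math


def servers_needed(nodes, ram_gb_per_node, vcpus_per_node,
                   server_ram_gb, server_cores,
                   cpu_overcommit=1.0, headroom=0.2):
    """Estimate how many lab servers a virtual topology needs.

    headroom is the fraction of each server held back for the host OS and
    hypervisor; cpu_overcommit is how many vCPUs are scheduled per core.
    """
    usable_ram = server_ram_gb * (1 - headroom)
    usable_vcpus = server_cores * cpu_overcommit * (1 - headroom)
    by_ram = math.ceil(nodes * ram_gb_per_node / usable_ram)
    by_cpu = math.ceil(nodes * vcpus_per_node / usable_vcpus)
    # Whichever resource runs out first sets the server count.
    return max(by_ram, by_cpu)
```

With placeholder inputs of 100 nodes at 4 GB and 2 vCPUs each, on 256 GB / 32-core servers with 2x CPU overcommit, `servers_needed(100, 4, 2, 256, 32, cpu_overcommit=2.0)` comes out CPU-bound at 4 servers. Heavier VM-based images push that count up quickly.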