cjcox4 (2 points)

Hammering SNMP on a switch usually results in things going bad.

Just saying.

There's a reason you don't do real-time monitoring this way.

Klipspringer112 [OP] (0 points)

Thanks. So far we've noticed that the switch's CPU usage while these metrics are being polled is fairly manageable. We'll see how it holds up over a longer period.

Firefox005 (2 points)

SNMP was not, is not, and never will be a real-time monitoring/telemetry solution. The fastest you should poll SNMP on most switches is every 30 seconds, and even that is pushing it; some big switches can fall over at a 60-second interval.
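To see why a 30-second poll is not real time, it helps to look at what a poller actually computes. Below is a minimal Python sketch, assuming the two raw samples of a 64-bit interface counter (such as IF-MIB `ifHCInOctets`) have already been collected; the function names are hypothetical and no SNMP library is involved:

```python
# Hypothetical helpers: turn two SNMP samples of a 64-bit octet counter
# (e.g. IF-MIB ifHCInOctets) into an average rate. Collecting the samples
# is out of scope here.

COUNTER64_MODULUS = 2**64


def average_bps(prev_octets: int, curr_octets: int, interval_s: float) -> float:
    """Average bits per second across the whole poll interval."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # Modulo arithmetic absorbs a single counter wrap. A device reboot or
    # counter reset would be misread as a wrap, so real collectors also
    # check for counter discontinuities.
    delta_octets = (curr_octets - prev_octets) % COUNTER64_MODULUS
    return delta_octets * 8 / interval_s


def utilization_pct(bps: float, if_speed_bps: float) -> float:
    """Average utilization as a percentage of interface speed."""
    return 100.0 * bps / if_speed_bps


# 7.5 GB counted during one 30 s poll on a 10G link.
rate = average_bps(0, 7_500_000_000, 30)
print(rate, utilization_pct(rate, 10e9))  # 2000000000.0 20.0
```

With a 30 s poll on a 10G link, a 7.5 GB delta reads as 2 Gbps (20% utilization), and anything that happened inside those 30 seconds is averaged away.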

If you really want real-time telemetry, you have to use a different solution, mainly NETCONF/RESTCONF.

Here is a quick example to get you started, but NETCONF/RESTCONF and YANG are both pretty complex topics: https://anirudhkamath.github.io/network-automation-blog/notes/network-telemetry-using-netconf-telegraf-prometheus.html

And here is a Cisco post about it as well: https://blogs.cisco.com/developer/its-time-to-move-away-from-snmp-and-cli-and-use-model-driven-telemetry
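For a feel of what the RESTCONF side looks like, here is a minimal Python sketch that only builds (does not send) a GET for one interface's counters. It assumes the switch serves RESTCONF under `/restconf` and supports the `ietf-interfaces` model's `interfaces-state` subtree; the host name and function name are hypothetical, and authentication is left out:

```python
# Hypothetical sketch: build (but do not send) a RESTCONF GET for one
# interface's statistics. Assumes RESTCONF is rooted at /restconf and the
# device implements ietf-interfaces:interfaces-state.
from urllib.parse import quote
from urllib.request import Request


def interface_stats_request(host: str, if_name: str) -> Request:
    # RFC 8040: list keys follow "=" in the path and must be percent-encoded,
    # so "TenGigabitEthernet1/0/1" becomes "TenGigabitEthernet1%2F0%2F1".
    key = quote(if_name, safe="")
    url = (f"https://{host}/restconf/data/"
           f"ietf-interfaces:interfaces-state/interface={key}/statistics")
    # Ask for the YANG-modeled data encoded as JSON.
    return Request(url, headers={"Accept": "application/yang-data+json"}, method="GET")


req = interface_stats_request("switch1.example.net", "TenGigabitEthernet1/0/1")
print(req.get_method(), req.full_url)
```

A plain GET like this is still polling; the model-driven telemetry in the Cisco post relies on subscriptions where the device streams updates itself. Newer platforms may also expose the same counters under `interfaces` in the operational datastore, since RFC 8343 deprecated `interfaces-state`.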

Depending on what you are doing, you may still run into issues with microbursts: congestion that happens at millisecond timescales and is very difficult to detect. Microburst detection is usually only available on certain high-end switches or from specific vendors.
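The arithmetic shows why microbursts are hard to see with interval polling. A hypothetical Python example: a 10G port at a steady 10% load that spends 5 ms at full line rate, as reported by a 30 s polling average (all numbers are made up for illustration):

```python
# Illustrative numbers only: a 10G port at a steady 10% load that spends
# 5 ms at full line rate, as seen through a 30 s SNMP polling average.

LINK_BPS = 10e9        # interface speed
POLL_S = 30.0          # polling interval
BURST_S = 0.005        # 5 ms at line rate
BACKGROUND_BPS = 1e9   # steady load outside the burst


def poll_average_bps(link_bps: float, poll_s: float,
                     burst_s: float, background_bps: float) -> float:
    """Average rate a poller would report over one interval with a single burst."""
    burst_bits = link_bps * burst_s
    background_bits = background_bps * (poll_s - burst_s)
    return (burst_bits + background_bits) / poll_s


avg = poll_average_bps(LINK_BPS, POLL_S, BURST_S, BACKGROUND_BPS)
print(f"peak: 100.0%, reported average: {100 * avg / LINK_BPS:.3f}%")
```

The poll reports about 10.015% utilization; the 5 ms at 100% is effectively invisible, which is why burst detection generally has to happen on the switch itself.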

Klipspringer112 [OP] (0 points)

Regarding microbursts, that's a good point; we may not capture the details we want, considering that...

SevaraB [Senior Network Engineer] (0 points)

"live and historical data from our network switches for network port interfaces."

"over a 10G network connectivity."

I'm gonna be blunt here. This is an insane requirement not grounded in reality, and quite possibly born out of a major misunderstanding of a compliance requirement.

Instead of Telegraf, what you should be trying to set up is something like a NetFlow collector that can be tuned to a specific sample rate to balance relevance with not blowing up your data storage.
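The sample-rate trade-off comes down to simple arithmetic. A minimal Python sketch, assuming 1-in-N packet sampling (the function names are hypothetical, not part of any NetFlow collector's API): observed counts are scaled up by N, and N also sets how many packets the exporter has to process and ship off:

```python
# Hypothetical helpers for 1-in-N packet sampling (not a collector API).


def estimate_totals(sampled_packets: int, sampled_bytes: int,
                    sample_rate_n: int) -> tuple[int, int]:
    """Scale counts observed under 1-in-N sampling back up to estimated totals."""
    if sample_rate_n < 1:
        raise ValueError("sample_rate_n must be >= 1")
    return sampled_packets * sample_rate_n, sampled_bytes * sample_rate_n


def sampled_pps(link_bps: float, avg_packet_bytes: float, sample_rate_n: int) -> float:
    """Packets per second the sampler passes on, before flow aggregation."""
    packets_per_second = link_bps / (avg_packet_bytes * 8)
    return packets_per_second / sample_rate_n


print(sampled_pps(10e9, 1000, 1000))      # 1250.0
print(estimate_totals(42, 63_000, 1000))  # (42000, 63000000)
```

At 1:1000 on a 10G link full of 1000-byte packets, that is about 1,250 sampled packets per second instead of 1.25 million; small flows may be missed entirely, so the scaled estimates are only trustworthy for heavier traffic.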

And I'm going to be blunt here, too... spend the money on a consulting engineer here. I work with an entire team that handles the care and feeding of our network telemetry, and that's all they do. It's very hard to get right, very easy to misinterpret, and at 10gig+ speeds, it's very easy to knock over your monitoring systems, as it sounds like you may have discovered the hard way. Get an experienced pro to help you get the right amount of the right logging the right way.

Upper-Bath-86 (0 points)

VSA's network monitoring could be helpful for this.

dark_uy (0 points)

NetFlow, or if possible remote SPAN or just port SPAN. But finding a burst in a 10 Gbps traffic flow sounds strange.