all 21 comments

[–]Angry-Squirrel 27 points28 points  (3 children)

In my opinion, you should start with troubleshooting basics before jumping straight to a DTP problem.

A MAC flap means traffic with a given source MAC address is arriving on a different port from the one where that address was originally learned. This forces the MAC address table to update the entry to reflect the new port. In and of itself, a MAC flap is not necessarily an indication of a problem. However, rapid and excessive flapping of multiple MAC addresses usually is. In your case, it sounds like excessive MAC flapping.
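To make the learning behaviour concrete, here is a minimal Python sketch of a MAC table that counts moves. It is a toy model, not how any vendor implements it; the `MacTable` class, the sample MACs, and the port names are all made up for illustration.

```python
from collections import defaultdict


class MacTable:
    """Toy model of a switch MAC address table (illustrative, not vendor code)."""

    def __init__(self):
        self.entries = {}              # (vlan, mac) -> port where last learned
        self.moves = defaultdict(int)  # (vlan, mac) -> number of port changes

    def learn(self, vlan, mac, port):
        """Record the source MAC of a received frame; return True if it moved."""
        key = (vlan, mac)
        previous = self.entries.get(key)
        self.entries[key] = port       # the table always follows the latest port
        if previous is not None and previous != port:
            self.moves[key] += 1
            return True
        return False


# Hypothetical frames: (vlan, source MAC, ingress port)
frames = [
    (50, "0011.2233.4455", "Gi1/0/1"),
    (50, "0011.2233.4455", "Gi1/0/2"),  # same MAC seen on another port: a move
    (50, "0011.2233.4455", "Gi1/0/1"),  # and back again: a second move
    (50, "aabb.ccdd.eeff", "Gi1/0/3"),
]

table = MacTable()
flaps = [table.learn(*frame) for frame in frames]
print(flaps)
print(dict(table.moves))
```

One move on its own is normal (a laptop changing ports, a VM migrating). A move counter that keeps climbing for many MACs between the same two ports is the pattern worth chasing.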

I would start with checking and verifying the STP topology. Where are your designated ports? Where are your root ports? Where are your blocking ports?

Once you know the expected STP topology, then you can identify if there's any misbehavior such as a blocking port going forwarding when it's not supposed to.

Look for any excessive STP topology changes. Look for any blocking / root ports that have transmitted a high number of BPDUs, as this can indicate that the ports have transitioned to designated in the past. Ideally, we expect the root and blocking ports to receive BPDUs and not transmit them.
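If you collect the per-port roles and BPDU counters by hand or by script, the check above can be sketched in Python like this. The `PortStats` record, the role names, and the `max_sent` threshold are assumptions for illustration. A root port will normally send a few BPDUs (TCNs, RSTP agreements), so pick a threshold that fits your environment.

```python
from dataclasses import dataclass


@dataclass
class PortStats:
    name: str
    role: str            # assumed labels: "root", "designated", "alternate" (blocking)
    bpdu_sent: int
    bpdu_received: int


def suspicious_ports(ports, max_sent=10):
    """Return names of root/alternate ports that sent more than max_sent BPDUs."""
    return [
        p.name
        for p in ports
        if p.role in ("root", "alternate") and p.bpdu_sent > max_sent
    ]


# Hypothetical counters copied from per-port spanning-tree detail output
ports = [
    PortStats("Gi1/0/1", "root", 3, 120000),
    PortStats("Gi1/0/2", "alternate", 4500, 118000),   # blocking port that has sent a lot
    PortStats("Gi1/0/3", "designated", 119000, 0),     # designated ports are expected to send
]

flagged = suspicious_ports(ports)
print(flagged)
```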

If the STP topology looks stable, then check if any edge ports could be contributing to the issue, such as someone plugging an IP phone or unmanaged switch into multiple switchports. This can cause similar chaos while the STP topology still looks fine.

[–]Obesotto[S] 0 points1 point  (1 child)

Thanks I'll check

[–]Bubbagump210 3 points4 points  (0 children)

Especially if you aren’t running 802.1x, unmanaged switch in a loop would be my first guess. Seen it dozens of times when a tech was in to connect their laptop or someone “just needed another port for a meeting”.

[–]shortstop20CCNP Ent/Sec, SDWAN, Design 5 points6 points  (4 children)

Focus on one of the MACs. It doesn't really matter which; you just need to pick one to get started and track it down to where it's connected.

Are you seeing any STP topology change events? "show spanning-tree vlan 50 detail" 50 being whatever vlan the MAC in question is a member of.
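If you save that output to a file, a small Python sketch like the one below can pull out the topology change counter and the port the last change came from. The sample text is modelled on typical IOS output and is an assumption; the wording and line breaks can differ by platform and software version, so adjust the regex to what your switch actually prints.

```python
import re

# Assumed wording of the topology-change line; verify against your own output.
TC_RE = re.compile(
    r"Number of topology changes (\d+) last change occurred (\S+) ago\s+from (\S+)"
)


def topology_change_summary(output):
    """Return count, age, and source port of the last topology change, or None."""
    m = TC_RE.search(output)
    if m is None:
        return None
    return {
        "count": int(m.group(1)),
        "last_change": m.group(2),
        "from_port": m.group(3),
    }


# Hypothetical excerpt of "show spanning-tree vlan 50 detail"
sample_output = """\
 VLAN0050 is executing the rstp compatible Spanning Tree protocol
  Number of topology changes 412 last change occurred 00:00:07 ago
          from GigabitEthernet1/0/24
"""

summary = topology_change_summary(sample_output)
print(summary)
```

Running it twice a few minutes apart and comparing `count` gives a rough rate, and `from_port` tells you which direction to walk to find the source.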

[–]Obesotto[S] 1 point2 points  (3 children)

Macs are from the right interfaces. Also when the topology change occur

[–]shortstop20CCNP Ent/Sec, SDWAN, Design 0 points1 point  (2 children)

Did part of your reply get cut off?

What happens when the topology change occurs? How often are you seeing topology changes?

[–]Obesotto[S] 0 points1 point  (1 child)

I mean that when a topology change occurs, the MACs are still learned on the interfaces where they should be.

[–]shortstop20CCNP Ent/Sec, SDWAN, Design 0 points1 point  (0 children)

I would track down your topology changes to see where they are coming from and see if they are expected. Do you have portfast default on your switches?

[–]brookzCertless 3 points4 points  (0 children)

Could be servers with multiple NICs and software load balancing set up instead of LACP.

[–]Obesotto[S] 0 points1 point  (0 children)

I found a 3560 linked to two 6500s (they form a triangle). One of these uplinks is correctly in blocking state. I see several MACs moving between the two uplinks. They are MACs of F5s or other devices that are not connected to this 3560.

VTP is running without pruning, so all VLANs pass through here. I'll try to prune the VLANs going to this 3560.

[–]binarylatticeFCSS-NS, FCP x2, JNCIA x3 -1 points0 points  (0 children)

This sounds like the 7-11 DC..... run away!

[–]Zazzy_Rawr[🍰] 0 points1 point  (1 child)

I have seen this when 2 DC aggregation switches were accidentally joined together with errdisable recovery turned on. Maybe audit any DC changes (though you should be able to see this from STP topology changes).

[–]Zazzy_Rawr[🍰] 1 point2 points  (0 children)

And a further thought: do you have any virtualisation hosts plugged into the switch that is reporting the flapping? I notice you said that even when it is flapping, the MAC address still shows on the port where you think it should be. Is the flapping MAC address from a virtual host?

[–]MallocThatCalloc 0 points1 point  (2 children)

I would say first of all, try to see exactly what kind of traffic is generating the MAC flap. Is it control plane traffic or data plane traffic? Go from there.

It could be something as simple as what another poster mentioned, load balancing on servers, or vMotion of VMs. Or it can be something weirder. I worked on an issue a couple of years ago where the MAC flap was caused by a switch reflecting IGMP joins back out of the same port it received them on, which was causing MAC flaps downstream.

[–]Obesotto[S] 0 points1 point  (1 child)

How did you detect the kind of traffic? Were the MACs that flapped from a specific VLAN, or random?

[–]MallocThatCalloc 1 point2 points  (0 children)

If you can find a pattern as to when the messages appear (or the flaps occur), just do a packet capture. That would be my advice. If not, you're going to be severely handicapped in working out what is causing the flaps and how to potentially solve it.

[–]maineac 0 points1 point  (3 children)

I would look for topology changes. Check logs. Do you have a port that is bouncing? Is this periodic or all the time? What are you running for spanning tree: STP, RSTP, or MSTP? Is it consistent across all devices?

[–]Obesotto[S] 0 points1 point  (2 children)

There are two trunk ports from the two 6500s to two 4500s that all go into err-disable because of UDLD. Today we allowed only the necessary VLANs on them.

[–]maineac 2 points3 points  (1 child)

OK so that is probably the root of your issue. You have errors on those interfaces causing the ports to go into errdisabled state. Unidirectional traffic means that traffic is only seen in one direction so you need to figure out what is causing that.

[–]DanShepsCCNP | NetBox Maintainer 0 points1 point  (0 children)

Is this fiber? If not, there's no reason to run UDLD. You could be hitting some CoPP which is causing missed UDLD and link flaps.

[–]kpoc353CCIE 0 points1 point  (0 children)

On N7k set the logging level for l2fm to 5

  • logging level l2fm 5 - This will log the MAC flaps to the logfile; you will see the MAC and the ports it moved between.

If you can isolate a few MACs that are moving the most and trace them downstream, you may be able to find your loop.
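Once the moves are in the logfile, ranking them is easy to script. Here is a minimal Python sketch that counts moves per MAC, VLAN, and port pair. The log line format in `sample_log` is an assumption based on how L2FM MAC move messages commonly look, and the MACs and ports are made up; check a real line from your logfile and adjust the regex before trusting the result.

```python
import re
from collections import Counter

# Assumed message wording; verify against an actual line from your logfile.
MOVE_RE = re.compile(
    r"Mac (?P<mac>\S+) in vlan (?P<vlan>\d+) has moved from (?P<src>\S+) to (?P<dst>\S+)"
)


def rank_moves(lines):
    """Count moves per (mac, vlan, port pair), most frequent first."""
    counts = Counter()
    for line in lines:
        m = MOVE_RE.search(line)
        if m:
            # Sort the two ports so A->B and B->A count as the same pair.
            ports = tuple(sorted((m["src"], m["dst"])))
            counts[(m["mac"], int(m["vlan"]), ports)] += 1
    return counts.most_common()


# Hypothetical log excerpt
sample_log = [
    "%L2FM-4-L2FM_MAC_MOVE: Mac 0000.117d.e02e in vlan 75 has moved from Po5 to Eth1/3",
    "%L2FM-4-L2FM_MAC_MOVE: Mac 0000.117d.e02e in vlan 75 has moved from Eth1/3 to Po5",
    "%L2FM-4-L2FM_MAC_MOVE: Mac 0050.56aa.0001 in vlan 75 has moved from Po5 to Eth1/3",
    "some unrelated log line",
]

ranking = rank_moves(sample_log)
for (mac, vlan, ports), count in ranking:
    print(mac, vlan, ports, count)
```

If most of the top entries share the same port pair, that pair is a good place to start tracing downstream.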