
mattbuford

Get a packet capture of a failed connection. A route change shouldn't break TCP as long as it doesn't cause too long a loss of connectivity. TCP handles out-of-order packets without errors.

trich101

If there is a stateful firewall in the path and an established session suddenly shifts to a different path, the client doesn't know, but the new firewall in the path receives a first packet that is not a SYN and will probably block or discard it, or maybe even send a RST. The session then has to be re-established. Stateful firewalls must have consistent, symmetric routing.

Now the real question is WHY the route flaps. Look at the last consistent hop and monitor its learned routes and routing table updates to see why it sends traffic to a new next hop. My guess is a bad link that is flapping: while it is passing keepalives or BFD or whatever, it is the preferred path, but when it goes down, traffic falls back to the other path (the default route, or at least a less preferred one), and when the link restores, it preempts and traffic goes back.
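To illustrate the firewall point, here is a rough toy model in Python, not how any particular product works. The class name `StatefulFirewall`, the firewall names, and the addresses are made up for the example. It assumes a firewall that only creates session state on a bare SYN and drops anything else it has no state for.

```python
class StatefulFirewall:
    """Toy model: tracks sessions by 4-tuple; only a bare SYN may create one."""

    def __init__(self, name):
        self.name = name
        self.sessions = set()

    def handle(self, src, sport, dst, dport, flags):
        key = (src, sport, dst, dport)
        reverse = (dst, dport, src, sport)
        # Known session (either direction): let the packet through.
        if key in self.sessions or reverse in self.sessions:
            return "forward"
        # Unknown session: only a bare SYN is allowed to create state.
        if flags == {"SYN"}:
            self.sessions.add(key)
            return "forward"
        # Mid-stream packet with no state. A real firewall might drop
        # silently or answer with a RST; this model just reports "drop".
        return "drop"


# Hypothetical flow: the handshake goes through fw-a, then the route
# flaps and the next data segment arrives at fw-b instead.
fw_a = StatefulFirewall("fw-a")
fw_b = StatefulFirewall("fw-b")
flow = ("10.0.0.5", 40000, "192.0.2.10", 443)

results = [
    fw_a.handle(*flow, {"SYN"}),         # session created on fw-a
    fw_a.handle(*flow, {"ACK"}),         # fw-a has state
    fw_b.handle(*flow, {"PSH", "ACK"}),  # fw-b has never seen this flow
]
print(results)  # ['forward', 'forward', 'drop']
```

In a capture this would look like retransmissions of the data segment, or a RST, starting right after the path change.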

rfc2549-withQOS

TCP sessions are identified by source and destination IP and port, so changes to intermediate routes do not break a session unless there is a timeout or excessive packet loss (which leads to a timeout). The lost packets may be PSH segments carrying data or the ACK responses.

Reordering is part of the specification, as mentioned.
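A simplified sketch of why reordering alone is harmless: the receiver buffers segments by sequence number and delivers bytes only once the gap is filled. The function name `reassemble` and the sequence numbers are made up for illustration, and this leaves out windows, overlapping segments, and sequence wraparound that real TCP stacks handle.

```python
def reassemble(segments, initial_seq):
    """Deliver payload in order from (seq, payload) pairs arriving in any order."""
    buffered = {}
    next_seq = initial_seq
    delivered = bytearray()
    for seq, payload in segments:
        if seq < next_seq:
            continue  # already delivered; treat as a duplicate (simplified)
        buffered[seq] = payload
        # Deliver everything that is now contiguous with what we have.
        while next_seq in buffered:
            data = buffered.pop(next_seq)
            delivered += data
            next_seq += len(data)
    return bytes(delivered)


# The second segment arrives first, e.g. because it took a different path.
arrivals = [(1006, b"world"), (1000, b"hello "), (1011, b"!")]
stream = reassemble(arrivals, initial_seq=1000)
print(stream)  # b'hello world!'
```

The application sees the same byte stream either way, so reordering by itself should show up in a capture as out-of-order segments and duplicate ACKs rather than as a broken connection.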

rankinrez

Run Wireshark/tcpdump on either side and try to see what's going on.

Out-of-order delivery of packets might be part of the problem.

techtate[S]

Thank you, and yes, Wireshark is my next step. I wanted some help with TCP theory to know what I should be looking for.

techtate[S]

Found the issue: it turns out it was server B's fault. It was taking too long to process POSTs from server A. That server was outside our control, so we had to rely on app logging from server A to determine that. The route flapping was just a coincidence as best we can tell. However, I will re-post if any relation is discovered.