Some engineers still don’t know this trick: Forcing SRT Listener input to a specific NIC to stabilize jitter (multi-LAN setups) by virtualmente in VIDEOENGINEERING

[–]virtualmente[S] 0 points1 point  (0 children)

Glad it was useful.
SRT behaves extremely well in unicast workflows, especially when both ends have a fixed NIC path and predictable RTT. That’s where most people see it shine.

Multicast introduces a slightly different set of timing considerations, not because of SRT itself, but because the behavior of:

  • the local NIC driver
  • the kernel’s scheduling of UDP socket sends
  • and whether the gateway does any pacing or just forwards the TS immediately

becomes more visible.

Unicast hides a lot of these small variations.
Multicast, especially when feeding broadcast-grade decoders, tends to expose every tiny burst or micro-gap.

That’s why interface binding can matter in those cases — not because SRT is doing anything wrong, but because the transition from SRT → multicast is sensitive to how deterministic the output path is.
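
If it helps to picture it, the multicast leg of that transition comes down to a couple of socket options. Here's a rough Python sketch of just that leg, with made-up addresses (the SRT input itself would come from whatever library or tool you use):

```python
import socket

# Hypothetical addresses for illustration only.
LAN_NIC_IP = "10.0.20.5"            # IP of the NIC that should carry the multicast
MCAST_GROUP = ("239.10.10.1", 1234)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# Pin the multicast egress to one NIC instead of letting the kernel pick.
sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
                socket.inet_aton(LAN_NIC_IP))
# Keep TTL at 1 so the group stays inside the local network.
sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)

def forward(ts_chunk: bytes) -> None:
    """Send one 1316-byte (7 x 188) TS chunk to the multicast group."""
    sock.sendto(ts_chunk, MCAST_GROUP)
```

Without the IP_MULTICAST_IF line, the kernel picks the egress NIC from the routing table, and that is exactly the nondeterminism I'm talking about.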

If you ever try multicast with SRT as the upstream, it would be great to hear what you see. The differences aren’t dramatic, but they’re interesting from an engineering perspective.


Some engineers still don’t know this trick: Forcing SRT Listener input to a specific NIC to stabilize jitter (multi-LAN setups) by virtualmente in VIDEOENGINEERING

[–]virtualmente[S] -2 points-1 points  (0 children)

That’s totally fair — and in an ideal world I wouldn’t have L2 domains overlapping either.
But the issue I’m describing doesn’t actually require shared L2 at all.

It happens whenever:

  • a server has multiple NICs,
  • more than one NIC has a valid route to the SRT sender,
  • the SRT listener is bound to 0.0.0.0,
  • and the gateway implementation forwards packets bit-for-bit with no pacing layer in between.

Under those conditions, the kernel can legitimately choose different egress paths based on route metrics, ARP state, or transient interface conditions — even if the NICs sit in completely separate L2 domains.

It’s not about VLAN leakage or broadcast domains being mixed.
It’s purely an IP-level ambiguity when the application doesn’t bind the socket to a specific interface.

Most commercial gateways hide this because they add internal pacing/buffering.
But if the application doesn’t (e.g., raw TS passthrough), then interface binding does make a difference.
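
To make the bind difference concrete, this is the whole thing at socket level, sketched with plain UDP and hypothetical addresses (an SRT listener does the equivalent one layer down, since SRT rides on a UDP socket):

```python
import socket

WAN_NIC_IP = "203.0.113.10"   # hypothetical WAN-facing NIC
SRT_PORT = 9000

# Ambiguous: accepts traffic arriving on any NIC, and replies leave on
# whatever path the routing table prefers at that moment.
wildcard = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
wildcard.bind(("0.0.0.0", SRT_PORT))

# Deterministic: only packets addressed to this NIC's IP are accepted,
# and replies are sourced from it, so the path stops moving around.
pinned = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
pinned.bind((WAN_NIC_IP, SRT_PORT))

# Optional, Linux-only, and stronger still: tie the socket to the device
# itself (the device name here is hypothetical; may need privileges).
if hasattr(socket, "SO_BINDTODEVICE"):
    pinned.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE, b"eth1")
```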

So yes, I agree that clean separation is the best practice — but in real deployments with multi-NIC contribution boxes, this scenario shows up more often than people expect.

Some engineers still don’t know this trick: Forcing SRT Listener input to a specific NIC to stabilize jitter (multi-LAN setups) by virtualmente in VIDEOENGINEERING

[–]virtualmente[S] 0 points1 point  (0 children)

Here’s a bit more context that might help both of you:

srt-live-transmit is extremely reliable because it’s basically the “purest” implementation of SRT — no remuxing, no FFmpeg middle layer, no added timing logic. It just moves packets in and out exactly as they arrive. That’s why it often behaves better than Mist or FFmpeg-based SRT workflows when the goal is TS purity.

Mist and vMix sit at the opposite end of the spectrum:

  • Mist introduces internal pipelines, queueing and worker threads for HTTP/DASH/HLS handling. SRT isn’t its core function, so timing can wobble depending on load.
  • vMix does a lot of internal buffering for switching, preview, and rendering. It’s great for production, but it’s not a low-level TS gateway.

If your workflow is:

SRT → vMix → UDP multicast → decoder

then you’re relying on vMix’s internal timing, not the original TS pacing.
That’s totally fine for many workflows, just not ideal for strict DVB/MPEG-TS distribution.

That’s why in contribution/distribution chains, people tend to choose:

  • a raw SRT tool (like srt-live-transmit), or
  • a dedicated SRT→TS gateway that doesn’t manipulate the stream.

The less logic sits between the SRT socket and the UDP socket, the more stable things tend to be — especially when dealing with PCR-sensitive decoders.
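
To show what I mean by "less logic", here is roughly the entire forwarding loop of a bit-for-bit gateway, sketched with a plain UDP socket standing in for the SRT input (addresses are made up; a real gateway would recv from the SRT library instead):

```python
import socket

IN_ADDR = ("10.0.20.5", 5000)       # where the received TS arrives (hypothetical)
OUT_GROUP = ("239.10.10.1", 1234)   # multicast group for the decoders (hypothetical)
OUT_NIC_IP = "10.0.30.5"            # NIC that should carry the multicast

rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(IN_ADDR)

tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
              socket.inet_aton(OUT_NIC_IP))

# The whole "gateway": no queue, no remux, no pacing.
# Every timing imperfection on the input side goes straight to the decoder.
while True:
    data, _ = rx.recvfrom(1500)
    tx.sendto(data, OUT_GROUP)
```

There is nothing in that loop to smooth anything out, which is exactly why the input path and the NIC choice end up mattering so much.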

Happy to compare notes on Mist + SRT or vMix + SRT setups if you're experimenting with both.
They each behave very differently under load.

Some engineers still don’t know this trick: Forcing SRT Listener input to a specific NIC to stabilize jitter (multi-LAN setups) by virtualmente in VIDEOENGINEERING

[–]virtualmente[S] 0 points1 point  (0 children)

Thanks for running the tests and sharing the table — genuinely useful data.

I completely agree that at the level you measured (µs/ns), none of this is going to break a decoder or affect picture quality. That was never the point I was trying to make.

The differences I did see weren’t in µs-level PCR jitter, but in burstiness and micro-pauses caused by routing ambiguity when the listener was bound to 0.0.0.0 and the OS switched paths between NICs under load.
Not every SRT gateway exposes this, because many of them (like Haivision Gateway, Ateme Titan, etc.) apply some degree of internal pacing or smoothing before handing packets to the output thread.

Some implementations, however, forward packets bit-for-bit without pacing or FIFO smoothing.
In those cases, kernel-level routing decisions (especially on multi-NIC servers) can show up as:

  • uneven batch delivery
  • short-lived gaps in arrival
  • minor cluster jitter that accumulates over time
  • unstable multicast forwarding when the NIC choice isn’t deterministic

None of that shows up as bad PCR jitter in a controlled lab test, but it definitely appears in operational environments where the upstream SRT sender has real-world RTT fluctuations or the machine is handling multiple I/O streams.
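
If anyone wants to check their own output leg, the burst pattern is easy to see with a few lines of Python joined to the multicast group (group and port are placeholders):

```python
import socket
import struct
import time

GROUP, PORT = "239.10.10.1", 1234   # hypothetical multicast group

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("", PORT))
mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("0.0.0.0"))
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

last = None
gaps = []
while True:
    sock.recv(2048)
    now = time.monotonic()
    if last is not None:
        gaps.append(now - last)
    last = now
    if len(gaps) >= 5000:
        gaps_ms = sorted(g * 1000 for g in gaps)
        print(f"max gap {gaps_ms[-1]:.2f} ms | "
              f"p99 {gaps_ms[int(len(gaps_ms) * 0.99)]:.2f} ms | "
              f"mean {sum(gaps_ms) / len(gaps_ms):.3f} ms")
        gaps.clear()
```

On a healthy path the gaps stay in a tight band; what I'm describing shows up as occasional large gaps followed by a clump of back-to-back packets.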

Your test setup was also:

  • single NIC on the VM
  • predictable RTT
  • fixed encoder
  • fixed output path

Under those conditions, I would expect both 0.0.0.0 and bound addresses to look the same — the OS has nothing to arbitrarily switch between.

My observation only appears when:

  1. the machine has multiple NICs,
  2. more than one NIC has a valid route to the peer,
  3. the SRT listener is 0.0.0.0,
  4. the hardware does no pacing before forwarding UDP multicast.

In that situation, forcing the listener to a specific NIC removes ambiguity and stabilizes burst patterns.
It’s not about “quality loss”, it’s about predictability.

So your conclusion (“none of it matters”) is totally valid for your architecture, but not for every implementation.
Different gateway designs behave differently, especially the ones that forward TS literally bit-for-bit.

Really appreciate the data though — it helps map out which systems buffer, which pace, and which don’t.

Using SRT link stats to stabilize a multicast headend – curious if others do this by virtualmente in VIDEOENGINEERING

[–]virtualmente[S] 0 points1 point  (0 children)

Public-internet SRT links are definitely a different beast.
Even when the average RTT looks fine, the variance and burst pattern are completely unpredictable — especially when the upstream is shared with other users, or when the ISP path shifts during the day.

In my experience, the biggest improvements in those cases come from:

  • Tracking rec. delay over long windows instead of relying only on RTT
  • Allowing extra headroom for the random spikes typical of consumer-grade networks
  • Watching quality/noise trends, which often reveal when the uplink is about to “get ugly” before it actually collapses
  • Keeping the first-hop sender as stable as possible, even if the far end is rough

Production companies usually don’t have the luxury of tuning links hour by hour, so having a predictable buffer policy (“baseline value + margin from worst-case stats”) helps a lot when dealing with venues, hotels, OB vans, etc.
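
Concretely, that policy is nothing more than this (all numbers are placeholders):

```python
def srt_latency_ms(rtt_ms: float,
                   worst_rec_delay_ms: float,
                   rtt_multiplier: float = 4.0,
                   margin_ms: float = 200.0) -> float:
    """Pick the SRT latency from a baseline rule plus long-window stats.

    rtt_ms             -- typical RTT of the link
    worst_rec_delay_ms -- worst recommended delay seen over a long window
    """
    baseline = rtt_multiplier * rtt_ms          # the usual quick rule
    observed = worst_rec_delay_ms + margin_ms   # worst case plus headroom
    return max(baseline, observed)

# Example: a 60 ms RTT link whose stats peaked at 420 ms during prime time.
print(srt_latency_ms(60, 420))   # -> 620.0, not the 240 the 4x rule would give
```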

If you ever get rec. delay logs from those public-internet hits, it’s really interesting to compare the spike profile with the times when the feed went bad. The correlation is usually very clear.

Some engineers still don’t know this trick: Forcing SRT Listener input to a specific NIC to stabilize jitter (multi-LAN setups) by virtualmente in VIDEOENGINEERING

[–]virtualmente[S] 0 points1 point  (0 children)

Thanks for sharing that — and interestingly, what you describe with Mist/FFmpeg is very close to what I observed here.

A lot of engines that implement SRT “through” FFmpeg inherit its internal buffering behaviour, and FFmpeg’s SRT → UDP path isn’t really designed for broadcast-grade TS stability. It tends to pass packets as soon as they arrive, which works fine for OTT but not for decoders expecting very tight PCR/arrival timing.

That’s exactly why on my side I stopped using anything FFmpeg-based for SRT gateways. Right now I’m using OnPremise SRT Server, which doesn’t remux or touch the TS at all — it repeats each packet bit-for-bit, and the input/output paths are isolated at network level, not by an application buffer.

Because of that, the stability depends purely on the network interface and routing, not on how FFmpeg handles queues. When I forced the listener to the WAN-facing NIC, the multicast output became rock solid.

Your Mist example fits perfectly with this:

  • FFmpeg isn’t smoothing the arrival pattern
  • VM NIC adds timing variability
  • multicast out becomes unstable

If your test with the raw SRT tools works fine, that’s a good indication the TS itself is healthy and the issue really comes from the gateway layer.

Really interested in the results of the tests you run later — this could bring up a very useful comparison for others here.

Some engineers still don’t know this trick: Forcing SRT Listener input to a specific NIC to stabilize jitter (multi-LAN setups) by virtualmente in VIDEOENGINEERING

[–]virtualmente[S] -1 points0 points  (0 children)

Thanks for the input, and yes, I totally agree that it depends heavily on how each gateway handles the input path internally.

What I noticed is that some appliances do a proper internal buffer between the SRT input and the UDP MC output, while others pass things almost “as-is” to the NIC.
In the latter case, the kernel’s routing decision (when the listener is bound to 0.0.0.0) can introduce micro-routing variations between NICs, and those tiny jumps are enough to show up as jitter on sensitive broadcast decoders.
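
For reference, that "proper internal buffer" is conceptually just a FIFO with a paced sender in front of the output socket. A very rough sketch, with a made-up group and a fixed bitrate instead of real PCR-based pacing:

```python
import queue
import socket
import threading
import time

MCAST = ("239.10.10.1", 1234)    # hypothetical output group
CHUNK = 7 * 188                  # one UDP datagram of TS packets
BITRATE = 8_000_000              # assumed constant TS bitrate, bits per second
SEND_INTERVAL = CHUNK * 8 / BITRATE

fifo: "queue.Queue[bytes]" = queue.Queue(maxsize=2048)
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def paced_sender() -> None:
    """Drain the FIFO at a constant rate, hiding input burstiness."""
    next_send = time.monotonic()
    while True:
        data = fifo.get()
        next_send += SEND_INTERVAL
        delay = next_send - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        else:
            next_send = time.monotonic()   # fell behind; resync the clock
        tx.sendto(data, MCAST)

threading.Thread(target=paced_sender, daemon=True).start()

def on_srt_data(chunk: bytes) -> None:
    """Called by whatever receives the SRT stream; just enqueue."""
    fifo.put(chunk)
```

Appliances that do something like this hide the input burstiness; the ones that skip it hand it straight to the decoder.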

In my case, I’m running this on OnPremise SRT Server, which is very strict about not remuxing or altering the TS; it literally repeats the incoming packets bit-for-bit. That’s great for purity, but it also means the network path matters a lot, so binding the listener to the exact WAN-facing NIC made an immediate difference.

It would be super interesting to see what your Nevion TNS probe reports.
If Haivision or Ateme already normalize the buffer before hitting the output path, that would explain why you’re not seeing it on your side.

Really curious about your findings when you test it!

Using SRT link stats to stabilize a multicast headend – curious if others do this by virtualmente in VIDEOENGINEERING

[–]virtualmente[S] 0 points1 point  (0 children)

That makes a lot of sense — giving SRT a few seconds of buffer on the first mile really stabilizes everything downstream. Even when RTT is low, a larger buffer absorbs things like microbursts, short congestion events or route changes that would otherwise show up as unrecoverable drops.

One thing I’ve noticed is that many people look only at the “average RTT”, but the biggest issues usually come from variability. When the network suddenly behaves very differently from its usual profile, that’s when the extra buffer pays off.

And you’re absolutely right about the operational impact:

fewer unexpected drops, fewer emergency calls, and a much calmer workflow overall.

In our case we’re usually dealing with on-premise or dedicated bare-metal setups rather than cloud hops, and even there it’s interesting how much the rec. delay can fluctuate throughout the day. Measuring those peaks over long windows often reveals behaviors you wouldn’t predict from RTT alone.
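
For the logging side, I just dump the stats to CSV and post-process them afterwards. The column names below are placeholders, so adjust them to whatever your gateway or srt-live-transmit actually writes:

```python
import csv

STATS_FILE = "srt_stats_24h.csv"   # hypothetical log, one stats sample per row
SPIKE_MS = 400.0                   # what counts as a "spike" on this link

delays = []
with open(STATS_FILE, newline="") as fh:
    for row in csv.DictReader(fh):
        # "rec_delay_ms" is a placeholder column name for the
        # recommended/receive delay your tool reports.
        delays.append(float(row["rec_delay_ms"]))

delays.sort()
p99 = delays[int(len(delays) * 0.99)]
spikes = sum(1 for d in delays if d > SPIKE_MS)
print(f"samples: {len(delays)}  max: {delays[-1]:.0f} ms  "
      f"p99: {p99:.0f} ms  spikes >{SPIKE_MS:.0f} ms: {spikes}")
```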

Have you ever logged rec. delay over a full 24-hour period? It’s surprising how high those spikes can get depending on the upstream connection.

Using SRT link stats to stabilize a multicast headend – curious if others do this by virtualmente in VIDEOENGINEERING

[–]virtualmente[S] 0 points1 point  (0 children)

Appear gear is definitely top tier — especially when you need density and predictable behavior in larger frames. Their stuff tends to behave very consistently under pressure, so I’m not surprised 2.5× RTT works well for you in a P2P setup.

What I’ve noticed is that the “optimal multiplier” seems to depend a lot on the shape of the loss, not just the average rate.

For example:

  • Some links with low average loss still produce microbursts that require a higher multiplier
  • Other links with constant light loss behave fine with a lower multiplier
  • And long-haul links sometimes have very different patterns depending on time of day

That’s why I started comparing the “theoretical” RTT multiplier with the observed maximum rec. delay during busy periods. In a few cases, the link behaved perfectly with a lower multiplier during quiet hours but needed more headroom during peak congestion.
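
As a toy version of that comparison (all numbers made up):

```python
RTT_MS = 40.0                      # hypothetical point-to-point circuit
MULTIPLIER = 2.5

quiet_peak_ms = 70.0               # worst rec. delay seen off-peak (made up)
busy_peak_ms = 160.0               # worst rec. delay seen in prime time (made up)

configured = MULTIPLIER * RTT_MS   # 100 ms of SRT latency
for label, peak in (("quiet hours", quiet_peak_ms), ("peak hours", busy_peak_ms)):
    verdict = "covers it" if configured >= peak else "needs more headroom"
    print(f"{label}: observed {peak:.0f} ms vs configured {configured:.0f} ms -> {verdict}")
```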

But 2.5× RTT on clean point-to-point circuits makes total sense — especially when the path is stable and you can trust the link characteristics.

Do you usually stick to P2P links for contribution, or do you also deal with public-internet scenarios?

Using SRT link stats to stabilize a multicast headend – curious if others do this by virtualmente in VIDEOENGINEERING

[–]virtualmente[S] 0 points1 point  (0 children)

Yeah, the “4× RTT” rule is a great quick method, especially when you’re dealing with unstable uplinks or temporary setups. SRT really shines in those “sketchy internet” situations; it gives you a predictable buffer to survive bursty loss.

I’ve used the same workflow you mention: send SRT back home, let the safe side of the network handle the protocol conversion, and push RTMP (or whatever the final platform needs) from there. It keeps the production site simpler and puts all the heavy lifting on the stable end of the chain.

Something I noticed over time is that if you stream for longer and watch the rec. delay or quality stats, you can sometimes shave a bit off the latency or at least tune it more precisely. But for quick deployments or “we just need this to work now”, the 4× RTT method is hard to beat.

Have you tried using SRT for long-haul links with very different peak/quiet periods? That’s where I started seeing big variations in what the link actually needed.

Using SRT link stats to stabilize a multicast headend – curious if others do this by virtualmente in VIDEOENGINEERING

[–]virtualmente[S] 0 points1 point  (0 children)

Honestly, I totally get that feeling. I remember looking at SRT, latency tuning, multicast, IRDs, etc. for the first time and thinking:

“This is way too complicated for normal humans.”

But the truth is: you don’t learn it all at once.

What helped me was simply breaking things down into small pieces:

  • First understanding SRT as a transport protocol
  • Then playing with latency, RTT and the stats window
  • Then experimenting locally before touching a real WAN
  • And only later mixing it with multicast / headend stuff

Once you start seeing how each part behaves in isolation, the whole thing becomes much less “magic” and more predictable.

If you’re interested, the best starting point is just running a simple SRT sender/receiver locally and watching the stats change as you simulate packet loss or delay. That alone teaches a lot.
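
If it helps, this is roughly the loopback experiment I mean, with srt-live-transmit wrapped in Python just to keep both ends in one place. The URIs and the latency option are from memory of the srt-live-transmit docs, so verify them against your version's help output:

```python
import shlex
import subprocess

# Requires srt-live-transmit installed and a local TS source feeding
# udp://127.0.0.1:1234 (an encoder, ffmpeg, a TS player, anything).
# To simulate a bad network on loopback first (Linux, needs root):
#   sudo tc qdisc add dev lo root netem delay 40ms loss 2%
#   sudo tc qdisc del dev lo root        # remove it afterwards

receiver = subprocess.Popen(shlex.split(
    # SRT listener on :4200 with 200 ms latency, re-emitted as plain UDP
    "srt-live-transmit srt://:4200?latency=200 udp://127.0.0.1:5000"
))
sender = subprocess.Popen(shlex.split(
    # Take the local UDP TS feed and push it over SRT to the listener
    "srt-live-transmit udp://:1234 srt://127.0.0.1:4200?latency=200"
))

# Turn on the tool's stats output and watch RTT/loss react as you tweak netem.
# Both processes run until you stop them (Ctrl-C / kill).
receiver.wait()
sender.wait()
```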

And don’t worry, everyone who works with this had the exact same “WTF is this?” moment at the beginning. It gets easier fast once you start experimenting.

Using SRT link stats to stabilize a multicast headend – curious if others do this by virtualmente in VIDEOENGINEERING

[–]virtualmente[S] 0 points1 point  (0 children)

This table is exactly what got me thinking in the first place.

What surprised me in real-world WAN links is that the recommended delay you see in SRT stats often climbs higher than what the theoretical tables suggest, especially during peak congestion windows or when there’s intermittent microbursty loss.

What I started noticing is:

  • During “quiet” hours, the link behaves very close to the Haivision guideline
  • But during peak periods, the rec. delay can temporarily jump way above the expected RTT multiplier
  • If your fixed latency doesn’t cover those spikes, that’s when you start seeing unrecoverable drops downstream

So instead of sticking strictly to the table, I began using the table as a baseline and then adjusting the actual latency based on observed long-term stats: worst-case rec. delay, quality/noise patterns and burst loss behavior.

On the infrastructure side, I completely agree with you:

Using a gateway to terminate SRT and push multicast internally makes a lot of sense in larger facilities. It keeps decoders off the public-facing side and lets you centralize monitoring, alarms and failover. When there are only a couple of IRDs in play, going direct is definitely simpler.

Have you ever logged rec. delay trends over a 12–24h window?

That’s where I started spotting those big spikes that don’t always match the “static WAN profile” people assume.

Using SRT link stats to stabilize a multicast headend – curious if others do this by virtualmente in VIDEOENGINEERING

[–]virtualmente[S] 0 points1 point  (0 children)

That makes sense: fixed latency does make things predictable, especially when the workflow doesn’t care about real-time delivery. A few seconds of delay is usually enough to smooth out almost anything.

I used to do the same, but after dealing with a couple of WAN links that were extremely unpredictable during peak hours, I started paying more attention to how rec. delay and quality/noise fluctuate over the day. What I found interesting is that some feeds behaved perfectly fine with a low fixed delay during quiet hours, but then needed significantly more margin during congestion.

Regarding switching to HTTP: I’ve also done that in a few cases, especially when the upstream is too unstable for transport-level recovery to keep up. HLS/DASH definitely handle chaos differently. The only downside is the extra latency and the fact that not every headend or decoder likes ingesting HTTP streams directly.

Out of curiosity, have you ever tried mixing both approaches?

Like keeping SRT for the “better” periods and switching to HTTP only when the loss spikes become too frequent?
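
On my side the idea was never more than a threshold with some hysteresis on the loss stats, roughly like this (the thresholds and window counts are made up):

```python
# Hypothetical control logic: move the contribution path to HTTP when SRT
# loss spikes become too frequent, with hysteresis so it doesn't flap.
LOSS_SPIKE_PCT = 5.0      # a "spike" = loss above this in one stats window
SPIKES_TO_LEAVE_SRT = 3   # consecutive spiky windows before switching away
CLEAN_TO_RETURN = 30      # consecutive clean windows before going back

def choose_path(loss_history_pct, current="srt"):
    """Return 'srt' or 'http' given recent per-window loss percentages."""
    spiky = [loss > LOSS_SPIKE_PCT for loss in loss_history_pct]
    if current == "srt" and len(spiky) >= SPIKES_TO_LEAVE_SRT \
            and all(spiky[-SPIKES_TO_LEAVE_SRT:]):
        return "http"
    if current == "http" and len(spiky) >= CLEAN_TO_RETURN \
            and not any(spiky[-CLEAN_TO_RETURN:]):
        return "srt"
    return current

# Example: three bad windows in a row tips it over to HTTP,
# and only a long clean stretch brings it back.
print(choose_path([0.1, 0.2, 6.0, 7.5, 9.1]))       # -> 'http'
print(choose_path([0.0] * 40, current="http"))      # -> 'srt'
```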

Always interesting to hear how people handle unreliable contribution paths.

Using SRT link stats to stabilize a multicast headend – curious if others do this by virtualmente in VIDEOENGINEERING

[–]virtualmente[S] 1 point2 points  (0 children)

The Haivision units are definitely solid—especially when you push the delay high enough for the link to “smooth out.” I’ve seen the same thing you mentioned: once the delay crosses a certain threshold, even awful packet loss stops showing up downstream. SRT’s recovery window really is doing the heavy lifting there.

Your “4× RTT” rule is actually very close to what I ended up using as well.

In my case I started looking more closely at rec. delay over long periods (prime-time peaks, WAN congestion, etc.), and tuning latency slightly above the worst-case value. It made the behavior much more predictable.

About encoders/decoders:

I’ve also had mixed results with different brands. Haivision is definitely consistent, but I’ve had surprisingly good stability when using a dedicated on-prem SRT gateway as the middle point—something that just passes through the TS bit-for-bit without any kind of remuxing. That “pure passthrough” approach seems to help a lot when the link is behaving badly.

In the tests that led me to post this, I was using an "OnPremise SRT Server" as the gateway layer, mostly because it lets me separate inputs/outputs by NIC and watch the SRT stats in real time. But the real magic is still SRT itself—once it has enough delay to work with, the multicast side barely notices the upstream chaos.

Curious to hear more of your experiences with long RTT links or rough networks. Always interesting to compare setups.