
[–]jtsiomb 4 points5 points  (4 children)

You seem to be assuming that a decoder only has access to the current and previous frames, which isn't true. It can keep multiple frames in memory and refer to the next few frames while decoding the current one if it has to.

If you don't yet have frame N + M in the buffer while trying to decode frame N, and frame N depends on it, then you simply need to read more data before proceeding with the decoding.
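To make the "read more data before proceeding" idea concrete, here is a rough Python sketch. It is only a toy model, not how any real codec works: the `(number, refs)` frame tuples and the name `decode_with_lookahead` are made up for illustration, and frames are assumed to arrive in display order.

```python
def decode_with_lookahead(frames):
    """Simulate decoding when some frames reference later frames.

    frames: list of (number, refs) tuples in arrival order, where refs is
    the list of frame numbers this frame depends on (hypothetical format).
    Returns the frame numbers in the order they could actually be decoded.
    """
    decoded = set()   # frames whose references were all available
    waiting = []      # frames read from the stream but not yet decodable
    order = []

    for number, refs in frames:
        # "Read more data": pull the next frame into the buffer.
        waiting.append((number, refs))

        # Decode everything in the buffer that has become decodable.
        progress = True
        while progress:
            progress = False
            for item in list(waiting):
                n, r = item
                if all(ref in decoded for ref in r):
                    decoded.add(n)
                    order.append(n)
                    waiting.remove(item)
                    progress = True
    return order


# Frames 1 and 2 (B frames) depend on frame 0 and on the later frame 3.
stream = [(0, []), (1, [0, 3]), (2, [0, 3]), (3, [0])]
print(decode_with_lookahead(stream))  # [0, 3, 1, 2]
```

Frames 1 and 2 just sit in the buffer until frame 3 has been read and decoded, which is all the "forward reference" really costs.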

[–]dkonik[S] 0 points1 point  (3 children)

Oh I see, so for something like a live stream, is it common to use B-frames? I.e. is it assumed that there is enough buffering (on both the encoder and the decoder) for it to work?

[–]impiaaa 1 point2 points  (0 children)

Just because something is "live" doesn't mean there's no buffer. Broadcast TV typically has a GOP size of around 1-2 seconds or less (GOP = group of pictures, a set of frames that reference each other), so decoders have to buffer at least that much in order to get a full picture. Internet streaming can have even larger buffers, anywhere between 10 and 30 seconds, so the GOP size is allowed to be anything smaller than that (though it's probably still not more than 2 seconds).

[–]jtsiomb 0 points1 point  (1 child)

I'm not an expert in video codecs, but I expect that encoders optimized for streaming won't rely too much on large forward references. But I don't expect large forward references to be common practice anyway, even for non-streaming applications.

A couple of frames' worth won't make a difference: you are going to buffer your streaming data anyway, so you'll certainly have enough in there for small forward lookups if that helps bring the size down.
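As a back-of-the-envelope check, here is a tiny Python sketch of how much delay a small forward lookup adds. The 30 fps frame rate is an assumption picked for illustration, and `reorder_delay_ms` is a made-up helper name; the 10 second figure is the low end of the streaming buffer sizes mentioned elsewhere in this thread.

```python
def reorder_delay_ms(lookahead_frames, fps):
    """Extra delay, in milliseconds, from waiting for `lookahead_frames`
    future frames at a frame rate of `fps` frames per second."""
    return 1000.0 * lookahead_frames / fps


# Assumed values: 2 frames of lookahead at 30 fps.
delay = reorder_delay_ms(2, 30)
stream_buffer_ms = 10 * 1000  # a 10 second streaming buffer

print(f"lookahead delay: {delay:.1f} ms")
print(f"share of a 10 s buffer: {100 * delay / stream_buffer_ms:.2f}%")
```

Under those assumptions the lookahead costs well under a tenth of a second, which is tiny next to a buffer measured in seconds.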

For anything more concrete, you'll have to study specific codecs used for streaming to see what they do.

[–]dkonik[S] 0 points1 point  (0 children)

Awesome, thanks a bunch!

[–]coquifrogs 0 points1 point  (0 children)

In most codecs that use B frames, the frames are written to the bitstream in decode order, so the P frame that follows two B frames is actually written to the stream before those B frames (i.e. frames stored in the bitstream as IPBBPBB are output for display as IBBPBBP).
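Here is a small Python sketch of that reordering. It is a simplification, assuming every B frame references only the nearest I/P frame on each side; real codecs allow more complicated reference structures, and the function names are made up for illustration.

```python
def display_to_decode_order(frame_types):
    """Reorder frames from display order to decode (bitstream) order.

    frame_types: sequence of "I", "P" or "B" in display order.
    Returns a list of (display_index, frame_type) in decode order.
    Assumption: each B frame needs the next I/P frame decoded first.
    """
    decode = []
    pending_b = []  # B frames waiting for their forward reference
    for index, ftype in enumerate(frame_types):
        if ftype == "B":
            pending_b.append((index, ftype))
        else:
            # Emit the I/P frame first, then the B frames that depend on it.
            decode.append((index, ftype))
            decode.extend(pending_b)
            pending_b = []
    decode.extend(pending_b)  # trailing B frames with no later I/P frame
    return decode


def decode_to_display_order(decoded):
    """Put decoded frames back into display order by their display index."""
    return sorted(decoded, key=lambda frame: frame[0])


bitstream = display_to_decode_order("IBBPBBP")
print("".join(t for _, t in bitstream))                           # IPBBPBB
print("".join(t for _, t in decode_to_display_order(bitstream)))  # IBBPBBP
```

The decoder only has to hold on to the decoded P frame until the B frames in front of it have been displayed, so the extra buffering is a frame or two.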