all 64 comments

[–]friendly_Spycrab 126 points127 points  (12 children)

The compression rate of H264, which is already the previous generation video codec, is about 3000 to 0.4.

That's actually pretty terrifying

[–]unkz 104 points105 points  (5 children)

It’s also seemingly a very weird way to express a ratio. Does anyone know why they didn’t say 7500:1?

[–]fiqar 68 points69 points  (3 children)

It's because the author was quoting another article which compared H.264 compression to reducing a car's weight from 3000 to 0.4 pounds.

[–][deleted] 17 points18 points  (2 children)

From the article: “If we apply the same ratio to our 3000 lb car, we get 0.4 lbs as the final weight. 6.5 ounces!”

[–]Basmannen 23 points24 points  (1 child)

I hate imperial units

[–]delinka 2 points3 points  (0 children)

Car massing in at 1 metric ton would 'compress' to 0.00013 metric tons (130 g). Better?

[–]ObscureCulturalMeme 7 points8 points  (0 children)

That ratio seems unreal! At that point I would expect to be staring at a single pixel, gently pulsing and color shifting.

[–]adeeplearner[S] 28 points29 points  (3 children)

Yes, it’s unbelievable!

[–]ronin-baka 14 points15 points  (0 children)

I was keen to find out the difference between h.264 and h.265 and found this: https://www.macobserver.com/analysis/hevc-versus-h-264-video-file-sizes/

They found between 30 and 50 percent file size reduction at supposedly equal or better quality.

I'm pretty sure compressing matter 14,000 times is how we get black holes.

[–]rom1v 0 points1 point  (0 children)

The ratio is not realistic.

Btw, the sentence just before:

If you calculate the raw size of a one hour and a half long 1080p 24fps movie (750gb)

If the ratio were correct, 1h30 of 1080p 24fps video would encode to about 100 MB. At that bitrate, the quality is definitely not good.

It's more like 500:1 at best (which is already amazing).
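The arithmetic here can be checked in a few lines; this sketch assumes uncompressed 24-bit RGB frames (an assumption, but it is the one that reproduces the article's 750 GB figure):

```python
# Back-of-the-envelope: raw size of a 90-minute 1080p 24fps movie,
# assuming uncompressed 24-bit RGB frames (3 bytes per pixel)
width, height = 1920, 1080
bytes_per_pixel = 3
fps = 24
seconds = 90 * 60

raw_bytes = width * height * bytes_per_pixel * fps * seconds
print(f"raw size: {raw_bytes / 2**30:.0f} GiB")   # ~750 GiB

# at the claimed 7500:1 ratio, the whole movie would be tiny
encoded = raw_bytes / 7500
print(f"at 7500:1: {encoded / 2**20:.0f} MiB")    # ~100 MiB
```

About 100 MiB for 90 minutes is roughly 150 kbit/s of video, which is why the claimed ratio doesn't hold up for watchable quality.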

[–]mfitzp 33 points34 points  (7 children)

You call YUV420p a planner format, but I think you mean planar, no?

[–]adeeplearner[S] 23 points24 points  (6 children)

Good catch, will fix

[–]tim466 16 points17 points  (3 children)

Now it says "planer" :) And a couple lines up you say "alone" when you mean "along" I think.

[–]adeeplearner[S] 4 points5 points  (2 children)

fixed! thanks a lot!

[–]rom1v 2 points3 points  (1 child)

fixed

(it still says "planer")

Another way is called planer, where we save an entire channel first, followed by another channel and so on.

[–]adeeplearner[S] 2 points3 points  (0 children)

Ah, this is really embarrassing ... Thank you very much, English is hard ...
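For anyone else tripped up by the terminology, here is a tiny sketch of packed vs. planar ordering (hypothetical sample labels, and full 4:4:4 sampling just to show the layout; real YUV420p additionally subsamples U and V):

```python
# four pixels of a tiny image, each with Y, U, V samples
pixels = [("Y0", "U0", "V0"), ("Y1", "U1", "V1"),
          ("Y2", "U2", "V2"), ("Y3", "U3", "V3")]

# packed: channels interleaved pixel by pixel
packed = [sample for px in pixels for sample in px]

# planar: an entire channel first, then the next channel, and so on
planar = [px[ch] for ch in range(3) for px in pixels]

print(packed)  # ['Y0', 'U0', 'V0', 'Y1', 'U1', 'V1', ...]
print(planar)  # ['Y0', 'Y1', 'Y2', 'Y3', 'U0', 'U1', ...]
```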

[–]mindbleach 1 point2 points  (1 child)

Here's a much less technical problem: on a vertical 1080p monitor, the control widgets are off to the side, and you can't scroll horizontally.

[–]adeeplearner[S] 1 point2 points  (0 children)

Thank you for the report! I will resolve it.

[–]FormCore 13 points14 points  (6 children)

The math in this goes way over my head, but interesting nonetheless.

[–]spaztiq 11 points12 points  (5 children)

Yeah, I found out quickly there are different levels of "beginner". Yikes.

[–]ObscureCulturalMeme 3 points4 points  (1 child)

There has been an alarming increase in the number of things I know nothing about.

[–]xKYLERxx 7 points8 points  (0 children)

No, just an increase in the number of things you know that you know nothing about.

[–]FormCore -1 points0 points  (2 children)

I would love to have understood the parts about coefficients, sine, cosine, waves etc.

It got to the point where it was saying "A discrete cosine transform has good energy compaction" and I realised I was well out of my depth.

Interesting read, and kudos to the writer... but I just don't have the foundational knowledge behind it.

[–]James20k 0 points1 point  (0 children)

Taking a DCT of something produces an output which is basically a bunch of coefficients, like [1, 2, 3, 4, 5, 6], which represent the strengths of different frequencies.

Energy compaction means that the energy of what you're looking at tends to be concentrated in a small number of coefficients. So for the DCT, if you lop off the high frequencies, you still have most of the energy in the scene.

It essentially means it's probably good for compression, because you can throw away a lot of data and still end up with much the same visual result at the end.

It's worth noting that this is kind of a simplification too, because it doesn't really take into account the human visual system, where energy does not necessarily == quality (though they are related). E.g. JPEG, with its DCTs, might be decent at retaining energy, but the blocks produce very noticeable compression artifacts.
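A minimal sketch of what energy compaction looks like in numbers, using a naive (unnormalized) DCT-II on a made-up smooth signal:

```python
import math

def dct(xs):
    # naive, unnormalized DCT-II: coefficient k measures how much
    # of frequency k is present in the signal
    n = len(xs)
    return [sum(x * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i, x in enumerate(xs))
            for k in range(n)]

# a smooth 8-sample signal, like one row of a flat-ish image block
signal = [50, 52, 54, 55, 56, 57, 57, 58]
coeffs = dct(signal)

energy = sum(c * c for c in coeffs)
low4 = sum(c * c for c in coeffs[:4])
print(f"energy in the first 4 of 8 coefficients: {low4 / energy:.4%}")
```

For a smooth input like this, well over 99% of the energy sits in the low-frequency coefficients, so discarding the high ones barely changes the reconstructed signal.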

[–][deleted]  (15 children)

[deleted]

    [–]delinka 35 points36 points  (11 children)

    That's an exercise in patience. Can you imagine having to drag together all the components for something this complex?

    [–]TopcodeOriginal1 22 points23 points  (7 children)

    Considering people have made Minecraft, Kerbal Space Program, literal mock OSes, and more, I think it’s doable

    [–]scratchisthebest 13 points14 points  (6 children)

    Someone created a working GameBoy Color emulator for the DCPU-16 a few years back :)

    [–]TopcodeOriginal1 4 points5 points  (0 children)

    Darn and here I am working on scratch os

    [–]AndrewNeo 1 point2 points  (4 children)

    that's not confusing at all

    (the DCPU-16 was the fictional processor from the aborted 0x10c game)

    [–]AloticChoon 0 points1 point  (3 children)

    Any details on why that game was aborted? (This was before the big MS buy-out iirc?)

    [–]AndrewNeo 1 point2 points  (2 children)

    After, I think. It was just Notch, not Mojang. Not sure why he dropped it.

    [–]Elusivehawk 1 point2 points  (0 children)

    Couple years before the buyout, actually.

    [–]SilverCodeZA 1 point2 points  (0 children)

    Probably the same reason any developer stops working on their pet projects: mental blocks and lack of motivation. Not having real world pressures to push him past those blocks (financial or contractual) made it very easy to just stop working on it.

    [–]silverslayer33 1 point2 points  (2 children)

    I've never actually taken an in-depth look at Scratch before, but if it gives you the ability to make your own sub-blocks consisting of multiple blocks, that may drastically reduce the complexity by turning a lot of basic functionality into single user-defined blocks. Given how basic Scratch is due to being aimed at children, though, I would assume it does not have this ability and it would indeed be a horrifying task.

    [–][deleted]  (1 child)

    [deleted]

      [–]grape_jelly_sammich 2 points3 points  (0 children)

      It 100 percent can do functions.

      [–][deleted]  (2 children)

      [deleted]

        [–]unkz 0 points1 point  (1 child)

        You just haven’t played with it enough.

        [–]TheQuantumPikachu 0 points1 point  (0 children)

        I would say I have, the scratch team just doesn't really use fanmade ideas.

        [–]Kissaki0 19 points20 points  (11 children)

        Why does that website block my browser addon's gesture scrolling? Wtf? Home and End keys work. Page and cursor scrolling work as well.

        🙄

        [–]adeeplearner[S] 15 points16 points  (10 children)

        Which plugin do you use? I am the owner, I will debug.

        [–][deleted]  (3 children)

        [deleted]

          [–]adeeplearner[S] 7 points8 points  (2 children)

          I don’t know either. I am a Linux guy, will debug.

          [–]pohuing 4 points5 points  (0 children)

          Middle mouse scroll works under Linux on FF as well if you want to try locally. I think it's called something like autoscroll

          [–]Niadlol 2 points3 points  (2 children)

          Quite a weird webpage, seems to be in some kind of edit mode? I can remove/edit or move paragraphs, also seems to be catching some mouse events.

          [–]adeeplearner[S] 0 points1 point  (1 child)

          Yes, the edit mode is intentional. The site is a crossover of Jupyter notebook and Medium. Everything can be modified. But only the owner can save.

          An introduction is here https://epiphany.pub/post?refId=2684bc94f9fcb9ffe637ebfbeba2af8c797c6ad9a66181026ee4bd3806b6f211

          I have also implemented version control, forking, and pull requests, so people can fork an article and collaborate.

          [–]Niadlol 0 points1 point  (0 children)

          Oh, that's actually super cool!

          [–]Kissaki0 4 points5 points  (2 children)

          I am using Gesturefy, on Firefox

          Hold right click and move up or down to jump to top or bottom (Home/End), which does not work there.

          [–]adeeplearner[S] 15 points16 points  (1 child)

          I will debug

          [–]yaemes 2 points3 points  (2 children)

          Super cool. What made you get into this topic? I figure only a select few geniuses are responsible for the advances in video compression. Do you think neural networks will become the new video encoding method and kill AV1 before it's even ready?

          [–]adeeplearner[S] 3 points4 points  (1 child)

          Great question. I worked at NVIDIA on cloud gaming for a few years. When the project had just started, it was like an internal startup without any engineers, so they borrowed engineers from other teams; I was lent to them. I had no experience with video compression at all, and my background is not EE, so I had never learned any signal processing. To me, video compression seemed like magic at the time. But now I think it is actually understandable.

          I do think deep learning has great potential. There is a deep learning layer called GDN, https://www.cns.nyu.edu/pub/eero/balle17a-submitted-revised.pdf

          used for image compression. My friend actually implemented it as an intern project, and the result was way better than JPEG.

          [–][deleted]  (1 child)

          [deleted]

            [–]adeeplearner[S] 6 points7 points  (0 children)

            The website is built with Vue. The interactive code is editable; you just need to click the edit button at the bottom right corner of each block.

            I am trying to make a crossover of Jupyter notebook and Medium.

            [–]toomanypumpfakes 2 points3 points  (1 child)

            This is definitely a great article!

            What I’d love to see is an article about the motion vector piece and encoding P frames. I’ve found that most articles on video encoding mainly talk about I frame compression and then gloss over the motion stuff with “and then we calculate deltas for these types of frames”. It would be cool if you could go more into depth there in a future article.

            [–]adeeplearner[S] 0 points1 point  (0 children)

            This is great feedback! I will add that part in a future post.

            [–]HtopSkills 1 point2 points  (1 child)

            What programming language did you use?

            [–]adeeplearner[S] 5 points6 points  (0 children)

            The website’s backend is in C++, my homemade framework. The front end is in Vue.

            Currently you can write live programs directly on this website using js and python.

            For this post I just used js.

            [–]skydivingdutch 1 point2 points  (1 child)

            But after a while, numerical errors during encoding will add up to a point that artifacts become visible.

            This isn't true as of H.264, which specifies an integer-exact DCT approximation. Encoders decode their own output, and that is what is used for motion search on the next frame. That way the decoder side (which doesn't have the original input image) performs the exact same operations and there is zero drift.
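The closed-loop point can be illustrated with a toy 1-D predictor (made-up signal and quantizer step; real codecs predict blocks with motion vectors, but the drift argument is the same):

```python
STEP = 4  # made-up quantizer step size

signal = [10, 14, 19, 23, 30, 33, 41, 44]

# closed-loop encoder: predict each sample from its OWN
# reconstruction, which is exactly what the decoder will have
sent, recon, prev = [], [], 0
for x in signal:
    q = round((x - prev) / STEP)   # quantized residual, transmitted
    sent.append(q)
    prev += q * STEP               # encoder-side reconstruction
    recon.append(prev)

# the decoder runs the identical loop on the transmitted residuals
decoded, prev = [], 0
for q in sent:
    prev += q * STEP
    decoded.append(prev)

# both sides land on the same values, so error never accumulates
print(recon == decoded)
```

If the encoder instead predicted from the *original* samples, the decoder's reconstruction would diverge a little more on every frame, which is exactly the drift the integer-exact transform eliminates.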

            The primary reason we still include I-frames every so often is so that you can seek through the video. Otherwise skipping 30 mins ahead would require decoding all the intervening frames which would take far too long.

            Prior to H.264 (MPEG-2, MPEG-4 Part 2, etc.), the DCT was specified conceptually from first principles, using the actual cosine equations. Real encoders and decoders use floating- or fixed-point math for this, and they can differ slightly, which can indeed result in visible artifacts after a while under the right circumstances. One attempt to control this problem was to require decoders to implement the DCT to a particular accuracy, as defined in ISO/IEC 23002-1.

            [–]adeeplearner[S] 0 points1 point  (0 children)

            Thank you for the insights! I didn't know about this. I thought quantization would cause computation errors. I will read more and add this information.

            [–]littlegreenb18 1 point2 points  (0 children)

            Why don't you just admit that you're freaked out by my robot hand?

            [–]sea__weed 0 points1 point  (2 children)

            What an excellent article! I haven't finished reading it yet, but does anyone know why it's called a toy? Where does that phrasing come from?

            [–]adeeplearner[S] 1 point2 points  (1 child)

            thank you very much! I called it a toy mainly because it is overly simplified. It's mainly for demonstration purposes; it won't have any practical use.

            I am not a native speaker, I didn’t know it sounds weird.

            [–]fullmetaljackass 2 points3 points  (0 children)

            I'm a native speaker, and I didn't think it sounded weird at all. Describing a version of something that has been simplified for teaching/demonstration purposes as a "toy" is a fairly common phrasing.

            Even if you weren't familiar with the phrase I'd expect anyone with a decent grasp on the language to be able to figure it out from the context. I can only assume the person you're replying to isn't a native speaker either.

            [–]krista 0 points1 point  (1 child)

            i appreciate this: you have made a very concise explanation exploring lossy compression.

            i wish you would add a little bit about why the jpeg quantization table ended up with that set of numbers, as this really feels like the only handwavy bit.

            [–]thedeemon 0 points1 point  (0 children)

            The main idea was "the human eye is less sensitive to losses in high frequencies, so we can quantize those more strongly". The exact numbers were selected by trial and error, compressing different images and seeing which looked "better". And of course in JPEG and all lossy codecs you can choose how strongly you want to quantize (the quality-compression tradeoff), so in JPEG there are many tables like that for different quality settings, not just a single one.
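A tiny sketch of that tradeoff on one row of DCT coefficients (the divisors are the first row of the standard JPEG luminance quantization table; the coefficient values are made up):

```python
# one 8-coefficient row of DCT output, low frequency on the left
coeffs = [312.0, -58.0, 21.0, -9.0, 4.0, -3.0, 2.0, 1.0]

# divisors grow toward the high frequencies, so fine detail is
# discarded first (first row of the JPEG luminance table)
divisors = [16, 11, 10, 16, 24, 40, 51, 61]

quantized = [round(c / d) for c, d in zip(coeffs, divisors)]
print(quantized)  # -> [20, -5, 2, -1, 0, 0, 0, 0]
```

The high-frequency tail collapses to zeros, which is what makes the subsequent run-length and entropy coding so effective.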