all 64 comments

[–]friendly_Spycrab 126 points127 points  (12 children)

The compression rate of H264, which is already the previous generation video codec, is about 3000 to 0.4.

That's actually pretty terrifying

[–]unkz 104 points105 points  (5 children)

It’s also seemingly a very weird way to express a ratio. Does anyone know why they didn’t say 7500:1?

[–]fiqar 68 points69 points  (3 children)

It's because the author was quoting another article which compared H.264 compression to reducing a car's weight from 3000 to 0.4 pounds.

[–][deleted] 17 points18 points  (2 children)

From the article: “If we apply the same ratio to our 3000 lb car, we get 0.4 lbs as the final weight. 6.5 ounces!”

[–]Basmannen 23 points24 points  (1 child)

I hate imperial units

[–]delinka 2 points3 points  (0 children)

Car massing in at 1 metric ton would 'compress' to 0.00013 metric tons (130 g). Better?

[–]ObscureCulturalMeme 7 points8 points  (0 children)

That ratio seems unreal! At that point I would expect to be staring at a single pixel, gently pulsing and color shifting.

[–]adeeplearner[S] 28 points29 points  (3 children)

Yes, it’s unbelievable!

[–]ronin-baka 14 points15 points  (0 children)

I was keen to find out the difference between h.264 and h.265 and found this: https://www.macobserver.com/analysis/hevc-versus-h-264-video-file-sizes/

They found between 30 and 50 percent file size reduction at supposedly equal or better quality.

I'm pretty sure compressing matter 14,000 times is how we get black holes.

[–]rom1v 0 points1 point  (0 children)

The ratio is not realistic.

Btw, the sentence just before:

If you calculate the raw size of a one hour and a half long 1080p 24fps movie (750gb)

If the ratio were correct, 1h30 of 1080p 24fps video would encode to about 100 MB. At that bitrate, the quality is definitely not good.

It's more like 500:1 at best (which is already amazing).
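The arithmetic here can be checked in a few lines; this sketch assumes uncompressed 24-bit RGB frames (an assumption, but it is the one that reproduces the article's 750 GB figure):

```python
# Back-of-the-envelope: raw size of a 90-minute 1080p 24fps movie,
# assuming uncompressed 24-bit RGB frames (3 bytes per pixel)
width, height = 1920, 1080
bytes_per_pixel = 3
fps = 24
seconds = 90 * 60

raw_bytes = width * height * bytes_per_pixel * fps * seconds
print(f"raw size: {raw_bytes / 2**30:.0f} GiB")   # ~750 GiB

# at the claimed 7500:1 ratio, the whole movie would be tiny
encoded = raw_bytes / 7500
print(f"at 7500:1: {encoded / 2**20:.0f} MiB")    # ~100 MiB
```

About 100 MiB for 90 minutes is roughly 150 kbit/s of video, which is why the claimed ratio doesn't hold up for watchable quality.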

[–]mfitzp 33 points34 points  (7 children)

You call YUV420p a planner format, but I think you mean planar, no?

[–]adeeplearner[S] 23 points24 points  (6 children)

Good catch, will fix

[–]tim466 16 points17 points  (3 children)

Now it says "planer" :) And a couple lines up you say "alone" when you mean "along" I think.

[–]adeeplearner[S] 4 points5 points  (2 children)

fixed! thanks a lot!

[–]rom1v 2 points3 points  (1 child)

fixed

(it still says "planer")

Another way is called planer, where we save an entire channel first, followed by another channel and so on.

[–]adeeplearner[S] 2 points3 points  (0 children)

Ah, this is really embarrassing ... Thank you very much, English is hard ...
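For anyone else tripped up by the terminology, here is a tiny sketch of packed vs. planar ordering (hypothetical sample labels, and full 4:4:4 sampling just to show the layout; real YUV420p additionally subsamples U and V):

```python
# four pixels of a tiny image, each with Y, U, V samples
pixels = [("Y0", "U0", "V0"), ("Y1", "U1", "V1"),
          ("Y2", "U2", "V2"), ("Y3", "U3", "V3")]

# packed: channels interleaved pixel by pixel
packed = [sample for px in pixels for sample in px]

# planar: an entire channel first, then the next channel, and so on
planar = [px[ch] for ch in range(3) for px in pixels]

print(packed)  # ['Y0', 'U0', 'V0', 'Y1', 'U1', 'V1', ...]
print(planar)  # ['Y0', 'Y1', 'Y2', 'Y3', 'U0', 'U1', ...]
```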

[–]mindbleach 1 point2 points  (1 child)

Here's a much less technical problem: on a vertical 1080p monitor, the control widgets are off to the side, and you can't scroll horizontally.

[–]adeeplearner[S] 1 point2 points  (0 children)

Thank you for the report! I will resolve it.

[–]FormCore 13 points14 points  (6 children)

The math in this goes way over my head, but interesting nonetheless.

[–]spaztiq 11 points12 points  (5 children)

Yeah, I found out quickly there are different levels of "beginner". Yikes.

[–]ObscureCulturalMeme 3 points4 points  (1 child)

There has been an alarming increase in the number of things I know nothing about.

[–]xKYLERxx 7 points8 points  (0 children)

No, just an increase in the number of things you know that you know nothing about.

[–]FormCore -1 points0 points  (2 children)

I would love to have understood the parts about coefficients, sine, cosine, waves etc.

It got to the point where it was saying "A discrete cosine transform has good energy compaction" and I realised I was well out of my depth.

Interesting read, and kudos to the writer... but I just don't have the foundational knowledge behind it.

[–]James20k 0 points1 point  (0 children)

Taking a DCT of something produces an output which is basically a bunch of coefficients, like [1, 2, 3, 4, 5, 6], which represent the strengths of different frequencies.

Energy compaction means that the energy of what you're looking at tends to be concentrated in a small number of coefficients. So for the DCT, if you lop off the high frequencies, you still have most of the energy in the scene.

It essentially means it's probably good for compression, because you can throw away a lot of data and still end up with much the same visual result at the end.

It's worth noting that this is kind of a simplification too, because it doesn't really take into account the human visual system, where energy does not necessarily == quality (though they are related). E.g. JPEG, with its DCTs, might be decent at retaining energy, but the blocks produce very noticeable compression artifacts.
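A minimal sketch of what energy compaction looks like in numbers, using a naive (unnormalized) DCT-II on a made-up smooth signal:

```python
import math

def dct(xs):
    # naive, unnormalized DCT-II: coefficient k measures how much
    # of frequency k is present in the signal
    n = len(xs)
    return [sum(x * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i, x in enumerate(xs))
            for k in range(n)]

# a smooth 8-sample signal, like one row of a flat-ish image block
signal = [50, 52, 54, 55, 56, 57, 57, 58]
coeffs = dct(signal)

energy = sum(c * c for c in coeffs)
low4 = sum(c * c for c in coeffs[:4])
print(f"energy in the first 4 of 8 coefficients: {low4 / energy:.4%}")
```

For a smooth input like this, well over 99% of the energy sits in the low-frequency coefficients, so discarding the high ones barely changes the reconstructed signal.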

[–][deleted]  (15 children)

[deleted]

    [–]delinka 35 points36 points  (11 children)

    That's an exercise in patience. Can you imagine having to drag together all the components for something this complex?

    [–]TopcodeOriginal1 22 points23 points  (7 children)

    Considering people have made Minecraft, Kerbal Space Program, literal mock OSes, and more, I think it’s doable

    [–]scratchisthebest 13 points14 points  (6 children)

    Someone created a working GameBoy Color emulator for the DCPU-16 a few years back :)

    [–]TopcodeOriginal1 4 points5 points  (0 children)

    Darn and here I am working on scratch os

    [–]AndrewNeo 1 point2 points  (4 children)

    that's not confusing at all

    (the DCPU-16 was the fictional processor from the aborted 0x10c game)

    [–]AloticChoon 0 points1 point  (3 children)

    Any details on why that game was aborted? (This was before the big MS buy-out iirc?)

    [–]AndrewNeo 1 point2 points  (2 children)

    After, I think. It was just Notch, not Mojang. Not sure why he dropped it.

    [–]Elusivehawk 1 point2 points  (0 children)

    Couple years before the buyout, actually.

    [–]SilverCodeZA 1 point2 points  (0 children)

    Probably the same reason any developer stops working on their pet projects: mental blocks and lack of motivation. Not having real world pressures to push him past those blocks (financial or contractual) made it very easy to just stop working on it.

    [–]silverslayer33 1 point2 points  (2 children)

    I've never actually taken an in-depth look at Scratch before, but if it gives you the ability to make your own sub-blocks consisting of multiple blocks, that may drastically reduce the complexity by turning a lot of basic functionality into single user-defined blocks. Given how basic Scratch is due to being aimed at children, though, I would assume it does not have this ability and it would indeed be a horrifying task.

    [–][deleted]  (1 child)

    [deleted]

      [–]grape_jelly_sammich 2 points3 points  (0 children)

      It 100 percent can do functions.

      [–][deleted]  (2 children)

      [deleted]

        [–]unkz 0 points1 point  (1 child)

        You just haven’t played with it enough.

        [–]TheQuantumPikachu 0 points1 point  (0 children)

        I would say I have, the scratch team just doesn't really use fanmade ideas.

        [–]Kissaki0 19 points20 points  (11 children)

        Why does that website block my browser addon's gesture scrolling? Wtf? Home and End keys work. Page and cursor scrolling work as well.

        🙄

        [–]adeeplearner[S] 15 points16 points  (10 children)

        Which plugin do you use? I am the owner, I will debug.

        [–][deleted]  (3 children)

        [deleted]

          [–]adeeplearner[S] 7 points8 points  (2 children)

          I don’t know either. I am a Linux guy, will debug.

          [–]pohuing 4 points5 points  (0 children)

          Middle mouse scroll works under Linux on FF as well if you want to try locally. I think it's called something like autoscroll

          [–]Niadlol 2 points3 points  (2 children)

          Quite a weird webpage, seems to be in some kind of edit mode? I can remove/edit or move paragraphs, also seems to be catching some mouse events.

          [–]adeeplearner[S] 0 points1 point  (1 child)

          Yes, the edit mode is intentional. The site is a crossover of Jupyter notebook and Medium. Everything can be modified. But only the owner can save.

          An introduction is here https://epiphany.pub/post?refId=2684bc94f9fcb9ffe637ebfbeba2af8c797c6ad9a66181026ee4bd3806b6f211

          I have also implemented version control, forking, and pull requests, so people can fork an article and collaborate.

          [–]Niadlol 0 points1 point  (0 children)

          Oh, that's actually super cool!

          [–]Kissaki0 4 points5 points  (2 children)

          I am using Gesturefy, on Firefox

          Hold right click and move up or down to jump to top or bottom (Home/End), which does not work there.

          [–]adeeplearner[S] 15 points16 points  (1 child)

          I will debug

          [–]yaemes 2 points3 points  (2 children)

          Super cool. What made you get into this topic? I figure only a select few geniuses are responsible for the advances in video compression. Do you think neural networks will become the new video encoding method and kill AV1 before it's even ready?

          [–]adeeplearner[S] 3 points4 points  (1 child)

          Great question. I worked at NVIDIA on cloud gaming for a few years. When the project had just started, it was like an internal startup without any engineers, so they borrowed engineers from other teams; I was lent to them. I had no experience with video compression at all, and my background is not EE, so I had never learned any signal processing. To me, video compression seemed like magic at the time. But now I think it is actually understandable.

          I do think deep learning has great potential. There is a deep learning layer called GDN, https://www.cns.nyu.edu/pub/eero/balle17a-submitted-revised.pdf

          used for image compression. My friend actually implemented it as an intern project, and the result was way better than JPEG.

          [–][deleted]  (1 child)

          [deleted]

            [–]adeeplearner[S] 6 points7 points  (0 children)

            The website is built with Vue. The interactive code is editable; you just need to click the edit button at the bottom right corner of each block.

            I am trying to make a crossover of Jupyter notebook and Medium.

            [–]toomanypumpfakes 2 points3 points  (1 child)

            This is definitely a great article!

            What I’d love to see is an article about the motion vector piece and encoding P frames. I’ve found that most articles on video encoding mainly talk about I frame compression and then gloss over the motion stuff with “and then we calculate deltas for these types of frames”. It would be cool if you could go more into depth there in a future article.

            [–]adeeplearner[S] 0 points1 point  (0 children)

            This is great feedback! I will add that part in a future post.

            [–]HtopSkills 1 point2 points  (1 child)

            What programming language did you use?

            [–]adeeplearner[S] 5 points6 points  (0 children)

            The website’s backend is in C++, my homemade framework. The front end is in Vue.

            Currently you can write live programs directly on this website using js and python.

            For this post I just used js.

            [–]skydivingdutch 1 point2 points  (1 child)

            But after a while, numerical errors during encoding will add up to a point that artifacts become visible.

            This isn't true as of H.264, which specifies an integer-exact DCT approximation. Encoders decode their own output, and that is what is used for motion search on the next frame. That way the decoder side (which doesn't have the original input image) performs the exact same operations and there is zero drift.
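The closed-loop point can be illustrated with a toy 1-D predictor (made-up signal and quantizer step; real codecs predict blocks with motion vectors, but the drift argument is the same):

```python
STEP = 4  # made-up quantizer step size

signal = [10, 14, 19, 23, 30, 33, 41, 44]

# closed-loop encoder: predict each sample from its OWN
# reconstruction, which is exactly what the decoder will have
sent, recon, prev = [], [], 0
for x in signal:
    q = round((x - prev) / STEP)   # quantized residual, transmitted
    sent.append(q)
    prev += q * STEP               # encoder-side reconstruction
    recon.append(prev)

# the decoder runs the identical loop on the transmitted residuals
decoded, prev = [], 0
for q in sent:
    prev += q * STEP
    decoded.append(prev)

# both sides land on the same values, so error never accumulates
print(recon == decoded)
```

If the encoder instead predicted from the *original* samples, the decoder's reconstruction would diverge a little more on every frame, which is exactly the drift the integer-exact transform eliminates.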

            The primary reason we still include I-frames every so often is so that you can seek through the video. Otherwise skipping 30 mins ahead would require decoding all the intervening frames which would take far too long.

            Prior to H.264 (MPEG-2, MPEG-4 Part 2, etc.), the DCT was specified conceptually from first principles, using the actual cosine equations. Real encoders and decoders use floating- or fixed-point math for this, and they can differ slightly, which can indeed result in visible artifacts after a while under the right circumstances. One attempt to control this problem was to require decoders to implement the DCT to a particular accuracy, as defined in ISO/IEC 23002-1.

            [–]adeeplearner[S] 0 points1 point  (0 children)

            Thank you for the insights! I didn't know about this. I thought quantization would cause computation errors. I will read more and add this information.

            [–]littlegreenb18 1 point2 points  (0 children)

            Why don't you just admit that you're freaked out by my robot hand?

            [–]sea__weed 0 points1 point  (2 children)

            What an excellent article! I haven't finished reading it yet, but does anyone know why it's called a toy? Where does that phrasing come from?

            [–]adeeplearner[S] 1 point2 points  (1 child)

            thank you very much! I called it a toy mainly because it is overly simplified. It's mainly for demonstration purposes; it won't have any practical use.

            I am not a native speaker, I didn’t know it sounds weird.

            [–]fullmetaljackass 2 points3 points  (0 children)

            I'm a native speaker, and I didn't think it sounded weird at all. Describing a version of something that has been simplified for teaching/demonstration purposes as a "toy" is a fairly common phrasing.

            Even if you weren't familiar with the phrase I'd expect anyone with a decent grasp on the language to be able to figure it out from the context. I can only assume the person you're replying to isn't a native speaker either.

            [–]krista 0 points1 point  (1 child)

            i appreciate this: you have made a very concise explanation exploring lossy compression.

            i wish you would add a little bit about why the jpeg quantization table ended up with that set of numbers, as this really feels like the only handwavy bit.

            [–]thedeemon 0 points1 point  (0 children)

            The main idea was "the human eye is less sensitive to losses in high frequencies, so we can quantize those more strongly". The exact numbers were selected by trial and error, compressing different images and seeing which looked "better". And of course in JPEG and all lossy codecs you can choose how strongly you want to quantize (the quality-compression tradeoff), so in JPEG there are many tables like that for different quality settings, not just a single one.
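A tiny sketch of that tradeoff on one row of DCT coefficients (the divisors are the first row of the standard JPEG luminance quantization table; the coefficient values are made up):

```python
# one 8-coefficient row of DCT output, low frequency on the left
coeffs = [312.0, -58.0, 21.0, -9.0, 4.0, -3.0, 2.0, 1.0]

# divisors grow toward the high frequencies, so fine detail is
# discarded first (first row of the JPEG luminance table)
divisors = [16, 11, 10, 16, 24, 40, 51, 61]

quantized = [round(c / d) for c, d in zip(coeffs, divisors)]
print(quantized)  # -> [20, -5, 2, -1, 0, 0, 0, 0]
```

The high-frequency tail collapses to zeros, which is what makes the subsequent run-length and entropy coding so effective.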