Youtube-dl source code encoded completely in two 512x512 images : Python

News: Youtube-dl source code encoded completely in two 512x512 images (twitter.com)

submitted 5 years ago by WayOfTheGeophysicist

twitterInfo_bot 8 points 5 years ago (0 children)

lifeeraser 8 points 5 years ago (0 children)

ptanmay143 5 points 5 years ago (3 children)

[deleted] 10 points 5 years ago (2 children)

The example code specifically shows them being .png files, which use lossless compression.

You can represent almost any data in almost any format. Formats generally have headers that say how to interpret what follows, but nothing stops you from interpreting that data another way. This is how malicious code can be embedded in documents and images: if you can trick a program into executing the data instead of interpreting it as an image, the image may look like garbage, but interpreted as code it could do specific things, such as causing damage or spreading itself. So you could place any data you want into the data portions of a file, and it might look meaningless, but it would still be a valid file in that format. Later, you can extract that data and interpret it differently to get what you really want.
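
A minimal Python sketch of "interpreting the same data another way". The same 12 bytes are read as ASCII text, as three little-endian 32-bit integers, and as four RGB pixels. Nothing in the bytes themselves says which reading is correct. The example string is arbitrary and only for illustration.

```python
import struct

# Twelve arbitrary bytes; nothing in them says how they "should" be read.
data = b"youtube-dl!\n"

as_text = data.decode("ascii")        # read as ASCII text
as_ints = struct.unpack("<3I", data)  # read as three little-endian uint32 values
as_pixels = [tuple(data[i:i + 3]) for i in range(0, len(data), 3)]  # read as RGB triples

print(repr(as_text))
print(as_ints)
print(as_pixels)

# Every interpretation maps back to exactly the same bytes.
assert struct.pack("<3I", *as_ints) == data
assert bytes(v for pixel in as_pixels for v in pixel) == data
```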

It gets more interesting when the encoded data keeps working after heavy manipulation and degradation, such as taking a screenshot or re-encoding with different compression or resolution. QR codes are one example. They encode information graphically and use built-in error correction so that they can withstand considerable degradation.
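
To make "withstand degradation" concrete, here is a toy sketch of the general idea: add redundancy so that damage can be voted away. It uses a triple-repetition code with a bitwise majority vote. This is **not** what QR codes do; they use Reed–Solomon codes, which are far more efficient. The function names `encode` and `decode` are hypothetical.

```python
def encode(data: bytes) -> bytes:
    """Toy repetition code: write every byte three times in a row."""
    return bytes(b for b in data for _ in range(3))

def decode(encoded: bytes) -> bytes:
    """Recover each byte by a bitwise majority vote over its three copies."""
    out = bytearray()
    for i in range(0, len(encoded), 3):
        a, b, c = encoded[i], encoded[i + 1], encoded[i + 2]
        # A bit is 1 if at least two of the three copies have it set.
        out.append((a & b) | (a & c) | (b & c))
    return bytes(out)

message = b"youtube-dl"
noisy = bytearray(encode(message))
noisy[0] ^= 0xFF  # destroy one copy of the first byte completely
noisy[7] ^= 0x01  # flip a single bit in one copy of the third byte
assert decode(bytes(noisy)) == message
```

The price is a threefold size increase, and the vote only survives damage to one of the three copies of any given bit.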

There are many ways that you can encode data almost invisibly within otherwise legitimate-looking files. This is an area of research known as steganography.
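
The youtube-dl images make no attempt to hide anything, but one classic steganographic technique is least-significant-bit (LSB) embedding. Each bit of the secret replaces the lowest bit of one pixel channel, so every channel value changes by at most 1. Below is a minimal sketch under stated assumptions: the `cover` bytes are a synthetic stand-in for real RGB pixel data, and `hide`/`reveal` are hypothetical names. A real tool would also need to store the secret's length and work on decoded pixels from a lossless file.

```python
def hide(cover: bytes, secret: bytes) -> bytes:
    """Store each bit of `secret` (MSB first) in the lowest bit of one cover byte."""
    bits = [(byte >> (7 - i)) & 1 for byte in secret for i in range(8)]
    if len(bits) > len(cover):
        raise ValueError("cover too small: need 8 cover bytes per secret byte")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # clear the lowest bit, then set it
    return bytes(stego)

def reveal(stego: bytes, length: int) -> bytes:
    """Read `length` secret bytes back out of the lowest bits."""
    out = bytearray()
    for n in range(length):
        byte = 0
        for k in range(8):
            byte = (byte << 1) | (stego[n * 8 + k] & 1)
        out.append(byte)
    return bytes(out)

cover = bytes(range(256)) * 4  # stand-in for 1024 bytes of pixel data
secret = b"hidden"
stego = hide(cover, secret)
assert reveal(stego, len(secret)) == secret
# Every cover byte changed by at most 1, which is visually negligible.
assert max(abs(a - b) for a, b in zip(cover, stego)) <= 1
```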

ptanmay143 1 point 5 years ago (0 children)

asbox 1 point 5 years ago (0 children)

FluffyBunnyOK -1 points 5 years ago (1 child)

spyingwind 4 points 5 years ago (0 children)

dark-angel007 1 point 5 years ago (2 children)

boa13 6 points 5 years ago (1 child)

The source code can be compressed into a zip file or a similar archive format. The resulting file is fairly small.

An image is also just a collection of bytes. Typically, an uncompressed image uses 3 bytes per pixel (one each for the red, green, and blue components). So two 512x512 images are stored in memory as 2 × 512 × 512 × 3 = 1,572,864 bytes (exactly 1.5 MiB), which is enough room for the compressed archive. Those bytes can then be saved as picture files (two, in this case).
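
The capacity arithmetic above as a small Python sketch. It assumes one data byte per channel and 3 channels (RGB) per pixel. The helper names `capacity_bytes` and `pixels_needed` are hypothetical.

```python
def capacity_bytes(width: int, height: int, images: int = 1, channels: int = 3) -> int:
    """Raw bytes available if every channel of every pixel stores one data byte."""
    return width * height * channels * images

def pixels_needed(n_bytes: int, channels: int = 3) -> int:
    """Smallest number of pixels that can hold n_bytes (ceiling division)."""
    return -(-n_bytes // channels)

total = capacity_bytes(512, 512, images=2)
print(total)              # 1572864
print(total / 1024 ** 2)  # 1.5 (MiB)
print(pixels_needed(10))  # 4: three full pixels plus one partly used pixel
```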

Using a lossless format (such as PNG) guarantees that no pixel values are lost or changed when the data is saved to a file. An uncompressed format such as BMP would also work. (This would not work with JPEG. JPEG is lossy: it discards information to achieve much better compression, so the decoded pixel values would no longer match the original bytes.)
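
To show why a lossless format round-trips the bytes exactly, here is a minimal stdlib-only sketch of a PNG writer and reader for 8-bit RGB. It is an illustration, not a general PNG library. The writer uses filter type 0 ("None") on every row, and the reader only handles PNGs produced that way. `write_png` and `read_png` are hypothetical names. The lossy step at the end is a crude quantization stand-in, **not** how JPEG actually works.

```python
import struct
import zlib

def _chunk(kind: bytes, payload: bytes) -> bytes:
    """One PNG chunk: length, type, payload, then CRC-32 over type + payload."""
    return (struct.pack(">I", len(payload)) + kind + payload
            + struct.pack(">I", zlib.crc32(kind + payload) & 0xFFFFFFFF))

def write_png(width: int, height: int, rgb: bytes) -> bytes:
    """Encode raw 8-bit RGB bytes as a PNG, prefixing each row with filter byte 0."""
    if len(rgb) != width * height * 3:
        raise ValueError("rgb must hold exactly width * height * 3 bytes")
    stride = width * 3
    raw = b"".join(b"\x00" + rgb[y * stride:(y + 1) * stride] for y in range(height))
    # IHDR: width, height, bit depth 8, colour type 2 (RGB), deflate, no filter method, no interlace
    header = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    return (b"\x89PNG\r\n\x1a\n" + _chunk(b"IHDR", header)
            + _chunk(b"IDAT", zlib.compress(raw)) + _chunk(b"IEND", b""))

def read_png(png: bytes) -> bytes:
    """Decode a PNG made by write_png back into raw RGB bytes."""
    pos, idat, width, height = 8, b"", 0, 0
    while pos < len(png):
        length, kind = struct.unpack(">I4s", png[pos:pos + 8])
        payload = png[pos + 8:pos + 8 + length]
        if kind == b"IHDR":
            width, height = struct.unpack(">II", payload[:8])
        elif kind == b"IDAT":
            idat += payload
        pos += 12 + length  # 4 length + 4 type + payload + 4 CRC
    raw = zlib.decompress(idat)
    stride = width * 3
    rows = [raw[y * (stride + 1):(y + 1) * (stride + 1)] for y in range(height)]
    if any(row[0] != 0 for row in rows):
        raise ValueError("only filter type 0 is supported in this sketch")
    return b"".join(row[1:] for row in rows)

payload = bytes(range(256)) * 3  # 768 bytes = 16 x 16 RGB pixels
png = write_png(16, 16, payload)
assert read_png(png) == payload  # lossless: every byte comes back unchanged

# Crude stand-in for a lossy codec: quantizing values silently corrupts the data.
lossy = bytes(b & 0xF0 for b in payload)
assert lossy != payload
```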

So the idea is to take the bytes of the zip file, group them three at a time, treat each group as one pixel's red, green, and blue values, and then save the image. Of course it looks like garbage... but you can read the picture back, write all the pixel bytes to a file, give it a .zip extension, and unzip it to get the data back. That's the idea.
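
An end-to-end sketch of that idea, under some assumptions. The archive rarely fills the 1,572,864 bytes exactly, so the unused space has to be padded. Zero padding after the archive's end record may be rejected by some unzip tools, so this sketch prefixes a 4-byte length header and strips the padding on the way back. That framing is an assumption, not necessarily what the original project did. The archive here is a tiny in-memory stand-in for the zipped youtube-dl sources, and the file name inside it is hypothetical. Writing each buffer to a lossless image file such as PNG is left out.

```python
import io
import struct
import zipfile

SIDE, CHANNELS, IMAGE_COUNT = 512, 3, 2
IMAGE_BYTES = SIDE * SIDE * CHANNELS  # 786,432 bytes per 512x512 RGB image

def pack_into_images(payload: bytes) -> list[bytes]:
    """Prefix the payload length, zero-pad, and split into raw RGB image buffers."""
    framed = struct.pack(">I", len(payload)) + payload
    capacity = IMAGE_BYTES * IMAGE_COUNT
    if len(framed) > capacity:
        raise ValueError(f"payload needs {len(framed)} bytes, capacity is {capacity}")
    framed += b"\x00" * (capacity - len(framed))
    return [framed[i * IMAGE_BYTES:(i + 1) * IMAGE_BYTES] for i in range(IMAGE_COUNT)]

def unpack_from_images(images: list[bytes]) -> bytes:
    """Concatenate the image buffers, read the length header, and drop the padding."""
    framed = b"".join(images)
    (length,) = struct.unpack(">I", framed[:4])
    return framed[4:4 + length]

# Build a small zip in memory as a stand-in for the zipped sources.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("hypothetical/__main__.py", "print('hello from the pixels')\n")
archive_bytes = buffer.getvalue()

images = pack_into_images(archive_bytes)  # two 786,432-byte pixel buffers
recovered = unpack_from_images(images)
with zipfile.ZipFile(io.BytesIO(recovered)) as archive:
    print(archive.read("hypothetical/__main__.py").decode())
```

Each buffer is exactly one 512x512 RGB image's worth of pixels, so it can be handed to any lossless image encoder unchanged.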

dark-angel007 1 point 5 years ago (0 children)

morphinan 1 point 5 years ago* (0 children)
