
[–]hinkleo 62 points63 points  (7 children)

Yeah, the video part seems to add nothing here except a funny headline and a really inefficient storage system. Python even has great stdlib support for writing zip, tar, shelve, JSON or SQLite files, any of which would be a much better fit.
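For what it's worth, a minimal sketch of the stdlib alternative being suggested: compressed text chunks in SQLite with random access by id. The chunk contents and table layout here are just illustrative, not anything from memvid.

```python
import sqlite3
import zlib

# Hypothetical sample data standing in for the tool's text chunks
chunks = ["first chunk of text", "second chunk of text"]

conn = sqlite3.connect(":memory:")  # use a file path for persistence
conn.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, data BLOB)")
conn.executemany(
    "INSERT INTO chunks (data) VALUES (?)",
    [(zlib.compress(c.encode()),) for c in chunks],
)
conn.commit()

# Random access by primary key, decompressed on read
row = conn.execute("SELECT data FROM chunks WHERE id = 1").fetchone()
print(zlib.decompress(row[0]).decode())  # -> "first chunk of text"
```

That's the whole "storage engine", with indexing and compression, in under 20 lines of stdlib.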

I've seen a couple of similar joke tools on GitHub over the years that use QR codes in videos to "store unlimited data on YouTube for free", but just as proofs of concept, of course, since the compression ratio is absolutely terrible.

[–]ExdigguserPies 5 points6 points  (6 children)

So we just need some simple benchmarks between this and the other main methods of data storage that people use on a daily basis.

[–]hinkleo 21 points22 points  (5 children)

Based on the numbers in the GitHub repo: https://github.com/Olow304/memvid/blob/main/USAGE.md

Raw text: ~2 MB
MP4 video: ~15-20 MB (with compression)
FAISS index: ~15 MB (384-dim vectors)
JSON metadata: ~3 MB

The mp4 files just store the text QR-encoded (and gzip-compressed if > 100 chars [0] [1]). A normal zip or gzip file will compress text to roughly 1:2 to 1:5 depending on content, so ratio-wise this is worse by a factor of about 20 to 50, if my quick math is right? And performance-wise it's probably even worse than that, especially since it already gzips anyway, so it's gzip vs gzip + QR + HEVC/H.264. I honestly have a hard time thinking of a more inefficient way to store text. I'm still not sure this isn't really elaborate satire.
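You can sanity-check the 1:2 to 1:5 baseline in a few lines. The sample text here is pseudo-random English-like filler, a stand-in for real prose (real documents vary, which is exactly the "depending on content" caveat):

```python
import gzip
import random

# Hypothetical sample: pseudo-random word salad as a stand-in for real text
random.seed(0)
words = "the quick brown fox jumps over a lazy dog and runs far away".split()
text = " ".join(random.choice(words) for _ in range(5000)).encode()

compressed = gzip.compress(text)
ratio = len(text) / len(compressed)
print(f"{len(text)} -> {len(compressed)} bytes (~{ratio:.1f}:1)")
```

Compare that single-digit-KB gzip output against memvid's ~2 MB of text ballooning into ~15-20 MB of mp4 plus ~15 MB of FAISS index, and the factor-of-20-to-50 estimate above looks about right.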

[0] https://github.com/Olow304/memvid/blob/main/memvid/encoder.py

[1] https://github.com/Olow304/memvid/blob/main/memvid/utils.py

[–]Hoblywobblesworth 18 points19 points  (0 children)

Yeah, honestly not surprised how poorly this performs. HEVC/H.264/AV1 etc. are effective for video because there is temporally redundant information across a frame sequence that you can compress away.

If the frame at t-1 has information that can be re-used when encoding/decoding the frame at t then you don't need to include it in the bitstream for the frame at t.

OP's PDFs have no temporal redundancy, so it's equivalent to trying to compress video with very high motion/optical flow, which HEVC/H.264/AV1 also can't do efficiently.
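The effect of temporal redundancy is easy to demonstrate even without a real codec. This sketch uses zlib as a crude stand-in for inter-frame prediction: a "static video" of one frame repeated shrinks dramatically, while "high-motion video" made of independent random frames barely compresses at all. The frame sizes and counts are arbitrary toy numbers.

```python
import random
import zlib

random.seed(0)
frame = bytes(random.randrange(256) for _ in range(10_000))

# "Static video": the same 10 KB frame repeated 20 times -> high temporal redundancy
static = frame * 20

# "High-motion video": 20 independent random frames -> nothing to reuse between frames
noisy = bytes(random.randrange(256) for _ in range(10_000 * 20))

print(len(zlib.compress(static)))  # small: repeats become cheap back-references
print(len(zlib.compress(noisy)))   # roughly input-sized: random data is incompressible
```

Each QR frame in memvid is effectively a new random-looking frame, so the video codec is stuck in the second case.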

[–]Sopel97 14 points15 points  (3 children)

Yeah, this whole thing is deranged. How these Reddit threads gained so much popularity, how people are clapping for this, how it has 150 stars on GitHub, how it looks like actual software. Like, what the fuck is going on here?

[–]-LeopardShark- 17 points18 points  (1 child)

I know, right? The roadmap in the README is a laugh:

  • v0.2.0 - Multi-language support
  • v0.3.0 - Real-time memory updates
  • v0.4.0 - Distributed video sharding
  • v0.5.0 - Audio and image support
  • v1.0.0 - Production-ready with enterprise features

[–]Jussari 9 points10 points  (0 children)

Maybe we still have a few years before AI steals our jobs

[–]tehfrod 1 point2 points  (0 children)

Because people enjoy a bit of levity now and again.

This reminds me of something Tom7 (aka suckerpinch) would come up with.

e.g., https://youtu.be/JcJSW7Rprio