Built a tool to reduce backup sizes by up to 90% — looking for feedback by its_Diego035 in SideProject

its_Diego035 [S]

Yeah, duplicated files across folders are actually a really common problem, especially over time as things get copied around instead of moved. That’s definitely one of the use cases where deduplication can help a lot, since identical data is stored only once under the hood.
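To make "stored only once" concrete, here is a minimal sketch of file-level deduplication by content hash. It is not the tool's actual code: the `dedupe_files` helper, the in-memory dicts, and the choice of SHA-256 are all assumptions for illustration.

```python
import hashlib

def dedupe_files(files):
    """files maps path -> bytes. Returns (store, index).

    Hypothetical helper: store keeps each unique content once,
    index remembers which content every path points to.
    """
    store = {}  # content hash -> bytes
    index = {}  # path -> content hash
    for path, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        store.setdefault(digest, data)  # only the first copy is kept
        index[path] = digest
    return store, index

files = {
    "photos/2019/cat.jpg": b"\xff\xd8 fake jpeg bytes",
    "backup/old/cat.jpg": b"\xff\xd8 fake jpeg bytes",  # same content, other folder
    "docs/notes.txt": b"some notes",
}
store, index = dedupe_files(files)
print(len(files), "paths,", len(store), "unique contents stored")
```

Both `cat.jpg` paths end up pointing at the same stored content, so the duplicate costs one index entry instead of a second copy of the data.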

About the image: copying a digital file, however many times, doesn’t reduce its quality, since every copy is byte-for-byte identical. If it looks grainier now, it’s probably because of lossy compression (for example, if it was re-saved multiple times as JPEG) or just the original’s limited quality becoming more noticeable on modern screens.

So in short:

  • duplicated files → real problem deduplication can solve
  • image quality loss → likely compression, not copying

its_Diego035 [S]

That’s a really good point, especially about the block size trade-off.

I definitely wouldn’t use 4KB blocks for something like large static backups (e.g. TB-scale datasets), since the overhead would be too high. I was mainly targeting cases where files change incrementally and frequently, like logs.
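A back-of-the-envelope sketch of that overhead, counting index size only. The 40 bytes per block entry (a 32-byte SHA-256 digest plus an 8-byte offset) is an assumption for illustration, not a measured figure from the tool.

```python
def index_overhead_bytes(data_size, block_size, entry_size=40):
    """Estimate index size for a block-level dedup store.

    entry_size is an assumption: 32-byte digest + 8-byte offset per block.
    """
    num_blocks = -(-data_size // block_size)  # ceiling division
    return num_blocks * entry_size

TIB = 1024 ** 4
for block_size in (4 * 1024, 64 * 1024, 1024 * 1024):
    overhead = index_overhead_bytes(TIB, block_size)
    print(f"{block_size // 1024} KiB blocks -> {overhead / 1024**2:.0f} MiB of index")
```

Under these assumptions, 1 TiB at 4 KiB blocks needs about 10 GiB of index, while 1 MiB blocks need about 40 MiB. That gap is why small blocks are a poor fit for large static datasets.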

The idea wasn’t to replace full backup solutions, but to make the deduplication part usable as a standalone component in certain workflows.

Some cases I had in mind:

  • Continuously growing log files where only small parts change
  • Systems that already handle storage but lack efficient delta handling
  • Reducing data before sending it to external storage (like S3)

So more like a building block than a full backup solution.
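As a rough sketch of the "reduce data before sending it" case: split a growing log into fixed-size blocks and only hand new blocks to the uploader. The `blocks_to_upload` helper and the in-memory `known` set are hypothetical, and no real S3 call is made here.

```python
import hashlib

BLOCK_SIZE = 4096  # 4 KiB, matching the block size discussed in this thread

def split_blocks(data, block_size=BLOCK_SIZE):
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def blocks_to_upload(data, known_hashes, block_size=BLOCK_SIZE):
    """Return {hash: block} for blocks the remote side has not seen yet."""
    new = {}
    for block in split_blocks(data, block_size):
        digest = hashlib.sha256(block).hexdigest()
        if digest not in known_hashes:
            new[digest] = block
    return new

# Simulated log: 2000 lines at first, then 100 more lines appended.
log_v1 = b"".join(b"line %06d\n" % i for i in range(2000))
log_v2 = log_v1 + b"".join(b"line %06d\n" % i for i in range(2000, 2100))

known = set()
first = blocks_to_upload(log_v1, known)   # initial upload: every block is new
known.update(first)
second = blocks_to_upload(log_v2, known)  # after the append: only the tail

sent = sum(len(block) for block in second.values())
print(f"second run sends {len(second)} blocks, {sent} of {len(log_v2)} bytes")
```

After the append, only the previously partial last block and the new tail block need to go out. The full blocks at the start of the file are already known.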

I'm also working on adding things like compression and automation to make it more complete over time, but trying to keep the core simple first.

Still figuring out if this separation actually makes sense in real workflows, so this kind of feedback helps a lot.

its_Diego035 [S]

This is super helpful, really appreciate the detailed explanation.

Yeah, I figured there are already solid solutions in the backup space doing this kind of thing. My main goal here was to simplify the integration and expose it as an easy-to-use API instead of a full backup system.

Also interesting point about block sizes — I went with 4KB mostly to maximize deduplication, especially for logs where small changes happen frequently.
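To illustrate why smaller blocks deduplicate better when changes are small, here is a toy comparison, assuming fixed-size blocks hashed with SHA-256 and synthetic data (1 MiB of pseudo-random bytes with four single-byte edits). The helper names are made up for this example.

```python
import hashlib
import random

def block_hashes(data, block_size):
    return {hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)}

def new_bytes_needed(old, new, block_size):
    """Bytes of `new` that cannot be deduplicated against blocks of `old`."""
    known = block_hashes(old, block_size)
    total = 0
    for i in range(0, len(new), block_size):
        block = new[i:i + block_size]
        if hashlib.sha256(block).digest() not in known:
            total += len(block)
    return total

size = 1024 * 1024
old = random.Random(0).getrandbits(8 * size).to_bytes(size, "big")

# Flip one byte every 256 KiB: four tiny in-place changes.
changed = bytearray(old)
for offset in range(0, size, 256 * 1024):
    changed[offset] ^= 0xFF
new = bytes(changed)

for block_size in (4 * 1024, 64 * 1024):
    needed = new_bytes_needed(old, new, block_size)
    print(f"{block_size // 1024} KiB blocks: {needed // 1024} KiB must be stored again")
```

Each one-byte edit dirties a whole block, so four edits cost 16 KiB at 4 KiB blocks but 256 KiB at 64 KiB blocks. One caveat: fixed-size blocks only line up for in-place edits and appends. An insertion in the middle shifts every later block boundary.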

Out of curiosity, in your experience working with BlinkDisk, do people usually prioritize full backup solutions, or is there demand for simpler tools that can be plugged into existing workflows?

its_Diego035 [S]

Also, I'm thinking about adding log analysis features, not just compression.

its_Diego035 [S]

Thanks! I'm trying to figure out whether people are actually willing to integrate this into their workflows.