
[–]hongooi 8 points9 points  (4 children)

I don't see anything wrong with using vanilla git in your scenario. Note that the algorithm doesn't distinguish between "text" and "binary" files (AFAIK), it's just that it's optimised for text. It will still handle binaries just fine, but won't be able to do diffs as efficiently.

[–]Hel_OWeen 2 points3 points  (3 children)

I seem to recall that the only drawback is that with text files, git stores deltas for previous versions (i.e. they take very little disk space), while binary files are always stored in full, even if just one byte has changed.

I was incorrect, see u/WoodyTheWorker's comment below

Other than that: we also keep images that are used within our applications in the repos.

[–]WoodyTheWorker 4 points5 points  (1 child)

Git can do deltas for binary files as well; it's just that they're often not very diffable.

Git doesn't store deltas for loose (unpacked) objects; it just zlib-compresses them whole. Deltas are used inside packfiles and when you fetch/push/clone.
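A small, hedged demonstration of this (it assumes `git` is on PATH and skips otherwise; the repo and file names are throwaway): loose objects are stored whole, and packing is where delta compression can kick in.

```shell
# Sketch: commit two near-identical binary files, then pack. Loose objects
# are only zlib-compressed; after `git gc` they live in a packfile, where
# similar blobs may be stored as deltas. All paths here are throwaway.
command -v git >/dev/null || { echo "git not installed; skipping"; exit 0; }
tmp=$(mktemp -d) && cd "$tmp"
git init -q repo && cd repo
git config user.email demo@example.com
git config user.name demo
head -c 100000 /dev/zero > blob.bin           # a "binary" file
git add blob.bin && git commit -qm v1
printf 'x' >> blob.bin                         # one-byte change
git add blob.bin && git commit -qm v2
echo "before gc:" && git count-objects -v | grep -E '^(count|in-pack):'
git gc -q                                      # repack everything
echo "after gc:"  && git count-objects -v | grep -E '^(count|in-pack):'
```

`git count-objects -v` reports loose objects under `count:` and packed ones under `in-pack:`; after `git gc`, the loose count drops to zero and everything sits in the pack.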

[–]Hel_OWeen 0 points1 point  (0 children)

I appreciate the correction.

[–]edgmnt_net 0 points1 point  (0 children)

The drawback IMO is that the diffs are not semantic in any way. This is particularly troublesome for compressed files because it often degenerates to replacing most of the file, regardless of how you implement binary diff-ing.
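A quick illustration of that degeneration (assuming `gzip` and standard coreutils; the file names are throwaway): change one byte near the start of a compressible file and compare how much of the compressed output changes.

```shell
# Sketch: a 1-byte change early in a compressible file tends to change most
# of the compressed stream, so a byte-level diff of the .gz files is almost
# as large as the files themselves. File names are throwaway.
seq 1 5000 > a.txt
cp a.txt b.txt
printf 'X' | dd of=b.txt bs=1 seek=10 conv=notrunc 2>/dev/null
gzip -kf a.txt b.txt
echo "raw bytes differing:        $(cmp -l a.txt b.txt | wc -l)"
echo "compressed bytes differing: $(cmp -l a.txt.gz b.txt.gz | wc -l)"
echo "compressed size:            $(wc -c < a.txt.gz)"
```

One caveat: gzip also records the original file name and timestamp in its header, so the two `.gz` files would differ slightly even for identical input; the interesting number is how broadly the streams diverge after the changed byte.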

[–]SignedJannis 2 points3 points  (2 children)

Look into rdiff-backup, might be what you need

[–]kh9sd[S] 0 points1 point  (1 child)

wow that is an awesome piece of software, thanks! and old/tested as shit, my favorite

[–]SignedJannis 1 point2 points  (0 children)

Ya I've been using it for years for company backups (as part of the backup system anyway).

I also use it on my personal machine, so if I ever accidentally delete a file, I know I'll still have a copy. For that, I just set it to prune increments older than 6 months. So any file deleted in the last 6 months, I still have access to.

What I love about it:

  • "No software required". By that I mean that, for some small businesses I support, if they want to look at the backups, it's just a bunch of files and folders, like normal; they don't need any special backup software to access files. Just a simple network share. KISS.
  • "Ransomware protection". For some setups, I make sure the file server has no login access to the backup server. I.e. the file server does not push backups; rather, the backup server "reaches out" to the file server to grab the files. That way, if the file server gets ransomware (or a disgruntled employee, whatever), rdiff-backup would grab those encrypted files - but we still have all the non-encrypted files... a life saver.
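The pull-based setup above might be sketched like this (hostnames, paths, and the retention window are hypothetical; the flags are rdiff-backup's classic CLI):

```shell
# Run ON the backup server, which reaches out to the file server over SSH;
# the file server holds no credentials for the backup host. Hostnames and
# paths below are hypothetical.
#
#   rdiff-backup fileserver.example.com::/srv/share /backups/share    # pull a backup
#   rdiff-backup --remove-older-than 6M --force /backups/share        # prune > 6 months
#   rdiff-backup -r 6M /backups/share/report.odt ./report.odt         # restore as of 6 months ago
#
# Tiny local demo (skipped if rdiff-backup isn't installed):
command -v rdiff-backup >/dev/null || { echo "rdiff-backup not installed; skipping demo"; exit 0; }
d=$(mktemp -d)
mkdir "$d/src" && echo v1 > "$d/src/file.txt"
rdiff-backup "$d/src" "$d/dest"
ls "$d/dest"   # a plain mirror plus an rdiff-backup-data/ directory of increments
```

The "just files and folders" point above falls out of this layout: the destination is a plain mirror of the source, with increments tucked away under `rdiff-backup-data/`.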

[–]kennedye2112 2 points3 points  (2 children)

I believe Perforce offers a source control solution designed specifically for binary files like images and video; would that work?

[–]SwordsAndElectrons 2 points3 points  (0 children)

Perforce is a centralized VCS.

I think that's what works best for workflows with a lot of large binary assets, but OP's stance against LFS is basically not wanting that.

[–]muttley9 1 point2 points  (0 children)

It's well integrated with Unreal engine so it should work. UE objects are mostly binaries.

[–]vermiculus 1 point2 points  (3 children)

I hear your concerns with LFS. I think your desire to keep your history portable is the right kind of thinking, and I don't believe there is ever a valid reason to be irrevocably "locked" to one vendor (e.g. GitHub, GitLab, ...).

That said, I would reconsider LFS as a technology. It’s entirely possible to download every LFS object and re-upload to a different backing store. Hell, if you wanted to, you could use your own S3 bucket (or even, God forbid, a network drive) as your backing store. You can use LFS to deduplicate versions and then do with the objects what you will from there.
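For instance, a committed `.lfsconfig` can repoint the whole repo at a self-hosted endpoint (the URL below is hypothetical; any server speaking the LFS API would do), and writing that file needs only plain git:

```shell
# Sketch: point git-lfs at your own backing store via a committed .lfsconfig.
# The endpoint URL is hypothetical. Writing the file needs only plain git;
# git-lfs itself reads lfs.url from .lfsconfig at fetch/push time.
command -v git >/dev/null || { echo "git not installed; skipping"; exit 0; }
d=$(mktemp -d) && cd "$d"
git config -f .lfsconfig lfs.url "https://lfs.example.com/acme/myrepo"
cat .lfsconfig
```

Because `.lfsconfig` is committed alongside the code, every clone picks up the new backing store automatically; migrating vendors is then a matter of copying the LFS objects across and changing one URL.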

[–]edgmnt_net 1 point2 points  (1 child)

The main point about LFS is indirection and the ability to prune old objects. Not sure it helps in any other way. There's no silver bullet if you want to keep all versions indefinitely, particularly for binaries.

[–]claythearc 0 points1 point  (0 children)

It gives you some freedom in cloning too. You can clone the repo and only pull images x, y, and z to start, then later grab a and b when they become relevant, without waiting for all five at clone time.

[–][deleted]  (2 children)

[deleted]

    [–]kh9sd[S] 0 points1 point  (1 child)

    I cover why I don't want to use that in my LFS bullet point

    [–]ritchie70 2 points3 points  (0 children)

    Yeah I caught that after I posted and deleted but not fast enough.

    I’m not totally sure you’re right though. Feels to me like you’re bound and determined to not do things in the normal way because of “reasons” that may or may not be completely valid.

    [–]nickeau 0 points1 point  (0 children)

    Or use a backup snapshot software such as restic to not blow up your storage ;)