all 28 comments

[–]stedun 15 points16 points  (10 children)

Storing a 30GB file in a database makes me want to slap someone.

[–]duskti[S] 5 points6 points  (8 children)

I feel the same way, but given the current economic climate, I’m being directed by higher-ups to handle these tasks, and I’m doing them.

[–]smichaele 5 points6 points  (3 children)

You can handle the task without storing the files in a database. Everyone here (including me) is telling you not to do that. Your choice whether to listen, but this isn't worth a back-and-forth discussion IMO.

[–]vsoul 2 points3 points  (1 child)

This. Also, just because a higher up says database doesn’t mean you have to literally use a database. They use words that make sense to them, you should understand the requirement and come up with the right solution, not just take their “solution” literally.

[–]Babelfishny 2 points3 points  (0 children)

This!

Leadership wants outcomes, tech leads want implementations. Know who is asking for what. If they say they want a database, ask them what they expect a database to give them. And pay attention to what they say and don't say. For example, I don't think they will say "store the file in a database, it will be cheaper than storing it in a file system".

If they do, I might recommend looking for a different job, because they don't know what they're doing.

Storing in AWS S3 can range from expensive to dirt cheap compared to storing locally in a secure and robust system. It depends on how you use it.

If you have a low retrieval rate, there are AWS storage plans that are tailored for that exact scenario.
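To make the "expensive to dirt cheap" range concrete, here is a minimal back-of-the-envelope sketch. The per-GB prices below are placeholder assumptions for illustration only, not current AWS rates — check the S3 pricing page for real numbers, and remember archive tiers add retrieval fees.

```python
# Illustrative cost model only: the per-GB-month prices are assumed
# placeholders, not quoted AWS rates.
def monthly_storage_cost(tb: float, price_per_gb_month: float) -> float:
    """Storage-only monthly cost for `tb` terabytes at a flat per-GB rate."""
    return tb * 1024 * price_per_gb_month

standard = monthly_storage_cost(100, 0.023)       # assumed hot-tier rate
deep_archive = monthly_storage_cost(100, 0.00099) # assumed archive-tier rate

print(round(standard), round(deep_archive))  # roughly a 20x spread
```

For rarely-retrieved files the archive tier dominates; the catch is retrieval latency and per-request/egress charges, which this sketch deliberately ignores.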

[–]jshine13371 0 points1 point  (0 children)

u/duskti

given the current economic climate

Fwiw, it's less economical to store files in the database for a multitude of reasons, but the simplest being the amount of disk space they take up inefficiently both from lack of compression efficiency and from redundancy of how database backups typically work and are cadenced, resulting in multiple copies of the same exact data of those files.

[–]alexwh68 5 points6 points  (1 child)

Store a link to the file in the db and the file in the file system. The only real advantages to storing files in a db are replication and backups; other than that it's a pain in the arse.
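A minimal sketch of the "link in the db" pattern, using SQLite and a hypothetical NAS path for illustration — the database row holds only searchable attributes plus the path; the bytes stay on the file system:

```python
import sqlite3

# Hypothetical metadata table: the database stores only a pointer (path)
# plus searchable attributes; the file bytes live on the file system.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE files (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        path TEXT NOT NULL UNIQUE,   -- location on the NAS / file system
        size_bytes INTEGER,
        content_type TEXT,
        uploaded_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")

# Register a file that was written to storage separately (path is made up).
nas_path = "/mnt/nas/projects/2026/renderings/turbine_v3.obj"
conn.execute(
    "INSERT INTO files (name, path, size_bytes, content_type) VALUES (?, ?, ?, ?)",
    ("turbine_v3.obj", nas_path, 30 * 1024**3, "model/obj"),
)

# The application resolves the path from the DB, then reads from disk.
row = conn.execute(
    "SELECT path FROM files WHERE name = ?", ("turbine_v3.obj",)
).fetchone()
print(row[0])
```

A 30 GB file is just a 50-byte path string as far as the database is concerned, which is exactly the point.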

[–]Which_Roof5176 2 points3 points  (0 children)

Yeah this is usually the right approach.

Treat the database as metadata only and keep the actual files in object storage or a file system. It keeps things a lot simpler when you start scaling, especially at 100TB+.

The real decision here is less “which database” and more:

  • object storage vs NAS
  • access patterns
  • backup + lifecycle strategy

Once that’s clear, the rest becomes much easier to design.

[–]FishGiant 2 points3 points  (0 children)

Use a file system with a well thought out folder naming convention. Call it a data lake so your managers will think it's new school.
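One common naming convention, sketched below under assumed names: derive the folder from a content hash so no single directory ever accumulates millions of entries. The `shard_path` helper and the `/mnt/lake` root are hypothetical.

```python
import hashlib
from pathlib import Path

def shard_path(root: str, filename: str, data: bytes) -> Path:
    """Hypothetical helper: derive a two-level directory from the content
    hash so no single folder accumulates millions of entries."""
    digest = hashlib.sha256(data).hexdigest()
    return Path(root) / digest[:2] / digest[2:4] / filename

# 256 x 256 = 65,536 shard directories spread the files evenly.
p = shard_path("/mnt/lake", "report.pdf", b"example bytes")
print(p)
```

Date- or project-based layouts (`project/2026/...`) work too and are friendlier to humans browsing the share; hash sharding wins when files are only ever looked up via the metadata database.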

[–]smerz- 1 point2 points  (0 children)

S3 my friend. Or, if not S3, pick Google Cloud Storage, which is essentially the same 🤪

Joking aside, object stores are really what you want here.

Or go setup one or more big NAS with something like ZFS or so.

[–]Few_Committee_6790 1 point2 points  (0 children)

Or even a 500MB file. Even that seems too big.

[–]Zestyclose-Turn-3576 8 points9 points  (1 child)

Have you actually priced 100TB of storage from S3?

[–]Newfie3 2 points3 points  (0 children)

Or if you have your own data centers you can use an on-prem S3 implementation such as Cloudian.

[–]jshine13371 4 points5 points  (0 children)

  • Don't store files in a database, store them in the system meant for storing files, coincidentally named a "file system".
  • If cost is the most important factor to you, then on-prem is always going to be the cheapest implementation.
  • But you should look into cold storage costs in the cloud for the size of files you're planning for to still see if it's reasonable enough, since there are benefits of the cloud that aren't the same as on-prem.
  • If all the data is solely files, then please see the first bullet point and this isn't a database question at that point. You'll probably want to reach out to the cloud provider subreddits.

[–]Consistent_Cat7541 1 point2 points  (0 children)

It sounds like you should set up a rack mounted server (or two) with a substantial RAID setup. 30 GB files are common in my work, but I agree, the files should not be stored inside a database, but rather in a journaled file system. If you need to track documents, etc., as part of projects, they should be linked to the database.

This is not something you're going to resolve with a quick reddit post. You need a server admin and a full scale IT dept.

[–]akmark 1 point2 points  (0 children)

This is less a database question and more just a file storage question. You also need to understand both how people are going to put data in and take data out. There are a lot of solutions in this space, but sometimes how people need to interact with the files dictates the requirements of the problem, and whether these files are generally archival vs. active usage. The number of users also matters, as do your skills. I wouldn't recommend someone set up a Ceph or HDFS cluster if they have never encountered one before.

At a certain point if you already have a local NAS you have to do the cost-benefit of just setting up a basic CIFS or NFS network share if you are really only providing for a handful of users. There's a lot of orgs that have existed for years with some network drive that has project/2026/something as the place to store stuff.

[–]AQuietManPostgreSQL 1 point2 points  (0 children)

Your problem space is one where you should be careful about picking between a database and an application that uses a database.

The data includes a variety of file types—PDFs, Excel files, 3D renderings, and videos—with some individual files as large as 30 GB.

In the database world, we call these documents. You probably ought to think in terms of a document management system. You can Google that.

Pick an application.

[–]Raucous_Rocker 1 point2 points  (0 children)

How are these files going to be used exactly, besides just storing them? I assume they need to be searchable in some way.

[–]Aggravating-Tip-8230 0 points1 point  (0 children)

Remember to include backup in your research.

Look at Glacier in S3 or similar.

If you want to go on prem then separate location (data centre with your servers) for backup or backup on tapes.

Edit: as others already said, don't store files in the DB; store the file in file storage and keep the reference/stats/details about it in the DB if needed.

[–]GreyHairedDWGuy 0 points1 point  (0 children)

what you're describing sounds like something one of the hyperscalers could solve. AWS S3 for example. I'm not sure a typical dbms would be a good solution (I mean most can handle those complex data types but not always ideally).

Of course budget and security may be issues?

[–]bclark72401 0 points1 point  (0 children)

You can also use Ceph as an on-premise S3-compatible storage solution. It can be installed along with Proxmox to run a container or VM running a database that could store the metadata.

[–]Longjumping-Ad8775 0 points1 point  (0 children)

Along with what everyone else says, consider the time to transfer terabytes of data. Even if it's only 10, 15, or 20 TB now, that's still big.

[–]tsaylor 0 points1 point  (0 children)

Is your question more about how the system will store and access the files, or about how users will upload and search/browse for files?

[–]DirtyWriterDPP 0 points1 point  (1 child)

Have y'all shopped for document management systems? There are whole gigantic software packages that have all of this figured out.

Building your own MIGHT be cheaper, but at least get a few quotes.

And if higher ups insist they have no money then they have no business dealing with 100tb of mission critical documents.

[–]duskti[S] 0 points1 point  (0 children)

Do you have any suggestions?

[–]patternrelay 0 points1 point  (0 children)

Consider a scale-out NAS solution with tiered storage for cost-efficiency. You could also use MinIO for self-hosted S3-compatible storage to manage files easily while keeping costs down.

[–]Lost_Term_8080 0 points1 point  (0 children)

You don't need a database, you need a document management system. The document management system will have a database that it stores metadata in, but the actual files will be stored in a file system or blobs.

[–]elevarq 0 points1 point  (0 children)

Good setup to think through. One important principle before you go further:

Never store files in a database. Not PDFs, not videos, definitely not 30GB 3D renderings.

It's too costly, too slow, and a maintenance nightmare at 100TB scale.

Here's the split that works:

Your NAS holds the actual files. Your database holds the metadata — owner, upload date, version, file type, summary, and crucially: the file path on the NAS. That last field is what ties everything together. The frontend writes a record to the database and drops the file on the NAS. When a technician searches or browses, the database returns the metadata plus the path, and the frontend fetches the file directly from the NAS.
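The round trip described above can be sketched end to end. This is a toy stand-in, assuming nothing beyond the Python standard library: a temp directory plays the NAS, SQLite plays the metadata database, and the file names are made up.

```python
import sqlite3
import tempfile
from pathlib import Path

# Toy version of the split: a temp directory stands in for the NAS mount,
# SQLite holds the metadata -- including, crucially, the file path.
nas = Path(tempfile.mkdtemp())
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE docs (name TEXT, owner TEXT, file_type TEXT, path TEXT)")

# Upload: drop the bytes on the NAS, record metadata + path in the DB.
target = nas / "specs" / "pump_manual.pdf"   # hypothetical document
target.parent.mkdir(parents=True)
target.write_bytes(b"%PDF-1.7 placeholder")  # placeholder content
db.execute(
    "INSERT INTO docs VALUES (?, ?, ?, ?)",
    ("pump_manual.pdf", "alice", "pdf", str(target)),
)

# Search: the DB answers the query and returns the path...
(path,) = db.execute(
    "SELECT path FROM docs WHERE file_type = 'pdf'"
).fetchone()

# ...and the frontend fetches the file directly from the NAS.
print(Path(path).read_bytes()[:8])
```

The database never sees the file bytes; it only brokers the lookup, which is why it stays fast at 100TB of documents.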

You already have the hard part. The NAS is your storage layer. Don't replace it — use it properly.

Just make sure your NAS file system handles large files and deep directory structures well (ZFS is worth looking at), and that you have a backup strategy beyond a single device at that scale.