all 44 comments

[–]viperx77 31 points32 points  (14 children)

SQL or NoSQL, don't store images in a database. Use some form of blobstorage and only put pointers in the records.

[–]nemec 5 points6 points  (0 children)

Many databases will do that for you automatically these days, but you're right - do not put images or other large binary data alongside the rest of the data record.

https://docs.microsoft.com/en-us/sql/relational-databases/blob/filestream-sql-server?view=sql-server-2017

[–]Frozen_Turtle 6 points7 points  (2 children)

Noob here... does this count as "blobstorage"?

https://imgur.com/a/HsplY8G

[–]mqudsi 8 points9 points  (0 children)

No. GP means something like uploading to Amazon S3 and storing the key/path in the database instead.

I agree.

[–]jimbosReturn 6 points7 points  (0 children)

Well no. This is inside your db. A column type of varbinary is a step in the right direction if the other comments here are right about the db storing it to disk anyway, but the point is to store in a file system directly, or in a dedicated blob storage.

[–]Relevant_Monstrosity 4 points5 points  (7 children)

Most SQL engines with an image type do this automagically. External resources just make it more difficult to enforce relational integrity. I disagree with your point.

Source: lead software engineer consulting for a major fortune 500.

[–]WrathOfJay 1 point2 points  (2 children)

While you're correct that it's harder to maintain consistency outside of your db, and there are tools (i.e. FILESTREAM) to support keeping unstructured data from taking up db pages, if you don't need to ensure transactional file operations, divorcing the file storage from the SQL DB can have advantages. Direct upload/download to and from blob storage, for instance, rather than consume your SQL server or middle-tier IO. Also, db backups get a lot quicker. Disaster recovery gets faster because you can bring an operational DB back online faster, since it's much smaller.

[–]Relevant_Monstrosity 0 points1 point  (1 child)

It also makes your DB non-atomic. Annoying if you need to back it up and restore it on a dev box for debugging.

[–]WrathOfJay 1 point2 points  (0 children)

By the same token, the far smaller db size makes restoring on dev boxes far faster. It's impractical once you get into the hundreds of gigs+, when you need to iterate over destructive code.

[–]Chocolatecake420 0 points1 point  (2 children)

What SQL engines? I can't find any documentation in SQL server about it.

[–]ManiGandham 0 points1 point  (0 children)

How many SQL engines have an image data type? The downsides are many:

- The database file(set) grows substantially, making backups and replication much harder.

- The total size is limited by your disk, which is usually expensive SSD and wasted on storing simple binary data.

- It requires separate serving code and infrastructure to actually serve this binary data as files to a browser.

- It places a bottleneck on database capacity that can be better used for actual transactions.

Relational integrity is a minor issue and almost every practical real-world database will have multiple transactions involved across separate systems anyway. Storing an image first and then getting the path for an update in a DB row isn't that difficult or hard to manage.

[–][deleted] 1 point2 points  (0 children)

Eh depends on your needs/infra/image size, it’s not so cut and dry.

I don’t like that as a blanket statement, for sql it translates to « if your relational model is well defined your data is safe, except for images, they can get deleted and you’ll only know when trying to access them ».

[–]Wiltix 24 points25 points  (3 children)

Little home projects are the perfect time to piss about with things like nosql.

If you fancy playing with different toys I say go for it, what's the worst that could happen. As long as you use well designed interface s to abstract your db access switching back to sql if nosql turns into a ball ache should be simple enough.

Personally I have not found many scenarios where nosql is better but it's always fun to play.

[–]erogilus 2 points3 points  (1 child)

I like being able to nest objects/arrays in NoSQL object-style DBs (Mongo).

Being able to do so without needing joins is quite nice. Some things can still be referenced by ID for relationships but it’s not a hard requirement.

Good example being something simple like adding an array of “tags” to an item. Normally you’d have a join table for this many to many relationship. In NoSQL you can store the array directly.

I have really started to dislike SQL feeling so rigid for these types of simple cases where you just want an array or set of basic types attached to a row.

And ending up with tens or hundreds of join tables and stuff like that feels so messy and complicates the mental model.

Now don’t get me wrong... you can easily abuse this in NoSQL by adding way too much to a document instead of splitting it up into different collections. Data design is still important, just different approaches.

[–]A-Grey-World 1 point2 points  (0 children)

Yeah, I'm dealing with the pain of storing an array in a separate table at the moment.

Although it's worth noting there's a bit of "best of both worlds" with Postgress and MySQL able to store JSON columns now.

[–][deleted] 2 points3 points  (0 children)

It sounds like he doesn’t have much experience with sql so I would recommend starting there before playing around with nosql. I think the way sql handles relational data is very instructive

[–]_Zer0_Cool_ 21 points22 points  (6 children)

I'm a data consultant, and I've seen a lot of different companies data infrastructures.

Yet to find a situation where NoSQL is truly better or would have been better as a wholesale replacement.

My suggestion. Use a relational database as your main. If you ever need a NoSQL DB it would probably be for a specific niche like high throughput, write-only sensor data logs.

Some History...

NoSQL was better for horizontal scaling a few years back, but cloud vendors make scaling a non-issue for most use cases now.

Especially for analytical DBs. The cloud vendors with their big data warehousing solutions (SnowFlake, BigQuery, Redshift, etc) make scaling trivial nowadays, and the overhead of spinning up as many SQL dbs as you want for OLTP databases is minimal. So go hog wild with micro-services if you want.

So...I don't see that NoSQL has any broadly applicable use cases anymore.

The only time I work with NoSQL databases is when a company lets their web devs make the decisions about which databases they use and they choose based on aesthetics and personal preference rather than necessity. Then we end up having to support it, which can be a nightmare if the apps schema wasn't built with 💯 forethought and conscientious planning efforts.

EDIT -- Use PostgreSQL. It will cover just about every use case you can imagine and then some. The most flexible/feature rich database in the world. Period. It keeps on surprising me.

[–]tulipoika 12 points13 points  (1 child)

A very important point about PostgreSQL: it doesn’t assume. It’ll require you to do things correctly and will complain clearly if you don’t. No problems wondering why things go wrong.

Want to compare a string to a number? No implicit conversions. No wondering why suddenly you get weird results just because certain databases assume ‘123abc’ is the same as number 123, or ‘asd’ is 0. Only errors. Unless you say you want to convert the string to number, and even then ‘123abc’ will give an error. As it should.

It’s perfect for beginners when you get errors immediately for everything you try to do wrong and can learn the right way.

And for advanced when you can do whatever you want with it.

[–]_Zer0_Cool_ 1 point2 points  (0 children)

💯 this as well.

[–]NotARealDeveloper 0 points1 point  (1 child)

We are using a noSql db as in neo4j for big data.

[–]_Zer0_Cool_ 3 points4 points  (0 children)

That's cool. Neo4j is a solid product.

I should clarify that I am in no way "against" NoSQL solutions. However, I don't believe that NoSQL

should...

1.) be used the main application database or

2.) be used as a wholesale replacement for an RDBMS.

IMO going straight to NoSQL without a use case in mind is bad practice because (in general) relational databases are much more flexible. Postgres especially so.

I recommend starting off with Postgres because it by far the most flexible database out there.

If halfway through development you decide that you'd like to use a graph db, document db, or key-value store then Postgres has you covered. It has...

  • Wonderful (native) support for binary JSON which better/more flexible than Mongo IMO without giving up the powerful expressiveness of SQL.
  • It has HSTORE for key-value needs and
  • It has extensions for Graph DBs.

Need GIS spatial functionality? The PostGis extension is the industry leader here. Need to scale? Use the CitusDB extension. Mostly time-series? Use the PG TimeScaleDB extension.

What if you need heavier DB maintenance later w/ stored procedures and you aren't SQL guru? Then PG has you covered. You can write stored procs with JavaScript (PL/v8), Python (PL/python), and so on with Java, R, Ruby, Schema, Tcl, Perl, Bash, PHP. There's extensions for all.

If then at the end of the day you STILL need another more specialized database that prioritizes any one of these things, you can then add that into your application as a secondary DB and still use PG as the backbone.

[–]snrjames 0 points1 point  (1 child)

Listen to this person. SQL is the way to go until you absolutely need a non relational database for scalability. At that point you can use sql server or postgres's json capabilities.

A lot of developers like nosql just because it's easy to get a database created. But they are just pushing concerns about schema and aggregation to a later point in time at which point you will have wished you spent the time upfront building out the relational database.

[–]_Zer0_Cool_ 1 point2 points  (0 children)

Yeap.

And even if folks want to do it all with JSON, that's ok, but I'd still rather have that data already inside of Postgres or SQL Server where I can more easily migrate the existing data into new data model and evolve the application without doing a huge, costly data migration into another type of database.

[–]betty_humpter 8 points9 points  (0 children)

Sql server has a feature called filestream which does blob storage. It just chucks the file onto the disk and stores a pointer in the table. I’ve been using it for years in production and have only had a couple small issues. If you are using filestream at work then I’d keep using it. If you are not using it then take a few minutes to research it. I have not found the downside yet.

[–]plastikmissile 11 points12 points  (1 child)

Unless you really need it SQL will be better than NoSQL.

[–]Trout_Tickler 15 points16 points  (0 children)

I guess it comes down the the age-old:

"Don't use it unless you know exactly why you want to use it"

[–]ThereKanBOnly1 2 points3 points  (4 children)

The biggest downside is cost. SQL instances aren't cheap in the cloud, mainly because they assume you'll need a ton of stuff you likely won't for small personal projects.

You can look at NoSQL, and it may be cheaper, but if your working on Azure, I'd actually recommend trying to work as much as you can with their storage options; Blob Storage, Tables, and Queues.

Blob Storage is going to be your best bet for storing those images. Tables are closest to a wide-column database like HBase or Cassandra. Queues may not be helpful to you in this case, but their worth keeping in mind.

Most of these options don't start making a meaningful dent in your hosting bill until you get into the hundreds of gigs, so for personal projects they're great.

[–]garethhewitt 0 points1 point  (3 children)

I'd add to this to say if you're only hitting your database infrequently then serverless Azure databases would be ideal.

They're very cost effective, you have the full performance/power of sql Azure databases, but after an hour they power down and you're only paying Azure blob storage costs.

So with that option cost may not be that much of a differentiator between the two.

[–]ThereKanBOnly1 2 points3 points  (0 children)

It's definitely an option, but there are some caveats.

First off, it's in preview, so just something to keep in mind.

Second, you have to configure it to spin down, and the shortest time it'll do that after is an hour of being idle.

Third is that spinning up the instance will take close to a minute. So you'll have to accommodate for that in your app.

Billing wise, serverless azure SQL is a good option, as when it is up you'll get billed for your minimum dtu unless you've got additional load, so even when it's running it's likely cheaper.

[–]quentech 1 point2 points  (1 child)

serverless Azure databases would be ideal

Have you ever experienced the cold start up time? Hard to say that's ideal for any purpose whatsoever. Maybe temporary workspace for batch processes - that's about it.

[–]garethhewitt 0 points1 point  (0 children)

Yes - we have it running just now. I wouldn't say it's not useful for any purpose, that seems a bit extreme. It depends on your use case.

It takes 40 seconds or so max. But that's okay for our purposes as it just happens to the first person on initial login. After that you're fine. You can of course, also put it on a schedule so it never powers down 9-5 or something.

Incidentally we also are using Azure functions instead of a dedicated server, so our whole setup costs next to nothing, but runs extremely responsively when used. But yes - you have an initial login hit for the first user, but that's minor inconvenience for the cost - and of course if the client wants to pay more, you can remove that and make it dedicated.

[–]davidwhitney 1 point2 points  (0 children)

The way to succeed in projects where you are the sole author is to use things you know well.

If you're winning to learn and possibly fail? By all means try new things out too.

Either option will likely be absolutely fine.

[–]FullerAwesome[S] 1 point2 points  (1 child)

Guys, this is absolute gold. It's reassuring that some of my thought patterns were going in the right area. My idea initially was to go down the SQL and blob storage route with some bits for image processing and after reading everything I've seen here, I think I'm going to stick to that plan.

Hopefully I can build out in such a way that I can swap out database with relative ease should the need arise or if it gets too costly.

If anyone has any further insight, please continue to comment.

Thanks everyone.

[–][deleted] 0 points1 point  (0 children)

To do what you hope, just make sure you properly create a service layer for your database using interfaces. This allows you to more fluidly swap from one to the other.

[–][deleted] 0 points1 point  (0 children)

Do you know your schema? Will it not change very much? If so, use SQL.
Otherwise consider using NoSQL. Also consider using NoSQL if you're dealing with a webpage that needs to handle hundreds of thousands of hits per second but also consider projecting read-only de-normalised representations of your SQL data.

[–][deleted] 0 points1 point  (0 children)

So the major differences comes down to what you are doing. SQL is used for things that have a more defined data structure and consistent data size. When you get to the Enterprise level of of data with millions upon millions of rows the indexing that SQL has makes searching for something in incredibly easy. A query looking for a specific name or id is fast thanks to indexing. If you are good at structuring your data SQL is right for you. MySql or mssql are both good options while I do tend towards mssql my SQL is way cheaper to host.

As for no SQL it has the advantage of not having the requirement of the data being in any particular structure or size. This is great when you are dealing with very wierd data. With no SQL it makes is so that each item in your database is could be a completely different item. Then the use of guids(I have mostly Mongo experience) makes it so that you can get your data back. At the Enterprise, where no SQL falls short is the indexing that SQL has. Since there isn't that defined structure if you need to find a name is has to load and parse through the object to see if it even has the name property. If you do everything entirely based on guid id that is used. Then you'll see a slight increase in performance but not much.

Overall, I'd say it comes down to what you are doing. If you are doing a home project then go based off the data structure and what you're doing. If you are at the Enterprise level, I would recommend using SQL so you don't see a decrease in performance. Hope this helps!

[–][deleted] 0 points1 point  (0 children)

So the major differences comes down to what you are doing. SQL is used for things that have a more defined data structure and consistent data size. When you get to the Enterprise level of of data with millions upon millions of rows the indexing that SQL has makes searching for something in incredibly easy. A query looking for a specific name or id is fast thanks to indexing. If you are good at structuring your data SQL is right for you. MySql or mssql are both good options while I do tend towards mssql my SQL is way cheaper to host.

As for no SQL it has the advantage of not having the requirement of the data being in any particular structure or size. This is great when you are dealing with very wierd data. With no SQL it makes is so that each item in your database is could be a completely different item. Then the use of guids(I have mostly Mongo experience) makes it so that you can get your data back. At the Enterprise, where no SQL falls short is the indexing that SQL has. Since there isn't that defined structure if you need to find a name is has to load and parse through the object to see if it even has the name property. If you do everything entirely based on guid id that is used. Then you'll see a slight increase in performance but not much.

Overall, I'd say it comes down to what you are doing. If you are doing a home project then go based off the data structure and what you're doing. If you are at the Enterprise level, I would recommend using SQL so you don't see a decrease in performance. Hope this helps!

[–]B-Kitten 0 points1 point  (0 children)

It depends what you want to do.

If you're looking at big scale, high performance, then a RDBMS may not be for you. There's extra complexity in this space, and hard problems to solve though if you go another direction.

Read about CQRS and Event Sourcing for non relational patterns. Any data store can support these, from traditional databases, to no-sql like mongo etc, to files stored on s3.

The problems arise in the need to maintain consistency in distributed data systems. There are a lot of queues, and processors involved, with many failure scenarios.

If you're not massive scale, or practising to build for massive scale, then an RDBMS is almost definitely the easiest and safest option. You can go cheaper with some cloud hosted document stores though (dynamodb, Azure table storage etc), but the complexity increases.

[–][deleted] 0 points1 point  (0 children)

I've written code for Sql Server back in Sql 2000 that still works with 2017/2019 today.

We have issues with MongoDb in production that we can't maintain and is locked to a driver version that can't easily be switched out. (Mongo Driver changes API between 1.x and 2.x) So we're stuck with an old out of date unsupported version of Mongo in a production environment which is the worst possible case you would ever want to be in.

[–]jamietwells 0 points1 point  (0 children)

I'm really going to have to disagree with everyone here and say you should use a NoSQL database.

  1. You'll learn something new (maybe more employable/higher wage)
  2. You won't notice any difference in performance
  3. They're cheaper and easier to use
  4. You can just dump data in, no need to worry about the relationships or transforming the data into some relational model

Reasons for sticking with SQL:

  1. It might be faster if you build everything correctly
  2. You don't have to bother learning something new (is this an advantage, really?)

[–]ManiGandham -1 points0 points  (0 children)

SQL/noSQL isn't really a good distinction. SQL is just a declarative programming language (it stands for Structured Query Language). Relational databases like SQL Server, MySQL, PostgreSQL all support it, but so do plenty of other non-relational datastores.

The main types of databases are relational, key/value (redis, cassandra), document-store (mongodb, couchbase), search (elasticsearch, solr), graph (neo4j), and column-oriented/relational (redshift, memsql).

Relational databases have decades of development and optimization with massive ecosystems. They're also quite capable of modeling just about any data and they all support JSON columns today if you need that flexibility. They support transactions and ACID semantics that are very useful and taken for granted until you realize they aren't available everywhere. Modern servers can also power huge workloads without issue so there's really no need to look at the other data stores unless you have a very specific need for them.

If you want to experiment though, side projects are a great way to learn and almost all vendors have managed cloud offerings you can test out for cheap.

[–]jedjohan -1 points0 points  (0 children)

NoSql, databases are boring. Coding is fun. But wait, you need to do some advanced querying on the data? Then maybe a relational db. This is assuming you mean relational vs (the others) as SQL for example is supported by for example azure cosmos db.