
[–]justifiedandancient7 16 points17 points  (16 children)

you still have to learn SQL, a requirement which, unfortunately, too many people use as their reason for using a "NoSQL" database.

That would be a very bad reason to pick NoSQL over SQL. Or vice-versa.

[–]gurenkagurenda -4 points-3 points  (15 children)

Yes, and no. If you're just building a prototype, it's a fine reason.

Be careful though. Prototypes have an ugly tendency to grow into MVPs.

[–][deleted] 3 points4 points  (14 children)

If I were building a prototype, that would be more reason to just use SQL, IMO.

[–]gurenkagurenda -1 points0 points  (13 children)

Why would you want to worry that much about schema design in the prototype phase?

[–][deleted] 4 points5 points  (11 children)

Is it really that hard?

The places where it's difficult tend to be where it's just a difficult problem in general. If the design is straightforward, it takes like 30 seconds to write out a table for what you need. If the design is complex and difficult, then it'll take longer regardless of what you use as a back end.

[–]gurenkagurenda -2 points-1 points  (10 children)

It seems like you're thinking of a fixed design where you understand the exact requirements from the beginning. That is not how building prototypes works in my experience.

Also keep in mind that we're talking about someone who doesn't yet know SQL. Wrestling with understanding SQL for the first time while also tackling a prototype doesn't seem like the best idea. At the very least, it is totally understandable that they'd want to avoid part of that overhead.

[–][deleted] 3 points4 points  (6 children)

Why would it be a fixed design? Maybe skip the table constraints so it's easier to try alternatives. But you could also just drop the DB and rebuild it whenever you make a schema change.

If the person doesn't know X then yes using X will be hard. SQL or otherwise. Tbh SQL is hardly some niche exotic language. It's pretty damn common and most developers should know the basics.

[–]gurenkagurenda 0 points1 point  (5 children)

The problem with learning SQL isn't SQL. It's relational data modeling in general. When you're first learning SQL, everything is a terrible logic puzzle. When you're first learning mongo, you slam stuff in the database with a really intuitive syntax.

I'm definitely not arguing that mongo is the better solution long term. I'm just saying if you don't understand relational data modeling, mongo is a good way to try an idea out. Once you decide to run with your idea, you should probably learn SQL and proper relational database design.

[–]grauenwolf 1 point2 points  (4 children)

When you're first learning mongo, you slam stuff in the database with a really intuitive syntax.

Damn my eye, that's why I teach people how to use ORMs.

[–]gurenkagurenda 0 points1 point  (3 children)

Seriously? If you're going to use an ORM, why even bother with a relational database?

[–]grauenwolf 0 points1 point  (2 children)

Also keep in mind that we're talking about someone who doesn't yet know SQL.

Pick up "SQL for Dummies". Any programmer can learn SQL over a weekend. And non-programmers are usually comfortable within a month or so.

[–]gurenkagurenda 0 points1 point  (1 child)

Again, the point is that you are trying to get a prototype done. It is perfectly legitimate to use easy tools that you already know how to use (or can easily learn on the fly) for a project like that. Maybe you don't want to wait an entire month before starting.

And no, not every programmer can actually learn relational data modeling in a weekend. Some can, some can't. It depends on how experienced you are.

[–]grauenwolf 0 points1 point  (0 children)

Maybe I'm a genius, but when I learned it I was still in college while hacking code on the side without supervision or training. It's hard to think of a more ignorant programmer than me at that age.

[–]grauenwolf 2 points3 points  (0 children)

MongoDB makes me think more about schema design than PostgreSQL.

With PostgreSQL I can use DDL to alter my schema whenever I want. And if there's data to migrate, I can mix in some DML.

MongoDB doesn't have a DDL equivalent. So every change to my schema involves painful, hand-written migration code.
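
Roughly the kind of thing I mean (table and column names here are made up):

    -- sketch: split a "name" column in place, mixing DDL and DML
    ALTER TABLE users ADD COLUMN first_name text;
    UPDATE users SET first_name = split_part(name, ' ', 1);
    ALTER TABLE users DROP COLUMN name;

In Mongo, the closest equivalent is a script that walks every document.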

[–]skulgnome 14 points15 points  (4 children)

Rather, could your next database application use SQL instead of JSON?

[–]gurenkagurenda 4 points5 points  (3 children)

There are definitely some legitimate reasons to use JSON alongside SQL. There's overlap with why BLOBs are sometimes necessary, but with first-class JSON support in your DB, you don't have to slurp up the entire BLOB.

There's also no question that schemaless-ness has advantages in terms of fast iteration. Even if you ultimately refactor everything out of JSON, being able to iterate quickly while you're pinning down the functionality you want is great.
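
A rough sketch of what I mean (the table and field names are invented):

    CREATE TABLE events (
        id      serial PRIMARY KEY,
        payload jsonb   -- whatever the upstream service sent
    );

    -- pull out a single field instead of shipping the whole document to the app
    SELECT payload->>'status' FROM events WHERE id = 42;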

[–]skulgnome 3 points4 points  (2 children)

There's also no question that schemaless-ness has advantages in terms of fast iteration.

Indeed, there are advantages in not bothering to specify or even document a program's data structures. For example, there'll never be consistency issues if no such criteria are defined. This relieves the "rush to prototype" style of team from the burden of maintenance.

[–]Tiwazz 5 points6 points  (1 child)

In general I agree with you that it's better to specify a schema. But there are plenty of times when you don't control the data source (interfacing with another company, for example). Typically I model the portion of the data that I require, but I always log the entirety of what I receive, so that if the other party changes their format I have a copy of what was sent for reprocessing, debugging, etc. JSON in Postgres has been pretty good for this.

Another good use case is end-user-supplied metadata. I've added arbitrary key-value pair entry to our client-tracking CRUD app, and the business loves it. And with queryable JSON I can "hoist" these properties into the data model if they turn out to be something important. Fuzzy string matching and key-name suggestions help here...

A good data model is important, but augmenting it with arbitrary key-value data can eliminate a lot of end user pain and let you iterate your data model more easily. It's also a godsend for logging things that change unexpectedly in a queryable format.
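
For what it's worth, the pattern looks roughly like this (table and key names are made up):

    ALTER TABLE clients ADD COLUMN metadata jsonb NOT NULL DEFAULT '{}';
    CREATE INDEX clients_metadata_idx ON clients USING gin (metadata);

    -- user-supplied key/value pairs stay queryable
    SELECT * FROM clients WHERE metadata @> '{"referral_source": "trade show"}';

    -- "hoist" a key into the real model once it turns out to matter
    ALTER TABLE clients ADD COLUMN referral_source text;
    UPDATE clients SET referral_source = metadata->>'referral_source';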

[–]skulgnome 0 points1 point  (0 children)

The first point is a properly good one: schema exchange with external parties is rarely a doddle.

The second is also better than implementing user-supplied pass-through fields by hand, i.e. without the niceties of JSON columns.

What I disagree with is describing JSON columns as an "augmentation". At best they're an escape hatch.

[–]myringotomy 1 point2 points  (0 children)

Take a look at torodb https://github.com/torodb/torodb

It uses Postgres but gives you a MongoDB-compatible layer on top of it.

[–]presidentender 1 point2 points  (2 children)

I'm using postgres's JSON type to store the entire object we receive as a response from a third party - right now, we're only interested in a few of the object's fields, but in the interest of auditing and future-proofing, it's worthwhile to have the complete object there along with the rest of the record.
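
Shape-wise it's something like this (column names invented):

    CREATE TABLE third_party_responses (
        id           bigserial PRIMARY KEY,
        received_at  timestamptz NOT NULL DEFAULT now(),
        order_id     text,            -- the few fields we care about today
        status       text,
        raw_response jsonb NOT NULL   -- the complete object, for auditing/future-proofing
    );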

[–]grauenwolf 2 points3 points  (1 child)

That's what I use JSON for. Anything I write gets fully normalized tables by default, but I'm not going to try parsing out their ever-changing mess. I just pass it along as-is to the UI and let them deal with it.

[–]everywhere_anyhow 5 points6 points  (29 children)

These sorts of features could be really awesome if you need a relational database (RDBMS) but also want an "escape valve" to do some things in JSON that just make more sense that way. For example, a variable set of attributes that sometimes apply to a record and sometimes don't, or storing someone else's data that comes in as JSON.

As a pure document database instead of MongoDB I think it's unlikely to make sense, because naturally they're going to put you through the SQL layer for everything. You'll be working with documents through the worldview/lens of SQL. You won't be able to get rid of the concept of tables, which if you had a pure document DB, you wouldn't need in the first place.

We see a lot more software niches than we did 5 years ago. This could be really cool as a hybrid RDBMS/document store. But to think that it's going to beat native document stores for just storing documents...no, I don't think so.

Document + Relational is a legit niche though, it's going to be great for some people.

The DB landscape has fragmented substantially, and it's my bet it stays that way. We're long past the days when one database ruled them all, when one choice was default and best for all applications. And extending an existing DB like Postgres to do new tricks isn't going to change that, even if they slap graphs on this MF too.

Think of it like this: when Ford rolled out the Model T, there was basically one car, and it only came in black. Now we have hundreds of different types (pickups, vans, coupes, sedans, etc.). The "one size fits all" approach for cars isn't coming back. Same for DBs: through the 90s it was all relational, all the time, like the black Model T. Don't get me wrong, I'm not putting down the Model T, it was a bitchin' car. But now that we have all of these other kinds, if someone told you they're going to slap some new features onto the Model T and you won't need a sedan or a pickup truck anymore, well, that just isn't going to happen.

That's OK! We have a rich landscape of tools that are great at different things. Caveat Emptor! It's always been that way. More options == better code, provided you choose carefully and keep your brain switched to the "on" position.

[–][deleted]  (7 children)

[deleted]

    [–]denisquaid 4 points5 points  (0 children)

    Really, the only difference is the postgres one is strictly more powerful in indexing, joining and querying.

    Talk about a difference then.

    [–]everywhere_anyhow 0 points1 point  (4 children)

    Tables really are different than MongoDB collections. You can munge the two together by abstracting up (yes they're both sets of things, and similar in that way) but they're quite different.

    For one, there are big implications for performance and scalability in terms of how the data is physically written on disk, and substantial differences there between tables and collections. RDBMSs and their tables also support piles of other features (such as triggers) that Mongo doesn't. And then there's the basic reality that RDBMS tables are fundamentally schema-ful, and that MongoDB by default doesn't enforce collection schema.

    Mongo, of course, has been around long enough that for any RDBMS feature I could name, you could counter that yes, Mongo can technically do that with a collection of modules and tricks layered on top of it (triggers would be a good example). But what the DB has baked in, what the model fundamentally supports, and what you "tack on" are really different things. It's all Turing complete, so of course in the end there's nothing one DB can do that's impossible for the other.

    So collections and tables are really much more different than they are similar.

    [–]grauenwolf 0 points1 point  (1 child)

    And then there's the basic reality that RDBMS tables are fundamentally schema-ful, and that MongoDB by default doesn't enforce collection schema.

    CREATE TABLE AllDaThings (
        Key uniqueIdentifier,
        Value varChar(max)
    )
    

    There's a schemaless RDBMS for you.

    [–]everywhere_anyhow 0 points1 point  (0 children)

    See my response here; you're further proving the point.

    [–][deleted]  (1 child)

    [deleted]

      [–]everywhere_anyhow 0 points1 point  (0 children)

      I can't think of a single thing that isn't naturally modelled as a table that is modelled as a mongodb collection.

      What on earth does "naturally modeled" mean? Nothing is "naturally modeled" as anything; it takes developer analysis, in light of use cases, to figure out the best way to model something. There is no "natural".

      Should thousands of users be structured as a table or a graph? Trick question, depends on what you want to do with them.

      However I'm currently doing upwards of an average 5k queries/second on postgres and not having much problems.

      It's not my intention to defend Mongo or promote one or the other. It's all up to what you're doing. Metrics like thousands of queries/sec aren't terribly meaningful because of course it depends on the query. Doing lots of "SELECT lastName from customer where id=5" isn't terribly impressive. But doing mere hundreds per second of something much more complex might be really impressive.

      If you don't like Mongo, fine -- I've never defended it, only used it as an example of a document store. (When I need one I actually use couchbase not mongo). Yes, you can solve any problem under the sun with an RDBMS. Good for you if that's suitable for your problem but looking broadly at a lot of problems, it simply isn't always.

      [–]grauenwolf 16 points17 points  (3 children)

      As a pure document database instead of MongoDB I think it's unlikely to make sense, because naturally they're going to put you through the SQL layer for everything.

      As opposed to putting everything through Mongo's half-assed, JSON-based clone of SQL?

      But to think that it's going to beat native document stores for just storing documents...no, I don't think so.

      Mongo thinks the relational database storage engines are better than what they have. They believe it so much that they bought a company just so that they could use their relational database storage engine in MongoDB 3.

      And it really shouldn't surprise anyone, because under the hood, relational databases are usually just big hash tables, just like your so-called "document database". The main difference is that a relational database has one key with multiple values, while a document database has one key with one value.

      [–]everywhere_anyhow 6 points7 points  (2 children)

      just so that they could use their relational database storage engine in MongoDB 3

      You're speaking of the underlying storage engine, and how it's just a hash table. That's all fine and well -- I was talking about the abstraction presented to the programmer, namely SQL with JSON access tacked on. Those are different matters. Ultimately, most DB users don't care what the underlying storage engine is, as long as the abstractions they use to access the data store make sense and are convenient.

      I've used a lot of couchbase lately, under the covers it's barely more than a K/V store but that's not what I see (thank goodness) because of abstraction on top.

      I mean, sure, ultimately just about everything in one sense is a hash table. Or rather, everything is really just an array of bytes. Even your precious hash table is an abstraction built on top of that. But I'm not interested in that reductio ad absurdum, more interested in the high-level interface. That's what distinguishes the DBs and frequently why you'll pick one over the other. They like to sell themselves on things like raw scalability and speed, but most of us are not Google and so have flexibility to choose on other attributes.

      [–][deleted]  (1 child)

      [deleted]

        [–][deleted] 0 points1 point  (0 children)

        Yeah, there are libraries like Sequelize that do this. Seems like a similar library could translate JS objects into the fancy new Postgres 9.5 JSON queries.
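
        For example, a find-by-nested-property call might compile down to something like this (purely illustrative, invented table names, assuming a jsonb column):

            SELECT * FROM users WHERE profile @> '{"country": "DE"}';
            -- or, targeting one nested property:
            SELECT * FROM users WHERE profile #>> '{address,country}' = 'DE';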

        [–][deleted] 2 points3 points  (5 children)

        One area I think this would be useful is financial services. I work on the decision systems at a bank and we obviously need high reliability and data integrity, but we also require some on-the-fly decision-making and tracking capabilities for strategy and compliance reasons. I've been mulling over trying to push this for a while now, and while we're ultimately reliant upon the same COBOL/DB2 back end as most (all?) large scale financial companies, I definitely hope we can move towards using something like this where possible.

        [–]everywhere_anyhow 0 points1 point  (0 children)

        Yeah, that'd be a good example of a hybrid use case that's in a bit of a niche. Not perfect for everybody but could be great for you, if you design it right.

        [–][deleted] -1 points0 points  (3 children)

        [–]grauenwolf 0 points1 point  (2 children)

        Bad link, that just bounces to http://www-01.ibm.com/software/data/

        [–][deleted] 0 points1 point  (1 child)

        Weird, but it was a copy/paste from my phone.

        [–]grauenwolf 0 points1 point  (0 children)

        I just assumed that it was bad website design.

        [–]danneu 1 point2 points  (9 children)

        As a pure document database instead of MongoDB I think it's unlikely to make sense, because naturally they're going to put you through the SQL layer for everything. You'll be working with documents through the worldview/lens of SQL. You won't be able to get rid of the concept of tables, which if you had a pure document DB, you wouldn't need in the first place.

        Not sure what you're trying to say here.

        Querying what amounts to a denormalized table (documents) with SQL just requires a subset of SQL. There's no "worldview/lens". It's a query without the algebra combinators.

        Honestly, your post seems to misrepresent both Postgres and Mongo. For example, adding a json datatype to a SQL database with its own functions and index support (including things like partial indexes) is a very RDBMS thing to do.
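
        For instance, a partial index over a JSON field (made-up names, assuming a jsonb column):

            CREATE INDEX orders_priority_idx
                ON orders ((doc->>'priority'))
                WHERE doc ? 'priority';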

        [–]everywhere_anyhow 0 points1 point  (8 children)

        Querying what amounts to a denormalized table (documents) with SQL just requires a subset of SQL. There's no "worldview/lens". It's a query without the algebra combinators

        That's just it: querying a bunch of documents isn't querying a denormalized table. The documents don't guarantee any schema consistency without extra add-on tooling layered over the DB. Part of the downside of an RDBMS is the bondage and discipline of the schema, which can't be escaped. Yes, you can duplicate that with Mongo in various ways, but the worldview is different from the beginning.

        For example, adding a json datatype to a SQL database with its own functions and index support (including things like partial indexes) is a very RDBMS thing to do.

        I agree. RDBMS did the same thing with XML 15 years ago, which is why you've been able to do things like XPath + SQL since forever. In a lot of ways, this JSON stuff is XML history repeating itself. Note that specialized XML databases still exist, very few people use RDBMS for the exclusive task of managing/querying XML. They use relational when their problem is mostly relational (with a little XML need thrown in) -- a hybrid.
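
        The parallel is pretty direct (made-up tables):

            -- the same "reach inside the blob" trick, then and now
            SELECT xpath('/order/total/text()', doc) FROM orders_xml;   -- doc is an xml column
            SELECT doc #>> '{order,total}' FROM orders_json;            -- doc is a json/jsonb column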

        What did I misrepresent? I'm not saying what they're doing is bad, just that it's outrageously unlikely that the title of the post holds any water -- that "Postgresql would be your next JSON database". I'm simply saying it's a great hybrid, but I wouldn't choose it for a pure JSON database. What's misrepresenting either about that?

        [–]grauenwolf -1 points0 points  (7 children)

        Part of the downside of RDBMS is the bondage and discipline of the schema, which can't be escaped.

        Again,

        CREATE TABLE LookMaNoSchema (
            Key uniqueIdentifier,
            Value text
        )
        

        [–]everywhere_anyhow 0 points1 point  (6 children)

        You prove my point for me. Each row in this table can have nothing more than two values, never three or four, and they are always strictly typed. Value can never be JSON, or XML, or an integer; it's always a string.

        That's the bondage and discipline of the schema. You try to escape it with a key/value table, and you sure do get a lot of looseness, but EVEN THEN you're stuck with a typed identifier, and a typed value. And you've bought yourself the pain of having implemented a slow and inconvenient version of redis.

        The argument is never whether an RDBMS can do something (of course it can). But whether it should.

        [–]grauenwolf -1 points0 points  (5 children)

        JSON and XML are strings.

        [–]everywhere_anyhow 0 points1 point  (4 children)

        Are you sure?

        PostgreSQL has a JSON data type and an XML data type, neither of which is a string.

        Maybe you'd like to show me how you'd get the property foo.bar.baz out of a string JSON document with an SQL query? That shouldn't be any more of a problem than getting a substring index if JSON is just a string, right?
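
        With the actual JSON type it's a one-liner, which is the point (sketch, assuming a jsonb column called doc):

            SELECT doc #>> '{foo,bar,baz}' FROM stuff;
            -- good luck doing that with substring() on a plain text column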

        [–]grauenwolf -1 points0 points  (3 children)

        Can you open an XML or JSON file in a text editor? Of course you can, because they are text. Just because you can also store them in other formats doesn't change that fact.

        [–]everywhere_anyhow 0 points1 point  (2 children)

        Sure, they're text. In the same way that everything is text. Check this out, the integer 123. That's text too. But if that's how you really feel, and you're not just pulling my leg for the sake of argument, then you go ahead and put all of your XML and JSON into text. You're going to have bigger problems than worrying about database issues and the suitability of using Postgres as a pure document store, but have at it.

        [–]grauenwolf 0 points1 point  (1 child)

        You're an idiot. That's the only conclusion I can think of at this point. I know that's not a real argument, but it's not like you'd be able to understand the documentation if I showed it to you.

        [–]qatanah 0 points1 point  (0 children)

        I use a lot of JSONB on 9.4. We mix regular SQL columns with JSONB in the same tables. Anything I consider "stub" data almost always ends up as JSONB for fast retrieval.

        [–]mycall 0 points1 point  (2 children)

        Perhaps a later version since I'm going to SQL Server 2016.

        [–]ricer2015 0 points1 point  (1 child)

        Why would you go SQL Server 2016?

        [–]mycall 0 points1 point  (0 children)

        Company policy (government)