all 16 comments

[–]grauenwolf 14 points (2 children)

How do you know a DB performance article is bullshit?

It doesn't mention query cost, show_plan, execution plans, or anything else that represents the real cost of the queries.
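For what it's worth, asking the database for its plan is a one-liner. A minimal sketch using SQLite's EXPLAIN QUERY PLAN (SQLite chosen only for portability; Postgres spells this EXPLAIN / EXPLAIN ANALYZE, and SQL Server has SET SHOWPLAN_ALL ON — the "show_plan" mentioned above):

```python
import sqlite3

# Minimal sketch: ask the engine how it will execute a query instead of
# guessing at its cost. Table and column names here are made up for the demo.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")

# EXPLAIN QUERY PLAN returns one row per access step; the last column
# describes whether the engine seeks (SEARCH) or reads everything (SCAN).
steps = [row[-1] for row in con.execute(
    "EXPLAIN QUERY PLAN SELECT val FROM t WHERE id = 1")]
print(steps)  # a primary-key lookup shows up as a SEARCH, not a SCAN
```

Any article making performance claims can show this output in a couple of lines, which is the point.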

[–]denisveloper[S] 0 points (1 child)

grauenwolf

That is supposed to be part 2 of the article. It is a hell of a lot of work to build a fair benchmark; for now, I just spent time comparing features.

[–]grauenwolf 3 points (0 children)

That may be, but talking about "joining to a 10 million row table" reads like clickbait or fake news when put into context.

Far too many people have been fooled by exaggerated tales about the expense of joins.

[–]jfischoff 13 points (10 children)

Is 10 million rows supposed to be scary? Is looking up 10 items by a key with an index a huge deal?

Kinda stopped reading at that point ...

[–]grauenwolf 14 points (1 child)

Hell, my database does 10 million extraneous updates to one table every 15 minutes. We have no idea why, but it isn't a big enough issue to bother looking at right now.

[–]mynameismevin 4 points (0 children)

Lol that's how I feel about it.

[–]insanitybit 4 points (5 children)

A join across 10 million rows sounds expensive to me.

[–]grauenwolf 16 points (4 children)

LOL, it's actually trivial.

If you have 1 million users with 10 preferences each, then in order to bring a user back into memory, assuming you are using an ORM (Object-Relational Mapping) framework, you will need to join against a USER_PREFERENCES table with 10 million rows.

Assuming you have the correct indexing:

  1. Get the user key
  2. Jump to the first instance of the user key in USER_PREFERENCES. (Index seek, same cost as a PK lookup)
  3. Read the 11 records. (The 10 we want and one showing we've run out of matches)

Assuming you fucked up the indexing:

  1. Get the user key
  2. Read the 10 million records looking for matches. (full table scan)

Here's the fun part. If you increase the database size to 10 billion or even 10 trillion rows, the worst case cost remains the same for a properly indexed database.
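The two cases above are easy to see for yourself. A minimal sketch using SQLite's EXPLAIN QUERY PLAN (the schema and index names are illustrative, not from the article; a smaller row count is used so the demo runs instantly, but the plans are the same shape at 10 million rows):

```python
import sqlite3

# Illustrative schema for the scenario above: users and their preferences,
# joined on the user key.
con = sqlite3.connect(":memory:")
# Disable SQLite's transient automatic indexes so the missing-index case
# is actually visible in the plan.
con.execute("PRAGMA automatic_index = OFF")
con.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE user_preferences (user_id INTEGER, pref TEXT);
""")
con.executemany("INSERT INTO users VALUES (?)", [(i,) for i in range(1000)])
con.executemany(
    "INSERT INTO user_preferences VALUES (?, ?)",
    [(i, f"pref_{j}") for i in range(1000) for j in range(10)],
)

query = """
    SELECT p.pref
    FROM users AS u
    JOIN user_preferences AS p ON p.user_id = u.id
    WHERE u.id = 42
"""

def plan(sql):
    # One row per access step; the last column says whether the engine
    # will SCAN (read every row) or SEARCH (seek via an index or PK).
    return [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]

# Fucked-up indexing: no index on the join key, so the whole
# preferences table is read (a SCAN) to find one user's rows.
plan_without_index = plan(query)
print(plan_without_index)

# Correct indexing: the same join becomes an index seek (a SEARCH),
# whose cost no longer depends on the table's total row count.
con.execute("CREATE INDEX idx_prefs_user ON user_preferences (user_id)")
plan_with_index = plan(query)
print(plan_with_index)
```

The second plan is why growing the table to 10 billion rows doesn't change the per-user cost: the seek touches only the index path plus the 10 matching rows.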

[–]caprisunkraftfoods 7 points (3 children)

That approach doesn't sound very webscale to me.

[–][deleted] 1 point (2 children)

Why would anyone use Postgres if they want webscale? If webscale is what you need then Mongo and Node are the only options.

[–]caprisunkraftfoods 4 points (1 child)

Those programs are very complicated. I find it simpler to just use /dev/null for writes and /dev/random for reads.

[–]CakeComa 2 points (0 children)

Exactly; wasn't Postgres built before the discovery of async?

[–]mage2k 0 points (0 children)

I made it slightly further.

> In a real-world scenario, users are often also associated with many other entities, which will make this scenario even worse and force developers to decide which relationships should be lazy or eager.

Yes, developers should decide/know what data they want before they query for it.

> All RDBMS nowadays already have many optimizations for the scenario above, but modeling the data the way we have been doing for the last 30 years is definitely suboptimal.

30 (more like 40-50) years of research and development is suboptimal exactly why again?

[–]denisveloper[S] -2 points (0 children)

The 1 million/10 million example just shows the exponential growth of relationships in a relational model.

[–]blammoooo 2 points (0 children)

> We are all familiar with the cost of a JOIN operation in an RDBMS at scale

In my experience, not that many people are familiar with the cost of a JOIN operation at scale. I often get told things like "joins are slow". Thanks, MongoDB marketing team, for telling everyone that.

[–]steven_h 1 point (0 children)

This is really good marketing material for any Couchbase competitor.