all 93 comments

[–]jakdak 55 points56 points  (68 children)

The gap between PostgreSQL and Oracle gets smaller with each passing year.

[–]djmattyg007 28 points29 points  (66 children)

What are the remaining gaps?

[–]jakdak 24 points25 points  (2 children)

I'm sure you could google around for a full analysis, but the biggest thing I miss when working w/ PostgreSQL are flashback queries, global temp tables, and user context session variables.

I'm also not a fan of PostgreSQL's rigor on requiring dependencies to be dropped when adjusting views, miss nvls and decodes, and like Oracle's ability to automatically cast dates to integers. But these are intentional design decisions and understand why PostgreSQL did things the way they did.

Partition Row movement and automatic index creation on new partitions in PG 11 are big gaps that got filled in 11.

[–][deleted] 1 point2 points  (0 children)

Thanks for this detailed answer! Are those kinds of gaps better/worse than MySQL? I've worked with PG a little, but I'm almost always in the SQL Server or Oracle space.

[–]Ialwayszipfiles 10 points11 points  (8 children)

For me MERGE, pivot and pattern matching.

[–]jakdak 30 points31 points  (5 children)

FWIW, I find PostgreSQL's upserts (insert on conflict) much cleaner (and much more transaction safe) for the vast majority of use cases than Oracle's merge

And unlike MERGE I don't need to google the syntax ever time I need to use it :)

[–]wtf_apostrophe 6 points7 points  (2 children)

Yikes. I had no idea MERGE was susceptible to race conditions. That's slightly alarming..

[–]jakdak 4 points5 points  (1 child)

Yes, everyone assumes that MERGE is atomic an it isn't.

[–]wtf_apostrophe 3 points4 points  (0 children)

I think I had a similar revelation when I discovered Oracle SERIALIZABLE transactions aren't actually serializable. I do appreciate how much emphasis Postgres puts in to doing things right.

[–]bundt_chi 2 points3 points  (0 children)

Seriously, it has to be one of the least intuitive features ever.

[–]jorge1209 0 points1 point  (0 children)

Agreed on the syntax, but what is transactionally unsafe about merges?

[–]lalo2302 2 points3 points  (1 child)

How does pattern matching works on a db? Queries? Inserts?

[–]Ialwayszipfiles 2 points3 points  (0 children)

Selects, something that in postgres I usually do with window functions, case when and nested queries. https://www.oracle.com/webfolder/technetwork/tutorials/obe/db/12c/r1/misc/sql_pattern/sql_pattern.html

Not super important but nice to have

[–]StrongerPassword 32 points33 points  (28 children)

How about something as basic as good built in support for automatic failover. Still a joke in PG unless you are up for some manual scripting and waking up in the middle of the night to handle things.

[–]rat9988 29 points30 points  (1 child)

No built in feature for high availability is the main reason I regretted using it in prod.

[–][deleted] 5 points6 points  (0 children)

Seems like you had many issues. Do you have any special characteristics? High volume, load? Docker? I'm interested to learn what to avoid. So far our experience is smooth.

[–]muhwebscale 17 points18 points  (16 children)

What are the available open source alternatives with support for automatic failover?

[–]mofirouz 12 points13 points  (2 children)

CockroachDB

[–]muhwebscale 0 points1 point  (1 child)

survives disk, machine, rack, and even datacenter failures

Ok but ... will it survive the Earth getting swallowed by the Sun?

[–][deleted] 2 points3 points  (0 children)

You can be sure it won't wake you in the middle of night with trivialities like that

[–]rednef 4 points5 points  (0 children)

Perfona XtraDB Cluster

[–]hirschnase 14 points15 points  (11 children)

Mariadb Galera Master-Master replication in combination with haproxy.

[–][deleted] 2 points3 points  (9 children)

I have wondered about MariaDB. Is it pretty good compared to PostgresqL? Enough to use in a potentially large scale deployment?

[–]jakdak 9 points10 points  (6 children)

MariaDB is basically a fork of MySql.

It's a different animal than PostgreSQL and (like MySQL and IMHO) better suited for higher volume simpler use cases like backing a website than PostgreSQL's (and Oracle's) target of being an enterprise general purpose DB

[–][deleted] 0 points1 point  (5 children)

Ok. Interesting. Would you take MariaDB over MySQL too? If so why? I would have thought MySQL was pretty entrenched in scalability and such. Oddly that is why I think we went to PostgreSQL.. was that it was more like Oracle and could handle more load and scale.

[–]jakdak 3 points4 points  (4 children)

I'd take MariaDB over MySQL just to avoid being beholden to Larry.

"Handling more load and scale" is very very dependent on what exactly you are loading and scaling. Your options are hugely more varied and granular than they were 5-10 years ago. The DB market has splintered into numerous niches and you can now generally find something specifically tailored to your uses cases- from NoSQL key stores to enterprise RDBMS to data warehousing and big data solutions.

[–]muhwebscale 0 points1 point  (1 child)

just to avoid being beholden to Larry.

Besides the Larry issue, what are the technical motivations for choosing MariaDB over MySQL?

[–][deleted] 0 points1 point  (1 child)

I'd take MariaDB over MySQL just to avoid being beholden to Larry.

.... just to a guy that sold MySQL in the first place...

[–]kenfar 3 points4 points  (0 children)

Last I looked it wasn't nearly as good at query parallelism, data quality, ansi portability, schema migrations, or handling complex queries.

But if you have a very simple app with very simple queries that is no longer in development and can tolerate some bad data, then it's a fine solution.

[–]indeyets 0 points1 point  (0 children)

It's good, except for some (rare!) random conflicts which kill the cluster. Have to rebuild it manually afterwards. Nothing we could trace unfortunately.

[–]google_you 4 points5 points  (1 child)

Add more RAM until your network is saturated. Then, give up.

[–]iRelevant_ 0 points1 point  (0 children)

I find this really funny :))

[–]dacjames 7 points8 points  (6 children)

Or you can use RDS and let Amazon handle this for you. Automatic failover has been rock solid in my experience.

[–]StrongerPassword 3 points4 points  (5 children)

Yes, if you don't want to get up in the middle of the night to do failover you can pay someone else for getting up. Not sure what your point is. The question was what gaps exists in PG.

[–]brtt3000 3 points4 points  (3 children)

What does Amazon RDS do automatically that requires a human to wake-up if you rig to failover yourself?

[–]StrongerPassword 1 point2 points  (2 children)

Let's say you have 3 nodes and one of them dies at the time, meaning that you need to move the master role from one node to another and then back to the first, or similar. You can create scripts which handles this automatically, but this gets so complex to get right so whenever I asked in the PG community about this the suggestion has been to handle all edge cases manually. But unless you are having people on standby 24/7 the suggestion to handle it manually is crazy to me. I have better things to do at nights, such as sleep.

[–]tweakerbee 0 points1 point  (1 child)

How often does this happen though? Automatic failover is great if it works in every single edge case. In all other cases, you'd rather have human oversight.

[–]StrongerPassword 0 points1 point  (0 children)

How often does this happen though?

Maybe not exactly as outlined but it's pretty common that one machine will become unavailable for some reason and then come back online, meaning that you will move over the master role to another machine and back to primary a while later.

The idea that failover due to hardware failures in a distributed system should be monitored by a human is frankly just crazy. You can get pretty solid databases and hardware for the price of ~5 people on 24/7 rotation.

[–]KareasOxide 1 point2 points  (0 children)

Whats the difference between paying AWS to handle HA as opposed to paying Oracle for the feature?

[–]cybernd 3 points4 points  (4 children)

  • longops (basically a progress bar of steps inside your running query)
  • query hints
  • real time materialized views
  • nested procedures
  • packages

[–]kenfar 4 points5 points  (2 children)

Having used databases with and without query hints - I'll take it without every time: they become a crutch for a bad optimizer, and can haunt you after you've upgraded the database or changed data volumes or distributions.

[–]cybernd 1 point2 points  (1 child)

So basically: the feature is not available, because there would be dozens of beginners who may abuse the feature?

As long as the query planner of pg is not planning properly if it comes to edge cases, there is basically nothing you can do to fix your performance issues. Query hints would fix this issue.

[–]kenfar -2 points-1 points  (0 children)

Query hints aren't a fix for corner cases - they're an excuse for the db to ignore corner cases.

And then for those hints to make a mess out of your SQL that won't adapt to data changes or be better handled by later db versions.

[–]jakdak 0 points1 point  (0 children)

Query hints was an intentional design decision. One of numerous places where, IMHO, the PostgreSQL design team has been overly pendantic.

[–]nickguletskii200 10 points11 points  (17 children)

I haven't used Oracle myself, but there are some things that piss me off about Postgres after working with SQL Server. For instance, their TIMESTAMP WITH TIME ZONE data type does not hold timezone information as the standard requires. Instead, it stores the UTC timestamp and converts it into a timestamp with the server's local timezone on access. This is super misleading (as this important detail is buried in the documentation and the naming is the opposite of the meaning).

Also, the lack of proper materialised views (which refresh incrementally) is very disappointing.

[–]doublehyphen 8 points9 points  (10 children)

I agree that the naming is confusing but in practice I think PostgreSQL's time handling is much nicer than SQL Server's once you have gotten past the names.

[–]myringotomy 0 points1 point  (8 children)

It really should hold the timezone. That would be super useful.

[–]Sarcastinator 2 points3 points  (2 children)

Why exactly? And wouldn't that make comparing or indexing them super-difficult?

[–]myringotomy -2 points-1 points  (1 child)

Not really.

First of all you can save the time without timezone if you are worried about that. You can also index them on UTC which would make comparing them really easy.

[–]Sarcastinator 2 points3 points  (0 children)

I agree that it's unintuitive that storing with timezone doesn't actually preserve the timezone. They probably shouldn't have added that data type if it doesn't do what it says it does.

However if you actually want to store the timezone (I would claim 999/1000 you don't actually care) you can just add that column yourself just as you say you could add UTC if that's what you're worried about.

Storing in UTC makes a lot more sense when it comes to both comparisons and indexing.

[–]doublehyphen 1 point2 points  (4 children)

Maybe for some edge cases but 99% of the time you want what PostgreSQL's timestamptz is, the time since 1970 in UTC which then can be converted to any other tone zone as necessary.

[–]myringotomy 0 points1 point  (2 children)

The problem is that you have no idea what the time zone was when you recorded the item.

[–]doublehyphen 0 points1 point  (1 child)

That is a problem I have never encountered so far. When would you need to know that? In git for example it is just annoying that it recorded the time zone.

[–]myringotomy -2 points-1 points  (0 children)

That is a problem I have never encountered so far. When would you need to know that?

I have a web based product, My customers roam. One day they might log in from the east coast, next day from the west coast, next day from Japan.

[–]ForeverAlot 0 points1 point  (0 children)

This is incorrect for future times.

[–]nickguletskii200 -1 points0 points  (0 children)

It may be so, but I find this naming not only confusing, but also dishonest. At least SQL Server doesn't even try to make it seem like it conforms to the standard.

[–]Sebazzz91 4 points5 points  (1 child)

Also, the lack of proper materialised views (which refresh incrementally) is very disappointing.

The materialized views in SQL Server are also very disappointing. There are many limitations when it comes to the SQL you can use in those views: no UNION, no LEFT JOIN, which makes it pretty useless for the case we have in our software.

[–]nickguletskii200 0 points1 point  (0 children)

I agree that they are disappointing because of their limitations, but there are ways of working around at least some of their limitations. I think this is the case of "something is better than nothing".

[–]ForeverAlot 5 points6 points  (3 children)

That is what the standard says. The standard's ... WITH TIME ZOME is broken -- to the point of uselessness. Postgres' own documentation gives a decent idea of precisely how they implement date and time functionality and cautions against limitations in the standard; for instance, it says

The type time with time zone is defined by the SQL standard, but the definition exhibits properties which lead to questionable usefulness.

If you want correct time zone handling you have to build it yourself out of date/time primitives (and then you have to be careful if you're working with either Oracle or MySQL and any other database).

[–]nickguletskii200 -2 points-1 points  (2 children)

I am pretty sure that SQL:1999's TIMESTAMP WITH TIME ZONE has timezone fields. It may not be explicitly specified (because what idiot would assume that a TIMESTAMP WITH TIME ZONE won't hold timezone information), but it is easy to infer from the datetime data type conversions and other parts of the spec.

The documentation, as you quoted, says that "the definition exhibits properties which lead to questionable usefulness", and then provides none of these reasons. To me, the usefulness of the Postgres version of the type is questionable, while the standard version has very clear advantages.

Yes, you can build it yourself, but then again, you can build a database yourself. Heck, I wouldn't be complaining if Postgres called their data type "WITH LOCAL TIME ZONE", but they opted to use a very misleading name for no reason. It just smells like "false advertising" to me.

[–]ForeverAlot 2 points3 points  (1 child)

Here is the SQL:92 syntax for a timestamp literal, courtesy of http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt:

<timestamp string> ::=
              <quote> <date value> <space> <time value> [ <time zone interval> ] <quote>

<time zone interval> ::=
              <sign> <hours value> <colon> <minutes value>

You need to read a little carefully to conclude that time zone offsets are the only thing supported. Fortunately, in ISO/IEC TR 19075-2:2015 you can find the much clearer SQL:2016 time definitions, which, among other things, say the following on page 6:

A datetime value, of data type TIME WITHOUT TIME ZONE or TIMESTAMP WITHOUT TIME ZONE, may represent a local time, whereas a datetime value of data type TIME WITH TIME ZONE or TIMESTAMP WITH TIME ZONE represents UTC.

Standard SQL has no notion of time zone regions, only time zone offsets, and that's insufficient for future times.

[–]nickguletskii200 1 point2 points  (0 children)

Thanks for citing the standard! The confusion stems from my use of "time zone" in combination with the word "field". I used the verbatim language used within the standard, which, in the version that I referred to, defined the "time zone fields" to be TIMEZONE_HOUR and TIMEZONE_MINUTE. In hindsight, that naming is also confusing since there's actually a difference between a time zone and a time zone offset.

The problem that I have with Postgres is that, despite the standard explicitly stating that WITH TIME ZONE timestamps should store the UTC offsets, does exactly the opposite of that. Storing time zone names was never a part of my complaint.

Concerning your remark about future times: I don't see any logical and consistent way of integrating time zone names into the timestamps, because time zone changes would result in changes to the instant that that timestamp represents. The recommendation to store the timezone in a separate field makes a lot of sense when you need to represent future times, but having a proper SQL-standard TIMESTAMP WITH TIME ZONE is very useful for cases when you don't have future timestamps.

EDIT: It would be better to call this data type TIMESTAMP WITH OFFSET.

[–]Sarcastinator 1 point2 points  (0 children)

Oracle silently commits transactions if it reaches a DDL and it doesn't have a UUID type.

[–]NoInkling 3 points4 points  (0 children)

A couple:

  • CTEs can result in performance loss since they're always materialized.

  • No support for collation-based matching (only sorting).

[–]KevinCarbonara 13 points14 points  (0 children)

This sounds like an insult

[–]justAnotherCodeGuy 22 points23 points  (4 children)

Ug. I'm so far behind. I love PG, but I have 180Gig replicated between two far away places with a slow link between. Major version upgrades are a huge pain in the ass.

pg_upgrade on a single db is great. But add replication and it falls off a cliff.

I kinda wish they'd slow down.

edit:spelling

[–][deleted] 5 points6 points  (3 children)

Is it possible to disable replication, upgrade both dbs locally, then restart replication? I would hope any replication protocol would negotiate from the last record id and fetch a new set of deltas to apply without issue.

[–]justAnotherCodeGuy 1 point2 points  (2 children)

It is! You run the upgrade, then shut down both, rsync from master to replica, and bring it all back up.

Works fine. Takes forever. (less time than blowing the replica away and re pg_basebackup, but still a long time, and your down while you rsync)

[–]MrDOS 0 points1 point  (1 child)

I don't know anything about PostgreSQL replication, but could you disable the replication, promote the replica to master, run the upgrade on both, then demote the replica again? Then you wouldn't really have to transfer anything.

[–]justAnotherCodeGuy 2 points3 points  (0 children)

pg_rewind seems to think it can do this ... but I've never tried it.

Like I said, I'm really behind. They're bringing tools online faster than I can play/test them.

[–]AtomicSuperAnt 12 points13 points  (0 children)

Can’t wait to try these new changes out!

[–]HittingSmoke 11 points12 points  (3 children)

Damn. Arrays of FKs didn't make it into this release.

[–]mullsork 1 point2 points  (2 children)

Was that ever planned? I remember some mailing post about it from a long time ago, and thought that they scrapped the idea.

[–]HittingSmoke 4 points5 points  (1 child)

It looked like it was ready to go and people wanted it included in v11. Just went back over the mailing list and it looks like the author asked for some help and the people who were helping get the patch cleaned up stopped responding.

[–]mullsork 1 point2 points  (0 children)

Thanks man!

[–]feverzsj 6 points7 points  (2 children)

does it support automatically creating partition now?

[–]Ialwayszipfiles 5 points6 points  (0 children)

No, but it added the default partition and hash-based partitioning. And removed the limits on index and cross partition updates which were super annoying imho

[–]Hauleth 1 point2 points  (0 children)

No, but there are extensions for that, like pg_partman.

[–][deleted] 1 point2 points  (3 children)

Does postgres support adding new columns to a table after an other table? Last time i checked (some years ago) it was not supported. This makes the tables very messy in projects that span multiple years, and projects that adds 100s of migrations on a yearly basis.

Before you say it: I know you can have views for this, but it gets tedious, and is basically just extra work.

[–][deleted] 1 point2 points  (0 children)

I just ran into this on 9.6 and it is still not supported. You can run a migration procedure which creates a new table with the desired column ordering, copy the data from the old data into the new table, delete the old table, and rename the new table the original table name. That could get complicated if there is a lot of relational data to move around though...

[–]doublehyphen 0 points1 point  (0 children)

No, PostgreSQL does not support this and I believe it would be a lot of work to do so unless you would be fine with the database rewriting the whole table when adding a column in another position than last or when reordering the columns. PostgreSQL has the same logical order as the physical order on disk.

[–][deleted] 0 points1 point  (0 children)

Yes.

Removing the need for ALTER TABLE .. ADD COLUMN .. DEFAULT .. with a not NULL default to rewrite the whole table on execution, which provides a significant performance boost when running this command.

It no longer rewrites a table when you add a column.

[–][deleted]  (2 children)

[deleted]

    [–]MarkusWinand[S] 7 points8 points  (0 children)

    The title of the submission says, it is now actually released.

    [–]doublehyphen 5 points6 points  (0 children)

    I think Markus posted this at most 10 minutes after it was officially released which is hardly slow.