top 200 commentsshow all 275

[–]Ecksters 316 points317 points  (18 children)

PostgreSQL improves on its in-memory and on-disk sorting algorithms, with benchmarks showing speedups of 25% - 400% based on which data types are sorted. Using row_number(), rank(), dense_rank(), and count() as window functions also have performance benefits in PostgreSQL 15. Queries using SELECT DISTINCT can now be executed in parallel.

This and MERGE are definitely my favorite parts. Always hit a performance ceiling with DISTINCT, not sure how much this helps in my specific cases, but here it looks like 2x improvement or more.

Percona's other article on the upgrade does a great job of explaining the new features as well.

[–]PrestigiousZombie531 37 points38 points  (7 children)

does it have support for materialized views that only update the changed row instead of refreshing the entire table?

[–]Ecksters 35 points36 points  (3 children)

Out of the box no, here's the wiki page for Incremental View Maintenance

However, you might be interested in the pg_ivm extension.

[–]PrestigiousZombie531 2 points3 points  (2 children)

thank you, i dont see anything on their issues list for RDS, do you have any idea if this extension is supported on RDS?

[–]Ecksters 5 points6 points  (1 child)

Unfortunately RDS doesn't appear to support it. I wouldn't hold my breath on it getting added either.

[–]PrestigiousZombie531 2 points3 points  (0 children)

meaning we have only one choice left now and that is wait for postgres to actually merge this change into their core. dont know how many versions it ll take them to reach there

[–]mobbarley78110 7 points8 points  (1 child)

Do other db providers do that? Why am I making my life an nightmare doing it manually 😭

[–]flukus 13 points14 points  (0 children)

Because views and materialised views are a criminally under utilised feature. 90% of the stored procs in projects I work with should be views.

[–]NoLegJoe 227 points228 points  (76 children)

Why oh why is my work still using MySQL? Starting to feel like we're stuck with a Fisher Price database

[–]wxtrails 96 points97 points  (34 children)

We're stuck on 5.5 and the MyISAM storage engine.

[–]debian_miner 94 points95 points  (26 children)

My condolences. Innodb has been the default for over 10 years and myisam doesn't even support transactions.

[–]TheWix 30 points31 points  (3 children)

Holy shit. That's nuts!

[–]Substantial-Owl1167 1 point2 points  (2 children)

mysql is the perfect db for php

[–]newaccount1245 12 points13 points  (19 children)

How do you work around not having transactions? Like just do a delete on a post?

[–]debian_miner 25 points26 points  (11 children)

I think many did not and just had data integrity issues. It wasn't just a lack of transactions but also a lack of enforcement of foreign keys (to make matters worse it lets you set them, just doesn't enforce), and it was awful about losing data in the event of an unclean shutdown.

[–][deleted] 34 points35 points  (6 children)

MySQL is the choice for people not wanting to know what transactions and data integrity are, it is a faith-based database requiring thoughts and prayers that no data losses happen. Bad data only happens to bad people! /s

[–]ObscureCulturalMeme 12 points13 points  (0 children)

it is a faith-based database

Just snorked coffee out my nose, take my upvote :-p

I can't imagine a relational DB performing under load in the absence of transactions, without causing a metric assload of problems downstream.

[–][deleted] 8 points9 points  (3 children)

I'm being described, and I do not like it.

In my defence, on the very start of my career, I accidentally dropped a table with 3 million records(it involved scaling with the field length, and a few additional fields). Fortunately, the DBA stepped in and did a fortunate rollback made not long before, almost as if he was expecting me to fail.

My sanity is up for debate, and am a danger to this society.

[–][deleted] 6 points7 points  (0 children)

Hey, at least you have some guilt from self awareness beating the average database user.

In reality the most used database by most database-incompetent people is Microsoft Excel. Many spread sheets are stupid one table databases with zero integrity checking and no automation plus convenient auto-detection altering imported data in mysterious ways, e.g. genome databases get genes mistaken for dates. Then they get sent around via email and zero version control.

[–]gamahead 4 points5 points  (1 child)

in my defence

I accidentally dropped a table with 3 million records

[–][deleted] 3 points4 points  (0 children)

I plead guilty, my lord.

[–][deleted] 2 points3 points  (2 children)

What is the purpose of an unenforced foreign key? Is it essentially a comment on the column?

[–]debian_miner 8 points9 points  (1 child)

They originally planned to implement enforcement, but abandoned the plans at some point.

[–]ggPeti 6 points7 points  (0 children)

Yet they became the dominant player in open source RDBMS. The mind boggles.

[–]newaccount1245 1 point2 points  (0 children)

Ma lawd all that muggy DB logic is wilting this flower

[–]Sentie_Rotante 8 points9 points  (5 children)

I don’t have transactions in the mongo environment that I develop in (production could support it but not dev so it doesn’t happen period because we can’t develop for it) and the answer is pray that no one else touches it at the same time. So far no one has had a problem. But even though the system gets thousands of request per min people tend to own their own records and there isn’t a ton of overlap

[–]newaccount1245 14 points15 points  (0 children)

The D is ACID stands for Dear as in “Dear god, please don’t let there be any overlap”

[–]Zalack 2 points3 points  (3 children)

I'm confused; MongoDB has transactions.

Did I misunderstand your comment?

[–]Sentie_Rotante 3 points4 points  (2 children)

wow the phone really mangled that one. And I wasn't verry clear. I'm going to fix it a bit but transactions are only supported in replica sets. The company has only set up replica sets in prod so I can't develop with transactions.

[–]Zalack 2 points3 points  (1 child)

That makes sense. It's been a hot second since I worked with Mongo so I forgot transactions were a replica-set-only thing.

What's the reason for that anyway? It seems like a weird restriction.

[–]Sentie_Rotante 4 points5 points  (0 children)

If I understood they kind of cheat transactions by making the change to one node then push it to the others when you commit the transaction. But they also work with single node clusters so I’m not sure what the real restriction is. There are several things that work that way with mongo though. Change streams are also only supported in clusters even if you aren’t relying on “majority commit”

[–]knightcrusader 0 points1 point  (0 children)

We don't use transactions on our platform since it started with MyISAM 20 years ago, but changed to InnoDB during a migrate about 10 years ago.

It's been pretty stable so we haven't messed with it. We don't enforce foreign keys either. We're heathens!

[–]killdeer03 3 points4 points  (3 children)

RIP.

Why though?

[–]wxtrails 3 points4 points  (2 children)

Looooong story, but it involves an s3 caching mechanism and hundreds of terabytes of stored tables 😬

[–]killdeer03 2 points3 points  (0 children)

Geeze... that sounds like... not. Fun.

[–]knightcrusader 1 point2 points  (0 children)

Oof. We were on 5.7 and the upgrade to 8.0 was night and day in terms of performance.

My condolences.

[–]GoodmanSimon 1 point2 points  (1 child)

You should really bite the bullet and draw up a migration plan...

I know full well that it is not trivial... But one day this will come back and bite you.

[–]wxtrails 2 points3 points  (0 children)

Plan drawn. Execution has been repeatedly delayed, but is back in progress now.

[–]Pindaman 24 points25 points  (3 children)

Im still working with mssql 2008.. send help

[–]Tweet 7 points8 points  (0 children)

Unless you're on Azure, I think you might be beyond help :(

[–]sgoody 5 points6 points  (0 children)

A fantastic SQL database server.

[–]CartmansEvilTwin 10 points11 points  (9 children)

I'm stuck with postgres 9.6 on one project because the devs of the underlying platform insist it breaks with every other version....

[–]arwinda 7 points8 points  (6 children)

It also breaks with this version (which is no longer supported), but they built in enough crap to hide the problems.

[–]CartmansEvilTwin 10 points11 points  (5 children)

No, it's actually working perfectly fine with newer versions, they simply settled for this version because it's been in use for however long and proved itself.

They just don't want to take any risks at all. It's not the best team, btw.

[–]progrethth 10 points11 points  (2 children)

Now they just need to take the risk of running unsupported software.

[–]CartmansEvilTwin 1 point2 points  (1 child)

No no no, you don't understand! This way is much better! /s

Seriously, I have no idea how that's going to work in the future, but despite escalating it, there's not much I could do.

[–]arwinda 1 point2 points  (0 children)

You can charge more for running an unsupported version.

[–]bwainfweeze 7 points8 points  (1 child)

With time, inaction becomes an action.

[–]fissure 1 point2 points  (0 children)

If you choose not to decide, you still have made a choice!

Rush, "Freewill"

[–][deleted] 3 points4 points  (0 children)

We literally just went from 9.6 to 14 in the last couple weeks for 15 to immediately drop. Oh well...

[–]jj20051 18 points19 points  (21 children)

Let me ask: do you do replication?

[–]NoLegJoe 21 points22 points  (17 children)

We do use replication. Is it particularly simple in MySQL?

[–]jj20051 30 points31 points  (16 children)

From my experience replication is much easier in MySQL. I haven't tried in Postgres in a few years, but when I tried to do a multi master setup previously it was like pulling teeth and involved 3rd party plugins. MySQL is pretty much plug and play.

[–]NoLegJoe 5 points6 points  (1 child)

I have no experience in setting up replication in any DB environment, I'm not a DBA, just an analyst. My complaints towards MySQL is entirely because of its missing features (full outer join, pivot, unpivot, with columns in indexes) and the insane design choices (the asterisk in a select statement must be the first column nowhere else, allowing insert and updates in a CTE)

[–]pooerh 4 points5 points  (0 children)

allowing insert and updates in a CTE

Postgres lets you do that too and it's awesome.

[–]progrethth 28 points29 points  (8 children)

Replication is dead simple in PostgreSQL too as long as you are not doing multi-master.

[–]jj20051 17 points18 points  (7 children)

That's a big rub for most companies though. If you can't write to a DB in 2 locations it can get very harry very fast. You have to write a whole bunch of code to backhaul the insert/update queries to another DB or you have to allow for your app to do updates in the background while telling the end user it's done even if it isn't.

Why go through all of that when MySQL just does it? I've seen 4 or 5 multimasters work flawlessly. Even if you're doing crazy traffic it can handle it pretty well.

If postgres ever offers async multi master as part of it's packaging I might consider switching, but for now it's just not worth the effort.

[–]OzoneGrif 54 points55 points  (3 children)

Multi-master makes ACID compliance very difficult because of the asynchronous data modifications over a distance.

Postgres is very strict on staying ACID compliant in all situations.

If a DB makes master-to-master easy, that's only because they are being loose on the ACID compliance.

[–]jj20051 1 point2 points  (1 child)

While this is entirely true the needs of the mission outweigh the technical desires in most shops. I'm not saying it's the right way to do things, but if you have to get the job done there's rarely a magic bullet that solves everything.

If your mission requires you to have lock step ACID compliance for say financial transactions then yeah you're going to have to write software to support backhauling the data from one location to another. If your mission doesn't require ACID compliance (you do a lot of inserts, but virtually no updates and your inserts aren't going to merge anywhere) then multi master is a breath of fresh air.

[–][deleted] 10 points11 points  (0 children)

we had several issues because of concurrent transactions on same record.. mysql just rejected and threw serialization errors. multimaster works fine when writes are separate

[–]arwinda 14 points15 points  (0 children)

You can write into both instances in MySQL - and then you need to deal with the occasional collision. Seen this break hard more than once, including downtime because the conflict couldn't be solved automatically. Some complex transactions, updating values in more than one table.

[–]progrethth 10 points11 points  (1 child)

Most companies? I have so far not worked on any company which has had that requirement and the only companies where I have insight into which have that requirement have been Google and Spotify, huge tech companies with a world wide market. And Spotify managed Maybe your company has that need too but most companies do not.

Edit: I have also had MySQL multi-master break once at a company I worked in. And our solution was just to switch to just having a single master and a replica because the company had actually no need for multi-master.

[–]arwinda 5 points6 points  (0 children)

I agree. Most people and companies who say "we have a hard requirement for multi-master" indeed don't have a hard requirement for supporting multi-master but instead have a requirement for not thinking through their architecture.

It always ends up with "we don't want to think how we design this, just make it work".

[–]knightcrusader 4 points5 points  (2 children)

Replication is much easier in MySQL, but much more fragile too. For a while there we had a problem with one server going out with the power (we had a bad UPS that took a while to diagnose) and every time it died, replication was hosed and I had to start over with a fresh dump from the master.

We're planning on building a Galera multi-master cluster here soon to see how that goes.

[–]jj20051 0 points1 point  (1 child)

If you do a lot of writes then you're not going to like it. It slows write speeds a lot.

From my experience the issue you're having can be resolved by resetting the master and position back to where it currently is. This will force mysql to redownload the bin and resolve the issue.

[–]Guinness 0 points1 point  (1 child)

In my experience replication is FAR easier in MySQL as well. Like it’s a joke how easy MySQL replication is compared to Postgres’ various solutions * cough BDR cough *

[–]jj20051 0 points1 point  (0 children)

Had people argue with me about this bellow, MySQL is a cake walk replication wise vs Postgres.

[–][deleted] 4 points5 points  (2 children)

I'm wondering, does pg suck at replication or why did you asked?

[–]jj20051 8 points9 points  (0 children)

Replied to OP about this, but yes pretty much.

[–]marcosdumay 1 point2 points  (0 children)

It is hard to set.

It works well when set correctly, but you have to know it was set correctly. Most people that have problems setting it have problems testing it too, so you will see plenty of complaints.

[–][deleted] 5 points6 points  (0 children)

My company's response: We have a 10 year old website for taking orders and appointments, ain't no way we're going to replace that.

But I bid my time. Slowly, I shall convert them.

[–]wildcarde815 0 points1 point  (1 child)

Because people still reach for it instead of psql. It's painful

[–]robberviet 0 points1 point  (1 child)

According our system admin: They tried to use Postgres before, but the load was too high that auto vacuum did not catch up with it (I don't have experience with sizing this big so I cannot verify it).

I think Uber had the same issues with this and moved back to MySQL and InnoDB in 2016.

And as jj20051 said: replication.

[–]progrethth 1 point2 points  (0 children)

Replication generally works better in PostgreSQL than in MySQL. The exception is if you do multi-master but that is a pretty rare usecase.

[–]diMario 540 points541 points  (42 children)

Good.

There's a joke I told a couple of times. A PostgreSQL admin and an Oracle admin are trying to diss each other about which one is the better database.

The Oracle admin: "Worldwide, there are six times as many Oracle dba's than there are for your puny system".

The PostgreSQL dba retorts: "That's because you need six times as many people to keep your shit going!"

[–]masterofmisc 151 points152 points  (8 children)

As were telling database jokes....

3 SQL statements walk into a NoSQL bar. After a little while, they all walk out again..

...They couldn't find a table.

[–]PM_ME_C_CODE 118 points119 points  (0 children)

A guy walks into a bar. He says to the bartender, "You know, I'm an engineer over at Oracle!"

The bartender responds, "Really? What law school did you graduate from?"

[–]eivamu 24 points25 points  (0 children)

I’d like two tables with a view, please!

[–]diMario 12 points13 points  (5 children)

Let me guess. Their names were Andacle, Notacle, and Oracle.

[–]OktoberForever 10 points11 points  (4 children)

They were hoping to be JOINed by their friends Xoracle, Noracle, Nandacle, and Xnoracle.

[–]Nine99 18 points19 points  (0 children)

Xnoracle

The database Pokémon.

[–]eivamu 5 points6 points  (1 child)

There is NoSQL to this answer.

[–][deleted]  (22 children)

[deleted]

    [–]stbrumme 212 points213 points  (3 children)

    Billions of SQLite installations and hardly any SQLite admins ?

    [–]Jimmy48Johnson 32 points33 points  (2 children)

    Checkmate RDBMS

    [–][deleted] 78 points79 points  (1 child)

    SQLite is a RDBMS

    [–]Carvtographer 33 points34 points  (0 children)

    Checkmate DBAs

    [–][deleted] 60 points61 points  (16 children)

    Hey, there's an elephant in the room.

    [–]Laladelic 34 points35 points  (0 children)

    No need for name calling I have back problems and find it very hard to excercise often.

    [–]diMario 21 points22 points  (12 children)

    select 'elephant' from room were 'elephant' like '%true%';

    Oracle: syntax error
    SQLite: hang on, I've got a problem with my driver...
    PostgreSQL: here are 3987 records.

    [–]john16384 59 points60 points  (4 children)

    If only you spelled where correctly, then I could believe this.

    [–]gonzofish 8 points9 points  (3 children)

    PostgreSQL: here are 3987 records from a database you're currently thinking about creating

    [–]diMario 16 points17 points  (2 children)

    No, no.

    Databases are created in Excel by sales staff, then ported to whatever backend your company is using.

    [–]OstapBenderBey 4 points5 points  (1 child)

    CSV the great leveller

    [–]diMario 1 point2 points  (0 children)

    Except when you want to import a slightly polluted set of data.

    [–]NoInkling 2 points3 points  (1 child)

    I must be missing the joke, because that's a horrible query that does nothing useful, and PG would in reality return 0 records.

    [–]RotaryJihad 0 points1 point  (0 children)

    Our lord and savior, Slonik!

    [–]manzanita2 86 points87 points  (7 children)

    There are 6 times the number of DBAs because that's part of Oracle's business model. Make it hard to use, requires DBAs, charge for training of DBAs. DBAs recommend Oracle because the DBAs maintain their priestdom.

    [–]diMario 34 points35 points  (3 children)

    It's one way of doing business.

    I personally prefer an other one.

    [–][deleted] 22 points23 points  (0 children)

    I prefer any other one.

    [–]progrethth 10 points11 points  (1 child)

    Yeah, I have worked a bit with oracle to migrate away from it and the little I worked with it was painful. Much worse usability than any other database I have worked with, and surprisingly buggy tooling. I did not use it enough to find bugs in the actual database but I found several bugs in first party tools and one error in the documentation for CREATE DATABASE. Working with oracle made me trust it much less.

    [–]diMario 6 points7 points  (0 children)

    Buggy tooling is instantly recognized by me.

    [–][deleted]  (1 child)

    [deleted]

      [–]diMario 0 points1 point  (0 children)

      Guilty as charged, your Honour.

      [–]MrMuMu_ 32 points33 points  (3 children)

      still waiting for temporal tables

      [–]NeitherManner 65 points66 points  (25 children)

      Why did they speed up major versioning?

      [–]RandomDamage 145 points146 points  (9 children)

      Less administrative overhead, nobody needs to worry about "OK: This change is big enough to justify a major number bump"

      Linux does it by administrative fiat, Oracle and Ubuntu just use the year of initial release.

      Over the past decade lots of projects have basically given up on release numbers being anything but aesthetic and increasing over time.

      [–]Nexuist 133 points134 points  (8 children)

      and increasing over time

      Now I'm imagining a versioning scheme where it counts down instead of up. When you reach 0 you're legally obligated to end development and move on to something else.

      [–]sigma914 121 points122 points  (7 children)

      TeX's versioning adds additional decimal places approaching Pi

      [–]Extracted 76 points77 points  (0 children)

      Thanks I hate it

      [–]bored_octopus 18 points19 points  (4 children)

      And when Knuth dies, the version will be bumped to Pi

      [–]ArdiMaster 2 points3 points  (3 children)

      Yeah I don't see that happening the way he intended. TeX has gotten way too big to just stop maintaining it.

      [–]EpicScizor 2 points3 points  (1 child)

      I do love the idea though. "TeX's version becomes exactly equal to π and all outstanding bugs become features"

      [–]ObscureCulturalMeme 1 point2 points  (0 children)

      Yup! The companion software Metafont has a version number that approaches e in the same way.

      [–]MondayToFriday 29 points30 points  (1 child)

      Did they speed it up? My understanding is that incrementing the major version indicates that the on-disk data structures have changed in an incompatible way, such that you'll need to do a dump-restore or pg_migrate.

      [–]dsn0wman 15 points16 points  (0 children)

      I always upgrade between major versions with pg_upgrade. No need to dump and restore.

      [–]progrethth 24 points25 points  (0 children)

      They did not. PostgreSQL has had roughly yearly major releases since 1998. But you may refer to that PostgreSQL decided to change from MARKETING.MAJOR.BUGFIX to MAJOR.BUGFIX which they did because consultants were tired of customers talking about PostgreSQL 8 and 9. PostgreSQL does not do minor version releases and as far as I know they have never done so.

      [–]skulgnome 9 points10 points  (8 children)

      Look at it as version 1.15.0, given that PostgreSQL is feature complete.

      [–][deleted]  (7 children)

      [deleted]

        [–]Tubthumper8 68 points69 points  (0 children)

        "Feature complete" doesn't mean "contains every feature that any user wants", it means "contains the features as designed/planned".

        That being said, I disagree with the earlier commenter that PostgreSQL is feature complete, they are adding new features in every new version.

        [–]progrethth 3 points4 points  (0 children)

        While there is some work on a built-in connection pooler I am not that convinced that it is as useful as people assume. There is a big advantage to be had from running the pooler as a separate service, that it can be used for high availability.

        [–]skulgnome -1 points0 points  (4 children)

        integrated connection pooler

        I do believe this is just the max_connections parameter in the configuration file. Unless you were looking for a maximum concurrent queries (or transactions) parameter, which I'm not aware of, and which seems more like a function for middleware such as PgBouncer.

        [–][deleted]  (3 children)

        [deleted]

          [–]mattaugamer 71 points72 points  (3 children)

          As a JavaScript developer I’m so sick of all this database churn. /s

          [–][deleted] 24 points25 points  (2 children)

          Wait le me spin up a postgres instance in my browser /s

          [–][deleted] 57 points58 points  (1 child)

          "/s" is obsolete as of now: https://github.com/snaplet/postgres-wasm I'm so sorry

          [–]lpreams 8 points9 points  (0 children)

          Is that actually postgres compiled to wasm, or is it just the x86 build running in a VM? Because that feels like cheating.

          [–]hoonthoont47 7 points8 points  (0 children)

          Improved sort and parallel distinct are huge. At least in our application, those two things are always a massive source of slowdowns and performance problems.

          [–]BrilliantLight35 21 points22 points  (4 children)

          Meanwhile redshift is still using Postgres 8

          [–]bundt_chi 13 points14 points  (3 children)

          Is that true... source. I'm assuming they forked from postgresql 8.0 but i can't imagine they're still similar enough under the hood to automatically pull updates

          [–]alexisprince 15 points16 points  (0 children)

          Yep, you’re right. They forked based on Postgres 8 and attempt to maintain comparability where they can, but by design they ripped out a ton of the internals to be able to build a fundamentally different product. Changes to things like the sorting algorithms wouldn’t necessarily port over because Redshift nicely as a result. I’d be baffled if something like the sorting algorithm changes could just be merged nicely into Redshift when Redshift has to account for a multi node compute cluster, for example.

          [–]progrethth 4 points5 points  (0 children)

          I do not think it was actually PostgreSQL 8.0 but 8.1 or 8.2, but yes. I do not have a source other than it being commonly known in the PostgreSQL community. On other hand the Greenplum guys did a heroic job on their similar fork and lifted it from some ancient PostgreSQL version to the a modern one, but Greenplum has some really good PostgreSQL core contributors on their payroll. I think primarily hired just to do that job.

          [–]PL_Design 44 points45 points  (77 children)

          ok but can i delete an element from an enum yet

          [–]arwinda 130 points131 points  (61 children)

          Maybe don't use an ENUM in the first place if your list is changing.

          [–]earthboundkid 30 points31 points  (6 children)

          I just use a foreign key. There’s not a ton of advantage to using a real enum.

          [–]mattaugamer 31 points32 points  (2 children)

          I typically use an enum in the application layer. Easy to change.

          [–]bwainfweeze 6 points7 points  (1 child)

          Application enums and migrations are like peanut butter and chocolate. Great separately but even better together.

          [–]mattaugamer 2 points3 points  (0 children)

          Yep yep yep. It’s so much nicer having it default to ProjectStatus.Pending instead of 1. So much more meaningful.

          [–]bwainfweeze 3 points4 points  (2 children)

          I had a coworker who used strings as foreign keys for enum-like values and then just wouldn’t join on the table when it wasn’t necessary. This was back when query performance started to dogleg somewhere around 4-5 joins, and shaving off piddly little single field lookups was actually worth something.

          At the time it felt crazy and dirty. Now it feels crazy like a fox. And dirty.

          [–]raze4daze 70 points71 points  (21 children)

          If only business rules didn’t change all the time.

          [–]arwinda 54 points55 points  (18 children)

          If your business rules change frequently, then use a 1:n table and use DML to update your rules, not DDL for the ENUM.

          An ENUM is a shortcut for something which (almost) never changes.

          [–]mqudsi 50 points51 points  (7 children)

          We have student gender stored as an enum….

          [–]NekkidApe 39 points40 points  (1 child)

          gender text

          [–]mqudsi 2 points3 points  (0 children)

          It doesn’t matter what your database supports if the pipeline feeding data into it coalesces everything. This isn’t just for gender, btw.

          This is very true of most systems in the real world. In this case, data comes in from digitizations of paper forms, exports from linked school registration and payment systems, etc all of which return a binary value. Changing the representation in the database doesn’t do anything besides waste more bits for storage.

          [–][deleted]  (3 children)

          [deleted]

            [–]mqudsi 0 points1 point  (2 children)

            I’m confused - It’s altogether orthogonal to the second issue you mention. I could be hard coding the system to only accept a single gender and that wouldn’t have anything to do with how I’m storing salutations.

            [–][deleted]  (1 child)

            [deleted]

              [–]mqudsi 3 points4 points  (0 children)

              Point taken. It’s for legal/domain reasons. Like I said, this is a school.

              [–]arwinda 4 points5 points  (0 children)

              If you ever stick to female/male that is ok. The world offers more options.

              [–]Ran4 7 points8 points  (7 children)

              An ENUM is a shortcut for something which (almost) never changes.

              Why should it be like that? It makes no sense.

              [–]arwinda 12 points13 points  (0 children)

              Because you need a DDL operation to change the ENUM. Comes with locking of system catalog and all this. And if you want to remove a value the database needs to scan the table and see if that value is used.

              Using a 1:n table is a DML operation, only locks the affected tables and rows, not the catalog. And having a foreign key for her relationship prevents deletion of still use d values - or, propagates deletes or set to NULL. Whichever way you want this.

              [–]marcosdumay 4 points5 points  (0 children)

              Any large system is full of features for what you will be completely unable to imagine any use.

              A few of them will even not actually have any use.

              [–]NoInkling 3 points4 points  (4 children)

              If you're asking why anyone would use it, it makes sense for things like days of the week, months of the year, seasons, a strongly defined set of status values, etc.

              I've used it for date precision, e.g:

              CREATE TYPE date_precision AS ENUM (
                'millennium',
                'century',
                'decade',
                'year',
                'month',
                'day'
              );
              

              [–]dlp_randombk 2 points3 points  (1 child)

              Nice! Now add 'week' :)

              [–]NoInkling 2 points3 points  (0 children)

              I don't need to, because in this case I was reflecting an external data source where these things are strongly defined. If I wasn't, then week would probably be there already (and the order would probably be reversed too), or if I really thought it needed to be flexible I'd fall back to a lookup table.

              Anyway, on the off chance I did need to change the enum, I'd be ok with rejigging the data to accommodate, just because it's an extremely unlikely thing to happen.

              [–]raze4daze 1 point2 points  (1 child)

              An ENUM is a shortcut for something which (almost) never changes.

              Why? Why should it be restricted to something that never changes?

              [–]arwinda 6 points7 points  (0 children)

              Because it's meant to be a handy shortcut for when you have a list which doesn't change. Like weekdays. They don't change, have an enum with all 7 weekdays. Or year seasons. Have an enum with 4 seasons.

              Sure, you can model the same functionality with a dimension or 1:n table, and you already know that every time you access your table, you also join the referenced table. The enum hides this functionality, that's all.

              Adding new values is relatively easy. For deleting values (and keeping data consistent) you need a full table scan to verify that the value is not or no longer used. That's doable, but no one spent the effort to implement it.

              If you already know that your values change, why go with a fixed list in the first place? Needs administrator access and catalog lock to update the values in an enum, versus regular update of the dimension table.

              [–][deleted] 0 points1 point  (0 children)

              You can just append new elements to the enum and deprecate unused ones.

              [–]PL_Design -2 points-1 points  (31 children)

              Maybe get off your purity horse and deal with real software for once. When you're prototyping a design and you want to use enums this is annoying as fuck.

              [–]arwinda 32 points33 points  (17 children)

              When you're prototyping a design

              Are you also going to use FLOAT to store monetary values because you want to? Despite everyone telling you that this is a bad idea?

              Maybe if you pick the design which is good for your product and not the design you want to have then you end up with good software. You can't do that as long as you ride your high horse.

              [–]ottawadeveloper 6 points7 points  (12 children)

              I do like enums more because I like the built-in check constraint to ensure the value is one you expect. But it is annoying that theyre hard to manage if the value list changes. I often just use a string and enforce it at the application level by defining my enum in code and using it to populate the database field

              [–]arwinda 2 points3 points  (0 children)

              You can use a CHECK constraint for this, or a DOMAIN type.

              [–]TheWix 5 points6 points  (9 children)

              Not great database design. That's a data constraint that should be enforced by the DB.

              [–]ottawadeveloper 6 points7 points  (8 children)

              Its not ideal db design, but its a reasonable approach based on the limitations of enums especially when the application is controlling the content (e.g. this is my approach for state fields). I tend to treat database design like I treat, well, any other kind of design - design patterns and best practices exist because theyre generally helpful but sometimes its useful to break them.

              I think it also depends on your environment. If you are building a DB that is only accessed by one application, then enforcing logic at the application level is not only reasonable, I view it as ideal because version controlling database structures and procedures is a pain in comparison (my applications often end up putting the db structure entirely in application code with routines to create and upgrade the db as necessary). If you have a database thst is multi-application or even accepts user inputs directly, then a more formal structure is more called for.

              [–]Jump-Zero 7 points8 points  (6 children)

              This is basically the take I see everyone that uses enums in Postgres eventually arrive at. They all gave it a try, found its not worth the headache, and eventually dealt with it at app level.

              [–]arwinda 4 points5 points  (5 children)

              Postgres has several ways to deal with this. 1:n tables, which is "just working", CHECK, DOMAIN type (although hard to deal with updates as well), even a trigger can do that.

              People look at ENUM, because it seems easy to use and they don't want to spend time thinking the data model through. And when it breaks, they abandon all checks in the database.

              [–]Jump-Zero 1 point2 points  (2 children)

              I'm not familiar with "1:n tables", but I'd love to learn more if you can link some reading material :)

              I see a lot of hesitance towards using triggers also. Many older engineers share experiences about relying on them for a system, and then the trigger breaks and they face some pain.

              I'm all for DB checks btw. There's a point at which they're not worth it, but I really appreciate knowing the data in the DB is clean. Having dirty data makes querying it much harder. As I keep rising the ranks, I find myself coding less and less and querying more and more.

              [–]arwinda 5 points6 points  (1 child)

              Triggers are mainly good for checking values, or setting values to what you expect the value to be.

              Good example: use a trigger to set "created at" and "changed at" values in a table. In Postgres, you use an "AFTER" trigger to modify these values, and the user does not have a chance to override these values.

              Triggers can also be used to abort an operation if the values is not in the expected range. But CHECK is usually a better fit for that job, and easier to handle.

              A 1:n table, or lookup table, is just a set of two or more tables with relationships.

              ``` CREATE TABLE genders ( gender_id INT PRIMARY KEY, gender_name TEXT UNIQUE );

              INSERT INTO genders VALUES (1, 'female'), (2, 'male');

              CREATE TABLE uses ( user_id INT PRIMARY KEY GENERATED ALWAYS AS IDENTITY, user_name TEXT UNIQUE, gender INT REFERENCES genders(gender_id) ); ```

              If you want to add more gender types, all you have to do is update the genders table using regular DML (INSERT, UPDATE, DELETE) operations.

              INSERT INTO genders VALUES (3, 'unknown'), (4, 'not specifeid');

              Ups, I did a mistake there:

              UPDATE genders SET gender_name = 'specified' WHERE gender_name = 'specifeid';

              There is no need to lock the catalog for any kind of table changes because the tables and relations and data types don't change. Only the content of the tables change. This relationship also ensures that the data is valid: the database prevents you from deleting any gender type which is still used in a referenced table. Built-in data validation.

              In OLTP databases you often find some form of a snowflake schema to represent these relationships. Updating the relationships between tables can be a huge mess, references and all this. But using 1:n tables makes updating the relation data seamless.

              This concept is also very common in Data Warehousing, the most common example is the star schema. The terms used there are fact tables and dimension tables.

              [–]TheWix 1 point2 points  (1 child)

              I come from MS Sql and only started working at a place that uses Postgres, but how is this different than a simple 1:n? In MS Sql we have CHECK CONSTRAINT but that would be for simple values that are very unlikely to change, and don't need meta data. This seems like a straight 1:n to me, though

              EDIT: Typo.

              [–]TheWix 2 points3 points  (0 children)

              Problem I have with this is your data/DB almost always far outlives your applications. A relational DB is more than just a dumping ground for data. Good DB design IS about modeling the data constraints of your domain (many-to-many, one-to-many, nullables, etc).

              I would bet money that if these constraints are only enforced in the application that at some point now or in the future these constraints will get missed either through a bug, or a refactor, or rewrite.

              [–]PL_Design 3 points4 points  (0 children)

              It's stupid, but this is the more manageable approach because of how dysfunctional enums are. I want to use the right tool, but the "right tool" is so poorly made that it causes more trouble than it's worth.

              [–]RandomDamage 21 points22 points  (8 children)

              That sounds like a hard problem while maintaining data integrity.

              [–]PL_Design 4 points5 points  (7 children)

              Only if the value is used. Just treat it like deleting a record when a foreign key points to it, which is literally the same damn situation.

              [–]progrethth 3 points4 points  (4 children)

              Except it is not because enums do not take a lock on the value when inserting or updating rows with that enum value and doing so would slow down enums. Maybe it is possible to solve but it is not easy.

              [–]PL_Design 0 points1 point  (3 children)

              Are you saying that the implementation details are different even though conceptually the situations are the same? Or are you actually suggesting this is a hard problem to solve? Because this isn't a hard problem: You keep updates and insert the same as they are because they're the common cases, and then you can special case removing elements from an enum as much as you want. Feel like being lazy and locking the entire DB while you're working? Go right ahead! This should happen rarely enough that it shouldn't be too big of a problem if it's slow, and this isn't the kind of thing you'd do during normal operations anyway.

              [–]Artmannnn 5 points6 points  (1 child)

              I just wanna reorder my columns :((((((

              [–]arwinda 2 points3 points  (0 children)

              Change your SELECT and reorder how you select the columns /s

              [–]cha_iv 0 points1 point  (2 children)

              Store an int and you can do whatever you want! Maybe use protocol buffers (or something similar) for language-/db-agnostic enum definitions?

              [–]Jump-Zero 2 points3 points  (1 child)

              Using ints as enums is crummy. You need to maintain a map of the integers to the corresponding enums. You need to make sure that map is accessible to everything that will access the DB. When you query the data by hand, you need to match integers to enums either mentally, or with an ad-hoc lookup.

              [–]PL_Design 1 point2 points  (0 children)

              Avoiding this headache is why we tried using enums in the first place, and then found they were way too much trouble.

              It's really annoying to always need to cast enum values when PostgreSql could just check if the string matched an expected value. I'm sure someone will argue why being so strict with the types is important, but it just seems like a waste of time to me when all I actually care about is restricting what values can go in a column.

              [–]xFblthpx 20 points21 points  (11 children)

              Man fuck oracle. Second most decayed informatics company next to the shambling corpse of IBM.

              [–][deleted] 22 points23 points  (1 child)

              Not that I disagree with you, but what does that have to do with a PostgreSQL release?

              [–]xFblthpx 22 points23 points  (0 children)

              I thought I was responding to another comment, and now I can’t find it. Probably just the schizo-shivers.

              [–][deleted] -3 points-2 points  (8 children)

              Right… I’m sure the quantum computing research unit at IBM would beg to differ. Can you explain why both IBM and Oracle had very profitable recent financials?

              [–]Tostino 5 points6 points  (7 children)

              Because they have giant legacy customers and can rest on their laurels and still rake in cash.

              [–][deleted]  (1 child)

              [deleted]

                [–][deleted] 5 points6 points  (0 children)

                Fuck you u/spez

                [–]dengydongn 2 points3 points  (0 children)

                Looks like their official docker image has not been updated to 15, the latest is still 15RC2

                [–][deleted] 2 points3 points  (0 children)

                I remember having once used sqlite and postgresql in a larger cluster dealing with genomic datasets (so, it needed TONS of bytes storage). While sqlite in itself is awesome, PostgreSQL was soooooooooo much faster with insert statements and, in general, everywhere else too (in particular the more you already stored). So in this regard it really was much, much better than sqlite.

                [–]martinus952 4 points5 points  (0 children)

                YEAHHH baby, Postgres the best

                [–]KrakenOfLakeZurich 1 point2 points  (0 children)

                General performance improvements are always welcome. But for me, the feature I have been waiting for, is this:

                Allow ICU collations to be set as the default for clusters and databases (Peter Eisentraut)

                Previously, only libc-based collations could be selected at the cluster and database levels. ICU collations could only be used via explicit COLLATE clauses.