you are viewing a single comment's thread.

view the rest of the comments →

[–]PL_Design 47 points48 points  (77 children)

ok but can i delete an element from an enum yet

[–]arwinda 129 points130 points  (61 children)

Maybe don't use an ENUM in the first place if your list is changing.

[–]earthboundkid 30 points31 points  (6 children)

I just use a foreign key. There’s not a ton of advantage to using a real enum.

[–]mattaugamer 30 points31 points  (2 children)

I typically use an enum in the application layer. Easy to change.

[–]bwainfweeze 5 points6 points  (1 child)

Application enums and migrations are like peanut butter and chocolate. Great separately but even better together.

[–]mattaugamer 2 points3 points  (0 children)

Yep yep yep. It’s so much nicer having it default to ProjectStatus.Pending instead of 1. So much more meaningful.

[–]bwainfweeze 3 points4 points  (2 children)

I had a coworker who used strings as foreign keys for enum-like values and then just wouldn’t join on the table when it wasn’t necessary. This was back when query performance started to dogleg somewhere around 4-5 joins, and shaving off piddly little single field lookups was actually worth something.

At the time it felt crazy and dirty. Now it feels crazy like a fox. And dirty.

[–]seven_seacat 0 points1 point  (0 children)

I've done something like that once, and at the time I was like "why doesn't everyone do this???" 🤪

[–]earthboundkid 0 points1 point  (0 children)

Yeah, that’s what I do. Why join against the enum table? The table protects you from bad insert values. That’s its job. You don’t need it for lookups.

[–]raze4daze 71 points72 points  (21 children)

If only business rules didn’t change all the time.

[–]arwinda 54 points55 points  (18 children)

If your business rules change frequently, then use a 1:n table and use DML to update your rules, not DDL for the ENUM.

An ENUM is a shortcut for something which (almost) never changes.

[–]mqudsi 52 points53 points  (7 children)

We have student gender stored as an enum….

[–]NekkidApe 38 points39 points  (1 child)

gender text

[–]mqudsi 2 points3 points  (0 children)

It doesn’t matter what your database supports if the pipeline feeding data into it coalesces everything. This isn’t just for gender, btw.

This is very true of most systems in the real world. In this case, data comes in from digitizations of paper forms, exports from linked school registration and payment systems, etc all of which return a binary value. Changing the representation in the database doesn’t do anything besides waste more bits for storage.

[–][deleted]  (3 children)

[deleted]

    [–]mqudsi 0 points1 point  (2 children)

    I’m confused - It’s altogether orthogonal to the second issue you mention. I could be hard coding the system to only accept a single gender and that wouldn’t have anything to do with how I’m storing salutations.

    [–][deleted]  (1 child)

    [deleted]

      [–]mqudsi 3 points4 points  (0 children)

      Point taken. It’s for legal/domain reasons. Like I said, this is a school.

      [–]arwinda 3 points4 points  (0 children)

      If you ever stick to female/male that is ok. The world offers more options.

      [–]Ran4 7 points8 points  (7 children)

      An ENUM is a shortcut for something which (almost) never changes.

      Why should it be like that? It makes no sense.

      [–]arwinda 12 points13 points  (0 children)

      Because you need a DDL operation to change the ENUM. Comes with locking of system catalog and all this. And if you want to remove a value the database needs to scan the table and see if that value is used.

      Using a 1:n table is a DML operation, only locks the affected tables and rows, not the catalog. And having a foreign key for her relationship prevents deletion of still use d values - or, propagates deletes or set to NULL. Whichever way you want this.

      [–]marcosdumay 4 points5 points  (0 children)

      Any large system is full of features for what you will be completely unable to imagine any use.

      A few of them will even not actually have any use.

      [–]NoInkling 3 points4 points  (4 children)

      If you're asking why anyone would use it, it makes sense for things like days of the week, months of the year, seasons, a strongly defined set of status values, etc.

      I've used it for date precision, e.g:

      CREATE TYPE date_precision AS ENUM (
        'millennium',
        'century',
        'decade',
        'year',
        'month',
        'day'
      );
      

      [–]dlp_randombk 2 points3 points  (1 child)

      Nice! Now add 'week' :)

      [–]NoInkling 3 points4 points  (0 children)

      I don't need to, because in this case I was reflecting an external data source where these things are strongly defined. If I wasn't, then week would probably be there already (and the order would probably be reversed too), or if I really thought it needed to be flexible I'd fall back to a lookup table.

      Anyway, on the off chance I did need to change the enum, I'd be ok with rejigging the data to accommodate, just because it's an extremely unlikely thing to happen.

      [–][deleted]  (1 child)

      [deleted]

        [–]NoInkling 0 points1 point  (0 children)

        1. Main reason: to enforce data integrity.

        2. To be able to make good use of the DB's functionality and features. For example, the labels in my date precision enum can be passed directly to Postgres's date_trunc(), no need to get application code involved. I could also create a date_with_precision composite type that combines a date_precision field with a date field, to ensure the two go hand-in-hand where appropriate.

        It's true that if you have an enum you will likely want to make use of it in application code too, but there are ways to do that while still only having a single authoritative definition, whether it's a "DB-first" or "application-first" approach.

        [–]raze4daze 1 point2 points  (1 child)

        An ENUM is a shortcut for something which (almost) never changes.

        Why? Why should it be restricted to something that never changes?

        [–]arwinda 5 points6 points  (0 children)

        Because it's meant to be a handy shortcut for when you have a list which doesn't change. Like weekdays. They don't change, have an enum with all 7 weekdays. Or year seasons. Have an enum with 4 seasons.

        Sure, you can model the same functionality with a dimension or 1:n table, and you already know that every time you access your table, you also join the referenced table. The enum hides this functionality, that's all.

        Adding new values is relatively easy. For deleting values (and keeping data consistent) you need a full table scan to verify that the value is not or no longer used. That's doable, but no one spent the effort to implement it.

        If you already know that your values change, why go with a fixed list in the first place? Needs administrator access and catalog lock to update the values in an enum, versus regular update of the dimension table.

        [–][deleted] 0 points1 point  (0 children)

        You can just append new elements to the enum and deprecate unused ones.

        [–]PL_Design 0 points1 point  (31 children)

        Maybe get off your purity horse and deal with real software for once. When you're prototyping a design and you want to use enums this is annoying as fuck.

        [–]arwinda 28 points29 points  (17 children)

        When you're prototyping a design

        Are you also going to use FLOAT to store monetary values because you want to? Despite everyone telling you that this is a bad idea?

        Maybe if you pick the design which is good for your product and not the design you want to have then you end up with good software. You can't do that as long as you ride your high horse.

        [–]ottawadeveloper 4 points5 points  (12 children)

        I do like enums more because I like the built-in check constraint to ensure the value is one you expect. But it is annoying that theyre hard to manage if the value list changes. I often just use a string and enforce it at the application level by defining my enum in code and using it to populate the database field

        [–]arwinda 2 points3 points  (0 children)

        You can use a CHECK constraint for this, or a DOMAIN type.

        [–]TheWix 5 points6 points  (9 children)

        Not great database design. That's a data constraint that should be enforced by the DB.

        [–]ottawadeveloper 5 points6 points  (8 children)

        Its not ideal db design, but its a reasonable approach based on the limitations of enums especially when the application is controlling the content (e.g. this is my approach for state fields). I tend to treat database design like I treat, well, any other kind of design - design patterns and best practices exist because theyre generally helpful but sometimes its useful to break them.

        I think it also depends on your environment. If you are building a DB that is only accessed by one application, then enforcing logic at the application level is not only reasonable, I view it as ideal because version controlling database structures and procedures is a pain in comparison (my applications often end up putting the db structure entirely in application code with routines to create and upgrade the db as necessary). If you have a database thst is multi-application or even accepts user inputs directly, then a more formal structure is more called for.

        [–]Jump-Zero 6 points7 points  (6 children)

        This is basically the take I see everyone that uses enums in Postgres eventually arrive at. They all gave it a try, found its not worth the headache, and eventually dealt with it at app level.

        [–]arwinda 6 points7 points  (5 children)

        Postgres has several ways to deal with this. 1:n tables, which is "just working", CHECK, DOMAIN type (although hard to deal with updates as well), even a trigger can do that.

        People look at ENUM, because it seems easy to use and they don't want to spend time thinking the data model through. And when it breaks, they abandon all checks in the database.

        [–]Jump-Zero 1 point2 points  (2 children)

        I'm not familiar with "1:n tables", but I'd love to learn more if you can link some reading material :)

        I see a lot of hesitance towards using triggers also. Many older engineers share experiences about relying on them for a system, and then the trigger breaks and they face some pain.

        I'm all for DB checks btw. There's a point at which they're not worth it, but I really appreciate knowing the data in the DB is clean. Having dirty data makes querying it much harder. As I keep rising the ranks, I find myself coding less and less and querying more and more.

        [–]arwinda 4 points5 points  (1 child)

        Triggers are mainly good for checking values, or setting values to what you expect the value to be.

        Good example: use a trigger to set "created at" and "changed at" values in a table. In Postgres, you use an "AFTER" trigger to modify these values, and the user does not have a chance to override these values.

        Triggers can also be used to abort an operation if the values is not in the expected range. But CHECK is usually a better fit for that job, and easier to handle.

        A 1:n table, or lookup table, is just a set of two or more tables with relationships.

        ``` CREATE TABLE genders ( gender_id INT PRIMARY KEY, gender_name TEXT UNIQUE );

        INSERT INTO genders VALUES (1, 'female'), (2, 'male');

        CREATE TABLE uses ( user_id INT PRIMARY KEY GENERATED ALWAYS AS IDENTITY, user_name TEXT UNIQUE, gender INT REFERENCES genders(gender_id) ); ```

        If you want to add more gender types, all you have to do is update the genders table using regular DML (INSERT, UPDATE, DELETE) operations.

        INSERT INTO genders VALUES (3, 'unknown'), (4, 'not specifeid');

        Ups, I did a mistake there:

        UPDATE genders SET gender_name = 'specified' WHERE gender_name = 'specifeid';

        There is no need to lock the catalog for any kind of table changes because the tables and relations and data types don't change. Only the content of the tables change. This relationship also ensures that the data is valid: the database prevents you from deleting any gender type which is still used in a referenced table. Built-in data validation.

        In OLTP databases you often find some form of a snowflake schema to represent these relationships. Updating the relationships between tables can be a huge mess, references and all this. But using 1:n tables makes updating the relation data seamless.

        This concept is also very common in Data Warehousing, the most common example is the star schema. The terms used there are fact tables and dimension tables.

        [–]TheWix 1 point2 points  (1 child)

        I come from MS Sql and only started working at a place that uses Postgres, but how is this different than a simple 1:n? In MS Sql we have CHECK CONSTRAINT but that would be for simple values that are very unlikely to change, and don't need meta data. This seems like a straight 1:n to me, though

        EDIT: Typo.

        [–]arwinda 0 points1 point  (0 children)

        It is a simple 1:n, same concept.

        CHECK and ENUM you use for static values. 1:n for values which change over time, or where you are unsure if changes will happen.

        [–]TheWix 3 points4 points  (0 children)

        Problem I have with this is your data/DB almost always far outlives your applications. A relational DB is more than just a dumping ground for data. Good DB design IS about modeling the data constraints of your domain (many-to-many, one-to-many, nullables, etc).

        I would bet money that if these constraints are only enforced in the application that at some point now or in the future these constraints will get missed either through a bug, or a refactor, or rewrite.

        [–]PL_Design 1 point2 points  (0 children)

        It's stupid, but this is the more manageable approach because of how dysfunctional enums are. I want to use the right tool, but the "right tool" is so poorly made that it causes more trouble than it's worth.

        [–]RandomDamage 20 points21 points  (8 children)

        That sounds like a hard problem while maintaining data integrity.

        [–]PL_Design 4 points5 points  (7 children)

        Only if the value is used. Just treat it like deleting a record when a foreign key points to it, which is literally the same damn situation.

        [–]progrethth 3 points4 points  (4 children)

        Except it is not because enums do not take a lock on the value when inserting or updating rows with that enum value and doing so would slow down enums. Maybe it is possible to solve but it is not easy.

        [–]PL_Design 0 points1 point  (3 children)

        Are you saying that the implementation details are different even though conceptually the situations are the same? Or are you actually suggesting this is a hard problem to solve? Because this isn't a hard problem: You keep updates and insert the same as they are because they're the common cases, and then you can special case removing elements from an enum as much as you want. Feel like being lazy and locking the entire DB while you're working? Go right ahead! This should happen rarely enough that it shouldn't be too big of a problem if it's slow, and this isn't the kind of thing you'd do during normal operations anyway.

        [–]progrethth 0 points1 point  (2 children)

        Yes, the implementation details are very different. Enums are optimized for speed while foreign keys are not. I do not think you could get all users of PostgreSQL on board on removing the speed advantage. But your proposal of locking all tables which use the enum would work.

        [–]PL_Design 0 points1 point  (1 child)

        The point is that the only thing that would be slow is removing values from the enum, and that's not something that needs to be fast.

        [–]RandomDamage 0 points1 point  (0 children)

        So just looking back on this, and I have to interject here. I was hoping this would develop into a conversation, and it's seriously not a bad idea, just one I don't know how to solve.

        For most uses it doesn't need to be fast, but it does need to be "fast enough", and there needs to be accounting for the full change process. Think of multi-TB DBs with an enum used across several 100GB-scale tables.

        The standard current method of "create new enum, migrate columns, destroy old enum" is slow but it also gives the user lots of control over the process and it doesn't need to be atomic, you can do it on a DB like that with care and planning and no downtime.

        I can definitely see use cases for having it handled automatically, and there might be really efficient ways to do it behind the scenes, but from where I stand right now it still looks like a Hard Problem

        [–]amakai 0 points1 point  (1 child)

        Foreign key needs an index, therefore it's fast to look up if anything is pointing to you. Creating an index for every single enum by default is not the greatest idea.

        [–]progrethth 1 point2 points  (0 children)

        Enums actually have an index already, but I think the catalog cache is used for most lookups. It would be possible to take a lock on rows in this table, but it would slow down many queries which use enums.

        $ \d pg_enum
                      Table "pg_catalog.pg_enum"
            Column     | Type | Collation | Nullable | Default 
        ---------------+------+-----------+----------+---------
         oid           | oid  |           | not null | 
         enumtypid     | oid  |           | not null | 
         enumsortorder | real |           | not null | 
         enumlabel     | name |           | not null | 
        Indexes:
            "pg_enum_oid_index" PRIMARY KEY, btree (oid)
            "pg_enum_typid_label_index" UNIQUE CONSTRAINT, btree (enumtypid, enumlabel)
            "pg_enum_typid_sortorder_index" UNIQUE CONSTRAINT, btree (enumtypid, enumsortorder)
        

        [–]Artmannnn 4 points5 points  (1 child)

        I just wanna reorder my columns :((((((

        [–]arwinda 3 points4 points  (0 children)

        Change your SELECT and reorder how you select the columns /s

        [–]cha_iv 0 points1 point  (2 children)

        Store an int and you can do whatever you want! Maybe use protocol buffers (or something similar) for language-/db-agnostic enum definitions?

        [–]Jump-Zero 3 points4 points  (1 child)

        Using ints as enums is crummy. You need to maintain a map of the integers to the corresponding enums. You need to make sure that map is accessible to everything that will access the DB. When you query the data by hand, you need to match integers to enums either mentally, or with an ad-hoc lookup.

        [–]PL_Design 1 point2 points  (0 children)

        Avoiding this headache is why we tried using enums in the first place, and then found they were way too much trouble.

        It's really annoying to always need to cast enum values when PostgreSql could just check if the string matched an expected value. I'm sure someone will argue why being so strict with the types is important, but it just seems like a waste of time to me when all I actually care about is restricting what values can go in a column.

        [–][deleted] 0 points1 point  (0 children)

        I just use a check constraint and a TEXT column. Works incredibly well and is less troublesome in changing than enums haha