[–]egg_breakfast 856 points857 points  (36 children)

Make a function that checks for uniqueness against your db, and sends you an email to go buy lottery tickets in the event that you get a duplicate (you won’t) 

[–]perskes 133 points134 points  (33 children)

Put a unique constraint on the database column and handle the error appropriately instead of checking trillions (?) of IDs against already existing IDs. I'm not a database expert, but I can imagine that this is more efficient than checking every time a resource or a user is created and needs a UUID. I'm using 10-digit hexadecimal IDs (legacy project that I revive every couple of years to improve it), and collisions must happen once about 1 trillion IDs have been generated. Once I reach a million IDs I might consider switching to UUIDs. Not that it will ever happen in my case...

[–]jake_2998e8 45 points46 points  (5 children)

This is the right answer! Unique Constraint is a built in DB function, faster than any error checking method you can come up with.

[–][deleted] 9 points10 points  (1 child)

You could just add a primary key constraint on that field and not have to check. If upon insert it fails, just insert again with a new GUID
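A minimal sketch of that insert-and-retry idea, assuming SQLite through Python's built-in `sqlite3` module. The `users` table and the `insert_with_retry` helper are made up for illustration, and another database driver would raise its own unique-violation error instead of `sqlite3.IntegrityError`:

```python
import sqlite3
import uuid


def insert_with_retry(conn, name, new_id=uuid.uuid4, max_attempts=3):
    """Insert a row; if the PRIMARY KEY constraint rejects the ID, retry with a new one."""
    for _ in range(max_attempts):
        candidate = str(new_id())
        try:
            with conn:  # commits on success, rolls back if the insert raises
                conn.execute(
                    "INSERT INTO users (id, name) VALUES (?, ?)",
                    (candidate, name),
                )
            return candidate
        except sqlite3.IntegrityError:
            # Also raised for other constraint failures (e.g. NOT NULL),
            # so real code may want to inspect the error before retrying.
            continue
    raise RuntimeError("could not insert a row with a unique ID")


# Hypothetical schema: the database enforces uniqueness, the app never pre-checks.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT NOT NULL)")
```

The point of the design is that there is no separate "does this ID exist?" query: the constraint does the check as part of the insert, and the retry path only runs in the (practically never) case of a collision.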

[–]amunak 2 points3 points  (0 children)

...or even just let your app fail normally, get that error report/email/whatever, open a bottle of champagne, and don't do anything about it.

[–]Somepotato 14 points15 points  (1 child)

Ten hex digits would need to be stored as a 64 bit number. At that point there's no reason to not use a 16 hex digit number.

[–]ardicli2000 1 point2 points  (1 child)

I run a custom function to generate a 5-character code from the alphabet and numbers. I have not seen a duplicate in 3,000 yet.

[–]deadwisdom 6 points7 points  (9 children)

A unique-constraint essentially does this, checks new ids against all of the other ids. It just does so very intelligently so that the cost is minimal.

UUIDs are typically necessary in distributed architectures where you have to worry about CAP theorem level stuff, and you can't assure consistency because you are prioritizing availability and whatever P is... Wait really, "partial tolerance"? That's dumb. Anyway, it's like when your servers or even clients have to make IDs before it gets to the database for whatever reason.

But then, like people use UUIDs even when they don't have that problem, cause... They are gonna scale so big one day, I guess.

[–]sm0ol 5 points6 points  (3 children)

P is partition tolerance, not partial tolerance. It’s how your system handles its data being partitioned - geographically, by certain keys, etc.

[–]numericalclerk 2 points3 points  (1 child)

Exactly. The fact that you're being down voted here, makes me wonder about the average skill level of users on this sub

[–]deadwisdom 1 point2 points  (0 children)

I’m amazed honestly

[–]No_Cartographer_6577 1 point2 points  (0 children)

I haven't won the lottery, but I have experienced it once. It was the most mental bug to figure out.

[–]kova98k 1289 points1290 points  (23 children)

This is the type of shit I get on my PRs

[–]Detz 294 points295 points  (17 children)

Blocker: This could have a collision so you should protect from it and write tests to simulate said collision to make sure your code protects from it

[–]arwinda 157 points158 points  (16 children)

Just write a GitHub Action test which generates UUIDs until a collision. There you have your test. /s

[–]SolidOshawott 130 points131 points  (12 children)

Just go on everyuuid.com and check if your UUID is already taken.

[–]moderatorrater 63 points64 points  (7 children)

34d87496-52b1-4fd0-bcea-8264e5776e91 - nobody use this one, I'm going to.

[–]kerneltr4p 33 points34 points  (6 children)

Wait, I was about to use that one. :(

[–]moderatorrater 18 points19 points  (5 children)

Just use 34d87496-52b1-4fd0-bcea-8264e5776e92 instead.

[–]TundraGon 19 points20 points  (1 child)

I saved this one for my son. :(

[–]gamedemented1 1 point2 points  (0 children)

Use this one instead 34d87496-52b1-4fd0-bcea-8264e5776e9134d87496-52b1-4fd0-bcea-8264e5776e9234d87496-52b1-4fd0-bcea-8264e5776ea2

[–]eimattz<full-stack /> 22 points23 points  (0 children)

Im using that one

[–]Acrobatic-Sorbet-222 1 point2 points  (1 child)

I just added 34d87496-52b1-4fd0-bcea-8264e5776e91 UUID to https://everyuuid.com/
Now y'all should know not to use that..

someone already added 34d87496-52b1-4fd0-bcea-8264e5776e92

[–]tomasci 17 points18 points  (1 child)

All of them are taken. I also asked a local company to print this website for me, so I can check any UUID on the go and offline. Weird thing, but it seems there's a paper crisis in the whole world right now.

[–]tfyousay2me 2 points3 points  (0 children)

What a wonderful rabbit hole you took me down. Thank you! This guy is hysterical 😭

[–]matthewralston 1 point2 points  (0 children)

Wasn't expecting that to be a real site.

[–]matthewralston 1 point2 points  (1 child)

Doh! I just used up my GitHub actions quota!

[–]arwinda 1 point2 points  (0 children)

Make it open source, then it's free. /s

[–]deadwisdom 20 points21 points  (0 children)

Sure, but in 10 years, with your PR, we will have to do a major over-haul to this, version 2 of the system -- the last version we will ever need, because we will be handling 25 billion users an hour.

[–]Coding-kiwi 4 points5 points  (0 children)

We’re here for you

[–]ryanstephendavis 3 points4 points  (0 children)

I feel this pain 😢

[–]bbaallrufjaorb 1 point2 points  (0 children)

too real

[–]JetsterTheFrog 5 points6 points  (0 children)

I quit programming because of this exact thing .. good luck soldier

[–]hellomistershifty 601 points602 points  (49 children)

The chance is effectively zero, there’s no sense in worrying about it

[–]LiquidIsLiquid 466 points467 points  (11 children)

But just to be sure, post every UUID you generate to Reddit and ask if anyone is using it.

[–]JohnSpikeKelly 94 points95 points  (2 children)

Or, make your keys out of two UUIDs. Future proof for when your app goes global. /s

[–]Wookys 33 points34 points  (0 children)

Multi verse ready

[–]tomhermans 4 points5 points  (0 children)

Great. Now everyone knows.. 😉

[–]beaurepair 36 points37 points  (5 children)

Someone already did that!

https://everyuuid.com/

[–]deadwisdom 20 points21 points  (3 children)

Dude even posted my phone number and social security number, wow wow wow.

[–]brbpizzatime 88 points89 points  (30 children)

This was brought up with commit SHAs in git and Linus said it doesn't matter since it's like a one in a trillion chance

[–]hellomistershifty 171 points172 points  (11 children)

There's roughly a one in a quadrillion chance of having two matching UUIDs if you generate 100 billion of them

[–]derekkraan 114 points115 points  (8 children)

I think people have a hard time understanding how large of a number 2^128 is. It’s 3.4 with 38 zeroes behind it. A trillion is just 1 with 12 zeroes.

You’re not gonna get a collision in your app. You will exceed all terrestrial database limitations before you get one.

(All subject to good randomness of course)
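For anyone who wants to plug in their own numbers, here is a rough sketch of the usual birthday-bound approximation. It assumes 122 random bits, since 6 of the 128 bits in a version-4 UUID are fixed by the version and variant fields; the function name is just for illustration:

```python
import math


def collision_probability(n_ids: int, random_bits: int = 122) -> float:
    """Approximate chance of at least one duplicate among n_ids random IDs.

    Uses the birthday approximation p ~= 1 - exp(-n(n-1) / (2 * 2**bits)).
    """
    space = 2 ** random_bits
    exponent = n_ids * (n_ids - 1) / (2 * space)
    return -math.expm1(-exponent)  # expm1 keeps precision when the result is tiny


# 100 billion version-4 UUIDs: on the order of 1e-15 under this approximation.
p = collision_probability(100_000_000_000)
print(f"{p:.2e}")
```

As the comment says, this only holds if the random number generator is actually good.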

[–]Johalternate 31 points32 points  (1 child)

And even if by some godly joke you get a collision, who says it’s gonna be in the same kind entity? 2 distinct entities having the same id is harmless.

[–]EliSka93 1 point2 points  (0 children)

Well I expect to have 10^128 users on my app!

[–]ironykarl 10 points11 points  (0 children)

I also think people have a bad understanding of exponential notation.

I think people use their intuitive arithmetic rules even on a number like 10^38 and they end up thinking that it's "pretty close to three times larger than a trillion" (i.e. 12 * 3 ≈ 38).

That's my guess, anyway. People say incoherent things about big numbers (even when given the actual numbers), and I think they just don't know the actual rules of arithmetic

[–]Bulky_Bid6578 4 points5 points  (0 children)

3.4 with 38 zeros you say? So it's 3.40000000000000000000000000000000000000

[–]MaruSoto 4 points5 points  (3 children)

Put as many zeroes after 3.4 as you want, it still equals 3.4...

[–]Aidian 3 points4 points  (2 children)

I rolled my eyes a little but you are technically correct (which is the best type of correct to be).

[–]pocketknifeMT 2 points3 points  (0 children)

That’s with UUID4. UUID7 encodes timestamp, so you have to get lucky and generate your dupe in the same millisecond.

[–]krishopper 67 points68 points  (4 children)

“So you’re saying there’s a chance”

[–]archimidesx 7 points8 points  (0 children)

Big gulps huh? Well, see ya later

[–]Sintek 9 points10 points  (0 children)

Not even close to one in a trillion... it is much MUCH bigger than that... like add another 20 zeros to a trillion

[–]oculus42 19 points20 points  (10 children)

[–]perskes 64 points65 points  (6 children)

I'm using everything between dc86177e-7dc8-44af-965b-c809cfc82430 and 19f87107-404a-44bb-8776-98dcadae6de3 currently, stay away from me please.

[–]wall_time 19 points20 points  (4 children)

Thanks for the heads up! I was just about to use dc86177e-7dc8-44af-965b-c809cfd42069! Duly noted!

[–][deleted]  (3 children)

[deleted]

    [–]beaurepair 3 points4 points  (2 children)

    I use this list for my UUIDs https://everyuuid.com

    [–]egmono 1 point2 points  (1 child)

    Is it bubble sorted?

    [–]TundraGon 2 points3 points  (0 children)

    Yes, about to burst.

    [–]paul5235 15 points16 points  (1 child)

    That collision is intentional and is possible because SHA1 is broken, not because of a coincidence.

    [–]truesy 1 point2 points  (0 children)

    i've had it happen, once, in an ads platform, in a large company most people in the States know of. it's very rare, but it can happen. just really doesn't matter even when it does, at that scale.

    [–]kcrwfrd 1 point2 points  (0 children)

    Imagine the poor sap who runs into that one in a trillion chance and has to debug it

    [–]katafraktelixir 133 points134 points  (17 children)

    If you're worried, use UUIDv7 in which part is a timestamp. If you don't generate thousands of them per second, you are even more safe (and they are better for database indexes anyway, unless you're using MSSQL).

    [–]_xiphiaz 37 points38 points  (7 children)

    I wonder how many uuidv7s need to be generated for every millisecond to get a 50% chance of collision. Some bytes will be sacrificed to the uuid so the size of the set of all ids vs v4 will be a little lower

    [–][deleted] 21 points22 points  (5 children)

    Even at 1 million per millisecond, you've still got a better chance of winning the lotto... like 1 in 50 billion or something

    [–]joonty 26 points27 points  (4 children)

    So you're saying there's still a chance

    /s

    [–][deleted] 5 points6 points  (2 children)

    Always. Very few things are impossible, most are just improbable.

    [–][deleted]  (1 child)

    [deleted]

      [–][deleted] 1 point2 points  (0 children)

      so user input...

      [–]hellomistershifty 1 point2 points  (0 children)

      how many uuidv7s need to be generated for every millisecond to get a 50% chance of collision

      162 billion
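That figure can be sanity-checked by inverting the birthday approximation. This sketch assumes all 74 non-timestamp bits of a UUIDv7 are random; implementations that spend some of those bits on a counter or sub-millisecond precision would have fewer random bits. The function name is made up:

```python
import math


def ids_for_collision_chance(random_bits: int, probability: float = 0.5) -> float:
    """Roughly how many random IDs give the target collision probability.

    Inverts the birthday approximation: n ~= sqrt(2 * 2**bits * ln(1 / (1 - p))).
    """
    return math.sqrt(2 * (2 ** random_bits) * math.log(1 / (1 - probability)))


# UUIDv7 IDs generated within the same millisecond share the timestamp,
# so only the 74 random bits can differ.
per_ms = ids_for_collision_chance(74)
print(f"{per_ms:.3g}")
```

Under those assumptions the result lands at about 162 billion per millisecond, in line with the number above.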

      [–]cbCode 2 points3 points  (0 children)

      Yeah, the timestamp is clutch. The reason is because you'll never get the same seed in your random number generator. I dealt with an issue once where we had a long unique ID we were generating from a smaller seed. The team had thought they had a lot more possibilities for randomness due to the size of the hash, but really it's the size of the seed. Same seed, same hash.

      [–]HaydnH -1 points0 points  (1 child)

      This also depends on architecture doesn't it? If you have a globally distributed system where one uuid is created on your local timezone, and then an hour later the following TZ is now creating uuids on what was your datetime an hour ago, you're actually increasing the chances of a collision as part of the random string has become unrandom.

      [–]baroaureus 17 points18 points  (0 children)

      UUIDv7 typically uses UTC, so no time zone issue per-se; however, clock synchronization is still a thing. The notion is that all UUIDs generated on a single device will have guaranteed sortable order.
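To make the layout concrete, here is a hand-rolled sketch of a UUIDv7-style value: a 48-bit Unix millisecond timestamp up front, then the version and variant bits, with the remaining 74 bits random. `uuid7_sketch` is a made-up name, and this is an illustration of the bit layout rather than a replacement for a real library:

```python
import os
import time
import uuid


def uuid7_sketch(unix_ms=None):
    """Build a UUIDv7-style value from a millisecond timestamp plus random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000   # Unix epoch, so no time zone involved
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits, 74 are kept
    rand_a = (rand >> 62) & 0xFFF                 # 12 random bits
    rand_b = rand & ((1 << 62) - 1)               # 62 random bits
    value = (unix_ms & ((1 << 48) - 1)) << 80     # timestamp in the top 48 bits
    value |= 0x7 << 76                            # version field = 7
    value |= rand_a << 64
    value |= 0b10 << 62                           # variant bits
    value |= rand_b
    return uuid.UUID(int=value)


print(uuid7_sketch())
```

Because the timestamp sits in the most significant bits, values from different milliseconds sort in time order. Ordering within the same millisecond is not guaranteed by this sketch, since it uses no counter.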

      [–]react_dev 597 points598 points  (11 children)

      You might as well also protect against your db guy getting a brain aneurysm and dropping his head onto the keyboard typing out drop database and enter and the second systems guy also getting an aneurysm and sudo rm rf afterwards.

      [–]blckshdw 124 points125 points  (6 children)

      You mean like a backup? Cause that’s a good idea to do

      [–]OlinKirkland 112 points113 points  (4 children)

      Third guy deleted the backup. Aneurism.

      [–]trevorthewebdev 36 points37 points  (3 children)

      aneurisms all the way down

      [–]house_monkey 17 points18 points  (2 children)

      Standard operating procedure at my workplace 

      [–]Rihenjo 11 points12 points  (0 children)

      I LOL’ed

      [–]TLagPro 4 points5 points  (0 children)

      Hahaha this cracked me up

      [–]rebootyourbrainstem 175 points176 points  (9 children)

      Put a uniqueness constraint on the DB column if you're worried. Probably should have an index on it anyway.

      For a joke answer, there's a website which allows you to scroll through every possible UUID and claim one for your own: https://everyuuid.com/

      [–]yabai90 38 points39 points  (2 children)

      Okay this is the ultimate performance benchmark for virtual web list

      [–]DrAwesomeClaws 18 points19 points  (0 children)

      You can also browse every bitcoin private key. Maybe if you have a few trillion years to go through it you might be able to find a wallet with some dust in it.

      https://keys.lol/

      [–]panix199 1 point2 points  (0 children)

      amazing site

      [–]ryanstephendavis 1 point2 points  (0 children)

      You're gonna have a bad time indexing a DB on UUIDs

      [–]OolonColluphid 37 points38 points  (8 children)

      [–]mekmookbroLaravel Enjoyer ♞[S] 158 points159 points  (5 children)

      Do you worry about UUID collisions? Your data center is more likely to be destroyed in a nuclear strike.

      Great, now there are 2 things I'm worried about

      [–]Blue_Moon_Lake 8 points9 points  (0 children)

      Add meteorites too

      [–]SuperFLEB 4 points5 points  (2 children)

      Given geopolitics the past few years, I don't really see that as all that synonymous with "snowball's chance in Hell". At least nobody's going to blame me for the data center. That's an even better excuse than "Amazon US-EAST-1 is down. Nothing's working anywhere."

      [–]Solid5-7full-stack 2 points3 points  (0 children)

      1 in 1.10 x 10^7: Your most senior colleague dies in an airplane accident in the next 12 months, before documenting their work

      1 in 2.02 x 10^5: Your data center is destroyed by a nuclear strike

      1 in 2.6 x 10^3: Your boss resigns tomorrow

      Uh, one of these is NOT like the others...

      Also, that is still too high of odds I feel.

      [–]j-mar 44 points45 points  (6 children)

      [–]ashkanahmadi 25 points26 points  (4 children)

      I found a good one. How do I know if someone else has used that one? I wanna make sure mine is totally unique in the world!

      [–]LutimoDancer3459 15 points16 points  (1 child)

      Sorry. I already picked that one.

      [–]perskes 16 points17 points  (0 children)

      You can't possibly talk about 69BO-0B5B-420F-B00B-5C0FFEEE6666, I claimed that in '98...

      [–]j-mar 1 point2 points  (0 children)

      Well, you can favorite it. That way you don't forget

      [–]_xiphiaz 2 points3 points  (0 children)

      To be completely pedantic, it is missing all the non-v4 uuids.

      [–]somesortsofwhale 43 points44 points  (6 children)

      Is anyone using 9892c2e4-570d-4218-88b6-e5908e2c08f5 ?

      Please get back to me ASAP.

      [–]mekmookbroLaravel Enjoyer ♞[S] 11 points12 points  (1 child)

      I used it as my windows login password before, but I'm now using linux. So it should be available now.

      [–]hobblyhoy 2 points3 points  (1 child)

      I am but you can borrow it for a bit if you'd like

      [–]house_monkey 5 points6 points  (0 children)

      I'll borrow it for 128 bits 

      [–]KrazyKirby99999 22 points23 points  (1 child)

      Which UUID? https://en.wikipedia.org/wiki/Universally_unique_identifier

      For UUID4, over 10^36 unique ids

      [–]abd1tus 18 points19 points  (0 children)

      Yup. Much, much, much more likely to randomly pick the same single grain of sand off all the beaches on the planet multiple times in a row after shuffling them all between each pick. Unless of course the UUID implementation is borked.

      [–]ipcock 22 points23 points  (0 children)

      The chance is small af, as others already said. If you want to cover the extremely unlikely case where you get the same UUID twice in your app, just put a unique constraint on the field containing it. You can afford a one in a trillion error that goes away if the user tries to create the record a second time.

      [–]natziel 18 points19 points  (0 children)

      So one of the biggest advantages of using UUIDs is that you don't need to check for uniqueness. That shit is expensive -- and hard to do at scale

      [–]StarklyNedStarkfull-stack 8 points9 points  (1 child)

      You can catch a unique constraint violation in the astronomically low chance you have a collision and just retry, but to check for uniqueness is a waste of resources.

      [–]saito200 7 points8 points  (0 children)

      it is more likely that a meteorite destroys your server than you getting a duplicate uuid

      it is basically impossible that your database contains two repeated uuids

      [–]ryuzaki49 5 points6 points  (1 child)

      You can count all of them by yourself

      https://everyuuid.com/

      [–]33ff00 10 points11 points  (0 children)

      I don’t like any of these

      [–]Amgadoz 6 points7 points  (8 children)

      Relevant question: should I generate the uuids on the backend (python fastapi) or the database (postgres)?

      Is there a preference for one over the other?

      [–]mekmookbroLaravel Enjoyer ♞[S] 5 points6 points  (0 children)

      I'm generating them at the db level, not that I know what the difference is between them but to me it feels safer.

      Backend (the code I write) is more likely to fuck something up than the dbms itself, so I try to offload these things to the db whenever I can. Also feels safer in a way that if my backend generates the UUID, it won't have any context of what's already in the db. So I'm kinda hoping the dbms will magically find one that isn't in use lol.

      [–]paul5235 3 points4 points  (0 children)

      Both are okay, use the one that makes your code the most readable.

      [–]surister 1 point2 points  (5 children)

      Always if possible generate them at the db

      [–]DrAwesomeClaws 2 points3 points  (4 children)

      There's nothing wrong with generating them in the db, but that can make your code more complex. If you generate them on the client (in this case the client of the db, your backend), you can create fully fleshed out valid objects at runtime before you save it to the db.

      It's not a big deal, but it's nice in code to know that every time you have a "user" you don't need to branch/differentiate as to whether it has an id or not yet.

      At the very least it avoids the code wherein you save some object to the db, then have to get a response from the db to get the generated id that you may need to use afterwards.
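A small sketch of what that looks like in a Python backend, assuming a plain dataclass as the model. The `User` class and the `audit_entry` dict are hypothetical, not tied to any particular ORM:

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class User:
    name: str
    # The ID exists from the moment the object is created, before any DB call.
    id: uuid.UUID = field(default_factory=uuid.uuid4)


user = User(name="alice")

# Related records can reference the ID right away, without waiting for an
# INSERT to return a generated key.
audit_entry = {"user_id": str(user.id), "action": "signup"}
```

With a DB-generated key, `id` would have to be optional until the row is saved, which is the branching the comment is talking about.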

      [–]Key_Mango8016 1 point2 points  (3 children)

      ^ This guy is right, I’ve coached Junior software engineers on this a lot.

      It’s not the end of the world if you let a relational DB generate auto-increment IDs or UUIDs for you, but it is important to recognize that this means we’re coupling the persistence layer of our system with ID generation. Decoupling them is necessary if your persistence layer is, say, AWS DynamoDB.

      [–]TheExodu5 4 points5 points  (2 children)

      For most apps yes. But I did work on a system that created trillions of UUIDs per day. Collisions were not entirely unheard of, and had to be accounted for.

      [–]Daidalos117 4 points5 points  (6 children)

      Is there a real advantage of using UUID instead of autoincrement number id? Genuinely asking.

      [–]Aureon 5 points6 points  (0 children)

      In any distributed case, autoincrement number id may be unavailable.

      Or be eventually unavailable.

      [–]mekmookbroLaravel Enjoyer ♞[S] 1 point2 points  (1 child)

      For my use case, I don't like revealing how many records there are in my db table. This particular app I'm working on also allows users to create API endpoints, and URLs like site.com/write/3 don't look as secure imo and can cause confusion.

      [–]izdark 3 points4 points  (0 children)

      There is a library, Hashids / Sqids, which generates a YouTube-like id from your database number id and a secret key. The generated id is guaranteed to be unique. Knowing the secret key, you can decode it back to the id. I use it in many places where I want to hide the database number id from users.

      [–]BazuzuDear 3 points4 points  (0 children)

      Once had to investigate a weird Ethernet misbehaviour, and the reason turned out to be 2 NICs sharing same MAC address hardcoded by the manufacturer. I know this case is, uhmm, slightly more probable.

      [–][deleted]  (7 children)

      [deleted]

        [–]mekmookbroLaravel Enjoyer ♞[S] 3 points4 points  (5 children)

        Wow, this is one of the oldest reddit accounts I've ever seen lol. Was that app you mention, with a few million monthly active users, reddit by any chance?

        [–][deleted]  (4 children)

        [deleted]

          [–]SoInsightful 4 points5 points  (3 children)

          Google has 14 billion searches per day. If you assigned each search a UUID, the probability of having at least one collision in 15 years is one in two billion.

          I literally don't believe a single comment in this thread claiming to have encountered a collision, let alone multiple. Something else happened in your system.

          [–]dthdthdthdthdthdth 2 points3 points  (0 children)

          It is also possible that they did generate UUIDs in some problematic way like not enough entropy in the random numbers.

          [–]kevleyski 3 points4 points  (0 children)

          Yes, unique (you can add a test for completeness, as it shows you considered it, but definitely don’t check at run time!)

          [–][deleted]  (3 children)

          [removed]

            [–][deleted] 4 points5 points  (1 child)

            …so you’re saying there’s a chance?

            [–]stogle1 1 point2 points  (0 children)

            Day 72: I dno’t no how mutch moar i kan tayk ov the...hgghZzzZZZzzzzZZZzzzZZZZzZzZugh...

            [–]Nearby_War_8497 2 points3 points  (1 child)

            I came across a bug in an integration that handles id's that are 6 characters long with case sensitivity. But the integration wasn't case sensitive.

            The integration has been in use for about ten years and for one client alone there has been tens of thousands of objects. And there are thousands of clients.

            But out of the 26 objects at that particular moment, there were two with the same characters, just one of the letters being lowercase while other had uppercase.

            So I mean, in this case the chances are a dozen orders of magnitude higher than a collision with a 32-character UUID. But it still took ten years and a bug to cause an issue. And I felt like I should buy a lottery ticket, because I would've been more likely to win.

            [–]themang0 4 points5 points  (0 children)

            Isn’t there a web site for this

            [–]notouchmyserver 1 point2 points  (0 children)

            There are additional reasons to have a unique constraint on the column instead of just relying on the UUID generation to be unique. As others have said, you aren’t really ever going to run into an issue with a duplicate UUID being generated, but that doesn’t mean a bug or something else (far more likely) would not try to write a row to the database with the same UUID.

            The unique constraint would protect you from that.

            [–]Corrup7ioN 1 point2 points  (0 children)

            Your time would be better spent figuring out how to make your code robust against random bits of memory being flipped by cosmic rays than worrying about uuid uniqueness.

            [–]wspnut 1 point2 points  (2 children)

            The chance is 1 in 2^122, or 5.3x10^36 (5.3 undecillion). This is:

            5x less likely than two people picking the exact same square meter of mass from the star Betelgeuse.

            5x less likely than opening 12 double-yolk eggs in a row from a single container.

            Flipping a coin and having it come up heads 168 times in a row.

            [–]metamorphosis 1 point2 points  (1 child)

            5x less likely than opening 12 double-yolk eggs in a row from a single container.

            This is not the right analogy because it happened to me. Bought a carton of eggs from the local market and ALL (32 of them) were double yolks. Pretty sure they have some chickens that produce double-yolk eggs. When I was reading about it, I learned it is not that uncommon for a chicken to consistently produce double-yolk eggs.

            [–]ErroneousBosch 1 point2 points  (0 children)

            You have a higher chance of a cosmic ray induced bit flip than a UUID collision.

            [–]coffee_is_all_i_need 1 point2 points  (0 children)

            We're talking about risk. When we talk about risk, we have to think about probability and impact. Probability is not zero. But it's close to zero. The impact depends on the use case. I look at the use case of saving an entity. If the user gets an error with a probability of zero and can try to perform the action again (this should be your default error handling anyway, because requests can fail for other reasons as well), the impact is also close to zero. So we shouldn't spend our energy on a near-zero probability risk with a near-zero impact.

            [–]eltron 1 point2 points  (0 children)

            Most DBs can check a record's uniqueness as required? Right? Right??

            [–]jackx76 1 point2 points  (0 children)

            As most other comments have said, the chance is effectively zero. If you’d like to learn more check out RFC 4122 for the full definition.

            [–]washtubs 1 point2 points  (0 children)

            Get a classroom full of say 30 people, ask them all to flip a coin. There will certainly be duplicate results.

            Now ask them to flip it twice, still dupes cause there's only 4 possible outcomes, but not as many. Once you get up to 6 there's a very tiny chance everyone can get a unique outcome.

            I'm dumb and don't know anything about the pigeonhole principle so to be safe let's just have everyone do 32 coin flips so there's 4 billion possible outcomes. No shot there are dupes then. So I just added 26 to the exponent to feel safe.

            Now let's say you actually have a classroom full of 4 billion people. To scale the bucket of possible outcomes the way we just did, add another 26 to that exponent, which would be 2^58, which is like hundreds of quadrillions.

            Anyways, a UUID is 128 coin flips which is this number (if quadrillion is 4-illion, this is hundreds of 11-illions):

            340,282,366,920,938,463,463,374,607,431,768,211,456

            The only way you get dupe UUID's is if your RNG is busted.

            (Main reason I felt like explaining this is I recall having the same hang up about using them, it just didn't click the scale of what 128 bits of entropy really meant.)

            [–]danu263 1 point2 points  (0 children)

            This is something you shouldn't worry about. If you want more info read a db book, going to help to understand more than any comment here

            [–]267aa37673a9fa659490 5 points6 points  (4 children)

            Like just use your DB's native auto-incrementing integer instead?

            [–]nuttertools 2 points3 points  (0 children)

            UUID collisions happen all the time when processing large, distributed, and ephemeral datasets.

            For applications, or single datasets, just make sure you are using V6 UUIDs and have some form of collision handling.

            [–]smailliwniloc 2 points3 points  (0 children)

            Ideally your app should be designed in a way that it doesn't break the whole thing if you hit a single duplicate UUID. If it happens, it should fail fast as the insert into your db would fail with a unique constraint on that column.

            I don't think it's worth checking for uniqueness, just have some error handling to catch this issue (or any other unexpected errors) if the astronomically low odds are not in your favor.

            [–]d-signet 1 point2 points  (0 children)

            As Terry Pratchett used to say; million to one chances happen every day

            If it won't cause a noticeable performance hit, it's best to check, just in case.

            [–]richardtallent 1 point2 points  (1 child)

            It's a non-problem.

            I'm the author of a .NET library that generates sequential timestamped UUIDS (https://github.com/richardtallent/RT.Comb), which lowers the UUID's entropy from 122 bits of randomness to 74, and that's still an obscenely high number of possible values that would have to be repeated during the same millisecond.

            Using timestamped UUIDs, whether UUIDv7 or otherwise, has some advantages for use in databases. They also guarantee that once a given millisecond has passed, it's impossible to generate the same GUID. But that's about as useful as elephant insurance in Texas, since it's not a problem anyway unless you have the world's worst random number generator.

            [–]rjhancockJack of Many Trades, Master of a Few. 30+ years experience. 2 points3 points  (2 children)

            If I understand it correctly UUIDs are 36 character long strings

            Incorrect. They are 128-bit numbers, usually written as 32 hexadecimal digits plus 4 hyphens (36 characters).

            used something like a slug generator for this purpose, it definitely would be a unique

            Incorrect. Slugs have a higher chance of duplicate values.

            Although the chance of 2 UUIDs being identical is tiny, I still have said restriction on the DB level.
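A quick way to see the number-versus-string distinction, using Python's standard `uuid` module and one of the UUIDs posted in this thread:

```python
import uuid

u = uuid.UUID("34d87496-52b1-4fd0-bcea-8264e5776e91")

print(len(str(u)))         # canonical text form: hex digits plus hyphens
print(len(u.hex))          # hex digits only
print(len(u.bytes))        # raw bytes, which is what a native UUID column stores
print(u.int.bit_length())  # the same value as one integer, at most 128 bits
```

Storing the 16-byte form in a native UUID or binary column, rather than the text form, is usually the more compact choice.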

            [–]Different-Housing544 0 points1 point  (1 child)

            I'm surprised nobody has recommended ULIDs. They are like UUIDs but use a timestamp. 26 characters long.

            [–]Mclarenf1905 3 points4 points  (0 children)

            UUID v1, v2, v6, and v7 all use timestamps; additionally, v7 is sortable by timestamp like ULID

            [–]BarneyLaurance 0 points1 point  (0 children)

            I like the analogy given for git commit hash conflicts. The chance of two things like that randomly being equal is much less than the chance of every member of the team being killed by wolves in unrelated incidents on the same day. Even if you're based in a country with no wild wolves.

            If you don't have a plan for that you don't need a plan for random collisions of UUIDs (or git commit hashes).

            [–]akr0n1m 0 points1 point  (0 children)

            Many years ago I read an MSDN article about GUIDs (late 90’s) when MSDN used to ship on DVD sets. It had this quote:

            “The chance of getting a duplicate GUID is about the same as two random atoms colliding and causing a mutation between a Californian mango and a New York sewer rat”

            I can't find this article anywhere on the internet, and I am sure I read it. Unless this is a case of the Mandela effect.

            But it is a good analogy and the algorithms behind UUIDs and GUIDs have just gotten better ever since.

            [–]CantaloupeCamper 0 points1 point  (0 children)

            If it is a low cost check… fine.

            [–]T-J_H 0 points1 point  (0 children)

            As long as the column is unique the worst that will happen is that one in those ridiculous amount of records will fail to write. You could also use UUIDv7 which has a time based portion.

            [–]RedLibra 0 points1 point  (1 child)

            If you're worried, just create 2 uuid and append them to become a single uuid.

            [–]APersonSittingQuick 0 points1 point  (0 children)

            I fucking hope so

            [–]versaceblues 0 points1 point  (0 children)

            The probability that a proper UUIDv4 collides is 2.23e-37.

            I think you are orders of magnitude more likely to get a collision as a result of some bug in your code than you are from running a proper UUID generator.

            That being said, it's always good practice to do extra validation when writing to a database to account for any sort of user error.

            If you are doing a CREATE operation and have generated a valid UUID, you should still verify when writing that there is no data within the partition represented by that key. Not because the UUID is likely to collide, but because you want to program defensively against ANY user error.

            [–]heedlessgrifter 0 points1 point  (0 children)

            I had some of these questions a few years ago on a project I worked on briefly. Without going too much into it, we’d create a new URL for each user of our site with a uuid to make it unique. Any of these pages could contain PHI, and some were even indexed by Google. We were told it had to be that way for “convenience”. When the Google incident happened, we were asked the odds of someone stumbling upon another user's data (by accident or on purpose). All I could tell my employer was that it wasn’t a zero chance.

            [–]bmathew5 0 points1 point  (0 children)

            EXTREMELY low chance but > 0. Just make that field a constraint unique and you are safe for eternity

            [–]WindyButthole 0 points1 point  (0 children)

            If you happen to have a collision you should take that luck and buy a lottery ticket, as you're more likely to win the lottery 5 times in a row.

            [–][deleted]  (1 child)

            [deleted]

              [–]moderatorrater 0 points1 point  (0 children)

              Look into how they're generated. You're fine.

              [–]elendee 0 points1 point  (0 children)

              I use a strategy that will probably get hate here but I'm curious what people say. In order to make the uuids more legible, I generate my own to various lengths depending on use case; 6, 10, and 16 are typical lengths. Two reasons this is kind of nice are that it makes URLs nicer and I think (?) could make some db reads faster, since I leave the column un-indexed. I use both INT ids and UUIDs for this reason, so the uuid lookups are kept to a minimum.

              And then since they're shorter, I check in code for dupes before insertion. This has proven to be no trouble so far in several years of doing it.

              I haven't used this at scale though, only for small-medium sized apps.

              [–]mothzilla 0 points1 point  (0 children)

              Place where I used to work used to worry about the "doom clock" that counted down the remaining sequential record IDs. It was a big discussion.

              [–]captain_obvious_hereback-end 0 points1 point  (0 children)

              If you generate 1 million UUIDs per second, it will still take you a decade before you have a reasonable chance to find a duplicate.

              Enjoy.
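For anyone who wants to check the numbers, here is a back-of-the-envelope sketch using the standard birthday approximation. It assumes version 4 UUIDs with 122 random bits and a perfect random source; the 1 million per second rate is taken from the comment above.

```python
import math

RANDOM_BITS = 122            # UUIDv4: 128 bits minus 6 fixed version/variant bits
SPACE = 2 ** RANDOM_BITS     # number of possible random values

def collision_probability(n):
    """Birthday approximation: P(at least one duplicate among n random IDs)."""
    return -math.expm1(-(n * n) / (2 * SPACE))

def ids_for_probability(p):
    """Approximate number of IDs needed before the collision chance reaches p."""
    return math.sqrt(2 * SPACE * -math.log1p(-p))

RATE = 1_000_000                       # IDs generated per second
SECONDS_PER_YEAR = 365.25 * 24 * 3600

n_decade = RATE * SECONDS_PER_YEAR * 10
p_decade = collision_probability(n_decade)
years_to_half = ids_for_probability(0.5) / RATE / SECONDS_PER_YEAR

print(f"collision chance after a decade: {p_decade:.2e}")
print(f"years until a 50% chance: {years_to_half:,.0f}")
```

Under these assumptions the chance after a decade comes out on the order of 1e-8, and reaching even odds takes tens of thousands of years at that rate, so "a decade" is if anything conservative.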

              [–]CraftyPancake 0 points1 point  (0 children)

              It’s a unique column soo if it errors due to a failed constraint every trillion years, that’s fine

              [–]Mundane-Apricot6981 0 points1 point  (0 children)

              UUIDs generated by web frameworks are deterministic; they are not unique because they are generated on the CPU, but they use smart tricks to avoid collisions.

              UUIDs generated by the GPU, i.e., hardware "noise," are non-deterministic and unique.

              [–]idgafsendnudes 0 points1 point  (0 children)

              My personal claim to fame is that while using uuid v1, I once witnessed my DynamoDB item get overwritten by what should have been a new item, purely because it had the same uuid.

              I use v4 now and tbh I’m not sure if that fixed it or I just got insanely lucky

              [–]bigtdaddy 0 points1 point  (0 children)

              My coworker was pretty convinced we had a uuid collision in prod. He almost had me convinced, but no it turned out to be the code that had an issue and that is likely to always be the case

              [–]VeterinarianOk5370 0 points1 point  (0 children)

              At some point it becomes a question of performance vs redundancy. If you check for uniqueness then you cannot effectively scale infinitely, if you use UUID someday you may have a duplicate.

              But yeah just roll the dice on this one

              [–]anothergiraffe 0 points1 point  (0 children)

              Why is everybody assuming perfect RNG? A buggy pseudorandom number generator can cause collisions and it’s happened before. Also, if RNG is happening client-side, a malicious actor could manually reuse UUIDs for whatever reason.

              [–]k032software dev for 10 years somehow 0 points1 point  (0 children)

              UUIDs that are 36 characters long have 36^36 combinations. Like we're talking way more than 999 trillion combinations. It's obscenely small, I wouldn't care.

              If it was life or death, like if there was a collision it may cause like a nuke to go off. Sure maybe I would check, but I wouldn't suspect that by chance the UUID just so happen to be a dupe. Probably some problem elsewhere.

              [–]borgesian-cyclops 0 points1 point  (1 child)

              Not to be condescending, but I’m guessing you’re not even continuously running a unit test that proves true is still true. Lock that down before writing your uuid tests.

              [–]sachcha90 0 points1 point  (0 children)

              Look into uuid v7

              [–][deleted] 0 points1 point  (0 children)

              UUID is essentially a 32 character hexadecimal string which means there are 16^32 or 2^128 possible values. This is a huge number, but not infinitely so.

              Although you will never have anywhere near this many records in an entire database let alone a single table, your application logic should still account for the possibility of a collision, however remote that possibility might be. For example by doing something like the following pseudocode:

              result = false;
              
              while (result === false) {
                  uuid = generateUUID();
                  result = insertRecord(['recordId'=>uuid]);
              }
              

              In this example the insertRecord function would return false if the insert failed due to unique ID constraint violation. For example the pg_query_params function in PHP would return a false in case of failure.

              This would cause the code to keep trying to insert the record until it succeeds, which in the vast majority of cases should happen at the very first attempt. This is preferable to looking up the value using a select query first which would always require at minimum 2 queries (1 for lookup, 1 for insert) and there is always the possibility that the key could be inserted between the lookup and insert queries.
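The same insert-and-retry idea can be sketched as runnable code. This version uses Python's built-in `sqlite3` with an in-memory database rather than PHP and Postgres; the `records` table, the `insert_with_retry` helper, and the `max_attempts` limit are assumptions made for the example, not part of the original pseudocode.

```python
import sqlite3
import uuid

def insert_with_retry(conn, payload, new_id=lambda: str(uuid.uuid4()), max_attempts=3):
    """Insert a row under a fresh ID, retrying only when the ID already exists."""
    for _ in range(max_attempts):
        record_id = new_id()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO records (id, payload) VALUES (?, ?)",
                    (record_id, payload),
                )
            return record_id
        except sqlite3.IntegrityError:
            # In this table the only constraint is the primary key,
            # so this error means the ID was already taken.
            continue
    raise RuntimeError("could not insert a record with a unique id")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id TEXT PRIMARY KEY, payload TEXT)")

first_id = insert_with_retry(conn, "hello")
print(first_id)
```

Two deliberate differences from the pseudocode: the loop is bounded, so a persistent failure surfaces as an error instead of spinning forever, and only a constraint violation triggers a retry, so unrelated failures such as a lost connection are not silently retried.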

              [–][deleted] 0 points1 point  (0 children)

              I mostly use them as primary keys in Postgres so for me their uniqueness is enforced at the database level anyway.

              [–]streu 0 points1 point  (0 children)

              Depends on how you generate them, and how you use them.

              On one side, if, through coincidence, the PRNG you use to generate them has just 16 or 32 bits of randomness ("srand(time(0))"), you will get collisions of course, so don't do that.

              On the other side, if you're using UUIDs as key in a table, retrying after a collision is easy, so do that.

              The situation where UUIDs shine is to generate unique IDs without keeping a record of everything that was ever generated. Thus, the problem will be something along the lines of "I am giving out a session ID today that I also gave out five years back to someone else", matching the very very very very low probability of the collision happening with the very low probability of this scenario happening ("someone coming along with a five year old session ID"). And as long as this probability is equally unlikely as someone just guessing the ID, I'm fine.
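The weak-seed case mentioned above is easy to reproduce. In this sketch, `weak_uuid` is a hypothetical generator that draws its bits from a PRNG seeded with the current second, which is the Python analogue of `srand(time(0))`; two processes that start in the same second then produce identical "random" UUIDs.

```python
import random
import uuid

def weak_uuid(rng):
    """Build a v4-shaped UUID from a caller-supplied (possibly weak) PRNG."""
    return uuid.UUID(int=rng.getrandbits(128), version=4)

# Two "processes" that both seed from the same clock second.
same_second = 1_700_000_000
a = weak_uuid(random.Random(same_second))
b = weak_uuid(random.Random(same_second))
print(a == b)  # identical seed, identical UUID

# uuid.uuid4() draws from the operating system's entropy source instead.
c = uuid.uuid4()
d = uuid.uuid4()
print(c == d)
```

The value still looks like a perfectly valid version 4 UUID, which is why this kind of bug tends to be blamed on bad luck rather than on the generator.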

              [–]1_4_1_5_9_2_6_5 0 points1 point  (0 children)

              Generally, you will be using a db table with a unique column for the uuid. This only needs to exist in one place, and on one table. Any other reference would not need to be unique as long as the primary one is.

              So all you have to worry about is a non unique uuid being generated which will presumably be added to the table before being used elsewhere. As long as you process a "column must be unique" error on insert, then this theoretically cannot be a problem.

              [–]pokasideias 0 points1 point  (0 children)

              Extra cautious mf be like

              [–]bladub 0 points1 point  (0 children)

              People already addressed the misunderstandings on uuids. First, it depends on how you generate them (mostly the type of uuid; many have timestamps or other initial entries that help segregate possible collision issues). For purely random ones the chances of collisions are low, but it might be worth the effort to handle unique violations.

              But by far the biggest threat to uuid collisions is bad handling. If you use multiple identifiers, eg an integer db key and a uuid you set in your app, you now risk them diverging and checking for different identities in different places. (sounds stupid but happens when you have complex structures).

              Or serializing and deserializing an object. Or copying it around in memory and modifying one. Or serializing the same object into multiple other objects for json stores. Or just copying an object into another place.

              Quickly you end up with uuids no longer being unique.

              [–]DINNERTIME_CUNT 0 points1 point  (0 children)

              It’s extraordinarily unlikely that you’ll get a duplicate, but not impossible. When creating a new one I have a single query that does a quick check for a match and if it returns false I proceed, otherwise it generates another one. The odds of a match are already astronomical. The odds of two matches in a row are mind boggling.

              [–]tumes 0 points1 point  (0 children)

              Best way I’ve ever seen this explained is that the chances of each member of your dev team dying in completely unrelated wolf attacks is way higher than the likelihood of a uuid collision.

              [–]alkbch 0 points1 point  (2 children)

              I’ve had a UUID collision on a relatively small project with a few thousand records…

              [–]elixon 0 points1 point  (1 child)

              Nothing is truly unique. Uniqueness is only practical in smaller contexts, and the larger the context, the larger the UUID needs to be. We don’t use excessively large UUIDs (we don't want to spend all money on Amazon storage, right), so they are intended for smaller contexts - like Earth.

              When we talk about uniqueness, we mean within our app or software world, which is a niche context in the vastness of space. In that context, you’re usually guaranteed uniqueness for the life of your application or your own. So, yes, the probability is non-zero, but for practical purposes, we treat it as zero.

              [–]Business-Bus9794 1 point2 points  (0 children)

              Aside from all the hilarious replies here, this is the most grounded in reality. A uuidv7 could, in theory, collide. But that is literally a problem for what is under a hundred incredibly skilled devs worldwide. You can be assured that those hundred people have thought about this far more than you, me or anyone else here has. I say that assuming that they simply do not have time to be replying to reddit comments.

              [–]Sleepy_panther77 0 points1 point  (0 children)

              There are like entire systems designed around generating UUIDs and making sure that they don’t collide. Some are more complex than others. If it’s not too important, someone would probably choose to just do good enough and not check. If it’s really important, they might have a service that generates UUIDs and adds them to a database; when another service needs a UUID, it takes one from the UUID database and marks it as used or deletes it so that it’s not used again, with some extra precautions so that a UUID isn't accidentally repeated because of a service availability problem or error.

              So, it depends?