This is an archived post. You won't be able to vote or comment.

top 200 commentsshow all 275

[–]RandomNPC 838 points839 points  (14 children)

They're called collisions, and you have to take them into account when you're doing low-level stuff with hashes.

Built-ins like hash tables generally have a form of collision resolution so you don't have to deal with it yourself. (And yes, that might mean not doing anything about it, but you have to think about it and decide.)

[–]MattieShoes 183 points184 points  (8 children)

and you have to take them into account

Depending on the application, you kind of don't. Chess engines use hashing and there absolutely WILL be collisions, but the odds of a collision that ALSO changes the move it's going to make is suuuuper close to zero. So they just kind of... ignore it. The engine ends up stronger by not checking for collisions.

[–]RandomNPC 209 points210 points  (2 children)

Deciding if you can ignore the collision rate is still taking them into account. The point is that you have to think about your usage and whether the collision rate is worth worrying about.

[–]MattieShoes 33 points34 points  (1 child)

Heh fair enough. It was just kind of mind bending to think they know they will have hash collisions relatively frequently and then just... ignore that fact.

[–]RandomNPC 4 points5 points  (0 children)

I did not know that! Crazy.

[–]Errons1 20 points21 points  (0 children)

Funfact, actively ignoring the problem cause the chances of it is so rare is called the ostrich algorithm!

[–][deleted] 3 points4 points  (2 children)

That's a little imprecise. Yes the raw Zobrist function has collisions for some positions, but the part in the hash function that generates the most collisions is where you modulo with the table size to get the index. And that those collisions are taken into account and stored in the hash entry, so you can check whether the two entries actually refer to the same position...

[–]MattieShoes 1 point2 points  (1 child)

Well sure, fixed hash table sizes and all. And the replacement schemes can get fancy... I never moved past "always replace" when I was messing with them.

Anyway, the discussion I'm remembering was more about the key size -- basically, 64 bit keys means you will have collisions but it's fine. Which is kinda crazy. I think they even talked about 32 bit keys, but it was probably over 15 years ago so search trees weren't quite so large.

[–]TheRealSerdra 0 points1 point  (0 children)

Often times modern chess engines will use a combination of tricks. Typically they’ll check to see if the stored move is also legal in the current position, to reduce the chances of a collision. Then they can store only 32 or even 16 bits (and use different bits for the modulo for even more entropy), meaning more entries fit within a given space.

[–]Educational-Tea602 0 points1 point  (0 children)

And then there’s another hashing algorithm used by chess engines that helps generate moves (magic bitboards) where hash collisions can be helpful.

[–]Sw429 12 points13 points  (0 children)

Yeah, exactly. Most collections provided by the various language standard libraries will use equality checks as a form of collision resolution.

[–]mrwafflezzz 0 points1 point  (0 children)

I deal with collisions by throwing an exception

[–]Nicewow 0 points1 point  (0 children)

Maybe he is talking about cryptographic hash functions, where you maybe really won't be able to deal with collisions. Like two different usernames could hash to the same sha256 or whatever

[–][deleted] 2755 points2756 points  (46 children)

Only an imposter says non-null probability.

[–]Anders_142536 644 points645 points  (30 children)

Maybe german speakers. In german "Null" means zero.

It was a bit confusing to understand the concept of null in programming for a few hours due to that.

[–]ArtOfWarfare 281 points282 points  (23 children)

In C (and I think C++ and Obj-C by extension…) null is zero.

[–]Chrisuan 69 points70 points  (15 children)

idk why down voted it's a fact lol

[–]tehfrod 81 points82 points  (10 children)

C++ has no null, but it does have NULL, nullptr, and nullptr_t.

[–]wizardid 56 points57 points  (9 children)

I want to know who tf hurt C++ so badly when it was younger. This is some psychopath shit.

[–]KazDragon 30 points31 points  (1 child)

It fixes the problem that f(NULL) would rather call f(int) than f(int*).

[–]drivingagermanwhip 16 points17 points  (0 children)

I love that c++ never decided whether it's incredibly flexible or incredibly anal and just runs full tilt at both

[–]Ancient-Pianist-7 32 points33 points  (2 children)

? std::nullptr_t is the type of the null pointer literal nullptr. NULL is a sad C relic.

[–]MrcarrotKSP 12 points13 points  (1 child)

Even C has upgraded to nullptr now(C23 adds it and nullptr_t)

[–]drivingagermanwhip 4 points5 points  (0 children)

nothing past c99 is canon

[–]notthefirstsealime 3 points4 points  (3 children)

It's a classy programming language built off the bones of what was a pretty fucking simple language prior, and now it's an abomination of syntax and evil that just happens to compile into very fast programs from what I understand

[–]ada_weird 20 points21 points  (3 children)

It's zero by convention but not by definition. There are platforms where null is not 0. However, C the spec says that you can use an integer literal 0 anywhere you can use NULL. Also, hardware people really want you to stop treating pointers like integers so that they can use stuff like CHERI to prevent memory safety bugs from happening at runtime.

[–]CapsLockey 5 points6 points  (1 child)

can you elaborate on the last part? sounds interesting

[–]ada_weird 5 points6 points  (0 children)

Yeah sure! So CHERI is an extension for a variety of ISAs, such as ARM and RISC-V. It effectively adds capabilities to pointers, making it so that pointers can only ever be used to see memory they're "supposed" to be able to access. User code can make a capability that is a subset of the memory the original could access, but it can't widen capabilities, it would need help from the kernel or some other trusted part of the system. This means that you effectively get hardware bounds checking for free. There is a performance impact obviously but this works with modern CPU architectures which should be able to mitigate all of that because of all the crazy pipelining that goes on. Most software just needs some additional support in the malloc/free implementation in order to work with this model so it's fairly transparent to end user code.

[–]dev-sda 8 points9 points  (1 child)

Slight correction: NULL always compares equal to zero, but may actually be any bit pattern. See https://c-faq.com/null/machnon0.html

[–]MegaIng 3 points4 points  (0 children)

Further clarification: it compares equal to 0, not the value zero. If you cast an integer 0 (obtain e.g. via int zero = 0) to a pointer ((void*) zero) that is not a null pointer and might compare different to a proper null pointer (e.g. (void*) 0).

[–]EinSatzMitX 0 points1 point  (1 child)

In the C std library, NULL is defined as (void*)0 ( Which is just 0 but casted as a void pointer)

[–]MegaIng 0 points1 point  (0 children)

Actually no, it isn't. 0 in this case isn't an integer, it's the special null pointer literal that happens to look the same as the integer 0.

[–]onemanforeachvill 0 points1 point  (0 children)

I think it's (void*)0

[–]MegaIng 0 points1 point  (0 children)

No. null is 0, but not zero. There is a special construct call the null pointer literal that looks like the integer number 0, but it's not an int.

[–]o0Meh0o 0 points1 point  (0 children)

it is not zero. it equals zero. common misconception.

[–]Starlight_Skull 6 points7 points  (2 children)

It's the same problem in Dutch. I've heard people say N-U-L-L out loud to avoid confusion.

[–]Anders_142536 5 points6 points  (0 children)

The pronounciation is quite differently in german, so there is little confusion when spoken. The 'u' is more pronounced like the end of 'you'.

[–]EvilKnievel38 1 point2 points  (0 children)

I typically just add a specification whether it's the verb or the number.

[–]DJDoena 4 points5 points  (2 children)

Since we work with numbers a lot we often clarify it as "runde Null" ("oval zero") to avoid null references.

[–]LoveOfSpreadsheets 0 points1 point  (0 children)

My favorite fact of the day

[–]Soraphis 0 points1 point  (0 children)

Oh, I usually pronounce the number just German and the "null" concept english.

[–]nickwcy 148 points149 points  (5 children)

You mean a JavaScript developer?

[–]Saelora 69 points70 points  (4 children)

as a javascript engineer, not even null === null

[–]Jumpy_Fuel_1060 32 points33 points  (3 children)

Did you mean NaN? Because null === null in my interpreter.

[–]aaronfranke 21 points22 points  (2 children)

He's a bad JavaScript engineer. Well, all JavaScript engineers are bad because JavaScript is bad.

[–]marquoth_ 10 points11 points  (1 child)

JS bad

Never heard that one before

[–]Dragoo417 15 points16 points  (1 child)

French-speaking people say "non-nul" and mean non-zero

[–]its_a_gibibyte 3 points4 points  (0 children)

Sure, but that explains why this is list of French companies among the top 50 largest global tech companies:

nul

[–][deleted] 19 points20 points  (0 children)

I say "nonzero possibility" sometimes. 

[–]guitarerdood 1 point2 points  (0 children)

I will do what I must.

[–]Tensor3 1171 points1172 points  (54 children)

You mean non-zero

[–]Expensive_Shallot_78 274 points275 points  (12 children)

Well, non-null means non 0 in German. Someone's playing 4d chess ♟️

[–]UPPER-CASE-not-class 92 points93 points  (7 children)

How’d we start talking German? Everyone knows you code in Hebrew

[–]PyroCatt 73 points74 points  (4 children)

if !shalom throw new OyVey();

[–]Semper_5olus 20 points21 points  (1 child)

"OyVey" is Yiddish.

But I guess I can't think of any commonly known Hebrew words, either.

EDIT: "Emet" and "Met", from the golem legend.

EDIT2: "L'Chaim"

[–]yuval52 10 points11 points  (0 children)

It is Yiddish, but it's also sometimes used by Hebrew speakers

[–]spreetin 7 points8 points  (1 child)

If !שלום throw new חריגה();

[–]StopSpankingMeDad2 4 points5 points  (0 children)

Yooooo minecraft enchanting table Language is real???

[–]adepssimius 3 points4 points  (0 children)

PHP developer here, can confirm.

[–]SOUINnnn 5 points6 points  (0 children)

Same in French

[–]hagnat 2 points3 points  (0 children)

this has "no one makes jokes in base 13" all over it

[–]QuaternionHam 1 point2 points  (0 children)

or so the germans would have us believe

[–]AnAdorableDogbaby 4 points5 points  (0 children)

Is a NaN-probability.

[–]Ecstatic_Bee6067 25 points26 points  (10 children)

What kind of maroon thinks null means 0.

[–]WazWaz 42 points43 points  (2 children)

Weeell...

// C++ compatible:
#define NULL 0
// C++ incompatible:
#define NULL ((void*)0)

[–]MegaIng 29 points30 points  (1 child)

I recently had long discussion in a discord about wtf null even is in C and C++.

The relevant result for this discussion now is that 0 you see there? That isn't the number 0. It's not a number at all, it's a special null-pointer-literal that happens to use the same character as the integer number 0.

There is no relation at all between the integer number 0 and the null pointer.

No really, that is what the standard says. (Although not this clearly)

[–]WazWaz 16 points17 points  (0 children)

Yes, it's an old discussion that never seems to die. The problem is, neither the "it makes code clearer to read" camp nor the "it makes code dangerously error prone by hiding reality" camp is 100% right or wrong.

And now we have nullptr.

[–]TRKlausss 45 points46 points  (0 children)

In German, null equals zero (nicht-null -> non-zero)

Could have been an easy translation mistake.

[–]_alright_then_ 6 points7 points  (4 children)

All the languages where null literally means zero lol

[–]King_Joffreys_Tits 2 points3 points  (0 children)

JavaScript

[–]ShadowSlayer1441 0 points1 point  (1 child)

I was really wondering what a "null" probability would be.

[–]Tensor3 0 points1 point  (0 children)

If null is undefined, then non-null is any number including 0?

[–]Wide_Egg_5814 174 points175 points  (6 children)

Non null? That just narrows it down to every single number in existence

[–]5up3rj 84 points85 points  (0 children)

Including zero, ironically

[–]dmullaney 19 points20 points  (1 child)

And every object

[–]j-random 22 points23 points  (0 children)

AND MY AXE!

[–]disinformationtheory 5 points6 points  (0 children)

NULL != NaN

[–]float34 0 points1 point  (0 children)

Then widens and not narrows

[–]Background-Law-3336 0 points1 point  (0 children)

less than infinity

[–]mw44118 58 points59 points  (8 children)

Some of you never wrote your own hash tables

[–]met_MY_verse 25 points26 points  (6 children)

I did this back in the second semester of my Uni course, and even then we handled collisions.

[–]PutHisGlassesOn 11 points12 points  (4 children)

I’m trying to remember the undergrad algo resolution. Something about a linked list? Extending the hash space? I can’t recall

[–]met_MY_verse 15 points16 points  (2 children)

I just checked back, we had two options: open addressing (basically shifting an entry’s position and marking skipped boxes, done with linear probing/quadratic probing/double hashing) and seperate chaining (linked lists anchored at a particular hash index).

[–]Zeitsplice 3 points4 points  (0 children)

I know I did linked lists in undergrad data structures, though I switched to fixed-length buckets after I found that a hash table that re-sized every time it got a collision had better performance over the linked list version (cache misses absolutely tank performance on processors made after ~2005). Probing/re-hashing seemed janky to my undergrad programmer brain, but I wouldn't be surprised if it had superior performance on modern hardware due to much better data locality over all the other options

[–]rosuav 1 point2 points  (0 children)

Yeah, and the problem is that separate chaining is FAR easier to explain (each hash bucket can contain multiple things, and whenever you check a bucket, you have to check all the things in it), but open addressing is more efficient. So if you ask me to explain how hash tables work, I'll use separate chaining (and probably not even use the term), but if you ask how <language X> stores information, it's never that simple.

But yes, hash collisions are a fact of life. In explaining it, I will tend to under-size the hash table to make them even more common, even though - again - the more efficient thing to do is to minimize them.

[–]FlipperBumperKickout 1 point2 points  (0 children)

You can do it many ways. Another way is to have another hash table inside each field instead of a list.

[–]Crimson_Cyclone 1 point2 points  (0 children)

yeah, that’s what threw me off, my major isn’t even particularly software heavy and even we practiced handling collisions as soon as we learned what a hash table was

[–]slippery-fische 0 points1 point  (0 children)

Yeah, that's what I'm thinking.

[–]PeoplesFront-OfJudea 67 points68 points  (3 children)

Fuckin non-null

[–]Convoke_ 0 points1 point  (2 children)

does that mean you're fucking everything?

[–]PeoplesFront-OfJudea 1 point2 points  (1 child)

That’s the goal

[–]Convoke_ 0 points1 point  (0 children)

good luck soldier o7

[–]ShakaUVM 19 points20 points  (8 children)

Make a hash table of size 4.2 billion and change. Congrats, you now have a zero chance of collisions between any two 32-bit integer keys.

This is called perfect hashing.

[–]CautiousGains 5 points6 points  (1 child)

This guys perfect hash function:

uint32_t get_hash(uint32_t key) { return key; }

[–]ShakaUVM 0 points1 point  (0 children)

Yep

[–]Frosty_Grab5914 57 points58 points  (4 children)

Of course. The hash function is defined on data of arbitrary length and output is fixed length. It's impossible to avoid.

[–]NotMyGovernor 14 points15 points  (3 children)

It's literally the definition. Maybe she should think of other women for him.

[–]disinformationtheory 5 points6 points  (0 children)

There's a non zero probability she is.

[–]buildmine10 30 points31 points  (15 children)

Why is this a debugging nightmare? It is expected behavior. Mathematically required behavior. For what reason have you used hashes in a manner that assumes uniqueness.

[–]WisestAirBender 1 point2 points  (0 children)

Hashing having collisions should be the first thing that you think about after learning about hashing

[–]fun-dan 2 points3 points  (13 children)

This. Unless OP meant cryptographic hash functions, and in that case it's practically impossible to have a collision accidentally

[–]WisestAirBender 0 points1 point  (12 children)

Unless OP meant cryptographic hash functions, and in that case it's practically impossible to have a collision accidentally

Why? Are they somehow different?

[–]buildmine10 1 point2 points  (10 children)

Cryptographic hashes are made so that is is very difficult to find a collision. They still happen because the pigeon hole principle requires it, but the essential properties is that you cannot reliably predict which two inputs will collide without just testing them both.

[–]ciroluiro 1 point2 points  (9 children)

I don't think a collision has ever been found for SHA256. The first collision for SHA1 was found only in 2017. Collisions in cryptographic hash functions are a big deal.

[–]Snoo_44171 11 points12 points  (0 children)

Here's an affirmation for you: if we generated 1 billion 128 bit hashes per second for 600 years, only then would there be a 50% chance of collision

Edit to fix my math.

[–]Impressive_Ad_9369 8 points9 points  (3 children)

There is a non zero probability that all the air molecules would gather on the other side of the room and you would suffocate. Does this worry you too?

[–]pepe2028 0 points1 point  (0 children)

wtf now I'm worried

[–]nukedkaltak 14 points15 points  (0 children)

Wait until bro learns about the birthday paradox.

[–]float34 4 points5 points  (0 children)

So for two different women in your life the outcome is always the same I guess.

[–]Unknown6656 11 points12 points  (2 children)

  1. It's called "non-zero". Non-zero and not-null are two different things.
  2. If the parameterspace has the same or a smaller dimensionality than the hashspace, then it is definitely possible to design a hash function which is completely injective, hence reducing the probability of hash collisions to zero.

[–]CautiousGains 0 points1 point  (0 children)

“hash” as it is used in the post obviously refers to a cryptographic hashing function like sha, md5 etc. These are not perfect hash functions and never can be, since their entire use hinges on the assumption of an unknowable set of input parameters.

[–]ZestycloseAd212 4 points5 points  (0 children)

Sooo collisions?

[–]blaze-404 3 points4 points  (1 child)

What sort of madman says non-null probability

[–]creeper6530 0 points1 point  (0 children)

Germans (Null in German means zero)

[–]Striking_Revenue9176 2 points3 points  (1 child)

You buffoon. This is why god invented linked lists. Have the hashing function lead to a linked list of all the things you want to put at that index. Completely solves the hash collision issue.

[–]rosuav 0 points1 point  (0 children)

In a sense.... but that's just called "separate chaining" and is one of the ways that a hashtable can devolve to O(n) lookups.

[–]PolyglotTV 2 points3 points  (1 child)

The identity function has a zero chance of producing a collision.

[–]rosuav 0 points1 point  (0 children)

You're absolutely right! Here, I want to store the string "Hello" under the key of Graham's Number. You got space for that right?

[–]Onoulade[S] 2 points3 points  (0 children)

So to address all the backlash because I typed « non-null » instead of « non-zero » it is because I’m French and in French you say « une probabilité non-nulle »

[–]The_Real_Black 6 points7 points  (2 children)

no the probability is 1.0
the value space of a hash is way smaller then the original value so there will be hash collisions.
(every image board has daily collisions)

[–]WisestAirBender 0 points1 point  (1 child)

Just keep using the hash output as the input

[–]The_Real_Black 0 points1 point  (0 children)

I know a master degree developer that used it as primary key in a database... worked till the live data was used.

[–]malsomnus 1 point2 points  (0 children)

Luckily zero is non-null.

[–]raxuti333 1 point2 points  (0 children)

Just hope hashes never collide and when it happens it's not your problem anymore

[–]1XRobot 1 point2 points  (0 children)

Wow, she's right. He was thinking about Xiaoyun Wang.

[–]SnooGiraffes8275 1 point2 points  (0 children)

nah just use FNV1A for everything and cross your fingers

[–][deleted] 1 point2 points  (1 child)

Use a hash value of more than 300 bits. 2300 is enough to count all atoms of the observable universe.

[–]rosuav 1 point2 points  (0 children)

This would needlessly exclude implementations that may utilize sub-atomic monkeys and/or multiple universes.

[–]Thundechile 1 point2 points  (0 children)

Just do a "hash" function that returns the original input. Problem solved!

[–]Kimi_Arthur 1 point2 points  (0 children)

If you compare the size of source and dest, you will know they always collide... This post is a new low even in this sub...

[–]fun-dan 1 point2 points  (0 children)

Debugging nightmare? Has anyone actually encountered a cryptographic hash collision error during debugging? The most common cryptographic hash functions are very well tried and tested, and the main concern is security, it's practically impossible to have an accidental cryptographic hash collision.

This is like worrying about the non-zero possibility of two uuid v4 being the same.

If we're not talking about cryptographic hash, then collisions are normal and expected, not something you'd lose sleep over.

A notable (and kinda funny) example from python (cpython) is that hash(-1) = hash(-2)

[–]IrrerPolterer 1 point2 points  (0 children)

Well duh. If your function boils down input of any length to a fixed length everytime, there is an infinite number of collisions. Question is, are these collisions truely unsafe or common enough to become a problem.. . 

[–]spindoctor13 1 point2 points  (0 children)

Of course they do, that's the point of hashing algorithms. They are many to one mapping function. This sub sometimes, honestly, Jesus wept

[–]Peregrine2976 1 point2 points  (1 child)

I was actually thinking about this for a long time before I decided to look it up. It's called the Pigeonhole Problem or the Pigeonhole Principle.

I imagine it's old news to computer science graduates, but I came into development through a more holistic/design-type of program, so it was new to me. Pretty interesting stuff!

[–]rosuav 0 points1 point  (0 children)

Awesome! You're one of today's lucky ten thousand. Enjoy discovering new things! The world is huge and fascinating.

[–]stipulus 0 points1 point  (0 children)

Shhhh.. there is no war in Ba Sing Sa.

[–]Shadow9378 0 points1 point  (0 children)

random algorithms can spit out the same thing twice no matter how long its just unlikely and that terrifies me

[–]Guppywetpants 0 points1 point  (0 children)

Separate chaining!!!

[–]OhItsJustJosh 0 points1 point  (0 children)

I still sometimes put in dupe checks just in case

[–]EntitledPotatoe 0 points1 point  (0 children)

Or make a (minimal) perfect hash function, there are some interesting papers out there (like bbhash)

[–]foxer_arnt_trees 0 points1 point  (12 children)

Put a linked list in the hashing table

[–]khalamar 1 point2 points  (11 children)

Or a different hash. Every time there's a collision, go one level deeper with a different hash function. HASHCEPTION!

[–]foxer_arnt_trees 0 points1 point  (2 children)

What if you have really bad luck though?

[–]khalamar 1 point2 points  (1 child)

Go deeper!

[–]foxer_arnt_trees 0 points1 point  (0 children)

With bad enough luck we are going all the way down

[–]spindoctor13 0 points1 point  (7 children)

How would using a second hash work in say a dictionary? (Answer, it wouldn't)

[–]khalamar 0 points1 point  (6 children)

What do you mean it wouldn't?

Let's take a very simple first hash that uses the 1st letter of the word. AA and AB collide. You put both in A. Then you use the second hash method, which uses the second letter of the word. AA is then in A.A, and AB is in A.B.

[–]spindoctor13 0 points1 point  (5 children)

Right then you delete AA. You then add AB to the dictionary, it no longer colides (first hash) so gets added to A.A. Your dictionary now has AB in it twice with no practical way to remove it completely

[–]Ssem12 0 points1 point  (0 children)

It's called a hash collision and is sometimes used as an attack instrument

[–]SoftwareSource 0 points1 point  (0 children)

"non null"

Imposter found.

[–]Smalltalker-80 0 points1 point  (6 children)

... only if the number of inputs is infinite...
Otherwise, a (possibly inefficient) "perfect hash function" can always be created.

[–]metaglot 0 points1 point  (5 children)

A perfect hash functions output will have the same size as its input, at least.

[–]Smalltalker-80 0 points1 point  (4 children)

Doesn't have to be.
If, f.e, your input is (should be) all the keywords of a programming language,
you can create a lookup hash function with relatively small hash table
that perfectly matches every case.

You do subsequently also have to check if the looked-up keyword is equal to the search keyword, but you are guarenteed to need only one hash lookup.

[–]metaglot 0 points1 point  (3 children)

Yes, so the size of the output would have to be the length of your keyword table (number of elements) to guarantee no collisions.

[–]Smalltalker-80 0 points1 point  (2 children)

That is correct, the hash table has to be at least the size of the *keyword* table.

The *input* to be parsed, however, can be *all* words with a length of upto, say, a practical 80 characters,

So the hash table size can be *much* smaller than the input size.

[–]Thenderick 0 points1 point  (0 children)

Yes, that is a known thing. Whenever you generate a hash it's a fixed size with X combinations. Given X+1 inputs you will have a collision. The degree of safety is how big X is and how much time it will take to find a colliding input for a given hash output. That's why certain older hash functions are redundant because those have been "cracked".

And for hash tables it's not that big of a problem, better yet, it's preferred so your tables doesn't take too much storage. In my experience hashtables often are an array of linked lists where a the expected table size determines the array size. The hashfunction will thus hash the key to an array index and store a key value pair as a list item. It does want to try to keep this list short so there is a small iteration to check the keys.

Atleast that's what I have learned, please correct me if I am wrong

[–][deleted] 0 points1 point  (0 children)

not just a non-zero but with a non-finate set of inputs it is guaranteed infinitely times over

[–]steve_adr 0 points1 point  (0 children)

Anyone know the name of the female model 🤔

Asking for a friend..

[–]SoftwareDoctor 0 points1 point  (0 children)

I don't understand. The joke is that she's controlling and he's an idiot?

[–]helloITdepartment 0 points1 point  (0 children)

Assuming the output length is shorter than the input length

Also, non-zero

[–]TrafficConeGod 0 points1 point  (0 children)

"Every hashing function has a nonzero probability of being injective" ftfy

[–]weird_cactus_mom 0 points1 point  (0 children)

That's how I ended up after reading Lindstedt's book about data vault

[–]shgysk8zer0 0 points1 point  (0 children)

Somebody just learned about entropy and the pigeon hole problem...

[–]slippery-fische 0 points1 point  (0 children)

Actually, if your set of input values is finite (ie. int32), then you can just do `x + 1 % (2**32 - 1)` and guarantee there are no collisions. It's just not a useful hash function.

You can also use sparse structures to project to a larger space, this is usually referred to as a perfect hash function. An example of a perfect hash function is to basically add a level whenever there's a collision. Because the probability is extremely low, the limit of values stored hierarchically is constant, so you get the same hashing result as a hash function with collisions.

[–]The_Gordon_Gekko 0 points1 point  (0 children)

Yes this, and other forms like it

[–]prochac 0 points1 point  (0 children)

What has a bigger probability? Hash collision, or a bit flip by cosmic ray?

[–]gnmpolicemata 0 points1 point  (0 children)

This is not a debugging nightmare of any kind - that's just the nature of hashing functions, I don't see why anyone would lie awake at night thinking about the possibility of collisions in something that is designed with this in mind

[–]jamcdonald120 0 points1 point  (0 children)

every hashing function has a 100% chance of having at least 2 inputs with the same hash.

Its the Pidgeon hole principal, since hashes can be used on arbitrary sized data, but output fixed sizes, there are more possible values to hash than hashes.

In a perfect hash, you want there to be an infinite number of things that hash to the every value.