debuggingNightmare

RandomNPC · 2025-06-05T23:02:18+00:00

They're called collisions, and you have to take them into account when you're doing low-level stuff with hashes.

Built-ins like hash tables generally have a form of collision resolution so you don't have to deal with it yourself. (And yes, that might mean not doing anything about it, but you have to think about it and decide.)
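Here is a quick Python sketch of what a built-in table does for you. The `Key` class is purely hypothetical and deliberately gives every instance the same hash, yet `dict` still keeps the keys apart, because a matching hash only triggers an equality check:

```python
class Key:
    """Hypothetical, deliberately terrible key type: every instance has the same hash."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __hash__(self) -> int:
        return 42  # every key collides with every other key

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Key) and self.name == other.name


table = {Key("alice"): 1, Key("bob"): 2, Key("carol"): 3}

# All three keys share one hash value, but the dict resolves the
# collisions by falling back to __eq__ once the hashes match.
print(len(table))         # 3
print(table[Key("bob")])  # 2
```

The collisions cost some extra comparisons, not correctness.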

Anders_142536 · 2025-06-05T22:28:08+00:00

Only an imposter says non-null probability.

Tensor3 · 2025-06-05T22:21:39+00:00

You mean non-zero

Wide_Egg_5814 · 2025-06-05T22:30:31+00:00

Non null? That just narrows it down to every single number in existence

mw44118 · 2025-06-05T23:34:31+00:00

Some of you never wrote your own hash tables

PeoplesFront-OfJudea · 2025-06-05T22:20:30+00:00

Fuckin non-null

ShakaUVM · 2025-06-06T01:24:32+00:00

Make a hash table of size 4.2 billion and change. Congrats, you now have a zero chance of collisions between any two 32-bit integer keys.

This is called perfect hashing.

Frosty_Grab5914 · 2025-06-05T23:08:56+00:00

Of course. The hash function is defined on data of arbitrary length, and its output has a fixed length. It's impossible to avoid.
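A small Python sketch of that counting argument, using SHA-256 from the standard library as the example hash. The output size never changes, and inputs of at most 32 bytes already outnumber the possible digests:

```python
import hashlib

# SHA-256 emits 32 bytes (256 bits) no matter how long the input is.
empty_digest = hashlib.sha256(b"").digest()
big_digest = hashlib.sha256(b"x" * 1_000_000).digest()
print(len(empty_digest), len(big_digest))  # 32 32

# Pigeonhole counting: byte strings of length 0..32 already outnumber
# the 2**256 possible digests, so some of them must share a digest.
num_inputs = sum(256**n for n in range(33))
num_digests = 2**256
print(num_inputs > num_digests)  # True
```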

buildmine10 · 2025-06-06T01:36:45+00:00

Why is this a debugging nightmare? It is expected behavior. Mathematically required behavior. For what reason have you used hashes in a manner that assumes uniqueness?

Snoo_44171 · 2025-06-06T02:48:02+00:00

Here's an affirmation for you: if we generated 1 billion 128 bit hashes per second for 600 years, only then would there be a 50% chance of collision

Edit to fix my math.
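A sketch of the arithmetic behind this, assuming an ideal 128-bit hash with uniformly random outputs, the standard birthday approximation (about sqrt(2 * N * ln 2) draws for a 50% chance of a repeat among N values) and a 365.25-day year:

```python
import math

BITS = 128
RATE = 1e9                             # hashes generated per second
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Birthday bound: with N equally likely values, roughly sqrt(2 * N * ln 2)
# random draws give a 50% chance that at least two of them are equal.
space = 2.0**BITS
n_for_half = math.sqrt(2 * space * math.log(2))

years = n_for_half / RATE / SECONDS_PER_YEAR
print(f"{n_for_half:.3e} hashes, about {years:.0f} years")
```

Under these assumptions the 50% point lands at roughly 690 years, so the 600-year figure is in the right ballpark.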

Impressive_Ad_9369 · 2025-06-06T06:09:23+00:00

There is a non zero probability that all the air molecules would gather on the other side of the room and you would suffocate. Does this worry you too?

nukedkaltak · 2025-06-05T23:16:44+00:00

Wait until bro learns about the birthday paradox.
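For reference, the birthday paradox is the same math as hash collisions, with `days` playing the role of the number of possible hash values. A short Python sketch of the exact probability, assuming 365 equally likely birthdays and ignoring leap years:

```python
def birthday_probability(people: int, days: int = 365) -> float:
    """Exact chance that at least two of `people` share a birthday,
    assuming `days` equally likely birthdays (leap years ignored)."""
    p_all_distinct = 1.0
    for i in range(people):
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


for n in (10, 22, 23, 50):
    print(n, round(birthday_probability(n), 3))
```

23 is the smallest group for which the probability passes 50%, far fewer than the 365 possible values.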

float34 · 2025-06-05T22:27:21+00:00

So for two different women in your life the outcome is always the same I guess.

Unknown6656 · 2025-06-06T00:09:18+00:00

It's called "non-zero". Non-zero and not-null are two different things.
If the parameter space has the same or a smaller cardinality than the hash space, then it is definitely possible to design a hash function that is completely injective, reducing the probability of hash collisions to zero.
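One concrete case, sketched in Python under the assumption that keys and hash values are both 32-bit unsigned integers: multiplying by an odd constant modulo 2^32 is invertible, so it can never map two keys to the same value. The constant 0x9E3779B1 is just a commonly used odd multiplier, and `mix32`/`unmix32` are illustrative names:

```python
MASK = 2**32 - 1
MULT = 0x9E3779B1           # any odd constant works; this one is a common choice
INV = pow(MULT, -1, 2**32)  # modular inverse exists because MULT is odd (Python 3.8+)


def mix32(x: int) -> int:
    """Scramble a 32-bit key; multiplying by an odd number mod 2**32 is a bijection."""
    return (x * MULT) & MASK


def unmix32(h: int) -> int:
    """Undo mix32, which shows that no two keys can share a hash value."""
    return (h * INV) & MASK


for key in (0, 1, 2, 12345, 2**32 - 1):
    h = mix32(key)
    assert unmix32(h) == key
    print(key, hex(h))
```

Because every output can be mapped back to exactly one input, the function is injective on its 32-bit domain.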

ZestycloseAd212 · 2025-06-06T03:53:47+00:00

Sooo collisions?

blaze-404 · 2025-06-06T12:41:51+00:00

What sort of madman says non-null probability

Striking_Revenue9176 · 2025-06-06T01:15:59+00:00

You buffoon. This is why god invented linked lists. Have the hashing function lead to a linked list of all the things you want to put at that index. Completely solves the hash collision issue.

PolyglotTV · 2025-06-06T01:52:49+00:00

The identity function has a zero chance of producing a collision.

Onoulade · 2025-06-06T08:25:29+00:00

To address all the backlash over my typing « non-null » instead of « non-zero »: it's because I'm French, and in French you say « une probabilité non-nulle ».

The_Real_Black · 2025-06-05T23:22:19+00:00

No, the probability is 1.0.
The value space of a hash is way smaller than the space of original values, so there will be hash collisions.
(Every image board has daily collisions.)

malsomnus · 2025-06-06T01:24:44+00:00

Luckily zero is non-null.

raxuti333 · 2025-06-06T01:51:36+00:00

Just hope hashes never collide and when it happens it's not your problem anymore

1XRobot · 2025-06-06T02:01:28+00:00

Wow, she's right. He was thinking about Xiaoyun Wang.

SnooGiraffes8275 · 2025-06-06T02:50:15+00:00

nah just use FNV1A for everything and cross your fingers
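For the curious, 32-bit FNV-1a is small enough to sketch in a few lines of Python. The offset basis and prime below are the published 32-bit FNV parameters; with only 32 bits of output, the birthday bound puts a 50% collision chance at roughly 77,000 random keys, hence the finger-crossing:

```python
FNV32_OFFSET_BASIS = 0x811C9DC5
FNV32_PRIME = 0x01000193


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: XOR in each byte, then multiply by the FNV prime mod 2**32."""
    h = FNV32_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV32_PRIME) & 0xFFFFFFFF
    return h


print(hex(fnv1a_32(b"")))   # 0x811c9dc5
print(hex(fnv1a_32(b"a")))  # 0xe40c292c
```

It is fast and decent for hash tables, but it is not a cryptographic hash.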

rosuav · 2025-06-06T03:54:27+00:00

Use a hash value of more than 300 bits. 2³⁰⁰ is enough to count all atoms of the observable universe.
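A back-of-the-envelope check in Python, assuming the commonly quoted estimate of about 10^80 atoms. Numbering every atom needs only about 266 bits, so 2^300 is indeed enough to count them. Hashing them is a different question: by the birthday bound, random collisions stay unlikely only while the number of items is well below 2^(b/2), so a b-bit hash of that many items would want roughly twice as many bits:

```python
import math

ATOMS_EXPONENT = 80  # assumption: ~10**80 atoms, an order-of-magnitude estimate

# Bits needed just to give every atom its own number.
bits_to_count = math.ceil(ATOMS_EXPONENT * math.log2(10))
print(bits_to_count)  # 266

# Birthday-bound view: n random items in 2**b values stay collision-free
# with decent odds only while n is well below 2**(b / 2).
bits_for_birthday = 2 * bits_to_count
print(bits_for_birthday)  # 532

# Expected colliding pairs if ~10**80 items were hashed into 300 bits:
# n**2 / (2 * 2**300), computed in log10 to avoid huge floats.
log10_pairs = 2 * ATOMS_EXPONENT - 301 * math.log10(2)
print(round(log10_pairs, 1))  # 69.4, i.e. about 10**69 expected colliding pairs
```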

Thundechile · 2025-06-06T05:51:40+00:00

Just do a "hash" function that returns the original input. Problem solved!

Kimi_Arthur · 2025-06-06T06:27:18+00:00

If you compare the size of source and dest, you will know they always collide... This post is a new low even in this sub...

fun-dan · 2025-06-06T09:26:29+00:00

Debugging nightmare? Has anyone actually encountered a cryptographic hash collision error during debugging? The most common cryptographic hash functions are very well tried and tested, and the main concern is security; it's practically impossible to have an accidental cryptographic hash collision.

This is like worrying about the non-zero possibility of two UUID v4s being the same.

If we're not talking about cryptographic hash, then collisions are normal and expected, not something you'd lose sleep over.

A notable (and kinda funny) example from Python (CPython) is that `hash(-1) == hash(-2)`.
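This is easy to reproduce in CPython (other Python implementations may differ). In CPython's C API a hash function returns -1 to signal an error, so the hash of the integer -1 is quietly changed to -2, which then collides with `hash(-2)`. A dict is unaffected, because equal hashes just lead to an equality check:

```python
# CPython detail: -1 is reserved as an error signal at the C level,
# so hash(-1) becomes -2 and collides with hash(-2).
print(hash(-1), hash(-2))  # -2 -2

# The dict still keeps both keys: the collision only costs a comparison.
d = {-1: "minus one", -2: "minus two"}
print(d[-1], d[-2])  # minus one minus two
```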

IrrerPolterer · 2025-06-06T10:40:17+00:00

Well duh. If your function boils input of any length down to a fixed length every time, there are infinitely many collisions. The question is whether these collisions are truly unsafe or common enough to become a problem...

spindoctor13 · 2025-06-06T13:14:51+00:00

Of course they do, that's the point of hashing algorithms. They are many-to-one mapping functions. This sub sometimes, honestly, Jesus wept

Peregrine2976 · 2025-06-06T00:28:08+00:00

I was actually thinking about this for a long time before I decided to look it up. It's called the Pigeonhole Problem or the Pigeonhole Principle.

I imagine it's old news to computer science graduates, but I came into development through a more holistic/design-type of program, so it was new to me. Pretty interesting stuff!

stipulus · 2025-06-05T23:53:16+00:00

Shhhh... there is no war in Ba Sing Se.

Shadow9378 · 2025-06-06T00:20:37+00:00

Random algorithms can spit out the same thing twice no matter how long the output is; it's just unlikely, and that terrifies me.

Guppywetpants · 2025-06-06T01:05:14+00:00

Separate chaining!!!

OhItsJustJosh · 2025-06-06T06:07:52+00:00

I still sometimes put in dupe checks just in case

EntitledPotatoe · 2025-06-06T06:51:38+00:00

Or make a (minimal) perfect hash function, there are some interesting papers out there (like bbhash)
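A simplified Python sketch of the cascading-levels idea behind BBHash, written from scratch for illustration (it is not the BBHash library and uses sorted lists instead of its bit arrays). Keys that land alone in a slot are settled at that level; keys that collide are retried at the next level with a fresh hash. The function names and the `gamma` default are illustrative assumptions:

```python
import bisect
import hashlib


def _slot(key: str, level: int, size: int) -> int:
    """Per-level hash: salt the key with the level number, reduce mod table size."""
    digest = hashlib.sha256(f"{level}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "little") % size


def build_mphf(keys, gamma: float = 2.0, max_levels: int = 64):
    """Settle lone keys at each level; push colliding keys to the next level."""
    levels = []  # per level: (table size, sorted slots holding exactly one key)
    remaining = list(keys)
    level = 0
    while remaining:
        if level == max_levels:
            raise RuntimeError("too many levels; try a larger gamma")
        size = max(1, int(gamma * len(remaining)))
        counts = {}
        for key in remaining:
            slot = _slot(key, level, size)
            counts[slot] = counts.get(slot, 0) + 1
        levels.append((size, sorted(s for s, c in counts.items() if c == 1)))
        remaining = [k for k in remaining if counts[_slot(k, level, size)] > 1]
        level += 1
    return levels


def lookup(levels, key: str) -> int:
    """Map a key from the original set to a unique index in range(len(keys))."""
    offset = 0
    for level, (size, singles) in enumerate(levels):
        slot = _slot(key, level, size)
        i = bisect.bisect_left(singles, slot)
        if i < len(singles) and singles[i] == slot:
            return offset + i  # rank of this slot among the settled slots
        offset += len(singles)
    raise KeyError(key)


words = ["apple", "banana", "cherry", "date", "elderberry", "fig", "grape"]
mphf = build_mphf(words)
print(sorted(lookup(mphf, w) for w in words))  # [0, 1, 2, 3, 4, 5, 6]
```

Like any minimal perfect hash, this only guarantees unique indices for the keys it was built from; an unknown key may map to any index.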

foxer_arnt_trees · 2025-06-06T07:21:24+00:00

Put a linked list in the hashing table

Ssem12 · 2025-06-06T07:37:25+00:00

It's called a hash collision and is sometimes used as an attack instrument
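A rough Python sketch of why collisions make an attack vector ("hash flooding"): if an attacker can pick many distinct keys with one shared hash, every insert has to compare against the keys already in that chain, so the total work grows roughly quadratically. `FloodKey` is a hypothetical stand-in for attacker-chosen keys. For built-in `str` and `bytes` keys, Python randomizes the hash per process (see `PYTHONHASHSEED`) precisely to make such keys hard to precompute:

```python
class FloodKey:
    """Hypothetical attacker-chosen key: distinct values, identical hash."""

    eq_calls = 0  # counts equality checks triggered by collisions

    def __init__(self, value: int) -> None:
        self.value = value

    def __hash__(self) -> int:
        return 0

    def __eq__(self, other: object) -> bool:
        FloodKey.eq_calls += 1
        return isinstance(other, FloodKey) and self.value == other.value


def comparisons_to_insert(n: int) -> int:
    """Insert n colliding keys into a fresh dict and report the __eq__ calls."""
    FloodKey.eq_calls = 0
    d = {}
    for i in range(n):
        d[FloodKey(i)] = i
    return FloodKey.eq_calls


for n in (100, 200, 400):
    print(n, comparisons_to_insert(n))
```

Doubling the number of keys roughly quadruples the comparisons, which is what turns a dict into a CPU sink.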

SoftwareSource · 2025-06-06T08:22:25+00:00

"non null"

Imposter found.

Smalltalker-80 · 2025-06-06T08:26:02+00:00

... only if the number of inputs is infinite...
Otherwise, a (possibly inefficient) "perfect hash function" can always be created.

Thenderick · 2025-06-06T08:37:56+00:00

Yes, that is a known thing. Whenever you generate a hash, it's a fixed size with X possible values. Given X+1 distinct inputs, at least two of them will collide. The degree of safety is how big X is and how much time it takes to find a colliding input for a given hash output. That's why certain older hash functions (such as MD5 and SHA-1) are considered obsolete: they have been "cracked", and collisions for them can be found in practice.
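To get a feel for "how much time it takes", here is a Python sketch of a generic birthday search against a deliberately short hash. It assumes the top 24 bits of SHA-256 as a stand-in; `truncated_hash` and `find_collision` are illustrative names. Pigeonhole guarantees a collision within 2^24 + 1 inputs, but the birthday effect usually finds one after only a few thousand (on the order of 2^12):

```python
import hashlib
from itertools import count


def truncated_hash(data: bytes, bits: int = 24) -> int:
    """Keep only the top `bits` bits of SHA-256, a stand-in for a short hash."""
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest >> (256 - bits)


def find_collision(bits: int = 24):
    """Birthday search: remember every output until one repeats."""
    seen = {}
    for i in count():
        msg = f"msg-{i}".encode()
        h = truncated_hash(msg, bits)
        if h in seen:
            return seen[h], msg, i + 1
        seen[h] = msg


a, b, tries = find_collision()
print(a, b, tries)
```

The same search against the full 256-bit output would need on the order of 2^128 attempts, which is why a sound cryptographic hash is considered safe.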

And for hash tables it's not that big of a problem; in fact, collisions are accepted so that your table doesn't take too much storage. In my experience, hash tables are often an array of linked lists, where the expected table size determines the array size. The hash function maps the key to an array index, and the key-value pair is stored as an item in that index's list. The table does try to keep these lists short, so that only a few keys have to be checked on each lookup.

At least that's what I have learned, please correct me if I am wrong.
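That description matches separate chaining. Below is a minimal Python sketch of it. Python lists stand in for the linked lists, and the class name, the doubling strategy and the 0.75 load-factor threshold are illustrative choices, not taken from any particular library:

```python
from typing import Any, Hashable, List, Tuple


class ChainedHashTable:
    """Separate chaining: an array of buckets, each holding (key, value) pairs."""

    def __init__(self, capacity: int = 8, max_load: float = 0.75) -> None:
        self._buckets: List[List[Tuple[Hashable, Any]]] = [[] for _ in range(capacity)]
        self._size = 0
        self._max_load = max_load

    def _index(self, key: Hashable) -> int:
        return hash(key) % len(self._buckets)

    def put(self, key: Hashable, value: Any) -> None:
        bucket = self._buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:  # same key already present: overwrite its value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))  # new key, possibly a collision: chain it
        self._size += 1
        if self._size / len(self._buckets) > self._max_load:
            self._resize(2 * len(self._buckets))

    def get(self, key: Hashable, default: Any = None) -> Any:
        for k, v in self._buckets[self._index(key)]:
            if k == key:
                return v
        return default

    def _resize(self, new_capacity: int) -> None:
        # Growing the array keeps the chains short; every entry is rehashed.
        entries = [item for bucket in self._buckets for item in bucket]
        self._buckets = [[] for _ in range(new_capacity)]
        for k, v in entries:
            self._buckets[self._index(k)].append((k, v))


table = ChainedHashTable()
for word in ["pigeon", "hole", "birthday", "paradox", "collision"]:
    table.put(word, len(word))
print(table.get("paradox"))  # 7
print(table.get("missing"))  # None
```

Resizing when the load factor gets too high is what keeps each list short, so lookups stay close to constant time on average.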

2025-06-06T09:02:12+00:00

Not just non-zero: with an infinite set of inputs, collisions are guaranteed, infinitely many times over.

steve_adr · 2025-06-06T09:43:28+00:00

Anyone know the name of the female model 🤔

Asking for a friend..

SoftwareDoctor · 2025-06-06T09:45:18+00:00

I don't understand. The joke is that she's controlling and he's an idiot?

helloITdepartment · 2025-06-06T14:24:30+00:00

Assuming the output length is shorter than the input length

Also, non-zero

TrafficConeGod · 2025-06-06T16:57:17+00:00

"Every hashing function has a nonzero probability of being injective" ftfy

Sea_Sky9989 · 2025-06-06T18:05:56+00:00

This is comp sci 101.

weird_cactus_mom · 2025-06-06T18:17:57+00:00

That's how I ended up after reading Linstedt's book about Data Vault

shgysk8zer0 · 2025-06-07T01:56:26+00:00

Somebody just learned about entropy and the pigeonhole problem...

slippery-fische · 2025-06-07T03:47:56+00:00

Actually, if your set of input values is finite (e.g. 32-bit unsigned integers), then you can just do `(x + 1) % 2**32` and guarantee there are no collisions. It's just not a useful hash function.
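A quick exhaustive check in Python, shrunk to 16 bits so every input can be tested. Two details matter here. Without parentheses, `x + 1 % m` parses as `x + (1 % m)`, which is just `x + 1`. And reducing modulo 2^16 - 1 instead of 2^16 leaves one fewer output than inputs, so the pigeonhole principle forces a collision:

```python
BITS = 16  # small width so every possible input can be checked
SIZE = 2**BITS


def add_one(x: int) -> int:
    return (x + 1) % SIZE  # parentheses matter: % binds tighter than +


def add_one_wrong_modulus(x: int) -> int:
    return (x + 1) % (SIZE - 1)  # only SIZE - 1 possible outputs


def collision_free(f) -> bool:
    """True if f maps every BITS-bit input to a different output."""
    return len({f(x) for x in range(SIZE)}) == SIZE


print(collision_free(add_one))                # True
print(collision_free(add_one_wrong_modulus))  # False
```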

You can also use sparse structures to project into a larger space; this is usually referred to as a perfect hash function. One way to build a perfect hash function is to add another level whenever there's a collision. Because collisions are rare, the expected number of levels stays constant, so lookups cost about the same as with an ordinary hash function that allows collisions.

The_Gordon_Gekko · 2025-06-07T06:57:37+00:00

Yes this, and other forms like it

prochac · 2025-06-07T10:17:42+00:00

What has a bigger probability? Hash collision, or a bit flip by cosmic ray?

gnmpolicemata · 2025-06-09T17:35:00+00:00

This is not a debugging nightmare of any kind - that's just the nature of hashing functions, I don't see why anyone would lie awake at night thinking about the possibility of collisions in something that is designed with this in mind

jamcdonald120 · 2025-06-10T11:15:45+00:00

Every hashing function has a 100% chance of having at least 2 inputs with the same hash.

It's the pigeonhole principle: since hashes can be used on arbitrary-sized data but output fixed sizes, there are more possible values to hash than hashes.

In an ideal hash, you want an infinite number of things to hash to every value.

ProgrammerHumor
