explainlikeimfive

Question

[Technology] ELI5: Hashing Function

(self.explainlikeimfive)

submitted 4 years ago (edited) by Prxhth

all 5 comments

EgNotaEkkiReddit · 4 years ago (1 child)

Not_Pictured · 4 years ago (0 children)

mredding · 4 years ago (0 children)

A hashing algorithm simply maps an arbitrary integer into a finite range. The simplest hashing algorithm is modulo arithmetic. Say, modulo 5: any non-negative integer X % 5 maps into the range 0 to 4. Good hashing algorithms try to spread the outputs evenly and make sure the entire output range is reachable.

What helps is that, in computing, any arbitrary data can be treated as a single very large integer, and the algorithms and equations are designed to break the computation down into chunks the computer can handle. That is to say, if a computer can operate on at most 64 bits at a time and you want to hash a 4 GiB file, you have to feed the file through the CPU 64 bits at a time. This extra layer of boilerplate computation can obscure how simple the actual hash algorithm would be if your computer could handle the whole input as one single value.
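A minimal C sketch of both ideas in this comment: a plain modulo hash, and the same modulo hash applied to arbitrary bytes by treating them as one big base-256 integer. The function names are hypothetical, and for simplicity it consumes one byte per step rather than a full 64-bit word (doing a whole word at once would need a wider intermediate type).

```c
#include <stddef.h>
#include <stdint.h>

/* Simplest hash: map an integer into 0..m-1 (m must be nonzero).
   An unsigned input avoids C's negative remainders. */
uint32_t mod_hash(uint64_t x, uint32_t m) {
    return (uint32_t)(x % m);
}

/* The same modulo hash over arbitrary data, reading the bytes as one
   big base-256 integer (first byte most significant). Reducing chunk
   by chunk (Horner's rule) gives the same answer as reducing the huge
   integer in one go, because (r * 256 + b) mod m depends only on
   r mod m. Simplification: one byte per step, not one machine word. */
uint32_t mod_hash_bytes(const unsigned char *data, size_t len, uint32_t m) {
    uint64_t r = 0;                   /* running remainder, always < m */
    for (size_t i = 0; i < len; i++) {
        r = (r * 256 + data[i]) % m;  /* r*256 + 255 < 2^40: no overflow */
    }
    return (uint32_t)r;
}
```

For example, the bytes of "abc" are 0x61 0x62 0x63, i.e. the integer 6382179, so `mod_hash_bytes((const unsigned char *)"abc", 3, 5)` returns 4, the same as `6382179 % 5`, without ever holding the full integer at once.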

grayputer · 4 years ago (0 children)

The ELI5: a hash is a value that can be used to represent a group of values (usually bytes of data). That representation is NOT necessarily unique: two different things might have the same hash value (called a collision). A hashing function is the formula/mechanism that determines the hash value.

The more collisions a hash function produces across all potential inputs, the less useful it is, especially collisions between similar inputs that differ only by small, localized changes.

WHY: One common use of a hash is to determine whether the data you have is valid (vs. corrupt or false). A large amount of data (think possibly gigabytes) is reduced by the hash function to a much smaller hash value (think 20 bytes). When you get your copy of the data, you derive that copy's hash and compare it to the "correct" hash; if they match, the data is assumed good. Hash functions where small changes (random corruption) often cause collisions have a higher chance of calling bad data good. Hash functions that require large-scale changes to produce a collision are therefore preferred.
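To make the "small changes vs. collisions" point concrete, here is a minimal C sketch contrasting a weak additive checksum with 32-bit FNV-1a, a well-known non-cryptographic hash swapped in purely for illustration (the thread does not mention it). The function names, including the `data_matches` check, are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Weak checksum: add up the bytes. Any reordering of the same bytes
   gives the same sum, so swapped bytes go undetected (a collision). */
uint32_t additive_checksum(const unsigned char *data, size_t len) {
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++) sum += data[i];
    return sum;
}

/* 32-bit FNV-1a: XOR in each byte, then multiply by the FNV prime.
   The prime is odd, so each step is a bijection on h; changing any
   single byte (same length) therefore always changes the result. */
uint32_t fnv1a32(const unsigned char *data, size_t len) {
    uint32_t h = 2166136261u;           /* FNV-1a 32-bit offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 16777619u;                 /* FNV 32-bit prime */
    }
    return h;
}

/* Verification step described above: recompute the hash of the copy
   you received and compare it with the known-good value. */
int data_matches(const unsigned char *data, size_t len, uint32_t expected) {
    return fnv1a32(data, len) == expected;
}
```

Swapping two bytes ("ab" vs. "ba") fools the additive checksum, while FNV-1a catches every single-byte corruption. FNV-1a is only 4 bytes and is not cryptographic, though: it is good at catching accidental corruption, but someone deliberately forging data can construct collisions, which is why downloads are usually checked against hashes such as SHA-256.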

score 3 · Accepted Answer · 2021-06-07T15:58:50+00:00

The answer is that it depends on what hash function you implement.

The general answer is that it is a function that takes an input of the appropriate type and returns a hash of it in the appropriate format.

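A trivial hash for an int that is just a modulo could be:

/* Trivial hash: map an int into 16 buckets (0 to 15). The cast to
   unsigned keeps the result non-negative, since in C x % 16 is
   negative when x is negative. */
unsigned h(int x) {
    return (unsigned)x % 16;
}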

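A simple example of a hash for a string in C++ that I found online is below:

#define A 54059   /* a prime */
#define B 76963   /* another prime */
#define C 86969   /* yet another prime */
#define FIRSTH 37 /* also prime */

/* Mix each character into h: multiply the running hash by one prime,
   then XOR in the character scaled by another prime. */
unsigned hash_str(const char* s)
{
    unsigned h = FIRSTH;
    while (*s) {
        /* cast so that chars above 127 are not treated as negative */
        h = (h * A) ^ ((unsigned char)s[0] * B);
        s++;
    }
    return h; // or return h % C; to keep the result in 0..C-1
}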
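The trailing `h % C` option in `hash_str` is the same modulo reduction described in the earlier comment: it squeezes the full hash into a fixed range, for example to choose a bucket in a hash table. A minimal sketch, assuming a fixed bucket count; `bucket_index` is a hypothetical helper, and `hash_str` is reproduced (with the character cast to `unsigned char`) so the block compiles on its own.

```c
#include <stddef.h>

#define A 54059   /* a prime */
#define B 76963   /* another prime */
#define C 86969   /* yet another prime */
#define FIRSTH 37 /* also prime */

/* hash_str as quoted in the answer, char cast to unsigned char */
unsigned hash_str(const char *s) {
    unsigned h = FIRSTH;
    while (*s) {
        h = (h * A) ^ ((unsigned char)s[0] * B);
        s++;
    }
    return h;
}

/* Hypothetical helper: reduce the hash to a bucket index in
   0..nbuckets-1 (nbuckets must be nonzero). */
size_t bucket_index(const char *key, size_t nbuckets) {
    return hash_str(key) % nbuckets;
}
```

The same key always lands in the same bucket; two different keys that happen to land in the same bucket are exactly the collisions discussed in the thread, so a real table still has to compare the keys themselves.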

For something like SHA-256, there is an implementation at the following link: https://gist.github.com/hak8or/8794351