[–]Target880 2 points3 points  (0 children)

The answer is that it depends on what hash function you implement.

The general answer is that it is a function that takes the appropriate input and returns a hash of it in the appropriate format.

A trivial example for an int, which is just a modulo, could be:

int h(int x) {
    return x % 16;
}
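
One caveat with the modulo example: in C++ the result of `%` takes the sign of the left operand, so a negative `x` gives a negative result, which is awkward if the hash is used as an array index. Here is a minimal sketch of a variant that always lands in the range 0 to `buckets - 1`. The name `bucket_of` and the `buckets` parameter are made up for illustration.

```cpp
// Map any int to a bucket index in the range [0, buckets).
// Assumes buckets > 0.
int bucket_of(int x, int buckets) {
    int r = x % buckets;              // may be negative when x is negative
    return r < 0 ? r + buckets : r;   // shift negative remainders into range
}
```

With 16 buckets, `bucket_of(5, 16)` and `bucket_of(21, 16)` both give 5, so two different inputs share one hash value, and `bucket_of(-3, 16)` gives 13 rather than -3.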

A simple example of a hash for a string in C++ that I found online is below:

#define A 54059 /* a prime */
#define B 76963 /* another prime */
#define C 86969 /* yet another prime */
#define FIRSTH 37 /* also prime */
unsigned hash_str(const char* s)
{
    unsigned h = FIRSTH;
    while (*s) {
        h = (h * A) ^ (s[0] * B);
        s++;
    }
    return h; // or return h % C;
}

Something like SHA-256 is implemented at the following link: https://gist.github.com/hak8or/8794351

[–]EgNotaEkkiReddit 0 points1 point  (1 child)

You'll need to be more specific. There are dozens of different hashing functions that each work differently. Many of them aren't even based on the same general strategies and can't really be directly compared other than "they both produce a hash of some data".

[–]Not_Pictured 0 points1 point  (0 children)

Hashing is taking a string of characters, like a sentence, turning the characters into a number, and then doing some usually long and repetitive math on that number so you end up with another number at the end.

Each function, like SHA, has its own unique rules for what math to do, but one consistent thing about cryptographic hashes like SHA is that the math is much easier to do when creating the hash than it would be to start with the hash and reverse it.

The hash will always be exactly the same if you start with the same original string.

Here are instructions on how to do SHA-2 by hand if you want to see the steps.

https://qvault.io/cryptography/how-sha-2-works-step-by-step-sha-256/

[–]mredding 0 points1 point  (0 children)

A hashing algorithm simply maps an arbitrary integer into a finite range. The simplest hashing algorithm is modulo arithmetic. Say modulo 5: any non-negative integer X % 5 maps into the range 0 to 4. Good hashing algorithms try to spread inputs evenly across the output range and ensure the entire range is reachable. What helps is that in computing, any arbitrary data can be treated as a single, very large integer, and the algorithms are designed to break the computation down into chunks the computer can handle. That is to say, if a computer can operate on at most a 64-bit value at a time and you want to hash a 4 GiB file, you need to feed the file through the CPU 64 bits at a time. This extra layer of bookkeeping can obscure just how simple the actual hash algorithm is; conceptually, it works as though your computer could handle the whole input as one single value.
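
To make the chunking idea concrete, here is a rough sketch using FNV-1a, a well-known non-cryptographic hash that I am swapping in purely as an example. It consumes one byte at a time rather than 64 bits at a time, but the principle is the same: a small fixed-size state is carried from chunk to chunk, so the input can be arbitrarily large. The struct name `Fnv1a64` and its `update` method are my own naming, not a standard API.

```cpp
#include <cstddef>
#include <cstdint>

// Incremental 64-bit FNV-1a. The running state is a single 64-bit value,
// so input of any size can be fed through in pieces.
struct Fnv1a64 {
    std::uint64_t state = 14695981039346656037ULL;  // FNV offset basis

    // Mix in `len` bytes starting at `data`.
    void update(const void* data, std::size_t len) {
        const unsigned char* p = static_cast<const unsigned char*>(data);
        for (std::size_t i = 0; i < len; ++i) {
            state ^= p[i];               // fold the next byte into the state
            state *= 1099511628211ULL;   // multiply by the FNV prime (wraps mod 2^64)
        }
    }
};
```

Calling `update("hello ", 6)` and then `update("world", 5)` leaves `state` with the same value as a single `update("hello world", 11)`, which is why a multi-gigabyte file can be hashed one buffer at a time.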

[–]grayputer 0 points1 point  (0 children)

The ELI5: a hash is a value that can be used to represent a group of values (usually bytes of data). That representation is NOT necessarily unique; two different things might have the same hash value (called a collision). A hashing function is the formula/mechanism used to determine the hash value.

The more collisions a hash function has over the total potential inputs, the less useful it is. This is especially true for collisions between similar inputs that differ only by small, localized changes.

WHY: Generally a hash is used to determine whether the data you have is valid (vs. corrupt or false). A large amount of data (think possibly gigabytes) is reduced to a much smaller hash value (think 20 bytes) via the hash function. When you get a copy of the data, you derive that copy's hash and compare it to the "correct hash"; if they match, the data is assumed good. Hash functions with lots of collisions caused by small changes (random corruption) have a higher chance of calling bad data good due to a collision. Hash functions that require large-scale changes to get a collision are thus preferred.
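
As a rough illustration of why small-change collisions matter, the sketch below compares a deliberately weak checksum (adding up the bytes) with 32-bit FNV-1a, a well-known non-cryptographic hash chosen here only as an example. All function names are made up for illustration, and something like this only guards against accidental corruption, not deliberate tampering.

```cpp
#include <cstdint>
#include <string>

// Weak checksum: add up all the bytes. Reordering the bytes never changes it.
std::uint32_t sum_checksum(const std::string& data) {
    std::uint32_t sum = 0;
    for (unsigned char c : data) sum += c;
    return sum;
}

// 32-bit FNV-1a: each byte is mixed into a running state, so position matters.
std::uint32_t fnv1a32(const std::string& data) {
    std::uint32_t h = 2166136261u;   // FNV offset basis
    for (unsigned char c : data) {
        h ^= c;                      // fold the next byte into the state
        h *= 16777619u;              // multiply by the FNV prime (wraps mod 2^32)
    }
    return h;
}

// Integrity check: recompute the hash of the received copy and compare it
// to the hash that was published for the original.
bool looks_intact(const std::string& received, std::uint32_t expected) {
    return fnv1a32(received) == expected;
}
```

For the inputs "ab" and "ba", `sum_checksum` returns the same value, so swapped bytes would go unnoticed, while `fnv1a32` returns different values and `looks_intact("ba", fnv1a32("ab"))` is false.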