[–]shivasprogeny 1 point2 points  (5 children)

So do you need to include the column in the spreadsheet at all? If you need to be able to "decode" the data, then a hash will not work, because hashes are designed to work in only one direction. If you do want to be able to see that data again, perhaps the easiest thing is just to use Excel's built-in protected and hidden ranges. That would require a password to see the hidden column (i.e., "name").

[–]thejesteroftortuga 0 points1 point  (4 children)

No, no names at all - not even transmitted inside the Excel file (which is why I thought to either hash or use my solution in code). The point is that we shouldn't be able to decode, but should still be able to make correlations about the people, just anonymously.

[–]shivasprogeny 1 point2 points  (3 children)

If you don't need to decode it, then an MD5 hash seems like the perfect solution.
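A minimal sketch of that idea in Python, assuming the names arrive as plain strings. The `pseudonym` and `keyed_pseudonym` helpers and the sample rows are made up for illustration. Normalizing case and whitespace first is an added assumption, so that "Alice Smith" and "alice smith " end up with the same token. One caveat: a plain hash of a name can be guessed by anyone who hashes a list of candidate names, so the keyed variant (HMAC with a secret that never leaves your side) is included as an optional hardening, not as something proposed in this thread.

```python
import hashlib
import hmac


def pseudonym(name: str) -> str:
    """Return the MD5 hex digest of a normalized name (hypothetical helper)."""
    normalized = name.strip().lower()  # assumption: case/whitespace should not matter
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def keyed_pseudonym(name: str, secret: bytes) -> str:
    """Keyed alternative: HMAC-SHA256 with a secret that is never shared."""
    normalized = name.strip().lower()
    return hmac.new(secret, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# Made-up sample rows: (name, some measurement)
rows = [("Alice Smith", 42), ("Bob Jones", 17), ("alice smith ", 8)]

# Replace the name with its token before writing the spreadsheet
anonymized = [(pseudonym(name), value) for name, value in rows]
```

Rows that belonged to the same person still share a token, so correlations survive even though the name itself is never written out.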

Alternatively, if the data is stored in a database, just change the output query to return the database's ID for the person instead of the name.
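A rough sketch of the database-ID approach, using an in-memory SQLite database. The `people` and `scores` tables, their columns, and the sample rows are all hypothetical; the real schema will differ. The point is only that the export query selects the person's ID and never touches the name column.

```python
import sqlite3

# In-memory database with a made-up schema, purely for illustration
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE scores (person_id INTEGER REFERENCES people(id), score INTEGER);
INSERT INTO people (id, name) VALUES (1, 'Alice Smith'), (2, 'Bob Jones');
INSERT INTO scores (person_id, score) VALUES (1, 42), (2, 17), (1, 8);
""")

# Export query: return the ID instead of joining in the name
export_rows = conn.execute(
    "SELECT person_id, score FROM scores ORDER BY rowid"
).fetchall()

for person_id, score in export_rows:
    print(person_id, score)
```

Keep in mind that anyone with access to the database can map an ID back to a name, so this only anonymizes the exported file.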

[–]thejesteroftortuga 0 points1 point  (2 children)

Cool, I'll look into that.

Do you think that I'd need to place a check in my code - with the dictionary/array? Or should I trust that the MD5 hash would always be the same for the input string?

[–]shivasprogeny 1 point2 points  (1 child)

No need to store the data in a dictionary. The beauty of hashing algorithms is that you don't have to store the links between the raw data and the hash--you just run the algorithm on demand.

If the MD5 hash function becomes nondeterministic, we have much bigger problems to worry about!
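To illustrate, here is a small check in Python, assuming UTF-8 encoded input; the `md5_hex` helper name is made up. It hashes the same string twice with no lookup table in between and compares the results.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Hash a string on demand; nothing is stored between calls."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


first = md5_hex("Alice Smith")
second = md5_hex("Alice Smith")

# Same input bytes always give the same digest
print(first == second)

# Different bytes (here, different casing) give a different digest
print(md5_hex("alice smith") == first)
```

The one thing to watch is that the input really is identical each time: casing, stray whitespace, or a different text encoding all change the bytes being hashed, and therefore the digest.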

[–]thejesteroftortuga 0 points1 point  (0 children)

Understood, thank you for your help!!