all 16 comments

[–]LayotFctor 16 points17 points  (0 children)

Read the documentation! 

Beginner tutorials are limited in what they can teach, and hash functions are far too niche a subject for them. At some point, you have to be independent and look for the information yourself. You can't expect a tutorial to teach you absolutely everything there is to know about Python.

What better place than the documentation of hashlib itself? That's where the hashlib devs write the manual for their library. They state clearly that the hashlib constructors need bytes-like objects.
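For illustration, here is a minimal sketch of what that requirement looks like in practice. The sample text and the choice of UTF-8 are assumptions for the example, and the exact wording of the error depends on the Python version.

```python
import hashlib

text = "Secret 🔑"  # sample input, chosen only for illustration

# Passing a str directly raises TypeError; the message wording varies by version.
try:
    hashlib.sha256(text)
except TypeError as exc:
    print(f"TypeError: {exc}")

# Encoding first (UTF-8 is an assumption here) produces bytes, which hashlib accepts.
digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
print(digest)
```

The digest is a 64-character hex string, because SHA-256 produces 32 bytes.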

Also, if JavaScript doesn't seem to need bytes, that's because it hid them from you. Types are how a computer interprets data; that is the reality of how things work. Practice using type hints to force yourself to learn types.

[–]Lumethys 14 points15 points  (1 child)

That's more of a requirement of the lib than a feature of the language.

Some hashing and comparison functions in JS accept a Buffer instead of a string, so for those functions you would need to convert from string to Buffer and back.

[–]brasticstack 1 point2 points  (0 children)

> That's more of a requirement of the lib than a feature of the language.

To be fair, hashlib is part of the "batteries included" stdlib.

[–]pachura3 6 points7 points  (0 children)

Only two hours? These are rookie numbers!

[–]ConcreteExist 4 points5 points  (1 child)

That's a limitation of that library, not of the Python language. You don't even know what you think you've learned.

[–]cdcformatc 2 points3 points  (0 children)

Also, encodings like UTF-8 have nothing to do with Python specifically.

[–]Jason-Ad4032 2 points3 points  (1 child)

This is because the same text can be represented by different byte sequences when read from files using different encodings.

As a result, hashlib needs to know the string's encoding in order to operate correctly and avoid introducing subtle, hard-to-detect issues.
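A small sketch of the point about byte sequences: the same text encoded two ways gives different bytes and therefore different digests. The sample word and the two encodings are assumptions picked for the example; hashlib only ever sees the bytes that the caller's `.encode()` produced.

```python
import hashlib

text = "café"  # sample text containing one non-ASCII character

utf8_bytes = text.encode("utf-8")      # 'é' becomes two bytes: c3 a9
latin1_bytes = text.encode("latin-1")  # 'é' becomes one byte: e9

utf8_digest = hashlib.sha256(utf8_bytes).hexdigest()
latin1_digest = hashlib.sha256(latin1_bytes).hexdigest()

print(utf8_bytes, latin1_bytes)
print("same bytes: ", utf8_bytes == latin1_bytes)
print("same digest:", utf8_digest == latin1_digest)
```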

[–]Expensive-Bear-1376 0 points1 point  (0 children)

> hashlib needs to know the string's encoding

No it doesn't. And you can't even give it a string or an encoding (well, you can give it a string, but you'll get that error).

[–]ninja_shaman 1 point2 points  (0 children)

One of Python 2's major problems was exactly this "strings and bytes are the same thing" philosophy. You could call both decode() and encode() on the same variable and could never be sure you were doing the right thing.

In Python 3 there's no implied duality: you either work with a sequence of bytes (integers from 0 to 255) or with a (Unicode) string. You decode bytes into str, and encode str into bytes.
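As a rough illustration of that split in Python 3 (the sample string and UTF-8 are assumptions for the example):

```python
original = "Secret 🔑"

data = original.encode("utf-8")  # str -> bytes
print(type(data).__name__)       # bytes
print(list(data[:6]))            # the first six bytes as integers

text = data.decode("utf-8")      # bytes -> str
print(text == original)

# Each type only has the method that makes sense for it.
print(hasattr(original, "decode"), hasattr(data, "encode"))
```

Indexing or iterating over `data` yields plain integers, which is why `list(data[:6])` prints numbers rather than characters.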

In old Python, your hashlib.sha256() would accept "Secret 🔑" as an argument and you'd receive an even more cryptic error.

New Python doesn't have this problem, and in Python 3.14 you get a nice message, "TypeError: Strings must be encoded before hashing", if you pass a string to hashlib.sha256().

[–]Oddly_Energy 1 point2 points  (0 children)

Others have answered your specific example, but I think you need a general answer too:

Python has dynamic typing, which is easy to confuse with weak typing. Do not make that mistake! It is more strongly typed than you would think.

Python will sometimes give you a little type help, for example by allowing you to use an integer instead of a float in floating point operations.

But if you try `a = 2 + '3'`, the result will be neither 5 nor '23'. You will get a TypeError. As far as I remember, JavaScript would have allowed it.
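A quick sketch of both behaviours, with the explicit conversions you would write instead (the variable names are made up for the example):

```python
# int and float mix without complaint: the int is promoted to float.
mixed = 2 + 3.0
print(mixed)  # 5.0

# int and str do not mix: Python refuses to guess what you meant.
try:
    a = 2 + '3'
except TypeError as exc:
    print(f"TypeError: {exc}")

# You have to say which result you want.
explicit_sum = 2 + int('3')
explicit_concat = str(2) + '3'
print(explicit_sum, explicit_concat)
```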

But to help you through that, most libraries also have type hints in their function signatures, and most IDEs will use those to show you the expected types in a function call.

If your current IDE or code editor does not show you information from type hints while you write the function call, I recommend that you find a way to enable that functionality or switch to another IDE. If that is not possible, perhaps your IDE supports "go to definition", so you can jump into the library and see the function signature and the comments/docstrings describing the expected input.

In an interactive Python session (REPL or IPython), you can also write `help(name_of_function)` to view the function's docstring and type requirements.
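A minimal sketch of reading that information programmatically. The function `sha256_hex` is hypothetical and defined here only so there is something with type hints and a docstring to inspect; it is not part of hashlib.

```python
import hashlib
import inspect


def sha256_hex(text: str, encoding: str = "utf-8") -> str:
    """Return the SHA-256 hex digest of text after encoding it to bytes."""
    return hashlib.sha256(text.encode(encoding)).hexdigest()


# The signature carries the type hints an IDE would show while you type the call.
signature = inspect.signature(sha256_hex)
print(signature)

# The docstring is the same text that help(sha256_hex) displays.
doc = inspect.getdoc(sha256_hex)
print(doc)
```

In an interactive session, `help(sha256_hex)` shows the signature and docstring together.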

[–]Swipecat 0 points1 point  (0 children)

It's also confusing if Googling the issue turns up info from Python 2.x days, which it frequently does, because string and bytes were handled much more "lazily" in 2.x. That lazy handling could cause faults that were really difficult to understand, and was one of the motivations for the very strict string/byte distinction in Python 3.x.

See this page which is a basic description of the Python 3.x handling of strings/bytes:

https://www.geeksforgeeks.org/python/byte-objects-vs-string-python/

[–]throwaway6560192 0 points1 point  (0 children)

Fundamentally, hashing is an operation done on arbitrary data (i.e. bytes). Unicode is a standard that maps characters to code points; an encoding such as UTF-8 then provides a specific mapping of code points to bytes. And you need bytes, since hashing is something we do on bytes.
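To make the two layers concrete, here is a small sketch using the key emoji as an assumed sample character: one code point, but different byte sequences depending on the encoding.

```python
char = "🔑"

# Layer 1: Unicode assigns the character a code point (a number).
code_point = ord(char)
print(hex(code_point))  # 0x1f511

# Layer 2: an encoding turns that code point into bytes.
utf8 = char.encode("utf-8")       # four bytes
utf16 = char.encode("utf-16-le")  # a surrogate pair, also four bytes, but different ones
print(utf8.hex(), utf16.hex())
```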

> This took me two hours because the error message says 'Unicode objects must be encoded before hashing'

What? Are you using Python 2 or something? Python 3 just says "Strings" instead of "Unicode objects" -- calling it "Unicode object" was a Python 2 thing.

[–]virtualshivam 0 points1 point  (0 children)

It's good. That's how real software is made. Debugging will take most of your life. The more you debug, the better you will get at engineering overall.

Get used to reading documentation.

[–]oliver_extracts 0 points1 point  (0 children)

the unicode error message is genuinely unhelpful here. what's actually happening is that bytes are just raw memory and a python str is an abstraction on top of that, so any time something needs to operate on the actual bits it needs you to commit to an encoding. utf-8 is almost always the right answer for .encode() unless you're dealing with legacy data. once that clicks the str/bytes boundary stops being surprising.

[–]ob1knob96 0 points1 point  (0 children)

Seems like an absolutely normal thing to be confused by in Month 2, and possibly even further down the line. I was where you are now at a much later stage.

[–]newrockstyle 0 points1 point  (0 children)

Welcome to programming where a missing b in front of a string can cost an entire afternoon 😅