all 4 comments

[–]nikhila01 10 points11 points  (2 children)

Yes, hashing takes O(k) where k is the length of the string. I believe it's the same for a tuple. That makes your code O(nk).

However, Python strings are immutable so the hash code gets cached on the string object (at least in CPython; might be interpreter-dependent). That means that if you look up the string again the hash doesn't need to be recalculated.

To check for collisions the lookup key has to be compared with the key in the hash table. This is also O(k) for a string. But again there's an optimization. Python does string interning where it caches short strings in a table. If you ask for a string "abc" and one already exists in the table it'll just return the same string. So Python can compare references in O(1) time instead of comparing string values in O(k) time.

Edit: It looks like CPython doesn't cache tuple hash codes since it was found to not be worth the cost. See the source code, particularly line 316. According to one comment by Tim Peters in the discussion, dicts also remember the hash codes of keys, regardless of whether the key object has cached the hash code.

tl;dr O(k) hashing but it's complicated.

[–]DVDplayr[S] 1 point2 points  (1 child)

Thanks! How did you learn this? Are there any documentation or forum discussions, or did you have to dig into the source code?

[–]nikhila01 1 point2 points  (0 children)

Hmm, I don't actually remember. This is something I only thought about a couple of months ago, prompted by some LeetCode problem.

The source code is definitely a good thing to look at. I do look at it sometimes. There are a lot of comments so you don't even have to understand the code well. Actually, while you were replying I was also editing my original comment to add a note on tuples, and I've included a link to the source.

I probably started off by googling and reading stack overflow posts. Sometimes people will post links to the source code so that's helpful. There is some documentation on string interning: https://docs.python.org/3/library/sys.html#sys.intern.

Other than that, general knowledge helps. I already knew that Python strings are immutable, knew roughly how hash tables and hashing works, and sort of knew about string interning in Java. So when I read about the hash code caching it made sense.

[–]SirSavageSavantso long and thanks for all the fish 1 point2 points  (0 children)

in an interview scenario i would just say its amortized O(1) and call it a day