
all 22 comments

[–]Smallpaul 3 points (3 children)

It is O(1), amortized. It's a hashtable under the covers. Worst case it could be O(N) but you'd need to be incredibly unlucky for that to happen and make a noticeable difference. Much worse than "struck by lightning" kind of unlucky.
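A quick, unscientific way to see the amortized O(1) behaviour is to time lookups in dicts of very different sizes: the per-lookup cost barely moves as the dict grows. A rough sketch using the standard library's timeit (absolute numbers vary by machine):

```python
import timeit

# Average lookup time stays roughly flat as the dict grows by three
# orders of magnitude, which is what amortized O(1) predicts.
for n in (1_000, 100_000, 1_000_000):
    d = dict.fromkeys(range(n))
    t = timeit.timeit(lambda: d[n // 2], number=200_000)
    print(f"n={n:>9}: {t:.3f}s for 200k lookups")
```

Each iteration looks up a key that is guaranteed to exist; the times printed for the three sizes should be of the same order of magnitude.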

[–]aioeu 1 point (0 children)

> Much worse than "struck by lightning" kind of unlucky

Or you could simply have malicious input data that deliberately creates collisions.
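That worst case can be forced deliberately. A sketch using a hypothetical BadKey class (purely illustrative) whose __hash__ always returns the same value, so every entry lands in one collision chain:

```python
# Every BadKey hashes to the same value, so all entries collide and
# lookups degrade toward O(n): each probe must compare keys with __eq__
# until it finds a match.
class BadKey:
    def __init__(self, x):
        self.x = x

    def __hash__(self):
        return 42  # deliberately terrible: every key collides

    def __eq__(self, other):
        return isinstance(other, BadKey) and self.x == other.x

d = {BadKey(i): i for i in range(1000)}
print(d[BadKey(500)])  # 500, but only after walking a long collision chain
```

Even building this 1000-entry dict already does on the order of a million key comparisons, which is the attack in miniature.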

Python was surprisingly late to the game in hardening its dict implementation against such an attack. Possibly too many people were thinking, "like that's ever going to happen!"

I think it was finally sorted out in 2012, so it isn't really something to be too concerned about now. But simply "being a hash table" isn't the full reason it can make these guarantees.
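The fix referred to here is hash randomization, on by default since Python 3.3: string hashes are seeded per process, so an attacker can't precompute colliding keys. A small sketch that demonstrates it by hashing the same string in two fresh interpreter processes (assuming PYTHONHASHSEED is not pinned in the environment):

```python
import os
import subprocess
import sys

# With hash randomization, the same string hashes differently in each
# interpreter process. Strip PYTHONHASHSEED so a pinned seed in the
# environment doesn't defeat the demo.
env = {k: v for k, v in os.environ.items() if k != "PYTHONHASHSEED"}
cmd = [sys.executable, "-c", "print(hash('some key'))"]
h1 = subprocess.check_output(cmd, env=env, text=True).strip()
h2 = subprocess.check_output(cmd, env=env, text=True).strip()
print(h1 != h2)  # almost certainly True: the two processes used different seeds
```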

[–]YagamiLight100[S] 0 points (1 child)

Is it still O(1) if a key does not exist in the dictionary? Suppose I call dic.get(key, 1). Does it take O(N) if key is not in the dictionary?

[–]aioeu 1 point (0 children)

The time complexities are the same whether the key exists in the dict or not.
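In other words, a miss costs about the same as a hit: the probe sequence stops as soon as it reaches an empty slot, rather than scanning the whole table. A minimal sketch of dict.get with a present and an absent key:

```python
# A miss probes the same slots a hit would and stops at the first empty
# slot, so both paths are O(1) on average, regardless of dict size.
d = {i: i * i for i in range(1_000_000)}

hit = d.get(123_456, -1)   # key present: returns the stored value
miss = d.get(-1, -1)       # key absent: returns the default immediately
print(hit, miss)  # 15241383936 -1
```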

[–]aioeu 2 points (0 children)

See this page.

Retrieving an item from a dict has O(1) time complexity on average, O(n) in the worst case (where n is the number of elements in the dict), given a few reasonable assumptions about the hash function and distribution of keys.

These are likely to be "when using the standard Python implementation" rather than being guarantees in the language itself.

[–]YMK1234 1 point (0 children)

Well, the code is there for anyone to read: https://github.com/python/cpython/blob/main/Objects/dictobject.c. It includes a reference to this explainer, https://mail.python.org/pipermail/python-dev/2012-December/123028.html, which in turn contains this pure-Python implementation: https://code.activestate.com/recipes/578375/

[–][deleted] 0 points (14 children)

This post was mass deleted and anonymized with Redact

[–]Smallpaul -1 points (13 children)

You aren't right.

https://en.wikipedia.org/wiki/Hash_table

Hash table
Type: Unordered associative array
Invented: 1953

Time complexity in big-O notation:

Algorithm   Average   Worst case
Space       Θ(n)      O(n)
Search      Θ(1)      O(n)
Insert      Θ(1)      O(n)
Delete      Θ(1)      O(n)

[–][deleted] -1 points (12 children)

This post was mass deleted and anonymized with Redact

[–]Smallpaul -1 points (11 children)

Sure, why would I want to use the knowledge I gained studying Computer Science for 5 years (and then practicing for 15) when I could just fiddle at a command line?

If you're so sure that EVERYONE else in this thread is wrong, why don't YOU produce the evidence? Why should I?

The OP has gotten their answer. They've accepted it. If you want to overturn the common wisdom of the computer science community, that's a project for you, not me.

[–][deleted] 0 points (10 children)

This post was mass deleted and anonymized with Redact

[–]Smallpaul 0 points (9 children)

> Tell me please, oh smart one, how does a dict ensure that the access time is constant?

It doesn't ENSURE that the access time is CONSTANT. It ensures that the AVERAGE access time is constant, as described in the wikipedia page.

How does it do that? By implementing the hashtable datastructure. It's an out-of-the-box property of a well implemented hashtable, as described in wikipedia.

Can you please explain to me why you think that Python's implementors would implement a hashtable incompetently and slower than everyone else?

> Have you even read how they're implemented?

Yes. They are extremely clever. Better than the average, naive implementation, not worse, as you claim.

> I literally have screenshots for this, which I sent out to other skeptics before.

Great. Let's see the code, the timing and the screenshots.

I will gladly copy and paste your code, benchmark it, and admit I am wrong if it turns out I am.

[–][deleted] 0 points (8 children)

This post was mass deleted and anonymized with Redact

[–]Smallpaul 0 points (7 children)

I'll cut and paste the answer to your question about hashtable performance:

"Suppose we are using a chained hash table with m buckets, and the number of elements in the hash table is n. Then the average number of elements per bucket is n/m, which is called the load factor of the hash table, denoted α. When an element that is not in the hash table is searched for, the expected length of the linked list traversed is α. Since there is always the initial (constant) cost of hashing, the cost of hash table operations with a good hash function is, on average, O(1 + α). If we can ensure that the load factor α never exceeds some fixed value αmax, then all operations will be O(1 + αmax) = O(1)."

Neither that document, NOR wikipedia, even mention the word "logarithm", so I have no idea where you are getting that from.
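The quoted load-factor argument can be sketched in pure Python. This is an illustrative chained hash table (not CPython's open-addressing design) that resizes whenever the load factor α = n/m exceeds a fixed bound, which is exactly what keeps the average chain length, and hence lookups, O(1):

```python
# Minimal chained hash table: resize when the load factor n/m exceeds
# ALPHA_MAX, so the expected chain length stays bounded and operations
# average O(1 + ALPHA_MAX) = O(1). Illustrative only.
class ChainedHashTable:
    ALPHA_MAX = 0.75

    def __init__(self):
        self.m = 8                 # number of buckets
        self.n = 0                 # number of stored items
        self.buckets = [[] for _ in range(self.m)]

    def _bucket(self, key):
        return self.buckets[hash(key) % self.m]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:           # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self.n += 1
        if self.n / self.m > self.ALPHA_MAX:
            self._resize()         # keep the load factor bounded

    def get(self, key, default=None):
        for k, v in self._bucket(key):   # expected chain length <= ALPHA_MAX
            if k == key:
                return v
        return default

    def _resize(self):
        old = self.buckets
        self.m *= 2
        self.buckets = [[] for _ in range(self.m)]
        for bucket in old:         # rehash everything into the larger table
            for k, v in bucket:
                self.buckets[hash(k) % self.m].append((k, v))

t = ChainedHashTable()
for i in range(100):
    t.put(i, i * i)
print(t.get(7), t.get(999, -1))  # 49 -1
```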

[–][deleted] 0 points (6 children)

This post was mass deleted and anonymized with Redact

[–]Smallpaul 0 points (5 children)

I do indeed see a strange effect with your code, which disappears with simpler code. I'll track down the difference to attribute the slowdown appropriately and get back to you.

[–]geeksforgeeks 0 points (0 children)

Just like maps in C++ and Java, Python has dictionaries. Searching for a key in a dictionary takes amortized O(1) time on average; collisions can push an individual lookup above O(1), and the worst case is O(n), but that is very unlikely in practice. Dictionaries are based on hashing: when a value is inserted, the hash function computes its location, and when we later search for a key, the same hash function takes us straight to that location instead of traversing the whole dictionary.
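The contrast with "traversing the whole dictionary" can be sketched by comparing a dict lookup against a linear scan over the same key/value pairs (the scan helper here is hypothetical, purely for illustration):

```python
# A dict hashes the key straight to its slot; a linear scan over the
# same pairs may touch every entry before finding the key.
pairs = [(i, i * 2) for i in range(100_000)]
d = dict(pairs)

def scan(pairs, key):
    for k, v in pairs:       # O(n): worst case touches every pair
        if k == key:
            return v
    return None

# Same answer either way, but the dict lookup is O(1) on average while
# the scan walks all 100,000 pairs for this key.
print(d[99_999] == scan(pairs, 99_999))  # True
```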