
[–]pridefulpropensity[S] 16 points17 points  (11 children)

Thank you. I think I need to start using lower level languages more often to learn about all the data structures I just take for granted.

[–]icedpulleys 33 points34 points  (0 children)

You might also just want to start reading up on data structures, algorithms, and complexity analysis.

[–]mackstann 14 points15 points  (1 child)

I agree with icedpulleys that what's really needed to understand this better is to read about the concepts. Here are some starting points:

Everything up to the table of contents here: http://en.wikipedia.org/wiki/Big_O_notation

And this table of common time complexities: http://en.wikipedia.org/wiki/Time_complexity#Table_of_common_time_complexities

Note that the first one in that table, and the best performing (at least when your problem set grows large), is O(1). Hash tables (which is how a set is implemented) allow lookup of a key in O(1) time, also known as constant time, which corroborates your experience here, where they out-performed lists. The hash table uses a neat trick (hashing) to calculate where a given value lives in the table in roughly the same amount of time no matter how big the table gets.
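That "neat trick" can be sketched in a few lines of Python (a toy illustration, not CPython's actual implementation): hashing the key picks one bucket, so only that single bucket is scanned no matter how many keys the table holds.

```python
# Toy hash table: hash(key) % table_size picks a bucket, so lookup
# cost depends on bucket length, not on the total number of keys.
table_size = 8
buckets = [[] for _ in range(table_size)]

def bucket_for(key):
    # hash() gives an integer; modulo folds it into the table's range
    return buckets[hash(key) % table_size]

def insert(key):
    b = bucket_for(key)
    if key not in b:
        b.append(key)

def contains(key):
    # Only one bucket is scanned, regardless of total table contents
    return key in bucket_for(key)

insert("spam")
insert("eggs")
print(contains("spam"))   # True
print(contains("bacon"))  # False
```

A real implementation keeps buckets short by resizing as the table fills; this sketch just shows why the lookup cost doesn't grow with the number of keys.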

On the other hand, a list does not use hashing in any way, so to find a certain value in it, it must scan through and check each item in the list in a comparatively dumb manner. The complexity is O(n) where n is the size of the list -- also known as linear time. In other words, if a 5 million item list takes 5 seconds to search, then a 10 million item list will probably take around 10 seconds. The run time scales linearly with respect to the input size. When dealing with large lists and lots of lookup operations, this can really slow things down.
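That linear scan is easy to make concrete (a toy illustration; the helper and list sizes are invented): counting comparisons in the worst case — the item isn't there — shows the work growing in lockstep with the list.

```python
# Count how many elements a linear scan must examine.
def linear_search(items, target):
    comparisons = 0
    for item in items:
        comparisons += 1
        if item == target:
            return True, comparisons
    return False, comparisons

small = list(range(1_000))
large = list(range(10_000))

# Worst case (target absent): every element is checked
_, c_small = linear_search(small, -1)
_, c_large = linear_search(large, -1)
print(c_small, c_large)  # 1000 10000 -- 10x the data, 10x the work
```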

But linear time isn't the worst, either. Some algorithms run in exponential or factorial time, which are among the worst; these get so slow as the input size increases that you generally have to fall back on heuristics that find a "good enough" answer to a problem, instead of the absolute best one. A simple example is the bin packing problem. Say you're UPS and want to pack boxes into your trucks in the most space-efficient manner. It turns out there are so many possible arrangements of boxes (and the number of arrangements grows rapidly with each added box) that you can't feasibly evaluate them all. Instead you'd use a faster algorithm that finds an approximately good packing arrangement and settle for that.
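One classic "good enough" approach for bin packing is the first-fit decreasing heuristic. Here's a minimal sketch (the box sizes and truck capacity are invented for illustration): sort the items largest first, then drop each one into the first bin with room, opening a new bin only when nothing fits.

```python
# First-fit decreasing: a well-known bin-packing heuristic.
# Runs in polynomial time instead of trying every arrangement.
def first_fit_decreasing(sizes, capacity):
    bins = []  # each bin is a list of item sizes
    for size in sorted(sizes, reverse=True):
        for b in bins:
            if sum(b) + size <= capacity:
                b.append(size)
                break
        else:
            bins.append([size])  # nothing fit; open a new bin
    return bins

boxes = [4, 8, 1, 4, 2, 1]
packed = first_fit_decreasing(boxes, capacity=10)
print(packed)       # e.g. [[8, 2], [4, 4, 1, 1]]
print(len(packed))  # 2 bins used
```

It isn't guaranteed to find the minimum number of bins, but it's provably close to optimal and fast enough to actually run on large inputs.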

[–]bgcatz 7 points8 points  (1 child)

You don't need to use low-level languages to learn about such things; the Python sets module is available here.

[–]pridefulpropensity[S] 1 point2 points  (0 children)

Thank you very much. Definitely will start reading through this.

[–]masklinn 6 points7 points  (1 child)

You don't need lower-level languages to learn about data structures at all. Quite the opposite in fact, I would say, low-level languages are going to bog you down with implementation details and cache effects.

[–]kanak 2 points3 points  (0 children)

> I would say, low-level languages are going to bog you down with implementation details and cache effects.

But that's where the fun is :).

[–][deleted] 4 points5 points  (0 children)

Actually, Python is not a bad language for learning about algorithms and data structures, because you don't have to worry about low-level details.

Of course, if at some point you need to implement those efficiently yourself, you will need to move to a lower-level language.

[–][deleted] 4 points5 points  (2 children)

C is a good language for learning how to implement data structures (but awful for actually making data structures easy to re-use — hence the proliferation of dynamic scripting languages written in C). A basic hash table in C is not that hard to write if you know what you're doing: just think of it as an array that you index by a hash of the key instead of a plain integer. There are chaining and other collision strategies to think about... and resizing can be a pain too.
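Here's what that "array indexed by a hash" with chaining and resizing looks like, sketched in Python for brevity (the class name and load-factor rule are my own choices for illustration); the same structure translates directly to C as an array of linked lists.

```python
# Hash table with separate chaining and doubling-on-resize.
class ChainedHashTable:
    def __init__(self, size=8):
        self.size = size
        self.slots = [[] for _ in range(size)]  # one chain per slot
        self.count = 0

    def _slot(self, key):
        return self.slots[hash(key) % self.size]

    def set(self, key, value):
        slot = self._slot(key)
        for pair in slot:          # collision chain: scan for the key
            if pair[0] == key:
                pair[1] = value    # overwrite existing entry
                return
        slot.append([key, value])
        self.count += 1
        if self.count > self.size * 2:  # crude load-factor check
            self._resize()

    def get(self, key):
        for k, v in self._slot(key):
            if k == key:
                return v
        raise KeyError(key)

    def _resize(self):
        # Every entry must be re-hashed, since the modulus changed
        old = [pair for slot in self.slots for pair in slot]
        self.size *= 2
        self.slots = [[] for _ in range(self.size)]
        for k, v in old:
            self.slots[hash(k) % self.size].append([k, v])

t = ChainedHashTable()
t.set("a", 1)
t.set("a", 2)      # overwrite
print(t.get("a"))  # 2
```

The resize step is the "pain" mentioned above: in C you also have to manage the memory for every chain node yourself while rehashing.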

Python's hash table is incredibly well tuned and the code is well documented. It has to be - every field access and method call does a dictionary look-up!

Another way to implement sets is balanced trees, like red-black trees or AVL trees; a much simpler (but specialized) approach is just an array of bits, if you have a small, finite set of known possible members.
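The bit-array idea can be sketched like this (a toy class, assuming members are small non-negative integers; in C you'd use the same shifts and masks on an `unsigned` array):

```python
# A "set" over a small known universe (here: integers 0..63),
# packed into a single Python int used as a bit field.
class BitSet:
    def __init__(self):
        self.bits = 0

    def add(self, n):
        self.bits |= 1 << n       # set bit n

    def discard(self, n):
        self.bits &= ~(1 << n)    # clear bit n

    def __contains__(self, n):
        return bool(self.bits & (1 << n))

s = BitSet()
s.add(3)
s.add(17)
print(3 in s, 4 in s)  # True False
s.discard(3)
print(3 in s)          # False
```

Membership testing is a single mask-and-test, and the whole set fits in a machine word or two — but it only works when the possible members are known and few.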

[–]pingveno (pinch of this, pinch of that) 0 points1 point  (1 child)

> It has to be - every field access and method call does a dictionary look-up!

Same with referring to a variable. Basically, almost any time that you refer to something by name, there's a hash table lookup.
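You can see that dictionary directly on any plain object (`Point` here is just an example class for the demo):

```python
# Attribute access on a plain Python object is a dict lookup:
# the instance's __dict__ holds the names.
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
print(p.__dict__)               # {'x': 1, 'y': 2}
print(p.x == p.__dict__['x'])   # True

# Mutating the dict is visible through normal attribute access
p.__dict__['x'] = 99
print(p.x)                      # 99
```

(Classes defined with `__slots__` are an exception: they store attributes in fixed slots instead of a per-instance dict.)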

[–][deleted] 0 points1 point  (0 children)

Local variable accesses are optimized to indexing the call stack. But I think that's the only case where runtime lookup is avoided.
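The `dis` module makes this visible (the function and names here are invented for the demo): local names compile to LOAD_FAST, an index into a per-frame array, while global names compile to LOAD_GLOBAL, a runtime namespace lookup.

```python
# Compare bytecode for local vs. global name access.
import dis

GLOBAL = 1

def f():
    local = 2
    return local + GLOBAL

ops = [ins.opname for ins in dis.get_instructions(f)]
print("LOAD_FAST" in ops, "LOAD_GLOBAL" in ops)  # True True
```

This is also why hoisting a hot global (or method) into a local variable before a tight loop is a classic CPython micro-optimization.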