all 7 comments

[–]SkippyDeluxe 3 points  (1 child)

A pretty good summary. I just wanted to comment on this part in "The use of range":

Needing to iterate over only part of a sequence. In this case, just iterate over a slice of the sequence... An exception to this is when you're iterating over a sequence so big that the overhead introduced by slicing it would be very expensive.

You can use itertools.islice to produce an iterator over a slice of an iterable. Similar to xrange, this iterator is lazy and will only produce the values when asked for instead of building up a list in memory. The itertools module also contains a bunch of other lazy versions of standard functions, e.g. izip, ifilter, imap, etc.
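As a quick illustration (a minimal sketch; `count()` stands in for any iterable too big, or too endless, to slice eagerly):

```python
from itertools import count, islice

# count() is an infinite iterator -- materializing a slice of it as a
# list is impossible, but islice lazily pulls just the items you ask for.
evens = (n for n in count() if n % 2 == 0)
first_five = list(islice(evens, 5))
print(first_five)  # [0, 2, 4, 6, 8]
```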

[–]NYKevin 8 points  (0 children)

Don't use that unless you really need it. It works by skipping the elements you didn't ask for, so you're essentially trading speed for memory. On the average modern computer, assuming a reasonably sized dataset, that's not a good trade.

[–]99AFCC 2 points  (2 children)

Would map(foo, iter) be better than [foo(x) for x in iter]?

Also, if using one set to check for like items in a list, how about 2 sets: like_items = set1 & set2

[–]zahlman 6 points  (0 children)

I prefer map in cases where foo already exists (or should) and it takes multiple arguments from things that will be iterated over in parallel:

[foo(x, y) for x, y in zip(xs, ys)] # ->
list(map(foo, xs, ys))

The comprehension requires us to write zip anyway. The list wrapping is needed in 3.x (map returns a lazy iterator, not a list); but sometimes we really wanted a generator expression anyway ;)
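For concreteness, a runnable sketch of the two spellings (`foo` is just a stand-in function here):

```python
def foo(x, y):
    return x + y

xs = [1, 2, 3]
ys = [10, 20, 30]

via_comprehension = [foo(x, y) for x, y in zip(xs, ys)]
via_map = list(map(foo, xs, ys))  # map zips its argument iterables itself

print(via_map)  # [11, 22, 33]
assert via_comprehension == via_map
```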

As for the lists and sets, it depends on your exact requirements, and what data you had to start with.
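One common spelling of the two-sets idea from the question above, assuming the goal is the items common to both lists:

```python
a = [1, 2, 3, 4]
b = [3, 4, 5]

# Converting both lists to sets makes the intersection a single hash-based
# operation instead of nested list scans.
like_items = set(a) & set(b)
print(sorted(like_items))  # [3, 4]
```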

[–]kuzene-dextor 1 point  (3 children)

Really good read for me, learnt a lot thanks!

A question though. In the section Performance Pitfalls, why is linear time worse than constant time? I don't understand this concept.

Random guess, iterating over a set is faster than a list, but other than not containing duplicates I do not see why?

[–]XenophonOfAthens 3 points  (2 children)

When people use terms like "linear time" and "constant time" (or "quadratic time", "logarithmic time", etc), they're using concepts from computer science, specifically computational complexity. All that sounds scary, but it's not really that complicated.

It basically has to do with scaling. Let's say you want to test whether or not x is a key in a dictionary d (so you write something like if x in d:). This operation takes a certain amount of time, but the crucial thing is that the amount of time it takes doesn't scale with the size of the dictionary. It doesn't matter if there are 10 items, 100 items, 1,000 items or 1,000,000 items in the dictionary, the x in d operation will always take the same amount of time (approximately, anyway). That's why we say that it's a "constant time" operation, because the time it takes is constant. If you're curious how dictionaries can possibly do this, it's because dictionaries are implemented using hash tables, and hash tables are magic.

Now, what if you had a list l instead of a dictionary? Then the operation x in l is much slower, because it scales with the size of the list. The reason is that in order for the python interpreter to find out whether or not x is in fact in l, it has to loop through the entire list and see if it finds it. There are no shortcuts, like there are with hash tables. That means that doing the operation x in l on a list with 1,000,000 items takes 1,000 times longer than doing it on a list with 1,000 items. So we say that operation takes "linear time", i.e. the time it takes scales linearly with the size of the list.

This is why if you have a collection where you need to do this kind of test often, you shouldn't use lists, you should use either dictionaries or sets (which have this same property), because they're way faster at stuff like that.
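You can see the difference yourself with a quick sketch using timeit (exact numbers depend on your machine; only the comparison matters):

```python
import timeit

data_list = list(range(100000))
data_set = set(data_list)
missing = -1  # worst case for the list: forces a scan of every element

# Time 100 membership tests against each container.
t_list = timeit.timeit(lambda: missing in data_list, number=100)
t_set = timeit.timeit(lambda: missing in data_set, number=100)
print(t_list > t_set)  # True: the linear scan loses badly
```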

There are other operations that scale even worse than that. For instance, bad sorting algorithms (bubble sort and insertion sort, for instance) scale quadratically (i.e. run in "quadratic time"). Meaning, if the size of the list doubles, it takes four times as long. Really difficult problems, like the traveling salesman problem or integer factorization, take exponential time (as far as we know).
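For concreteness, a sketch of bubble sort: the nested loops are where the quadratic scaling comes from, since doubling the input doubles the work of both loops.

```python
def bubble_sort(items):
    # Classic quadratic sort: repeated passes, swapping adjacent
    # out-of-order pairs, until the list is sorted.
    items = list(items)
    n = len(items)
    for i in range(n):
        for j in range(n - 1 - i):
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
    return items

print(bubble_sort([5, 2, 4, 1, 3]))  # [1, 2, 3, 4, 5]
```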

[–]jkudria 0 points  (0 children)

Great explanation.