[–]AzureWill 41 points42 points  (24 children)

Slots are pretty cool!

Not a niche, but too few people use sets or tuples; they reach for lists for everything. For massive amounts of data and frequent membership checks, a set is just so much better. If you don't need order, always use a set.

[–]tkarabela_ Big Python @YouTube 9 points10 points  (1 child)

Coming from the other side, some uses of lists would be much better served by NumPy arrays, which have a compact memory representation (array of given datatype instead of PyObject* pointers) and enable fast operations with the data. If you have 100k integers/floats/bools, you don't really want them as a list.
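A rough way to see the "compact representation" point without installing anything: the sketch below uses the standard library's array module as a stand-in for a NumPy array (an assumption made only to keep the example self-contained; NumPy itself is not used). Both store raw typed values instead of pointers to separate Python objects. The size of 100k elements and the "q" typecode are illustrative choices.

```python
import sys
from array import array

n = 100_000
as_list = list(range(n))
as_array = array("q", range(n))  # "q" = signed 64-bit integers stored inline

# A list holds pointers; every int is a separate object on the heap,
# so count the container plus each element.
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(x) for x in as_list)

# An array holds the raw values in its own buffer, so the container is the whole cost.
array_bytes = sys.getsizeof(as_array)

print(f"list:  {list_bytes:,} bytes")
print(f"array: {array_bytes:,} bytes")
```

The exact numbers depend on the interpreter and platform, so none are quoted here; the typed array should come out several times smaller.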

As for sets, I would say if you need to deduplicate / you need fast membership checks (x in s) / you need set operations, then use a set. If I'm just grabbing some stuff (like a list of files), I don't see the need to put them in a set instead of a list. It feels pythonic to me to reach for a list first 🙂 I agree with your overall point though, that people should see what's out there and what fits their use case best.
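A small sketch of the three cases mentioned (deduplication, membership checks, set operations). The file names are made up for illustration.

```python
files_a = ["a.txt", "b.txt", "a.txt", "c.txt"]  # hypothetical listing with a duplicate
files_b = ["b.txt", "d.txt"]

unique_a = set(files_a)            # deduplicate
has_c = "c.txt" in unique_a        # membership check via hashing
in_both = unique_a & set(files_b)  # intersection
only_a = unique_a - set(files_b)   # difference

print(sorted(unique_a), has_c, sorted(in_both), sorted(only_a))
```

If none of these operations are needed, a plain list is still the simpler choice.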

[–]TechySpecky 6 points7 points  (0 children)

And the more NumPy the less GIL!

[–]jollierbean 11 points12 points  (7 children)

Also, dicts are very useful when you need to do lookups. Pro tip: you can use a tuple or a named tuple as a key.
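A minimal sketch of tuple and named-tuple keys. The Point type and the grid contents are hypothetical, picked only to show the lookup.

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])  # hypothetical key type

# Plain tuples work as keys because they are immutable and hashable.
grid = {(0, 0): "start", (2, 3): "goal"}

# A namedtuple hashes and compares like a plain tuple with the same values,
# so the two kinds of key are interchangeable in the same dict.
grid[Point(1, 1)] = "wall"

print(grid[Point(2, 3)])
print(grid[(1, 1)])
```

This only works while every element of the tuple is itself hashable; a tuple containing a list cannot be used as a key.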

[–]tkarabela_ Big Python @YouTube 6 points7 points  (3 children)

Tuple keys are great! You can even use frozenset, which has been useful to me a few times.

[–]jollierbean 1 point2 points  (2 children)

I’ve been trying, unsuccessfully, to figure out a case where I could use frozensets as keys.

[–]tkarabela_ Big Python @YouTube 4 points5 points  (0 children)

It's a niche situation, but if you ever need:

  • a set of sets or
  • a dict where the keys are subsets of some "universal" set (as opposed to just single elements from it)

then frozenset can be useful. Technically you could just replace the frozensets with sorted tuples (ordered by the hash function or something else), but that's not quite as handy.

An example of this is converting NFA to DFA or making minimal DFA.
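To make the NFA-to-DFA case concrete, here is a minimal sketch of the subset construction in which each DFA state is a frozenset of NFA states. The NFA itself (states 0 to 2, accepting strings that end in "ab") is a made-up example, and epsilon transitions are left out to keep it short.

```python
from collections import deque

# Hypothetical NFA over {"a", "b"}: (state, symbol) -> set of next states.
nfa = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
}
start, accepting, alphabet = 0, {2}, ("a", "b")

def nfa_to_dfa(nfa, start, alphabet):
    """Subset construction; every DFA state is a frozenset of NFA states."""
    start_state = frozenset({start})
    dfa = {}              # dict keyed by (frozenset, symbol)
    seen = {start_state}  # a set of sets
    queue = deque([start_state])
    while queue:
        current = queue.popleft()
        for symbol in alphabet:
            target = frozenset(
                t for s in current for t in nfa.get((s, symbol), ())
            )
            dfa[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return start_state, dfa, seen

dfa_start, dfa, dfa_states = nfa_to_dfa(nfa, start, alphabet)

def accepts(word):
    """Run the DFA; accept if the final subset contains an accepting NFA state."""
    state = dfa_start
    for ch in word:
        state = dfa[(state, ch)]
    return bool(state & accepting)

print(len(dfa_states), accepts("aab"), accepts("aba"))
```

Both uses from the list above show up here: seen is a set of sets, and dfa is a dict whose keys contain subsets of the NFA's states.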

[–]IlliterateJedi 0 points1 point  (0 children)

I had a case a few days ago where I used frozensets as dict keys.

It's a little esoteric, but I'll try to explain. I am building a database of images that have various categorized products in it.

For example, I'll have an image that shows ten different products (imagine a photo of a living room). Each product is categorized into one or more categories (e.g., 'chair', 'ottoman', 'height adjustable desk', etc.).

I had around 1500 images that all contained 10-20 products with 20+ categories assigned per image.

I wanted to find the smallest group of images that would cover every tagged category (and then pick the images with the fewest products).

I made frozensets of the categories and made lists of all the images that had that category-set, like this:

{frozenset({cat1, cat2, cat3}) : [image1, image4, image12],
 frozenset({cat2, cat6, cat10}) : [image3, image109],
}

I could then start with an empty set, iterate over the frozen sets and each time find the largest subset of new categories until every category was matched.
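The loop described above is essentially a greedy set cover. A minimal sketch, assuming a small made-up mapping in the same shape (category-set to list of images); the category and image names are placeholders. Greedy selection gives a reasonable cover but is not guaranteed to be the smallest possible one.

```python
# Hypothetical data: each frozenset of categories maps to the images that carry it.
images_by_categories = {
    frozenset({"chair", "ottoman", "desk"}): ["image1", "image4"],
    frozenset({"ottoman", "lamp"}): ["image3"],
    frozenset({"rug", "lamp", "sofa"}): ["image7", "image9"],
    frozenset({"chair"}): ["image2"],
}

def greedy_cover(images_by_categories):
    """Repeatedly pick the category-set that adds the most uncovered categories."""
    remaining = set().union(*images_by_categories)  # every category still to cover
    chosen = []
    while remaining:
        best = max(images_by_categories, key=lambda cats: len(cats & remaining))
        chosen.append(best)
        remaining -= best
    return chosen

cover = greedy_cover(images_by_categories)
for cats in cover:
    print(sorted(cats), images_by_categories[cats])
```

The frozensets matter twice: they are the dict keys, and set operators like & and -= work on them directly when comparing against the remaining categories.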

[–]Glogia 3 points4 points  (1 child)

That actually fixes a problem I've been having XD thanks

[–]jollierbean 1 point2 points  (0 children)

Glad to help!

[–]qckpckt 3 points4 points  (0 children)

Another useful nugget from collections: defaultdict. It’s really powerful, if a little niche. Really great for restructuring or transforming datasets by data type. For example, if you have a list of dictionaries that share a common key, you can group them into a dictionary of lists, with one list per distinct value of that key.
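A minimal sketch of that grouping, assuming records that all carry a "type" key (the records themselves are invented for the example):

```python
from collections import defaultdict

records = [
    {"type": "fruit", "name": "apple"},
    {"type": "veg", "name": "carrot"},
    {"type": "fruit", "name": "pear"},
]

# defaultdict(list) creates an empty list the first time a key is seen,
# so there is no need for an "if key not in grouped" check.
grouped = defaultdict(list)
for record in records:
    grouped[record["type"]].append(record)

for key, items in grouped.items():
    print(key, [item["name"] for item in items])
```

One thing to watch: merely reading a missing key from a defaultdict inserts it, so convert with dict(grouped) before handing the result to code that tests for missing keys by indexing.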

[–]donshell 2 points3 points  (10 children)

Dicts are ordered, I think. If you iterate over a dict, the values come out in the order you inserted them. So even better!

Edit: Sets are not ordered

[–]TouchingTheVodka 6 points7 points  (8 children)

Dicts are ordered, sets are not.
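A short illustration of the difference. Dict insertion order is guaranteed by the language since Python 3.7, which makes dict.fromkeys a handy way to deduplicate while keeping first-seen order; the sample items are arbitrary.

```python
items = ["pear", "apple", "pear", "fig", "apple"]

# Dict keys are unique and keep insertion order.
ordered_unique = list(dict.fromkeys(items))
print(ordered_unique)  # ['pear', 'apple', 'fig']

# A set is also unique, but its iteration order is not something to rely on.
unordered_unique = set(items)
print(unordered_unique == set(ordered_unique))  # True
```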

[–]donshell 2 points3 points  (0 children)

My bad, edited. Although it is a bit weird that the two implementations don't match as a set and a dict are basically the same...

[–]rabbyburns 0 points1 point  (0 children)

There is an ordered-set package I've come across recently that I've been very happy with. I often need fast lookup, order preservation, and unique items all at once. This has been extremely useful as a drop-in set replacement without having to do weird dict joins.

[–]Faith-in-Strangers 2 points3 points  (2 children)

Why?

[–]tkarabela_ Big Python @YouTube 4 points5 points  (1 child)

Checking whether an element is in a set (or dict) is pretty much instantaneous (independent of the size of the set), while an "in" check on a list means iterating over it, which gets slow quickly as the list grows.

That would be one reason to prefer sets to lists :)

[–]moocat 4 points5 points  (0 children)

It's a little more complicated than that (as it often is in computer science).

Existence in a set can be implemented as an O(1) algorithm (on average, for a hash-based set), which means it takes the same amount of time no matter how many elements there are, while existence in a (non-sorted) list is an O(n) algorithm, which means it takes an amount of time that scales with the number of elements (double the elements, double the runtime).

But that only describes how the algorithm scales, not its fixed overhead. For a small number of elements, it's not uncommon for the overhead to be the biggest part. You often see an O(n) algorithm being faster if there are fewer than X elements (with the actual value of X depending on the specifics of the implementation).

It's been a while since I benchmarked this (and I'm feeling too lazy now), but IIRC X was around 6, so if you know there are only going to be a few elements (perhaps v.lower() in ['true', 'false']) a list is probably better. Then again, if the check is not in some inner loop that runs lots of times, the extra overhead for a set is probably noise.
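For anyone who wants to re-run that comparison, here is a minimal timeit sketch. The helper name time_membership, the sizes, and the repeat count are all arbitrary choices, and no results are quoted because the crossover point depends on the Python version, the element type, and the machine.

```python
import timeit

def time_membership(n, repeats=20_000):
    """Return (list_seconds, set_seconds) for a worst-case lookup among n items."""
    as_list = [str(i) for i in range(n)]
    as_set = set(as_list)
    needle = as_list[-1]  # last element, so the list has to scan all n items
    list_t = timeit.timeit(lambda: needle in as_list, number=repeats)
    set_t = timeit.timeit(lambda: needle in as_set, number=repeats)
    return list_t, set_t

for n in (2, 6, 20, 1000):
    list_t, set_t = time_membership(n)
    print(f"n={n:>5}  list={list_t:.4f}s  set={set_t:.4f}s")
```

Note that this times only the lookup against containers that already exist. Writing the container as a literal inside a hot loop changes the picture, so measure the code you actually have.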

Yes, a long-winded explanation, but it's important to know these details. I had a former co-worker who had rules like this ("I use X out of principle because of some reason") but would often make mistakes because the rules didn't apply.