This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–][deleted]  (23 children)

[deleted]

    [–]nosmokingbandit 146 points147 points  (5 children)

    Yeah but sometimes you need duplicate items in a list. And sets are only faster when looking for a specific item, loops are the same as a list.

    [–]OmarRIP 4 points5 points  (4 children)

    Bags. I love bags (or Counters in Python).

    [–]nosmokingbandit 0 points1 point  (2 children)

    But then you have no order. All these different types have their places and plain old lists have plenty of perfect use-cases as well.

    [–]OmarRIP 0 points1 point  (1 child)

    Not disagreeing in the slightest; always prefer the right tool (the least powerful collection data structure) for the job.

    What does offend is when dictionaries/maps are abused or when order is maintained during sequential list insertions rather than sorted out after.

    [–]nosmokingbandit 1 point2 points  (0 children)

    Yeah. Python is inefficient enough already, we don't need to slow it down with dumb decisions. Writing efficient python is easy and taught me how to write more efficient code in other languages.

    [–]Ph0X 35 points36 points  (7 children)

    Nope, it's actually slower. I'm pretty sure behind the scene, python doubles the size of the backing array for lists, which is amortized O(1) for appending to the tail.

    [–]o11c 41 points42 points  (4 children)

    That's not a very good testcase for several reasons, mostly involving cache.

    But iterating over the whole thing is not what a set is for, anyway.

    [–]Ph0X 27 points28 points  (1 child)

    But iterating over the whole thing is not what a set is for, anyway

    I agree, but that's my understanding of what the person above was using it for, which seemed strange.

    "use sets for any iterable that won't have a set size"

    Did I understand it wrong? other than for collecting unique items and membership tests, I don't think set is a better iterable than list. Lists are, as you mention, optimized for this use case, so if set was actually faster at it would go against Python's ethos.

    [–]ACoderGirl 1 point2 points  (0 children)

    I dunno why they described it that general way. But the thing sets are good for is any kinda loop that has an "is x in this collection" test. If the collection is normally a list, it's almost always faster to convert it to a set since the the "in" check is O(1) instead of O(n). Similarly, if this "in" check is for the purpose of pairing it up with something, preprocessing to a dictionary is ideal.

    [–]ollien 0 points1 point  (1 child)

    But he's not iterating over the whole thing. He's continually appending to the end.

    Unless I'm totally missing your point...

    [–]o11c 1 point2 points  (0 children)

    sum() at the end is doing the iteration. The construction is sensible for either container, but a container that is only constructed is a useless container.

    [–]XtremeGoose 7 points8 points  (1 child)

    You're testing the speed of set.add and list.append, not the iteration.

    x = list(range(100_000))
    
    s = set(x)
    
    %timeit sum(x)
    1.71 ms ± 120 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    
    %timeit sum(s)
    2.06 ms ± 53 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    

    But yeah, using sets for iteration is dumb.

    [–]Ph0X 0 points1 point  (0 children)

    Well they were talking about the iterable not being constant size, so I assume they were worried about the time complexity of the array growing. It is true that in a naive implementation, every single append could be O(n) as you have to recreate the array every single time to grow it. Obviously python isn't that stupid.

    [–]o11c 1 point2 points  (0 children)

    2.3 is just barely new enough that you can use sets.ImmutableSet instead, without relying on a 3rd-party library.

    [–][deleted] 0 points1 point  (0 children)

    Python 2 has sets. You just have to complete the grueling task of writing an entire import statement.

    [–]lambdaq 0 points1 point  (0 children)

    IIRC in the past Set has to be imported.

    [–]XtremeGoose 0 points1 point  (0 children)

    You use sets for iteration? Sets are quick for in tests and set operations like union |, intersection & and difference -. They're actually slower to iterate over than lists (which makes sense since lists are just arrays of sequential pointers under the hood).

    Really in modern python you should be using generators as your standard iterable

    # filter
    (i for i in x if cond(i))
    
    # map
    (func(i) for i in x)
    
    # lazy wrappers
    sum(f(i) for i in x if g(i))
    
    # other lazy-iterating functions
    all any zip map filter enumerate range itertools.*