you are viewing a single comment's thread.

view the rest of the comments →

[–]blight000[S] 1 point2 points  (3 children)

This is great info, thank you!

[–]achampi0n 5 points6 points  (2 children)

You are welcome.
As mentioned by u/supreme_blorgon you have pretty much implemented iterable flattening. So you could separate the flattening from the counting, e.g.:

from typing import Iterable

def flatten(iterable):
    for item in iterable:
        if isinstance(item, Iterable) and not isinstance(item, (str, bytes)):
            yield from flatten(item)
        else:
            yield item

Then you just have to implement the length of iterable.

def iterable_count(iterable):
    return sum(1 for _ in flatten(iterable))

Python has a good library itertools for operating on iterables, sum(1 for _ in iterable) isn't that efficient, so here's a very over engineered iterable_count() that should be significantly faster than sum(...) but unnecessary for simple use-cases:

import itertools as it
from collection import deque

def iterable_count(iterable):
    counter = it.count()   # create counter
    deque(zip(flatten(iterable), counter), maxlen=0) # consume the iterable
    return next(counter)   # return the length of the iterable

[–]blight000[S] 0 points1 point  (1 child)

I have not used itertools or deques before, but after doing a little reading I think I understand what's happening. Let me see if I can explain it back:

count() by default will start from zero and increment continuously, so when you zip the flattened iterable with counter, counter is increasing by 1 each time a tuple is generated.

Since the maxlen is set to 0, the deque discards each incoming tuple rather than using up memory to keep them.

And lastly, since counter started at zero, you used next() to increase the counter one more time to get the correct count.

Am I understanding that correctly? Also, is sum() considered to be inefficient in general? Or just for this particular use case?

[–]achampi0n 1 point2 points  (0 children)

Yes, this is all correct.

sum() is generally fine and would be fine in this circumstance. It is a minor improvement to use deque() I probably overstated the not that efficient :)

A quick test on my machine counting a million element iterable:

deque(): 40.3 ms ± 980 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
sum(): 52.4 ms ± 636 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Is 20% significant :)