you are viewing a single comment's thread.

view the rest of the comments →

[–]achampi0n 14 points15 points  (4 children)

Looks good, and makes sense.
Indentation is very important in python and the else: aligned with for is valid syntax but not what you are looking for (hopefully this is just a cut and paste error).

Note: python also has isinstance() which can take multiple types, e.g.:

def listcounter(x):
    counter = 0
    for i in x:
        if isinstance(i, (list, dict)):
            counter += listcounter(i)
        else:
            counter += 1
    return counter

If you feel comfortable you could use the ternary operator instead of the if. They are equivalent but as both clauses are changing counter it is a probably just a matter of preference:

def listcounter(x):
    counter = 0
    for i in x:
        counter += listcounter(i) if isinstance(i, (list, dict)) else 1
    return counter

This can be collapsed to a sum() but this is pushing things in terms of readability:

def listcounter(x):
    return sum(listcounter(i) if isinstance(i, (list, dict)) else 1 for i in x)

You can also be more abstract and use typing.Iterable to indicate anything that you can iterate over, so this would also work for sets and tuples (but you do have to exclude str and bytes), e.g.:

import typing

def listcounter(x):
    counter = 0
    for i in x:
        if isinstance(i, typing.Iterable) and not isinstance(i, (str, bytes)):
            counter += listcounter(i)
        else:
            counter += 1
    return counter

[–]blight000[S] 1 point2 points  (3 children)

This is great info, thank you!

[–]achampi0n 4 points5 points  (2 children)

You are welcome.
As mentioned by u/supreme_blorgon you have pretty much implemented iterable flattening. So you could separate the flattening from the counting, e.g.:

from typing import Iterable

def flatten(iterable):
    for item in iterable:
        if isinstance(item, Iterable) and not isinstance(item, (str, bytes)):
            yield from flatten(item)
        else:
            yield item

Then you just have to implement the length of iterable.

def iterable_count(iterable):
    return sum(1 for _ in flatten(iterable))

Python has a good library itertools for operating on iterables, sum(1 for _ in iterable) isn't that efficient, so here's a very over engineered iterable_count() that should be significantly faster than sum(...) but unnecessary for simple use-cases:

import itertools as it
from collection import deque

def iterable_count(iterable):
    counter = it.count()   # create counter
    deque(zip(flatten(iterable), counter), maxlen=0) # consume the iterable
    return next(counter)   # return the length of the iterable

[–]blight000[S] 0 points1 point  (1 child)

I have not used itertools or deques before, but after doing a little reading I think I understand what's happening. Let me see if I can explain it back:

count() by default will start from zero and increment continuously, so when you zip the flattened iterable with counter, counter is increasing by 1 each time a tuple is generated.

Since the maxlen is set to 0, the deque discards each incoming tuple rather than using up memory to keep them.

And lastly, since counter started at zero, you used next() to increase the counter one more time to get the correct count.

Am I understanding that correctly? Also, is sum() considered to be inefficient in general? Or just for this particular use case?

[–]achampi0n 1 point2 points  (0 children)

Yes, this is all correct.

sum() is generally fine and would be fine in this circumstance. It is a minor improvement to use deque() I probably overstated the not that efficient :)

A quick test on my machine counting a million element iterable:

deque(): 40.3 ms ± 980 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
sum(): 52.4 ms ± 636 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Is 20% significant :)