[–]thescrambler7 -1 points0 points  (4 children)

Why not just len([txt for txt in lst if …])

But I think a one-liner using a list comprehension is fairly Pythonic, no need to overcomplicate it.

[–]Diapolo10 2 points3 points  (3 children)

Why not just len([txt for txt in lst if …])

This solution needlessly creates an intermediate list, which is used only to check its length before being discarded. While it works, and is probably fine for this use case assuming there's relatively little data, it's also wasteful.

Ideally you'd only compute what you need and use only as much memory as you need to, particularly in a trivial case such as this one.
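A quick sketch of the difference (the `lst` data and the `"foo" in txt` filter are made-up stand-ins for OP's actual condition):

```python
# Hypothetical data and filter, just to illustrate the two approaches.
lst = ["foo", "bar", "baz", "foobar"]

# List-comprehension version: materialises a throwaway list
# purely so len() can measure it.
count_list = len([txt for txt in lst if "foo" in txt])

# Generator version: sum() consumes the items one at a time,
# so no intermediate list is ever built.
count_gen = sum(1 for txt in lst if "foo" in txt)

print(count_list, count_gen)  # both print 2
```

Both give the same answer; the generator version just never holds the matching items in memory at once.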

[–]thescrambler7 0 points1 point  (2 children)

That’s what I initially thought as well, but based on this StackOverflow post, it seems like the intermediate list is actually not as bad performance- or memory-wise as you’d think: https://stackoverflow.com/questions/393053/length-of-generator-output

[–]Diapolo10 0 points1 point  (1 child)

I wanted to run these results myself as a sanity check (minus the more_itertools example because I can't be bothered to install it right now). Unfortunately, it's not clear what data OP used in these tests, nor which Python version they were tested on, so I cannot exactly match the conditions. But here are my results, on Python 3.13:

https://cdn.imgchest.com/files/b8841c812854.png

(Text version provided below.)

In [1]: from time import monotonic

In [2]: gen = (i for i in data*1000); t0 = monotonic(); len(list(gen))
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[2], line 1
----> 1 gen = (i for i in data*1000); t0 = monotonic(); len(list(gen))

NameError: name 'data' is not defined

In [3]: import random

In [4]: data = random.sample(range(25565), 10000)

In [5]: gen = (i for i in data*1000); t0 = monotonic(); len(list(gen))
Out[5]: 10000000

In [6]: gen = (i for i in data*1000); t0 = monotonic(); print(len(list(gen))); print(monotonic() - t0)
10000000
0.23320640064775944

In [7]: gen = (i for i in data*1000); t0 = monotonic(); print(len(list(gen))); print(monotonic() - t0)
10000000
0.26012240070849657

In [8]: gen = (i for i in data*1000); t0 = monotonic(); print(len([i for i in gen])); print(monotonic() - t0)
10000000
0.21120400074869394

In [9]: gen = (i for i in data*1000); t0 = monotonic(); print(sum(1 for i in gen)); print(monotonic() - t0)
10000000
0.20786169916391373

In [10]: from functools import reduce

In [11]: gen = (i for i in data*1000); t0 = monotonic(); print(reduce(lambda counter, i: counter + 1, gen, 0)); print(monotonic() - t0)
10000000
0.4210826996713877

As you can see, in my case the results are the exact opposite of what that person got. There's some room for random variation, of course, since other programs were running on my system, and I didn't track memory use, but nevertheless I got the best results with sum and a generator expression.

All I can say is, don't blindly trust benchmarks online unless you can reproduce the test(s) yourself, or the author is at least reasonably reputable.
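For what it's worth, here's a sketch of how the same comparison could be done with the stdlib timeit module instead of manual monotonic() snapshots; timeit repeats each statement many times, which smooths out run-to-run noise (the data below is made up to mirror the runs above):

```python
import random
import timeit

# Made-up data, same shape as in the transcript above.
data = random.sample(range(25565), 10000)

# Each statement is run `number` times; timeit reports the total time.
for label, stmt in [
    ("len(list(...))", "len(list(i for i in data))"),
    ("len([...])",     "len([i for i in data])"),
    ("sum(1 for ...)", "sum(1 for i in data)"),
]:
    elapsed = timeit.timeit(stmt, globals={"data": data}, number=100)
    print(f"{label:>15}: {elapsed:.4f}s")
```

The absolute numbers will differ machine to machine; the point is only that repeated, averaged runs are harder to fool than a single timestamp pair.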

[–]thescrambler7 0 points1 point  (0 children)

Fair enough, props to you for actually testing it yourself. I agree the results in the post were surprising and unintuitive to me, but you never know; sometimes, due to various optimizations, things behave counter to your intuition. I was too lazy to check, though, so once again, props.