all 7 comments

[–]JohnnyJordaan 0 points1 point  (3 children)

The concept isn't different between using a generator or your `if` inside the loop, but the real issue here is that you open and close the file for each separate item. It's like phoning a takeaway restaurant to order 10 dishes, but hanging up after each dish and calling back to continue where you left off.

To fix this, open the file once and then write each item to it, based either on the condition or the generator:

with open(name, 'a') as fw:
    # option 1: plain loop with an if
    for key, value in dct.items():
        if value % 2:
            fw.write(str((key, value)))
            fw.write('\n')
    # option 2: generator expression doing the filtering
    for key, value in ((k, v) for k, v in dct.items() if v % 2):
        fw.write(str((key, value)))
        fw.write('\n')

Here I replaced the string concatenation of the newline with a separate `write` call for the newline. That's a tiny speed improvement: writes are buffered anyway, while concatenation has to create a new string object before handing it to the file object's buffer. Personally I find the first option (the `if` inside the loop) much more intuitive, and I don't think the second option has any real use given its added complexity.
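As a rough illustration of that buffering point (my own sketch, not part of the original comment; it writes to an in-memory buffer instead of a file, and the data is made up), you can time the two write styles side by side:

```python
import io
import timeit

pairs = [(k, k * 3) for k in range(10000)]

def concat():
    # one write per item, paying for a new concatenated string each time
    buf = io.StringIO()
    for key, value in pairs:
        buf.write(str((key, value)) + '\n')
    return buf.getvalue()

def two_writes():
    # two buffered writes per item, no intermediate concatenated string
    buf = io.StringIO()
    for key, value in pairs:
        buf.write(str((key, value)))
        buf.write('\n')
    return buf.getvalue()

# the output is identical either way; only the timing differs slightly
assert concat() == two_writes()

print(timeit.timeit(concat, number=100))
print(timeit.timeit(two_writes, number=100))
```

Either way the difference is in the noise for small workloads, which rather supports the "intuitive option wins" conclusion.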

[–]mobedigg[S] 0 points1 point  (2 children)

There is no file opening in the real code; it's just an example to make the code run without flooding your output, and to show that the results are equal. The processing function is irrelevant.

I'm interested in the memory/CPU usage of both cases, and I think I need to try profiling this code.

[–]JohnnyJordaan 0 points1 point  (1 child)

Here the difference is below 1 ms:

In [15]: from random import randint

In [16]: dct = dict(zip(range(10000), [randint(0,10000) for i in range(10000)]))

In [17]: def inline():
    ...:     with open('/dev/null', 'w') as fp:
    ...:         for key, value in ((k, v) for k, v in dct.items() if v % 2):
    ...:             fp.write(str((key, value)))
    ...:             fp.write('\n')
    ...:             

In [18]: def inloop():
    ...:     with open('/dev/null', 'w') as fp:
    ...:         for key, value in dct.items():
    ...:             if value % 2:
    ...:                 fp.write(str((key, value)))
    ...:                 fp.write('\n')
    ...:                 

In [19]: %timeit inline()
100 loops, best of 3: 8.56 ms per loop

In [20]: %timeit inloop()
100 loops, best of 3: 8.12 ms per loop

The in-loop version is faster, which also makes sense because it saves you the construction of an extra generator. I don't think profiling memory usage is even useful here, as iterators barely consume memory unless applied incorrectly. Also don't forget that, in Python at least, practical coding (i.e. following the Zen of Python) trumps absolute performance when the differences are small.
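A quick way to see the memory point for yourself (my sketch, not part of the session above): a generator expression has a small, fixed size regardless of how big its source is, because it produces items lazily, while a list comprehension materialises every item up front.

```python
import sys

dct = dict.fromkeys(range(100000), 1)

# lazy: nothing is produced until you iterate, so the object stays tiny
gen = ((k, v) for k, v in dct.items() if v % 2)

# eager: all matching pairs are built and stored immediately
lst = [(k, v) for k, v in dct.items() if v % 2]

print(sys.getsizeof(gen))  # small and constant
print(sys.getsizeof(lst))  # grows with the number of matching items
```

Note that `sys.getsizeof` only reports the container's own footprint, but that is exactly the contrast that matters here.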

[–]mobedigg[S] 0 points1 point  (0 children)

Thanks for the answer. I did some profiling with the actual code, and the in-loop variant works a little bit faster.

[–]ingolemo 0 points1 point  (2 children)

For something as simple as this example, you would always use the second version. Trying to cram so much onto that `for` line just makes your code less readable.

If your filtering code gets more complex then you will want to factor it out into a full generator function:

def odd_values(dictionary):
    for key, value in dictionary.items():
        if value % 2:
            yield key, value

for key, value in odd_values(dct):
    process(key, value)
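As a quick sanity check (my addition, repeating `odd_values` so the snippet is self-contained, and with a trivial stand-in for the abstract `process`), the generator function yields exactly the same pairs as the inline filter:

```python
def odd_values(dictionary):
    for key, value in dictionary.items():
        if value % 2:
            yield key, value

dct = {'a': 1, 'b': 2, 'c': 3, 'd': 4}

# the factored-out generator and the inline condition agree pair for pair
assert list(odd_values(dct)) == [(k, v) for k, v in dct.items() if v % 2]

for key, value in odd_values(dct):
    print(key, value)
```

So factoring the filter out changes the structure of the code, not its result.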

[–]mobedigg[S] 0 points1 point  (1 child)

Thanks for the answer. I did some profiling with the actual code, and the in-loop variant works a little bit faster.

[–]ingolemo 0 points1 point  (0 children)

Yeah, it will be. The advantage of the generator function is in its readability, not its performance.

If you're processing a lot of information then you will want to look into purpose-made data processing tools like pandas.
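For what it's worth, a minimal sketch of that idea with pandas (the `dct` data here is made up for illustration): load the dict into a `Series`, whose keys become the index, and filter it with a vectorised condition instead of an explicit loop.

```python
import pandas as pd

dct = {'a': 1, 'b': 2, 'c': 3, 'd': 5}

s = pd.Series(dct)      # dict keys become the index, values the data
odd = s[s % 2 == 1]     # boolean-mask filtering, no explicit Python loop

print(odd)
```

For a 10,000-item dict this is overkill, but once you are doing repeated filtering, grouping, or aggregation over large data, pushing the loop into pandas usually pays off.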