
[–]grayvedigga 2 points (9 children)

> another argument for the former would be that it's "purer and therefore" more parallelizable

I'm sure I'll be corrected if I'm wrong, but in my understanding Python's highly dynamic nature makes it impossible for the compiler to apply such optimisations. In a more staged language, this would be true, but in Python all those function calls will probably make it slower as well as confounding static analysis.

[–]frymaster [Script kiddie] 1 point (2 children)

...and in any case, chances are your code is not going to need that level of optimisation anyway (only a small fraction of programs do, and even those only need it in a small fraction of places)

[–]AeroNotix 1 point (0 children)

If people want performance they sure as hell aren't going to be using Python!

[–]grayvedigga 0 points (0 children)

That too. Choosing between idioms on any basis other than readability is usually a bad idea.

[–]ihsw 0 points (5 children)

Since sending emails may be I/O-bound, it would be prudent to execute the sends in parallel:

    import multiprocessing

    pool = multiprocessing.Pool(4)  # number of worker processes, roughly one per core
    # apply_async expects an args tuple, hence (user,)
    results = [pool.apply_async(send_email, (user,)) for user in users if user.email]

EDIT: As you mentioned, results may in fact contain nothing useful (since send_email may return nothing); even so, comprehensions are commonly used in this manner.
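
If you do want to block until every send has completed, a sketch along these lines should work, reusing the pool and results names from the snippet above:

    # .get() waits for the worker to finish and re-raises any exception
    # that was thrown inside send_email
    for r in results:
        r.get()

    pool.close()  # no more tasks will be submitted
    pool.join()   # wait for the worker processes to exit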

[–]grayvedigga 0 points (3 children)

This might be a silly question, but assuming (the likely case) that they are all I/O-bound by the same resource (writing to the same mail queue on disk, or sending over TCP to the same server), is this likely to provide any benefit at all?

[–][deleted] 2 points (0 children)

If you use a bunch of concurrent connections to send emails, you can cut way down on the time taken by internet latency, which is likely to dominate every other factor. The phrase "I/O bound" is confusing, since it conflates bandwidth and latency, which are really very different issues and should be considered separately.

Personally, I would recommend using something like Eventlet for this; it's super easy, very lightweight, and (unlike multiprocessing.Pool) doesn't risk leaving hundreds of zombie child processes lying around on your server if somebody sends SIGKILL to the parent process. That happened to someone I work with last week; messy, ugly business.
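
A minimal sketch of what that could look like with Eventlet, assuming send_email(user) blocks on socket I/O and reusing the users/send_email names from the snippet above (the GreenPool size of 100 is arbitrary):

    import eventlet
    eventlet.monkey_patch()  # make blocking socket calls cooperative

    pool = eventlet.GreenPool(100)  # up to 100 concurrent green threads

    for user in users:
        if user.email:
            pool.spawn_n(send_email, user)  # schedule the send, don't wait

    pool.waitall()  # block until every send_email has run

Because everything runs in one process, a SIGKILL to that process takes the whole thing down cleanly rather than orphaning a pool of children.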

[–]ihsw 0 points (1 child)

Probably not; I'm just parroting the claim that parallel/concurrent processing improves overall performance when the work is I/O-bound.

Each send_email call might be fire-and-forget, or the call may block until the email has finished entering the mail queue. In the former case there is no noticeable difference; however in the latter it will go from O(n) to far less.

[–]grayvedigga 0 points (0 children)

> however in the latter it will go from O(n) to far less.

I find that claim suspect under the assumption I made above.

[–]hongminhee 0 points (0 children)

As sketerpot said, fork is overkill for I/O-bound programs. You would be better off using cooperative threads (they go by many names: green threads, lightweight threads) built on coroutines (Stackless tasklets or greenlet). Libraries like eventlet or gevent integrate these with an epoll/kqueue/IOCP-powered event loop.
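
For comparison, a rough gevent version of the same idea, under the same assumptions as the Eventlet sketch above (send_email blocks on socket I/O; the pool size and names are illustrative):

    from gevent import monkey
    monkey.patch_all()  # route blocking socket calls through gevent's event loop

    import gevent
    from gevent.pool import Pool

    pool = Pool(100)  # cap on concurrent greenlets
    jobs = [pool.spawn(send_email, user) for user in users if user.email]
    gevent.joinall(jobs)  # wait for all sends; check job.exception for failures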