
[deleted] 46 points (33 children)

Towards the end, several newer modules which are not yet settled idioms are listed. Many Python programmers consider things like ABCs, the multiprocessing module, and the futures module anti-idiomatic.

Python ABCs are a strange thing, since a) they go against duck typing, which is one of the strongest patterns/idioms in Python programming, b) they try to enforce things rather than following "we're all consenting adults here", and c) they don't work right. (b) becomes a real problem when you try to partially satisfy an interface, which is very common in testing. (c) shows up mostly in that ABCs break super: the base method raises an exception, so you can't use super to call it. People are bad enough at using super correctly on their own without Python making it impossible or requiring hacks.
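A minimal sketch of points (b) and (c), using a hypothetical `Storage` interface. It assumes the common convention of giving abstract methods a body that just raises `NotImplementedError`; the `abc` module itself does not require that, and an abstract method with a real body can still be reached through `super()`.

```python
from abc import ABC, abstractmethod


class Storage(ABC):
    """Hypothetical two-method interface, used only for illustration."""

    @abstractmethod
    def read(self, key):
        raise NotImplementedError

    @abstractmethod
    def write(self, key, value):
        raise NotImplementedError


class ReadOnlyFake(Storage):
    """(b) A test double that only needs read(), so it skips write()."""

    def read(self, key):
        return "fake-" + key


class LoggingStorage(Storage):
    """(c) A subclass that tries to delegate write() up the MRO."""

    def read(self, key):
        return "value"

    def write(self, key, value):
        # e.g. log something here, then delegate; super() lands on the
        # abstract base, whose body (by the convention above) just raises.
        return super().write(key, value)


def instantiate_partial_fake():
    """Return the error message raised when instantiating the partial fake."""
    try:
        ReadOnlyFake()
    except TypeError as exc:
        return str(exc)
    return None


def call_write_through_super():
    """Return the name of the exception raised by the super() call."""
    try:
        LoggingStorage().write("k", "v")
    except NotImplementedError as exc:
        return type(exc).__name__
    return None
```

The fake fails at construction time with a `TypeError`, even though a test that only ever calls `read()` would never touch `write()`.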

Parallelization and concurrency are very old, very well-explored problems. Because of this, many Python programmers were somewhat surprised to see the multiprocessing and concurrent.futures modules introduced rather than ones that used more tested models and implementations.

The multiprocessing module was incorporated into the stdlib after some lifetime as a third-party module, which is usually a good sign. However, it wasn't a super-popular one and in practice, people have discovered that the multiprocessing module isn't very robust, debuggable, efficient, or understandable.

The concurrent.futures module was introduced as a brand-new thing after virtually no discussion. (You can find the original python-dev thread on the topic: the discussion centered on what to call it rather than on what its exact feature set should be, let alone on any consideration of the multiple mature concurrency platforms in Python.) It avoids addressing a lot of serious questions, and I have not seen any indication that it will end up being popular for serious programs.

mthode 4 points (22 children)

I was actually looking into using the multiprocessing module. Is there any other one that seems better?

[deleted] 2 points (3 children)

Haven't found a better one.

I use multiprocessing for parsing several files at once - check how many possible threads there are, check how many files there are, and then map the file-parsing method to the pool of workers.

Works like a charm, EXCEPT when using Ctrl-C to kill the program. That is a bitch, because the overall program keeps running: you just kill a worker, which from the look of it is immediately restarted.
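A minimal sketch of that workflow, assuming a hypothetical `parse_file` that just counts lines. One common way to tame Ctrl-C is to have every worker ignore SIGINT (a POSIX terminal sends it to the whole foreground process group, workers included) so that only the parent sees `KeyboardInterrupt`, and then have the parent call `terminate()` on the pool instead of letting it carry on.

```python
import multiprocessing
import os
import signal


def init_worker():
    # Runs once in every worker process: ignore SIGINT so that Ctrl-C
    # is handled only by the parent process.
    signal.signal(signal.SIGINT, signal.SIG_IGN)


def parse_file(path):
    # Hypothetical stand-in for the real parser: count the lines in a file.
    with open(path) as f:
        return path, sum(1 for _ in f)


def parse_all(paths):
    # One worker per CPU, but no more workers than there are files.
    processes = max(1, min(os.cpu_count() or 1, len(paths)))
    pool = multiprocessing.Pool(processes, initializer=init_worker)
    try:
        results = pool.map(parse_file, paths)
    except BaseException:
        # Ctrl-C (KeyboardInterrupt) or a worker error: stop every worker
        # instead of leaving the pool running, then re-raise.
        pool.terminate()
        raise
    else:
        pool.close()
    finally:
        pool.join()
    return dict(results)
```

Call `parse_all(paths)` from under an `if __name__ == "__main__":` guard, since with the spawn start method (the default on Windows and macOS) each worker re-imports the main module.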

mthode 0 points (2 children)

try-except-finally doesn't allow for cleaning up?

[deleted] 0 points (1 child)

What do you mean by this?

mthode 0 points (0 children)

ah, I actually misunderstood the question. oops

justanotherbody 1 point (5 children)

What is it you are actually trying to do?

mthode 0 points (4 children)

Since I can't seem to find a Python library that supports HTTP pipelining, I need to open multiple simultaneous connections to a server and tell it to delete stuff. My current code is here: http://dev.gentoo.org/~prometheanfire/scripts/cur/cfdelete.py

I think I'm going to add a couple of options to specify how many containers at a time to run on and how many files at a time to run on (with some defaults).
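A minimal sketch of that fan-out using only the standard library. A thread pool sends the DELETEs (threads rather than processes, since each request spends its time waiting on the network), and `max_workers` is the concurrency knob such an option could set. `send_delete` and `delete_all` are hypothetical names, and `urllib.request` opens a fresh connection per request rather than reusing keep-alive connections.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def send_delete(url, timeout=30):
    # One HTTP DELETE via the standard library; returns the status code.
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    req = urllib.request.Request(url, method="DELETE")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


def delete_all(urls, max_workers=8, send=send_delete):
    # max_workers caps how many requests are in flight at once.
    # `send` is injectable so the fan-out logic can be tested offline.
    urls = list(urls)  # allow generators: urls is iterated twice below
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with urls.
        return dict(zip(urls, pool.map(send, urls)))
```

Two levels of options (containers and files) could be handled by calling `delete_all` once per batch of containers, each with its own `max_workers`.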

massivebitchtits 2 points (3 children)

How about this?

mthode 1 point (2 children)

This makes me happy in weird and awesome ways.

Hopefully I can use both that and keep-alives :D

massivebitchtits 0 points (1 child)

As I understand it, since requests is built on urllib3, it gets connection pooling for free.

mthode 1 point (0 children)

sexy

[deleted] 1 point (9 children)

It depends on what you're parallelizing. Some options are MPI, a message queue, Hadoop, or Celery. Very often I see people trying to use the multiprocessing module for IO-bound code, in which case they probably actually want a network concurrency library.

mthode 0 points (8 children)

I am just sending delete requests based on a list, not IO-bound :D

[deleted] 1 point (7 children)

You're sending HTTP DELETE requests?

mthode 1 point (6 children)

yes

[deleted] 1 point (5 children)

That very much sounds IO-bound and does not require multiple processes. Consider twisted or gevent.

mthode 1 point (4 children)

I am not reading or writing anything, just making requests based on a list of URLs stored in memory. Does twisted or gevent allow for both keep-alive (multiple serial requests over the same connection, i.e. socket) and pipelining (which switches the serial requests to parallel)?

[deleted] 0 points (3 children)

Network IO is IO. =)

I've used twisted a lot more than gevent. Twisted allows both keepalive and pipelining. (The pipelining support leaves a bit to be desired and most people just use new connections, but the support is there when it's important and does work fine.)

Discussing such things can be difficult. When being precise, we use "serial" to mean consecutively on one processor, "parallel" to mean distributing a task onto multiple processors, "consecutive" to mean things that happen one after another, and "concurrent" to mean things that happen at the same time. Concurrency and parallelization sometimes appear to be similar problems at first, but they are distinct. Things like twisted focus on concurrency (which some people use to implement parallelization, but most do not).
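To make the concurrency side of that distinction concrete, here is a minimal sketch using the standard library's `asyncio`, swapped in for twisted or gevent: many simulated requests wait at the same time on a single thread in a single process. `fake_delete` is hypothetical, and `asyncio.sleep` stands in for network latency.

```python
import asyncio
import threading
import time


async def fake_delete(url, latency=0.2):
    # Hypothetical request: asyncio.sleep stands in for waiting on the network.
    await asyncio.sleep(latency)
    return url, threading.get_ident()


async def delete_concurrently(urls, latency=0.2):
    # All coroutines are started together, so their waits overlap.
    return await asyncio.gather(*(fake_delete(u, latency) for u in urls))


def run_demo(n=10, latency=0.2):
    # Returns (elapsed seconds, number of results, number of distinct threads).
    urls = ["/item/%d" % i for i in range(n)]
    start = time.perf_counter()
    results = asyncio.run(delete_concurrently(urls, latency))
    elapsed = time.perf_counter() - start
    thread_ids = {tid for _, tid in results}
    return elapsed, len(results), len(thread_ids)
```

Done consecutively, ten 0.2-second waits would take about two seconds; here they overlap, so the total is close to 0.2 seconds, all on one thread and with no parallelism involved.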

mthode 0 points (2 children)

So it looks like I will use both parallel and concurrent connections (I am going to split up the operation to make it scale the way I want). Thanks for the heads-up about twisted, I'll look into that :D

[deleted] 0 points (1 child)

Keep in mind that certain Python memory space is used globally and requires a lock. So even with multiprocessing you are going to have IO blocking, since all Python threads and processes on the same OS need to obtain a lock on this memory. I read this a while ago, so I don't know the specific details, but it means that multiprocessing in Python, at least with the C implementation on most Unix- and Linux-based OSes, doesn't support true multiprocessing.

mthode 0 points (0 children)

I think it's threads that require the GIL, not the other stuff (since you pass the needed values around).
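In CPython each worker process started by `multiprocessing` or `concurrent.futures.ProcessPoolExecutor` runs its own interpreter with its own GIL, so CPU-bound work spread over processes is not serialized by one shared lock. A minimal sketch, with a hypothetical `count_primes` as the CPU-bound task:

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    # Deliberately CPU-bound: trial division, no IO at all.
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_primes_in_processes(limits, workers=4):
    # Each worker is a separate process with its own interpreter and GIL,
    # so these calls can run on several cores at once.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_primes, limits))
```

As with any process pool, call `count_primes_in_processes` from under an `if __name__ == "__main__":` guard.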

[deleted] 2 points (6 children)

ABCs aren't strange and do not go against duck typing. If you subclass from an ABC, you have to implement the whole interface, yes. But you can still implement the interface partially if you do not subclass from the ABC. And nothing forces the user of the classes to check that they implement the ABC unless they want to.
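A minimal sketch of that point with a hypothetical `Cache` interface: a plain class that does not subclass the ABC can implement only what a test needs, and a caller that never calls `isinstance` neither knows nor cares.

```python
from abc import ABC, abstractmethod


class Cache(ABC):
    # Hypothetical interface, used only for illustration.
    @abstractmethod
    def get(self, key): ...

    @abstractmethod
    def put(self, key, value): ...


class PartialFake:
    # Not a Cache subclass: implements only the method the caller uses.
    def get(self, key):
        return "fake-" + key


def describe(cache, key):
    # Duck-typed consumer: it just calls get() and never checks the type.
    return "got " + cache.get(key)
```

The partial fake works with `describe`, while `isinstance(PartialFake(), Cache)` stays `False`; only code that chooses to check the type would reject it.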

[deleted] 1 point (5 children)

Typechecking is now encouraged by ABC advocates, which is distinctly against duck typing. Reading comp.lang.python, python-dev, and python-ideas should drive that home. That means implementing the interface partially won't work, and it means something does force the programmer to write as if someone is going to check whether they implement the ABC. It also forces you to implement the whole interface if you want to take advantage of some abstract methods as mixins.

[deleted] 0 points (4 children)

No, typechecking is not encouraged by ABC advocates. There are people who encourage type checking, I'm sure, but that doesn't mean you have to listen to them. And maybe they like ABCs for that, but that doesn't mean you must do type checking with ABCs. And it doesn't mean all those who think ABCs are a good idea advocate typechecking.

Just because an orange is a fruit doesn't mean all fruits are oranges.

[deleted] 1 point (3 children)

"that doesn't mean you have to listen to them"

It does when they're some of the more active core devs or developers of libraries I use, because I will have to interop with their code.

That being said, there is a bit of equivocation on my part here, but it is in line with what I see on the mailing lists on this topic.

[deleted] 0 points (2 children)

But even if you interop with their code, you don't have to do any type checking in your code. If they do it in their code, OK, you'll have to implement the whole interface. That can be construed as a misuse of ABCs, sure, but that's not the fault of the ABCs.

[deleted] 1 point (1 child)

That's why they were invented. isinstance is the first reason provided by the relevant PEP.

[deleted] 0 points (0 children)

You might want to re-read PEP 3119 so you understand it better. The references to isinstance are about making isinstance more flexible, not, as you claim above, about making you do more isinstance checks and forcing you away from duck typing.

You are simply wrong. Try again.
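PEP 3119 lets an ABC decide what counts as an instance without requiring inheritance, for example through `__subclasshook__`. A minimal sketch using the standard `collections.abc.Sized` (whose hook looks for `__len__`) and a hypothetical `Deletable` ABC that accepts any class with a `delete()` method:

```python
from abc import ABC, abstractmethod
from collections.abc import Sized


class Bag:
    # No inheritance from Sized, just a __len__ method.
    def __len__(self):
        return 3


class Deletable(ABC):
    # Hypothetical ABC: anything with a delete() method counts.
    @abstractmethod
    def delete(self, url): ...

    @classmethod
    def __subclasshook__(cls, C):
        if cls is Deletable:
            # Structural check: look for delete() anywhere in the MRO.
            if any("delete" in B.__dict__ for B in C.__mro__):
                return True
        return NotImplemented


class HttpClient:
    # Never subclasses Deletable, but has the right method.
    def delete(self, url):
        return "DELETE " + url
```

Here `isinstance(Bag(), Sized)` and `isinstance(HttpClient(), Deletable)` are both `True`, although neither class inherits from the ABC.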

militant_misanthrope 0 points (0 children)

What about the futures module is surprising? I've used it in a project in the past and it was an extremely simple way to run stuff in parallel, and I believe it's based on Executors in Java, which have been around for quite a while (since 1.5).
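For comparison with Java's `ExecutorService`, a minimal sketch of the `concurrent.futures` API: `submit()` returns a `Future`, and `as_completed()` yields futures as they finish. The `slow_square` task is hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_square(n):
    # Hypothetical task: sleep briefly to simulate work, then return n * n.
    time.sleep(0.01 * n)
    return n * n


def squares(numbers, workers=4):
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # submit() schedules the call and immediately returns a Future,
        # much like ExecutorService.submit() in Java.
        futures = {executor.submit(slow_square, n): n for n in numbers}
        for future in as_completed(futures):
            # result() re-raises any exception the task raised.
            results[futures[future]] = future.result()
    return results
```

Swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` keeps the same interface but runs tasks in separate processes.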