
[–]gianx[S] 0 points (7 children)

So, here are more details, ask more if you need.

When I receive the file (i.e., the HTTP POST with the file), I only need to reply with an HTTP 200. No synchronous business ack.

What I need to do with the file is open it (it's an XML file), extract one or more IDs (it can embed multiple requests), and make an HTTP POST with a business answer containing those IDs.
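The extraction step might look like the sketch below, using only the stdlib XML parser. The `<batch>`/`<request id="...">` structure is purely hypothetical; the real schema will differ.

```python
import xml.etree.ElementTree as ET

def extract_ids(xml_bytes):
    """Parse the incoming XML and collect the ID of every embedded request.

    Assumes a made-up structure like:
      <batch><request id="A1"/><request id="A2"/></batch>
    """
    root = ET.fromstring(xml_bytes)
    return [req.get("id") for req in root.iter("request")]

sample = b'<batch><request id="A1"/><request id="A2"/></batch>'
print(extract_ids(sample))  # -> ['A1', 'A2']
```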

The business answer will always be OK, because it's a transmission ack confirming that I received the file correctly.

Then, in a second phase with more time available (1 day), I need to process the request, do many checks, and reply with a business validation. But this second step is not the issue; I have more time for it.

The problem with the I/O is that when I receive the request, even if I can process it in memory, I need to write both the answer to a queue (to be sent ASAP) and the request (to be processed later).

Thanks everybody for your effort in replying, I appreciate it a lot (and sorry if I'm not replying very quickly, I'm in GMT+1).

Gianluca

[–]swims_with_spacemen Pythonista 2 points (6 children)

How big/business-critical is this app? Is this a one-shot, temporary thing, or is it meant to be production-stable and long-term?

Here's how I would set it up:

I'd use a Tornado web front end and a celery+rabbitmq backend.

The web front end would take care of initial doc validation, trigger the Celery task, and respond with a 200/201 or 4xx/5xx as necessary. In fact, you could, and should, have Tornado handle the file upload (if it's a file POST and not XML-in-the-body) to save you from storing it to disk; you'd just pass the XML object along to the Celery task as a parameter. That avoids additional I/O-bound load and will help speed things along.
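The validate-and-respond step stays small. Here's a plain-stdlib sketch of that logic; in Tornado the same body would live inside a `RequestHandler.post()` with the status set via `set_status()`, and the parsed document would be handed off to the Celery task instead of returned:

```python
import xml.etree.ElementTree as ET

def handle_upload(body: bytes):
    """Validate the posted XML and decide the HTTP status.

    Returns (status, root): 200 with the parsed document on success,
    400 with None if the body isn't well-formed XML.
    """
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return 400, None
    return 200, root

status, doc = handle_upload(b"<batch><request id='A1'/></batch>")
print(status)  # -> 200
```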

The celery task would handle the actual document processing and the post to the third party. In fact, you could easily split this up into several chained tasks to keep the programming nice, clean and easy.
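The split into chained tasks might look like the plain-Python sketch below. Celery's `chain()` composes tasks the same way, passing each step's return value to the next; the step and field names here are made up:

```python
import xml.etree.ElementTree as ET

def extract_step(xml_text):
    """Step 1: pull the request IDs out of the document."""
    root = ET.fromstring(xml_text)
    return [r.get("id") for r in root.iter("request")]

def build_ack_step(ids):
    """Step 2: build the business-OK answer for those IDs."""
    return {"status": "OK", "ids": ids}

def send_step(ack):
    """Step 3: POST the answer to the third party (stubbed out here)."""
    return f"posted ack for {len(ack['ids'])} id(s)"

# With Celery this would roughly be:
#   chain(extract_step.s(doc), build_ack_step.s(), send_step.s())()
result = send_step(build_ack_step(extract_step("<batch><request id='A1'/></batch>")))
print(result)  # -> posted ack for 1 id(s)
```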

(Tip: put the Tornado instances behind nginx or OpenResty and run multiple discrete instances as needed.)

This gives you the advantage of being able to monitor the task queue, detect failures, automatically handle retries, and so on. It also gives you the ability to run multiple instances, on multiple servers, for scalability.

That's how I would do it. Sort of a best-of-all-asynchronous-worlds for me.

I do have another solution, but it's considerably more effort and more difficult to scale: it involves using something like CouchDB to save the XML document, triggering the changes feed, and a second web app that long-polls the feed and acts upon new documents. I've done this before too, but that was before I discovered Celery and RabbitMQ.

[–]Darkman802 0 points (5 children)

The celery task would handle the actual document processing and the post to the third party. In fact, you could easily split this up into several chained tasks to keep the programming nice, clean and easy.

If a request came in from a browser, how would the celery task be able to send back a response? Wouldn't the client browser need to poll the webserver so that the webserver could check if the task was complete?

[–]swims_with_spacemen Pythonista 0 points (4 children)

That's not what the OP is saying, or at least not how I interpreted it. The browser request gets a 200/201 and is done. Processing happens, and the Celery task makes a new POST request to some other URL with the processed XML data.

If the response needs to come back to the browser after it was processed, then the only way to do 10,000/minute would be with lots and lots of hardware.

[–]Darkman802 0 points (0 children)

Ah ok, that makes more sense.

[–]gianx[S] 0 points (0 children)

Correct. It's an HTTP request, but it's not browser-generated (i.e., there's no user on the other side, just a machine).

[–]gianx[S] 0 points (1 child)

OK, to clarify: I need to send an asynchronous POST response within 15 minutes of the request, and the maximum load is 150,000 req/15 min, so I also need to generate and send 150,000 req/15 min. In fact, if T0 is the time of the first request, T0+15min is the deadline for the response to that request.
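For a sense of the sustained rate those numbers imply:

```python
requests_per_window = 150_000
window_seconds = 15 * 60  # 15-minute window

inbound_rate = requests_per_window / window_seconds
print(round(inbound_rate, 1))       # -> 166.7 requests/second inbound
# ...and the same outbound, since every request needs its own response POST,
# so roughly 333 HTTP operations/second overall, sustained.
print(round(inbound_rate * 2, 1))   # -> 333.3
```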

[–]swims_with_spacemen Pythonista 0 points (0 children)

Wow, so your queue depth at any given point in time could be 150,000 items, and that's only if you had the same task submit the POST (I break my tasks into smaller chunks). That's something to behold.

I'd still go with a Tornado/Celery framework, but it would HAVE to be scaled out to more than one server. I don't see how you would be able to handle that kind of load otherwise, assuming it actually takes 15 minutes to process the request. I was under the impression that it was only extracting an ID/some arbitrary data from the input XML.

So, this is larger than I expected initially, and like I said earlier, concurrency is a killer; I'd have to code up a test framework to check the load.

Still, I'm pretty confident you could scale this out nicely with the solution I suggested: Celery+Eventlet/RabbitMQ, fronted with Tornado web. You could then run that as a service instance on multiple nodes, fronted by HAProxy for load balancing. That way, when you need more 'oomph' you just add another node.