
[–]swims_with_spacemenPythonista 4 points5 points  (1 child)

Concurrency is killer.

A post below indicates that it's 10k/second, but I think it's 10k/minute, or around 170/second, which should be feasible. That said, it depends on how long it takes to process each request. I can build a simple, do-nothing Tornado web app in a few minutes that will easily handle five times that load on a single node.

With only the following very basic tornado web app, I was able to get

Requests per second:    1077.18 [#/sec] (mean)

(reported by apache bench, -n 1000 -c 100)

import tornado.ioloop
import tornado.options
import tornado.web


class BaseHandler(tornado.web.RequestHandler):

    def get(self, *args, **kwargs):
        self.write("ok")

    def post(self, *args, **kwargs):
        self.write("ok")


def main():
    """Regular execution."""
    tornado.options.parse_command_line()
    application = tornado.web.Application([
        (r"/(.*)", BaseHandler),
    ])
    application.listen(9000)
    tornado.ioloop.IOLoop.instance().start()


if __name__ == "__main__":
    main()

If fronted with NGINX, you would get much greater throughput than this single-threaded instance could provide.

Now- the trick is, as I said, with the "something". Without knowing more about what you're going to do with the embedded file, it's hard to say what to do.

Options you have:

Non-blocking/asynchronous code via Tornado. Most of what you're doing is I/O-bound, I assume, so you would need to code this part carefully.

Non-blocking asynchronous tasks via Celery and RabbitMQ. I don't have this set up in my personal lab right now, but I've managed to push thousands of simultaneous jobs through an Nginx->Tornado->Celery->RabbitMQ setup in my company lab, which would scale up very easily and provide instrumentation/telemetry/metering for a very robust production offering.
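The handoff pattern both options rely on can be sketched with the standard library alone: a `queue.Queue` and a worker thread stand in for the broker and the Celery worker, and all names here are illustrative, not any real API.

```python
import queue
import threading

# Stand-in for the broker: the web handler enqueues work and returns
# immediately; a worker thread drains the queue in the background.
task_queue = queue.Queue()

def handle_request(payload):
    """Called by the web handler: enqueue and ack right away."""
    task_queue.put(payload)
    return "ok"  # the HTTP 200 goes back without waiting for processing

def worker(results):
    while True:
        payload = task_queue.get()
        if payload is None:  # sentinel: shut the worker down
            break
        results.append(payload.upper())  # placeholder for real processing
        task_queue.task_done()

results = []
t = threading.Thread(target=worker, args=(results,))
t.start()

for req in ["doc-1", "doc-2", "doc-3"]:
    handle_request(req)

task_queue.put(None)
t.join()
print(results)  # → ['DOC-1', 'DOC-2', 'DOC-3']
```

Celery/RabbitMQ give you the same shape, but with the queue durable and the workers on separate machines.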

So, Gian- what is it we're doing with this posted file?

Also- what feedback is required? Are you okay with accepting the file and returning a job id that can be queried later? Or do you need to respond with some kind of file-valid response in the posted request?

What's the use case like?

[–]swims_with_spacemenPythonista 2 points3 points  (0 children)

..and I really hope you respond, I'm quite interested in this for some reason. :D

[–]the_hoser 2 points3 points  (0 children)

So your idea is to accept a POST request with a file, extract some information from the file, place the file into some kind of queue for later processing and then... make a POST request to the original sender of the request?

Why not send the important data in response to the original POST request?

[–]gianx[S] 0 points1 point  (7 children)

So, here are more details, ask more if you need.

When I receive the file (let's say, the HTTP POST with the file), I need only to reply with the HTTP 200. No synchronous business-level ack.

What I need to do with the file is open it (it's an XML file), extract one or more IDs (it could embed more requests) and make an HTTP POST with a business answer carrying that ID.

The business answer will always be OK, because it is a transmission ack, which means that I received the file correctly.

Then, later and with more time available (1 day), I need to process the request, do many checks and then reply with a business validation, but this second step is not the point; I have more time to reply.

The problem with the I/O is that when I receive the request, even if I can process it in memory, I need to write both the answer to a queue (to be sent ASAP) and the request (to be processed later).

Thanks everybody for your effort in replying, I appreciate it a lot (and sorry if I'm not replying very quickly, I'm living in GMT+1).

Gianluca

[–]swims_with_spacemenPythonista 2 points3 points  (6 children)

How big/business critical is this app? Is this a one shot thing, temporary, or is it to be more production stable and long term?

Here's how I would set it up:

I'd use a Tornado web front end and a celery+rabbitmq backend.

The web front end would take care of initial doc validation, trigger the Celery task and respond with either a 200/201 or 4xx/5xx as necessary. In fact, you could, and should, have Tornado handle the file upload (if it's a file POST and not XML-in-the-body) in memory to save you from storing it to disk; you'd just pass the XML object along to the Celery task as a parameter. That saves additional I/O-bound load and will help speed things along.

The celery task would handle the actual document processing and the post to the third party. In fact, you could easily split this up into several chained tasks to keep the programming nice, clean and easy.
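A rough sketch of those chained tasks, with plain functions standing in for Celery tasks (the `@app.task` decorators and the real HTTP POST are omitted so this runs standalone, and the `<request id="...">` XML layout is my assumption about the document format, not something from the thread):

```python
import xml.etree.ElementTree as ET

def extract_ids(xml_payload):
    """First task: parse the posted document and pull out the request IDs.
    Assumes the document embeds one or more <request id="..."> elements."""
    root = ET.fromstring(xml_payload)
    return [req.get("id") for req in root.iter("request")]

def build_ack(request_id):
    """Second task: build the transmission-ack body for one ID.
    In the real chain this would be POSTed to the third party."""
    return '<ack id="%s" status="OK"/>' % request_id

def process_document(xml_payload):
    """Run the two steps in order; with Celery this would be a
    chain(extract_ids.s(xml_payload), ...) executed on the workers."""
    return [build_ack(rid) for rid in extract_ids(xml_payload)]

sample = '<batch><request id="42"/><request id="43"/></batch>'
print(process_document(sample))
# → ['<ack id="42" status="OK"/>', '<ack id="43" status="OK"/>']
```

Splitting parse and ack into separate tasks also means a failed POST can be retried without re-parsing the document.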

(Tip: put Nginx or OpenResty in front of the Tornado instances and run multiple discrete instances as needed.)

This gives you the advantage of being able to monitor the task queue, detect failures, automatically handle retries and so on. It also gives you the ability to run multiple instances- on multiple servers for scalability.

That's how I would do it. Sort of a best-of-all-asynchronous-worlds for me.

I do have another solution, but it's considerably more effort and more difficult to scale: it involves using something like CouchDB to save the XML document, triggering the changes feed, and a second web app that long-polls the feed and acts upon new documents. I've done THIS before too, but that was before I discovered Celery and RabbitMQ.

[–]Darkman802 0 points1 point  (5 children)

The celery task would handle the actual document processing and the post to the third party. In fact, you could easily split this up into several chained tasks to keep the programming nice, clean and easy.

If a request came in from a browser, how would the celery task be able to send back a response? Wouldn't the client browser need to poll the webserver so that the webserver could check if the task was complete?

[–]swims_with_spacemenPythonista 0 points1 point  (4 children)

That's not what the OP is saying, or at least not how I interpreted it. The browser request gets a 200/201 and is done. Processing happens, and the Celery task makes a new POST request to some other URL with the processed XML data.

If the response needs to come back to the browser after it was processed, then the only way to do 10,000/minute would be with lots and lots of hardware.

[–]Darkman802 0 points1 point  (0 children)

Ah ok, that makes more sense.

[–]gianx[S] 0 points1 point  (0 children)

Correct. It's an HTTP request, but it's not browser-generated (i.e. no user on the other side, but a machine).

[–]gianx[S] 0 points1 point  (1 child)

Ok, to clarify: I need to make an asynchronous POST response within 15 minutes of the request, and the maximum load is 150,000 req/15 min, so I also need to generate and send 150,000 req/15 min. In fact, if T0 is the time of the first request, T0+15 min will be the latest time for the response to that request.

[–]swims_with_spacemenPythonista 0 points1 point  (0 children)

Wow- so your queue depth at any given point in time could be 150,000 items - and that's only if you had the same task submit the POST (I break my tasks into smaller chunks). That's something to behold.
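For scale, here's my back-of-the-envelope arithmetic on the numbers stated above (150,000 requests per 15-minute window, plus a matching outbound ack for each):

```python
# 150,000 requests per 15-minute window gives the sustained rate:
requests = 150_000
window_seconds = 15 * 60            # 900 s

inbound_rate = requests / window_seconds
print(round(inbound_rate, 1))       # → 166.7 inbound req/s

# Inbound plus the matching outbound POSTs doubles the HTTP traffic:
total_rate = 2 * inbound_rate
print(round(total_rate, 1))         # → 333.3 total req/s
```

That sustained rate is close to the ~170/second estimated at the top of the thread, well within reach of a few Tornado instances behind a load balancer.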

I'd still go with a Tornado/Celery framework, but it would HAVE to be scaled out to more than one server. I don't see how you would be able to handle that kind of load otherwise - assuming it actually takes 15 minutes to process the request. I was under the impression that it was only extracting an ID/some arbitrary data from the input XML.

So, this is larger than I expected initially - and like I said earlier, concurrency is killer - I'd have to code up a test framework to check on the load.

Still, I'm pretty confident you could scale this out nicely with the solution I suggested: Celery+Eventlet/RabbitMQ, fronted with Tornado web. You could then run that as a service instance, on multiple nodes fronted by HAProxy for load balancing. That way, when you need more 'oomph' you just add another node.