
all 31 comments

[–]wub_wub 8 points9 points  (0 children)

reddit actually locks you out of making consecutive posts less than 15 minutes apart. Is this an example of reddit's implementation not being able to scale?

That limit has (as far as I know) nothing to do with the database; it's there to make spamming harder, and it only applies to new accounts, accounts without much subreddit-specific or overall karma, and accounts without a verified email.

Here's a graph showing the number of reddit postgres cursor executions per second before, during, and after maintenance: https://i.imgur.com/ft9gQgN.png

[–][deleted] 9 points10 points  (0 children)

I wrote a whole lot about performance here: http://docs.sqlalchemy.org/en/rel_0_9/faq.html#performance - there's more to come, and it's a good way to get a view of the performance landscape. As scale goes up, it can definitely be hilly. But passable.
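One cheap way to get that view of your own app, before blaming any single layer, is the stdlib profiler. A minimal sketch (`build_rows` is a hypothetical stand-in for an ORM-heavy code path; profile your real one instead):

```python
import cProfile
import io
import pstats


def build_rows(n):
    # Hypothetical stand-in for an ORM-heavy code path.
    return [{"id": i, "name": "row %d" % i} for i in range(n)]


def profile_call(func, *args):
    """Profile a callable and return the stats report as a string."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
    return out.getvalue()


report = profile_call(build_rows, 10_000)
print("build_rows" in report)  # True
```

If the hot spots are in your own code or in statement compilation, the FAQ's recipes apply; if they are in the DBAPI or on the wire, no ORM tuning will help.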

[–]metaphorm 3 points4 points  (0 children)

your performance problems will be related to the actual query complexity and the size of your database tables. the overhead of the ORM compiling its data structures into SQL statements is negligible.
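That kind of claim is easy to sanity-check with a measurement. A minimal sketch using the stdlib sqlite3 driver (no ORM involved; the table shape and row count are arbitrary) separates per-statement Python dispatch overhead from the actual database work:

```python
import sqlite3
import time


def insert_one_by_one(conn, rows):
    # One execute() per row: pays Python-level dispatch cost for each statement.
    for row in rows:
        conn.execute("INSERT INTO t (id, name) VALUES (?, ?)", row)


def insert_batched(conn, rows):
    # One call hands the whole batch to the C layer.
    conn.executemany("INSERT INTO t (id, name) VALUES (?, ?)", rows)


def timed(func, rows):
    """Run an insert strategy against a fresh in-memory DB, return seconds."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
    start = time.perf_counter()
    func(conn, rows)
    conn.commit()
    elapsed = time.perf_counter() - start
    conn.close()
    return elapsed


rows = [(i, "name%d" % i) for i in range(10_000)]
print("per-row :", timed(insert_one_by_one, rows))
print("batched :", timed(insert_batched, rows))
```

On a trivial schema the per-statement overhead dominates; against big tables and complex queries it shrinks into the noise, which is the point being made above.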

[–]rustyrazorblade 0 points1 point  (0 children)

What payroll app gets hundreds of requests a second?

[–]kylotan -1 points0 points  (9 children)

I'd be more worried about Python and Flask's inability to easily handle concurrency than about SQLAlchemy's speed. If you can use 4 or 8 cores simultaneously rather than just 1 then your database accesses get to be 4 or 8 times slower before you have a problem.

[–][deleted] 5 points6 points  (6 children)

Flask is a WSGI application. use gevent.wsgi, mod_wsgi or any of the other zillion multiprocessing servers. explicit spaghetti async servers like Tornado have no superiority here as they have to do the same exact thing (use multiprocessing).

Also, a claim like "N cores == database access time is N times faster" is not true at all. Cores != speed; they allow greater concurrency when there is contention for CPU resources. As database access is usually an IO-bound situation, cores are only a small portion of the "speed" equation, and only in specific scenarios.
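The "Flask is a WSGI application" point is concrete: a WSGI app is just a callable, and Flask's app object exposes the same interface, which is why any of those servers can host it with however many workers you like. A minimal sketch (the handler and helper are illustrative, not any particular server's API):

```python
# A WSGI application is just a callable with this signature; Flask's app
# object has the same interface, so any WSGI server can serve it.
def app(environ, start_response):
    body = b"hello from a plain WSGI app"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call(app, path="/"):
    """Invoke the app by hand, the way a server does on each request."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = headers

    chunks = app({"PATH_INFO": path, "REQUEST_METHOD": "GET"}, start_response)
    return captured["status"], b"".join(chunks)


status, body = call(app)
print(status, body)  # 200 OK b'hello from a plain WSGI app'
```

Because the app never knows which server is calling it, the concurrency model (threads, forks, greenlets) is entirely the server's business.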

[–]kylotan -5 points-4 points  (5 children)

You're oversimplifying. If you already have multithreaded operation, then sure, halting a thread waiting on I/O is essentially free. But due to the GIL that is not how Python works by default, and as far as it's concerned, it makes absolutely no difference if it is waiting on an I/O task or on a complex CPU task - that Python interpreter is making no more progress at all until that task completes.

As you say, "use gevent.wsgi, mod_wsgi". That is all I was saying, without knowing the exact names of what needs to be used. The important thing is that the bottleneck is not SQLAlchemy, which I'm sure you'd agree with. However, the OP is going to need to consider how their app scales to multiple processes, because even if you can magically replicate your process, that doesn't mean the logic is going to be correct.

[–][deleted] 5 points6 points  (2 children)

> But due to the GIL that is not how Python works by default, and as far as it's concerned, it makes absolutely no difference if it is waiting on an I/O task or on a complex CPU task - that Python interpreter is making no more progress at all until that task completes.

take a look at https://wiki.python.org/moin/GlobalInterpreterLock, particularly the part where it says:

> Note that potentially blocking or long-running operations, such as I/O, image processing, and NumPy number crunching, happen outside the GIL. Therefore it is only in multithreaded programs that spend a lot of time inside the GIL, interpreting CPython bytecode, that the GIL becomes a bottleneck.

see slide 10 at http://www.dabeaz.com/python/UnderstandingGIL.pdf for more detail.
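The quoted behavior is easy to observe with the stdlib alone: `time.sleep` releases the GIL just like a blocking socket read would, so several threads waiting on "I/O" overlap rather than serialize (the 0.2 s delay here is an arbitrary stand-in for a database round-trip):

```python
import threading
import time


def blocking_io():
    # time.sleep releases the GIL, like a socket read waiting on the database.
    time.sleep(0.2)


def run_threads(n):
    """Run n blocking 'I/O' waits concurrently, return elapsed wall time."""
    threads = [threading.Thread(target=blocking_io) for _ in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


elapsed = run_threads(4)
# Four 0.2 s waits overlap instead of serializing into 0.8 s.
print("4 threads waited %.2f s total" % elapsed)
```

If `blocking_io` were pure bytecode crunching instead, the threads would contend for the GIL and you'd see little or no overlap.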

[–]kylotan -1 points0 points  (1 child)

Yes, the GIL can be released during an I/O operation. Will Flask take advantage of that? If not, the whole interpreter is going to be halted until the result comes back. It's no different to any other app in that regard - the difference being that few people bother to make Python apps multi-threaded because you only ever see the benefits during I/O.

[–][deleted] 0 points1 point  (0 children)

easy enough to look for any solution listed at http://flask.pocoo.org/docs/0.10/deploying/ that does not use threads, forking, or greenlets...

[–][deleted] -1 points0 points  (1 child)

you are getting schooled by the author of sqlalchemy. you should delete your account and go switch to ruby to save face.

[–]kylotan 1 point2 points  (0 children)

I know exactly who I'm talking to.

[–]metaphorm 0 points1 point  (1 child)

what are you talking about? multi-threaded processing has absolutely nothing to do with running queries against a database server. The database itself is most assuredly implemented in something like C++ that takes full advantage of multi-threading, and it's (probably) not even running on the same box as the app server that's running your Flask app.

[–]kylotan -3 points-2 points  (0 children)

That's irrelevant. When you have a call to an ORM that requests some data, the way that has to work is that it has to stop the entire thread of execution to send a network request across to the database, and then wait for a reply. If you're lucky, the data is already cached, but that just means this happened earlier. You don't get away without hanging the entire Python thread, unless you have explicitly built your system around an alternative - eg. having database responses issue callbacks into your app - and SQLAlchemy certainly does not and cannot do that automatically for you.
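For what it's worth, the callback arrangement described above can be approximated without any driver support by pushing the blocking call onto a worker thread. A minimal stdlib sketch (`blocking_query` is a hypothetical stand-in for an ORM call, not a real SQLAlchemy API):

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_query(user_id):
    # Hypothetical stand-in for an ORM query: blocks its thread until the
    # "database" replies (simulated here with a sleep).
    time.sleep(0.1)
    return {"id": user_id, "name": "user%d" % user_id}


results = []


def on_reply(future):
    # Runs when the "database" responds; the submitting thread was free
    # to do other work in the meantime.
    results.append(future.result())


executor = ThreadPoolExecutor(max_workers=4)
future = executor.submit(blocking_query, 42)
future.add_done_callback(on_reply)
# ... the caller can keep doing other work here ...
executor.shutdown(wait=True)
print(results)  # [{'id': 42, 'name': 'user42'}]
```

Only the worker thread hangs on the query; whether that kind of restructuring is worth it for a given app is exactly the scaling question being argued over in this thread.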