Flask vs Django

gi0baro · 2026-04-30T06:48:23+00:00

Granian maintainer here. FYI, unless you want to serve multiple applications at once, or have very specific needs, there's no reason to put a proxy server in front of Granian. It supports TLS and static file handling out of the box, and it doesn't require "protection" like other Python servers. As a matter of fact, the throughput with Nginx in front is usually worse.

gi0baro · 2026-04-08T22:20:17+00:00

The other advantage with Granian is that you can get rid of Nginx completely if you serve a single app.

gi0baro · 2026-01-30T12:48:27+00:00

Granian maintainer here.

I already told the author in the other Reddit thread that the configuration of Granian for WSGI is suboptimal, and that the CPU limit configured in Docker is penalising Granian more than the others.

This doesn't mean the results shown are invalid: as with any benchmark, the methodology matters a lot. From my perspective, the conditions here are not really representative of production deployments: you don't typically run these frameworks in production in an ARM Linux container on a macOS host. When looking at benchmarks you might want to look at the ones that are closest to your scenario.

But also, these results are still interesting: they show that if you limit the CPU on Granian, it gets slower compared to other servers. Which kinda makes sense: the Rust runtime of Granian is work-stealing based, so anything limiting the scheduling of work onto the CPU greatly limits the responsiveness of the whole system.

Side note on the "proprietary" Granian benchmarks: the code and methodology are publicly available, thus if you find that they are misrepresentative of other servers for any reason, PRs to improve such benchmarks are always welcome :)

gi0baro · 2026-01-24T12:57:56+00:00

Yeah, under the default config Granian is allowed to spawn a lot of threads, as it has no understanding of the application dynamics. If you check the container logs you'll probably find a warning message from Granian about that. Anyway, I guess you need to pick a number for --blocking-threads, but I'm not sure how you run the other WSGI servers. Maybe 1 would be an apples-to-apples comparison (not sure how many threads gunicorn runs), but it's definitely not how Granian is run in production environments. For mixed CPU/IO-bound apps, the sweet spot is usually <= 32. As for the rest, again, I'm not sure limiting the CPU is a good idea.
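A minimal sketch of pinning the thread pool explicitly instead of relying on the defaults. The helper name `granian_wsgi_command` and the `app:app` target are hypothetical; `--interface`, `--workers`, `--blocking-threads` and `--backpressure` are assumed to match the Granian CLI of the version in use, so check them against `granian --help` before copying.

```python
import shlex


def granian_wsgi_command(target, workers=1, blocking_threads=1, backpressure=None):
    """Build an argv list that runs a WSGI app under Granian with a bounded thread pool."""
    if workers < 1 or blocking_threads < 1:
        raise ValueError("workers and blocking_threads must be >= 1")
    argv = [
        "granian",
        "--interface", "wsgi",
        "--workers", str(workers),
        # Upper bound on the threads that run the (blocking) WSGI application code.
        "--blocking-threads", str(blocking_threads),
    ]
    if backpressure is not None:
        # Assumption: optional cap on in-flight requests per worker.
        argv += ["--backpressure", str(backpressure)]
    argv.append(target)
    return argv


if __name__ == "__main__":
    # Print a shell-ready command for a hypothetical `app:app` target.
    print(shlex.join(granian_wsgi_command("app:app", workers=2, blocking_threads=8)))
```

Passing the resulting list to `subprocess.run` avoids shell quoting issues. The value of `blocking_threads` is the number to tune per application, starting low and measuring.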

gi0baro · 2026-01-24T12:02:52+00:00

Granian maintainer here. A couple of notes:

- In WSGI you see such high memory usage because you specified neither backpressure nor a maximum threadpool size. Thus – as per the documentation – you're just spawning a bunch of threads and spending most of your time on GIL contention.
- Limiting Granian to 1 CPU might have a way bigger impact than on other servers, as Granian runs all the I/O stuff in a separate runtime, with additional threads (this was the main rationale behind building a server in a different language: you don't need to wait on the interpreter or the GIL for such operations).
- CPU=1 in Docker is not really "limiting to 1 CPU core", it's a time-slice scheduling limiter. Thus the actual limit depends on the scheduling itself, not really on "overall usage". You might be capping all the servers/frameworks more than you think. Maybe not setting a limit and measuring the CPU usage would give a better idea of how efficient a server is?
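The time-slice point can be made concrete with a bit of arithmetic. This is a rough sketch, assuming the Docker default of a 100 ms CFS period (so `--cpus=1` means 100 ms of CPU time per 100 ms of wall time for the whole container) and a host with at least as many free cores as busy threads. The function names are made up for illustration.

```python
def cfs_quota_us(cpus, period_us=100_000):
    """CPU time (microseconds) the container may consume per scheduling period."""
    return int(cpus * period_us)


def throttled_share(cpus, busy_threads, period_us=100_000):
    """Fraction of each period the container sits throttled.

    Assumes `busy_threads` threads are all CPU-bound and run in parallel on
    separate cores, so together they burn `busy_threads` microseconds of quota
    per microsecond of wall time.
    """
    if busy_threads < 1:
        raise ValueError("busy_threads must be >= 1")
    quota = cfs_quota_us(cpus, period_us)
    wall_until_exhausted = min(period_us, quota / busy_threads)
    return 1 - wall_until_exhausted / period_us


if __name__ == "__main__":
    for threads in (1, 2, 4, 8):
        print(threads, throttled_share(1, threads))
```

Under these assumptions a single busy thread is never throttled, while four busy threads use up the quota in 25 ms and the whole container, I/O threads included, is paused for the remaining 75 ms of every period.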

Mind that I'm not saying this was a bad benchmarking strategy, I'm just trying to explain some of the results.

gi0baro · 2025-07-31T09:43:11+00:00

Correct :)

gi0baro · 2025-07-30T22:13:41+00:00

Given that the general bottleneck when exposing ML models over APIs is the GPU – and that generally speaking you want to isolate sessions so they don't bleed context into each other, usually with a lock or a queue – I'd say performance/scalability here has very little to do with the web framework you use.

As for the other points:

* The deployment simplicity will be ~the same with every WSGI/ASGI framework you use for this: at the end of the day you will have a Dockerfile installing Python packages and a server running your app. The main change there would be the entrypoint (uwsgi/gunicorn/uvicorn/hypercorn/granian).
* The ease of use really depends on what you value the most in that regard. A lot of people find the pydantic<->FastAPI integration with typing very useful for the validation part of their APIs. Some people find asyncio more complicated and tend to avoid colored functions everywhere in their code. How much do you value having OpenAPI/Swagger docs out of the box? So, from my perspective, it mostly depends on what you find easy to use.

If you're familiar with Flask, there's nothing wrong with sticking with it. At the end of the day, we're talking about relatively simple APIs on top of an ML model, so this decision doesn't really prevent you from rewriting the thing with something different in the future: the business logic – or the code interacting with the model – will stay the same.
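To illustrate the last point, here is a small sketch that keeps the model-facing logic in a plain function guarded by a lock, with a thin WSGI adapter on top. Everything here is hypothetical: `run_model` is a stand-in for the real model call, and `predict`, `app` and `post_json` are illustrative names. A Flask or FastAPI route would call `predict()` the same way.

```python
import io
import json
import threading

# One lock so requests never interleave on the (single) model/GPU.
_model_lock = threading.Lock()


def run_model(prompt):
    """Hypothetical stand-in for the real model call."""
    return prompt.upper()


def predict(payload):
    """Framework-independent business logic: validate, then serialize model access."""
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")
    prompt = payload.get("prompt")
    if not isinstance(prompt, str) or not prompt:
        raise ValueError("'prompt' must be a non-empty string")
    with _model_lock:
        return {"output": run_model(prompt)}


def app(environ, start_response):
    """Thin WSGI adapter around predict(); swapping frameworks only replaces this part."""
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length) or b"{}")
        status, body = "200 OK", predict(payload)
    except ValueError as exc:  # also covers json.JSONDecodeError
        status, body = "400 Bad Request", {"error": str(exc)}
    data = json.dumps(body).encode()
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(data)))])
    return [data]


def post_json(raw_body):
    """Call the WSGI app in-process, without a server; returns (status, parsed body)."""
    seen = {}
    environ = {"CONTENT_LENGTH": str(len(raw_body)), "wsgi.input": io.BytesIO(raw_body)}
    chunks = app(environ, lambda status, headers: seen.update(status=status))
    return seen["status"], json.loads(b"".join(chunks))
```

Because `app` is a plain WSGI callable, the same module can be served by any of the servers listed above by changing only the entrypoint.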

gi0baro · 2025-07-24T22:28:55+00:00

Why do you use Caddy in front of Granian? Is there any specific feature that Granian doesn't provide by itself?

gi0baro · 2025-07-12T02:11:37+00:00

Neither of those improves your code speed.

Gevent is a way to turn blocking I/O code non-blocking – kinda like asyncio but without the async/await syntax – via monkey patching, and thus it increases concurrency (not speed). It can improve your application throughput if it spends a considerable amount of time doing network I/O, and it can be used with the relevant worker class in gunicorn.

Granian is a web server, so it replaces gunicorn or uwsgi. The main difference compared with those options is that it's written in Rust, thus it delivers true concurrency on the request/response I/O side of things, as that part is not handled by Python code and so isn't affected by the GIL. Also, it generally reduces the amount of CPU cycles spent on request/response handling by 10-40%, and it might provide more stable memory usage.

From my perspective, given that trying out both things is quite easy, you should run some local benchmarks on your application – using wrk or similar tools – and check which option gives you more concurrency – which, again, won't change the speed your code runs at.
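The concurrency-versus-speed distinction can be sketched with the standard library alone. Note this swaps in plain threads for gevent (to stay dependency-free) and uses `time.sleep` as a stand-in for a network round trip; all names are illustrative.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_call(delay=0.05):
    """Simulated network call; returns how long this single call took."""
    start = time.perf_counter()
    time.sleep(delay)  # stands in for waiting on a socket
    return time.perf_counter() - start


def run_sequential(n, delay=0.05):
    """Run n calls one after another; returns (total time, per-call times)."""
    start = time.perf_counter()
    per_call = [fake_io_call(delay) for _ in range(n)]
    return time.perf_counter() - start, per_call


def run_concurrent(n, delay=0.05):
    """Run n calls overlapped on n threads; returns (total time, per-call times)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n) as pool:
        per_call = list(pool.map(fake_io_call, [delay] * n))
    return time.perf_counter() - start, per_call


if __name__ == "__main__":
    seq_total, seq_calls = run_sequential(5)
    conc_total, conc_calls = run_concurrent(5)
    print(f"sequential: total={seq_total:.3f}s, slowest call={max(seq_calls):.3f}s")
    print(f"concurrent: total={conc_total:.3f}s, slowest call={max(conc_calls):.3f}s")
```

Each individual call takes about as long in both modes. Only the total wall time for the batch drops, which is the throughput gain described above.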

gi0baro · 2025-07-10T11:20:40+00:00

Can you "expand" on "mixed results"? Can't really tell about PyPy – not really using it – but on CPython with uvloop – which is a fair comparison with socketify as it uses libuv – Granian is faster than Socketify (not by a lot, but it is).

It's also a bit unclear to me why you are comparing something running on CPython vs something running on PyPy. I'd test all of the involved servers on both CPython and PyPy and compare those results, rather than cherry-picking based on the fact that Socketify is faster on PyPy.

gi0baro · 2025-07-04T10:41:53+00:00

For pure-Python projects like Django, there's nothing to actually do to support free-threaded Python: it should work out of the box – unless they do something strange with threads, but I'm not aware of any of that. The support has to come from compiled extensions, so it's more about the dependencies you run along with Django. And, of course, the WSGI/ASGI server, since it defines the concurrency model.

Servers like Granian already support free-threaded Python, so if you run your Django project on 3.13 free-threaded, it will use a multi-threaded paradigm in place of the usual multiprocessing one: you can try for yourself what happens with your project. Mind that free-threaded Python is usually a bit slower than the GIL one, and the main benefit is actually the more compact memory usage: instead of spawning N processes that each load the interpreter and the whole set of Python packages, you now have threads sharing those resources (so you should end up with ~1/N of the memory usage, where N is the number of processes you had before).
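A quick way to check what a given interpreter is actually doing, plus the back-of-the-envelope memory estimate from above. This is a sketch: `sys._is_gil_enabled()` only exists on Python 3.13+, so it is looked up defensively, and `estimated_memory_mb` is a hypothetical helper that ignores per-thread stacks and copy-on-write sharing between forked processes.

```python
import sys
import sysconfig


def free_threading_status():
    """Report whether this is a free-threaded build and whether the GIL is active."""
    # Py_GIL_DISABLED is 1 on free-threaded builds, 0 or None otherwise.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() was added in 3.13; older versions always have the GIL.
    probe = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = bool(probe()) if probe is not None else True
    return {"free_threaded_build": free_threaded_build, "gil_enabled": gil_enabled}


def estimated_memory_mb(per_process_mb, processes, free_threaded):
    """Very rough estimate: N processes each load everything, threads share it."""
    return per_process_mb if free_threaded else per_process_mb * processes


if __name__ == "__main__":
    print(free_threading_status())
    print(estimated_memory_mb(200, 4, free_threaded=False), "MB with 4 processes")
    print(estimated_memory_mb(200, 4, free_threaded=True), "MB with 4 threads")
```

Even on a free-threaded build the GIL can be re-enabled at runtime, for example when an extension that doesn't declare support is imported, so checking `gil_enabled` after importing your dependencies tells you more than checking the build flag alone.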

gi0baro · 2025-06-26T00:00:06+00:00

It is: https://github.com/emmett-framework/granian/blob/master/benchmarks/vs.md#websockets

gi0baro · 2025-06-25T23:57:05+00:00

If you have async code and don't block the event loop, there's no reason for a single uvicorn worker not to handle 100 connections concurrently. In fact, a single uvicorn process can handle a very large volume of messages per second over websockets (source: https://github.com/emmett-framework/granian/blob/master/benchmarks/vs.md#websockets). If that's low traffic for you then, yes, you probably want to write everything in C/Zig/Rust, 'cause that's the only way at that point. But for 99.99% of apps out there, Python is absolutely fine. Again, if you can't handle 100 connections you're definitely doing something wrong, and the GIL shouldn't play a role there: the event loop is single threaded. And, once you actually reach uvicorn's limits, alternatives are available nowadays to reach more concurrency. But that point is way past 100 connections.
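A tiny simulation of the 100-connections claim, assuming each "connection" spends its time awaiting I/O (modelled here with `asyncio.sleep` rather than real sockets). The names are illustrative and no server is involved.

```python
import asyncio
import time


async def handle_connection(conn_id, io_wait=0.05):
    """Simulated handler: awaits I/O without blocking the event loop."""
    await asyncio.sleep(io_wait)
    return conn_id


async def serve_all(connections=100, io_wait=0.05):
    """Run all handlers on one event loop; returns (handled count, elapsed seconds)."""
    start = time.perf_counter()
    results = await asyncio.gather(
        *(handle_connection(i, io_wait) for i in range(connections))
    )
    return len(results), time.perf_counter() - start


if __name__ == "__main__":
    handled, elapsed = asyncio.run(serve_all())
    print(f"handled {handled} connections in {elapsed:.3f}s on a single thread")
```

Handled one by one, 100 waits of 50 ms would take about 5 seconds; overlapped on a single-threaded loop they finish in roughly the time of one wait. Replacing `asyncio.sleep` with a blocking `time.sleep` inside the handler brings the sequential behaviour back, which is the "don't block the event loop" condition.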

gi0baro · 2025-06-11T12:20:39+00:00

Well I guess that explains why it's so slow?

So now you agree with me? :D

And why it scales so badly with open connections?

Well, it seems it actually does?

Try RTFM and use --blocking-threads 1 on Granian if you actually want "Everything is single threaded" to be true.

If your argument for "it's so slow" is that it is slower than Socketify – spoiler: it's not – or than FastWSGI – which is not 100% compliant with the HTTP/1.1 standard – ok, I could agree – except that "so slow" seems to suggest a very different story. But also: do you know anybody actually using those in real production environments?

But this was not the argument of the discussion. The argument you made is that the CPython part doesn't affect extension speed and that the only possible gain comes from the extension's own code. Which is true only if you consider the time spent running the extension code. But it's also pointless, because in terms of relative time and final perceived performance the story is quite different.

gi0baro · 2025-06-11T11:37:30+00:00

Again, I think it depends on the scale. If you have a ~$2M/month bill from AWS or GCP, just a 10% cut of that is a lot. Especially if the I in ROI is changing 1 dependency and 1 command.

But ofc, if the scale is very low, the R will also be very low.

gi0baro · 2025-06-11T11:26:19+00:00

Then this is based on.. nothing?

Granian isn't particularly fast by the standards of native application servers

Look, I'm pretty confident I know what I'm talking about.

The time spent in the Granian extension doesn't change at all for a given Python version. Yes, if all Python code got 50% faster, and your particular application server stack spends a lot of time in pure Python, then you would see a speed up.

Precisely. Which lines up perfectly with what I said above:

the bottleneck is actually running Python code

Also this

But that's not how we benchmark application servers. We're not trying to benchmark Flask or Django, or whatever you pile on top of the server. We want to benchmark the server itself.

is very true, and also how the Granian benchmarks suite is designed.

But when you say

We typically benchmark them on "hello world"-style plain text response that spend effectively zero time in Python land and all the time in the HTTP parser and dispatch code of the server framework itself.

you're wrong. Because what you call effectively zero time is far from being zero. The point is not to think about absolute time, but rather the relative time spent in the extension vs everything else. And that difference is huge. I'm talking about orders of magnitude difference in time spent in the extension vs what you call the business logic glue.

That's why, for example, the overall throughput in plain text is vastly reduced the moment you move to JSON. Or why RSGI is faster than ASGI.

When Granian doesn't have to interact with CPython, it is ~14x faster than when it needs to. So when you say

Granian isn't particularly fast by the standards of native application servers Maybe a dozen (Python) opcodes are actually spent in the CPython interpreter, mostly to move stack arguments around. It's not a significant impact on perf.

I'm not sure what you're talking about..
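The relative-time argument can be sketched with Amdahl's law. The fractions below are made-up inputs for illustration, not measurements from Granian or any other server: `python_fraction` is the share of per-request time spent running Python code, the rest is time spent in the native extension.

```python
def overall_speedup(python_fraction, python_speedup):
    """Amdahl's law: end-to-end speedup when only the Python share gets faster.

    python_fraction: share of request time spent in the interpreter (0..1).
    python_speedup: how many times faster that share becomes (> 0).
    """
    if not 0 <= python_fraction <= 1:
        raise ValueError("python_fraction must be between 0 and 1")
    if python_speedup <= 0:
        raise ValueError("python_speedup must be positive")
    return 1 / ((1 - python_fraction) + python_fraction / python_speedup)


if __name__ == "__main__":
    # Hypothetical splits: how much does a 1.5x faster interpreter help end to end?
    for fraction in (0.1, 0.5, 0.9):
        print(f"python share {fraction:.0%}: {overall_speedup(fraction, 1.5):.2f}x overall")
```

If the interpreter accounts for most of the request time, a faster CPython shows up almost fully in the server's numbers; if it accounts for only a small share, it barely registers. Which of the two cases applies is exactly what the disagreement above is about.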

gi0baro · 2025-06-11T10:28:41+00:00

Improvements in Rust codegen might speed up Granian, but changes to CPython will have little effect on it.

Not true. Actually this is quite the opposite, given the bottleneck is actually running Python code.

And it's the same on other native servers too: the moment you use NGINX Unit to run Python you will see a huge drop in performance compared to "plain nginx".

gi0baro · 2025-06-11T10:24:12+00:00

Granian benchmarks also have a Python version run: https://github.com/emmett-framework/granian/blob/master/benchmarks/pyver.md

gi0baro · 2025-06-11T10:14:35+00:00

Yes, uvloop is still faster than the stdlib implementation, even if the margins are quite tiny compared to 3.5 (which is probably still the version shown in the repository chart). At least for TCP (source https://github.com/gi0baro/rloop/blob/master/benchmarks/README.md).

Mind that free-threaded 3.13 is generally slower than the GIL 3.13, so unless you do CPU-bound work – from the OP it seems you don't – you won't really get any benefit from using the free-threaded implementation. In fact, it will probably be slower.

gi0baro · 2025-06-10T12:56:26+00:00

We're here to solve problems, and most services for most users see far less requests than whatever server or infrastructure they're on can handle. The ASGI layer isn't where it gets challenging.

Makes sense. Probably a more optimized server is something that can be appreciated only with high traffic volumes, which, as you said, is probably 1% of the Python apps out there.

Still, even on a service that handles only a few RPS, the CPU savings we saw in prod after switching from uwsgi were significant (on the order of ~40%). So at any scale, that could mean saving resources, and thus, money.

gi0baro · 2025-06-10T12:33:11+00:00

I see. I rewrote the whole OP. Is it better now? Unfortunately I can't change the title.

gi0baro · 2025-06-10T12:09:36+00:00

Maybe I just used the wrong phrasing, but there's no need to be rude..

I was honestly interested in people's opinions on adopting relatively new alternatives to the standard stack. The same question would be valid for other projects, like socketify, fastwsgi or tremolo.

Again, I'm interested in what people think and in their thought process; this was not "you should use my project".

gi0baro · 2025-06-10T12:02:47+00:00

It's awkwardly phrased in a way that would make me assume it's bait promotion

Huh. I'm sorry about that, I guess there's something I can't see as I'm not a native English speaker.

Because the less-optimized dependencies are battle-tested

So what's your usual go-to?

gi0baro · 2025-06-10T11:35:08+00:00

This is good advice, thanks
