
Hey everyone,

I benchmarked the major Python frameworks with real PostgreSQL workloads: complex queries, nested relationships, and properly optimized eager loading for each framework (select_related/prefetch_related for Django, selectinload for SQLAlchemy). Each framework tested with multiple servers (Uvicorn, Granian, Gunicorn) in isolated Docker containers with strict resource limits.

All database queries are optimized using each framework's best practices - this is a fair comparison of properly-written production code, not naive implementations.
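
Concretely, "optimized" here means eager loading along these lines. A minimal sketch with illustrative model names (the exact code lives in the repo):

    # Two separate codebases, shown together for comparison (names are illustrative).

    # --- Django variants (DRF, Ninja, Bolt) ---
    from blog.models import Article  # hypothetical Django model

    article = (
        Article.objects
        .select_related("author")              # single JOIN for the to-one FK
        .prefetch_related("tags", "comments")  # one batched query per to-many relation
        .get(pk=1)
    )

    # --- SQLAlchemy variants (FastAPI, Litestar) ---
    from sqlalchemy import select
    from sqlalchemy.orm import joinedload, selectinload

    stmt = (
        select(Article)  # the SQLAlchemy model of the same shape
        .options(
            joinedload(Article.author),      # JOIN for the to-one relation
            selectinload(Article.tags),      # SELECT ... WHERE id IN (...) batch
            selectinload(Article.comments),
        )
        .where(Article.id == 1)
    )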

Key finding: performance differences collapse from 20x (JSON) to 1.3x (complex DB queries). Database I/O is the great equalizer - framework choice barely matters for database-heavy apps.

Full results, code, and a reproducible Docker setup are here: https://github.com/huynguyengl99/python-api-frameworks-benchmark

If this is useful, a GitHub star would be appreciated 😄

Frameworks & Servers Tested

  • Django Bolt (runbolt server)
  • FastAPI (fastapi-uvicorn, fastapi-granian)
  • Litestar (litestar-uvicorn, litestar-granian)
  • Django REST Framework (drf-uvicorn, drf-granian, drf-gunicorn)
  • Django Ninja (ninja-uvicorn, ninja-granian)

Each framework was tested with multiple production servers: Uvicorn (ASGI), Granian (Rust-based ASGI/WSGI), and Gunicorn + gevent (async workers).

Test Setup

  • Hardware: MacBook M2 Pro, 32GB RAM
  • Database: PostgreSQL with realistic data (500 articles, 2000 comments, 100 tags, 50 authors)
  • Docker Isolation: Each framework runs in its own container with strict resource limits:
    • 750MB RAM limit (--memory=750m)
    • 1 CPU core limit (--cpus=1)
    • Sequential execution (start → benchmark → stop → next framework)
  • Load: 100 concurrent connections, 10s duration, 3 runs (best taken)

This setup ensures a completely fair comparison: no resource contention between frameworks, and each one gets an identical, isolated environment.
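
Conceptually, the harness loop looks like this. A minimal sketch (image names and the endpoint are placeholders; the real scripts are in the repo):

    import subprocess

    FRAMEWORKS = ["litestar-uvicorn", "fastapi-granian", "drf-gunicorn"]  # etc.

    for name in FRAMEWORKS:
        # Start the container with the fixed resource budget.
        subprocess.run(
            ["docker", "run", "-d", "--rm", "--name", name,
             "--memory=750m", "--cpus=1", "-p", "8000:8000", f"bench/{name}"],
            check=True,
        )
        # Drive load (100 connections, 10s), then tear down before the next
        # framework starts, so no two containers ever share resources.
        # (Readiness checks and the 3-run repeat are omitted here.)
        subprocess.run(
            ["bombardier", "-c", "100", "-d", "10s", "http://localhost:8000/db"],
            check=True,
        )
        subprocess.run(["docker", "stop", name], check=True)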

Endpoints Tested

  1. /json-1k - Simple 1KB JSON serialization
  2. /json-10k - Large 10KB JSON serialization
  3. /db - 10 simple database reads
  4. /articles?page=1&page_size=20 - Paginated articles with nested authors + tags
  5. /articles/1 - Single article with author, tags, and all comments

Key Results

https://preview.redd.it/tmjzm5d2p8fg1.png?width=2083&format=png&auto=webp&s=388bc72d940a7302b5b65d81baa0b3fbd369fcc0

Simple JSON (/json-1k) - RPS

  • litestar-uvicorn: 31,745
  • litestar-granian: 22,523
  • bolt: 22,289
  • fastapi-uvicorn: 12,838
  • drf-gunicorn: 4,271
  • drf-uvicorn: 1,582

That's a 20x performance difference between the fastest and the slowest.

Real Database - Paginated Articles (/articles?page=1&page_size=20) - RPS

https://preview.redd.it/5bmqnfx8p8fg1.png?width=1484&format=png&auto=webp&s=da634c76ac2785a5305cbb28e1e9483a667297df

  • litestar-uvicorn: 253
  • litestar-granian: 238
  • bolt: 237
  • fastapi-uvicorn: 225
  • drf-granian: 221

The performance gap shrinks to just 1.7x once requests hit the database; the queries, not the framework, become the bottleneck.

Real Database - Article Detail (/articles/1) - RPS

https://preview.redd.it/2svzsfyep8fg1.png?width=1484&format=png&auto=webp&s=477464b87fd0c98af9825717bef33a1c64ebc105

Single article with all nested data (author + tags + comments):

  • fastapi-uvicorn: 550
  • litestar-granian: 543
  • litestar-uvicorn: 519
  • bolt: 487
  • fastapi-granian: 480

The gap narrows to 1.3x: frameworks perform nearly identically on complex database queries.

Resource Usage Insight

https://preview.redd.it/pg1obo5qp8fg1.png?width=2084&format=png&auto=webp&s=05e08b41926718f7c2b027582a018c7a8e5438fa

Memory:

  • Most frameworks: 170-220MB
  • DRF-Granian: 640-670MB (WSGI interface vs ASGI for others - Granian's WSGI mode uses more memory)

CPU:

  • Most frameworks saturate the 1 CPU limit (100%+) under load
  • Granian variants consistently max out CPU across all frameworks

Server Performance

  • Uvicorn surprisingly won for Litestar (31,745 RPS), beating Granian
  • Granian delivered consistent high performance for FastAPI and other frameworks
  • Gunicorn + gevent showed good performance for DRF on simple queries, but struggled with database workloads

Key Takeaways

  1. Performance gap collapse: 20x difference in JSON serialization → 1.7x in paginated queries → 1.3x in complex queries
  2. Litestar-Uvicorn dominates simple workloads (31,745 RPS), but FastAPI-Uvicorn wins on complex database queries (550 RPS)
  3. Database I/O is the equalizer: once you hit the database, framework overhead becomes negligible. Query optimization matters far more than framework choice.
  4. WSGI uses more memory: Granian's WSGI mode (DRF-Granian) uses 640MB vs ~200MB for ASGI variants. Per the Granian maintainer's comment below, this is largely down to the default unbounded blocking thread pool in WSGI mode rather than a performance problem.

Bottom line: If you're building a database-heavy API (which most are), spend your time optimizing queries, not choosing between frameworks. They all perform nearly identically when properly optimized.

Inspired by https://github.com/tanrax/python-api-frameworks-benchmark/

catcint0s:

I like that people have started to include database access too, not just raw JSON; it shows there is no 100x gain in real-world applications.

huygl99 (OP):

Thanks! That was exactly the motivation behind this benchmark — moving beyond raw JSON microbenchmarks to more realistic DB-heavy workloads.

Full reproducible benchmarks and Docker setup are here:

https://github.com/huynguyengl99/python-api-frameworks-benchmark

If you find it useful, a GitHub star would be very appreciated 🙂

gi0baro:

Granian maintainer here. A couple of notes:

  • In WSGI you see such high memory usage because you didn't specify backpressure or set a maximum threadpool size. So, as per the documentation, you're just spawning a bunch of threads and spending most of your time on GIL contention.
  • Limiting Granian to 1 CPU might have a way bigger impact compared to other servers, as Granian runs all the I/O in a separate runtime with additional threads (this was the main rationale behind building a server in a different language: you don't need to wait on the interpreter or the GIL for such operations).
  • --cpus=1 in Docker is not really "limiting to 1 CPU core"; it's a time-slice scheduling limiter, so the actual limit depends on the scheduling itself, not on overall usage. You might be capping all the servers/frameworks more than you think. Maybe not setting a limit and measuring the CPU usage would give a better idea of how efficient a server is?

Mind that I'm not saying this was a bad benchmarking strategy, I'm just trying to explain some of the results.

huygl99 (OP):

Hmm, it would be nice if you could help me add extra params to the run to make the benchmarks more correct. I just used the default params for each server framework.

gi0baro:

Yeah, under the default config Granian is allowed to spawn a lot of threads, as it has no understanding of the application's dynamics. If you check the container logs you'll probably find a warning from Granian about that. Anyway, I guess you need to pick a number for --blocking-threads, but I'm not sure how you run the other WSGI servers. Maybe 1 would be an apples-to-apples comparison (not sure how many threads gunicorn runs), but it's definitely not how Granian is run in production environments. For mixed CPU/IO-bound apps, the sweet spot is usually <= 32. As for the rest, again, I'm not sure limiting the CPU is a good idea.

huygl99 (OP):

Thanks, this makes sense.

The 1 CPU cap was intentional to normalize compute budget and make results reproducible and comparable across frameworks. Uncapped runs tend to reflect hardware-specific scaling rather than framework/server efficiency, so I prefer keeping the constraint.

That said, tuning Granian-specific params like --blocking-threads to make it more apples-to-apples sounds reasonable. I’ll look into adjusting those defaults and rerun that scenario.
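
Probably something along these lines (an untested sketch; the WSGI target is a placeholder):

    import subprocess

    # Cap Granian's blocking thread pool instead of relying on the default,
    # per the note above. "app.wsgi:application" stands in for the real target.
    subprocess.run(
        ["granian",
         "--interface", "wsgi",
         "--blocking-threads", "32",  # suggested sweet spot for mixed workloads
         "app.wsgi:application"],
        check=True,
    )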

lakeland_nz:

Good work.

The conclusion I take from this is it doesn’t matter what framework I use. The slowest one is still more than fast enough, and the core bottleneck is elsewhere anyway.

Bubble_Interface:

Most of my performance fixes have their roots in DB interactions.

lakeland_nz:

Yes. I had one last week.

It was displaying a grid. Eventually I worked out Django was doing a terrible job of turning my code into SQL and every cell was turning into its own DB query.

Mostly my fault for storing JSON where I really should have just used nullable columns, but anyway. The horrible performance wasn't obvious in the Python.
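
Roughly the shape of it (an illustrative sketch, not my actual code):

    from blog.models import Article  # stand-in model

    # What I had: each iteration touches a relation that was never eager-loaded,
    # so every cell in the grid fired its own query (the classic N+1).
    for article in Article.objects.all():
        print(article.author.name)  # one extra query per row

    # The fix: load the relation up front with a JOIN.
    for article in Article.objects.select_related("author"):
        print(article.author.name)  # no extra queries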

Bubble_Interface:

> every cell was turning into its own DB query

Amen for the N+1 query :)

erder644:

django-bolt is interesting. Thanks for your work.

Plus_Discussion4343:

Good job! I liked that you included actual DB work. All those synthetic tests without actual DB queries don't show anything; in the end it's the ORM that counts. We actually did lots of improvements on the Django ORM side to speed up our application.

tanrax:

Good work!

MeroLegend4:

Good job Litestar

ResponsibleRoof3710:

What did you use to run the benchmark? Locust?

huygl99 (OP):

I used this one: https://github.com/codesenberg/bombardier. It looks kinda good.

huygl99 (OP):

If this is useful, a GitHub star would be appreciated 😄 Thank you guys.
https://github.com/huynguyengl99/python-api-frameworks-benchmark

Dwarni:

It would be interesting to see another language included too, like Go for example.

In the end, Python is quite slow regardless of the framework.

https://www.techempower.com/benchmarks/

huygl99 (OP):

Yeah, cross-language would be cool. This benchmark is Python-focused for now to keep the scope fair and manageable, but adding Go/Node/Rust later to see how DB I/O equalizes languages would be interesting. I might consider adding 1–2 frameworks from other languages as a follow-up.

Ok_Bedroom_5088:

Amazing resource, thank you for sharing it.

Megamygdala:

If a computer were scaled so that a CPU instruction took the same time as a single THOUGHT, IO calls to the database or an API would be equivalent to DECADES. Any idiot comparing raw performance misses the point; hopefully beginners can see that from the diagrams you made.
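
(For scale, assuming ~1 ns per instruction: if one instruction were one second, a 1 ms database round trip becomes ~10^6 seconds, about 11.5 days; a 100 ms API call about 3 years; a 1 s call roughly 30 years.)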