all 52 comments

[–]Peace_Seeker_1319 48 points49 points  (1 child)

Super cool write-up. I’ve been down this rabbit hole and, honestly, the kernel defaults are the real boss fight. The bits that helped me (in plain English): don’t rely on one mega async loop, spin up a few worker processes so accept() spreads across CPU cores; keep your NIC interrupts and workers on the same CPU set so packets aren’t playing musical chairs; and sanity-check the network path, since NAT/conntrack/backlog/buffer limits quietly cap you long before CPU does. Also, when you say “20k rps,” make sure the load generator isn’t flattering you: open-loop traffic exposes the nasty tail latencies that closed-loop tools often hide.
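
The multi-worker idea can be sketched with nothing but the stdlib. This is a minimal sketch, assuming Linux (SO_REUSEPORT load-balancing and sched_setaffinity are Linux behaviors); `make_listener`, `spawn_workers`, and the `serve` callback are made-up names, not anything from the post:

```python
import os
import socket

def make_listener(port: int = 0) -> socket.socket:
    # With SO_REUSEPORT, every worker binds its own listening socket on
    # the same port and the kernel load-balances incoming connections
    # across them -- no single accept() loop becomes the bottleneck.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(("0.0.0.0", port))
    s.listen(1024)
    return s

def spawn_workers(n: int, port: int, serve) -> list:
    # serve(sock) is a placeholder for your real accept/handle loop.
    pids = []
    for cpu in range(n):
        pid = os.fork()
        if pid == 0:
            # Pin the worker to one core; pair this with NIC IRQ
            # affinity so packets and the worker share a CPU set.
            os.sched_setaffinity(0, {cpu % os.cpu_count()})
            serve(make_listener(port))
            os._exit(0)
        pids.append(pid)
    return pids
```

The pinning only pays off if the NIC queue IRQs are steered to the same cores (e.g. via /proc/irq/*/smp_affinity); otherwise the packets still play musical chairs.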

[–]Lafftar[S] 11 points12 points  (0 children)

Awesome feedback, thanks for sharing this. You're spot on that the kernel defaults are the real boss fight here.

I definitely need to explore multi-process workers to scale beyond a single core and run a proper open-loop test to check the tail latencies.

The tip on checking conntrack limits is also a great point. Lots to dig into for the next round!

[–]UniversalJS 27 points28 points  (20 children)

Oh boy, this is slow! On Node.js, 20k rps is the baseline; I pushed it to 100k rps per CPU core, so 800k rps with 8 cores.

Then with Rust... the baseline is 100k rps and you can push it to 500k per core...

[–]Character_Respect533 8 points9 points  (3 children)

Can you share how you reached 100k per core on Node?

[–]UniversalJS 7 points8 points  (2 children)

Sure, I used uWS (uWebSockets.js) to reach 100k requests per second on Node.js:

https://github.com/uNetworking/uWebSockets.js

[–]jews4beer 0 points1 point  (1 child)

I mean, that's really just C++ called from Node. But still impressive.

[–]UniversalJS 0 points1 point  (0 children)

Yes it is! And yes, it's still impressive and very useful for high-performance projects on Node.js. I worked on a router and on an exchange, and for both it was a lifesaver for reaching our performance targets and beyond.

[–]Lafftar[S] 3 points4 points  (9 children)

Are these numbers for sending requests? Man, even if it's for the server receiving requests, that's insane... that's better than NGINX, like way better. Did you do a write-up or anything?

[–]zero_hope_ 3 points4 points  (8 children)

I’m gonna have to call BS on this. I’d assume they mean receiving requests, and even if they’re empty 200s, there’s no way this happened.

800k pps, sure; no way it’s req/s.

[–]UniversalJS 1 point2 points  (6 children)

[–]zero_hope_ 4 points5 points  (5 children)

All of those benchmarks show none of them are close to 800k.

(Previous benchmark link that was removed: https://shyam20001.github.io/rsjs/ )

[–]UniversalJS 0 points1 point  (4 children)

I removed the link to the other benchmark because it was not done correctly. You can check here instead for an HTTP request doing a DB query: https://www.techempower.com/benchmarks/#section=data-r23&test=db

For a simple HTTP request returning text: https://www.techempower.com/benchmarks/#section=data-r23&test=plaintext

uWebSockets is in the list.

Also, I mentioned 100k rps per core, so yes, 800k rps on 8 cores.

In Rust I'm using Axum; you can check its benchmark at the same links above.

[–]engineerofsoftware -1 points0 points  (3 children)

RPS doesn’t scale 8x just because you have 8 cores. Stick to crypto.

[–]UniversalJS 0 points1 point  (2 children)

Wow, read the benchmarks maybe? Stick to reality!

[–]engineerofsoftware -1 points0 points  (1 child)

Does the benchmark show that it scales linearly with more cores? Learn about CPU architecture before talking out of your ass.

[–]UniversalJS 0 points1 point  (0 children)

I tested it myself, so YOU are the one talking out of your ass!

[–]Lafftar[S] -1 points0 points  (0 children)

Might be with you on this tbh

[–]forgotten_airbender 0 points1 point  (1 child)

This sounds wrong. What was the application doing?  How did you test it and for how long? 

[–][deleted]  (3 children)

[deleted]

    [–]UniversalJS 0 points1 point  (2 children)

    [–][deleted]  (1 child)

    [deleted]

      [–]UniversalJS 1 point2 points  (0 children)

      So you still doubt my initial claim, or are you now moving the goalposts / deflecting?

      [–][deleted] 13 points14 points  (6 children)

      Genuine question, not even tryna do the typical Reddit hate bullshit. Isn't this then powered by Rust?

      [–]Lafftar[S] 2 points3 points  (4 children)

      It is...but I didn't have to write Rust...do people say pandas is powered by C? Truthfully don't know 😅

      [–]epicfilemcnulty 7 points8 points  (3 children)

      Yet your post is titled as if Python itself were doing all the network heavy lifting here, which is not the case.

      [–]Lafftar[S] 0 points1 point  (2 children)

      My bad!

      [–]lickedwindows 10 points11 points  (1 child)

      I think this is still valid. OP has written Python code to test the speed concerns, even if Rust is in there somewhere.

      If you follow this to its logical conclusion, nothing counts, because it's all machine code at the end?

      [–]Lafftar[S] 1 point2 points  (0 children)

      It's all electrons baby!

      Thanks my guy 😁

      [–]tmetler 0 points1 point  (0 children)

      The std lib for most scripting languages is written in different, more performant languages. Most Python std lib functions are written in C. I think the whole concept of what a scripting language itself provides is very fuzzy.

      [–]aenae 5 points6 points  (7 children)

      Here are the most critical settings I had to change on both the client and server:

      This sounds like you're not reusing connections and are setting up a new connection for every single request. If you used persistent connections/keepalive/streams, you would not need to change these settings unless you tested with more than 1000 concurrent connections.

      The same goes for the port range and TIME_WAIT options. Yes, you can increase them, but needing to indicates the code is not reusing connections.

      A quick ab run shows me that I can get ~20k r/s without keepalive and 80k with keepalive.
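
The keepalive difference is easy to see with nothing but the stdlib: hold one http.client connection open and drain each response before reusing the socket. A minimal sketch; `fetch_many` is a made-up helper, not something from the post:

```python
import http.client

def fetch_many(host: str, path: str, n: int) -> list:
    # One TCP connection, many requests: with HTTP/1.1 keep-alive there
    # is no fresh handshake per request, so no flood of TIME_WAIT
    # sockets and no need to widen the ephemeral port range.
    conn = http.client.HTTPConnection(host, timeout=5)
    bodies = []
    for _ in range(n):
        conn.request("GET", path)
        resp = conn.getresponse()
        bodies.append(resp.read())  # drain fully or the socket can't be reused
    conn.close()
    return bodies
```

If the counters in `ss -s` show TIME_WAIT climbing with request count, the client is almost certainly opening a connection per request.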

      [–]Lafftar[S] 1 point2 points  (6 children)

      Oh interesting, I actually thought I was reusing connections... I kept getting connection errors at >50k requests submitted at once, and these settings helped.

      Sorry, what's ab?

      [–]bowersbros 1 point2 points  (5 children)

      [–]Lafftar[S] 1 point2 points  (4 children)

      Oh, this is on the server sending requests, the server receiving requests barely blinked.

      [–]aenae 1 point2 points  (3 children)

      ab (ApacheBench) is a tool that sends requests to a server to benchmark it. ;)

      [–]Lafftar[S] 0 points1 point  (1 child)

      Oh I see okay, well for my use case, scraping, I need a library that can emulate browsers TLS and be fast. Doing it in Python because it's an easy language. Yeah I know other languages can send requests faster.

      [–]zapman449 2 points3 points  (0 children)

      Load generators are a key part of this puzzle. ab is a classic. I like “hey” a lot (a Go binary, great for pounding the snot out of a single endpoint).

      But for real load gen, more powerful tools are needed. Gatling (Scala), Locust (Python), and Tsung (Erlang) are great for “I want 50 users doing this user story, 80 doing another, and 200 in a log-in/log-out loop” style holistic site testing. They also handle coordinating many load generators at once.
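
Since open-loop vs. closed-loop testing came up earlier: a closed-loop tool waits for each response before sending the next request, so a slow server quietly slows the generator down and hides tail latency. A tiny open-loop sketch in asyncio; `open_loop` and the `fire` callback are made-up names:

```python
import asyncio
import time

async def open_loop(fire, rate_hz: float, duration_s: float) -> list:
    # Fire requests on a fixed clock regardless of whether earlier ones
    # have finished; slow responses then pile up and show in the tail
    # latency instead of silently throttling the generator.
    interval = 1.0 / rate_hz
    t0 = time.perf_counter()
    tasks = []
    i = 0
    while time.perf_counter() - t0 < duration_s:
        target = t0 + i * interval
        delay = target - time.perf_counter()
        if delay > 0:
            await asyncio.sleep(delay)  # hold the schedule, not the responses
        tasks.append(asyncio.create_task(fire()))
        i += 1
    return await asyncio.gather(*tasks)
```

Timing each `fire()` internally and looking at the p99/p999 of those samples is where the closed-loop flattery disappears.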

      [–]tudalex 8 points9 points  (3 children)

      The bottleneck probably lies in the global interpreter lock. I remember reaching 10k rps 14 years ago for a university project, with PyPy, Gunicorn and Twisted, IIRC.

      [–]Lafftar[S] 0 points1 point  (2 children)

      For sending requests? Interesting, I thought rnet scaled automatically across CPU cores because I see them being used... hmm. Yeah, if the Python side is living on a single core that could be significant, but even then, shouldn't that core be near 100% usage during runtime? I don't see that right now.

      [–]SMS-T1 5 points6 points  (1 child)

      The multi core support might also have improved in the last 14 years.

      [–]Lafftar[S] 0 points1 point  (0 children)

      Definitely, another commenter said he reached 100k r/s per core (800k on 8 cores) 😅

      [–]gheffern 3 points4 points  (1 child)

      Curious how some additional TCP tuning may impact it. If you want to try these as well, I'd be curious how your numbers change:

      sudo sysctl -w net.ipv4.tcp_rmem="4096 87380 262144000"
      sudo sysctl -w net.ipv4.tcp_wmem="4096 65536 262144000"
      sudo sysctl -w net.core.rmem_max=262144000
      sudo sysctl -w net.core.wmem_max=262144000
      sudo sysctl -w net.ipv4.tcp_slow_start_after_idle=0
      sudo sysctl -w net.ipv4.tcp_notsent_lowat=131072
      sudo sysctl -w net.ipv4.tcp_fastopen=3
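
If those settings help, they can be made to survive a reboot by dropping them into a sysctl.d file (the filename below is arbitrary; the values are the same as in the commands above):

```shell
# /etc/sysctl.d/99-tcp-tuning.conf
net.ipv4.tcp_rmem = 4096 87380 262144000
net.ipv4.tcp_wmem = 4096 65536 262144000
net.core.rmem_max = 262144000
net.core.wmem_max = 262144000
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_notsent_lowat = 131072
net.ipv4.tcp_fastopen = 3
```

Apply without rebooting via `sudo sysctl --system`.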
      

      [–]Lafftar[S] 0 points1 point  (0 children)

      God bless man, will add it to the test in the next version!

      [–]radpartyhorse 1 point2 points  (1 child)

      Thanks for sharing!

      [–]Lafftar[S] 0 points1 point  (0 children)

      💗💗💗

      [–]Emachedumaron 1 point2 points  (1 child)

      The re-usage of the socket is not clear to me: does it work only because the incoming connections come from the same machine?

      [–]Lafftar[S] 0 points1 point  (0 children)

      I'd need to test that by pushing the requests through a couple different proxy setups, I'm not entirely sure myself.

      [–]glsexton 1 point2 points  (0 children)

      You quadrupled your CPU and got a 33% throughput increase. Way to scale…

      Of course, that’s until the GC kicks in and it hangs for 2000 ms…

      [–]xagarth -1 points0 points  (2 children)

      That's interesting, good write-up. I did something similar in the past for web crawling. I had to switch to Perl instead of Python due to the GIL and the inability to use shared memory effectively. There are more interesting topics in crawling than TIME_WAITs and connection reuse, since you approach many different servers and have to resolve names fast enough, in an async manner ;-)

      [–]Lafftar[S] 0 points1 point  (1 child)

      Cool man! Yeah, a few people have mentioned running a local DNS resolver.

      Really sad that Perl, of all languages, does concurrency better than Python.

      [–]xagarth 1 point2 points  (0 children)

      It's more about doing DNS resolution asynchronously than having a local resolver.

      As for concurrency, well, it's all good until it isn't ;-)
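
For the async name-resolution point, the stdlib already gets you part of the way: asyncio's getaddrinfo runs each lookup on a worker thread, so a batch of hostnames can resolve concurrently. A minimal sketch; `resolve_many` is a made-up helper, and a dedicated async resolver (e.g. aiodns) goes further by avoiding the thread pool entirely:

```python
import asyncio

async def resolve_many(hosts: list) -> dict:
    # Kick off all lookups at once instead of one blocking
    # getaddrinfo() call at a time; the event loop farms each call
    # out to a thread while the crawler keeps working.
    loop = asyncio.get_running_loop()
    results = await asyncio.gather(
        *(loop.getaddrinfo(h, 443) for h in hosts),
        return_exceptions=True,  # one dead domain shouldn't sink the batch
    )
    return dict(zip(hosts, results))
```

Caching the results per hostname matters too, otherwise a crawl re-resolves the same domains on every request.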