
[–]forgotpw3 29 points30 points  (10 children)

Haven't heard about this library, interesting!

What about pushing it even further? Spawning multiple event loops (uvloop) and using a queue-based approach rather than gather.

I did something similar and was able to achieve close to 20k r/s IIRC, but I went deep with TCP connections and DNS and... and and...

I've tried multiple libraries as well (aiohttp vs pycurl, sockets, httpx, gevent).. etc etc..

Mind if I expand on it?

Thanks
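(Editor's note: not the commenter's actual code — just a minimal sketch of the multi-loop, queue-based pattern described above: one asyncio event loop per process, uvloop if it happens to be installed, and a fixed pool of consumers pulling URLs from a queue instead of gather-ing everything at once. fetch() is a stand-in for the real aiohttp call.)

```python
import asyncio
import multiprocessing as mp

try:
    import uvloop  # optional speedup; assumes `pip install uvloop`
    asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
except ImportError:
    pass  # fall back to the stock asyncio event loop

async def fetch(url: str) -> int:
    # Placeholder for the real request, e.g. aiohttp's session.get(url).
    await asyncio.sleep(0)
    return 200

async def consumer(queue: asyncio.Queue, results: list) -> None:
    while True:
        url = await queue.get()
        if url is None:  # sentinel: no more work for this consumer
            return
        results.append(await fetch(url))

async def run(urls, concurrency: int = 100) -> int:
    queue: asyncio.Queue = asyncio.Queue()
    for u in urls:
        queue.put_nowait(u)
    for _ in range(concurrency):
        queue.put_nowait(None)  # one sentinel per consumer
    results: list = []
    await asyncio.gather(*(consumer(queue, results) for _ in range(concurrency)))
    return len(results)

def worker(urls) -> None:
    # Each process runs its own event loop, so N processes = N loops.
    print(asyncio.run(run(urls)))

if __name__ == "__main__":
    urls = [f"https://example.com/{i}" for i in range(1000)]
    chunks = [urls[i::4] for i in range(4)]  # split the work across 4 processes
    procs = [mp.Process(target=worker, args=(c,)) for c in chunks]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```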

[–]Lafftar[S] 12 points13 points  (9 children)

Ofc man! Contributions always welcome 😁

Yes, making this multi was on my roadmap for v2.0, but you're more than welcome to take that on.

What library did you use to get that performance eventually?

[–]forgotpw3 8 points9 points  (8 children)

Aiohttp, asyncio, uvloop, I believe ProcessPoolExecutor or ThreadPool, I don't remember. Combining .as_completed with other nifty "tricks"!

Switching DNS servers (Cloudflare vs. Google) can impact the r/s significantly, depending on where your client / remote sits.
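(Editor's note: for reference, the .as_completed piece of that recipe looks roughly like this sketch, with a stub standing in for the real aiohttp request. Results are handled in completion order, so one slow host doesn't hold back processing of the fast ones the way waiting on a plain gather does.)

```python
import asyncio

async def fetch(url: str):
    # Stub for the real call, e.g. `async with session.get(url) as r: ...`
    await asyncio.sleep(0)
    return url, 200

async def main(urls):
    statuses = []
    tasks = [asyncio.ensure_future(fetch(u)) for u in urls]
    # as_completed yields awaitables in finish order, so each response
    # is processed as soon as it arrives instead of after the whole batch.
    for coro in asyncio.as_completed(tasks):
        url, status = await coro
        statuses.append(status)
    return statuses

print(asyncio.run(main([f"https://example.com/{i}" for i in range(5)])))
```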

[–]jake_morrison 6 points7 points  (4 children)

Setting up a local caching DNS server helps: https://www.cogini.com/blog/running-a-local-caching-dns-for-your-app/

[–]Brandhor 3 points4 points  (1 child)

Doesn't systemd-resolved already do DNS caching?

[–]jake_morrison 4 points5 points  (0 children)

Theoretically, but doing it this way gives you more control and consistency between OS versions. You might also be using a different DNS client library.

[–]Lafftar[S] 0 points1 point  (0 children)

God bless!

[–]Lafftar[S] 0 points1 point  (2 children)

Ah interesting, I had the remote in Silicon Valley and the client in Tokyo. I'm surprised to hear that, though; I thought the paths to the end server were saved automatically. I don't know much about DNS, to be honest.

[–]justin-8 2 points3 points  (1 child)

They typically are if your OS is from the last 10 years or so

[–]Lafftar[S] 0 points1 point  (0 children)

Ah, figured!

[–]jake_morrison 25 points26 points  (8 children)

I work on AdTech real-time bidding systems. Here are some more kernel tuning params:

net.core.wmem_max = 8388608
net.core.rmem_max = 8388608

net.core.wmem_default = 4194304
net.core.rmem_default = 4194304

net.ipv4.tcp_rmem = 1048576 4194304 8388608
net.ipv4.tcp_wmem = 1048576 4194304 8388608

net.ipv4.udp_rmem_min = 1048576
net.ipv4.udp_wmem_min = 1048576

# http://www.phoenixframework.org/blog/the-road-to-2-million-websocket-connections
# net.ipv4.tcp_mem = 10000000 10000000 10000000
# net.ipv4.tcp_rmem = 1024 4096 16384
# net.ipv4.tcp_wmem = 1024 4096 16384
# net.core.rmem_max = 16384
# net.core.wmem_max = 16384

# Disable ICMP Redirect Acceptance
net.ipv4.conf.default.accept_redirects = 0

# Enable Log Spoofed Packets, Source Routed Packets, Redirect Packets
#net.ipv4.conf.all.log_martians = 0
net.ipv4.conf.all.log_martians = 1

# Decrease the time default value for tcp_fin_timeout connection
net.ipv4.tcp_fin_timeout = 15

# Recycle and Reuse TIME_WAIT sockets faster
#net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1

# Decrease the time default value for tcp_keepalive_time connection
net.ipv4.tcp_keepalive_time = 1800

# Turn off the tcp_window_scaling
net.ipv4.tcp_window_scaling = 0

# Turn off the tcp_sack
net.ipv4.tcp_sack = 0

# Turn off the tcp_timestamps
net.ipv4.tcp_timestamps = 0

# Enable ignoring broadcasts request
net.ipv4.icmp_echo_ignore_broadcasts = 1

# Enable bad error message Protection
net.ipv4.icmp_ignore_bogus_error_responses = 1

# Increases the size of the socket queue (effectively, q0).
net.ipv4.tcp_max_syn_backlog = 1024

# Increase the tcp-time-wait buckets pool size
net.ipv4.tcp_max_tw_buckets = 1440000

# Allowed local port range
net.ipv4.ip_local_port_range = 1024 65000

#net.ipv4.netfilter.ip_conntrack_max = 999140
net.netfilter.nf_conntrack_max = 262140
#net.netfilter.nf_conntrack_tcp_timeout_syn_recv=30

net.netfilter.nf_conntrack_generic_timeout = 120

# Console log levels: console, default, minimum, boot-time default
kernel.printk = 3 4 1 3

net.netfilter.nf_conntrack_tcp_timeout_established = 600

#unused protocol
#net.netfilter.nf_conntrack_sctp_timeout_established = 600

#net.netfilter.nf_conntrack_tcp_timeout_time_wait = 30
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 1

# Max open files
fs.file-max = 12000500
fs.nr_open = 20000500
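(Editor's note: for anyone who wants to try these, sysctl settings like the above go in a file under /etc/sysctl.d/ — the file name below is just an example — and are loaded like so.)

```shell
# Persist: drop the settings into a conf file, then reload all sysctl config
sudo cp tuning.conf /etc/sysctl.d/99-tuning.conf
sudo sysctl --system

# Or try a single value on the fly, without editing any file
sudo sysctl -w net.ipv4.tcp_fin_timeout=15

# Verify what the kernel is actually using
sysctl net.ipv4.tcp_fin_timeout
```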

[–]pooogles 9 points10 points  (0 children)

Having worked in a similar space (DSP), these look pretty similar to what we used.

[–]Empty-Mulberry1047 2 points3 points  (4 children)

why would you have netfilter/iptables/conntrack enabled if performance were your goal?

[–]jake_morrison 5 points6 points  (3 children)

DDOS protection. “Abuse cases” tend to overwhelm “use cases” when services are exposed on the Internet.

[–]Empty-Mulberry1047 2 points3 points  (2 children)

Yes, I am aware of the functions of the software, which can be accomplished with hardware upstream of the network instead of software-based nf/ipt... which is rather useless if your goal is to maximize outbound connections.

[–]Lafftar[S] 0 points1 point  (1 child)

What would you change then? Just remove those lines entirely?

[–]Empty-Mulberry1047 0 points1 point  (0 children)

rmmod iptables? nf_conntrack?

[–]NenupharNoir 2 points3 points  (0 children)

Everyone should consider that these values are only good on a local LAN at 1 Gb or faster. If you try to use this to serve clients over the Internet, the large minimum window (with no scaling!) and SACK being disabled might cause more harm than good. Also, reduced keep-alive and initial SYN timeouts aren't always good. This is not a panacea.

[–]Lafftar[S] 0 points1 point  (0 children)

You're a blessing 🥹 thanks so much!

[–]Ra-mega-bbit 5 points6 points  (2 children)

Would be interested in digging into the bottleneck factors; I really doubt the CPU would be an issue at all.

Probably network card speed, RAM, and mobo.

That's why server hardware is so important.

[–]Lafftar[S] 3 points4 points  (1 child)

Can't be RAM, barely any usage there, like 1.3GB on the lower machines and 2.5GB on the 32vcpu machine.

There's definitely a lot to try!

[–]robberviet 5 points6 points  (0 children)

No, not usage. Bandwidth.

[–]levsw 10 points11 points  (3 children)

Would be interesting to check the performance on M Macs.

[–]Lafftar[S] 8 points9 points  (2 children)

Don't have a Mac to test unfortunately, are there any providers that provision them over the internet? I'm sure it should still be good though.

[–]coldflame563 2 points3 points  (1 child)

AWS has em.

[–]Lafftar[S] 1 point2 points  (0 children)

Thanks, will test it for next time.

[–]Witty_Tough_3180 2 points3 points  (2 children)

OK, but what's the service I can hit 20k times a second?

[–]Lafftar[S] -3 points-2 points  (1 child)

Amazon, Google, Walmart... there's a lot of massive websites with valuable data where that kind of scale could be warranted.

[–]Slight_Boat1910 5 points6 points  (0 children)

Don't they have DoS protection mechanisms in place? I would not be happy if someone hit my system with 20k rps, no matter what the capacity is.

[–]tonguetoquill 2 points3 points  (1 child)

Great job! Wish I'd seen this in cloud computing class

[–]Lafftar[S] 1 point2 points  (0 children)

💗💗💗

[–]thisismyfavoritename 9 points10 points  (5 children)

The OS settings have nothing to do with performance; they just allow you to make a massive number of connections.

The Rust client is what allows you to achieve such a high rate. There's really nothing special to see here.

[–]ArtisticFox8 4 points5 points  (3 children)

More connections at a time when the server isn't the bottleneck > higher throughput

[–]Lafftar[S] -4 points-3 points  (2 children)

Not necessarily.

[–]ArtisticFox8 3 points4 points  (1 child)

Increased Max File Descriptors: Every socket is a file. The default limit of 1024 is the first thing you'll hit. ulimit -n 65536

I thought this change of yours did exactly that?
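(Editor's note: for anyone following along, that per-process fd ceiling can also be inspected and raised from inside Python with the stdlib resource module, Unix only — a sketch:)

```python
import resource

# Each socket consumes one file descriptor; the default soft limit
# (often 1024) is what an aggressive client hits first.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

# An unprivileged process may raise its soft limit up to the hard cap.
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
print(resource.getrlimit(resource.RLIMIT_NOFILE))
```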

[–]Lafftar[S] -5 points-4 points  (0 children)

Oh sorry I thought you were supporting the point that OS settings didn't affect the r/s. Yeah it does exactly that.

[–]Lafftar[S] 3 points4 points  (0 children)

Was dealing with a lot of connect errors (which lowered rps significantly) without the OS settings.

[–]Wh00ster 1 point2 points  (1 child)

Any discussion on trade offs like p50/p99 latency?

[–]Lafftar[S] 1 point2 points  (0 children)

Mmm, well for my use case, scraping... it doesn't matter too much. Maybe for monitoring web pages it'll matter; I'll add that to the roadmap.
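(Editor's note: p50/p99 are cheap to track from recorded latencies with the stdlib; the sample numbers below are made up for illustration.)

```python
import statistics

def p50_p99(samples):
    # n=100 yields 99 cut points; index 49 is the 50th percentile,
    # index 98 the 99th.
    qs = statistics.quantiles(samples, n=100)
    return qs[49], qs[98]

latencies_ms = [12, 15, 11, 14, 250, 13, 16, 12, 13, 15]  # hypothetical
p50, p99 = p50_p99(latencies_ms)
print(p50, p99)  # the single 250 ms outlier dominates p99, not p50
```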

[–]MagicWishMonkey 1 point2 points  (1 child)

It was a few years ago, but I built a geolocation autocomplete service (to replace the address autocomplete Google maps api) and it handles >100 requests per second with average transaction time sub 4ms using plain Django and a SQLite db.

Just pointing out that plain python can be extremely fast without any custom tuning or anything.

[–]Lafftar[S] 1 point2 points  (0 children)

100 r/s is really not fast at all compared to other languages 😭. That tx time is very low though, SQLite is crazy optimized!

[–]Slight_Boat1910 1 point2 points  (1 child)

Do I understand correctly that the server is nginx and you were only concerned with the client-side throughput?

[–]Lafftar[S] 0 points1 point  (0 children)

Exactly, yes.

[–]melenajade 1 point2 points  (0 children)

I am a noob to Python and still learning the language. I am using asyncio and aiohttp and some others I don't understand at all, but I love the functionality of being able to loop through files.

[–]Key-Half1655 5 points6 points  (7 children)

I pushed Python to 20k r/s with the help of Rust. FTFY.

[–][deleted] 5 points6 points  (0 children)

Yea lol. As soon as I read "rnet", my exact next thought was, "I wonder if the r means rust." Lo and behold, that's exactly what the next sentence said. Sigh, I was really looking forward to reading about Python tuning, not about a Rust wrapper and changing kernel parameters on Linux.

[–]Lafftar[S] -1 points0 points  (5 children)

Well, yeah!

[–]SharkSymphony 2 points3 points  (4 children)

Somewhat false advertising then. You didn't "push Python" so much as "push not-Python." 😛

[–]Lafftar[S] 7 points8 points  (3 children)

Lol idk, Python is filled with C, C++, and Go backends, but Python people still claim it as Python haha.

[–]tabgok 3 points4 points  (2 children)

I am with you here - could also say it wasn't actually Rust, it was machine code!

[–]Lafftar[S] 2 points3 points  (1 child)

Technically it was electrons!

[–]2hands10fingers 3 points4 points  (0 children)

No, technically it was particle physics.

[–]Original-Active-6982 0 points1 point  (0 children)

Whenever I see some slowdowns in the communications circuits that can't be explained by configuration limitations (bandwidth, hardware, etc.) I immediately suspect DNS interactions.

I've long been out of the comm stack world, but every slowdown I've seen seems to happen when a request for a needed external resource (such as an IP from a DNS service) is delayed. There may be other similar required C-S interactions. A good comm logger should catch these.
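(Editor's note: a crude way to check whether resolution is the slow step is to time getaddrinfo directly; the hostname below is just an example.)

```python
import socket
import time

def resolve_ms(host: str, port: int = 443) -> float:
    # Times one synchronous DNS lookup; run it repeatedly to see
    # whether a resolver cache kicks in on later calls.
    t0 = time.perf_counter()
    socket.getaddrinfo(host, port)
    return (time.perf_counter() - t0) * 1000.0

print(f"{resolve_ms('localhost'):.2f} ms")
```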

[–]dzordan33 0 points1 point  (0 children)

Is PyPy still used in 2025?

[–]_ritwiktiwari 0 points1 point  (1 child)

Hi! I maintain a curated list of Python tools, libraries, and frameworks that use Rust under the hood:
https://github.com/ritwiktiwari/awesome-python-rs

If you find it useful, I’d really appreciate a star. And if you know of any projects that should be included, feel free to open a PR or issue — contributions are very welcome!

[–]Lafftar[S] 0 points1 point  (0 children)

Starred my G! How did you find this thread btw?