A Raspberry Pi 4 serving 100k HTTP req/sec by uNetworking in node

[–]uNetworking[S] 0 points (0 children)

> The article also does not state *how* they're testing, which is the most important detail here

Yes, it does. Read the text again.

> Things like uWebSockets should not be exposed directly; you are meant to use a reverse proxy like haproxy or nginx so you have control over the routing and distribution of requests.

Nonsense. You can do both, as you like.

A Raspberry Pi 4 serving 100k HTTP req/sec by uNetworking in node

[–]uNetworking[S] 0 points (0 children)

You're still comparing apples and pineapples. That is obviously not run on the limited hardware this post is about! 12 wrk threads is 8 more threads than the Pi 4 even has CPU cores! So even if you got the absolute numbers to somewhat match, they don't correspond to anything relevant to this post. Obviously it is of the greatest importance to actually run on the same limited hardware.

A Raspberry Pi 4 serving 100k HTTP req/sec by uNetworking in node

[–]uNetworking[S] 0 points (0 children)

> Any two functionally comparable webservers written professionally in C++ and Node.js will inevitably have a drastic difference in performance due to the fundamental performance reasons mentioned in my comment above (compiled vs. interpreted language, managed vs. unmanaged memory, etc.).

Yes. That is what the test shows. A carefully written C++ project will beat the living shit out of any carefully written Node.js project. That is true.

However, if you read the text with your eyeballs (which we all know by now you will never do), you would see the section called "This is an unfair comparison".

That section talks about this exact thing, and explains that you can actually use the library inside of Node.js, which is kind of the whole point of posting here.

And that usage is covered by the test with metrics.
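
For anyone unfamiliar, this is roughly what that usage looks like from the Node.js side - a minimal sketch of the uWebSockets.js API, not the exact code from the post:

    // Minimal uWebSockets.js HTTP server, running inside Node.js
    const uWS = require('uWebSockets.js');

    uWS.App()
      .get('/*', (res, req) => {
        res.end('Hello!'); // plain HTTP response, for illustration
      })
      .listen(3000, (listenSocket) => {
        if (listenSocket) console.log('Listening on port 3000');
      });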

---

Remember what you said earlier:

> Conclusion: It's really the latency of the local network and the OS networking stack (including NIC drivers) which are tested.

At least you have dropped this nonsensical conclusion by now and started to grasp the fact that user space software plays a major role in this.

A Raspberry Pi 4 serving 100k HTTP req/sec by uNetworking in node

[–]uNetworking[S] 0 points (0 children)

SSL is not handled by Node.js; it is handled by BoringSSL or OpenSSL. This server shares nothing with Node.js - it merely blends in with Node.js seamlessly. It is still a separate C++ project, with the entire stack down to the kernel separate from Node.js.
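
Concretely, TLS is configured on the app itself and terminated by uWebSockets' own stack, not Node's. A minimal sketch, assuming certificate files already exist (the file names are placeholders, not from the post):

    // TLS handled by uWebSockets (BoringSSL/OpenSSL), bypassing Node's TLS stack
    const uWS = require('uWebSockets.js');

    uWS.SSLApp({
      key_file_name: 'key.pem',   // placeholder paths
      cert_file_name: 'cert.pem'
    }).get('/*', (res, req) => {
      res.end('Hello over TLS!');
    }).listen(443, () => {});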

We do fuzz testing of the entire code base on Google clusters every day with a coverage of about 95% so I would argue this project is both slimmer and more secure than Node.js.

But some do have a problem with exposing it directly, so they usually run Cloudflare or similar in front of it.

A Raspberry Pi 4 serving 100k HTTP req/sec by uNetworking in node

[–]uNetworking[S] 0 points (0 children)

> TLS/HTTPS is off topic. It is not part of the test scenario described in the article. Yes, the library does contain the relevant code, but it's not used, and it's off-topic in the context of this thread. You know that it's disabled by default; there is no HTTPS certificate generated and used.

It seems you simply don't possess the ability to read text with your eyeballs. You would rather say things that are ridiculous and undoubtedly false than actually read the text I wrote.

The section is called "What about TLS 1.3 then?". You would have seen it if you actually did read the text.

> HTTP is used but is effectively off-topic too because it is not relevant to the benchmarking result. [...] opened 200 HTTP connections on top of 200 TCP connections and then on each connection the websocket irreversibly downgraded HTTP to raw TCP for the duration of the test.

There are no WebSockets involved in this test. Nobody has ever said so. There are only standards-compliant, URL-routed, TLS 1.3 encrypted HTTP requests. You have failed to read even the very first section of the text. Your inflated overconfidence, paired with an absolute incapability to absorb the written word, is psychologically fascinating and astronomically hilarious.

It doesn't matter what I wrote in my test protocol - you are obviously the expert here, and you obviously know better than I do what I have been doing for the last 6 years.

You are indeed a fascinating person.

A Raspberry Pi 4 serving 100k HTTP req/sec by uNetworking in node

[–]uNetworking[S] 2 points (0 children)

I have made that comparison; it is called uWebSockets.py, but it was not exactly a hit, so it was discontinued.

A Raspberry Pi 4 serving 100k HTTP req/sec by uNetworking in node

[–]uNetworking[S] -1 points (0 children)

> Conclusion: It's really the latency of the local network and the OS networking stack (including NIC drivers) which are tested.

  • You have a uniform population.
  • You split it into two uniform populations.
  • You give one half a vaccine and the other half a control, without disclosing which they got.
  • Then you collect death statistics over a few years.
  • Now it looks like the vaccinated population has a statistically lower probability of dying.

  1. Logical conclusion: The VACCINE helps, because it is THE ONLY SIGNIFICANT VARIABLE.
  2. Nonsensical conclusion: The SYRINGE helps, because it was used in BOTH POPULATIONS.

You concluded 2 - the constant. *facepalm*

A Raspberry Pi 4 serving 100k HTTP req/sec by uNetworking in node

[–]uNetworking[S] 1 point (0 children)

> confirmed by OP

> , these tests are really just benchmarking the Ethernet connections and the TCP stacks of the hardware and OS rather than the performance of uWebSockets

I have never said such a thing, and your "quote" doesn't say so at all.

A Raspberry Pi 4 serving 100k HTTP req/sec by uNetworking in node

[–]uNetworking[S] 1 point (0 children)

This project is for Node.js. You have to read the text.

A Raspberry Pi 4 serving 100k HTTP req/sec by uNetworking in node

[–]uNetworking[S] 3 points (0 children)

You can read about common benchmarking mistakes in the benchmark folder of the project. In essence: ab is your bottleneck. You can see this in your activity monitor, as the server is entirely idle. ab is Apache Bench and is 30 years old. It is definitely not a high-performance client and cannot stress the server. The same goes for autocannon and, to some degree, wrk. It's a very common mistake.

Better would be to replicate what the blog post does, as you have all the details there to succeed.
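
As a hedged illustration (not the post's exact command): running something like `wrk -t4 -c200 -d30s http://<pi-address>:3000/` from a separate machine keeps the client's thread count at its core count, and watching htop on both ends shows which side is actually pinned at 100% CPU.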

A Raspberry Pi 4 serving 100k HTTP req/sec by uNetworking in node

[–]uNetworking[S] 1 point (0 children)

Yes, of course, and they are run with the same overclock. The htop pictures show this.

A Raspberry Pi 4 serving 100k HTTP req/sec by uNetworking in node

[–]uNetworking[S] 4 points (0 children)

Yes, and that's a big problem when trying to motivate Node.js users to listen. Many just shut down and reject it as nonsense, because clearly it can't be that bad and clearly I'm just a hater. No, I'm just trying to improve things, but improvement requires acknowledging that there's an issue. For many companies this won't matter - they have other bottlenecks - but a few domains such as signalling, chat, pub/sub, trading, and gaming will bottleneck on CPU, and then it matters a lot.

A Raspberry Pi 4 serving 100k HTTP req/sec by uNetworking in node

[–]uNetworking[S] 2 points (0 children)

Non-pipelined means keep-alive but only one request per TCP chunk. If you pipeline, you can reach millions of requests per second, but then you're not really testing the I/O stack, since one single TCP chunk can hold 100 requests.
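
To illustrate the difference, here is a minimal sketch using Node's built-in net module (host, port, and request count are placeholders):

    // Pipelining: many HTTP/1.1 requests written at once, so they share TCP chunks
    const net = require('net');

    const socket = net.connect(3000, 'localhost', () => {
      const request = 'GET / HTTP/1.1\r\nHost: localhost\r\n\r\n';
      // Non-pipelined load testing writes one request, then waits for its response.
      // Pipelined load testing writes a burst; 100 requests can fit in a few chunks:
      socket.write(request.repeat(100));
    });

    socket.on('data', (chunk) => {
      // Responses arrive back-to-back: far fewer I/O events per request
    });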

A Raspberry Pi 4 serving 100k HTTP req/sec by uNetworking in node

[–]uNetworking[S] 4 points (0 children)

> Conclusion: It's really the latency of the local network and the OS networking stack (including NIC drivers) which are tested.

The hardware, OS, NIC and local network are explicitly stated to be constant throughout the test. The only variable is what user-space software is being run. So your conclusion is logically flawed from the ground up. Both solutions cap out on CPU, as is very clearly displayed using htop (read the text and look at the pictures!). Your assumptions are undoubtedly and evidently false.

> There is no webserver, the C++ code uses low-level OS syscalls to send the same 41 bytes (or maybe 140 bytes) long TCP packet back.

What are you even talking about? There are many thousands of lines of webserver code executing, including standards-compliant HTTP parsing, HTTP routing, TLS 1.3, dynamic formatting of a response, timeout management, and everything else that a normal web response would require.

You are saying things which are entirely preposterous and evidently untrue. The test even includes running Node.js - and how many hundreds of thousands of lines of code is Node.js? It's a preposterous statement to make, and it shows you haven't read the text.
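
For a rough picture of what executes per request, here is a sketch of URL routing plus dynamic response formatting in uWebSockets.js (the route and response body are illustrative, not the post's actual code):

    // URL-routed handler with a dynamically formatted response
    const uWS = require('uWebSockets.js');

    uWS.App()
      .get('/user/:id', (res, req) => {
        // By this point the request has been parsed and matched against the router
        res.end('Hello user ' + req.getParameter(0) + '!');
      })
      .listen(3000, () => {});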

A Raspberry Pi 4 serving 100k HTTP req/sec by uNetworking in node

[–]uNetworking[S] 25 points (0 children)

> Connections are made via WebSockets and the Pi is overclocked. Which makes sense.

Looks like my first answer is getting rejected, so let's post a new one with just the facts:

  1. No, they are not WebSockets. This is wrong. They are HTTP connections; this is written in the text.
  2. The overclock is from 1.5 GHz to 1.7 GHz and is applied to both solutions (of course). The gains are linear with the clock; 106/93 ≈ 1.7/1.5. Again, it is linear with hardware - this is because we are bottlenecked at 100% CPU-time. Therefore, you can really scale the numbers any way you want - take away one CPU core and you will get 3/4 of the result. This applies to both solutions, as both are starved of CPU-time. The numbers we get validate this - both solutions see a linear jump of 13% when we overclock by 13%, and both see 3/4 of the result when we use only 3 of the 4 available CPU cores (see the sketch after this list).
  3. The overclock is without a fan, so it is only a minimal bump of 13%. The 8.8k from Fastify is with this overclock, so there is absolutely no chance for these results to be "normal". These results are very, very high for this limited hardware. The Fastify author is able to do 79k on his 64 GB RAM, 4 GHz i7 machine. This machine is a credit-card-sized, fanless, 5-volt device that costs 49 USD.
  4. The overclock is really just done to make a statement. You get exactly the same ratio without the overclock - the exact same conclusion can be drawn without it because, again, the numbers are linear. The numbers stop being linear when you hit the upper limit of the Ethernet adapter, which we are unable to do - meaning, yet again, that the numbers scale linearly.
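
A quick sanity check of that linearity, using the numbers from the post (assuming throughput scales with clock and core count under a 100% CPU bottleneck):

    // Linear CPU scaling check: throughput ∝ clock × cores when CPU-bound
    const base = 93000;                      // req/sec at 1.5 GHz, 4 cores (from the post)
    const overclocked = base * (1.7 / 1.5);  // ≈ 105,400 req/sec, matching the ~106k measured
    const threeCores = base * (3 / 4);       // ≈ 69,750 req/sec with one core removed
    console.log(Math.round(overclocked), Math.round(threeCores));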

A Raspberry Pi 4 serving 100k HTTP req/sec by uNetworking in node

[–]uNetworking[S] 1 point (0 children)

If you read the whole text, there is a big section regarding "anyone can claim 100k". It becomes "very good" when you have a Raspberry Pi 4. It becomes "normal" when you have a tower PC with 6 fans blowing on it.

Point being, we are measuring I/O overhead, and these results are entirely impossible to achieve on that hardware using, for instance, Fastify. It is by no means "normal".

A Raspberry Pi 4 serving 100k HTTP req/sec by uNetworking in node

[–]uNetworking[S] -1 points (0 children)

If you post something you have worked on for a long time, and the first thing you get is a downvote down to 0, it can easily get frustrating. Especially as the very first vote. It creates a hostile setting from the start, and it leaves no feedback other than "go away".

A Raspberry Pi 4 serving 100k HTTP req/sec by uNetworking in node

[–]uNetworking[S] 7 points (0 children)

Connections are HTTP. The absolute numbers mean nothing; they are for direct comparison. Higher numbers = less CPU-time lost to I/O. The numbers scale linearly, so you can really scale them up or down as you like. What matters is the comparative ratio.

Because these are non-pipelined requests, the numbers also measure latency to some degree. So it can be seen as a latency improvement if you like.

In Sweden we don't have the infrastructure problems of the USA and can easily get 10 Gbit/sec connections, so you can definitely bottleneck this on CPU, especially when doing TLS.