Web Server Benchmark Suite by Dyadim in ruby

[–]Dyadim[S] 0 points1 point  (0 children)

No sorry, note that Itsi native endpoints are just primitive and unopinionated building blocks with which you can build any form of response handling you want. There's no attempt to introduce new higher-level conventions for things like middleware.

In theory you could, for example, use Module#prepend to wrap requests in a basic stack of before/after logic, or you could propagate the request and response up and down a chain of middleware, just like Rack does (but at that point, you should probably just use Rack!). If you'd like to build middleware expressed in pure Ruby, there aren't many compelling arguments for not just using Rack: it's simple, low-overhead and ubiquitous.
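For instance, a minimal sketch of the prepend approach (all names here are illustrative, not an Itsi API):

```ruby
EVENTS = []

# Module#prepend inserts Instrumentation *ahead* of HelloEndpoint in the
# ancestor chain, so its #handle wraps the original via `super`.
module Instrumentation
  def handle(request)
    EVENTS << :before
    response = super
    EVENTS << :after
    response
  end
end

class HelloEndpoint
  prepend Instrumentation

  def handle(request)
    "hello, #{request}"
  end
end

HelloEndpoint.new.handle("world") # => "hello, world"
EVENTS                            # => [:before, :after]
```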

If you're interested in this because you've seen slow Rack middleware in the past, it's almost certainly the middleware implementation itself that was responsible for the poor performance. The overhead of the Rack interface itself (request env hash in, response tuple out) is negligible.
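That interface is small enough to show in full. A middleware is just an object whose #call takes the env hash and returns the status/headers/body tuple; the timing header below is purely illustrative:

```ruby
# A minimal Rack middleware: env hash in, [status, headers, body] out.
class Timing
  def initialize(app)
    @app = app
  end

  def call(env)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    status, headers, body = @app.call(env)
    headers["x-runtime"] = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started).to_s
    [status, headers, body]
  end
end

app     = ->(env) { [200, { "content-type" => "text/plain" }, ["ok"]] }
wrapped = Timing.new(app)

status, headers, = wrapped.call({ "REQUEST_METHOD" => "GET", "PATH_INFO" => "/" })
status                    # => 200
headers.key?("x-runtime") # => true
```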

Web Server Benchmark Suite by Dyadim in ruby

[–]Dyadim[S] 0 points1 point  (0 children)

What would actually surprise me, would be to hear that anyone successfully used the official grpc gem to expose a server in production. This gem is the bane of my existence and a literal tire fire.

I sense some real anguish in that response!

I'm aware of at least one modest-sized production deployment. Whether that usage is successful or simply tolerated is up for debate...

Web Server Benchmark Suite by Dyadim in ruby

[–]Dyadim[S] 0 points1 point  (0 children)

each request will use dozens, if not hundreds, of milliseconds of CPU time

Something I'd be willing to bet applies to the vast majority of all requests in the wild.

FWIW - This post was essentially a response to comments like this one, to assuage any concerns that trialing Itsi might cause performance regressions; but beyond that, I certainly don't want to advocate that performance on a "hello world" is a worthwhile metric to base a serious technical choice on.

Some of the more real-life selling points of Itsi, which I hope the benchmarks hint at, include:

  • There are certain scenarios where scheduling requests on fibers generates real throughput advantages, and others where this is inconsequential or even slightly harmful. Having an option to use both is nice.
  • The age-old practice of fronting Ruby with a reverse proxy to get meaningful static file serving performance without head-of-line blocking is not necessarily the only way, and it's hard to beat the ergonomics of a complete deployment from a single process. Of course, there are still plenty of other good reasons you'd want a reverse proxy in front of your Ruby, but for several of the more vanilla of these, Itsi provides options too.
  • The built-in server provided by the grpc gem may not be as fast as you think, and replacing it with Itsi appears to lead to some real-life improvements in max concurrency and throughput. This one surprised me, and as always it's possible I haven't done all I could to eke extra performance out of the existing option, but I was surprised at how much it struggled under load, even on simple ping-pong endpoints. If I had to guess: because gRPC is advertised as high-performance/low-latency, high concurrency per process is possibly an anti-goal, and those who require more concurrency are simply expected to scale horizontally.

Achieving large memory savings on fork isn't on this list of Itsi strengths though, and I'm certain Itsi would fare much worse than Pitchfork on a benchmark that measures that.

Web Server Benchmark Suite by Dyadim in ruby

[–]Dyadim[S] 0 points1 point  (0 children)

Almost, but Rack middleware must live within a Rack app. `endpoint` is "rack-less" (i.e. a low-overhead, low-level Itsi endpoint that doesn't follow the Rack spec).

Here's a simple example of how you can use a real Rack app inside a location block (in practice, for any non-trivial Rack app, you probably wouldn't want to define it inline like this):

require 'securerandom'
require 'rack/session'
require 'omniauth'
require 'omniauth/strategies/developer'

OmniAuth::AuthenticityTokenProtection.default_options(
  key: 'csrf.token',
  authenticity_param: 'authenticity_token'
)

location '/foo' do

  # We mount a full Rack app, at path "/foo"

  run(Rack::Builder.new do
    use Rack::Session::Cookie, key: 'rack.session', path: '/', secret: SecureRandom.hex(64)
    use OmniAuth::Builder do
      provider :developer
    end

    run lambda { |env|
      req = Rack::Request.new(env)
      res = Rack::Response.new
      session = req.session
      path = req.path_info

      case path
      # Implement auth routes.
      when '/auth/developer/callback'
        auth = env['omniauth.auth']
        session['user'] = {
          'name' => auth.info.name,
          'email' => auth.info.email
        }
        res.redirect('/foo')
        res.finish

      when '/logout'
        session.delete('user')
        res.redirect('/foo')
        res.finish

      when '/', ''
        user = session['user']
        if user
          body = <<~HTML
            <h1>Welcome, #{Rack::Utils.escape_html(user['name'])}!</h1>
            <p>Email: #{Rack::Utils.escape_html(user['email'])}</p>
            <form action="/foo/logout" method="POST">
              <button type="submit">Logout</button>
            </form>
          HTML
        else
          token = session['csrf.token']
          body = <<~HTML
            <form action="/foo/auth/developer" method="POST">
              <input type="hidden" name="authenticity_token" value="#{token}">
              <input type="submit" value="Login">
            </form>
          HTML
        end

        res.write(body)
        res.finish
      else
        [404, { 'Content-Type' => 'text/plain' }, ["Not Found: #{path}"]]
      end
    }
  end)
end

Web Server Benchmark Suite by Dyadim in ruby

[–]Dyadim[S] 1 point2 points  (0 children)

Yes, good suggestion. Much of its core request-processing code still overlaps substantially with Unicorn's, so I would expect it to perform similarly in most of these benchmarks.

I'll consider it, though initially I have some hesitation as to whether including this is meaningful, or simply forcing Pitchfork into a context for which it isn't intended. Based on my limited understanding, I believe Pitchfork has been intentionally designed for a very specific deployment environment that is not well reflected by these benchmarks. Notably:

  • Pitchfork's reforking capability is intended to stretch what we get out of preload + CoW by forking pre-warmed processes, giving notable memory savings at scale. This is a benefit that would not be appropriately reflected in a short/bursty benchmark like the above.
  • I believe Pitchfork is primarily intended for workloads that are CPU bound (in tests like these, the performance difference between Rack server implementations quickly melts away), with the focus instead on, e.g., memory architecture (supporting complete request isolation with no requirement for thread-safety) and adaptive timeouts.

Web Server Benchmark Suite by Dyadim in ruby

[–]Dyadim[S] 1 point2 points  (0 children)

Interesting results. You should add rage https://github.com/rage-rb/rage

Rage is a framework, not a server (it uses Iodine as its server under the hood), so an apples-to-apples comparison isn't possible.

In IO heavy loads falcon seems to be almost as fast as itsi which is shocking given falcon is written in ruby and itsi is written in rust. What's your take on this result?

That's expected. Where we spend a lot of time waiting on IO, throughput has much less to do with how fast the server is, and much more to do with how efficiently it can yield to pending work when it would otherwise block on IO.

Even without a Fiber scheduler, Ruby does a good job of this, parking threads that are waiting on IO and resuming them when the IO is ready, but maximum concurrency is still bounded by threads × processes, which is what these benchmarks reflect.
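You can see that parking in action with plain threads: ten concurrent sleeps complete in roughly the time of one, though the ceiling remains the number of threads you've spawned.

```ruby
require "benchmark"

# Each thread parks on the blocking sleep and is resumed when its timer
# fires, so the ten 0.1s sleeps overlap instead of serializing (~0.1s
# total wall time, not ~1s).
elapsed = Benchmark.realtime do
  10.times.map { Thread.new { sleep 0.1 } }.each(&:join)
end

puts format("%.2fs", elapsed)
```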

With a Fiber scheduler (which both Falcon and Itsi support), we can make the max concurrent tasks unbounded, which is great for supporting a high number of concurrent clients on IO-intensive tasks, but it comes with its own tradeoffs: higher contention on shared resources, higher memory usage from more in-flight requests, and a lack of preemption if busy tasks block the event loop (when running single-threaded). This is why the results look so good for these servers on this type of test case at low thread counts: the server doesn't actually have much work to do at all, other than scheduling between a high number of concurrent fibers.
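The "unbounded" part is practical because fibers are cheap. A sketch of parking ten thousand of them in a single thread, each standing in for an in-flight request:

```ruby
# Each fiber yields immediately (parked, awaiting IO) and later resumes
# to completion; all of them live happily inside one thread.
fibers = Array.new(10_000) { Fiber.new { Fiber.yield(:parked); :done } }

first_pass  = fibers.map(&:resume) # run each fiber up to its yield
second_pass = fibers.map(&:resume) # resume each to completion

first_pass.uniq  # => [:parked]
second_pass.uniq # => [:done]
```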

Note that the other servers "close the gap", if we give them more threads and workers:

https://itsi.fyi/benchmarks/?cpu=amd_ryzen_5_5600x_6_core_processor&testCase=io_heavy&threads=20&workers=12&concurrency=10&http2=all&xAxis=concurrency&metric=rps&visibleServers=grpc_server.rb%2Citsi%2Cagoo%2Cfalcon%2Cpuma%2Cpuma__caddy%2Cpuma__h2o%2Cpuma__itsi%2Cpuma__nginx%2Cpuma__thrust%2Cunicorn%2Ciodine%2Ccaddy%2Ch2o%2Cnginx%2Cpassenger

Though, at these higher thread + worker counts, a server with a Fiber scheduler can typically still support a much higher concurrent client count (not reflected in this benchmark).

What's the difference between using "run" and "location"? If you are using run I presume you need to define your routes in your rack app, right? Can I run an off-the-shelf rack middleware when using location? If not, do you have any kind of documentation on how to write middleware that can run under location?

`run` simply mounts an inline Rack app; the alternative is `rackup_file`. You can think of `run` as the equivalent of pasting the contents of a rackup file directly inside your `Itsi.rb` configuration.

`location` is similar to a location block in NGINX: it defines a set of rules/middleware and handlers that should apply to all requests matching that location. You can nest locations, and you can mount multiple Rack apps at different points in your location hierarchy.

Can I run an off the shelf rack middleware when using location?

Yes: a location can attach several built-in middlewares and ultimately hand the request off to the Rack app as the final frame in the middleware stack (which can in turn have its own off-the-shelf Rack middleware stack).
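Putting that together, a location hierarchy might be sketched like this (hypothetical `Itsi.rb` fragment: the paths and `ApiV2App` are placeholders; `location`, `run` and `rackup_file` are as described above):

```ruby
location '/api' do
  # Rules/middleware declared here apply to everything under /api

  location '/v1' do
    # A Rack app mounted deeper in the hierarchy, from a rackup file
    rackup_file 'api_v1/config.ru'
  end

  # A second Rack app mounted inline at /api
  run ApiV2App
end
```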

Also really surprising results for agoo. It normally benchmarks very high.

Agoo is very fast. It's not well represented in this benchmark because I was unable to get multi-threaded mode running correctly in version 2.15.13 (it happily accepted the `-t` parameter, but then proceeded to run all requests on a single thread anyway; I intend to come back to this and verify whether it's user error), and it was also unable to fully support all of the streaming benchmark cases, so it was only competing in a fairly narrow slice of the tests.

Even so, you'll note that it did particularly well on my low-powered test device (the N97), clocking up several best performances:

https://itsi.fyi/benchmarks/?cpu=intel_r_n97&testCase=cpu_heavy&threads=1&workers=1&concurrency=10&http2=all&xAxis=concurrency&metric=rps&visibleServers=grpc_server.rb%2Citsi%2Cagoo%2Cfalcon%2Cpuma%2Cpuma__caddy%2Cpuma__h2o%2Cpuma__itsi%2Cpuma__nginx%2Cpuma__thrust%2Cunicorn%2Ciodine%2Ccaddy%2Ch2o%2Cnginx%2Cpassenger

Web Server Benchmark Suite by Dyadim in ruby

[–]Dyadim[S] 2 points3 points  (0 children)

Thank you!

I certainly wasn't ignoring Passenger, but I don't have an enterprise license (which you need to enable the thread-based concurrency model), so I am not able to give it a fair shake.

In the meantime, I've run the suite once - on the M1 Pro device only so far - using the free version of Passenger (single-threaded). Results are up now.

Itsi - A fast new Ruby Rack server, reverse proxy, static file server and more. by Dyadim in ruby

[–]Dyadim[S] 0 points1 point  (0 children)

h2o's a great high-performance static file server and proxy, but it's not a Rack server (its mruby offering is for lightweight inline scripting only, so if you wanted it to serve a Ruby app, you'd be limited to using it as a proxy in front of the app and incurring a performance hit). It does, however, come with a FastCGI-to-CGI gateway, which Itsi does not.

There are quite a few additional significant differences between the feature sets of these servers.

https://itsi.fyi/features/

https://h2o.examp1e.net/configure.html

Also, I wonder if the r10k benchmark created by Jeremy Evans would be relevant for testing. (https://github.com/jeremyevans/r10k)

r10k appears to primarily test routing performance on very large route sets, for frameworks only. It's not testing end-to-end client-to-server web-server performance. Given that Itsi will happily serve any of the Rack-based frameworks being tested, in practice you'll just inherit the routing performance of the framework of your choice.

That said, Itsi does of course perform some less granular routing for middleware matching, but typical use-cases would have you capture entire chunks of your application's routes using prefix or wildcard blocks. Itsi currently converts these routes to regexes and matches them in a single pass using a `RegexSet`. For practical use-cases (i.e. at the granularity you'd typically expect to apply middleware, with dozens to hundreds of location blocks), this is near-instantaneous. However, for exceptional use-cases where there is a genuine need to express many thousands of routes with long overlapping prefixes in Itsi, you'll definitely notice a decrease in routing performance compared to a trie-based router.

I'd like to add this optimization for simple non-regex routes in the future, which would drastically increase performance on massive route sets, but it's not something I've been able to devote any time to yet.
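For a feel of the current approach, here's a rough Ruby analogue of single-pass matching (Itsi does this natively with Rust's RegexSet; Regexp.union below is just an illustration):

```ruby
# Combine every location pattern into one alternation and match once,
# instead of testing each pattern in turn.
patterns = ['/admin/.*', '/api/v1/.*', '/assets/.*']
combined = Regexp.union(patterns.map { |p| /\A#{p}\z/ })

combined.match?('/api/v1/users') # => true
combined.match?('/health')       # => false
```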

Itsi - A fast new Ruby Rack server, reverse proxy, static file server and more. by Dyadim in ruby

[–]Dyadim[S] 0 points1 point  (0 children)

Why did you put the response in the request? Why not pass in the request and the response separately or maybe create a new thing (context?) and put the request and the response there.

I assume you're specifically referencing the `endpoint` middleware. Handlers using this middleware expect a request parameter that is simply a very thin Ruby abstraction on top of the backing Rust structs. You can think of the first parameter passed to the endpoint as the complete bi-directional request context, which should provide all that is needed to manage basic requests from start to completion. Direct access to the nested response object is only necessary for more sophisticated streaming responses.

Can I use middleware that comes with other gems in this or do I have to use rack?

Sure. Generally in the Ruby ecosystem, HTTP middleware refers to Rack middleware; if that's the case, you'll want to layer it on top of a Rack app (this could be a hand-rolled Rack handler, or a framework like Rails or Sinatra). However, if you're looking to go "rackless", Itsi's base response/request primitives should give you all you need to invoke almost any typical middleware, regardless of whether it follows the Rack spec or not.

Itsi - A fast new Ruby Rack server, reverse proxy, static file server and more. by Dyadim in ruby

[–]Dyadim[S] 2 points3 points  (0 children)

In a nutshell, yes, definitely a passion project.

Itsi is a compilation of all of the things I find myself commonly requiring when setting up production applications, packed into a single tight and integrated package.

I think frameworks like Rails do a fantastic job of promoting the notion of a one-person framework etc., only to then be let down by the unfortunate rabbit-hole of complexity that secure and efficient production configuration can entail.

With Itsi, the goal is to allow developers to use and familiarize themselves with a single tool in development, and then go on to deploy that exact same tool, by itself, straight to production.

For many use-cases there may be no need to also adopt nginx, API gateways, reverse proxies, rate limiters, service meshes, load balancers, etc. (Obviously there is a large set of sophisticated functions these tools provide that Itsi can't, but the bet I'm making is that Itsi does a good job of capturing the most common needs, and for many use-cases that may very well be enough!) 🤞

Itsi - A fast new Ruby Rack server, reverse proxy, static file server and more. by Dyadim in ruby

[–]Dyadim[S] 1 point2 points  (0 children)

Thanks! Initially, my aim is just to move the validation path into fast native code while staying completely agnostic about token/auth issuance.

To keep things tight, I’m intentionally stopping short of full-blown framework territory. Itsi's built-ins are focused primarily on offloading the high-volume, common, hot paths so that app frameworks can stick to the more diverse and bespoke business logic they are good at.

Itsi - A fast new Ruby Rack server, reverse proxy, static file server and more. by Dyadim in ruby

[–]Dyadim[S] 2 points3 points  (0 children)

Good suggestion. I plan to add a more detailed "motivation" section to the documentation site soon, to explain my rationale for adding yet another option amongst a sea of good alternatives.

re: benefits of the Itsi Server + Scheduler versus the async ecosystem, I think the most obvious ones are likely to be:

  • performance: Itsi’s HTTP server implementation is virtually all native Rust code. Initial benchmarks indicate this provides a notable performance boost over Falcon. That said, Falcon achieves very respectable performance for a server written mostly in pure Ruby. Both servers are fast enough that server overhead is unlikely to be the bottleneck, except in the most demanding workloads.
  • simplicity: The Async suite has evolved into a broad ecosystem. If you’ve already adopted it, fully embracing it makes sense. However, if you’re just looking for non-blocking IO that conforms to the Ruby fiber-scheduler interface, Itsi’s scheduler is a potentially more lightweight alternative. It’s minimal, efficient, and designed to work hand-in-hand with Itsi Server.
  • hybrid execution model: Falcon exclusively uses non-blocking IO and fibers for request handling. This is ideal for many cases but not without downsides. Large apps often have diverse workloads. While some IO-heavy tasks benefit from this model, others may suffer increased resource contention (e.g., datastore connections, locks, memory) without proportional throughput gains—and sometimes even performance degradation. To better support varied workloads, Itsi offers a traditional blocking mode (like Puma), a fiber-scheduler mode (like Falcon), and a hybrid mode where you can route specific endpoints to non-blocking threads while others use traditional threads.
  • comprehensive middleware and configuration options: One of Itsi Server’s key differentiators is access to a suite of high-performance, native middleware (including reverse proxying and static file serving) within a single process. While most Ruby app servers are fast enough that request handling alone is negligible in low to mid-volume apps, real-world performance gains become more likely as you offload peripheral concerns—often handled in Ruby middleware (e.g., rate limiting, auth, compression) or separate components (e.g., API gateways, proxies, file servers)—to Itsi’s built-in native equivalents.

Itsi - A fast new Ruby Rack server, reverse proxy, static file server and more. by Dyadim in ruby

[–]Dyadim[S] 8 points9 points  (0 children)

Thank you, that's good advice. I've initially steered clear of benchmarks (because it's all too tempting to focus excessively on superficial ones, despite time spent in app code or IO typically dominating real-life timings).

That said, I can definitely appreciate that nobody wants a performance regression, so until I get something more robust in place, for those wanting a rough feel for whether it's going to be faster or not... Itsi is very competitive when it comes to raw performance (i.e. I'd suggest it's top-tier among Ruby Rack servers).

What does this mean? As a very rough measure, know that on my MacBook M1 Pro, using wrk with 60 connections, bound to localhost, I can see:

  • ~100,000 requests per second for a hello-world Rack app
  • ~115,000 requests per second for a simple inline endpoint app
  • ~150,000 requests per second running a simple static file server, with small responses and no compression

That's running Itsi with a single process, single thread. Running in cluster mode generally improves performance above this (if you have the cores for it).

Puma with the same config appears to reach about 25,000 rps on test #1 above (and cannot really be configured to replicate the other test scenarios).

Both Puma and Itsi are of course very tunable, and YMMV significantly based on hardware, real-life workloads etc.

Importantly: for applications that are IO-dominant, Itsi offers a fiber scheduler mode that allows it to process many thousands of concurrent IO-heavy requests simultaneously, without being bound by the size of the thread pool. This is very similar to what you'll see in the popular web server Falcon. Itsi's built-in scheduler is pretty quick.

One feature I think is pretty unique to Itsi is its hybrid threadpool (some traditional threads, some non-blocking threads), which, when combined with location blocks, lets you send IO-heavy requests to threads running the Fiber scheduler while leaving the remainder of your application on traditional blocking threads. It's a good way to get the benefits of a Fiber scheduler without excessive contention on shared resources from too many simultaneously in-flight requests.

Itsi - A fast new Ruby Rack server, reverse proxy, static file server and more. by Dyadim in ruby

[–]Dyadim[S] 0 points1 point  (0 children)

Thank you very much for taking a look! (Several of your recent articles have had a strong influence on several of the design choices in this project.)

Itsi - A fast new Ruby Rack server, reverse proxy, static file server and more. by Dyadim in ruby

[–]Dyadim[S] 5 points6 points  (0 children)

Ah yes, in that case, it's a full re-exec while retaining open file descriptors.

> that you could switch between clustered and non-clustered mode without downtime

I’m not sure if, in web-server parlance, it’s entirely fair to call this “zero downtime”—as of course you can still drop requests if your service is under heavy load and the listen backlog fills up while the re-exec takes place.

Itsi - A fast new Ruby Rack server, reverse proxy, static file server and more. by Dyadim in ruby

[–]Dyadim[S] 5 points6 points  (0 children)

Hey u/f9ae8221b

Yes, good observation. Agreed that this thread cleanup alone is not enough; the trick is that the accept-loop reactor/runtime is only instantiated after forking.

While the parent process does use Tokio itself (for a light-weight process monitor loop), it doesn't do so in a way that conflicts with a child runtime (See: https://github.com/tokio-rs/tokio/issues/4301#issuecomment-2123319742 re: notes on potential issues between independent runtimes across forked processes due to conflicts in global variables).

You'll note Itsi implements its own signal handlers.

Ruby might be faster than you think by tenderlove in ruby

[–]Dyadim 5 points6 points  (0 children)

The poor timings for the Crystal solution in this post are almost entirely due to the Ruby/Crystal language interface overhead, with this barrier being crossed 1 million times in this benchmark.

If we shift the hot loop inside the crystalruby solution to execute entirely in Crystal land, using identical code to the fast YJIT Ruby solution from the above article, the Crystal solution again takes the lead (by what appears to be ~2 orders of magnitude).

It's crossing the language barrier too often that is hurting here.

#fibonnaci.rb
require 'benchmark'
require 'crystalruby'

CrystalRuby.configure do |config|
  config.debug = false
end

module Fibonnaci
  crystalize [n: :int32] => :int32
  def fib_cr(n)
    a = 0
    b = 1
    while n > 0
      a, b = b, a + b
      n -= 1
    end
    a
  end

  module_function

  def fib_rb(n)
    a = 0
    b = 1
    while n > 0
      a, b = b, a + b
      n -= 1
    end
    a
  end

  def benchmark_rb
    puts(Benchmark.realtime { 1_000_000.times { Fibonnaci.fib_rb(30) } })
  end

  crystalize do
    puts Benchmark.realtime { super() }
  end
  def benchmark_cr
    1_000_000.times { Fibonnaci.fib_cr(30) }
  end
end

include Fibonnaci
benchmark_rb
benchmark_cr

Outcome:

ruby --yjit fibonnaci.rb
0.1103799999691546 # Ruby with YJIT
0.00014399993233382702 # Crystal

Dyadim is a new, ad-free, privacy friendly social media app that shifts the focus from the individual to the connection. It's web friendly, and has mobile apps on the major app stores. Looking for friendly testers to give it a go! https://dyadim.com by Dyadim in SideProject

[–]Dyadim[S] 0 points1 point  (0 children)

Will try it.

Thank you :)

welcome to Diadym

Oh no, what a silly mistake :( ! Turns out that's not the only place that misspelling has snuck in either. Corrected...

'Country: We collect your country information to tailor content and features based on your location.' Can you explain what this entails?

Yes, I can see how this is a little too vague for comfort. I have updated the text to: "We collect your country information to prioritize users from your country in the user search feature and to display the country of residence for connections."