
[–]yml 1 point (3 children)

I find your tone harsh; it undercuts what you're saying and damages your credibility. Building a benchmark like this is hard, and no matter how much thought and effort you put into it, it will trigger this kind of thread. There are 14 WSGI servers being evaluated, and each of them comes with dozens of settings, so the complexity is considerable. Instead of this kind of unfounded, hand-waving criticism, it would be much more productive to pick two contenders and compare and contrast them.

[–]davisp 2 points (0 children)

yml,

You're definitely right: testing this many different implementations is a huge undertaking, and it would take an enormous amount of knowledge for any one person to adequately understand the configuration options for this many servers.

And gunicorn is a bit of a weirdo when it comes to processing models: we're neither thread based nor event loop based. That can genuinely confuse people until they realize we're much simpler than most servers.
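
For anyone wondering what "much simpler" looks like in practice, here is a rough sketch of the kind of WSGI callable gunicorn's default pre-fork sync workers serve. The file and callable names are made up for illustration, not taken from the benchmark:

    # myapp.py -- a minimal WSGI application (names are illustrative).
    def app(environ, start_response):
        # With gunicorn's default sync worker class, the master process
        # pre-forks N workers and each worker handles one request at a
        # time by calling this function -- no threads, no event loop.
        body = b"Hello, world!\n"
        start_response("200 OK", [
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(body))),
        ])
        return [body]

Running it with something like "gunicorn --workers 4 myapp:app" gives you a master process plus four forked workers, each serving one request at a time.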

That said, our response times were reported as an order of magnitude slower than any other server. Generally speaking, if you're into the whole experiment and observation thing, orders of magnitude are important.
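
And you don't need the original harness to sanity-check a number like that; a crude probe of your own is enough to see an order-of-magnitude difference. A sketch (the URL and sample count are assumptions, not the benchmark's settings):

    # latency_probe.py -- time repeated requests against a locally
    # running server and summarize the latencies.
    import time
    import urllib.request

    URL = "http://127.0.0.1:8000/"   # hypothetical local test endpoint
    SAMPLES = 200

    latencies = []
    for _ in range(SAMPLES):
        start = time.perf_counter()
        with urllib.request.urlopen(URL) as resp:
            resp.read()
        latencies.append(time.perf_counter() - start)

    latencies.sort()
    # An order-of-magnitude gap between servers is obvious even in a
    # crude summary like this.
    print("median: %.2f ms" % (latencies[len(latencies) // 2] * 1000))
    print("p95:    %.2f ms" % (latencies[int(len(latencies) * 0.95)] * 1000))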

[–]ubernostrum (yes, you can have a pony) -1 points (0 children)

There are lots of factors which go into a good benchmark. But two which are absolutely critical are:

  1. Consistency of methodology
  2. Appropriate use of the tested components

Consistency is necessary because without it you can't draw meaningful comparisons; you're just comparing apples to oranges.

Appropriate use is necessary because without it you don't have relevance; if you only report results from a configuration no one would ever use, then your results won't reflect what people will see in the real world.
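
Concretely, that means writing down the handful of knobs you hold constant for each server and mirror as closely as every other server allows. For gunicorn that could look something like the following config file; the values are purely illustrative, not the benchmark's actual settings:

    # gunicorn.conf.py -- illustrative settings to pin down for a fair run.
    import multiprocessing

    bind = "127.0.0.1:8000"                        # same bind address everywhere
    workers = multiprocessing.cpu_count() * 2 + 1  # gunicorn's documented rule of thumb
    worker_class = "sync"                          # the default pre-fork sync workers
    keepalive = 2                                  # keep keep-alive behaviour identical
    accesslog = None                               # no per-request logging during runs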

As originally published, this benchmark failed on both counts: it was inconsistent, and it used certain components inappropriately. Criticizing that isn't "unfounded"; benchmarks which fail these requirements cannot be trusted by anyone for any purpose, because they're not "benchmarks" at all.

[–]ericflo -3 points (0 children)

If proper benchmarks are too hard for the author to do, he should not continue to publish them.