Crowdsourcing my OS Bridge Talk Proposal

Update: Submitted my proposal, but please still leave me comments! I can still edit my proposal.

Web server performance has been a hot topic for some time now as Apache hasn’t been the only viable option for a few years.

I want to do a talk at OS Bridge doing practical comparisons between the popular web site hosting options available today. There are plenty of requests-per-second comparisons between the major HTTP servers out there, so I want to go beyond that and compare a wide variety of metrics in a wide variety of environments.

Here’s what I’m thinking so far, but I’d love your feedback! I’m really excited about the possibility of presenting at a large conference like OS Bridge and want to make my talk as exciting and useful as possible.

Metrics

  • Requests per second (serial* and concurrent)
  • Memory usage
  • Features (extensibility, modularity, etc)
  • Community (support options, tools/modules available, etc)
  • Portability
  • Reliability (can I crash it? how well does it handle load?)
  • Ease of setup and maintenance (floods of patches? hideous config file?)
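
To make the serial vs. concurrent distinction concrete, here’s a rough sketch of how I might measure it. This is only an illustration, not the harness I’ll actually use: `requests_per_second` and `fake_fetch` are names I made up for this post, and `fake_fetch` just sleeps for 10 ms instead of talking to a real server.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def requests_per_second(fetch, total, concurrency=1):
    """Call fetch() `total` times with `concurrency` workers; return calls/sec."""
    start = time.perf_counter()
    if concurrency == 1:
        # Serial: each request starts only after the previous one finishes.
        for _ in range(total):
            fetch()
    else:
        with ThreadPoolExecutor(max_workers=concurrency) as pool:
            # list() waits for every call to finish and re-raises any exception.
            list(pool.map(lambda _: fetch(), range(total)))
    elapsed = time.perf_counter() - start
    return total / elapsed


def fake_fetch():
    # Hypothetical stand-in for an HTTP request: pretend the server takes 10 ms.
    time.sleep(0.01)


if __name__ == "__main__":
    print("serial:     %.1f req/s" % requests_per_second(fake_fetch, 50))
    print("concurrent: %.1f req/s" % requests_per_second(fake_fetch, 50, concurrency=8))
```

For a real test, `fake_fetch` would be swapped for something that actually hits the server (for example, a function that calls `urllib.request.urlopen` on a test URL and reads the response), and a dedicated load-testing tool would probably give more trustworthy numbers than a Python client.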

Web Servers

The usual suspects (will definitely test):

  • Apache 2.2 (prefork and worker mpms)
  • Nginx 0.6.x (latest stable)
  • Lighttpd 1.4.x (latest stable)

Maybes:

  • Nginx 0.5.x (legacy)
  • Nginx 0.7.x (latest dev)
  • Cherokee (not sure if it’s really popular enough yet to bother)
  • CherryPy and some other language specific or just less known servers would be fun.
  • IIS – It’d be really fun if I could find someone willing to help me run my tests on IIS. I definitely won’t be setting it up myself though!

Test Cases

This is where things become really complicated. I don’t want to just download static files from each server and pretend like that’s a real-world use case, but at the same time the various environments I could come up with are nearly limitless.

I want to strive to cover the most common environments** as well as a few variations that produce interesting results.

  • WordPress (mod-php where applicable and FastCGI)
  • Drupal (mod-php where applicable and FastCGI)
  • Bugzilla, Redmine, and Trac (to test a few non-trivial apps from 3 popular languages)
  • trivial Django app (mod-python & mod-wsgi on Apache, FastCGI everywhere)
  • trivial Ruby on Rails app (need advice from RoR devs on common deployment methods)
  • trivial Zend Framework app (or is there a more popular php framework?)
  • something from the Java world?
  • maybe something fairly obscure like Seaside?

Conclusion

Wow, that seems like a lot to cover thoroughly, but I think the research will be a lot of fun to do.

Please let me know your thoughts either in the comments, on twitter (@schmichael), on IRC (schmichael again), or via e-mail.



* It’s interesting to compare serial to parallel req/sec to give people an idea of how well the technology will scale on a single server. I’ll make sure to note that good serial performance can often be turned into good parallel performance by using a load balancing proxy and multiple instances of your application server.
** All Linux based for simplicity unless someone volunteers to run the tests on other platforms.


6 Responses to Crowdsourcing my OS Bridge Talk Proposal

  1. thesethings says:

    It seems like one thing supporting the proliferation of web servers is their disparate uses. Load-balancing/proxy/reverse-proxying is becoming more and more common, even for very simple, not-so-big sites. (I recently read a great post advocating for a load-balancer helping even a single-server site, and it made sense! :D )
    In any case, I’d love a talk that didn’t feel obligated to compare these servers apples-to-apples, but rather addressed the places each server fits best, major sites using them, etc.

  2. @thesethings: Thanks for the positive feedback. Yeah, I’m really hoping to address the subject as objectively and practically as possible. I don’t think anyone would be happy if the talk somehow turned into a Django vs. RoR or Apache vs. Lighttpd deathmatch.

    Load balancing does make things more complicated. I think I’d probably just address that briefly as in: “Running a load balancer could mitigate bottlenecks like this.” (where “this” is something like servers/applications with poor concurrency).

    There’s so much variability these days I’m just going to have to hand wave at some point. :-)

  3. If you want to add another parameter, you could add a higher level to compare similar configurations among today’s popular web hosts: mediatemple, webfaction, slicehost, linode, GAE, and EC2. Those are obviously not apples-to-apples comparisons, but it would be great to see some practical benchmarks applied to the various services offering the different types of hosting. Even if they could be inferred by the configs they advertise.

    If you really want to put a lot of effort into this, you’re headed towards having a real benchmark suite. So maybe keep the possibility in mind when you prepare your tests.

  4. @Micah: Comparing cross-hosting-providers would definitely be interesting, but each new level of tests basically increases my talk’s complexity and density exponentially. Since this is an open source conference I’d rather focus on just the “higher” software stack rather than including the hosting provider platform which in some cases introduces proprietary components (I’m looking at you GAE).

    You bring up an interesting point about the benchmark suite. The more successful I am at automating my build out and benchmarking, the more technologies and configurations I’ll be able to test. Hopefully at the end of all this I’ll have a benchmarking suite to release that would allow others to extend my work to compare things like GAE vs. EC2 or Linux vs. *BSD.

  5. Just keep in mind that Apache/mod_wsgi has two modes of operation. Daemon mode is the safer option, as it is harder to do too much wrong. With embedded mode you have to be careful, just as with mod_python, that the Apache MPM settings are tuned properly for fat Python web applications. See ‘http://blog.dscpl.com.au/2009/03/load-spikes-and-excessive-memory-usage.html’. Also watch out when comparing different options where one preloads an application and another performs lazy loading. If you aren’t careful to take this into consideration, the initial loading gets counted in the request times, which can make a solution appear worse than it is, when in reality it is only an issue for the first requests. Strive therefore to configure systems to preload where they can.

  6. @Graham: Thanks for the tips. If my talk gets accepted I’ll make sure to post results and configurations as I go. Hopefully I’ll also remember to post them to their respective project’s mailing list as well so maintainers like you can help make sure I’m reproducing realistic conditions.
