A Complete Noobs Guide to Hacking Nginx

At Urban Airship our RESTful HTTP API uses PUT requests for, among other things, registering a device. Since the application registering the device is the HTTP Basic Auth username, there’s often no body (entity body in HTTP parlance). Unfortunately nginx (as of 0.8.54, and I believe 0.9.3) doesn’t support PUT requests without a Content-Length header and responds with a 411 Length Required response. While the chunkin module adds Transfer-Encoding: chunked support, it doesn’t fix the empty PUT problem since HTTP requests without bodies don’t require Content-Length nor Transfer-Encoding headers.

So let’s hack nginx shall we?

I know a bit of C but am primarily a Python developer, so hacking an established C project doesn’t come easily to me. To make matters worse, as far as I can tell there’s no official public source repository (but there’s a mirror) and it seems to be mainly developed by the creator, Igor Sysoev. At least the code looks clean.

First Pass

I had nginx-0.8.54.tar.gz handy from compiling the source and nginx was nice enough to log an error for PUTs without Content-Length:

client sent PUT method without “Content-Length” header while reading client request headers, client: …, server: , request: “PUT / HTTP/1.1″ …

So let’s use ack to find it:

ack-grep 'client sent .* method without'

A quick vim +1532 src/http/ngx_http_request.c later and we’re looking at the problem:

    if (r->method & NGX_HTTP_PUT && r->headers_in.content_length_n == -1) {
        ngx_log_error(NGX_LOG_INFO, r->connection->log, 0,
                  "client sent %V method without \"Content-Length\" header",
                  &r->method_name);
        ngx_http_finalize_request(r, NGX_HTTP_LENGTH_REQUIRED);
        return NGX_ERROR;
    }

This code returns the 411 Length Required response for PUTs lacking a Content-Length header. Remove it, recompile (make -j2 && sudo make install && sudo service nginx restart), and test:

$ curl -vk -X PUT -u '...:...' http://web-0/api/device_tokens/FE66489F304DC75B8D6E8200DFF8A456E8DAEACEC428B427E9518741C92C6660
* About to connect() to web-0 port 80 (#0)
* Trying 10.... connected
* Connected to web-0 (10....) port 80 (#0)
* Server auth using Basic with user '...'
> PUT /api/device_tokens/FE66489F304DC75B8D6E8200DFF8A456E8DAEACEC428B427E9518741C92C6660 HTTP/1.1
> Authorization: Basic ...
> User-Agent: curl/7.19.5 (i486-pc-linux-gnu) libcurl/7.19.5 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.15
> Host: web-0
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: nginx/0.8.54
< Date: Tue, 28 Dec 2010 23:06:54 GMT
< Content-Type: text/plain
< Transfer-Encoding: chunked
< Connection: keep-alive
< Vary: Authorization,Cookie,Accept-Encoding
<
* Connection #0 to host web-0 left intact
* Closing connection #0

Success! Now create a patch diff -ru nginx-0.8.54 nginx > fix-empty-put.patch and post it to the nginx-devel mailing list.

Now to play Minecraft for 12 hours as you wait for the Russian developers to wake up and take notice of your patch. Possibly sleep.

Second Pass: Fixing WebDAV

A positive reply from Maxim Dounin to my patch! I don't use WebDAV though, but if I want this patch accepted I better make sure it doesn't break official modules.

This time around I wanted to work locally, so I installed nginx with the following configuration:


./configure --prefix=/home/schmichael/local/nginx --with-debug --user=schmichael --group=schmichael --with-http_ssl_module --with-http_stub_status_module --with-http_gzip_static_module --without-mail_pop3_module --without-mail_imap_module --without-mail_smtp_module --with-http_dav_module

Note that I set the prefix to a path in my home directory, turned on debugging and the dav module, and set nginx to run as my user and group. A quick symlink from /home/schmichael/local/nginx/sbin/nginx to ~/bin/nginx, and I can start and restart nginx quickly and easily. More importantly I can attach a debugger to it.

The importance of being able to attach a debugger became clear as soon as I tested dav support (with their standard config):

$ curl -v -X PUT http://localhost:8888/foo
* About to connect() to localhost port 8888 (#0)
* Trying ::1... Connection refused
* Trying 127.0.0.1... connected
* Connected to localhost (127.0.0.1) port 8888 (#0)
> PUT /foo HTTP/1.1
> User-Agent: curl/7.21.0 (x86_64-pc-linux-gnu) libcurl/7.21.0 OpenSSL/0.9.8o zlib/1.2.3.4 libidn/1.18
> Host: localhost:8888
> Accept: */*
>
* Empty reply from server
* Connection #0 to host localhost left intact
curl: (52) Empty reply from server
* Closing connection #0

My patch was causing a segfault in the dav module that killed nginx's worker process. Bumping up my error logging to debug level didn't give me many clues:

2010/12/28 10:32:55 [debug] 15548#0: *1 http put filename: "/home/schmichael/local/nginx/dav/foo"
2010/12/28 10:32:55 [notice] 15547#0: signal 17 (SIGCHLD) received
2010/12/28 10:32:55 [alert] 15547#0: worker process 15548 exited on signal 11

Time to break out the debugger! While I've used gdb --pid to attach to running processes before, I'd just installed Eclipse to work on some Java code and wondered if it might make debugging a bit easier.

After installing plugins for C/C++, Autotools, and GDB, I could easily import nginx by creating a "New Makefile project with existing code":
Import existing code

Now create a new Debug Configuration:
Debug Configuration

Note on Linux systems (at least Ubuntu): by default PTRACE is disabled in the kernel. Just flip the 1 to 0 in /etc/sysctl.d/10-ptrace.conf and run sudo sysctl -p /etc/sysctl.d/10-ptrace.conf to allow PTRACE.

Finally click "Debug" and select the nginx worker process from your process list:
Select Process

By default GDB will pause the process it attaches to, so make sure to click the Resume button (or press F8) to allow nginx to continue serving requests.

Crashing nginx
Now cause the segfault by running our curl command curl -v -X PUT http://localhost:8888/foo. This time curl won't return because gdb/Eclipse caught the segfault in the nginx child process, leaving the socket to curl open. A quick peek in Eclipse shows us exactly where the segfault occurs:
Debugging in Eclipse

Eclipse makes it quick and easy to interactively inspect the variables. Doing that I discovered the culprit was the src variable being uninitialized. Bouncing up the stack once you can see dav's put handler expects to be given a temporary file (&r->request_body->temp_file->file.name) full of PUT data (of which we sent none), and it copies that to the destination file (path).

Bounce up the stack again to ngx_http_read_client_request_body and you can see this relevant code:

if (r->headers_in.content_length_n < 0) {

nginx's core HTTP module short circuits a bit when there's no Content-Length specified. It skips the temp file creation because there's no data to put into the temp file!

So we have our problem:

  1. The dav module put handler expects a temp file containing the data to be saved.
  2. The http module doesn't create a temp file when there's no body data.

The 2 solutions I can think of are:

  1. Always create a temp file, even if it's empty.
  2. Add a special case to the dav module's put handler for when the temp file doesn't exist.

I really don't want to hack the core http module just to make a sub-module happy. It makes sense that no temporary file exists when there's no body data. Sub-modules shouldn't be lazy and expect it to exist. So I decided to try #2.

The Fix

You can see my implementation of solution #2 on GitHub. Simply put, if the temp file exists, follow the existing logic. If the temp file does not exist we have a PUT with an empty body: use nginx's open wrapper to do a create or truncate (O_CREAT|O_TRUNC) on the destination file (since an empty PUT should create an empty file).

I don't know if this is the best solution or even a correct one, but it appears to work and was a fun journey arriving at it. You can follow the discussion on the mailing list.

Updated to switch from bitbucket to github.

This entry was posted in Open Source, Technology and tagged , , , . Bookmark the permalink.
  • http://magarshak.com Gregory Magarshak

    It sounds like what you’ve done amounts to removing a check for an error condition. In this case that check itself might not follow the standard. But removing it might now cause a bunch of possible error conditions with different modules, not just WebDAV. Perhaps it’s better to have a config setting so you could turn this check on and off, since you aren’t aware of all the code that might break as a result.

    This is an interesting point — how does a developer check which functionality would break if they changed the behavior in the core? An automated test suite should fully document the API, and any module relying on undocumented behavior should submit an additional test to the core — that’s the only thing I can think of. What do you think?

  • http://michael.susens-schurter.com/blog/ Michael Schurter

    @Gregory:
    In the Python world I’d definitely write tests to try to discover the impact of my change, however I don’t see a test suite for nginx (yikes) and am not personally familiar with unit testing in the C world.

    As far as the setting goes, I think that’d do more harm then good. Empty PUTs are an entirely sane thing to support, and I’m afraid a setting would just allow module bugs to hide longer.

    That being said your fears are completely legitimate, and I’d be happy to see empty PUT support added any way possible including via a setting.

  • http://flameforged.net Chris McDonald

    Extensive guide, was an educational read. I know a good number of people that have never played with gdb, I may direct them to this so they can see some of the power.

  • http://softboysxp.com softboysxp

    I had the same problem with a vendor’s REST APIs, as their server is not under my control, I just send “Content-Length: 0″ without a body (or you could consider it as an empty one)

  • Bron

    Nginx is not “an established C project”, it’s the insane product of Igor’s twisted mind. It’s a brilliant totally asynchronous giant state machine – which is how it’s so blindingly fast – because everything is non-blocking, and everything is super efficient.

    That said, half the efficiency comes because everything is totally open-coded. Every piece of state machine is right there in hand-spun C. Amazingly long-winded code, which makes it hard to understand immediately what’s going on if you don’t already have the entire state machine in your head!

    Which is probably why your patch causes crashes – because removing a test for “no content length header” is going to make things go ouch. If nothing else, you really don’t want to be allowing keep-alive if you’re not reading a content length.

  • Samuel

    Thanks for this post buddy.

    I have always been using gdb from the command line/Emacs but seems like Eclipse can make stuff a bit easier.

  • http://thepumpkin.posterous.com/ Johan Hernandez

    Thanks, It was pretty helpful.

    j.h

  • Nick Stielau

    Hey,
    Nice post, digging into the code is always a good way to gain a better understanding. After trying this patch and others I still wasn’t able to get what I wanted (bodiless PUTs for a RESTlike API). I have to agree with Bron, excising from Nginx internals is likely to propagate weirdness, which is what I saw (did indeed get the bodiless PUTs working, but caused some other content-length related issues). Thanks to Nginx’s 0-downtime new-binary upgrades it is a pretty smooth process to attempt. Cheers!

  • Joshua Thompson

    Great blog; Easy to understand, and follow.

    Just fixed my site so the same wouldn’t happen to it.
    http://www.centralprint.co.nz

  • Asim

    Hi, can you please message me regarding using this post on a project? Thanks!

  • Asim
  • Michael Schurter

    You may consider all content on this page and the linked patch public domain. I don’t care to enforce copyright on any of it.