Asynchronous MongoDB in Tornado with asyncmongo

5 December 2010

EDIT: This blog now runs on Golang

This blog runs on the Tornado web framework. Tornado is one of a handful of C10K web servers that scale concurrent requests through a single-threaded, event-driven, asynchronous architecture. By multiplexing many requests on one thread, a single C10K server can scale to thousands of concurrent requests.
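To make the single-thread idea concrete, here is a minimal sketch. It uses Python's standard asyncio event loop rather than Tornado's own IOLoop, so treat it as an illustration of the model and not of Tornado's internals. The names handle_request and serve are made up for the example, and the short sleep stands in for waiting on a socket.

```python
import asyncio
import threading

async def handle_request(request_id, thread_ids):
    # Record which OS thread served this request.
    thread_ids.add(threading.get_ident())
    # Simulate waiting on I/O; the event loop runs other requests meanwhile.
    await asyncio.sleep(0.01)
    return request_id

async def serve(n):
    thread_ids = set()
    # Start n simulated requests and wait for all of them to finish.
    results = await asyncio.gather(
        *(handle_request(i, thread_ids) for i in range(n))
    )
    return results, thread_ids

results, thread_ids = asyncio.run(serve(1000))
print(len(results), "requests served on", len(thread_ids), "thread(s)")
```

All thousand simulated requests are in flight at once, yet only one thread ever touches them.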

One thread for the entire web server: doesn't that block other requests when hitting the database, you might ask? Absolutely it does, if you're using blocking I/O. Tornado differs from conventional web servers, which use multiple threads to handle concurrent requests. In a traditional threaded web server, each blocking call to the database or a remote service is isolated in its own thread, so other requests keep executing. While multi-threaded servers provide a very clean programming model for concurrency, the overhead of managing many threads can pose challenges when C10K scale is required. Over the past decade, this has led many large web offerings to bet on event-driven servers. Some notable sites using C10K servers include YouTube, Wikipedia, WordPress, FriendFeed, and bit.ly.
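The difference is easy to see in a small experiment. The sketch below again uses the standard asyncio loop as a stand-in for an event-driven server; slow_handler and fast_handler are hypothetical names, and time.sleep plays the part of a blocking database driver.

```python
import asyncio
import time

async def slow_handler(log, blocking):
    log.append("slow:start")
    if blocking:
        time.sleep(0.05)           # blocks the only thread; nothing else runs
    else:
        await asyncio.sleep(0.05)  # yields to the event loop while waiting
    log.append("slow:done")

async def fast_handler(log):
    # A request that needs no I/O at all.
    log.append("fast:done")

async def run(blocking):
    log = []
    await asyncio.gather(slow_handler(log, blocking), fast_handler(log))
    return log

blocking_order = asyncio.run(run(blocking=True))
nonblocking_order = asyncio.run(run(blocking=False))
print(blocking_order)
print(nonblocking_order)
```

With the blocking call, the fast request has to wait for the slow one to finish. With the non-blocking wait, the fast request completes while the slow one is still pending.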

So, what's the catch? While the single-threaded nature of event-driven servers is great for scaling concurrency, additional care is required in the mid-tier to preserve the non-blocking programming style. For remote web service calls, Tornado includes a non-blocking HTTP client.

While this is great for HTTP-based services, most database access in Tornado has typically been blocking. FriendFeed uses a blocking MySQL client under the assumption that if database queries were causing concurrency issues in the mid-tier, the database servers would not be able to scale to the corresponding load. FriendFeed also uses nginx as a load balancer in front of Tornado, which helps to distribute any momentary blocking calls to the database amongst several frontend machines.

Bummer, all that C10K perf only to block on every call to the database? Not a problem anymore for MongoDB users. Bit.ly recently contributed their work on asyncmongo to the community. It provides non-blocking access to MongoDB over the PyMongo API.

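import asyncmongo
import tornado.web

class Handler(tornado.web.RequestHandler):
    # Keep the connection open after get() returns; the response is
    # finished later, from the callback.
    @tornado.web.asynchronous
    def get(self):
        # 27017 is MongoDB's default port.
        db = asyncmongo.Client(pool_id='mypool', host='localhost',
            port=27017, dbname='mydb')

        # Returns immediately; _on_response runs when the query completes.
        db.users.find_one({'username': self.current_user},
            callback=self._on_response)

    def _on_response(self, response, error):
        if error:
            raise tornado.web.HTTPError(500)
        # render() also finishes the request.
        self.render('template', first_name=response['first_name'])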
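
If you want to play with the callback(response, error) convention without a MongoDB server, here is a rough stand-in. It is not asyncmongo: fake_find_one, USERS, and lookup are hypothetical names invented for this sketch, and the standard asyncio loop takes the place of Tornado's IOLoop.

```python
import asyncio

# Hypothetical in-memory stand-in for a collection; not part of asyncmongo.
USERS = {"alice": {"username": "alice", "first_name": "Alice"}}

def fake_find_one(loop, spec, callback):
    """Schedule callback(response, error) on the loop instead of returning a result."""
    def deliver():
        doc = USERS.get(spec["username"])
        if doc is None:
            callback(None, LookupError("no such user"))
        else:
            callback(doc, None)
    loop.call_soon(deliver)  # returns immediately; the caller is never blocked

def lookup(username):
    loop = asyncio.new_event_loop()
    outcome = {}

    def on_response(response, error):
        # Same argument order as _on_response in the handler above.
        outcome["response"], outcome["error"] = response, error
        loop.stop()

    fake_find_one(loop, {"username": username}, on_response)
    loop.run_forever()  # runs until on_response stops the loop
    loop.close()
    return outcome

print(lookup("alice"))
print(lookup("bob"))
```

The caller hands over a function and gets control back right away. The result, or the error, arrives later as arguments to that function.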

Bit.ly's asyncmongo fills a huge gap in data access for Tornado. The driver extends the non-blocking programming style that event-driven architecture requires in the mid-tier to a major NoSQL player focused on scaling big data. Looking forward to seeing other client libraries follow suit. Pycassa users have been calling for it.

By Aaron Dunnington