Less Pagination, More More

We live in a brave new (to some) world of databases that aren’t relational databases with a SQL interface. End users normally never notice the difference, but the astute viewer may notice the slow demise of an old friend: pagination.

Traditionally with SQL databases pagination has looked something like this:

There are previous and next links as well as links for jumping right to the beginning and end. Pretty boring stuff.

What’s interesting is that this standard interface is disappearing in favor of something like this:

Twitter

Facebook

And soon beta testers of Urban Airship’s push service for Android will see a More link on the page that lists devices associated with their app:

The simplest possible explanation for this dumbing down of pagination is that count (for total pages) and skip/offset are expensive operations.

Not only are those operations expensive, but in eventually consistent databases, which many modern non-relational databases are, they’re extremely expensive, if not impossible, to perform.

Cassandra

At Urban Airship we, like Facebook, use Cassandra: a distributed column-based database. This deals two deadly blows to traditional pagination:

  1. No way to count columns in a row (without reading every column).
  2. No way to skip by numeric offset (so you can’t say, skip to page 5).

In Cassandra columns are ordered, so you start reading from the beginning and read N+1 columns, where N is the number of items you’d like to display. If that extra column comes back, the More link is enabled and its key is where the next “page” starts.

Both of those are solvable problems if you really need them, but I would suspect you would end up creating a column count cache as well as some sort of table of contents for the various page offsets. Not what I want to spend my time implementing.
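To make the N+1 read concrete, here is a minimal sketch that uses a sorted Python list as a stand-in for the ordered columns of a Cassandra row. The read_page function and the device IDs are made up for illustration; this is not the Cassandra API, just the shape of the logic.

```python
import bisect

def read_page(sorted_keys, start_key=None, page_size=3):
    """Return (items, next_start_key); next_start_key is None when
    there is nothing left to show behind a More link."""
    # Seek to the start key; an ordered store can do this cheaply,
    # unlike skipping by numeric offset.
    if start_key is None:
        start = 0
    else:
        start = bisect.bisect_left(sorted_keys, start_key)
    # Read N+1 items: N to display plus one to learn whether more exist.
    chunk = sorted_keys[start:start + page_size + 1]
    if len(chunk) > page_size:
        # The extra item is not displayed; its key starts the next page.
        return chunk[:page_size], chunk[page_size]
    return chunk, None

# Hypothetical device IDs, already in key order
device_ids = ["a1", "b2", "c3", "d4", "e5", "f6", "g7"]
page1, more1 = read_page(device_ids)
page2, more2 = read_page(device_ids, start_key=more1)
page3, more3 = read_page(device_ids, start_key=more2)
```

Note that nothing here needs a total count or a page number: each request only carries the key to start from.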

The fact of the matter is that for many use cases, a simple More button works just as well as (if not better than) traditional pagination. It’s also far cheaper to implement, which means more developer time free to work on features and more hardware resources available to push your 140 character insights around the web.

MongoDB

I should note that MongoDB is fairly unique in the non-relational database world as its dynamic querying features include count and skip operations. However, as with any database, you’ll want to make sure these queries hit indexes.

Sadly MongoDB currently doesn’t have the distributed features necessary to automatically handle data too big for a single server.

Posted in SQL, Technology

New Job, New Blog

The title is a bit misleading, but I haven’t updated my blog in far too long.

In April I started working for Urban Airship, and I’ve been meaning to upgrade my blog and move it to a shorter URL for some time. You should be reading this on schmichael.com instead of on the old site at michael.susens-schurter.com. I think I set up the right .htaccess magic to make all of the old links properly redirect to the new domain. Sorry if I broke anything.

For anyone confused by the personal rebranding from “Michael Susens-Schurter” to “schmichael”, it’s all because I’m lazy and hate typing. Also, there’s already a Michael at Urban Airship, so I’m pretty much “schmichael” to everyone these days.

Posted in Personal

Making Server-Side MongoDB Functions Less Awkward

I’ve recently switched my project at work to use MongoDB for the user database and a few other datasets.

Currently I don’t use many JavaScript functions, but when I do I like to store them on the server so that they’re accessible when I’m poking around in a console.

I use something similar to the following function to load all of my JS functions onto the server when my app starts:

import pymongo
import pkg_resources

# Relative to distribution's root. pkg_resources resource names always
# use '/' as the separator, so don't build them with os.path.join
SCRIPT_DIR = 'model/js'

def init_js(db):
    '''Initializes server-side javascript functions'''
    scripts = filter(
            lambda f: f.endswith('.js'),
            pkg_resources.resource_listdir(__name__, SCRIPT_DIR)
        )
    for script in scripts:
        # Name the function after the script name
        func_name, _ = script.split('.', 1)
        script_path = SCRIPT_DIR + '/' + script

        # Create a pymongo Code object
        # otherwise it will be stored as a string
        code = pymongo.code.Code(
                pkg_resources.resource_string(__name__, script_path))

        # Upsert the function
        db.system.js.save({ '_id': func_name, 'value': code, })

However, using server-side functions from Python is awkward at best. Say I have the JavaScript function:

add.js

function(x, y) {
    return x + y;
}

To run that function via PyMongo requires wrapping the function call with placeholder parameters in a Code object and passing in values as a dict:

var1 = 1
var2 = 2
result = db.eval(pymongo.code.Code('add(a, b)', {'a': var1, 'b': var2,}))
assert result == 3

Update: See MongoDB dev Mike Dirolf’s comment for a much more concise way of executing server-side functions.

Bearable for simple functions, but having to manually map parameters to values is tiresome and error prone with longer function signatures.

What I wanted was something more natural like:

var1 = 1
var2 = 2
result = db.add(var1, var2)
assert result == 3

I use a simple PyMongo Database object wrapper to make my life easier:

import string

from pymongo.code import Code

class ServerSideFunctions(object):
    def __init__(self, db):
        self.db = db

    def func_wrapper(self, func):
        '''Returns a closure for calling a server-side function.'''
        def server_side_func(*args):
            '''Calls server side function with positional arguments.'''
            # Could be removed with better param generating logic
            if len(args) > len(string.ascii_letters):
                raise TypeError('%s() takes at most %d arguments (%d given)'
                        % (func, len(string.ascii_letters), len(args)))

            # Prepare arguments. Built fresh on every call so that reusing
            # the same closure doesn't accumulate params from earlier calls
            params = [] # To keep params ordered
            kwargs = {}
            for k, v in zip(string.ascii_letters, args):
                kwargs[k] = v
                params.append(k)

            # Prepare code object
            code = Code('%s(%s)' % (func, ', '.join(params)), kwargs)

            # Return result of server-side function
            return self.db.eval(code)
        return server_side_func

    def __getattr__(self, func):
        '''Return a closure for calling server-side function named `func`'''
        return self.func_wrapper(func)

# db is a PyMongo Database object
dbjs = ServerSideFunctions(db)
var1 = 1
var2 = 2
result = dbjs.add(var1, var2)
assert result == 3
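The argument limit in the wrapper comes from using single letters as placeholder names. One possible way around it, sketched here as a standalone helper, is to generate numbered names instead. The build_call name and the arg0, arg1, ... scheme are my own choices for illustration, not part of PyMongo.

```python
def build_call(func, args):
    '''Return (expression, scope) for calling server-side function `func`.

    The expression and scope are what would be handed to a Code object.
    '''
    # One placeholder per argument: arg0, arg1, ... so there is no
    # upper bound tied to the size of the alphabet.
    params = ['arg%d' % i for i in range(len(args))]
    # Map each placeholder to its value, preserving positional order.
    scope = dict(zip(params, args))
    expression = '%s(%s)' % (func, ', '.join(params))
    return expression, scope

expression, scope = build_call('add', (1, 2))
```

The wrapper would then presumably pass these along as Code(expression, scope) to db.eval, the same way the letter-based version does.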

I’m tempted to monkey-patch PyMongo’s Database class to add a ServerSideFunctions instance directly as a js attribute, so then I could drop the confusing dbjs variable and just use:

assert db.js.add(1,2) == 3

If someone knows of a better way to access server-side MongoDB functions from Python, please let me know!

I modified this code to remove code specific to my project, so please let me know if there are errors.

Posted in Open Source, Python, Technology

Web Developer Contractor Rates

We just happened* to start chatting in the #pdxdjango IRC channel on Freenode about the rates web developer contractors charge today, and I wanted to post my experiences. I left the contractor world a few months ago after 2 years of more or less successful contracting, either individually or via Lo-Fi Art.

A really rough table of my rates as a contractor:

Language | Experience           | Rate per hour
---------+----------------------+-----------------------------------------
PHP      | Entry Level          | $8-20
PHP      | Experienced          | $20-65
PHP      | Specialist           | never got here with PHP (thankfully ;) )
Sysadmin | Slightly Experienced | $45
Python   | Entry Level          | $25-35
Python   | Experienced          | $35-65
Python   | Specialist (Django)  | $65-85

However, I think I’ve billed pretty cheaply, especially for Python work. If I had continued in the contracting world I think I would have been aiming for north of $100/hr for new contracts by the end of 2009.

Important Notes

  • All of the experience levels and rates are really rough estimates, please don’t read too much into it. I just wanted to give people some idea of what rates are floating around. (I also have a terrible memory, so these numbers could be way off. Mea culpa.)
  • The sysadmin job is a career oddity for me and consisted of mostly doing Active Directory / Exchange setup (snuck in a Debian server of course). That being said I still enjoy sysadminish type work today.
  • Experienced means you have a few “serious” projects under your belt (not the meaningless “5 years of experience” so many job descriptions call for).
  • Specialist is a poor term, but I needed some way to describe the shift from “I’ll do anything if it’s PHP or Python” to “I’m a Django developer.” My guess is that real specialists (contributors to major projects or popular plugin/module authors) fall into the upper end of this spectrum and can often charge well over $100/hr for highly sought after specialties (Anything + Facebook might be a good example of that right now).
  • I started with PHP first (2000-2006), so I was just less experienced in general.
  • Not only does supply & demand help Python devs fetch a higher rate (reasonable demand, with low supply), but also a Python developer knows how to write code.

    A PHP “developer” could just be someone who has set up a few WordPress or Drupal sites and maybe done some theming. I think you’d be hard pressed to find a web developer who couldn’t be described as having PHP “experience.”
  • My entire career in the “Specialist (Django)” range was in Portland, OR which has a vibrant web related economy (at least as far as my untrained eye can tell). All other rates fell at least partially into time periods where I lived in Illinois (and not Chicago), so that could account for some of the upward shift in my rates.
  • These numbers are also rough estimates because I’ve done flat per-project billing, retainers, and a variety of other crazy ways of exchanging money for labor. Dollars per hour is still what it all comes down to in the end (like DPS for you MMORPG freaks).

So I’m {ripping off,getting ripped off by} my clients?

I don’t know, but I doubt it. If anything my rough estimates should show what an inexact science billing is. It probably varies more on project factors than on the contractor’s experience.

Right up until I took my full time job at YouGov my favorite client was still paying me at my $35/hr rate. In fact sometimes I wonder if there might have been an inverse relationship between hourly rate and job satisfaction.

This could be a quirk of me being a pretty neurotic person and therefore feeling more pressure when working at a higher rate. At lower rates I generally worked more hours and spent more time tweaking designs, writing tests, and doing tasks other than putting my head down and coding. Thus at the end of the day, the more hours I worked on projects I liked, the less money I made (relative to working fewer hours on less enjoyable projects).

* Ok, so it looks like I brought it up… but I’d like to think it spawned some good discussion.
Left off the Python category as that gets syndicated on Unofficial Planet Python, and I don’t think this post is high enough quality to deserve that. :)

Posted in Open Source, PHP, Personal, Technology

Ubuntu 9.10 on a Thinkpad T400

I upgraded from Ubuntu 9.04 to 9.10 on my Lenovo Thinkpad T400 about a week ago, and thought I’d write down some of my impressions.

Good news

  • ATI graphics card with proprietary binary driver – just works (much better than 9.04 as well).
  • Suspend & hibernate work flawlessly.
  • Boots fast.
  • Upgrade worked flawlessly.
  • Sound (better volume panel!), wifi, USB, network printer, bluetooth (option to turn it on/off!), webcam, etc. all still work.
  • New theme is nice.
  • I manually upgraded my filesystem to ext4 and grub from version 1 to 2. It was a bit scary but worked out fine. Can’t really feel a difference, but my laptop is mainly a dumb terminal for running web browsers and ssh.

Empathy

Ubuntu replaced Pidgin with Empathy as the default IM client in Ubuntu 9.10. I think because Empathy supports voice/video chat and Pidgin doesn’t? I’ve used video chat once in my life and that was through Skype on Linux. It worked great, but it’s really not a feature I care about.

So for someone like me who doesn’t care about Empathy’s singular advantage over Pidgin, Empathy is a major step backward. At first it was extremely crashy, but a recent update seems to have fixed that. However, now it mysteriously loses messages. I’m a very light IM user, but it would start silently missing messages a few hours into the day every day.

I’ve switched back to Pidgin and couldn’t be happier.

Message Notification applet deal

Empathy integrates with the message notification applet deal along with Evolution. Other apps may as well, but evidently I don’t use any of them. My top panel is 75% whitespace, so the singular benefit of Ubuntu’s consolidated message notification applet was completely lost on me.

So much like Empathy, I removed this specialized applet as well. I’m much happier with per-application icons anyway and fail to see what the benefit of consolidating them is (unless you have a really cramped top panel).

New Theme

I like it, but then I installed Chromium. Now the bold window titles and expansive title bars in Metacity look bulky and antiquated. Gnome really needs to evolve their window manager and default UI. Chrome is an excellent example of how to design a compact, minimal, yet still pleasant and intuitive user interface.

Ubuntu Software Center

What a curious little replacement for the old Add/Remove Applications program. I think I see the direction they’re headed, but it definitely feels like they rolled out a beta program to replace a perfectly functional and stable one.

The left pane with expansive whitespace and 2 options hints that there might be more categories in the future, but right now it just looks like a mistake. Like maybe something isn’t working properly, and I’m not seeing all the options I should see.

At any rate, I hope Ubuntu adds an App Store that even includes evil proprietary software. I’d love to be able to plunk down a few bucks for a game like Braid directly from Ubuntu Software Center. That’d be great! Maybe Canonical could even pocket a few pennies and start making Linux-on-the-desktop profitable. Now I’m just dreaming though…

Bottom Line

9.10 is a solid and safe upgrade for any users of previous versions. Not sure there’s anything new to win over users from OS X or Windows though.

Posted in GNU/Linux, Open Source, Technology

lxml vs. ElementTree

While lxml has some excellent benchmarks about the speed of lxml.etree vs. ElementTree, I wanted to run some tests that were as close as possible to my own use case (fairly simple multi-megabyte XML files).

Here are the results of my little test script lxml-v-etree.py (times are in milliseconds):

name           generate | tostring | total | write | parse | find | total
------------------------+----------+-------+-------+-------+------+------
xml.cElementTree    132 |   2430   |  2562 |  2433 |   158 |   58 |   216
xml.cElementTree    112 |   2384   |  2497 |  2387 |   158 |   25 |   183
xml.cElementTree    113 |   2393   |  2507 |  2396 |   161 |   25 |   187
xml.ElementTree     591 |   2571   |  3163 |  2574 |  3613 |   25 |  3638
xml.ElementTree     619 |   2567   |  3187 |  2570 |  3589 |   55 |  3644
xml.ElementTree     609 |   2578   |  3188 |  2581 |  3564 |   55 |  3619
lxml                333 |     75   |   409 |    82 |   200 |    0 |   201
lxml                355 |     93   |   448 |    95 |   182 |   32 |   214
lxml                310 |     94   |   404 |    96 |   156 |   56 |   213
------------------------+----------+-------+-------+-------+------+------
name           generate | tostring | total | write | parse | find | total
------------------------+----------+-------+-------+-------+------+------

Note that the first “total” is “generate + tostring” while the second “total” is for the 2 parsing related tests (previous 2 columns summed).

My parsing tests are basically “etree.parse” and then running “Element.getchildren()” 3 times, which is ridiculously simplistic and should probably be ignored. My writing tests are far more thorough/realistic.
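For anyone who wants to try something similar, here is a rough sketch of the kind of timing harness described above. It is not my actual lxml-v-etree.py; the function names, the document shape, and the element count are all made up for illustration. It uses the standard library's ElementTree, and since lxml.etree aims to be API compatible, swapping the import should be enough to compare the two.

```python
import io
import timeit
import xml.etree.ElementTree as etree  # assumption: or `from lxml import etree`

def timed(fn, *args):
    '''Run fn(*args) and return (result, elapsed milliseconds).'''
    start = timeit.default_timer()
    result = fn(*args)
    return result, (timeit.default_timer() - start) * 1000.0

def generate(n):
    '''Build a simple flat document with n child elements.'''
    root = etree.Element('root')
    for i in range(n):
        child = etree.SubElement(root, 'item', id=str(i))
        child.text = 'value %d' % i
    return root

def parse(data):
    '''Parse serialized XML bytes and return the root element.'''
    return etree.parse(io.BytesIO(data)).getroot()

def bench(n):
    '''Time each step once and return (timings in ms, child count).'''
    results = {}
    root, results['generate'] = timed(generate, n)
    data, results['tostring'] = timed(etree.tostring, root)
    parsed, results['parse'] = timed(parse, data)
    # list(element) stands in for getchildren() on newer Pythons
    children, results['find'] = timed(list, parsed)
    return results, len(children)

results, count = bench(1000)
```

A single run like this is noisy, which is why the table above repeats each library three times.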

I’m running Python 2.6.2 with lxml 2.1.5 and libxml2 2.6.32 on Ubuntu 9.04 x86_64.

Posted in Python, Technology