Handy Python Progress for JSON module


I’ve spent a good deal of time over the past couple of days processing large JSON files to try to fix some corrupted data (long story, short version: my fault). While JSON is a fast file format to work with, processing more than 50 MB of any data format takes some time.

So to give myself some idea of what was going on, I whipped up a small progress counter for Python 2.6’s json module (it also works with simplejson if you’re still using Python 2.4 or 2.5):

import sys

class JsonProgress(object):
    def __init__(self):
        self.count = 0

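    def __call__(self, obj):
        self.count += 1
        # \r returns to the start of the line so each count overwrites the last
        sys.stdout.write("\r%8d" % self.count)
        sys.stdout.flush()  # no newline is written, so flush to show the count immediately
        return obj  # pass the decoded object through unchanged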

And then use it as the object_hook when loading JSON:

import json
with open('foo.json') as f:
    foo = json.load(f, object_hook=JsonProgress())
print "\r%-8s" % "Done" # \r returns to the line start; the padding overwrites the 8-character counter
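
It helps to know what the counter is counting: object_hook is called once for every JSON object (dict) the decoder finishes, nested ones included, so the number reflects decoded objects rather than bytes read. Here is a minimal sketch of that behaviour. The CountingHook name and its stream parameter are made up for this illustration, and it writes to an in-memory buffer instead of stdout so the result can be inspected. It is written so that it runs on Python 3 as well.

```python
import io
import json

class CountingHook(object):
    """Hypothetical variant of JsonProgress that writes to any text stream."""
    def __init__(self, stream):
        self.count = 0
        self.stream = stream

    def __call__(self, obj):
        self.count += 1
        self.stream.write(u"\r%8d" % self.count)
        return obj  # hand the decoded dict back unchanged

out = io.StringIO()
hook = CountingHook(out)

# Four JSON objects in total: the outer one plus three nested ones.
text = '{"a": {"b": 1}, "c": [{"d": 2}, {"e": 3}]}'
data = json.loads(text, object_hook=hook)
```

After this runs, hook.count is 4. Lists, strings and numbers do not trigger the hook, so a file made mostly of long arrays of numbers will show very little movement.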

JsonProgress is a poor name, though, since it’s also useful in ordinary list comprehensions:

progress = JsonProgress()
foo = [progress(x) for x in bar]
print "\nDone" # \n prints a newline so the progress output is kept

Obviously this is a performance hit, but still quite handy for personal use when you just want to know that something is happening.
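
One way to soften that hit might be to redraw the counter only every so many calls instead of on every one. The sketch below is an assumption, not something measured: ThrottledProgress and its every and stream parameters are hypothetical names, and how much time it actually saves will depend on the terminal and the data.

```python
import sys

class ThrottledProgress(object):
    """Hypothetical variant: only redraw the counter every `every` calls."""
    def __init__(self, every=1000, stream=None):
        self.count = 0
        self.every = every
        # Default to stdout, but allow any object with write() and flush()
        self.stream = stream if stream is not None else sys.stdout

    def __call__(self, obj):
        self.count += 1
        if self.count % self.every == 0:
            self.stream.write("\r%8d" % self.count)
            self.stream.flush()
        return obj  # pass the value through unchanged

progress = ThrottledProgress(every=1000)
squares = [progress(x * x) for x in range(2500)]
sys.stdout.write("\n")  # keep the last count on screen
```

The trade-off is that the display lags behind the true count by up to every - 1 items, which is usually fine when all you want is a sign of life.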