I’ve spent a good deal of time over the past couple of days processing large JSON files to try to fix some corrupted data (long story; short version: my fault). While JSON is a fast file format to work with, processing more than 50 MB of any data format takes some time.
So to give myself some idea of what was going on, I whipped up a small progress bar for Python 2.6’s json module (it also works with simplejson if you’re still using 2.4/2.5):
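```python
import sys

class JsonProgress(object):
    def __init__(self):
        self.count = 0

    def __call__(self, obj):
        self.count += 1
        # \r returns to the start of the line so each count overwrites the last
        sys.stdout.write("\r%8d" % self.count)
        sys.stdout.flush()  # make sure the count appears immediately
        return obj  # hand the object back unchanged
```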
And then use it as the object_hook when loading JSON:
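```python
import json

with open('foo.json') as f:
    foo = json.load(f, object_hook=JsonProgress())
# \r returns to the start of the line so "Done" overwrites the progress output;
# padding to 8 characters clears the digits left over from "%8d"
print("\r%-8s" % "Done")
```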
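To see exactly what the counter measures, here is a small self-contained sketch that decodes an in-memory string instead of a file. The `json` module calls `object_hook` with every JSON object (`{...}`) it decodes, inner objects before the outer one, and not for array elements or scalars, so the count tracks objects decoded rather than bytes read. The `stream` parameter is a hypothetical addition so the output can be redirected, here into an `io.StringIO`.

```python
import io
import json
import sys


class JsonProgress(object):
    """Count objects passed through and redraw a running total on one line."""

    # `stream` is a hypothetical parameter added so output can be captured
    def __init__(self, stream=sys.stdout):
        self.count = 0
        self.stream = stream

    def __call__(self, obj):
        self.count += 1
        self.stream.write("\r%8d" % self.count)
        self.stream.flush()
        return obj  # returning obj unchanged leaves the decoded data intact


doc = '{"users": [{"id": 1}, {"id": 2}, {"id": 3}], "total": 3}'
progress = JsonProgress(stream=io.StringIO())
data = json.loads(doc, object_hook=progress)
print(progress.count)           # 4: three inner objects plus the outer one
print(data == json.loads(doc))  # True
```

So for a file made of many small objects the count climbs quickly, while a file that is one huge array of numbers barely moves it.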
JsonProgress is a poor name, though, since it’s also useful in generic list comprehensions:
```python
progress = JsonProgress()
foo = [progress(x) for x in bar]  # bar is any iterable
print("\nDone")  # \n starts a new line, so the final count stays visible
```
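Since nothing in the class is JSON-specific, a generator function can do the same job for any iterable without a `progress(x)` call inside every expression. This is a minimal sketch; `with_progress` and its `stream` parameter are hypothetical names, not part of any library.

```python
import sys


def with_progress(iterable, stream=sys.stdout):
    """Hypothetical helper: yield items unchanged while redrawing a count."""
    count = 0
    for item in iterable:
        count += 1
        stream.write("\r%8d" % count)
        stream.flush()
        yield item
    # Runs only once the iterable is exhausted; keeps the final count visible
    stream.write("\n")


bar = range(5)
foo = [x * 2 for x in with_progress(bar)]
print("Done")
```

It also works in a plain `for` loop. Because the trailing newline is written only after the loop finishes, an exception partway through leaves the last count on screen.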
Obviously this costs some performance (an extra function call and a terminal write for every item), but it’s still quite handy for personal use when you just want to know that something is happening.
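Much of that cost is likely the terminal write on every item rather than the counter itself. A hedged sketch of one way to cut it down: redraw only every `every` calls and print the exact total at the end. `ThrottledProgress`, `every`, and `done()` are hypothetical names chosen for this example.

```python
import sys


class ThrottledProgress(object):
    """Hypothetical counter that only redraws every `every` calls."""

    def __init__(self, every=1000, stream=sys.stdout):
        self.count = 0
        self.every = every
        self.stream = stream

    def __call__(self, obj):
        self.count += 1
        if self.count % self.every == 0:
            self.stream.write("\r%8d" % self.count)
            self.stream.flush()
        return obj

    def done(self, message="Done"):
        # Show the exact final count, then the message on its own line
        self.stream.write("\r%8d\n%s\n" % (self.count, message))
        self.stream.flush()


progress = ThrottledProgress(every=1000)
foo = [progress(x) for x in range(25000)]
progress.done()
```

With `every=1000` and 25,000 items, the count is redrawn 25 times instead of 25,000. The same instance can be passed as `object_hook` to `json.load`.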
For progress prints, I find it handy to print only every n’th time through the loop. Something like:
- Paddy.
@Paddy3118: Good call! (No pun intended.) I went ahead and corrected the formatting, since WordPress evidently strips formatting from comments but lets admins fix it.