lxml vs. ElementTree

While lxml has some excellent benchmarks about the speed of lxml.etree vs. ElementTree, I wanted to run some tests that were as close as possible to my own use case (fairly simple multi-megabyte XML files).

Here are the results of my little test script lxml-v-etree.py (times are in milliseconds):

name           generate | tostring | total | write | parse | find | total
------------------------+----------+-------+-------+-------+------+------
xml.cElementTree    132 |   2430   |  2562 |  2433 |   158 |   58 |   216
xml.cElementTree    112 |   2384   |  2497 |  2387 |   158 |   25 |   183
xml.cElementTree    113 |   2393   |  2507 |  2396 |   161 |   25 |   187
xml.ElementTree     591 |   2571   |  3163 |  2574 |  3613 |   25 |  3638
xml.ElementTree     619 |   2567   |  3187 |  2570 |  3589 |   55 |  3644
xml.ElementTree     609 |   2578   |  3188 |  2581 |  3564 |   55 |  3619
lxml                333 |     75   |   409 |    82 |   200 |    0 |   201
lxml                355 |     93   |   448 |    95 |   182 |   32 |   214
lxml                310 |     94   |   404 |    96 |   156 |   56 |   213
------------------------+----------+-------+-------+-------+------+------
name           generate | tostring | total | write | parse | find | total
------------------------+----------+-------+-------+-------+------+------

Note that the first “total” is “generate + tostring”, while the second “total” sums the two parsing-related columns before it (parse + find).

My parsing tests are basically “etree.parse” and then running “Element.getchildren()” 3 times, which is ridiculously simplistic and should probably be ignored. My writing tests are far more thorough/realistic.
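For readers who want to try something similar, here is a minimal sketch of the kind of timing loop described above. It is not the original lxml-v-etree.py: the function names, the document shape, and the item count are all made up for illustration, it assumes Python 3, it parses from an in-memory string with fromstring rather than etree.parse on a file, and it uses list(element) because Element.getchildren() has since been removed from the standard library.

```python
import time
import xml.etree.ElementTree as ET


def build_tree(n_items=10000):
    """Generate a simple, flat document with n_items children (hypothetical shape)."""
    root = ET.Element("root")
    for i in range(n_items):
        item = ET.SubElement(root, "item", id=str(i))
        item.text = "value %d" % i
    return root


def time_ms(func, *args):
    """Run func(*args) once and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, (time.perf_counter() - start) * 1000.0


def run_benchmark(n_items=10000):
    """Time each phase once; returns (timings dict, number of children found)."""
    timings = {}
    root, timings["generate"] = time_ms(build_tree, n_items)
    data, timings["tostring"] = time_ms(ET.tostring, root)
    parsed, timings["parse"] = time_ms(ET.fromstring, data)
    # list(element) stands in for the older Element.getchildren()
    children, timings["find"] = time_ms(list, parsed)
    return timings, len(children)


if __name__ == "__main__":
    timings, count = run_benchmark()
    for name, ms in timings.items():
        print("%-10s %8.1f ms" % (name, ms))
```

Since lxml.etree offers the same Element, SubElement, tostring and fromstring calls, swapping the import for "from lxml import etree as ET" should be enough to compare the two, assuming lxml is installed.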

I’m running Python 2.6.2 with lxml 2.1.5 and libxml2 2.6.32 on Ubuntu 9.04 x86_64.

Posted in Python, Technology

Google Chrome Frame

I maintain a web application that must support Internet Explorer 6, so I was delighted to hear about Google Chrome Frame.

Unfortunately their warnings of its alpha status seem to be well founded. The 2 show-stoppers I ran into fairly quickly were:

  1. Flash is difficult to install. (Tip: Try visiting: cf:http://www.adobe.com/software/flash/about/)
  2. The Location bar did not always change to reflect the current page.

I’ll definitely be following this project closely as it may allow corporate users stuck on IE to use a different browser relatively easily.

Posted in Open Source, Technology

Great article on big government

Havoc Pennington has a great post entitled Which Piece of Big Government Are You Against?

War on Terror + DoD ~= 21.6% of our budget. I know we can’t cut it all, but a guy can dream, can’t he? :-)

My favorite line is in regards to how different political parties approach budgetary spending:

Republicans and Democrats judged by actions not rhetoric: government should be 105% of whatever it just was. Disagreement on where the new 5% goes.

So true. Neither party has been fiscally conservative (or even responsible) for the past 30 years. The Republicans just spend more on bombs while the Dems spend it on evil socialist programs like welfare.

Emphasis added to denote sarcasm.

I’d much rather my tax dollars paid for some poor person’s laziness (as many conservatives see welfare) than have them spent bombing the same country into oblivion twice in a bit over a decade.

It is sad to see us throw away 9% of our federal budget on interest payments. Hopefully our economy will recover quickly enough for President Obama to start paying down the debt like Clinton did in the 90s.

Posted in Personal

Python Packaging Talk

I gave a talk at PDX Python last night on Python Packaging. It’s just an overview and introduction completely lacking in any practical examples.

Let me know if the ODP source is messed up. OpenOffice.org liked randomly losing background images and forgetting other formatting.

So as penance I quickly hacked up a silly little command line utility and uploaded it to PyPI to serve as a simple packaging example:

It’d be nice to add some more advanced features like test running, including package data, and building C extensions. If you feel adventurous please fork it and send me a pull request on BitBucket.

Thanks to everyone who came to PDX Python last night! Especially Armin Ronacher who was able to clarify and elaborate on a number of different distutils/setuptools topics!

Update: Just spotted an excellent post on distutils and setuptools by Tarek Ziadé. Make sure to read his blog if you’re interested in packaging in Python.

Posted in Open Source, Python, Technology

Switched tc-rest to webob

Small update on my toy tc-rest project: I switched to using WebOb for creating HTTP Request and Response objects. Cleaned up the code a bit, but a real dispatcher is what’s needed to really remove the cruft.

I’m anxious to extend the API and add features, but I have no clue when I’ll have time to touch it again. In the meantime I’ve pushed tc-rest to bitbucket.org if you want to take a look.

Posted in Open Source, Python, Technology

TokyoCabinet + fapws3 = tc-rest

Have you ever wondered how hard it would be to tack a RESTful HTTP interface on top of a fast key/value database like TokyoCabinet?

Probably not, but I did: tc-rest.tar.gz

Components:

  • TokyoCabinet – my favorite persistent key/value database
  • pytc – a wonderful Python wrapper for TC
  • fapws3 – a fast libev based HTTP/WSGI server
  • simplejson – (or the json module in Python >= 2.6) for encapsulating HTTP responses
  • okapi – a fantastic little static HTML page for testing HTTP APIs

Getting TokyoCabinet+pytc to work inside a virtualenv was a bit tricky, so check out my run.sh script if you’re having trouble getting it to start.

Once you get it started, load okapi in your browser:

http://localhost:8080/static/okapi.html

And then create a database by doing a POST like:

http://localhost:8080/foo/

And finally store/get keys and values using GET and POST requests like:

http://localhost:8080/foo/bar/
http://localhost:8080/foo/baz/

Doing a GET request to a database URL lists keys.

At any rate, I had big dreams for building a system where you would store JSON, specify indexes on certain keys, and the server would maintain those indexes for you by creating ad hoc TokyoCabinet databases.

Instead I ended up wasting most of my time learning how to write a low-level WSGI app. I should have just used CherryPy or Django from the beginning, but I had never written a pure WSGI app before. It was a good lesson even if it meant not getting some of my features implemented.

I’ll probably keep playing with this idea, but the next version will probably be based on some existing framework. Parsing environ['PATH_INFO'] and running start_response(...) manually gets old fast.
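To give a feel for what that manual work looks like, here is a rough sketch of a pure WSGI app that splits environ['PATH_INFO'] by hand and calls start_response itself. It is not the tc-rest code: a plain dict stands in for the TokyoCabinet databases so it runs without pytc, it assumes Python 3, and the respond helper, status codes and JSON shapes are my own guesses for illustration.

```python
import json

# In-memory stand-in for the TokyoCabinet databases (an assumption, so the
# sketch runs without pytc): {db_name: {key: value}}
DATABASES = {}


def respond(start_response, status, payload):
    """Hypothetical helper: JSON-encode payload and send it with the given status."""
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def app(environ, start_response):
    method = environ["REQUEST_METHOD"]
    # Manual dispatch: "/foo/bar/" -> ["foo", "bar"]
    parts = [p for p in environ.get("PATH_INFO", "").split("/") if p]

    if len(parts) == 1:
        db_name = parts[0]
        if method == "POST":
            # POST to a database URL creates the database
            DATABASES.setdefault(db_name, {})
            return respond(start_response, "201 Created", {"db": db_name})
        if method == "GET" and db_name in DATABASES:
            # GET on a database URL lists its keys
            return respond(start_response, "200 OK", sorted(DATABASES[db_name]))
    elif len(parts) == 2 and parts[0] in DATABASES:
        db, key = DATABASES[parts[0]], parts[1]
        if method == "POST":
            # Treat the raw request body as the value (an assumption)
            length = int(environ.get("CONTENT_LENGTH") or 0)
            db[key] = environ["wsgi.input"].read(length).decode("utf-8")
            return respond(start_response, "200 OK", {"stored": key})
        if method == "GET" and key in db:
            return respond(start_response, "200 OK", {key: db[key]})

    return respond(start_response, "404 Not Found", {"error": "not found"})
```

To poke at it locally, something like wsgiref.simple_server.make_server("localhost", 8080, app).serve_forever() from the standard library should do. Even at this size the if/elif dispatch is already getting unwieldy, which is the cruft a real dispatcher or framework takes off your hands.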

fapws3 is pretty neat, but had lots of annoying rough edges. I had to manually create a README file because its setup.py expects one to exist. Then I had to manually allow DELETE HTTP methods in fapws/base.py, otherwise it would return an HTML error message for me! That was a bit shocking since I was working under the assumption fapws3 is just a low-level HTTP/WSGI server.

Update

  1. If you’re new to TokyoCabinet, I posted the presentation on it that I gave at the Portland Python meetup.
  2. Someone want to benchmark this for me? Might be interesting since it’s made with the fastest libs available in Python for their respective tasks. I’m just feeling lazy at this point. :-)
Posted in Open Source, Python, Technology