Dirkjan Ochtman: writing

Single-source Python 2/3 doctests

Sometime in 2009, I took over maintenance of CouchDB-Python from Christopher Lenz. Maintenance has slowed down over the years, since the core libraries work well and the CouchDB API has been quite stable, but I still feel responsible for the project (and I still use it in a bunch of places). This being a Python project, it always felt like it would have to be ported to Python 3 sooner or later. Because it works with a fairly deep HTTP API (as in, it uses a large subset of the protocol, with extensive hacking of httplib/http.client), the changes needed in string/bytes handling are quite involved.

My first serious attempt started in November of 2012, as evidenced by some old patches that I have lying around in mq repositories. I picked it back up again about a year later and got most of the tests passing, save for one specific category: the doctests. Specifically, the problem I had was with unicode literals (like u'str'). On Python 2.7, the expected output in the doctests needs the u prefix for the tests to pass. In Python 3, all strings are unicode; while unicode literals can be used in source code in Python 3.3 and later, the repr() of a string never includes the u prefix. This resulted in lots of test failures like this:

======================================================================
FAIL: client (couchdb)
Doctest: couchdb.client
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3.3/doctest.py", line 2154, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for couchdb.client
  File "./couchdb/client.py", line 8, in client

----------------------------------------------------------------------
File "./couchdb/client.py", line 15, in couchdb.client
Failed example:
    doc['type']
Expected:
    u'Person'
Got:
    'Person'
----------------------------------------------------------------------
File "./couchdb/client.py", line 17, in couchdb.client
Failed example:
    doc['name']
Expected:
    u'John Doe'
Got:
    'John Doe'

While these simple cases might have been easy to fix some other way (e.g. by printing the value instead of just asking for the representation), other cases would be significantly harder to fix that way. Here's one example:

----------------------------------------------------------------------
File "./couchdb/mapping.py", line 343, in couchdb.mapping.Document.items
Failed example:
    sorted(post.items())
Expected:
    [('_id', 'foo-bar'), ('author', u'Joe'), ('title', u'Foo bar')]
Got:
    [('_id', 'foo-bar'), ('author', 'Joe'), ('title', 'Foo bar')]
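
A small illustration of why printing only rescues the simple cases. This is a sketch for Python 3; the list is copied from the failure above, and the variable names are mine:

```python
# Values as they come back on Python 3 (no u prefix anywhere).
items = [('_id', 'foo-bar'), ('author', 'Joe'), ('title', 'Foo bar')]

# Printing a single string shows its text, so a doctest written as
# print(doc['name']) expects the same output on Python 2 and 3.
single = str(items[1][1])

# Printing a container formats each element with repr(), so on
# Python 2 the u prefix would reappear in the expected output.
whole = str(items)

print(single)
print(whole)
```

In other words, switching to print only helps when the value being shown is itself a string; anything nested inside a list, tuple or dict still goes through repr().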

After asking around on the Python 3 porting mailing list, Lennart Regebro (the author of the Porting to Python 3 book) kindly pointed me to the relevant section of his book, but it didn't contain any great suggestions for this particular problem. It took me a few months to get back into it, but I started looking into the doctest APIs yesterday, and managed to figure out a fairly clean solution:

import doctest
import re
import sys

class Py23DocChecker(doctest.OutputChecker):
    def check_output(self, want, got, optionflags):
        # On Python 3, drop the u prefix from expected string literals.
        if sys.version_info[0] > 2:
            want = re.sub(r"u'(.*?)'", r"'\1'", want)
            want = re.sub(r'u"(.*?)"', r'"\1"', want)
        return doctest.OutputChecker.check_output(self, want, got, optionflags)
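
To make the rewrite concrete, here is a sketch that applies the same two substitutions outside of doctest. The helper names `strip_u_prefix` and `strip_u_prefix_strict` are made up for illustration and are not part of CouchDB-Python. The second one shows a possible tightening with a `\b` word boundary; that is my own assumption, not something the checker above does. It is there because the loose pattern can also match a word that merely ends in u right before a quote.

```python
import re


def strip_u_prefix(want):
    """Apply the same substitutions as the checker (hypothetical helper)."""
    want = re.sub(r"u'(.*?)'", r"'\1'", want)
    want = re.sub(r'u"(.*?)"', r'"\1"', want)
    return want


def strip_u_prefix_strict(want):
    """Variant that only matches a u at a word boundary (my assumption)."""
    want = re.sub(r"\bu'(.*?)'", r"'\1'", want)
    want = re.sub(r'\bu"(.*?)"', r'"\1"', want)
    return want


expected = "[('_id', 'foo-bar'), ('author', u'Joe'), ('title', u'Foo bar')]"
actual = "[('_id', 'foo-bar'), ('author', 'Joe'), ('title', 'Foo bar')]"
print(strip_u_prefix(expected) == actual)

# A word ending in "u" right before a closing quote trips the loose pattern.
tricky = "['menu', 'x']"
print(strip_u_prefix(tricky))
print(strip_u_prefix_strict(tricky))
```

For expected output like the CouchDB-Python doctests the loose pattern is enough; the stricter variant only matters if expected strings can end in the letter u.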

As it turns out, the doctest API is pretty well-designed, so it allows you to pass in your own OutputChecker object. As its name indicates, this is the bit of code that compares the actual output and the expected output of a given example. By slightly processing the expected value when running on Python 3, we can make sure that actual and expected output match on both versions. Use it like this:

doctest.DocTestSuite(mod, checker=Py23DocChecker())
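
For a complete round trip, the following sketch runs a tiny doctest through unittest with and without the custom checker. It targets Python 3. The `demo` module is a hypothetical stand-in built with `types.ModuleType`, so that only its docstring is tested, and the `run` helper is likewise just for illustration:

```python
import doctest
import io
import re
import sys
import types
import unittest


class Py23DocChecker(doctest.OutputChecker):
    def check_output(self, want, got, optionflags):
        if sys.version_info[0] > 2:
            want = re.sub(r"u'(.*?)'", r"'\1'", want)
            want = re.sub(r'u"(.*?)"', r'"\1"', want)
        return doctest.OutputChecker.check_output(self, want, got, optionflags)


# Hypothetical stand-in for a real module: only its docstring is tested.
DOC = """
>>> doc = {'type': 'Person', 'name': 'John Doe'}
>>> doc['type']
u'Person'
>>> doc['name']
u'John Doe'
"""
demo = types.ModuleType("demo", DOC)


def run(checker=None):
    suite = doctest.DocTestSuite(demo, checker=checker)
    # Send the report to a buffer so only the summary below is printed.
    runner = unittest.TextTestRunner(stream=io.StringIO())
    return runner.run(suite)


print("default checker passes:", run().wasSuccessful())
print("Py23DocChecker passes: ", run(Py23DocChecker()).wasSuccessful())
```

On Python 3 the default checker should report a failure for the u'...' expectations, while the custom checker lets the same docstring pass unchanged.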

Fixing these test failures has cleared the way (along with some other fixes) for a Python 3-compatible CouchDB-Python release soon. I hope this will enable other projects to start moving in the direction of 3.x; at the very least, it should significantly lower the barrier for my own projects to start using Python 3.