djc: thinking in writing

Single-source Python 2/3 doctests

Somewhere in 2009, I took over maintenance of CouchDB-Python from Christopher Lenz. While maintenance has slowed down over the years, since the core libraries work well and the CouchDB API has been quite stable, I still feel responsible for the project (I also still use it in a bunch of places). This being a Python project, it always felt like it would have to be ported to Python 3 sooner or later. Since it's working with a fairly deep HTTP API (as in, it uses a large subset of the protocol, with extensive hacking of httplib/http.client), the changes needed in string/bytes handling are quite involved.

My first serious attempt started in November of 2012, as evidenced by some old patches that I have lying around in mq repositories. I picked it back up about a year later, and got most of the tests passing, save for one specific category: the doctests. Specifically, the problem I had was with unicode literals (like u'str'). For Python 2.7 doctests, the u prefix is part of the expected output, so it's needed to make the test pass. In Python 3, all strings are unicode; while unicode literals are accepted in source code in Python 3.3 and later, the repr() of a string never includes the u prefix. This resulted in lots of test failures like this:

FAIL: client (couchdb)
Doctest: couchdb.client
Traceback (most recent call last):
  File "/usr/lib/python3.3/", line 2154, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for couchdb.client
  File "./couchdb/", line 8, in client

File "./couchdb/", line 15, in couchdb.client
Failed example:
File "./couchdb/", line 17, in couchdb.client
Failed example:
Expected:
    u'John Doe'
Got:
    'John Doe'

While these simple cases might have been easy to fix some other way (e.g. by printing the value instead of just asking for the representation), other cases would be significantly harder to fix that way. Here's one example:

File "./couchdb/", line 343, in couchdb.mapping.Document.items
Failed example:
Expected:
    [('_id', 'foo-bar'), ('author', u'Joe'), ('title', u'Foo bar')]
Got:
    [('_id', 'foo-bar'), ('author', 'Joe'), ('title', 'Foo bar')]

After asking around on the Python 3 porting mailing list, Lennart Regebro (the author of the Porting to Python 3 book) kindly pointed me to the relevant section of his book, but it didn't contain any great suggestions for this particular problem. It took me a few months to get back into it, but I started looking into the doctest APIs yesterday, and managed to figure out a fairly clean solution:

import doctest
import re
import sys

class Py23DocChecker(doctest.OutputChecker):
  def check_output(self, want, got, optionflags):
    if sys.version_info[0] > 2:
      want = re.sub("u'(.*?)'", "'\\1'", want)
      want = re.sub('u"(.*?)"', '"\\1"', want)
    return doctest.OutputChecker.check_output(self, want, got, optionflags)

As it turns out, the doctest API is pretty well-designed, so it allows you to pass in your own OutputChecker object. As its name indicates, this is the bit of code that compares the actual output and the expected output of a given example. By slightly processing the expected value when running on Python 3, we can make sure that actual and expected output match on both versions. Use it like this:

doctest.DocTestSuite(mod, checker=Py23DocChecker())
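To see it working end to end, here's a self-contained sketch (the author function and its doctest are invented for demonstration, and the checker is repeated so the snippet runs on its own):

```python
import doctest
import re
import sys
import unittest

# Repeated here so the snippet runs standalone.
class Py23DocChecker(doctest.OutputChecker):
    def check_output(self, want, got, optionflags):
        if sys.version_info[0] > 2:
            want = re.sub("u'(.*?)'", "'\\1'", want)
            want = re.sub('u"(.*?)"', '"\\1"', want)
        return doctest.OutputChecker.check_output(self, want, got, optionflags)

def author():
    """A doctest whose expected output uses a Python 2 unicode literal.

    >>> author()
    u'John Doe'
    """
    return 'John Doe'

# On Python 3, this suite passes only because the checker rewrites the
# expected output; without it, u'John Doe' would not match 'John Doe'.
suite = doctest.DocTestSuite(sys.modules[__name__], checker=Py23DocChecker())
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Run on Python 3, this reports no failures, since the checker strips the u prefix from the expected output before comparing.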

Fixing these test failures has cleared the way (along with some other fixes) for a Python 3-compatible CouchDB-Python release soon. I hope this will enable other projects to start moving in the direction of 3.x; at the very least, it should significantly lower the barrier for my own projects to start using Python 3.

No Close Buttons

Or, How To Turn Your Firefox userChrome.css Hack Into a Neat Little Restart-Less Add-on, in Five Delightfully Simple Steps.

For as long as I can remember, I've had a small number of tweaks set in my Firefox profile's about:config. One of these was the browser.tabs.closeButtons pref, which I had set to 2. By default, Firefox shows a little close button on the right of every tab, but since I pretty much always use a keyboard shortcut to close tabs, these little buttons aren't that helpful, and they end up obscuring parts of their tab's titles. Setting the value to 2 removes all of the buttons.

In Firefox 31 (to be released to the general public in about 12 weeks), this preference has been removed, leading me to look into other ways of removing the buttons. A commenter on the bug noted the CSS required to remove the buttons again, saying that this could be added to the userChrome.css file in a profile. However, I don't really like that solution, since it would require me to port the fix to every computer I use, and it would be easy for me to lose it. Instead, I wanted to put it in an add-on, which would make it easy for me to install on other computers, in addition to being relatively easy to find. As an added benefit, others can benefit from the same add-on.

The result of this is the No Close Buttons add-on, which I put up on AMO yesterday. It was promptly reviewed by a friendly reviewer from the Dutch community, so that it can be installed without trouble. However, it's currently restricted to Firefox 31 and later, since I figured people on earlier versions wouldn't need it. Because it took me a while to piece together everything for what I thought should be a well-documented process (turning simple chrome CSS hacks into an add-on), I figured I'd document the process here.

First off, create a file called install.rdf:

<?xml version="1.0" encoding="utf-8"?>
<!-- This Source Code Form is subject to the terms of the Mozilla Public
   - License, v. 2.0. If a copy of the MPL was not distributed with this
   - file, You can obtain one at http://mozilla.org/MPL/2.0/. -->
<RDF xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:em="http://www.mozilla.org/2004/em-rdf#">
  <Description about="urn:mozilla:install-manifest">
    <!-- The id and version below are placeholders; use your own. -->
    <em:id>no-close-buttons@example.com</em:id>
    <em:version>1.0</em:version>
    <em:type>2</em:type>
    <em:bootstrap>true</em:bootstrap>

    <!-- Firefox -->
    <em:targetApplication>
      <Description>
        <em:id>{ec8030f7-c20a-464f-9b0e-13a3a9e97384}</em:id>
        <em:minVersion>31.0</em:minVersion>
        <em:maxVersion>*</em:maxVersion>
      </Description>
    </em:targetApplication>

    <!-- Front End MetaData -->
    <em:name>No Close Buttons</em:name>
    <em:description>Remove close buttons from tabs</em:description>
    <em:creator>Dirkjan Ochtman</em:creator>
  </Description>
</RDF>

Second, add a file named chrome.manifest, with a single line:

content              no-close-buttons        content/

Third, add some JavaScript code to register and unregister the stylesheet to a file named bootstrap.js:

/* This Source Code Form is subject to the terms of the Mozilla Public
 * License, v. 2.0. If a copy of the MPL was not distributed with this
 * file, You can obtain one at */

'use strict';

var sss = Components.classes["@mozilla.org/content/style-sheet-service;1"]
                    .getService(Components.interfaces.nsIStyleSheetService);
var ios = Components.classes["@mozilla.org/network/io-service;1"]
                    .getService(Components.interfaces.nsIIOService);
var uri = ios.newURI('chrome://no-close-buttons/content/style.css', null, null);

function startup(data, reason) {
    sss.loadAndRegisterSheet(uri, sss.USER_SHEET);
}

function shutdown(data, reason) {
    sss.unregisterSheet(uri, sss.USER_SHEET);
}
Fourth, add your CSS to the file named in your bootstrap.js (style.css, given the chrome URI above). It should go inside the content directory you've already referenced in the manifest.

.tab-close-button { display: none !important; }

Fifth, and finally, if you zip up the four resulting items (install.rdf, chrome.manifest, bootstrap.js and the content directory), rename the resulting file so that its extension is xpi, and drop it into the extensions directory in your Firefox profile, Firefox should prompt you to install your new add-on!
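The packaging step can be sketched as a Python script (my illustration, not part of the original workflow; the stand-in file creation exists only so the snippet runs on its own):

```python
import os
import zipfile

files = ['install.rdf', 'chrome.manifest', 'bootstrap.js', 'content/style.css']

# Stand-in files so this sketch runs on its own; in practice these are the
# real files created in the previous steps.
os.makedirs('content', exist_ok=True)
for name in files:
    open(name, 'w').close()

# An .xpi is an ordinary zip archive with a different file extension.
with zipfile.ZipFile('no-close-buttons.xpi', 'w') as xpi:
    for name in files:
        xpi.write(name)
```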

I cobbled together all these bits and pieces by looking at a small add-on by Benjamin Smedberg (who has a number of UI-related add-ons up on AMO), some code from Stack Overflow, and MDN pages on bootstrapped add-ons and chrome registration. I put the result up on GitHub, any feedback is most welcome.

Giving in to Git(Hub)

Published on 2014-03-30 by Dirkjan Ochtman in tech, code

Last month, I moved most of my code from Bitbucket to GitHub.

As a former Mercurial developer, this feels like an admission of defeat. Most of hg's user interface still seems superior to Git's, even if Git was quicker to get the branching model right. The Mercurial code base, in many ways, is a testament to how approachable a Python application can be, and the extension possibilities stemming from writing a few Python functions seem far more attractive than Git's apparent hodge-podge of C, shell and Perl. It's good that people at Mozilla and Facebook are starting to talk more about hg's advantages, though.

While I wanted to learn Git sooner, the lack of usability made me mostly avoid it until about 8 months ago, when I became a CouchDB committer and thus could no longer escape. Two months ago, I also got a new job where Git is the primary VCS, so I've been diving in. Obviously, it's a pretty great VCS, but some aspects of the (command-line) user interface are still baffling to me. This has been written about in plenty of places, so I won't go point-to-point here. And I'll have to admit that many commands are starting to be ingrained in muscle memory, to the point that I sometimes use Git-like commands in places where I still use hg.

However, now that I have basic usage down such that my lack of experience with Git is no longer a limiting factor, the network effects of Git (and GitHub, specifically) outweigh my usability concerns. The GitHub UX feels more polished (and seems to receive more attention) than Bitbucket's, and makes me quite happy to use it. I also feel that the community on GitHub is quite a bit larger than on Bitbucket, which could make my projects more accessible (see also this account from Eli Bendersky). I've already gathered some stars (mostly for Persona-TOTP, so far) over the past six weeks; I hope that's just the start.

Changing your OS X Mavericks user icon without iPhoto

Published on 2013-10-24 by Dirkjan Ochtman in tech

I wanted to update the user icon/picture for my OS X user (which may include the iCloud/Apple ID picture as well), but it turned out to be harder than I thought. Here's to hoping this post may help others who run into the same problem. tl;dr: use iCloud's web app to upload the new picture for your own Contacts entry.

Update (2013-11-02): on Twitter, both Christopher Lenz and Justin Mayer pointed out that you can just drag and drop an image onto the System Preferences panel. I thought I'd tried that, but apparently not! Still, I wonder if that UI is sufficiently ingrained that discoverability is not important.

Update (2013-11-25): Hugh Hosman, via email, points out that you can also drop an image into /Library/User Images if you have super user privileges.

Like any person who values 0-day upgrades more than their system's stability, I recently upgraded to OS X Mavericks. Going into the Users & Groups preferences panel, double-clicking my current picture provided me with 6 possible options:

  • Defaults: a sample of pictures provided by Apple
  • Recents: contains the current picture, but no others
  • iCloud: is apparently connected to my iCloud Photo Stream
  • Faces: a selection based on the iCloud Photo Stream
  • Camera: take a new picture from my laptop camera
  • Linked: appears to have something to do with my Contacts

In other words, there was no way here to simply link in a JPEG. Apparently, the way to get pictures into the Photo Stream is either through an iOS device (probably through the Camera app) or via Apple's iPhoto or Aperture photo software, neither of which I own (though iPhoto is apparently free for everyone who buys a new machine from now on). I did some Googling, which yielded precisely zero useful results; apparently, using a JPEG was still supported under Mountain Lion, and no one had documented this problem yet. (One of the more promising venues appeared to be the Apple StackExchange site Ask Different.)

But, I figured it out:

  • Go to the web interface for iCloud
  • Go to the Contacts interface
  • Find your own Contacts entry
  • Click "Edit"
  • Click the picture
  • Click "Choose Photo..."

You can now upload the picture. Now, you should be able to go back to the Users & Groups panel and select the uploaded picture from the Linked list of pictures.

Not a great user experience, but at least it works.

A nanomsg presentation

Update (2013-10-17): slides and video are now available.

For Software Freedom Day 2013, which is on Wednesday, the 18th of September, I will give a presentation about nanomsg at the Centrum Wiskunde & Informatica (the Center for Mathematics & Computer Science) at the Amsterdam Science Park. If you're in the neighborhood and/or interested in nanomsg, come visit!

nanomsg: simple smart sockets

nanomsg is a socket library that provides several common communication patterns to help build distributed systems. It aims to make the networking layer fast, scalable, and easy to use. Implemented in C, it works on a wide range of operating systems with no further dependencies.

This talk will give a short history of the nanomsg project, an explanation of the value provided by nanomsg in building distributed systems, and a demonstration of some key features.

Are We Meeting Yet?

For a few months now, I've worked on a little single-file web thingy: Are We Meeting Yet? (AWMY for short). Here are two example URLs:

Gervase Markham kindly wrote about it on his blog after I recommended it for a Firefox development meeting, which made me think I should write about it here.

What it is

AWMY is a tool to communicate event (meeting) times to geographically dispersed and therefore timezone-challenged audiences. This means it displays date/time values in (a) an original timezone, (b) the UTC timezone and (c) the user's local timezone, with a title or description and a countdown timer.

Critically, it supports recurring meetings in such a way that a single URL will always show the next meeting in the series, no matter when it's loaded in the browser. This makes it a good fit for use in automatically generated meeting announcements. Currently, the only supported repeating modes are weekly and bi-weekly.
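That "stable URL" behavior can be sketched like this (my guess at the logic, not AWMY's actual code; it uses the 'w'/'b' mode codes from the URL scheme described below):

```python
from datetime import datetime, timedelta

def next_occurrence(reference, mode, now):
    """First instance of a series, anchored at `reference`, not before `now`.

    mode is 'w' (weekly) or 'b' (bi-weekly).
    """
    if now <= reference:
        return reference
    interval = timedelta(weeks=1 if mode == 'w' else 2)
    # Round the elapsed time up to a whole number of intervals.
    periods = -((reference - now) // interval)
    return reference + periods * interval

# A weekly meeting anchored on Monday 2013-08-26 at 15:30, viewed mid-week.
meeting = next_occurrence(datetime(2013, 8, 26, 15, 30), 'w',
                          datetime(2013, 9, 3, 12, 0))
```

The ceiling division rounds the elapsed time up to a whole number of intervals, so a URL anchored at the first meeting always resolves to the next upcoming one.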

One of the design goals is to have nice-looking URLs; ideally, you can understand the meeting date/time from the URL even without clicking the link. For now, hacking the URL is the only way to create a new event page; this should be easy in most cases. I hope to add a form to make it even easier sometime soon.

Timezone support is based on the venerable Olson timezone database. I've put some thought into handling events near daylight savings transitions and tried to put in some warnings, but it's probably not perfect yet. At least weekend events close to daylight savings transitions should be somewhat rare.

The domain name was chosen because it fits in with a Mozilla meme (e.g. fast, pretty, small, popular, flash and probably others); I couldn't come up with a better alternative that was also still available. This one will hopefully be memorable at least for some part of the intended audience.

How to use it

In the current iteration, the page accepts a maximum of 5 arguments:

  • A timezone: a subset of Olson timezones are accepted and can be referenced in a few different forms. Only the continent timezones are accepted (e.g. "America/Los_Angeles", "Europe/Amsterdam"), plus the "UTC" timezone. The continent is optional (and left out in the canonical versions). A space can be used where underscores are used in timezone names.
  • A date: an ISO 8601-formatted date, like "2013-08-26". A three-letter weekday abbreviation also works here (like "Mon"), but it will emit a warning if used without the weekly repeating mode.
  • A time: an ISO 8601-formatted 24-hour time, like "15:30".
  • A repeating mode: currently "w" for weekly or "b" for bi-weekly.
  • A title: any text.

If no timezone is provided, it's assumed to be UTC. Some examples:
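To make the scheme concrete, here's a hypothetical parser for such a path (the names are invented, and it only handles the bare-UTC and full Continent/City timezone forms, not the continent-less canonical form):

```python
import re

WEEKDAYS = ('Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun')

def parse_event_path(path):
    """Parse a path like 'Europe/Amsterdam/2013-08-26/15:30/w/Standup'."""
    parts = path.strip('/').split('/')
    event = {'tz': 'UTC', 'date': None, 'time': None, 'repeat': None, 'title': None}
    if parts and parts[0] == 'UTC':
        parts = parts[1:]
    elif (len(parts) >= 2 and parts[0] not in WEEKDAYS
          and not re.match(r'\d', parts[0])):
        # Continent/City; spaces may stand in for underscores in city names.
        event['tz'] = parts[0] + '/' + parts[1].replace(' ', '_')
        parts = parts[2:]
    for part in parts:
        if re.fullmatch(r'\d{4}-\d{2}-\d{2}', part) or part in WEEKDAYS:
            event['date'] = part
        elif re.fullmatch(r'\d{1,2}:\d{2}', part):
            event['time'] = part
        elif part in ('w', 'b'):
            event['repeat'] = part
        else:
            event['title'] = part
    return event
```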


I got started based on some discussion on the mozilla-governance mailing list. Most Mozilla meetings are coordinated based on the timezone for the Mozilla HQ, in California. For many non-US participants, it's easier if meeting times are communicated in UTC, because they know their own UTC offset. However, this would change actual local meeting times based on daylight savings, which is a bit of a pain for recurring meetings. Therefore, it makes more sense to keep the reference meeting time in a timezone that has daylight savings, on the premise that most people live in zones that use mostly similar daylight savings schedules.

Some tools exist: for example, here's a link used for a Firefox developer meeting. Although that page has most of the information available from AWMY, it's provided in a much more cluttered fashion. Personally, I find it quite hard to visually parse that page to find the data I need. Of course, it does provide other useful features that AWMY does not currently offer.

I've also seen another timezone visualization tool used for this kind of thing; here's an example. It does provide the user with a sense of context, which is probably useful when you want to see what meeting times make sense in timezones you care about. For the purpose of communicating a single meeting time, it feels rather unfocused.

The user experience for these tools doesn't work well for this use case, so I thought I might be able to do better. On top of that, the other tools don't appear to handle recurring meetings. Having a stable URL for a series of events is useful when you want to point to a meeting time from many different places, but having to update each pointer every week is kind of a drag. Thus was born AWMY.

Future plans

At the top of my to-do list is a feature to combine event series. This is mostly inspired by CouchDB meetings, which take place at alternating 13:00 UTC and 19:00 UTC times to accommodate people in different timezones. My current implementation strategy is to have a "merge" flag that signals another meeting series, such that two bi-weekly event series can be joined together.
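A rough sketch of how that merging could behave (invented names, not the planned implementation): two bi-weekly occurrence streams interleaved in chronological order.

```python
import heapq
import itertools
from datetime import datetime, timedelta

def occurrences(reference, weeks):
    """Yield the instances of one series, starting at its reference time."""
    current = reference
    while True:
        yield current
        current += timedelta(weeks=weeks)

# Two bi-weekly series in alternating 13:00 and 19:00 UTC slots; heapq.merge
# lazily interleaves them in chronological order, giving one combined series.
combined = heapq.merge(occurrences(datetime(2013, 8, 26, 13, 0), 2),
                       occurrences(datetime(2013, 9, 2, 19, 0), 2))
upcoming = list(itertools.islice(combined, 4))
```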

As mentioned before, friendlier UI to build new events is one of my other priorities. A few form elements could go a long way, though I probably want a slightly more polished experience. I'll also have to figure out how to make dealing with series easy, in particular when working with the merging feature.

It would make sense to add a few other repeating modes, in particular "3rd Wednesday of each month"-like functionality. Offering ICS downloads would be nice. I would also like each page to show the next few meeting instances, if only as an indication that you're dealing with a recurring event.

Because there's no server side component, I really want to keep all state in the URLs. On the other hand, I also want readable URLs. These goals don't always align well, so balancing them is an interesting act. I'm thinking about a way to generate alternative URLs that aren't as readable, but significantly shorter.

Wrapping up

I hope this will be a useful tool for the open source community (and anybody else who has a use for it). I'd be interested to hear your thoughts on what features would be most useful to add. If you want to contribute some code, that would be even better; check it out via the Bitbucket project. All feedback is welcome!

A Persona interview

Published on 2013-07-26 by Dirkjan Ochtman in tech, mozilla

I have recently been contributing to Mozilla's Persona project, which is an awesome way to make authentication easier for sites and their users. They kindly published an interview with me, which I reproduce here in full for archival purposes.


Over the past year, Dirkjan Ochtman has been a consistent, constructive voice in the Persona community. His involvement has helped ensure that we stay true to Mozilla’s mission of open, transparent, and participatory innovation.

More impressively, Persona’s new backgroundColor feature is the direct result of Dirkjan’s efforts.

We hope this interview highlights his contributions and inspires others to get involved.

From the rest of us at Mozilla, thank you.

Who are you?

I’m Dirkjan Ochtman, a 30-year old software developer living in Amsterdam. I work for a financial startup by day; in my free time, I contribute to a bunch of open source projects, like Mercurial, Python, Gentoo Linux and Apache CouchDB. I also started a few things of my own.

Have you contributed to Mozilla projects in the past? How did you get involved in Persona?

I started using Firefox almost ten years ago, and I’d been watching Mozilla before that. The Mozilla mission of an open Internet resonates with me, so I tend to try and find stuff around the edges of the project where I can help. This year, I also became a Mozilla Rep.

I find BrowserID/Persona compelling because I hate having to register on different sites and make up passwords that fit (often inane) security requirements. And you just know that many sites store passwords insecurely, leaking sensitive information when they get hacked. Persona allows me to authenticate with my email address and a single password; no more guessing which username I used. I trust Mozilla’s password storage to be much more secure than the average Internet site, and because Persona is open source, I can verify that it is.

In addition to setting up Persona sign in on a small community site I run, I’ve also implemented my own Python-based Identity Provider. This means that when I use Persona, I control my own login experience. My Identity Provider uses Google Authenticator, so now I don’t have to remember any passwords at all.

The documentation for building an Identity Provider was scattered and incomplete, so I helped improve that. From that work, I got to know some of the great people who work on Identity at Mozilla.

What have you hacked on recently?

There has been a long-standing issue that the Persona dialog contained too much Mozilla branding and did not sufficiently emphasize the individual websites that users were signing into. There was an issue about this on Github, but I seem to remember complaints on the mailing list from even longer ago.

Of course, I prefer to use Persona over Facebook Connect or Twitter, so I decided to see if I could fix some of these issues. Luckily one of the Persona developers, Shane Tomlinson, was available to work on this at roughly the same time.

To improve the branding balance, we first de-emphasized the Persona branding. I focused on allowing websites to specify a background color for the Persona dialog. This is important because it can make the dialog feel much more “at home” on a site. We had to work out some tricks to ensure that text stayed readable regardless of the background color specified.
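One common trick of that kind (my illustration, not necessarily what Persona actually does) is to pick black or white text based on the background's perceived brightness:

```python
def readable_text_color(background):
    """Pick black or white text for a '#rrggbb' background color.

    Weighs the channels by perceived brightness (ITU-R BT.601 luma).
    """
    r, g, b = (int(background.lstrip('#')[i:i + 2], 16) for i in (0, 2, 4))
    luminance = 0.299 * r + 0.587 * g + 0.114 * b
    return '#000000' if luminance > 128 else '#ffffff'
```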

What was that experience like?

It was great. I had no previous experience with Node.js, but getting the application up and running was easy. I got basic backgroundColor support working in a few hours, but it took a few nights to tweak things and write tests. Fortunately, Shane is also based in Europe, so we could easily work together. When Shane showed our work on the mailing list, response from the other developers was very positive.

It would be really great if this helps drive Persona adoption amongst large websites.

Any plans for future contributions?

I’ll probably stay involved for the foreseeable future. Now that I know what I’m doing with the dialog, I would like to help out with further improvements to the login flow and website API. I’m also very interested in stabilization and/or standardization of the Identity Provider API.

Tracing a path

Published on 2012-11-28 by Dirkjan Ochtman in code

Two weeks ago, I posted a graphic showing a visualization of 2.5 years of my location data to my social media feeds. I wanted to jot down a few notes on how the plan to create this image came together.

A Google Latitude history visualization.

As I remember it, I saw similar locative art from some artist a few years ago. It was like the mapping work from Daniel Belasco Rogers, but I think it was done by a Dutch guy, with very sparse white maps with red lines, who had mapped several European cities (including Amsterdam). I spent a few fruitless hours last week trying to find the "originals" I remember; Mr. Rogers' is the closest analogue to the other guy's work I could find.

There's something about these maps that resonated with me: the patterns of a familiar city combined with the paved cow paths of a person's routine, seen from above, in an entirely different perspective. I soon decided I would like to build one of these from my own paths, but I didn't own a GPS device, and some idea of an "art project" certainly wasn't reason enough to buy one.

Fast forward a few years: smartphones gained location sensors, and Google launched its Latitude service, adding location history and a limited API (last 30 days of history only) soon after launch. The API was announced in May 2010; it might not be a coincidence that my Latitude history starts on 2010-05-20. Finally, Google's Data Liberation Front posted a short blog post last week announcing the availability of data dumps containing all Latitude data.

So after a few years of gathering data while waiting for devices and software to align, I could get to work. Drawing little dots onto an SVG canvas is actually very easy; the hard part was coming up with heuristics to create a sensible bounding box. If the bounding box is too large, you get a large white space with a few clumps of dots in different places; if it's too small, you get a view of your home town with a whole lot of dots in it. I ended up implementing a slightly convoluted algorithm that measures the ratio of required surface area to the number of points inside it, and takes a derivative of that curve. I got satisfactory results on my own data set, but I have no clue how robust the algorithm is.
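The dot-drawing part might look something like this (a hypothetical sketch, not the actual script; it ignores map projection and assumes the bounding box has already been chosen):

```python
def points_to_svg(points, bbox, size=800):
    """Render (lon, lat) points as small red dots on a white SVG canvas.

    bbox is (min_lon, min_lat, max_lon, max_lat), chosen beforehand.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    scale = size / max(max_lon - min_lon, max_lat - min_lat)
    dots = []
    for lon, lat in points:
        x = (lon - min_lon) * scale
        y = (max_lat - lat) * scale  # SVG's y axis points down
        dots.append('<circle cx="%.1f" cy="%.1f" r="0.5" fill="red"/>' % (x, y))
    return ('<svg xmlns="http://www.w3.org/2000/svg" width="%d" height="%d">'
            '<rect width="100%%" height="100%%" fill="white"/>%s</svg>'
            % (size, size, ''.join(dots)))
```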

Implementing an idea that's been kicking around for a few years is an (oddly?) satisfying experience. If you happen to be interested in this kind of thing, my code just needs a Python 2.7 environment. I'm not sure the resulting images would qualify as "art", but I'm happy with how this turned out.

Compilers on Coursera

Published on 2012-07-21 by Dirkjan Ochtman in learning

Two weeks ago, I completed the Compilers course on Coursera; it was a very worthwhile experience. Judging from the amount of discussion about e-learning, it seems to be a pretty hot topic. So far, I haven't seen any posts about what it's like to actually participate in a Massive Open Online Course (MOOC), so I figured I'd write up some thoughts on my experience. It's pretty detailed, so here's a summary first:


Taking this course was a great way to learn more about compilers and fill a hole in my CS curriculum. Professor Alex Aiken is a great instructor and covers a good amount of material. I learned a lot about compiler construction despite having toyed with my own compiler before starting the course. The programming assignments were particularly tough, giving me useful experience in building compilers and a great sense of achievement. Coursera seems a nicely designed platform, and I'd like to try some other courses next year.

Joining up

I heard about the course via a Google plus post at the end of April. I'd been playing with writing my own compiler for a language I'm experimenting with, and I figured this would be a good way to learn a few things about what works in compiler design. For my own project, I had gotten started in Python, with a custom regex-based lexer, a Pratt parser and doing fairly basic code generation by writing out LLVM IR. At some point, the code generation code grew unwieldy and I split it up into a "flow" stage, doing what I now know to be called semantic analysis and turning the AST into a control flow graph, and an actual code generation stage to translate the CFG to IR.
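For illustration, here's a minimal Pratt (top-down operator precedence) parser in the style mentioned above; the toy grammar and names are mine, not the actual project's code:

```python
import re

# Toy grammar: integers and + - * / with the usual precedence, plus parens.
TOKEN = re.compile(r'\s*(\d+|[-+*/()])')
PRECEDENCE = {'+': 10, '-': 10, '*': 20, '/': 20}

class Parser:
    def __init__(self, text):
        self.tokens = TOKEN.findall(text) + ['<end>']
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def advance(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def parse(self, min_prec=0):
        # Parse a "prefix" item: a number or a parenthesized expression.
        token = self.advance()
        if token == '(':
            left = self.parse()
            self.advance()  # consume ')' (error handling omitted)
        else:
            left = int(token)
        # Keep consuming operators while they bind at least as tightly.
        while PRECEDENCE.get(self.peek(), -1) >= min_prec:
            op = self.advance()
            # The +1 makes same-precedence operators associate to the left.
            right = self.parse(PRECEDENCE[op] + 1)
            left = (op, left, right)
        return left
```

The min_prec argument is the heart of the technique: each recursive call only consumes operators that bind at least as tightly as the one above it.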

There was no compilers class in my CS program at university, and I had substituted another programming class for the assembler programming class they offered, but one of Steve Yegge's rants had stuck with me. So I figured that, with my master's in CS and some experience writing a basic compiler "from scratch", I would be able to handle the course next to my full-time day job.

This compilers course originally comes from Stanford, and is supervised by professor Alex Aiken and some staff via Coursera. There is a similar class on Udacity, which I didn't find until after I'd already started at Coursera. The Coursera version comprises lectures, quizzes, proof assignments, mid-term and final tests, and programming assignments, with the programming assignments graded separately so that there are two levels of participation. This installment (I think it will run again in the future) ran from April 19, when the first lectures were available, to July 6, when the final test was due.


Lectures

Lectures were posted each week. There was at least 90 minutes of video per week, though some weeks ran up to 160 minutes. Each week's material is divided into pieces of about 5 to 25 minutes, which made the viewing significantly more manageable. I used HTML5 video inside the browser (which worked great even on 3G internet), but you can also download each video separately. For the in-browser viewer, there's an option to view at speeds from 0.5x to 2x, but it's hidden in the user preferences, so I didn't find it until after I was done with the course.

Each video starts with a short introduction where you can see the professor talking; after that, you see the slides, which the professor scribbles notes on as he goes along explaining the material. The slides can be downloaded separately as PDFs for later review, in two versions: the pristine version and the one with the notes scribbled on them by the professor.

I didn't like lectures much in university, but found that I actually liked watching these. The pacing is pretty good, although of course it's sometimes a little slow and sometimes a little fast, but the professor was engaging and the scribbling on the sheet makes it feel a little more interactive than your standard slides plus narration. It also helped me that the videos were generally pretty short, so you can watch one, do something else for a bit, then watch another one.

Quizzes & tests

There were 6 quizzes, spaced out in time throughout the course, each covering the material in the lectures posted since the last quiz. There were two deadlines on each quiz: the early deadline, something like a week after the quiz was posted, and the end of the course, for half credit. Each quiz could be taken as many times as you wanted, the highest score would count, and you got to see the correct answers as soon as you submitted the quiz.

The questions were pretty challenging. In the first few quizzes, I just clicked through and didn't end up getting very good scores. If I felt the score was too low, I'd take it again and see if I could do better after studying the correct answers for a bit, but it didn't always get much better. I was treating it like a small exam on what I'd learned from the last set of lectures.

At some point halfway through, I changed my strategy and started seeing the quizzes more as additional material. I took extra time, started noting things down on paper and really working through the problems, as many times as necessary to get a perfect score. I feel this was a much better way to do it, because I actually learned things by checking my answers.

I found some of the questions annoying because I felt they required very detailed reasoning through an example DFA or generating MIPS assembly code by hand. I don't mind building up assembly code for the programming assignment (that much -- see below), but I was really hoping for questions that tested my understanding of the underlying theory rather than my ability to reproduce the detailed algorithms laid out in the lectures. Of course, taking more time for the quizzes helped with that, too, but I'm still inclined to dislike those kinds of questions.

The tests were more or less like the quizzes, though slightly harder. I didn't do that well on the mid-term, where I got about 50%; I got 75% on the final, which was much more satisfying. There was some grumbling in the forums about some of the questions being ambiguous or even just wrong, but this wasn't a big issue for me.


Proof assignments

There were 6 proof assignments, run via a small web app called DeduceIt. The web app is a little rough around the edges, so these assignments only counted for extra credit. In each assignment, some part of the week's material was represented as a proof: we were supplied with a number of given statements, a goal statement and a few production rules, and had to apply the rules to the given statements in a particular number of steps to derive the goal statement.

This would all be fine if it weren't for the representation used in the assignments. Since students have to apply the rules to the given statements themselves, both rules and statements are given as LaTeX-like text expressions. These are hard to read and very tedious to reproduce, making the assignments more about understanding the representation than about actually going through the proof.

On the other hand, the proof assignments are a good way to go through the algorithms that were covered in the lectures without having to dive into the details of actual code implementing the algorithm. I liked doing them for this reason, but felt the representation got in the way more and more as the assignments got more complicated towards the end of the course.

Programming assignments

The real meat of the course, for me, was in the programming assignments: implementing a compiler for the Classroom Object-Oriented Language (COOL). Each assignment corresponded to one of the essential compiler stages: a lexer, a parser, a semantic analyzer and a code generator. All assignments were available in C++ or Java flavors, with the first two built around the flex/JLex and bison/CUP tools respectively. You were also allowed to use another language, with the caveat that this required reimplementing some of the support code that was made available.

I decided to go with the Java variants of the first two assignments, on the premise that it would be educational to learn to use tools like JLex and CUP rather than building my own lexer and parser like I'd done for my own compiler. Getting started with JLex was fairly frustrating, so I almost dumped it in favor of doing it in Python, but once I got the hang of it, it turned out okay.

(I've since been wondering whether students could learn more if they implemented a lexer or parser from scratch versus using something like flex or bison. At least from my own experience, writing a lexer/parser for a realistic language is not that hard. Of course flex results in a highly optimized lexer, but writing up a lexing algorithm from scratch would seem more instructive. On the other hand, perhaps using the generator tool allows you to focus more on the characteristics of the grammar you're developing, instead of it potentially getting lost in procedural code.)
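To illustrate that point, the core of a hand-written lexer really can be a single dispatch loop. Here's a rough sketch in Python, with a made-up token set (nothing like COOL's actual specification):

```python
# A minimal hand-written lexer sketch: one loop that peeks at the current
# character and dispatches. The token names and keyword set are invented.
KEYWORDS = {'class', 'if', 'then', 'else', 'while'}

def lex(source):
    tokens, i = [], 0
    while i < len(source):
        c = source[i]
        if c in ' \t\n':               # skip whitespace
            i += 1
        elif c.isdigit():              # integer literal
            j = i
            while j < len(source) and source[j].isdigit():
                j += 1
            tokens.append(('INT', source[i:j]))
            i = j
        elif c.isalpha() or c == '_':  # identifier or keyword
            j = i
            while j < len(source) and (source[j].isalnum() or source[j] == '_'):
                j += 1
            word = source[i:j]
            tokens.append(('KEYWORD' if word in KEYWORDS else 'IDENT', word))
            i = j
        elif c in '+-*/=<>(){};:':     # single-character operators
            tokens.append(('OP', c))
            i += 1
        else:
            raise SyntaxError('unexpected character: %r' % c)
    return tokens
```

For example, `lex('if x1 then y + 2')` yields KEYWORD/IDENT/KEYWORD/IDENT/OP/INT tokens. A real lexer adds line tracking, comments and string escapes, but the shape stays the same.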

The reference compiler is implemented in the same four stages, with a simple text format used to communicate the relevant data through UNIX pipes: the lexer consumes program code and spits out a simple line-based list of tokens, which the parser consumes and turns into an indentation-based AST representation; the semantic analyzer augments the AST with more typing information, and finally the code generator outputs MIPS assembly code.
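The exact formats are specific to the course, but the filter structure is easy to picture. Here's a hedged sketch of one stage as a UNIX filter; the "KIND value" line format is invented for illustration, not the course's actual format:

```python
# Sketch of how the stages might communicate: each stage is a plain UNIX
# filter reading the previous stage's output on stdin, so the compiler
# composes as something like
#   lexer < prog.cl | parser | semant | cgen > prog.s
import sys

def read_tokens(lines):
    """Parse a line-based token stream into (kind, value) pairs."""
    tokens = []
    for line in lines:
        kind, _, value = line.rstrip('\n').partition(' ')
        tokens.append((kind, value))
    return tokens

def main():
    # A real stage would consume sys.stdin and emit its own format on
    # stdout; this one just echoes the tokens back out.
    for kind, value in read_tokens(sys.stdin):
        sys.stdout.write('%s %s\n' % (kind, value))
```

The nice property of this design is that any stage can be swapped out for the reference implementation, which is exactly what makes the diff-based workflow below possible.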

Grading was done via Perl scripts that fuzzily compare the output of your code to that of the reference component, running over a list of sample programs (both valid and invalid) to score your program. This worked quite nicely and made it straightforward to find the remaining issues. In fact, the main way I wrote my code was by starting from a small program, trying to generate some reasonable output, then using diff -u to compare it to the output from the reference program and see where my code failed. I often find that it's easier to stay motivated when generating small successes with some regularity, and this way of working made it harder to get stuck. On the other hand, it might have been better to do a little more design up front, forcing me to think through issues rather than hacking away at problems as they came up.

I ended up doing the last two assignments in Python. I wrote up a small parser for the textual representation of the AST used and some infrastructure to easily write passes over the AST, then got started with the actual assignment. This worked great; Python is the language I use most, and not having to think so much about typing or memory management made it easier to get the assignments done. In my opinion, Python is a much better language for most teaching purposes than either Java or C++, because it enables more focus on the actual algorithm (versus the "details" of programming). Apparently the Udacity compilers course uses Python, so I might have gone there if I'd known about it before starting with Coursera.
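That infrastructure doesn't have to be much code. Something along these lines conveys the idea, though the node names and the two-space-indent "name value" format here are invented for illustration; the actual course format differs:

```python
# Sketch: parse an indentation-based AST dump into a tree of nodes, plus
# a simple depth-first walker to build passes on.

class Node(object):
    def __init__(self, name, value=None):
        self.name, self.value, self.children = name, value, []

def parse(lines):
    root = Node('root')
    stack = [(-1, root)]  # (indent level, node)
    for line in lines:
        stripped = line.lstrip(' ')
        indent = (len(line) - len(stripped)) // 2
        name, _, value = stripped.rstrip('\n').partition(' ')
        node = Node(name, value or None)
        while stack[-1][0] >= indent:  # pop back up to this node's parent
            stack.pop()
        stack[-1][1].children.append(node)
        stack.append((indent, node))
    return root

def walk(node, visit):
    """Depth-first pass: call visit() on every node in the tree."""
    visit(node)
    for child in node.children:
        walk(child, visit)

# Example pass: collect all node names in depth-first order.
dump = ['class Main\n', '  method main\n', '    int 42\n', '  attr x\n']
names = []
walk(parse(dump), lambda n: names.append(n.name))
```

With the tree in hand, a semantic-analysis or code-generation pass is just another visitor, which keeps each pass small and focused.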

The programming assignments took a lot of time. With my full-time job, I found it pretty hard to find the 10-25 hours of focused time per assignment for reading documentation, writing code and testing against sample programs. However, these assignments were also the most instructive and rewarding for me, so I wouldn't want to have missed them, and having some deadlines also helped with not procrastinating so much.

As a result, I ended up giving up on the last 20% of the code generation assignment. Assembly is pretty verbose and hard to read, and working with MIPS assembly certainly made me appreciate LLVM IR more (even if x86 assembly is uglier still). Debugging assembly is also pretty painful, so I got a little frustrated as the deadline was getting closer. The simulator used to run MIPS, SPIM, was also a little limited and buggy in places, which certainly didn't make things easier. In short, I'll be happy to return to LLVM and its suite of tools.


Community

Part of why this course worked for me was definitely the community. The deadlines can be harsh if you're not a full-time student, but being able to get some help from staff, confer about particularly hairy parts of the semantic analyzer or get some extra explanation about a quiz question makes it much more fun. It also makes it less likely that you hit a wall and have to give up, and gives some extra incentive to finish the whole course. This made the forums an integral part of the experience for me.

Overall experience

If you similarly lack any education in compilers and find yourself inspired by Steve Yegge's aforementioned rant, Coursera's course is a great way to learn a lot about compilers. I'm sure the Udacity course is nice, too, but the Coursera environment seems more attractive. I also found Coursera's relation to renowned universities appealing, though that might just be good marketing. In any case, I look forward to taking another course through their site.


Published on 2012-06-16 by Dirkjan Ochtman in meta

I used to have a weblog. I took the posts offline 4 years ago, when I realized that some posts weren't fit for the public internet. Since then, I've used Twitter to vent random thoughts and links, but over the past year I've started to miss having an outlet for longer articles (and practicing my writing).

Iteration 2 will be mostly about technology: programming languages, version control systems, web technology and software engineering at large are just some of the topics I like to think about. If that seems a broad selection, I would agree: there are too many things I want to do. We'll see what's on here in a year or so.

So, consider this a blog reboot. I have a few topics lined up, but it's going to take some time to think through turning them into something presentable.