Dirkjan Ochtman: writing

Patreon update #2

A little over 2 months ago I started a Patreon page to see if I could gain some financial support for my open source work (with the goal of doing more such work). I posted an update on Patreon after 10 days to talk about what I had been up to, so I figured it was time for an update. Instead of posting it to Patreon, I'm posting these updates to my blog from now on; this gives me full control over the posts and will hopefully boost the update frequency of the blog a little bit.

Askama

Askama is the type-safe Jinja-like Rust template engine I created. Askama 0.7.1 was released on July 23rd with the following improvements:

Since that release, botika has contributed a few fixes around nested macro scopes (possibly a regression from 0.6), which I will release soon.

Quinn

Although Quinn (my nascent QUIC implementation in Rust) saw little progress in code over the past two months, I still have good hopes for the future. I talked to Benjamin Saunders about merging his quicr implementation with Quinn, and it looks like we will move forward with that. For now we'll likely keep the Quinn name, but start from Benjamin's code base. Since building and maintaining a QUIC implementation is a lot of work, it probably doesn't make much sense to have two, and I think merging these projects is for the best.

Gentoo

  • Updated the Rust ebuilds to 1.27.0 and 1.28.0. As discussed in the previous update, these versions now allow installing the cargo, rustfmt and rls components as built by the Rust build system (or the binaries in the case of rust-bin). As upstream makes the distribution more monolithic, this will make it easier to get updates into the Gentoo repository.
  • Introduced a new virtual/cargo ebuild to abstract over the cargo builds installed by dev-lang/rust, dev-lang/rust-bin, and dev-util/cargo.
  • Updated the CouchDB ebuild to 1.7.2 after a vulnerability was reported in 1.7.1. This was the last CouchDB release in the 1.x range, and since another vulnerability was disclosed, 1.7.2 is known to be vulnerable. I have been dissatisfied with the direction of the project, so I've finally removed myself as a Gentoo CouchDB maintainer. No one has stepped up, so I'll start the process for having CouchDB removed from the Gentoo repository soon.
  • Updated the ripgrep ebuild to 0.9.

abna

In June, I released a tiny Python library called abna that will log into the Dutch ABN Amro bank's web interface and download your transactions for you. I forgot to mention it in the previous update, so I will expand on it a little here.

It's been a long-standing annoyance for me that automating this process was impossible. For a long time, the web interface used a hardware token which relies on debit card access and punching in your PIN code. However, in recent years they've added a five-digit so-called soft token that enables limited access to the account, including seeing past account mutations. In June, I decided to reverse engineer their login process and figured out the code to support it (spoiler alert: it involves some RSA encryption).

In July, there was a nice PR from Ivan Vasić to allow downloading the mutations for another account than the login account, which I promptly merged and released as 0.2 after improving the packaging situation a little more.

Assorted other work

Many thanks to my patrons; I hope this is worthy of your support.

Rust in 2018

Published on 2018-01-14 by Dirkjan Ochtman in tech, code, rust

In a call for blog posts, the Rust community team asked community members to write up their vision for what the Rust community should focus on this year. I've wanted to contribute my thoughts and have been thinking about what to write ever since. I've been able to benefit from the many people who already posted their thoughts to sharpen my own thinking. I came up with 5 categories:

  1. Unused inventory
  2. Meta-ergonomics
  3. Deep docs
  4. Web development
  5. Paper cuts

In rough priority order, these range from high level community and product thinking to technical things that I'd like to see. Let's dive in.

Unused inventory

Avery Pennarun recently wrote an essay called An epic treatise on scheduling, bug tracking and triage; it's long, but if you're interested in software engineering process, I think it'll be worth your time. One of the points that really stuck with me is where he talks about unreleased software as inventory.

Avery discusses how Kanban was invented in the Japanese car industry, where minimizing inventory was one of the driving factors. As Avery explains, this very much applies to shipping software: unreleased code means you have spent the time to design and implement the feature, but now you have to pay for maintenance of this code (in terms of complexity in implementing other bugs or features) without realizing the value of shipping the actual feature to your customers. And that's not even discussing the opportunity costs of how you could have spent the time spent on unreleased feature A to ship released feature B sooner, increasing the value brought to your customers.

If you doubt the analogy, please go and read Avery's essay, since I cannot do it justice here. My point is this: rustc has a lot of unused inventory.

So I found myself agreeing with Nick Cameron's Rust 2018 post, where he describes his wish for 2018 to be "boring", because we should just finish up what we've got in the pipeline instead of starting new things. Pascal Hertleif's take on Rust 2018 is even stronger, calling for consolidation and rapidly reducing the number of in-flight unstable features. Currently there are 113 in the language and 155 in the library; see also my analysis of open library tracking issues.

If you think about that in terms of unused inventory: how much time has the community spent on designing and implementing these features? How much value, in return, has made it into the stable Rust compiler that most people use? How much time has been spent on maintaining these features while we were shipping other things? How many of the commits going into the master branch during any given release cycle actually affect stable Rust users' lives six weeks later? When is an unstable feature actually used so much that it has effectively prematurely stabilized? For example, if there is agreement that the design can be improved, are we still willing to break the codegen features Rocket uses?

One challenge here is that Rust is an open source community, not a hierarchically-driven top-down organization where the leaders can just tell people what to do. Still, if the community can come together and agree on priorities, we can focus more on shipping features in stable Rust. We could adopt a rule that features cannot stay in nightly if no progress on stabilization is being made for more than 4 cycles. Or agree as a community that no more than 25 language and 25 library features can be in-flight at any time.

My other conclusion is that stabilizing features is a bottleneck in our factory for shipping Rust. So while we work on reducing inventory, we should at the same time try to increase the capacity of bottlenecks in the stabilization process.

Meta-ergonomics

I've been thinking about metacognition recently as a nice word for a useful concept. Similar to how metacognition means the awareness and analysis of one's own learning or thinking processes, and playing off of the 2017 year theme of ergonomics, maybe 2018 should be the year of going meta: meta-ergonomics, where we focus on the process of improving language ergonomics.

In order to scale the number of people that can help design, implement and stabilize new features and fix bugs, how can we best connect the high-level goals to the lower-level implementation process? Can we highlight the current blockers and accessible "good first bugs" where mentoring is available?

I subscribed to some issues in the GitHub roadmap issue tracker last year, but did not find myself deriving a lot of value over the course of the year. There were good write-ups of what needed to get done, but few updates over time and little connection to the ongoing implementation effort. The best way I've found to keep track of the non-lexical lifetime effort, for example, was just to check out links from This Week in Rust, which mostly talked about stuff that was completed rather than upcoming opportunities for contribution.

One important part has already been kicked off by Niko Matsakis: the Rust Compiler Book will hopefully become an important resource this year for helping people hack on the compiler. When I tried my hands at one small language feature this year, documentation like that was sorely lacking.

Deep docs

As with missing documentation for the compiler, the other area where Rust can improve next year is documentation. While The Rust Programming Language is a great resource for people just starting with Rust, I ran into a number of cases where it didn't fulfill my needs and I had to ask around for help.

The community is wonderful in providing such help, but at the same time it felt frustrating when there was no documentation to better describe the language's syntax and semantics: what edge cases are allowed or not, what parts remain unimplemented, why certain restrictions are there and what relevant RFCs are in the pipeline. I'm not sure whether this is intermediate-level documentation or reference documentation, or maybe those really are the same thing.

Web development

The big ticket item here is WebAssembly. I believe that WebAssembly is about to take off in a big way, particularly once it gets access to DOM APIs. My personal end goal here is to be able to write web apps where both the backend and the frontend are written in Rust. Ideally, the UI would leverage functional reactive programming (see Yew), so that the app's state lives on the client and the server just ships state updates as required. (Elm, but in a rustic way.) Lots of progress was made with WASM support in 2017, but making that really polished would make Rust a top contender for greenfield WASM projects (that is, not using legacy C/C++), which seems like an important use case going forward.

Even if you just want to write Rust on the backend, the infrastructure is still maturing. Rocket has many cool ideas, but only works on nightly and doesn't yet support asynchronous programming. Gotham seems to be the next viable option, on stable and with support for futures, but it seems to be in the very early stages (starting with the documentation). For simpler API services, it looks like hyper (hopefully soon with built-in HTTP 2 support) and serde work well together, although something with more polish and less boilerplate would be nice.

So far Askama, my take on templating for Rust, hasn't taken off in a big way (although I've been very happy with contributions!). I'm not sure why exactly; I'll keep iterating as I haven't seen any competition that makes more sense to me. I would like to explore how it can fit in with functional reactive programming -- if that ends up working out, it may also draw more people in.

In general, it doesn't feel like Rust is web yet, so I hope this will improve in 2018.

Paper cuts

And then there's a list of things I've run into over time that I'd like to see some traction on somehow. Note that I do pretty much all of my development on stable Rust, and that contributes to some of these problems.

  • When I tried to expose a parser result for imap-proto, I was surprised to find out you cannot access enum variants through type aliases. I wrote up a pre-RFC to help fix that; stabilizing that in 2018 seems like a challenge.
  • In one case I wanted to use Vec::resize_default(), which has been waiting for stabilization for about 8 months now without any signs of progress.
  • Installing clippy or rustfmt (as a stable user) and keeping them running takes hard work and troubleshooting time. Updating these tools means I have to update nightly, and the other way around. On the latest update, rustup warned me I should delete the separate installations to allow it to install the rustup-managed ones, but that actually didn't work.
  • Writing a Tokio network protocol client (tokio-imap) took a lot of time, since the documentation focused much more on writing servers at the time. It feels like Tokio made little progress in 2017; I hope 2018 will be better.
  • The ergonomics around futures are not where they should be. I hope impl Trait and async/await can make enough progress in 2018 to make this better.

Conclusion

I deeply believe in Rust; I've been trying to articulate that in another blog post (still in progress). I hope 2018 will be another great year for Rust, and I am eager to participate more in the Rust community over the coming year.

Rust is a big tent

Published on 2017-05-19 by Dirkjan Ochtman in tech, code, rust

Cue images of fireflower-decorated tents.

Over the last year, Rust has replaced Python as my favorite programming language. Recently, the Rust community celebrated the second birthday of Rust 1.0, and the birthday blog post mentioned that 438 people contributed for the first time to the compiler and standard library this year.

This led me to wonder how this compares to other modern programming languages. Specifically, I was wondering about Go and Swift, languages that are of similar vintage and that compete in the same space of compiled, statically-typed languages with a performance focus.

One of the concerns I have seen from people about Rust's viability is the size of its community -- can Mozilla and volunteer contributors evolve Rust at a fast enough pace? And how does that pace compare to these other languages, both of which are backed by large corporations?

So I pulled Git repositories for each of these three projects and graphed the number of non-merge commits in each of the repositories from the first day:

Commits over time

Then, I also graphed the cumulative number of unique authors:

Unique authors over time

Code and data can be found on GitHub. I manually culled the first 4 commits from the Go repository, which reported being from 1972, 1974 and 1988; the first commit that I kept is from March 2, 2008. Swift started on July 17, 2010 and Rust on June 16 of the same year, surprisingly close to each other.
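For readers who want to reproduce this kind of count, here is a minimal Python sketch. It is not the code from the GitHub repository mentioned above, just one way the two series could be computed. The helper names (git_log_lines, cumulative_stats) are made up for illustration, and treating a lowercased author email as a "unique author" is an assumption on my part.

```python
import subprocess


def git_log_lines(repo_path):
    """Return 'YYYY-MM-DD|author-email' lines, one per non-merge commit."""
    out = subprocess.check_output(
        ["git", "log", "--no-merges", "--date=short", "--format=%ad|%ae"],
        cwd=repo_path,
    )
    return out.decode("utf-8", "replace").splitlines()


def cumulative_stats(lines):
    """Return [(date, total_commits, total_unique_authors)], one row per date.

    Assumption: an author is identified by their lowercased email address.
    """
    seen = set()
    rows = []
    # ISO dates sort lexicographically, so sorting the raw lines orders by date.
    for commits, line in enumerate(sorted(lines), start=1):
        date, _, author = line.partition("|")
        seen.add(author.lower())
        if rows and rows[-1][0] == date:
            # Same day as the previous commit: update that day's running totals.
            rows[-1] = (date, commits, len(seen))
        else:
            rows.append((date, commits, len(seen)))
    return rows
```

Calling cumulative_stats(git_log_lines("path/to/checkout")) on each of the three repositories would give rows that can be charted directly. The result depends on how author identities are normalized, so numbers from a sketch like this will not necessarily match the charts.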

Clearly, this is a crude analysis (even ignoring the Excel charts). Repositories may not all contain the same depth of components (like compiler, standard library, documentation), and commit sizes could substantially differ due to differing project cultures. Still, two broad patterns are apparent:

  • Rust has way more unique contributors than the other languages
  • Rust gets many more commits than Go, but Swift is moving faster

As a result, I'm confirmed in my optimism about Rust's future.

My first FOSDEM

Published on 2014-08-30 by Dirkjan Ochtman in tech, mozilla

This year, I attended the Free and Open source Software Developers' European Meeting (FOSDEM) for the first time. For those who don't know, FOSDEM is a big event (5000 people) held every year in Brussels, organized by the community and free to come to (you don't even need to register!). Mozilla happily sponsored my travel and hotel for the 2014 edition, so I wanted to write a few things about my experience. Unfortunately, that took some time; planning for next year has already started!

The schedule was very diverse, and had lots of stuff I found interesting. On Saturday, I mostly hung around the Mozilla devroom, where I was assigned to help out with speaker assistance. It turned out that the extra effort for that wasn't needed most of the time, although I helped out a few times guarding the doors at the start of particularly popular talks (mostly the JavaScript ones). In between, I got to focus on the talks, some of which I really liked.

A moderately full Mozilla devroom

The talk on Firefox for Android provided some nice background about what the Fennec team had been up to, presented in an engaging way. The talk on Persona got quite a bit of interest, though I thought the speaker wasn't great. I helped answer some questions on Persona, and engaged with some of the more interested people after the talk, which was nice. Servo is Mozilla's research browser engine, and it's actually being written in Rust, a very interesting systems language being developed in tandem with Servo. The talk about it was one of the most interesting ones to me, even though I'd heard some of the content before at the Mozilla Summit. The Q&A after the talk was also quite interesting, continuing with a small circle of people outside the room for quite a bit.

Josh Matthews telling us about Servo

At the end of the day, I went to the alpha announcement for Mailpile, which I'd previously seen mentioned on Hacker News. I thought the presentation was great, and the room felt very enthusiastic to me. I even cloned the Mailpile repo during the talk to see if I could get it running; unfortunately there were some issues getting a profile set up. I later tried the second alpha, which exhibited similar failures in the setup process, so it's been a little disappointing so far. However, I filed an issue and hope things will be better in the beta.

After that, I attended a Gentoo BoF. I've been a Gentoo developer for a few years, but hadn't really met anyone in person before. Since most of my fellow Gentoo devs already know each other or had been hanging around the Distributions devroom all day, it was a bit weird to get in at the end of the day. Fortunately there was a nice round of introductions, so at least I have met some people now. On the other hand, I had to skip the Gentoo dinner for a Mozilla party.

The party was great, of course; it's always nice to get to know Mozillians. Afterwards, four of us went to get a beer on the way back to the hotel, where we had some good discussions about Mozilla's strategy and future. In the end, I always like these conversations in a smaller group the best.

On Sunday, there was no longer any room available to Mozilla and the booth we had was already pretty well-staffed, so I felt free to visit some other talks I wanted to see. I started off early trying to get into the clang talk in the LLVM devroom, but couldn't get in, even at 9 AM! Many of the devrooms were full for most of Sunday, which was a pity. On the other hand, being at the door of the Go devroom really early got me a nice seat for the Camlistore talk by Brad Fitzpatrick. Camlistore is an impressive project, and I actually tried to get it running the same day, but the lack of end-user documentation makes it hard to get started or to figure out how to use it fruitfully (and I'm not a fan of Go).

Early in the afternoon we had a small CouchDB community meetup. Benoit Chesneau and I and a few CouchDB fans got together to discuss some things that were going on. I particularly liked talking to our users to hear what problems they were trying to solve with CouchDB. After this I went to the big keysigning session and exchanged verifications with some 80 other hackers, so that my GPG key should be pretty well-connected by now (at least in the FOSS ecosystem).

The final talks I saw were the Python CFFI talk, which was nice but not that interesting since I already had some experience thanks to nnpy (a nanomsg binding), and the satirical NSA keynote from phk, which was fun.

In the end, I had a really good time at FOSDEM. It's a very large event, so I was happy to be able to hang out with the great Mozillians in the devroom and near the booth. Thanks to the Reps program for sponsoring my going there!

Single-source Python 2/3 doctests

Somewhere in 2009, I took over maintenance of CouchDB-Python from Christopher Lenz. While maintenance has slowed down over the years, since the core libraries work well and the CouchDB API has been quite stable, I still feel responsible for the project (I also still use it in a bunch of places). This being a Python project, it always felt like it would have to be ported to Python 3 sooner or later. Since it's working with a fairly deep HTTP API (as in, it uses a large subset of the protocol, with extensive hacking of httplib/http.client), the changes needed in string/bytes handling are quite involved.

My first serious attempt started in November of 2012, as evidenced by some old patches that I have lying around in mq repositories. I picked it back up again about a year later, until I had most of the tests passing, save for one specific category: the doctests. Specifically, the problem I had was with unicode literals (like u'str'). For Python 2.7 doctests, I needed the unicode annotation to pass the test. In Python 3, all strings are unicode; while unicode literals can be used in source code in Python 3.3 and later, the repr() of a string always lacks the unicode annotation. This resulted in lots of test failures like this:

======================================================================
FAIL: client (couchdb)
Doctest: couchdb.client
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3.3/doctest.py", line 2154, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for couchdb.client
  File "./couchdb/client.py", line 8, in client

----------------------------------------------------------------------
File "./couchdb/client.py", line 15, in couchdb.client
Failed example:
    doc['type']
Expected:
    u'Person'
Got:
    'Person'
----------------------------------------------------------------------
File "./couchdb/client.py", line 17, in couchdb.client
Failed example:
    doc['name']
Expected:
    u'John Doe'
Got:
    'John Doe'

While these simple cases might have been easy to fix some other way (e.g. by printing the value instead of just asking for the representation), other cases would be significantly harder to fix that way. Here's one example:

----------------------------------------------------------------------
File "./couchdb/mapping.py", line 343, in couchdb.mapping.Document.items
Failed example:
    sorted(post.items())
Expected:
    [('_id', 'foo-bar'), ('author', u'Joe'), ('title', u'Foo bar')]
Got:
    [('_id', 'foo-bar'), ('author', 'Joe'), ('title', 'Foo bar')]
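To see why printing does not help in a case like this, here is a small illustration. It is not taken from the CouchDB-Python code base; fields is a made-up stand-in for the value the doctest produces. Converting a list to a string still uses repr() for every element, so the version-specific representation of each string ends up in the output either way.

```python
# Hypothetical data standing in for sorted(post.items()) from the doctest.
fields = [('_id', 'foo-bar'), ('author', u'Joe'), ('title', u'Foo bar')]

# str() of a list calls repr() on each element, so print(fields) shows the
# same text that the bare expression would show in a doctest.
shown = str(fields)
print(shown)
```

On Python 3 the u prefix never appears in shown, whereas Python 2 would include it for the two unicode values, so the expected output in the docstring cannot match both versions as written.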

After asking around on the Python 3 porting mailing list, Lennart Regebro (the author of the Porting to Python 3 book) kindly pointed me to the relevant section of his book, but it didn't contain any great suggestions for this particular problem. It took me a few months to get back into it, but I started looking into the doctest APIs yesterday, and managed to figure out a fairly clean solution:

import doctest
import re
import sys

class Py23DocChecker(doctest.OutputChecker):
    def check_output(self, want, got, optionflags):
        # On Python 3, strip the u prefix from expected string literals.
        if sys.version_info[0] > 2:
            want = re.sub(r"u'(.*?)'", r"'\1'", want)
            want = re.sub(r'u"(.*?)"', r'"\1"', want)
        return doctest.OutputChecker.check_output(self, want, got, optionflags)

As it turns out, the doctest API is pretty well-designed, so it allows you to pass in your own OutputChecker object. As its name indicates, this is the bit of code that compares the actual output and the expected output of a given example. By slightly processing the expected value when running on Python 3, we can make sure that actual and expected output match on both versions. Use it like this:

doctest.DocTestSuite(mod, checker=Py23DocChecker())
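Putting the pieces together, here is a self-contained sketch that can be run as-is on Python 3. The demo module and its docstring are made up for illustration (they stand in for a real module such as couchdb.client), and running the suite through a bare unittest.TestResult is just one convenient way to check the outcome.

```python
import doctest
import re
import sys
import types
import unittest


class Py23DocChecker(doctest.OutputChecker):
    def check_output(self, want, got, optionflags):
        # On Python 3, strip the u prefix from expected string literals.
        if sys.version_info[0] > 2:
            want = re.sub(r"u'(.*?)'", r"'\1'", want)
            want = re.sub(r'u"(.*?)"', r'"\1"', want)
        return doctest.OutputChecker.check_output(self, want, got, optionflags)


# Hypothetical stand-in module carrying a Python 2 style doctest.
demo = types.ModuleType("demo")
demo.__doc__ = """
>>> doc = {'type': u'Person'}
>>> doc['type']
u'Person'
"""

# Build the suite with the custom checker and run it.
suite = doctest.DocTestSuite(demo, checker=Py23DocChecker())
result = unittest.TestResult()
suite.run(result)
print(result.testsRun, result.wasSuccessful())
```

One caveat with the regular expressions: they rewrite any u followed by a quote in the expected output, so expected text that happens to contain such a sequence outside a string literal would be altered too. For typical doctest output that seems like an acceptable trade-off.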

Fixing these test failures has cleared the way (along with some other fixes) for a Python 3-compatible CouchDB-Python release soon. I hope this will enable other projects to start moving in the direction of 3.x; at the very least, it should significantly lower the barrier for my own projects to start using Python 3.