Two weeks ago, I posted a graphic showing a visualization of 2.5 years of my
location data to my social media feeds. I wanted to jot down a few notes on
how the plan to create this image came together.
As I remember it, I saw similar locative art from some artist a few years
ago. It was like the mapping work from Daniel Belasco Rogers, but I
think it was done by a Dutch guy, with very sparse white maps with red lines,
who had mapped several European cities (including Amsterdam). I spent a few
fruitless hours last week trying to find the "originals" I remember; Mr.
Rogers' is the closest analogue to the other guy's work I could find.
There's something about these maps that resonated with me: the patterns of
a familiar city combined with the paved cow paths of a person's routine,
seen from above, in an entirely different perspective. I soon decided I would
like to build one of these from my own paths, but I didn't own a GPS device,
and some idea of an "art project" certainly wasn't reason enough to buy one.
Fast forward through time, and we get smartphones with location sensors,
Google's Latitude service, adding location history and a limited API (last
30 days of history only) soon after launching. The API was announced in May
2010; it might not be a coincidence that my Latitude history starts on
2010-05-20. Finally, Google's Data Liberation Front posted a short blog
post last week announcing the availability of data dumps containing all
Latitude data.
So after a few years of gathering data while waiting for devices and software
to align, I could get to work. Drawing little dots onto an SVG canvas is
actually very easy: the hard part was making up some heuristics to create a
sensible bounding box. If the bounding box is too large, you get a large white
space with a few clumps of dots in different places; if it's too small, you
get a view of your home town with a whole lot of dots in it. I ended up
implementing a slightly convoluted algorithm to measure the ratio of required
surface versus the amount of points in it and taking a derivative from that
line. I ended up with satisfactory results on my own data set, but I have no
clue as to how robust the algorithm is.
Implementing an idea that's been kicking around for a few years is an (oddly?)
satisfying experience. If you happen to be interested in this kind of thing,
my code just needs a Python 2.7 environment. I'm not sure the resulting
images would qualify as "art", but I'm happy with how this turned out.
Two weeks ago, I completed the Compilers course on Coursera; it was a very
worthwhile experience. From the amount of discussion about e-learning, it seems
a pretty hot topic. So far, I haven't seen any posts about what it's like to
actually participate in a Massive Open Online Course (MOOC), so I figured I'd
write up some thoughts on my experience. It's pretty detailed, so:
TL;DR
Taking this course was an great way to learn more about compilers and fill a
hole in my CS curriculum. Professor Alex Aiken is a great instructor and
covers a good amount of material. I learned a lot about compiler construction
despite having toyed with my own compiler before starting the course. The
programming assignments were particularly tough, giving me useful experience
in building compilers and a great sense of achievement. Coursera seems a
nicely designed platform, and I'd like to try some other courses next year.
Joining up
I heard about the course via a Google plus post at the end of April. I'd
been playing with writing my own compiler for a language I'm experimenting
with, and I figured this would be a good way to learn a few things about what
works in compiler design. For my own project, I had gotten started in Python,
with a custom regex-based lexer, a Pratt parser and doing fairly basic code
generation by writing out LLVM IR. At some point, the code generation code
grew unwieldy and I split it up into a "flow" stage, doing what I now know to
be called semantic analysis and turning the AST into a control flow graph, and
an actual code generation stage to translate the CFG to IR.
There was no compilers class for my CS program in university, and I had
substituted another programming class for the assembler programming class they
offered, but one of Steve Yegge's rants had stuck with me. So I figured
that, with my masters in CS and some experience writing a basic compiler "from
scratch" I would be able to handle the course next to my full-time day job.
This compilers course is originally from Stanford, and supervised by professor
Alex Aiken and some staff via Coursera. There is a similar class on Udacity,
which I didn't find until after I'd already started at Coursera. The Coursera
version is comprised of lectures, quizzes, proof assignments, mid-term and
final tests and programming assignments, with the programming assignments being
graded separately so that there are two levels of participation. This
installment (I think it will run again in the future) ran from April 19,
when the first lectures were available, to July 6, when the final test was due.
Lectures
Lectures were posted each week. There was at least about 90 minutes of video
per week, though some weeks it ran up to 160 minutes. It's divided up into
pieces of about 5 to 25 minutes, which made the viewing significantly more
manageable. I used HTML5 video inside the browser (which worked great even on
3G internet), but you can also download each video separately. For the
in-browser viewer, there's an option to view at speeds from 0.5x to 2x, but
it's hidden in the user preferences, so I didn't find it until after I was
done with the course.
Each video starts with a short introduction where you can see the professor
talking; after that, you see the slides, which the professor scribbles notes
on as he goes along explaining the material. The slides can be downloaded
separately as PDFs for later review, in two versions: the pristine version and
the one with the notes scribbled on them by the professor.
I didn't like lectures much in university, but found that I actually liked
watching these. The pacing is pretty good, although of course it's sometimes
a little slow and sometimes a little fast, but the professor was engaging and
the scribbling on the sheet makes it feel a little more interactive than your
standard slides plus narration. It also helped me that the videos were generally
pretty short, so you can watch one, do something else for a bit, then watch
another one.
Quizzes & tests
There were 6 quizzes, spaced out in time throughout the course, each covering
the material in the lectures posted since the last quiz. There were two
deadlines on each quiz: the early deadline, something like a week after the
quiz was posted, and the end of the course, for half credit. Each quiz could
be taken as many times as you wanted, the highest score would count, and you
got to see the correct answers as soon as you submitted the quiz.
The questions were pretty challenging. In the first few quizzes, I just
clicked through and didn't end up getting very good scores. If I felt the score
was too low, I'd take it again and see if I could do better after studying
the correct answers for a bit, but it didn't always get much better. I was
treating it like a small exam on what I'd learned from the last set of
lectures.
At some point halfway through, I changed my strategy and started seeing the
quizzes more as additional material. I took extra time, started noting things
down on paper and really working through the problems, as many times as
necessary to get a perfect score. I feel this was a much better way to do it,
because I actually learned things by checking my answers.
I found some of the questions annoying because I felt they required very
detailed reasoning through an example DFA or generating MIPS assembly code by
hand. I don't mind building up assembly code for the programming assignment
(that much -- see below) but I was really hoping more for questions that
tested my understanding of the underlying theory rather than my reproduction
skills of detailed algorithms laid out in the lectures. Of course, taking
more time for the quizzes helped with that, too, but I'm still inclined to
dislike those kinds of questions.
The tests were more or less like the quizzes, though slightly harder. I didn't
do that well on the mid-term, where I got about 50%; I got 75% on the final,
which was much more satisfying. There was some grumbling in the forums about
some of the questions being ambiguous or even just wrong, but this wasn't a big
issue for me.
DeduceIt
There were 6 proof assignments, run via a small web app called DeduceIt. The
web app is a little rough around the edges, and so these assignments could
just get you extra credit. In all of these assignments, some part of the week's
material was represented in a proof assignment, where we were suplied with a
number of given statements, a goal statement and a few production rules. We
had to apply the rules to the given statements in a particular number of
steps to derive the goal statement.
This would all be fine if it wasn't for the representation used in the
assignments. To require students to carefully apply the rules to given
statements themselves, both rules and statements are given as LaTeX-like text
expressions. These are hard to read and very tedious to reproduce, making the
assignments more about understanding the representation than actually going
through the proof.
On the other hand, the proof assignments are a good way to go through the
algorithms that were covered in the lectures without having to dive into the
details of actual code implementing the algorithm. I liked doing them for this
reason, but felt the representation got in the way more and more as the
assignments got more complicated towards the end of the course.
Programming assignments
The real meat of the course, for me, was in the programming assignments,
implementing a compiler for the Classroom Object-Oriented Language (COOL).
Each assignment corresponded to one of the essential compiler stages: a lexer,
a parser, a semantic analyzer and a code generator. All assignments were
available in C++ or Java flavors, with the first two being built around on
the flex/JLex and bison/CUP tools respectively. It was also
allowed to just use another language, with the caveat that it required
reimplementing some of the support code that was made available.
I decided to go with the Java variants of the first two assignments, on the
premise that it would be educational to learn the usage of tools like JLex
and CUP rather than building my own lexer and parser like I'd done for my own
compiler. Getting started with JLex was fairly frustrating, so I almost dumped
it in favor of doing it in Python, but once I got the hang of it it turned out
okay.
(I've since been wondering whether students could learn more if they
implemented a lexer or parser from scratch versus using something like flex
or bison. At least from my own experience, writing a lexer/parser for a
realistic language is not that hard. Of course flex results in a highly
optimized lexer, but writing up a lexing algorithm from scratch would seem more
instructive. On the other hand, perhaps using the generator tool allows you to
focus more on the characteristics of the grammar you're developing, instead of
it potentially getting lost in procedural code.)
The reference compiler is implemented in the same four stages, with a simple
text format used to communicate the relevant data through UNIX pipes: the
lexer consumes program code and spits out a simple line-based list of nodes,
which the parser consumes and turns into an indentation-based AST
representation, the semantic analyzer augments the AST with more typing
information and finally the code generator outputs MIPS assembler code.
Grading was done via Perl scripts that fuzzily compare the output of your
code to that of the reference component, running over a list of sample
programs (both valid and invalid) to score your program. This worked quite
nicely and made it straightforward to find the remaining issues. In fact, the
main way I wrote my code was by starting from a small program, trying to
generate some reasonable output, then use diff -u to compare it to the
output from the reference program and see where my code failed. I often find
that it's easier to stay motivated when generating small successes with some
regularity, and this way of working made it harder to get stuck. On the other
hand, it might have been better to do a little more design up front, forcing
me to think through issues rather than hacking away at problems that come up.
I ended up doing the last two assignments in Python. I wrote up a small
parser for the textual representation of the AST used and some infrastructure
to easily write passes over the AST, then got started with the actual
assignment. This worked great; Python is the language I use most, and not
having to think so much about typing or memory management made it easier to
get the assignments done. In my opinion, Python is a much better language for
most teaching purposes than either Java or C++, because it enables more
focus on the actual algorithm (versus the "details" of programming).
Apparently the Udacity compilers course uses Python, so I might have gone
there if I'd known about it before starting with Coursera.
The programming assignments took a lot of time. With my full-time job, I
found it pretty hard to find 10-25 hours per assignment of focused time reading
documentation, writing code and testing against sample programs. However,
these assignments were also the most instructive and rewarding for me, so I
wouldn't want to have missed them, and having some deadlines also helped with
not procrastinating so much.
As a result, I ended up giving up on the last 20% of the code generation
assignment. Assembly is pretty verbose and hard to read, and working with
MIPS assembly certainly made me appreciate LLVM IR more (even if x86 assembly
is uglier still). Debugging assembly is also pretty painful, so I got a little
frustrated as the deadline was getting closer. The simulator used to run MIPS,
SPIM, was also a little limited and buggy in places, which certainly didn't
make things easier. In short, I'll be happy to return to LLVM and its suite
of tools.
Community
Part of why this course worked for me was definitely the community. The
deadlines can be harsh if you're not a full-time student, but being able to
get some help from staff, confer about particularly hairy parts of the
semantic analyzer or get some extra explanation about a quizz question makes
it much more fun to do. It should also make it less likely you hit a wall and
have to give up on the course and give some extra incentive to finish the
whole course. This made the forums an integral part of the experience for me.
Overall experience
If you similarly lack any education in compilers and find that Steve Yegge's
aforementioned rant has you inspired, Coursera's course is a great way to learn
a lot about compilers. I'm sure the Udacity course is nice, too, but the
Coursera environment seems more attractive. I also found Coursera's relation
to renowned universities appealing, though that might just be good marketing.
In any case, I look forward to taking another course through their site.
I used to have a weblog. I took the posts offline 4 years ago, when I realized
that some posts weren't fit for the public internet. Since then, I've used
Twitter to vent random thoughts and links, but over the past year I've started
to miss having an outlet for longer articles (and practicing my writing).
Iteration 2 will be mostly about techology; programming languages, version
control systems, web technology and software engineering at large are just some
of the topics I like to think about. If that seems a broad selection, I would
agree: there are too many things I want to do. We'll see what's on here in a
year or so.
So, consider this a blog reboot. I have a few topics lined up, but it's going
to take some time to think through turning them into something presentable.