Note to self – source code documentation

February 22, 2012

A number of years ago I wrote a program for the W3C to automatically migrate a MoinMoin wiki to MediaWiki. At the time, I was the Chair of one of the W3C groups and we had just decided to replace our wiki technology because participants were finding the limitations of MoinMoin to be a problem. Similar complaints were heard from dozens of other groups in the consortium, and it was inevitable that we would have to find a way to reformat thousands of documents. Furthermore, since the W3C insists on accurate records of the evolution of the technologies in its care, we would have to migrate every version of every document. The entire history had to be preserved. And at the same URLs, if possible.

Doing it manually was impossible, even for our own wiki, never mind the whole collection of wikis in W3C. Since it was obvious that this was going to be a pain point for many groups, I resolved to write code to migrate my group’s wiki and make the code configurable enough to handle any wiki.

I’m happy to say the solution worked for the DD group wiki, and following that success I donated the code to Open Source (under the W3C license). Since then the code has undergone some minor updates to keep up with MediaWiki changes. Some people were charmed by the results. (It’s nice to know that someone outside W3C found it useful!) The most recent update by Rufus Pollock went up on GitHub in the middle of 2011. The W3C did eventually use the code to migrate the entire collection of wikis in its care. They even took the time to write a short manual on how to use the tool I created.

Which brings me to the subject of documentation.

The mm2mw solution was written in Perl, one of a few programming languages that can be a joy to use but also come with a huge risk of producing code that few can decipher. Including the author! Since nothing annoys me more than poorly commented code, I tried to make it reasonably structured, with sensible naming and plenty of comments, hints and explanations embedded in the “tricky” areas. These comments are as much for me as they are for others who might maintain my code. But the truth is that I seldom return to my old code, so I don’t normally get to discover how successful (or unsuccessful) my documentation has been.

The other day I was clearing out some of my old code archives when I came across a copy of mm2mw. “I remember that,” I thought, and out of curiosity I opened it up. So, after a number of years gathering dust, here are my first impressions as I scrolled through my old code:

What the…?
ASCII pictures of the directory hierarchies used/created. So glad I put those in.
It even generates its own help file when it has finished, so you know what to do next!
20% of the source is preamble/explanation at the start. Keeping the doc as close to the code as possible.
Another 20% of the source is the interactive UI. Nothing complicated so far.
Ouch. Around line 400 it starts to get hairy.
Relying heavily on “meaningful” variable/subroutine names. These lines are wide!
Have to read slowly. Line… by… line…
Did I really write this?!: if ( $x =~ m/^\s*((<.[^>]+>|<$>|<:>|<$>)+)(.+)/ )
And why didn’t I comment that line???
Seriously??: s/\[:([^:\]]+):([^\]]+)\]/[[$1|$2]]/g;
Now I’m really having fun!: s/(?<![\&!\/#])\b([A-Z][a-z0-9]+){2,}(\/([A-Z][a-z0-9]+){2,})*\b/[[$&]]/g;
At this point I remember how to spin the scroll wheel on the mouse. My head hurts.
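
For what it’s worth, Perl’s /x modifier lets a pattern be spread over several lines with a comment on each piece. Here is a minimal sketch of how the second substitution above might have been documented, assuming it turns MoinMoin links of the form [:PageName:label] into MediaWiki’s [[PageName|label]]. The subroutine name convert_labelled_links is made up for illustration; the line quoted above works on $_ directly.

```perl
use strict;
use warnings;

# Hypothetical wrapper, for illustration only (not part of mm2mw).
# Assumption: the pattern rewrites MoinMoin "[:Page:label]" links
# as MediaWiki "[[Page|label]]" links.
sub convert_labelled_links {
    my ($text) = @_;
    $text =~ s{
        \[:            # MoinMoin link opener: "[" followed by ":"
        ( [^:\]]+ )    # capture 1: page name, no ":" or "]" allowed
        :              # separator between page name and label
        ( [^\]]+ )     # capture 2: label text, anything except "]"
        \]             # closing bracket
    }{[[$1|$2]]}gx;
    return $text;
}

print convert_labelled_links('See [:FrontPage:the front page] for details.'), "\n";
# prints: See [[FrontPage|the front page]] for details.
```

The regex is the same as before, character for character; only the layout and the comments are new. Note that the comments avoid $ and @, since Perl would try to interpolate them as variables inside the pattern.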

The source reads better colour-coded on GitHub, but it’s still quite rough to read. I thought at the time (2007) that I’d done a reasonably good job. Just as well the program works and has needed very little maintenance, as I’d sure feel sorry for anyone who has to pick up where I left off!

Categorised as: Coding

Posted by Rotan on February 22, 2012 at 9:22 pm