July 12, 2008

Redirecting Blogger posts to WordPress

Filed under: Tech — Chris @ 3:48 pm

My move from Blogger to WordPress was made possible by this tutorial. However, the post redirection widget has some bugs: it doesn’t handle posts with “a”, “an”, or “the” in the title (seriously!) or titles with non-ASCII characters in them.

Here’s an upgraded widget, which works for a broader class of posts. Basically, I just translated the WordPress PHP code for generating a permalink from a title into JavaScript. I cut some corners, but it’s good enough to handle 99% of the posts in my archive.

<b:widget id='Redirector' locked='true' title='Blog Posts' type='Blog'>
<b:includable id='main'>
<b:if cond='data:blog.pageType == "item"'>
<b:loop values='data:posts' var='post'>
<div id='redirectorTitle' style='visibility:hidden'><data:post.title/></div>
<script type='text/javascript'>
var new_domain = 'YOUR_BLOG_URL_HERE';

// Percent-encode each run of non-ASCII characters as UTF-8 octets
function utf8_uri_encode( str ) {
  var high_code = /[\u0080-\uffff]+/;
  var new_str = str;
  var m;
  while( m = high_code.exec( new_str ) ) {
    new_str = new_str.replace(m[0],encodeURIComponent(m[0]));
  }
  return new_str;
}

var title = document.getElementById('redirectorTitle').innerHTML;
// [INCOMPLETE] Keep percent signs that aren't part of an octet?
title = title.replace(/&lt;[^&gt;]*?&gt;/g,''); // remove tags
title = title.replace(/&amp;.+?;/g,''); // remove entities
title = utf8_uri_encode(title); // handle UTF-8 characters
title = title.toLowerCase();
title = title.replace(/[^%a-z0-9 _-]/g,''); // remove punctuation
title = title.replace(/\s+/g,'-'); // turn spaces into hyphens
title = title.replace(/-+/g, '-'); // collapse runs of hyphens
title = title.replace(/^-+/g,''); // remove prefixed hyphens
title = title.replace(/-+$/g,''); // remove suffixed hyphens
// Rearrange the Blogger MM/DD/YYYY timestamp into YYYY/MM/DD
var timestamp = '<data:post.timestamp/>';
timestamp = timestamp.split('/');
timestamp = timestamp[2]+'/'+timestamp[0]+'/'+timestamp[1];
var new_page = new_domain + '/' + timestamp + '/' + title + '/';
document.location.href = new_page;
</script>
</b:loop>
</b:if>
</b:includable>
</b:widget>


  • Timestamps on posts must be in MM/DD/YYYY format. This is easily changed in the Blogger control panel via “Settings -> Formatting”.
  • You should set the correct time zone for your blog in WordPress before you import the posts. Otherwise, Blogger and WordPress won’t always agree on the date of a post. I didn’t do this and, as a consequence, about one in five of my archived posts have bad redirect links. (This can be fixed by manually editing the post’s timestamp, but that is a big pain.)
  • This mostly handles Unicode (see, e.g., this post), but there is a bug in there somewhere. I had to manually change the permalink on this post from

    एक्ष्केल्लेन्त्-वर्क-टो/

    so that it redirected to the right place. (Can any Hindi readers help me out with that? Is there punctuation in there? I don’t remember what the title was supposed to say. And I can’t figure out how to back-transliterate it into English.)

  • Blogger does some weird things with the widget code after you save. The code will disappear from the “Edit Template” text box, replaced by a tag like:

    <b:widget id='Blog2' locked='true' title='Blog Posts' type='Blog'/>

    (the tag with id Blog1 is your actual posts; don’t delete it). If the widget doesn’t work, or you want to remove it, just remove that tag. If you want to fiddle with the widget, clicking “Expand Widget Templates” will reveal the underlying code (and a lot else besides). The widget will also show up in the Blogger “Layout -> Page Elements” as a mysterious second “Blog Posts” box, with all kinds of spurious configurable elements. Just ignore that.
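
As a rough cross-check, the title-to-permalink steps the widget performs can be mirrored in a few lines of Python. This is only a sketch of the same sequence of replacements, not the WordPress code itself; the function names blogger_slug and redirect_url and the example.com domain are made up for illustration.

```python
import re
from urllib.parse import quote


def blogger_slug(title):
    """Approximate the slug the widget builds from a post title."""
    title = re.sub(r"<[^>]*?>", "", title)       # remove tags
    title = re.sub(r"&.+?;", "", title)          # remove entities
    # Percent-encode each run of non-ASCII characters as UTF-8 octets
    title = re.sub(r"[^\x00-\x7f]+", lambda m: quote(m.group(0)), title)
    title = title.lower()
    title = re.sub(r"[^%a-z0-9 _-]", "", title)  # remove punctuation
    title = re.sub(r"\s+", "-", title)           # turn spaces into hyphens
    title = re.sub(r"-+", "-", title)            # collapse runs of hyphens
    return title.strip("-")                      # trim leading/trailing hyphens


def redirect_url(new_domain, timestamp, title):
    """Build the target URL from a Blogger MM/DD/YYYY timestamp and a title."""
    month, day, year = timestamp.split("/")
    return "/".join([new_domain, year, month, day, blogger_slug(title), ""])


if __name__ == "__main__":
    print(redirect_url("http://example.com", "07/12/2008",
                       "Redirecting Blogger posts to WordPress"))
```

Running a handful of archived titles through something like this before switching the widget on is a cheap way to spot posts whose redirect will miss.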

June 29, 2008

OCaml’s Unix module and ARGV

Filed under: OCaml, Tech — Chris @ 9:10 pm

Be warned: the string array argument to Unix.create_process et al. represents the entire argument vector: the first element should be the command name. I didn’t expect this, since there is a separate prog argument to create_process, and ended up with weird behavior* like,

# open Unix;;
# create_process "sleep" [|"10"|] stdin stdout stderr;;
10: missing operand
Try `10 --help' for more information.
- : int = 22513

This can be a bit insidious—in many cases skipping the first argument will only subtly change the behavior of the child process.

Note that the prog argument is what matters in terms of invoking the sub-process; the first element of the argument vector is just what is passed into the process as its own name. Hence,

# create_process "gcc" [|"foo";"--version"|] stdin stdout stderr;;
- : int = 24364
foo (GCC) 4.2.3 (Ubuntu 4.2.3-2ubuntu7)
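
The same split between the program to run and the argument vector exists in other process APIs. As an illustration outside OCaml, Python's subprocess module takes an executable argument that plays the role of prog, while the list you pass is the full argument vector, name included. The sketch below assumes a POSIX system with /bin/sh; the helper name name_seen_by_child is made up.

```python
import subprocess


def name_seen_by_child(argv0):
    """Run /bin/sh with argv[0] set to argv0 and return what the shell reports as $0."""
    result = subprocess.run(
        [argv0, "-c", "echo $0"],   # the whole argument vector, name first
        executable="/bin/sh",       # the program that is actually executed
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # The child is /bin/sh, but its argv[0] is whatever the caller supplied
    print(name_seen_by_child("not-really-sh"))
```

Forgetting the leading name has the same effect as in the OCaml example: the first real argument is silently consumed as the program name.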

* Actually, this “weird behavior” is the test that finally made me realize what was going on. The emergent behavior of my app was much more mysterious…

June 24, 2008

Resetting a Terminal

Filed under: Linux, Tech — Chris @ 2:04 pm

You tried to cat a binary file and now your terminal displays nothing but gibberish? Just type reset (it may look like ⎼␊⎽␊├).

It has taken me more than 10 years to learn this.

[UPDATE] Interestingly, this doesn’t work in my (alas, ancient) Mac OS X 10.3.9 terminal. Any tips? Also, why did curl URL_TO_BINARY hose my terminal in the first place?

June 23, 2008

Ripping a Muxtape

Filed under: Tech — Chris @ 2:50 pm

So Muxtape is a pretty cool site, but a little frustrating. If a friend posts a really cool mixtape (maybe you know somebody who just barely entered the Aughties), it would be nice to be able to download it and save it, just like all those old cassette mixtapes sentimentally rotting underneath your bed.

Enter muxrip. This simple Ruby script takes the name of the mixtape, downloads it, and creates a playlist for you in M3U or iTunes format. (Acknowledgments: the script basically just adds some polish to this previous effort.)

PLEASE: Use this script responsibly. It would be a shame for Muxtape to get shut down.

ALSO: I wouldn’t be surprised if this suddenly stopped working. It depends on elements of the page layout and URL scheme that might (almost certainly will) change without notice.

June 8, 2008

Tweaking an RSS Feed in Python

Filed under: Python, Tech — Chris @ 8:00 pm

I’ve been teaching myself a bit of Python by the just-in-time learning method: start programming, wait for the interpreter to complain, and go check the reference manual; keep the API docs on your hard disk and sift through them when you need a probably-existing function. Recently, I wanted to write a very simple script to manipulate some XML (see below) and I was surprised (though it has been noted before) at the relatively confused state of the art in Python and XML.

First of all, the Python XML API documentation is more or less “go read the W3C standards.” Which is fine, but… make the easy stuff easy, people.

Secondly, the supposedly-standard PyXML library has been deprecated in some form or fashion such that some of the examples from the tutorial I was working with have stopped working (in particular, the xml.dom.ext module has gone somewhere. Where, I do not know).

So, in the interest of producing more and better code samples for future lazy programmers, here’s how I managed to solve my little problem.

The Problem: Twitter’s RSS feeds don’t provide clickable links.

The Solution: A script suitable for use as a “conversion filter” in Liferea (and maybe other feed readers too, who knows?). The script should:

  1. Read and parse an RSS/Atom feed from the standard input.
  2. Grab the text from the feed items and “linkify” it.
  3. Print the modified feed on the standard output.

Easy, right? Well, yeah. The only tricky bit was using the right namespace references for the Atom feed, but again that’s only because I refuse to read and comprehend the W3C specs for something so insignificant. I ended up using the lxml library, because it worked. (The script would be about 50% shorter if I hadn’t added a command-line option --strip-username to strip the username from the beginning of items in a single-user feed, and a third shorter than that if it only handled RSS or Atom and not both.)

Here’s the code, in toto. (You can download it here.)

#! /usr/bin/env python

from sys import stdin, stdout
from lxml import etree
from re import sub
from optparse import OptionParser

doc = etree.parse(stdin)

# Standard Atom 1.0 namespace (assumed to be the one the feed declares)
ATOM = {'n': 'http://www.w3.org/2005/Atom'}

def addlinks(path,namespaces=None):
    for node in doc.xpath(path,namespaces=namespaces):
        # Turn URLs into HREFs
        node.text = sub(r"((https?|s?ftp|ssh)\:\/\/[^\"\s\<\>]*[^.,;'\">\:\s\<\>\)\]\!])",
                        r'<a href="\1">\1</a>',
                        node.text)
        # Turn @ refs into links to the user page
        node.text = sub(r"\B@([_a-z0-9]+)",
                        r'@<a href="\1">\1</a>',
                        node.text)

def stripuser(path,namespaces=None):
    for node in doc.xpath(path,namespaces=namespaces):
        node.text = sub(r"^[A-Za-z0-9_]+:\s*","",node.text)

parser = OptionParser(usage = "%prog [options] SITE")
parser.add_option("-s", "--strip-username",
                  action="store_true", default=False,
                  help="Strip the username from item title and description")
(opts,args) = parser.parse_args()

# For RSS feeds
addlinks( "//rss/channel/item/description" )
# For Atom feeds
addlinks( "//n:feed/n:entry/n:content", ATOM )

if opts.strip_username:
     # RSS title/description
     stripuser( "//rss/channel/item/title" )
     stripuser( "//rss/channel/item/description" )
     # Atom title/description
     stripuser( "//n:feed/n:entry/n:title", namespaces = ATOM )
     stripuser( "//n:feed/n:entry/n:content", namespaces = ATOM )

# Print the modified feed on the standard output
# (the underlying binary stream on Python 3, plain stdout on Python 2)
doc.write(getattr(stdout, "buffer", stdout))


If there are any Python programmers in the audience and I’m doing something stupid or terribly non-idiomatic, I’d be glad to know.
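
For anyone who wants to see the namespace issue in isolation, it is easy to try out without lxml. The following sketch uses only the standard library (xml.etree.ElementTree rather than the lxml calls in the script), a simplified URL pattern, and a made-up one-entry feed; it assumes the feed declares the standard Atom namespace.

```python
import re
import xml.etree.ElementTree as ET

# Standard Atom namespace, bound to the arbitrary prefix "n" for path queries
ATOM = {"n": "http://www.w3.org/2005/Atom"}

# Simplified URL pattern: stop at whitespace, quotes and angle brackets,
# and do not let the match end on trailing punctuation
URL_RE = re.compile(r'(https?://[^\s<>"]*[^\s<>".,;:!?)\]])')


def linkify(text):
    """Wrap each URL in the text in an HTML anchor."""
    return URL_RE.sub(r'<a href="\1">\1</a>', text)


SAMPLE = (
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    "<entry><content>chris: see http://example.com/page.</content></entry>"
    "</feed>"
)

feed = ET.fromstring(SAMPLE)

# An unqualified path matches nothing: every element lives in the Atom namespace
unqualified = feed.findall("entry/content")

# The same path with the namespace prefix finds the entry content
qualified = feed.findall("n:entry/n:content", ATOM)
for node in qualified:
    node.text = linkify(node.text)

if __name__ == "__main__":
    print(len(unqualified), len(qualified))
    print(qualified[0].text)
```

When the tree is serialized, the inserted markup is escaped (each < becomes &lt;), which is the usual way to carry HTML inside a feed's text content.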

Thanks in part to Alan H whose Yahoo Pipe was almost good enough (it doesn’t handle authenticated feeds, as far as I can tell) and from whom I ripped off the regular expressions.

[UPDATE] Script changed per first commenter.

April 30, 2008

Linux Quickies

Filed under: Emacs, Linux, Tech — Chris @ 8:31 pm

The upgrade from Ubuntu Gutsy to Hardy Heron (cool logo, right?) was relatively uneventful. Some minor points…

  • I always thought the main Ubuntu servers would farm my downloads off to an appropriate mirror, but apparently that’s not the case. You’re likely to get better download times if you choose a mirror in System -> Administration -> Software Sources. If you choose “Other…”, there’s a “Select Best Server” feature. Oddly, my best response times were from New Zealand… maybe because they were all asleep when I tried it.
  • The “ugly fix” for the infamous hard disk annihilating bug stopped working after I upgraded. This new, different (but still ugly) fix worked for me. It would be really great if the Ubuntu team could find a way to make the OS stop trying to kill my hard disk by default.
  • My WiFi light stopped working after the upgrade. This is very easily fixed by installing the package linux-backports-modules-hardy.
  • etckeeper is a great idea: it puts all the config files in /etc under Git, Mercurial, or Bazaar source control and forces APT to commit before and after any upgrade, so it’s easy to isolate and revert changes. (As a side note, using Bazaar for a few weeks makes it physically painful to be forced to deal with CVS.)
  • Anti-aliased fonts in Emacs are really nice. On Ubuntu Hardy, install emacs-snapshot-gtk (on prior releases, download “Pretty Emacs”), then run emacs-snapshot instead of emacs (or run update-alternatives to set emacs-snapshot as the default). You should then be able to run, e.g., emacs --font "Monospace-10" and get pretty, pretty (lick-able, as they say) fonts. Other reasonable choices are "BitstreamVeraSansMono-X" or "LiberationMono-X", where X is your desired point size. You can also invoke M-x set-default-font and type your choice interactively, but for some reason the TrueType fonts above won’t tab-complete. Be careful: if you type a non-existent font, Emacs will silently use the default system fixed-width font (see System -> Preferences -> Appearance -> Fonts). I’ve added the following to my .emacs:

    (if (>= emacs-major-version 23)
        (set-default-font "Monospace-10"))

    (The conditional is necessary if you may come into contact with earlier versions of Emacs, which will barf on TrueType fonts.)

  • In my experience, the fonts in your web browser will look better if you don’t use Microsoft’s gratis TrueType core fonts (package msttcorefonts in Ubuntu/Debian). In particular, the Trebuchet font (which crops up frequently, including at the top of this page) tends to look pretty bad with subpixel rendering turned on. Red Hat’s Liberation fonts (package ttf-liberation) are designed as drop-in replacements for the Microsoft fonts, but I haven’t seen much value in installing them.
  • The instructions I gave last month for hooking up to a projector aren’t complete, because they often won’t let you run the projector at a resolution greater than 640×480. This led to a rather embarrassing scene in front of a class of undergraduates, where simply refused to operate at such a pathetic resolution. This problem can be solved by the methods presented here, though it requires a bit of tweaking to get things just so. I haven’t yet discovered a minimal solution; first I need to crack the meaning of the X11 “MetaModes” option. When I do, you’ll be the first to know.

April 15, 2008

Only Thus Can It Be Unmade

Filed under: Linux, Tech — Chris @ 3:08 pm

The cleverer among you will espy the problem below immediately:

$ export DATE=`date`
$ echo $(DATE)
bash: DATE: command not found

In my half-caffeinated state, it took several minutes of frustration to figure out what was wrong: $(DATE) is a Make-style variable; in Bash, $(DATE) is the same as `DATE` (a command substitution). The correct token is $DATE.

$ echo $DATE
Tue Apr 15 11:08:38 EDT 2008
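
The spellings are easy to confuse, so here is a small, hedged sketch that feeds each one to sh and collects the result, including the brace form ${DATE}, which is the same variable expansion as $DATE. It is written in Python only so that it can be run as a single script; the helper name run_sh is made up, and it assumes a POSIX sh on the PATH.

```python
import os
import subprocess

# Give the child shell a DATE variable, as the `export` above does
ENV = dict(os.environ, DATE="Tue Apr 15 11:08:38 EDT 2008")


def run_sh(script):
    """Run a one-line script with sh and return (stdout, stderr), stripped."""
    result = subprocess.run(["sh", "-c", script], env=ENV,
                            capture_output=True, text=True)
    return result.stdout.strip(), result.stderr.strip()


if __name__ == "__main__":
    print(run_sh("echo $DATE"))     # variable expansion
    print(run_sh("echo ${DATE}"))   # the same expansion, written with braces
    print(run_sh("echo $(DATE)"))   # command substitution: tries to run a command named DATE
```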

I apologize for inflicting my stupidity upon you.

April 2, 2008

Using an External Monitor or Projector With My Linux Laptop

Filed under: Linux, Tech — Chris @ 9:26 pm

For years, it was difficult enough to get my laptop working with an external monitor that I didn’t even bother trying: I would boot into Windows in order to give a presentation. (This is the only reason I ever booted into Windows (or have a Windows install).) It either got dramatically easier to accomplish this at some point in the last year, or I’ve been incredibly stupid all this time. Just in case, here’s how it works on my Dell Inspiron 6400 running Gutsy. My video card is an NVIDIA GeForce Go 7300.

  1. Plug in the external monitor or projector. The monitor may work immediately (especially if you’re repeating this step after fiddling about below), but it may be at the wrong resolution.
  2. Open “Applications -> System Tools -> NVIDIA Settings” or execute sudo nvidia-settings on the command line. This utility is provided by the nvidia-glx-new package, which you should probably have installed.
  3. Choose “X Server Display Configuration” and click “Detect Displays” at the bottom of the screen.
  4. The external monitor should appear in the Layout pane. Click on it, then click “Configure”. Choose “TwinView” (which should hopefully not say that it requires an X restart).
  5. In the “Display” box, choose “Position: Clones”. This means that you want the same display to appear on both monitors. This is what works best for me, particularly for giving presentations. Having separate displays seems to confuse applications—for example, “Presentation Mode” in Evince will “center” the slides, displaying the left half of a slide on the right half of the laptop screen and the right half of a slide on the left half of the projector. It’s probably possible to tweak this with exactly the right viewport/workspace settings (ugh), but that’s not how I roll.
  6. If the display is smaller than the default display (the display’s square will be smaller in the Layout pane and the displayed area will be cropped on the screen), click on the default display in the Layout pane and choose a lower resolution. 1024×768 is usually safe. The laptop display will probably look bad, but the external display should look fine.

    Be careful: any smaller than 1024×768 and the Settings applet will be too big to display on the screen. If this happens, you’ll have to navigate blind or hit Ctrl-Alt-Backspace to restart X (or don’t automatically hit OK after the resolution changes and it will revert after 15 seconds).

To remove the external monitor or projector:

  1. Unplug the monitor.
  2. Click “Detect Displays”.
  3. A message “The display device FOO has been unplugged…” will appear. Click “Remove.”
  4. Click “Quit”.

Under no circumstances should you click “Save to X Configuration File” at any point in this process. That’s just asking for trouble.

Some sequence of actions (it’s not clear which) may screw up the “X Server Display Configuration” pane. The display will continue to function in the meantime, but all the above commands are inaccessible. Restarting X made it go away (for me).

[UPDATE] It seems it’s necessary to update your xorg.conf to get decent resolution on some projectors. I’m still investigating… In the meantime, this should help.

March 24, 2008

LaTeX Appendectomies

Filed under: LaTeX, Tech — Chris @ 4:41 pm

I have need of a LaTeX package. I think a lot of people would find this package useful. I would prefer not to write it myself.

This package would take a mode argument in the preamble and format the document in one of three ways: as a conference submission, as a camera-ready conference paper, or as a tech report.

Suppose I have a theorem and that theorem has a proof.

  • In a conference submission, the theorem would appear in the main text and would be re-stated along with its proof in an appendix.
  • In a camera-ready conference paper, the theorem would appear in the main text and the proof would not appear at all.
  • In a tech report, the theorem and the proof would appear inline in the main text.

Preferably, proofs could be included in the main text or sent to an appendix on a case-by-case basis. Proofs could also have “sketch” versions and full versions: the sketch version appears in the main text of a conference paper (either kind) and the full version appears only in a tech report.

Suppose that, in proving a theorem, I first prove a lemma.

  • If the proof of the theorem appears in the main text (or an appendix), then the lemma and its proof should also appear in the main text (or the appendix), before the theorem.
  • If the proof of the theorem is omitted, or if a proof sketch is included which makes no reference to the lemma, then the lemma and its proof should not appear at all.

One should be able to conditionally include text depending on the mode. For example, in camera-ready conference mode, one would probably include the sentence: “Full proofs of all theorems appear in a technical report [citation here].”

The only package I’ve found that does anything like this is thrmappendix, but it doesn’t allow for a proof to appear in the main text at all. It’s primarily concerned with the appearance and re-appearance of the theorem, with or without its proof; I’m primarily concerned with the appearance or suppression of the proof.

January 24, 2008

The Triumphant Return of C-c C-t

Filed under: Emacs, OCaml, Tech — Chris @ 11:46 pm

The upgrade to Ubuntu Gutsy and/or Emacs 22 broke my favorite feature of tuareg/ocaml-mode: C-c C-t for “show type” in OCaml buffers (this requires compiling with -dtypes, which generates type annotation files). I suffered without this for a length of time which is either embarrassing or impressive, depending on whether you consider poking around inside Emacs Lisp files a productive or unproductive use of time…

I finally broke down and fixed it today. The problem is simply that Emacs and OCaml packages aren’t cooperating properly. My solution, which may or may not be optimal, is as follows:

  1. Copy the directory /usr/share/emacs/site-lisp/ocaml-mode to a path of your choosing, say ~/.emacs.d/emacs22/ocaml-mode. Let’s call this directory DIR.
  2. (Optional) In Emacs 22, execute C-u 0 M-x byte-recompile-directory and choose DIR.
  3. Add the following line to your .emacs file:
    (or (< emacs-major-version 22) (push "DIR" load-path))

The test for whether it worked is: load a .ml file and type C-c C-t. In the mini-buffer, you’ll either see “type: ...”; “Point is not within a typechecked expression or pattern”; or “No annotation file...”. If it says “C-c C-t is undefined”, then you have failed.
