Gareth53.co.uk - the online home of Gareth Senior

AutoCasing Javascript - Correcting Bad Data

4:01p.m., Sat 16 May 2009

I blogged some time ago about a script that would auto-correct the casing of artist names.

The impulse to do this was bad data being supplied by a third-party system I had no control over - the playout system used by Xfm.

Xfm staff were twitchy about auto-correcting the case, worried that it could be more harmful than helpful.

I've written a JavaScript version to auto-correct case since then. I'm not going to go into great detail about what it does, since that would just be going over old ground.

On a technical level though, here's what it does to a submitted string:

1 - uses a regular expression to strip all characters that aren't alphanumeric, then compares the stripped string against a list of known exceptions to the autocasing

2 - if looping through the array of known exceptions finds a match, it returns that exception

3 - if not, it auto-cases the string using two regular expressions.
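
A rough sketch of those three steps, not the actual script (that's on the demo page). The exceptions list, the function names and both regular expressions here are assumptions made purely for illustration:

```javascript
// Hypothetical exceptions list; the real script's list is not shown in this post.
var EXCEPTIONS = ["AC/DC", "R.E.M.", "The xx", "MGMT"];

// Step 1: strip everything that isn't a letter or digit, and lower-case
// the result so the comparison ignores the casing of the input.
function normalise(str) {
  return str.replace(/[^a-z0-9]/gi, "").toLowerCase();
}

function autoCase(input) {
  var key = normalise(input);

  // Step 2: loop through the known exceptions and return the first match.
  for (var i = 0; i < EXCEPTIONS.length; i++) {
    if (normalise(EXCEPTIONS[i]) === key) {
      return EXCEPTIONS[i];
    }
  }

  // Step 3: no exception matched, so auto-case with two regular expressions
  // (assumed here): lower-case every run of capitals, then upper-case the
  // first letter after the start of the string, whitespace, "-" or "(".
  return input
    .replace(/[A-Z]+/g, function (m) { return m.toLowerCase(); })
    .replace(/(^|[\s\-(])([a-z])/g, function (m, pre, ch) {
      return pre + ch.toUpperCase();
    });
}
```

With this sketch, autoCase("ac/dc") comes back as "AC/DC" from the exceptions list, while autoCase("THE ROLLING STONES") falls through to step 3 and becomes "The Rolling Stones".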

The demo is here. There's a set of test-cases that I was using to develop which is also available. Code available to download from the demo page.

What finally nudged me to blog about it was a development over on MusicBrainz: their GuessCase script. There's much more detailed and rigorous casing here, backed by in-depth debate about what should be uppercase and what shouldn't. Go read about it for yourself and I'll stop writing about it.

Still reading? OK. I was just going to mention the testing that I had in place. It takes 211 examples of incorrectly cased artist names and runs them through the script. The 211 include all the exceptions, plus an example that tests each part of the regular expressions.

I'm also capturing how long the test takes. Here are the response times for A-grade browsers (and Chrome). The times are an average of five 'scores'.
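
For the curious, a harness along these lines is only a few lines of code. This is a sketch, not the real test page: the three test cases, the stand-in titleCase function and the names runTests and averageSeconds are all made up for the example. It also records the elapsed time for a run, since that costs nothing to add:

```javascript
// Hypothetical test cases: [badly cased input, expected output].
var CASES = [
  ["the beatles", "The Beatles"],
  ["RADIOHEAD", "Radiohead"],
  ["arctic monkeys", "Arctic Monkeys"]
];

// Stand-in for the real autocasing function, so this sketch runs on its own.
function titleCase(str) {
  return str.toLowerCase().replace(/(^|\s)([a-z])/g, function (m, pre, ch) {
    return pre + ch.toUpperCase();
  });
}

// Run every case through fn, collect the failures and time the whole run.
function runTests(fn, cases) {
  var failures = [];
  var start = new Date().getTime();
  for (var i = 0; i < cases.length; i++) {
    var actual = fn(cases[i][0]);
    if (actual !== cases[i][1]) {
      failures.push({ input: cases[i][0], expected: cases[i][1], actual: actual });
    }
  }
  var seconds = (new Date().getTime() - start) / 1000;
  return { total: cases.length, failures: failures, seconds: seconds };
}

// Average the elapsed time over several runs to smooth out the noise.
function averageSeconds(fn, cases, runs) {
  var sum = 0;
  for (var r = 0; r < runs; r++) {
    sum += runTests(fn, cases).seconds;
  }
  return sum / runs;
}
```

Calling averageSeconds(titleCase, CASES, 5) gives the mean of five runs in seconds. Date-based timing is only accurate to a millisecond or so at best, so with a tiny set of cases like this one the figure will mostly be noise.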

OS          Browser                  Time taken (secs)
OSX         Safari 4 Public Beta     0.084
OSX         Firefox 3.0.8            0.149
Windows XP  Firefox 3.0.8            0.195
Windows XP  Google Chrome 1.0        0.215
Windows XP  Opera 9.52               0.225
Windows XP  Internet Explorer 8      0.315
OSX         Opera 9.52               0.462
Windows XP  Internet Explorer 7.0    0.579
Windows XP  Internet Explorer 6      0.692

These figures should only be used for comparison, of course. They broadly tally with other regular expression benchmarks that you can find with a quick Google search:

http://www.codinghorror.com/blog/archives/001023.html
