AutoCasing Javascript - Correcting Bad Data
I blogged some time ago about a script that would auto-correct the casing of artist names.
The impulse to do this came from bad data supplied by a third-party system I had no control over: the playout system used by Xfm.
Xfm staff were twitchy about auto-correcting the case, worried it could prove more harmful than helpful.
Since then I've written a JavaScript function to auto-correct case. I'm not going to go into great detail about what it does, since that would just be going over old ground.
On a technical level, though, here's what it does to a submitted string:
1 - uses a regular expression to strip all non-alphanumeric characters, then compares the resulting string against a list of known exceptions to the auto-casing
2 - if looping through the array of known exceptions finds a match, it returns that exception
3 - if not, it auto-cases the string using two regular expressions.
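Here's a minimal sketch of those three steps. It is not the demo's actual code: the names in `EXCEPTIONS` are illustrative, comparing the stripped strings in lowercase is an assumption, and the two regular expressions in step 3 (capitalise the first letter of each word, then fix "Mc" prefixes) are guesses at plausible rules, since the exact expressions aren't spelled out here.

```javascript
// Hypothetical exceptions list: names the generic rules would get wrong.
const EXCEPTIONS = ["AC/DC", "UB40", "KT Tunstall", "dEUS", "MGMT"];

// Step 1: strip everything that isn't a letter or digit (lowercased, by assumption),
// so "ac-dc", "AC/DC" and "acdc" all reduce to the same key.
function normalise(str) {
  return str.replace(/[^a-z0-9]/gi, "").toLowerCase();
}

function autoCase(name) {
  const key = normalise(name);

  // Step 2: loop through the known exceptions and return the stored casing on a match.
  for (let i = 0; i < EXCEPTIONS.length; i++) {
    if (normalise(EXCEPTIONS[i]) === key) {
      return EXCEPTIONS[i];
    }
  }

  // Step 3: generic casing with two regular expressions (assumed rules).
  return name
    .toLowerCase()
    // Capitalise the first letter of the string and of each word that follows
    // a space, hyphen, slash, opening bracket or full stop.
    .replace(/(^|[\s\-\/(.])([a-z])/g, (match, sep, ch) => sep + ch.toUpperCase())
    // Capitalise the letter after a "Mc" prefix, e.g. "Mcfly" becomes "McFly".
    .replace(/\bMc([a-z])/g, (match, ch) => "Mc" + ch.toUpperCase());
}
```

With this sketch, `autoCase("ac-dc")` returns "AC/DC" via the exceptions list, while `autoCase("MCFLY")` falls through to the regular expressions and returns "McFly". Checking exceptions first means any name the rules mangle can be fixed by adding it to the list rather than complicating the expressions.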
The demo is here. The set of test cases I used during development is also available, and the code can be downloaded from the demo page.
What finally nudged me to blog about it was a development over on MusicBrainz: their GuessCase script. Its casing rules are far more detailed and rigorous, backed by in-depth debate about what should be uppercase and what shouldn't. Go and read about it for yourself, and I'll stop writing about it.
Still reading? OK. I was just going to mention the testing I had in place. The test suite takes 211 examples of incorrectly cased artist names and runs them through the script. Those 211 include every exception, plus an example that exercises each part of the regular expression.
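A test run like this boils down to feeding each badly cased input through the casing function and collecting mismatches. The sketch below assumes the cases are stored as `[input, expected]` pairs; `TEST_CASES`, `simpleTitleCase` and `runCasingTests` are hypothetical names, and `simpleTitleCase` is only a stand-in for the real auto-caser.

```javascript
// Hypothetical test cases: [badly cased input, expected output].
const TEST_CASES = [
  ["THE BEATLES", "The Beatles"],
  ["the smiths", "The Smiths"],
  ["MORRISSEY", "Morrissey"],
  ["AC/DC", "AC/DC"], // needs an exception; plain title casing gets it wrong
];

// Stand-in casing function: capitalise the first letter of each space-separated word.
function simpleTitleCase(str) {
  return str.toLowerCase().replace(/(^|\s)([a-z])/g, (match, sep, ch) => sep + ch.toUpperCase());
}

// Run every case through casingFn and collect the ones that don't match.
function runCasingTests(casingFn, cases) {
  const failures = [];
  for (const [input, expected] of cases) {
    const actual = casingFn(input);
    if (actual !== expected) {
      failures.push({ input, expected, actual });
    }
  }
  return { total: cases.length, passed: cases.length - failures.length, failures };
}

const report = runCasingTests(simpleTitleCase, TEST_CASES);
console.log(`${report.passed}/${report.total} passed`); // prints "3/4 passed"
```

The one failure ("AC/DC" comes back as "Ac/dc") is exactly the kind of name that belongs in the exceptions list, which is why every exception is included in the test set.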
I'm also capturing how long the test takes. Here are the response times for the A-grade browsers (plus Chrome). Each time is an average of five 'scores' (test runs).
| OS | Browser | Time taken (secs) |
|---|---|---|
| OSX | Safari 4 Public Beta | 0.084 |
| OSX | Firefox 3.0.8 | 0.149 |
| Windows XP | Firefox 3.0.8 | 0.195 |
| Windows XP | Google Chrome 1.0 | 0.215 |
| Windows XP | Opera 9.52 | 0.225 |
| Windows XP | Internet Explorer 8 | 0.315 |
| OSX | Opera 9.52 | 0.462 |
| Windows XP | Internet Explorer 7.0 | 0.579 |
| Windows XP | Internet Explorer 6 | 0.692 |
These figures should only be used for comparison, of course. And they broadly tally with other regular expression benchmarks that you can find with a quick Google search.
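For anyone wanting to reproduce an average-of-five timing like the one in the table, here's a minimal sketch. How the demo page actually measured its times isn't stated, so this assumes a simple wall-clock timer using `performance.now()` (available in modern browsers and Node 16+); `averageSeconds` and the 211-name workload are hypothetical.

```javascript
// Run `task` `runs` times and return the mean duration in seconds.
function averageSeconds(task, runs = 5) {
  let totalMs = 0;
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    task();
    totalMs += performance.now() - start;
  }
  return totalMs / runs / 1000;
}

// Hypothetical workload: title-case 211 made-up artist names with a regular expression.
const names = Array.from({ length: 211 }, (_, i) => `ARTIST NUMBER ${i}`);
const seconds = averageSeconds(() => {
  names.forEach((n) =>
    n.toLowerCase().replace(/(^|\s)([a-z])/g, (match, sep, ch) => sep + ch.toUpperCase())
  );
});
console.log(`average: ${seconds.toFixed(3)} secs`);
```

Averaging several runs smooths out one-off pauses (garbage collection, other tabs), but as with the table above, the numbers are only meaningful relative to each other on the same machine.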