AutoCasing JavaScript - Correcting Bad Data
I blogged some time ago about a script that would auto-correct the casing of artist names.
The impulse to do this was bad data being supplied by a third-party system I had no control over - the playout system used by Xfm.
Xfm staff were twitchy about auto-correcting the case, worried that it could be more harmful than helpful.
Since then I've written a JavaScript script to auto-correct case. I'm not going to go into great detail about what it does, since that would just be going over old ground.
On a technical level though, here's what it does to a submitted string:
1 - uses a regular expression to strip all characters that aren't alphanumeric, then compares the resulting string against a list of known exceptions to the autocasing
2 - if it finds a match while looping through the array of known exceptions, it returns that exception
3 - if not, it auto-cases the string using two regular expressions.
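For anyone who wants to see the shape of those three steps, here's a minimal sketch. It is not the code from the demo page: the exception list, the function names `normalise` and `autoCase`, and both casing regular expressions are assumptions made up for illustration.

```javascript
// Hypothetical exception list: names that simple casing rules would get wrong.
const EXCEPTIONS = ["AC/DC", "MGMT", "R.E.M."];

// Step 1: strip everything that isn't a letter or digit (and lowercase it)
// so that lookups ignore punctuation, spacing and case.
function normalise(str) {
  return str.replace(/[^a-z0-9]/gi, "").toLowerCase();
}

function autoCase(input) {
  const key = normalise(input);

  // Step 2: loop through the known exceptions and return the first match.
  for (let i = 0; i < EXCEPTIONS.length; i++) {
    if (normalise(EXCEPTIONS[i]) === key) {
      return EXCEPTIONS[i];
    }
  }

  // Step 3: fall back to two regular expressions (assumed, not the originals).
  return input
    .toLowerCase()
    // Regex one: uppercase the first letter of each word.
    .replace(/(^|[\s\-(])([a-z])/g, function (match, lead, letter) {
      return lead + letter.toUpperCase();
    })
    // Regex two: uppercase the letter that follows a leading "Mc".
    .replace(/\bMc([a-z])/g, function (match, letter) {
      return "Mc" + letter.toUpperCase();
    });
}

console.log(autoCase("ac dc"));          // AC/DC
console.log(autoCase("PAUL MCCARTNEY")); // Paul McCartney
```

Normalising both the input and each exception before comparing means "ac dc", "ACDC" and "Ac/Dc" all resolve to the same entry, at the cost of re-normalising the list on every call. For a short list that seems a fair trade.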
The demo is here. The set of test cases that I used during development is also available, and the code can be downloaded from the demo page.
What finally nudged me to blog about it was a development over on MusicBrainz: their GuessCase script. There's much more detailed and rigorous casing here, backed by in-depth debate about what should be uppercase and what shouldn't. Go read about it for yourself and I'll stop writing about it.
Still reading? OK. I was just going to mention the testing that I had in place. It takes 211 examples of incorrectly cased artist names and runs them through the script. These 211 include all the exceptions, plus an example that tests each part of the regular expression.
I'm also capturing how long the test takes. Here are the response times for A-grade browsers (and Chrome). Each time is an average of five 'scores'.
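A test run like that can be sketched roughly as below. The three cases, the `runTests` helper and the `simpleTitleCase` stand-in are all hypothetical. The stand-in is deliberately naive, so that the example runs on its own and shows why the exception list is worth testing.

```javascript
// Hypothetical test cases: [badly cased input, expected output].
const TEST_CASES = [
  ["the broken family band", "The Broken Family Band"],
  ["CLEM SNIDE", "Clem Snide"],
  ["acdc", "AC/DC"]
];

// Run every case through caseFn and collect the ones that fail.
function runTests(caseFn, cases) {
  const failures = [];
  for (let i = 0; i < cases.length; i++) {
    const input = cases[i][0];
    const expected = cases[i][1];
    const actual = caseFn(input);
    if (actual !== expected) {
      failures.push({ input: input, expected: expected, actual: actual });
    }
  }
  return failures;
}

// Stand-in for the real script (an assumption): plain title casing,
// with no knowledge of exceptions.
function simpleTitleCase(str) {
  return str.toLowerCase().replace(/(^|\s)([a-z])/g, function (match, lead, letter) {
    return lead + letter.toUpperCase();
  });
}

const failures = runTests(simpleTitleCase, TEST_CASES);
console.log(failures.length + " of " + TEST_CASES.length + " cases failed");
// prints "1 of 3 cases failed" ("acdc" comes back as "Acdc")
```

Collecting failures rather than stopping at the first one makes it easier to see which part of the casing logic a change has broken.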
| OS | Browser | Time taken (secs) |
|---|---|---|
| OSX | Safari 4 Public Beta | 0.084 |
| OSX | Firefox 3.0.8 | 0.149 |
| Windows XP | Firefox 3.0.8 | 0.195 |
| Windows XP | Google Chrome 1.0 | 0.215 |
| Windows XP | Opera 9.52 | 0.225 |
| Windows XP | Internet Explorer 8 | 0.315 |
| OSX | Opera 9.52 | 0.462 |
| Windows XP | Internet Explorer 7.0 | 0.579 |
| Windows XP | Internet Explorer 6 | 0.692 |
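Averaging over five runs, as in the table above, could be done along these lines. This is a sketch only: `timeOnce`, `averageTime` and the `sampleWorkload` stand-in are hypothetical names, and `Date.now()` only has millisecond resolution, so very quick runs will come out coarse.

```javascript
// Time a single call to fn, in seconds.
function timeOnce(fn) {
  const start = Date.now();
  fn();
  return (Date.now() - start) / 1000;
}

// Average the elapsed time over a number of runs.
function averageTime(fn, runs) {
  let total = 0;
  for (let i = 0; i < runs; i++) {
    total += timeOnce(fn);
  }
  return total / runs;
}

// Stand-in workload (an assumption): 211 regex replacements,
// roughly the shape of one pass through the test cases.
function sampleWorkload() {
  const names = ["the broken family band", "CLEM SNIDE", "paul mccartney"];
  for (let i = 0; i < 211; i++) {
    names[i % names.length].toLowerCase().replace(/\b[a-z]/g, function (c) {
      return c.toUpperCase();
    });
  }
}

const seconds = averageTime(sampleWorkload, 5);
console.log(seconds.toFixed(3) + " secs");
```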
These figures should only be used for comparison, of course. They broadly tally with other regular expression benchmarks that you can find with a quick Google search.