AutoCasing JavaScript - Correcting Bad Data
I blogged some time ago about a script that would auto-correct the casing of artist names.
The impulse to do this was bad data being supplied by a 3rd party system I had no control over - the playout system used by Xfm.
Xfm staff were twitchy about auto-correcting the case, worried that it could be more harmful than helpful.
Since then I've written a JavaScript script to auto-correct case. I'm not going to go into great detail about what it does, since that would just be going over old ground.
On a technical level though, here's what it does to a submitted string:
1 - uses a regular expression to strip all characters that aren't alphanumeric, then compares the result against a list of known exceptions to the auto-casing
2 - if looping through the array of known exceptions finds a match, it returns that exception
3 - if not, it auto-cases the string using two regular expressions.
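The three steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the demo page: the `EXCEPTIONS` list, the function names `normalise` and `autoCase`, and both fallback regular expressions are assumptions made up for the example.

```javascript
// Hypothetical exception list: names whose correct casing general rules would get wrong.
const EXCEPTIONS = ["AC/DC", "R.E.M.", "MGMT", "will.i.am"];

// Step 1: strip everything that isn't a letter or digit, and lowercase the
// result, so punctuation and casing differences don't affect the comparison.
function normalise(str) {
  return str.replace(/[^a-z0-9]/gi, "").toLowerCase();
}

function autoCase(input) {
  const key = normalise(input);

  // Step 2: loop through the known exceptions and return the first match.
  for (let i = 0; i < EXCEPTIONS.length; i++) {
    if (normalise(EXCEPTIONS[i]) === key) {
      return EXCEPTIONS[i];
    }
  }

  // Step 3: no exception matched, so fall back to two regular expressions
  // (both are assumptions for this sketch).
  return input
    .toLowerCase()
    // First regex: capitalise the first letter of each word.
    .replace(/(^|[\s(\-])([a-z])/g, (match, lead, letter) => lead + letter.toUpperCase())
    // Second regex: capitalise the letter following an "O'" or "Mc" prefix.
    .replace(/\b(O'|Mc)([a-z])/g, (match, prefix, letter) => prefix + letter.toUpperCase());
}
```

With this sketch, `autoCase("acdc")` hits the exception list and comes back as `AC/DC`, while `autoCase("THE ROLLING STONES")` falls through to the regular expressions. Comparing on the stripped string is what lets badly punctuated input still find its exception.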
The demo is here. The set of test cases that I used during development is also available, and the code can be downloaded from the demo page.
What finally nudged me to blog about it was a development over on MusicBrainz: their GuessCase script. There's much more detailed and rigorous casing here, backed by in-depth debate about what should be uppercase and what shouldn't. Go read about it for yourself and I'll stop writing about it.
Still reading? OK. I was just going to mention the testing that I had in place. It takes 211 examples of incorrectly cased artist names and runs them through the script. These 211 include all the exceptions, plus an example that tests each part of the regular expressions.
I'm also capturing how long the test takes. Here are the response times for A-grade browsers (and Chrome). Each time is an average of five 'scores'.
| OS | Browser | time taken (secs) |
|---|---|---|
| OSX | Safari 4 Public Beta | 0.084 |
| OSX | Firefox 3.0.8 | 0.149 |
| Windows XP | Firefox 3.0.8 | 0.195 |
| Windows XP | Google Chrome 1.0 | 0.215 |
| Windows XP | Opera 9.52 | 0.225 |
| Windows XP | Internet Explorer 8 | 0.315 |
| OSX | Opera 9.52 | 0.462 |
| Windows XP | Internet Explorer 7.0 | 0.579 |
| Windows XP | Internet Explorer 6 | 0.692 |
These figures should only be used for comparison, of course. They broadly tally with other regular expression benchmarks that you can find with a quick Google search.
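For anyone wanting to reproduce this kind of measurement, here is a rough sketch of a test run that is timed and averaged over several passes. It is not the harness used for the table above: `runCases`, `averageSeconds`, the stand-in `upper` function and the two sample cases are all hypothetical names and data chosen for illustration.

```javascript
// Run every [input, expected] pair through `fn` and collect the failures.
function runCases(fn, cases) {
  const failures = [];
  for (const [input, expected] of cases) {
    const actual = fn(input);
    if (actual !== expected) {
      failures.push({ input, expected, actual });
    }
  }
  return failures;
}

// Time `runs` full passes over the cases and return the mean in seconds.
function averageSeconds(fn, cases, runs) {
  let totalMs = 0;
  for (let i = 0; i < runs; i++) {
    const start = Date.now();
    runCases(fn, cases);
    totalMs += Date.now() - start;
  }
  return totalMs / runs / 1000;
}

// Stand-in for the real auto-casing function, used only to exercise the harness.
const upper = (s) => s.toUpperCase();
const sampleCases = [["abba", "ABBA"], ["mgmt", "MGMT"]];

const failures = runCases(upper, sampleCases);
const meanSeconds = averageSeconds(upper, sampleCases, 5);
console.log(failures.length + " failures, mean " + meanSeconds + "s over 5 runs");
```

`Date.now()` only has millisecond resolution, so a short run like this sample will mostly report zero. A set of a couple of hundred cases, as in the real test, gives more meaningful numbers.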