Gareth53.co.uk - the online home of Gareth Senior

Paths in JS variables affecting SEO

3:43p.m., Thu 8 May 2008

Google's Webmaster Tools, run against xfm.co.uk, recently reported more than 14,000 404 errors on the domain.

90% of these were puzzling URLs - half with a subdirectory that doesn't exist on the site, and half with paths to the (third-party) ad server appended to the site domain.

It appears that Googlebot is trying to index paths that it finds as JavaScript variables in the host document.

Adverts and tracking on the site are handled by two third-party services and copious amounts of JS.

Omniture's SiteCatalyst (formerly WebSideStory's HBX) script includes a "category" variable in the HTTP request to their server, with multiple levels of categories separated by forward slashes. For example:

//MULTI-LEVEL CONTENT CATEGORY
hbx.mlc="/XFM/Yo+Want+Rock+You+Got+It++The+Darkness+Cometh";

Googlebot then tries to retrieve the URL:
http://www.xfm.co.uk/XFM/Yo+Want+Rock+You+Got+It++The+Darkness+Cometh
which returns a 404 error and, presumably, affects the search ranking of the site. This mlc value is unique to every page on the site, which presumably means that for every valid page in Google's index there's a corresponding 404 error :(
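To illustrate the mechanism, here is a rough sketch of how a crawler might turn a path-like string in inline JavaScript into a URL on the host domain. It is only a guess at the behaviour, not Googlebot's actual logic; the function name guessUrlsFromScript and the regular expression are made up for this example.

```javascript
// Hypothetical sketch: how a crawler *might* treat path-like string
// literals in inline JavaScript as links. An assumption for illustration,
// not Googlebot's real implementation.

// Matches double-quoted string literals whose content starts with "/".
const PATH_LITERAL = /"(\/[^"\s]*)"/g;

function guessUrlsFromScript(scriptSource, pageUrl) {
  const urls = [];
  for (const match of scriptSource.matchAll(PATH_LITERAL)) {
    // A root-relative path resolves against the page's own domain,
    // which is how a tracking category ends up looking like a site URL.
    urls.push(new URL(match[1], pageUrl).href);
  }
  return urls;
}

const inlineScript = `
//MULTI-LEVEL CONTENT CATEGORY
hbx.mlc="/XFM/Yo+Want+Rock+You+Got+It++The+Darkness+Cometh";
`;

console.log(guessUrlsFromScript(inlineScript, "http://www.xfm.co.uk/"));
```

Any string that starts with a slash is indistinguishable from a root-relative link once it is resolved this way, which would explain the bogus URLs in the report.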

Likewise with the ad code, which likes to generate script includes client-side using document.write (*sigh*). In amongst the JavaScript is the part of the path that's common to all the script includes on the page:

var sitearea = "/SITE=xfm/AREA=Home/GUID=5911643/pageid=5911643/LOGIN=0/CODE1=/CODE2=/CODE3=/";

Resulting in more bad URLs:
http://www.xfm.co.uk/SITE=xfm/AREA=Home/GUID=5911643/pageid=5911643/LOGIN=0/CODE1=/CODE2=/CODE3=/

Given that the GUID value is unique not just to every page but to every HTTP request, that's even more 404 errors being indexed by Google :(
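For anyone wanting to size the problem from their own 404 report, a small sketch like the following could sort the URLs into the two patterns described above. The function names, the group labels and the "/XFM/" and "/SITE=" prefixes are assumptions based on the examples in this post, so adjust them to match your own data.

```javascript
// Hypothetical helper for sorting 404 URLs into the two patterns seen
// in this post. The prefixes and labels are assumptions, not anything
// provided by Google's tools.
function classify404(url) {
  const path = new URL(url).pathname;
  if (path.startsWith("/SITE=")) return "ad-server path";
  if (path.startsWith("/XFM/")) return "tracking category";
  return "other";
}

// Counts how many URLs fall into each group.
function tally404s(urls) {
  const counts = {};
  for (const url of urls) {
    const kind = classify404(url);
    counts[kind] = (counts[kind] || 0) + 1;
  }
  return counts;
}

const reported = [
  "http://www.xfm.co.uk/XFM/Yo+Want+Rock+You+Got+It++The+Darkness+Cometh",
  "http://www.xfm.co.uk/SITE=xfm/AREA=Home/GUID=5911643/pageid=5911643/LOGIN=0/CODE1=/CODE2=/CODE3=/",
];

console.log(tally404s(reported));
```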
