Running a French Holiday Gite in Rural Brittany

Thursday, May 07, 2009

301 rewrite broke the website

[Graph: Google Analytics - sudden drop in website visitors]
I admit it, I'm a muppet.

We tell the kids (especially Jack, our youngest) that they're "a Muppet" when they do something stupid.

I can now confirm, as suspected for a long time, that the kids must get their Muppet genes from me as I managed to completely break the holiday gite website for a week as the Google Analytics graph above all too clearly shows.

Back in January 2009 I wrote about the "extreme website trickery" of using .htaccess and rewriterule to auto-magically transform requests for web pages such as http://giteinbrittany.com/gite.html to http://www.giteinbrittany.com/gite.html.

I concluded by noting that I still had some work to do to ensure that http://www.giteinbrittany.com (i.e. the default home page) and http://www.giteinbrittany.com/index.html (i.e. the actual home page contents) were dealt with as a single indexed page (in Google and the other search engines) to maximise my page rank opportunity.

Well, with some on-and-off fiddling with the way that the website menu structures were automatically generated, I managed to change all the internal home page references from /index.html to just /, which did most of what I needed.

All that remained was to do some fiddling with the .htaccess file to return 301 (permanently moved) for any direct page requests to index.html and the job was done.

Although of course 'it ain't never that simple' with me!

I remembered that over the years I had moved around a few of the website pages and I was conscious that there were links "out there" on the web that still pointed to the old website pages, which were now broken and returning an unfriendly HTTP 404 'page not found' error.

Easy I thought, I'll use a bit of .htaccess trickery to return a 301 response code and redirect any 'old page' requests to the shiny 'new page', page rank will improve, broken links will be banished and customer experience will be improved as a result, and all will be well with the world!

In my travels to work out how to decipher the intricacies of page redirection I'd come across an article by Steven Hargrove on redirecting moved web pages which simply said to include an additional line in the .htaccess file:

Redirect 301 /old/old.html http://www.you.com/new.html

So that's what I did, I added a new line like this:

Redirect 301 /test/index_lytebox_mod.html http://www.giteinbrittany.com/test/lytebox/index.html

Job done, I uploaded the new .htaccess file, and left the website to it.

You'd think that after 22 years working in the IT Industry I would know better and remember to actually test any changes I make - especially those that are as 'deep rooted' as this in the webserver config file.

Well, Mr Muppet didn't bother testing this config change at all, and purely by chance a week or so later I went onto the website to check whether a particular week was available or not and found that EVERY SINGLE WEBSITE REQUEST was returning an HTTP 500 error - "Internal Server Error". I hadn't just managed to break requests to the page that had moved on the website, I'd broken everything.

Google's Webmaster console was full of error messages about unreachable pages and most telling of all was the Google Analytics report of visitor details (above), graphically showing how I caused us to "drop off the internet" for nearly a week.

Commenting out the offending line of the .htaccess file immediately fixed the problem, but finding a working solution to redirecting moved pages has taken me considerably more time as all the examples I found were variations on a theme and everything I tried continued to break the website.

Cutting a long story short, and pointing out yet more Muppetry on my part, in the end I found that the problem was an unprintable character in the .htaccess file. Whilst 'http://www.giteinbrittany.com/' and 'test/lytebox/index.html' appeared to be contiguous text when editing the .htaccess in Notepad, they were actually separated by another (invisible) character, and as a result the Apache server was barfing on the unrecognised 'extra' text.
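
For anyone hunting a similar gremlin, here is a minimal sketch of one way to spot such characters. It is not what I used at the time. It is a small Python script, and the function name find_suspect_chars and the default path are my own assumptions. It flags anything outside printable ASCII, apart from tabs and line endings, and reports where it sits in the file.

```python
import sys
import unicodedata

# Whitespace that legitimately appears in a plain-text config file.
ALLOWED_CONTROL = {"\t", "\n", "\r"}


def find_suspect_chars(text):
    """Return (line, column, codepoint, name) for each suspicious character.

    A character is flagged if it is outside printable ASCII (0x20-0x7E)
    and is not a tab, line feed or carriage return.
    """
    hits = []
    # Split on "\n" only, so that odd separators such as form feeds are
    # inspected rather than silently treated as line breaks.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ch in ALLOWED_CONTROL or " " <= ch <= "~":
                continue
            name = unicodedata.name(ch, "UNNAMED CONTROL CHARACTER")
            hits.append((line_no, col_no, "U+%04X" % ord(ch), name))
    return hits


# Hypothetical usage: python find_suspects.py path/to/.htaccess
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as handle:
        # latin-1 maps every byte to a character, so decoding never fails.
        text = handle.read().decode("latin-1")
    for line_no, col_no, codepoint, name in find_suspect_chars(text):
        print("line %d, column %d: %s %s" % (line_no, col_no, codepoint, name))
```

A clean file prints nothing. Any output points at a line and column worth deleting and retyping by hand.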

Along the way I did an awful lot of research and tried lots of different approaches, none of which worked, until I found the actual root cause problem.

I'll point out a couple of things I did find though. The 'redirect to' URL has to be a full URL (i.e. http://www.blahblah/newpath/filename); it can't be a relative path such as /newpath/filename - at least one site I visited suggested that this was the case.

I can also recommend the htaccess elite forum on using redirect and rewrite for other troubled .htaccess users like me, and then a series of postings by 'produke' on sample redirect statements and a full explanation of how to use the redirect directive in .htaccess.

In the latter article I noticed that the server response code, 301 (moved permanently), is optional and was only introduced from Apache 1.2 onwards. All the examples I'd ever seen had included this response code, so if you have redirect problems then check the webserver version (or omit the response code, in which case Apache falls back to a temporary 302 redirect).

I'm glad to say now that removing the errant unprintable character cured all my redirect woes and so as a result I have a .htaccess file that prevents directory browsing, redirects requests for non www. pages to the www. version and also now redirects requests for moved pages to their new home.

Putting it all together the .htaccess looks like this:

<Files .htaccess>
order allow,deny
deny from all
</Files>
IndexIgnore */*

### redirect any moved pages that still have old links to them
Redirect 301 /test/index_lytebox_mod.html http://www.giteinbrittany.com/test/lytebox/index.html

RewriteEngine on
RewriteBase /

### re-direct non-www to www
RewriteCond %{HTTP_HOST} ^giteinbrittany\.com [NC]
RewriteRule ^(.*)$ http://www.giteinbrittany.com/$1 [R=301,L]

Moral of the story therefore, test things before you put them live and look for the obvious (or perhaps non-obvious) typing errors!
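
As a rough sketch of the sort of test that would have caught my blunder within minutes, the Python below requests a few URLs without following redirects and compares the status code and Location header against what is expected. The URLs come from this post, but the function names, the CHECKS list and the assumption that the home page answers with a 200 are mine, so treat it as a starting point and not a finished tool.

```python
import http.client
from urllib.parse import urlsplit

# (url, expected status, expected Location header or None) - assumed values.
CHECKS = [
    ("http://www.giteinbrittany.com/", 200, None),
    ("http://giteinbrittany.com/gite.html", 301,
     "http://www.giteinbrittany.com/gite.html"),
    ("http://www.giteinbrittany.com/test/index_lytebox_mod.html", 301,
     "http://www.giteinbrittany.com/test/lytebox/index.html"),
]


def fetch_status(url, timeout=10):
    """Send one HEAD request and return (status, Location header or None).

    http.client never follows redirects, so a 301 is reported as a 301.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def run_checks(checks, fetch=fetch_status):
    """Return a list of failure messages; an empty list means all passed."""
    failures = []
    for url, want_status, want_location in checks:
        status, location = fetch(url)
        if status != want_status:
            failures.append("%s: expected status %d, got %d"
                            % (url, want_status, status))
        elif want_location is not None and location != want_location:
            failures.append("%s: expected Location %s, got %s"
                            % (url, want_location, location))
    return failures
```

Calling run_checks(CHECKS) straight after uploading a new .htaccess makes real requests to the live site and should return an empty list. A site-wide 500 error like mine would show up as a failure on every line.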

1 Comments:

  • doh! you're a braver man than me. I rarely dare touch .htaccess! I hope your bookings haven't suffered too much as a result!?

    By Anonymous Dave, at May 07, 2009  
