Running a French Holiday Gite in Rural Brittany

Friday, February 08, 2008

Getting the Fluffy Search engine to work on my website

A week or so ago I wrote about adding a search engine to our website, concluding that although I liked Fluffy Search, I couldn't get the indexer to work and so I was a bit stumped.

Well, over the last week I've been swapping emails back and forth with Ben Summers, the developer of Fluffy Search, and despite the disclaimer that "Fluffy Search is unsupported. However, if you email us we'll try and help you out, although we can't promise an immediate response", I can only heap praise on how helpful they have been in sorting out my problems.

At the end of the day most of the problems have been of my own making, both through not reading the instructions on how to configure Fluffy Search properly and through some underlying problems within the HTML of my website.

So that others don't tread the same problematic route I went down, the key things I ended up getting wrong in the config file were:
  1. Setting the $docroot_disc incorrectly
    This must be set to the full UNIX pathname of the directory where your website files are stored. In the end I had to log a support call with my website host, 123-reg, to find out the actual pathname.

  2. Similarly, setting the $index_loc incorrectly
    This must also be a full UNIX pathname (e.g. /home/vhosts/<username>/searchindex).

  3. Setting $search_script and $page_script
    In contrast, these two variables should be set as URLs, as you would enter them in your browser (so they're http://www.giteinbrittany.com/<cgi-directory>/fluffysearch.pl and the same URL ending in /fcp.pl respectively).

  4. Make sure that you've got execute permissions on the .pl scripts
    This is one mistake (perhaps the only mistake) I didn't make! In my case I use the 'manage hosting' option within the 123-reg console to set the permissions correctly.

  5. Set $indexer_cmd to where the indexer script is stored
    This is the one I got wrong in the first place. The default config file provided has $indexer_cmd set to 'fluffymkindex.pl' and I had to change it to './fluffymkindex.pl' (presumably because the current directory isn't on the server's command search path) before it would run properly and create the search index files.

  6. Make sure your website is written with valid HTML!
    This may sound obvious, but it was the one that got me completely confused. I'd got the indexer and searching working properly, but when I searched for a keyword (e.g. 'weather') and then looked at the returned page, with the matching keyword nicely highlighted in red, the underlying HTML was fairly badly broken and several of the page links didn't work any more.

    So for instance instead of nice <a> links like this:
    <a href=http://www.giteinbrittany.com><img src=/theme/mast_mos.jpg width=881 height=100 alt="blah blah"></a>

    What I was getting back was:
    <a href="/Test/h"ttp://www.giteinbrittany.com><img src=/theme/mast_mos.jpg width=881 alt="blah blah"></a>

    The bottom line is that my website wasn't written in valid HTML 4.01 Strict. When I pointed the W3C HTML validator at any page I got back a slew of error messages, usually 15 to 20 on each page!

    The problem was of my own making, of course. In an attempt to minimise the HTML file sizes, I had over time been removing any "extraneous" quotes from the HTML, but I'd removed too many. I'd forgotten that any URLs or image source filenames (as in <a href=<url>> or <img src=<filename>>) must be enclosed in quotes if they contain anything other than letters, numbers, full stops, hyphens, underscores and colons.

    So <a href=index.html> is OK (only letters and a full stop in the URL), but <img src=/images/gite.jpg> is not (as there are slashes in the filename) and has to be changed to <img src="/images/gite.jpg">. And so I'm now embarking on fixing all the pages on the site and making them valid HTML again.
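
For anyone facing the same clean-up, here is a minimal Python sketch of how the unquoted attributes that need quotes might be tracked down. It is not part of Fluffy Search or the W3C validator, the function name find_unsafe_attributes is made up for illustration, and it uses simple regular expressions rather than a real HTML parser, so treat it as a rough first pass only (it will be confused by a '>' inside a quoted value, for example).

```python
import re

# Characters HTML 4.01 allows in an attribute value written without quotes:
# letters, digits, full stops, underscores, colons and hyphens.
SAFE_UNQUOTED = re.compile(r"^[A-Za-z0-9._:-]+$")

# A start tag: '<', a letter, then everything up to the next '>'.
# Assumption: no '>' appears inside a quoted attribute value.
TAG = re.compile(r"<[A-Za-z][^>]*>")

# name=value, where value is double-quoted, single-quoted or unquoted.
ATTR = re.compile(
    r"""([A-Za-z_:][-A-Za-z0-9_:.]*)\s*=\s*("[^"]*"|'[^']*'|[^\s"'>]+)"""
)


def find_unsafe_attributes(html):
    """Return (name, value) pairs for unquoted values that need quotes."""
    problems = []
    for tag in TAG.finditer(html):
        for attr in ATTR.finditer(tag.group(0)):
            name, value = attr.group(1), attr.group(2)
            if value[0] in "\"'":
                continue  # already quoted, nothing to fix
            if not SAFE_UNQUOTED.match(value):
                problems.append((name, value))
    return problems


if __name__ == "__main__":
    sample = '<a href=index.html><img src=/images/gite.jpg width=881></a>'
    for name, value in find_unsafe_attributes(sample):
        print(f'{name}={value} should be written as {name}="{value}"')
```

Run against a saved copy of each page, something like this gives a quick list of attributes to fix before going back to the W3C validator for the definitive verdict.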

And there we are: Fluffy Search is now working perfectly on our Gite website, and you can take it for a spin over at http://www.giteinbrittany.com/fluffysearch101/fluffysearch.pl.

I've still some work to do before I can properly integrate it into the site and launch it. I need to correct the invalid HTML on the remaining pages (currently some of the results pages returned don't show embedded images and have broken links on them), style the results page so that it looks consistent with the rest of the website, and add <fcs_ni></fcs_ni> tags around the navigation structure to stop it being indexed on each page (try searching for Gite and you'll see what I mean). But the hard work of getting the search engine working is now done.

Result!
