Friday, January 11, 2008

Creating static pages from a database

For my most recent project I will be creating a sign-up sheet that generates static pages when the information is entered. The pages will be static so that they can be crawled by search engine spiders. My thinking is that a spider cannot read dynamically created pages, because it only sees what is stored on the server, and that would be the PHP script, which is not friendly to them at all. The only block of information I will not have to store in the database is the text block, because it is only ever written out to that particular user's pages. The rest of the information will have to be called from other pages.

Creating the page for the vendor is easy (if you've used XTemplate before). It's just a matter of using the XTemplate function xtpl->assign() to drop the text block and other info in where it is needed.

The first mildly challenging aspect I encountered was the multiple select list, where the user can choose multiple options with ctrl-click, like so:



This data has to be stored somewhere, and the most difficult part was getting the array to pass to the PHP script. In my script I had:

if (isset($cities)) {  // $cities comes from the submitted form, e.g. $_POST['cities']
    foreach ($cities as $value) {
        // Escape user input before building the query
        $user = mysql_real_escape_string($user_name);
        $city = mysql_real_escape_string($value);
        $query = "INSERT IGNORE INTO `***` (`user`, `cities`) VALUES ('".$user."', '".$city."');";
        mysql_query($query) or die('Failed to update user to cities: ' . mysql_error());
    }
}

Which I was sure would work, but I kept getting an error on my foreach statement. After some research on HTML forms I found that for the values to be passed as an array to the script, the control has to be named as name="my_name[]". HTML needs the [] to tell the script that it is an array; otherwise only a single value comes through. Obviously a foreach() statement won't work on a non-array type, which is why that code was breaking.
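For reference, the corrected form markup looks like this (the field and option names here are made up for illustration):

```html
<!-- The select element must be named with a trailing [] so PHP receives
     the selections as an array (e.g. $_POST['cities']). -->
<select name="cities[]" multiple="multiple" size="5">
  <option value="Boston">Boston</option>
  <option value="Concord">Concord</option>
  <option value="Portland">Portland</option>
</select>
```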

The next, bigger challenge is to create site maps for all of these generated pages, and also to create links at the bottom of each user's page to pages from other users in the same group, as a kind of minimap. Again, all my pages have to be static, so PHP cannot reside on the page and call the database. I looked into Apache's mod_rewrite, and there is no way for it to create a static page out of one with PHP on it. My first idea was to write a script to parse the PHP pages using the ob_start() function, which stores all subsequent output in an internal buffer instead of sending it to the browser; normally ob_end_flush() sends that buffer on to the browser, but the buffered content can instead be copied into a string. I was then going to take this content and write it to a new .html file under the same name. This costs a lot of time, because of the number of pages being created and buffered, and a lot of space, because I would have to keep the .php file around so the .html could be regenerated whenever the database is updated.
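That first idea can be sketched in a few lines. This is a minimal stand-in, not my real script: the echoed markup substitutes for the actual template output, and the vendor name and filename are made up. It captures the buffered output into a string with ob_get_contents() (using ob_end_clean() to discard the buffer rather than ob_end_flush() to send it) and writes it out as a static file:

```php
<?php
// Sketch: capture dynamically generated output into a static .html file.
ob_start();                       // start buffering instead of sending output
echo "<html><body>";
echo "<h1>Vendor: Example Vendor</h1>";  // stand-in for real template output
echo "</body></html>";
$html = ob_get_contents();        // copy everything buffered so far into a string
ob_end_clean();                   // discard the buffer without sending it

file_put_contents('vendor_example.html', $html);  // write the static page
?>
```

The same capture would have to be repeated for every page, which is where the time and space cost comes from.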

But then I got an idea which is much simpler and more straightforward, and saves all the extra work of my previous idea: don't put the PHP on the page in the first place. I can use the same xtpl->assign('var', 'content') function here as well, but pass it the already-computed output rather than the PHP code itself. PHP evaluates that content on the server before the template is filled in, so the finished page never contains any PHP, saving me a lot of work. The only thing I need to do now is make my code more modular, so I can write a script that regenerates the pages this way every so often, for when the database is updated.
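A rough sketch of that approach, for the minimap links: build the finished HTML string first, then substitute it into the template. The user names and placeholder syntax here are invented, and a plain str_replace() stands in for XTemplate's assign()/parse() calls, just to show the shape of the idea:

```php
<?php
// Sketch: pre-render the "minimap" links in PHP, then hand the finished
// HTML string to the template, so the generated page contains no PHP.
// $related stands in for rows pulled from the database by group.
$related = array('alpha-vendor', 'beta-vendor');  // hypothetical user names

$links = '';
foreach ($related as $name) {
    $links .= '<a href="' . $name . '.html">' . $name . '</a> ';
}

// With XTemplate this would be roughly $xtpl->assign('MINIMAP', $links);
// a simple placeholder substitution stands in for the template engine here.
$template = '<div id="minimap">{MINIMAP}</div>';
$page = str_replace('{MINIMAP}', $links, $template);

echo $page;
?>
```

Because $links is an ordinary string by the time it reaches the template, the written-out page is pure HTML.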

2 comments:

Efeion said...

Search engine spiders see exactly what you do when looking at your website in a browser. The PHP code is run on the server, so no one can ever see it; all they see is the HTML that the PHP generates and sends to them. Otherwise it would be horribly insecure, since you could view people's code. Spiders will see exactly what your PHP outputs, so if you are worried about bots not seeing your HTML result, you need not worry.

The only problem that does occur with PHP is that spiders will ignore any GET variables passed after the .php, so all they will see is the first page the base .php file creates. There are several ways to get around this; one that I have been looking at is here. You just need to trick bots into thinking that the passed parameters are a path, and they will read it.

I think this is what you were getting at, but I would be lying if I said I was not a little confused...

TheSpeshulK said...

It appears that there is a lot of misinformation out there on this topic, and I happened to run across some of it myself. As you said, spiders will ignore URLs and links with query strings appended to them. This is a nice section I found that explains a lot about spiders and dynamic pages. However, there is still a problem with the dynamically generated links on my pages: some of them need to grab a thousand or more rows from the database, which I fear will make a search engine spider wait too long and not like the page. For this reason I may need to stick to my idea of processing the PHP before it is put on the page's template.

Avoiding the query strings can be tricky, but if you know anything about regular expressions I think you will find that a mod_rewrite works better for you, assuming you are on an Apache server. With this technique you redirect a URL request like mydomain.com/country/state to mydomain.com/request.php?country=USA&state=New%20Hampshire, for example. This way the viewer (and spiders) never see the PHP query string. The best article I've found explaining this to beginners (like myself) is here.
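A minimal .htaccess sketch of that rewrite (the script name request.php and the parameter names are only examples, matching the URL above):

```apache
# Map a crawler-friendly path like /USA/New%20Hampshire to the real query
# string, invisibly to the visitor. The captured groups $1 and $2 come
# from the parenthesized parts of the pattern.
RewriteEngine On
RewriteRule ^([^/]+)/([^/]+)/?$ request.php?country=$1&state=$2 [L,QSA]
```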

I was really confused about how the parentheses (the capture groups) worked before this article, because I wasn't very familiar with regular expressions, and all the other sites on .htaccess files assume you are.