ForceType for nice URLs with PHP

Wed, Oct 3, 2007 10:53 PM
This has been covered before, but I was just setting up a new ForceType on our servers and thought I would mention it for the fun of it. You see lots of stuff about using mod_rewrite to make friendly (or SEO-friendly) URLs. But if you are using PHP (and, I guess, other Apache modules), you can do it without mod_rewrite. We have been doing this for a while at dealnews, even before SEO was an issue.

Setting up Apache

From the docs, the ForceType directive "forces all matching files to be served as the content type given by media type." Here is an example configuration:

<Location /deals>
ForceType application/x-httpd-php
</Location>


Now any URL like http://dealnews.com/deals/Cubicle-Warfare/186443.html will attempt to run a file called deals that is in your document root.

Making the script

First, save a file called deals without the .php extension. Modern editors will look for the <?php tag at the start of the file and will color it correctly. Normally you take input to your PHP scripts from $_SERVER["QUERY_STRING"] or the $_GET variables. But in this case, those are not filled by the URL above. They will still be filled if there is a query string, but the path part is not included. We need to use $_SERVER["PATH_INFO"]. In the case above, $_SERVER["PATH_INFO"] will be filled with /Cubicle-Warfare/186443.html. So you will have to parse the data yourself. In my case, all I need is the numeric ID toward the end.

$id = (int)basename($_SERVER["PATH_INFO"]);

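Here basename() returns 186443.html, and the (int) cast keeps the leading digits and drops the rest. Now I have an id that I can use to query a database or whatever to get my content.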
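The one-line cast is forgiving: a last segment with no leading digits becomes 0, and a request with no path at all leaves PATH_INFO unset. A slightly stricter version might look like the sketch below. The function name deal_id_from_path and the "digits followed by .html" pattern are assumptions made for illustration, so adjust them to your own URL scheme.

```php
<?php
// Hypothetical helper: pull the trailing numeric ID out of a PATH_INFO value.
// Returns null unless the last path segment looks like "<digits>.html".
function deal_id_from_path($path_info)
{
    if (!is_string($path_info) || $path_info === '') {
        return null;
    }
    // The D modifier keeps "$" from matching before a trailing newline.
    if (preg_match('/^([0-9]+)\.html$/D', basename($path_info), $m)) {
        return (int) $m[1];
    }
    return null;
}

// PATH_INFO is not set when the request is for /deals with nothing after it.
$path_info = isset($_SERVER["PATH_INFO"]) ? $_SERVER["PATH_INFO"] : "";
$id = deal_id_from_path($path_info);
```

A null result here is a natural signal to send the 404 response rather than query the database.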

Avoid "duplicate content"

The bad part of my use case is that any URL that starts with /deals/ and ends in 186443.html will work. So now we have duplicate content on our site. You may have a more exact URL pattern and not have this issue. To work around this in my case, we should verify that $_SERVER["PATH_INFO"] is the proper value for the content requested. This code will vary depending on your URLs. In my code, I generate the URL for the content and see if it matches the requested one, kind of a reverse lookup on the URI. If it does not match, I issue a 301 redirect to the proper location.

header("HTTP/1.1 301 Moved Permanently");
header("Location: $new_url");
exit();
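The reverse lookup could be sketched roughly as follows. This is not the code dealnews runs: the function names, the slug rule (runs of non-alphanumeric characters become a single hyphen), and the idea that the title comes from your own content lookup are all assumptions for illustration.

```php
<?php
// Hypothetical: rebuild the expected PATH_INFO from the content's ID and title.
function canonical_path_info($id, $title)
{
    // Assumed slug rule: collapse anything that is not a letter or digit
    // into a single hyphen, then trim hyphens from both ends.
    $slug = trim(preg_replace('/[^A-Za-z0-9]+/', '-', $title), '-');
    return "/" . $slug . "/" . (int) $id . ".html";
}

// True when the requested path differs from the canonical one.
function needs_redirect($path_info, $id, $title)
{
    return $path_info !== canonical_path_info($id, $title);
}

// Intended use inside the deals script, once $id and $title are known:
//
//   if (needs_redirect($_SERVER["PATH_INFO"], $id, $title)) {
//       $new_url = "http://dealnews.com/deals" . canonical_path_info($id, $title);
//       header("HTTP/1.1 301 Moved Permanently");
//       header("Location: $new_url");
//       exit();
//   }
```

Comparing against a freshly generated path means the check keeps working if a title changes later: old links simply redirect to the new slug.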


Returning 404

Now, you have to be careful to always return meaningful data when using this technique. Search engines won't like you if you return status 200 for every possible random URL that falls under /deals. I know that Yahoo! will put random things on your URLs to see if you are doing the right thing. So, if you get your id and decide this is not a valid URL, you can return a 404.  In my case, I have a 404 file in my document root.  So, I just send the proper headers and include my regular 404 page.

header('HTTP/1.1 404 Not Found');
header('Status: 404 Not Found');
include $_SERVER["DOCUMENT_ROOT"]."/404.html";
exit();
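Putting the pieces together, the script has three possible outcomes: serve the content, redirect to the proper URL, or return a 404. A minimal sketch of that decision is below. The function route_deal and the $deals array are hypothetical, with the array standing in for whatever database lookup you use; it maps each valid ID to its canonical PATH_INFO.

```php
<?php
// Hypothetical decision function: returns array(status, redirect_path).
// $deals maps ID => canonical PATH_INFO and stands in for a database lookup.
function route_deal($path_info, array $deals)
{
    // Same parsing as the one-liner in the article: the leading digits of
    // the last path segment. An empty path yields 0.
    $id = (int) basename((string) $path_info);

    if (!isset($deals[$id])) {
        return array(404, null);                    // unknown ID
    }
    if ($path_info !== $deals[$id]) {
        return array(301, "/deals" . $deals[$id]);  // right ID, wrong path
    }
    return array(200, null);                        // canonical URL
}
```

Keeping the decision in a function with no header() or exit() calls makes it easy to check from the command line before wiring it to the real responses.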