Implementing Search Engine-Safe (SES) URLs In ColdFusion
posted Mar 14, 2008 at 04:40:38 PM by Doug Gibson.
I recently implemented SES (Search Engine Safe) URLs here on my blog for the first time. I read a number of posts about it, and the techniques range widely: using Apache's mod_rewrite, which I initially favored; using a Java servlet; using ColdFusion to parse out the URL (similar to what Ray Camden does on Blog.cfc); and using 404 and missing template handlers.
Search Engine Safe URLs, Friendly URLs, Pretty URLs, and Meaningful URLs?
In addition to the many techniques, there are many different angles from which to approach the topic. Several terms seem to be used interchangeably, but I think there are really different intentions behind each of them. There are Search Engine Safe URLs, Friendly URLs, Pretty URLs, and Meaningful URLs.
What I would call a friendly URL or pretty URL is one that is coded to be the shortest possible. So the following URL:
http://dgibson.net/blog/categoryid=1
might be represented by a short, memorable and typable URL such as:
http://dgibson.net/blog/coldfusion
These types of URLs require a lot of assumptions about your specific application. If you know them up front, you could implement this equally well in CF or Apache. But then you need a different rule for actual articles (as opposed to the category page).
What's The Goal?
I was not so worried about the looks of my URLs that I wanted to go to that extreme. The basis for Search Engine Safe URLs - and my choosing them - is two-fold:
First and foremost, to avoid the use of a standard query string, which can hinder low-ranking or new web sites from being fully indexed. Surely Google et al. can index dynamic content by now, but they tend not to index dynamic content on newer sites as aggressively. By not using the query string, all of your pages appear static to the search engines, and therefore get indexed more thoroughly.
Second, by including a URL stub (a human-friendly, keyword-rich URL version of the headline), you are both making the topic of the article known from looking at the URL and stuffing it with some keywords taken from your headline, which should help your search engine ranking.
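To make the idea of a URL stub concrete, here is a minimal sketch of how a headline might be turned into one. It is an illustration under stated assumptions, not the function actually used on this blog: the name MakeStubSketch and the exact rules (keep letters and digits, collapse everything else into single hyphens) are hypothetical.

```cfml
<!--- HYPOTHETICAL SKETCH: TURN A HEADLINE INTO A HYPHENATED URL STUB --->
<CFFUNCTION NAME="MakeStubSketch" RETURNTYPE="string" OUTPUT="false">
    <CFARGUMENT NAME="headline" TYPE="string" REQUIRED="true">
    <CFSET var stub=Trim(arguments.headline)>
    <!--- REPLACE EVERY RUN OF CHARACTERS OTHER THAN LETTERS AND DIGITS WITH ONE HYPHEN --->
    <CFSET stub=REReplace(stub,"[^A-Za-z0-9]+","-","all")>
    <!--- STRIP ANY HYPHEN LEFT AT THE START OR END --->
    <CFSET stub=REReplace(stub,"^-|-$","","all")>
    <CFRETURN stub>
</CFFUNCTION>

<!--- OUTPUTS: Implementing-Search-Engine-Safe-SES-URLs-In-ColdFusion --->
<CFOUTPUT>#MakeStubSketch("Implementing Search Engine-Safe (SES) URLs In ColdFusion")#</CFOUTPUT>
```

Because the sketch removes "/" and "=" along with other punctuation, a stub built this way cannot be mistaken for a path separator or a name-value pair when the URL is parsed.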
Like I said, I initially planned to use Apache's mod_rewrite facilities. I've used them before, but when doing so, the URL is always updated in the browser's address as well, which is not what I wanted. Perhaps there's a way around this, but I wasn't feeling like exploring that method because it has a drawback in flexibility compared to the ColdFusion method I ended up going with.
So I implemented SES URLs through ColdFusion. I decided that I might as well let CF do the heavy lifting since I will be dynamically writing out the href URLs in ColdFusion anyway.
The Deciding Factor
The final deciding factor was one of flexibility and practicality. My main concern was what happens to the URL stub if I change the headline of an article. Technically, that information is just there for looks (i.e. human readability). But if the headline changes, the URL stub should change with it, because it seems wrong to leave the two out of sync. At that point you could have incoming links to the same content under different URLs. This is bad for search engine optimization on two fronts. First, the content has its links (and therefore page rank) split between two or more versions ("pages"). Second, you could also get slapped with a duplicate content penalty and have one or both versions penalized in search engine rankings. Both are counter-productive for "Search Engine-Safe" URLs.
Implementation
The basic mechanism to parse out the Query String variables from the SES structure is pretty simple, and based on my own specific structure and assumptions. I run a short block of code in the Application.cfm to parse the URL and dump any SES variables there to the URL scope like this:
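    <!--- PARSE OUT URL VARIABLES FROM SES URLS --->
    <!--- STRIP THE SCRIPT PATH (IF THE SERVER INCLUDES IT) FROM THE PATH INFO --->
    <CFSET urlVars=ReReplaceNoCase(Trim(CGI.PATH_INFO),'.+\.cfm/? *','')>
    <CFLOOP INDEX="qsnvp" LIST="#urlVars#" DELIMITERS="/">
        <CFIF Find("=",qsnvp) GT 0>
            <!--- NAME=VALUE SEGMENT: SPLIT ON THE FIRST "=" AND COPY IT INTO THE URL SCOPE --->
            <CFSET URL[ListFirst(qsnvp,"=")]=ListRest(qsnvp,"=")>
        <CFELSEIF ListLen(urlVars,"/") EQ 1>
            <!--- A SINGLE BARE SEGMENT IS TREATED AS THE CATEGORY --->
            <CFSET URL.category=qsnvp>
        <CFELSE>
            <!--- ANY OTHER BARE SEGMENT IS THE URL STUB --->
            <CFSET URL.urlstub=qsnvp>
        </CFIF>
    </CFLOOP>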
This loops over the slash-delimited segments that follow the script name, looking for query string name-value pairs (hence "qsnvp") and parsing them. If there is only one extra piece of info after the actual CF script being run, then I assume it's a category. If there is a name-value pair followed by a trailing non-pair string, I treat that string as the URL stub.
Here's what a live URL looks like, containing a name-value pair (articleid) and a URL stub: http://dgibson.net/blog/article.cfm/articleid=8/Implementing-Search-Engine-Safe-SES-URLs
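As a worked trace of that URL, the sketch below runs the same kind of parsing against a hard-coded string rather than CGI.PATH_INFO. The variable names samplePathInfo and parsed are hypothetical, the sample assumes the server reports the script name as part of the path info, and the category branch is left out for brevity.

```cfml
<!--- SAMPLE VALUE STANDING IN FOR CGI.PATH_INFO (ASSUMPTION) --->
<CFSET samplePathInfo="/blog/article.cfm/articleid=8/Implementing-Search-Engine-Safe-SES-URLs">
<!--- STRIP EVERYTHING UP TO AND INCLUDING ".cfm/" --->
<CFSET urlVars=REReplaceNoCase(Trim(samplePathInfo),'.+\.cfm/? *','')>
<!--- COLLECT THE RESULTS IN A LOCAL STRUCT INSTEAD OF THE URL SCOPE --->
<CFSET parsed=StructNew()>
<CFLOOP INDEX="qsnvp" LIST="#urlVars#" DELIMITERS="/">
    <CFIF Find("=",qsnvp) GT 0>
        <!--- "articleid=8" BECOMES parsed["articleid"]="8" --->
        <CFSET parsed[ListFirst(qsnvp,"=")]=ListRest(qsnvp,"=")>
    <CFELSE>
        <!--- THE TRAILING BARE SEGMENT BECOMES THE URL STUB --->
        <CFSET parsed.urlstub=qsnvp>
    </CFIF>
</CFLOOP>
<CFDUMP VAR="#parsed#">
```

The dump should show two keys: articleid with the value 8, and urlstub with the value Implementing-Search-Engine-Safe-SES-URLs.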
So to avoid duplicate content issues, what I've done on my article.cfm page (and upcoming topic pages) is compare the URL stub in the request to the one generated from the database query. If they are different, then I do a 301 redirect to the "correct" version, which should pass all referential value from the old URL to the new one and prevent the old one from being indexed any more.
    <!--- IF THE URLSTUB DOES NOT MATCH UP, DO A PERMANENT REDIRECT TO THE PROPER ONE --->
    <CFIF NOT StructKeyExists(URL,"urlstub") OR URL.urlstub IS NOT CreateURLstub(qArticleDetails.headline)>
        <!--- SEND THE 301 AND LOCATION HEADERS DIRECTLY; CFLOCATION DEFAULTS TO A 302 REDIRECT --->
        <CFHEADER STATUSCODE="301" STATUSTEXT="Moved Permanently">
        <CFHEADER NAME="Location" VALUE="#CGI.SCRIPT_NAME#/articleid=#articleid#/#CreateURLstub(qArticleDetails.headline)#">
        <!--- STOP PROCESSING SO NO PAGE BODY IS SENT WITH THE REDIRECT --->
        <CFABORT>
    </CFIF>
This bit of code is run after the query on the article.cfm page, which gives me the chance to validate the URL stub and enforce proper linking - or at least page rank transfer via 301 redirect - to the one "official version" of each of my blog articles.
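Internal links can be written in the same official format so that they never trigger the redirect in the first place. The helper below is only a sketch: the name BuildArticleURL is hypothetical, the /blog/article.cfm path is taken from the live URL shown earlier, and the stub is passed in ready-made rather than generated.

```cfml
<!--- HYPOTHETICAL HELPER: BUILD THE SES HREF FOR ONE ARTICLE --->
<CFFUNCTION NAME="BuildArticleURL" RETURNTYPE="string" OUTPUT="false">
    <CFARGUMENT NAME="articleid" TYPE="numeric" REQUIRED="true">
    <CFARGUMENT NAME="urlstub" TYPE="string" REQUIRED="true">
    <!--- SCRIPT NAME, THEN THE NAME-VALUE PAIR, THEN THE STUB --->
    <CFRETURN "/blog/article.cfm/articleid=" & arguments.articleid & "/" & arguments.urlstub>
</CFFUNCTION>

<!--- HREF BECOMES: /blog/article.cfm/articleid=8/Implementing-Search-Engine-Safe-SES-URLs --->
<CFOUTPUT><a href="#BuildArticleURL(8,'Implementing-Search-Engine-Safe-SES-URLs')#">Implementing SES URLs</a></CFOUTPUT>
```

Keeping the URL format in one function means a later change to the structure only has to be made in one place.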
In Conclusion...
As an aside, the "poor man's referer spam" attacks on my site appear to have stopped as well. Perhaps this is simply because the new URLs do not match the usual pattern.
I don't necessarily find my SES URL solution ideal by many standards, but I found it simple enough to implement and I'm happy with the real-world flexibility of it. The trade-offs of speed to implement, flexibility and power are always tough calls to make. But I'm always open to suggestions for improvement too.