Detect User-Agents: Cloak and Dagger for Web Sites – Part 2

by Scott Allen - February 18, 2007 
Filed Under .htaccess, ASP, Bad Bots, JavaScript, PHP, User-Agents

“I’ve heard of User-Agents…”

In a previous post, I introduced you to User-Agents. Now let’s find out why you need to detect them, and how.

According to Wikipedia:

When Internet users visit a web site, a text string is generally sent to identify the user agent to the server. This forms part of the HTTP request, prefixed with User-agent: or User-Agent: and typically includes information such as the application name, version, host operating system, and language. Bots, such as web crawlers, often also include a URL and/or e-mail address so that the webmaster can contact the operator of the bot.



4 Reasons Why You Need to Detect User-Agents

  1. Browsers Have Quirks – Every web page, no matter how strictly valid its XHTML, WILL render differently in certain browsers. Sometimes a document needs minor adjustments to look uniform across browsers. You should keep such tweaks to a minimum and code your documents according to best practices and standards, but sometimes browsers still don’t cooperate, so we have to tweak.
  2. Personalize Content – It may be appropriate to serve different versions of your content depending on the type of browser or user-agent. For example, you may have specialized content such as podcasts, wallpaper, and downloads for mobile browsers and portable video game systems. Serving content appropriately ensures each visitor has the most relevant experience at your site. As long as the intent is not to deceive search engines, this is not considered cloaking.
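
    As a rough sketch of this idea (the User-Agent substrings and the included file names are hypothetical stand-ins, not a definitive detection list), you might branch in PHP like so:

    <?php
    // Visitor's User-Agent, or an empty string if none was sent
    $ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

    // Crude substring checks for a couple of portable devices (hypothetical list)
    if (stripos($ua, 'PSP') !== false || stripos($ua, 'Windows CE') !== false) {
        include 'content-mobile.php';  // hypothetical mobile version of the page
    } else {
        include 'content-full.php';    // hypothetical full version of the page
    }
    ?>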
  3. Keep Bad Visitors Out of Your Site – Do you often have bandwidth problems because unscrupulous visitors are downloading your entire site, or devious webmasters are sending scraper bots to steal your data and use it on their spammy sites? Then you need to use your .htaccess file to block bad visitors. Place the following lines at the beginning of your .htaccess file:

    # Bad User-Agent List :: BEGIN
    SetEnvIfNoCase User-Agent "Bad User Agent Here" bad_user_agent
    SetEnvIfNoCase User-Agent "BadUserAgentHere" bad_user_agent
    # Bad User-Agent List :: END

    Then place this near the end:

    <Files *>
    Order Allow,Deny
    Allow from all
    Deny from env=bad_user_agent
    </Files>

    Replace “Bad User Agent Here” and “BadUserAgentHere” with a key identifying phrase from the User-Agent string of the offending visitor(s). Place a backslash (\) before spaces and punctuation. If you have more than one, copy and paste the line to create a longer list, as in the example below. For more info, visit the .Htaccess Reference. This will not block all scrapers, but it will eliminate quite a few.
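
    For instance, if your logs showed abuse from agents identifying themselves as “EmailSiphon” and “Wget/1.5” (example names only; match whatever actually appears in your own logs), the list would look like this, with punctuation escaped:

    # Bad User-Agent List :: BEGIN
    SetEnvIfNoCase User-Agent "EmailSiphon" bad_user_agent
    SetEnvIfNoCase User-Agent "Wget/1\.5" bad_user_agent
    # Bad User-Agent List :: END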

  4. Guide Search Engine Spiders – Every search engine has a web robot, called a spider, that visits your web site. You should guide these spiders in how they access your site using a robots.txt file (named exactly that, all lowercase, in your site root). As basic as this may be to some of you, you’d be surprised how many webmasters don’t use robots.txt correctly. If you need help creating one, visit the Robots.txt Generator.
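
    As a minimal example (the disallowed directory is hypothetical), a robots.txt could look like this, telling every spider to skip one private directory while leaving the rest of the site crawlable:

    # Applies to all spiders
    User-agent: *
    # Hypothetical directory that spiders should not crawl
    Disallow: /private/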

How Do I Detect My User-Agent?
That’s a great question. Here’s how to detect your User-Agent, in PHP, ASP, and JavaScript.

PHP:
<?php
// The visitor's User-Agent string is passed to PHP by the web server
$MyUserAgent = $_SERVER['HTTP_USER_AGENT'];
// Escape before output, since this value is supplied by the visitor
echo "Your User-Agent is: " . htmlspecialchars($MyUserAgent);
?>

ASP:
<%@ Language=VBScript %>
<%
' The visitor's User-Agent string, from the request headers
MyUserAgent = Request.ServerVariables("HTTP_USER_AGENT")
%>
Your User-Agent is: <%=Server.HTMLEncode(MyUserAgent)%>

JavaScript:
<script type="text/javascript">
// The browser exposes its own User-Agent string via the navigator object
var MyUserAgent = navigator.userAgent;
document.write('Your User-Agent is: ' + MyUserAgent);
</script>
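
Keep in mind that the User-Agent header is supplied by the visitor’s software and can be forged, so treat it as a hint rather than proof of identity. As a small illustration of acting on a detected User-Agent (the substring check and log file name are hypothetical), you could flag requests claiming to be Googlebot in PHP:

<?php
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

// A User-Agent containing "Googlebot" only *claims* to be Google's spider;
// the header is trivially spoofed, so this is a hint, not verification
if (stripos($ua, 'Googlebot') !== false) {
    // Append to a hypothetical log of claimed spider visits
    error_log(date('c') . " claimed Googlebot: $ua\n", 3, 'spider-visits.log');
}
?>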


