“I’ve heard of User-Agents…”
According to Wikipedia:
When Internet users visit a web site, a text string is generally sent to identify the user agent to the server. This forms part of the HTTP request, prefixed with User-agent: or User-Agent: and typically includes information such as the application name, version, host operating system, and language. Bots, such as web crawlers, often also include a URL and/or e-mail address so that the webmaster can contact the operator of the bot.
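In other words, the User-Agent travels as one header line of the HTTP request. A request from a browser might look something like this (the exact string varies by browser, version, and operating system; the one below is only illustrative):

```
GET / HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) Gecko/20070725 Firefox/2.0.0.6
Accept: text/html
```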
4 Reasons Why You Need to Detect User-Agents
- Browsers Have Quirks – Every web page, no matter how strictly valid its XHTML, WILL render differently in certain browsers. Sometimes a document needs minor adjustments to look uniform in all of them. Keep these tweaks to a minimum and code your documents according to best practices and standards, but when browsers still don’t cooperate, we have to tweak.
- Personalize Content – It may be appropriate to provide different versions of the content depending on the type of browser or user-agent. For example, you may have specialized content such as podcasts, wallpaper, and downloads for mobile browsers and portable video game systems. It would be important to serve content appropriately so that each visitor has the most relevant experience at your site. As long as the intent is not to deceive search engines, this is not considered cloaking.
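As a rough sketch of the idea (independent of any particular server setup), a script can branch on identifying substrings of the User-Agent string. The User-Agent value and the substrings below are illustrative assumptions, not a definitive list:

```shell
# Hypothetical User-Agent string; in a real CGI script the web server
# would supply this via the HTTP_USER_AGENT environment variable.
ua="Mozilla/5.0 (iPhone; CPU iPhone OS 15_0 like Mac OS X)"

# Branch on identifying substrings to pick a content version.
case "$ua" in
  *iPhone*|*Android*|*Mobile*) echo "serving mobile version" ;;
  *PSP*|*Nintendo*)            echo "serving handheld-console version" ;;
  *)                           echo "serving desktop version" ;;
esac
```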
- Keep Bad Visitors Out of Your Site – Do you often have bandwidth problems because unscrupulous visitors are downloading your entire site, or devious webmasters are sending scraper bots to steal your data and use it on their spam sites? Then you need to use your .htaccess file to block bad visitors. Place the following lines at the beginning of your .htaccess file:
# Bad User-Agent List :: BEGIN
SetEnvIfNoCase User-Agent "Bad User Agent Here" bad_user_agent
SetEnvIfNoCase User-Agent "BadUserAgentHere" bad_user_agent
# Bad User-Agent List :: END
Then place this near the end:
Order Allow,Deny
Allow from all
Deny from env=bad_user_agent
Replace “Bad User Agent Here” and “BadUserAgentHere” with a key identifying phrase from the User-Agent string of the offending visitor(s). Place a backslash (\) before spaces and punctuation. If you have more than one bad agent, copy and paste the SetEnvIfNoCase line to build a longer list. For more info, visit the .Htaccess Reference. This will not block all scrapers, but it will eliminate quite a few.
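Escaping every space and punctuation mark by hand is error-prone, so a small shell one-liner can generate the escaped SetEnvIfNoCase line for you. The User-Agent phrase below is a made-up example:

```shell
# Hypothetical offending User-Agent phrase to block.
ua="Bad Bot/1.0"

# Backslash-escape everything that is not a letter, digit, or underscore,
# so the phrase is safe inside the SetEnvIfNoCase regular expression.
escaped=$(printf '%s' "$ua" | sed 's/[^A-Za-z0-9_]/\\&/g')

echo "SetEnvIfNoCase User-Agent \"$escaped\" bad_user_agent"
```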
- Guide Search Engine Spiders – Every search engine has a web robot, called a spider, that visits your web site. You need to guide these spiders in how they access your site, using a Robots.txt file. As basic as this may be to some of you, you’d be surprised how many webmasters don’t use Robots.txt correctly. If you need help creating a Robots.txt file, visit the Robots.txt Generator.
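As a minimal illustration of the format, a Robots.txt file groups rules under the user-agent they apply to; the directory names and the spider name below are placeholders:

```
# Rules for all spiders
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/

# Rules for one specific spider (name is illustrative)
User-agent: Googlebot
Disallow: /downloads/
```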
How Do I Detect My User-Agent?
In PHP:
<?php
$MyUserAgent = $_SERVER['HTTP_USER_AGENT'];
echo "Your User-Agent is: $MyUserAgent";
?>
In ASP (VBScript):
<%@ Language=VBScript %>
<% MyUserAgent = Request.ServerVariables("HTTP_USER_AGENT") %>
Your User-Agent is: <%=MyUserAgent%>
In JavaScript:
<script type="text/javascript">
var MyUserAgent = navigator.userAgent;
document.write('Your User-Agent is: ' + MyUserAgent);
</script>
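The same header reaches CGI scripts through the HTTP_USER_AGENT environment variable (part of the CGI convention), so even a plain shell script can read it. The value is hard-coded below only so the sketch is self-contained:

```shell
# In a real CGI script the web server sets HTTP_USER_AGENT itself;
# it is assigned here only for illustration.
HTTP_USER_AGENT="Mozilla/5.0 (Example Browser)"

echo "Your User-Agent is: $HTTP_USER_AGENT"
```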
Detailed Browser Detection:
Browscap PHP Project
An excellent standalone class that you can install in minutes and use to detect the latest browsers easily.
A Sample List of User-Agents:
- List of User-Agents (Spiders, Robots, Crawlers, Browsers) – There are new bots created every day, so any list will be out of date as soon as it is published. The key is to start learning how to identify different User-Agents.
Learn About Identifying Spiders/Bots/Browsers:
Search Engine Spider Identification Forum at WebmasterWorld