Monday, February 22, 2010

Understanding and Using Robots.txt Files!

We all get a bit excited when the search engines visit our web site frequently and index our content. However, there are certain things we don't want the search engines to spider, such as private information that we don't want the world to see. Another scenario is having more than one version of a page on our site. In our robots.txt file we can tell the search engines which pages to crawl and which ones to ignore. We definitely don't want both versions to be crawled and risk having the search engine penalize us for spam because of the duplicate content in two similar versions of one page.
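As an illustration, here is what such a file might contain for a site that keeps confidential files in a /private/ folder and printer-friendly duplicates of its pages in a /print/ folder. Both folder names are hypothetical. The minimal Python sketch below feeds the file to the standard library's urllib.robotparser module, which reads robots.txt rules the way a well-behaved robot would, and asks which pages may be crawled. The robot name ExampleBot and the example.com addresses are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical site layout: /private/ holds confidential files and
# /print/ holds printer-friendly duplicates of the normal pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /print/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse the rules from text instead of downloading them

for path in ("/articles/robots.html", "/print/articles/robots.html", "/private/clients.html"):
    url = "http://www.example.com" + path
    verdict = "crawl" if parser.can_fetch("ExampleBot", url) else "skip"
    print(f"{verdict}: {url}")

# crawl: http://www.example.com/articles/robots.html
# skip: http://www.example.com/print/articles/robots.html
# skip: http://www.example.com/private/clients.html
```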

Another reason you may want to tell the spiders not to spider a page is to save bandwidth by excluding some of the images, style sheets or JavaScript files. With the robots.txt file you can be very specific about what you want spidered and what you don't.
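Being specific can go down to the level of individual robots. The sketch below again uses a hypothetical layout checked with Python's urllib.robotparser. Google's image crawler, which identifies itself as Googlebot-Image, is kept out of the /images/ folder. Every other robot is kept out of the /css/ and /scripts/ folders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one group for Google's image crawler,
# one catch-all group for every other robot.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /images/

User-agent: *
Disallow: /css/
Disallow: /scripts/
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

SITE = "http://www.example.com"
for agent in ("Googlebot-Image", "SomeOtherBot"):
    for path in ("/images/logo.gif", "/css/site.css"):
        allowed = rules.can_fetch(agent, SITE + path)
        print(agent, path, "allowed" if allowed else "blocked")
```

Notice that Googlebot-Image is still allowed into /css/. A robot follows only the group of rules that names it, and falls back to the "User-agent: *" group only when no group does. Rules you want every robot to obey therefore have to be repeated inside each named group.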

What does a robots.txt file actually do? The robots.txt file is a plain text file (not HTML) that you put on your web site to tell the search robots which pages of your site you would like crawled and which ones you don't. The search engines do not require a robots.txt file, but when one is present they will normally follow the instructions you put in it. It is a bit like hanging a "Do Not Enter" sign on an unlocked door. The file is not a firewall, so a robot that chooses to ignore it can still crawl the pages you listed.
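The file is only a sign and not a lock, and a minimal Python sketch of the crawler's side shows why. The function name plan_crawl, the bot names and the example.com URLs are hypothetical; the check itself uses Python's standard urllib.robotparser module. Nothing in the file stops a request. Obeying it is entirely the crawler's choice.

```python
from urllib.robotparser import RobotFileParser

def plan_crawl(robots_txt: str, agent: str, urls: list[str],
               obey_robots_txt: bool = True) -> list[str]:
    """Return the URLs a crawler would go on to request.

    robots.txt is advisory: a polite crawler filters its list through the
    rules, while one that ignores them simply requests everything.
    """
    if not obey_robots_txt:
        # Nothing on the server enforces the rules; skipping the check is enough.
        return list(urls)
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(agent, url)]

RULES = "User-agent: *\nDisallow: /private/\n"
PAGES = ["http://www.example.com/", "http://www.example.com/private/notes.html"]
print(plan_crawl(RULES, "PoliteBot", PAGES))                        # only the home page
print(plan_crawl(RULES, "RudeBot", PAGES, obey_robots_txt=False))   # both pages
```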

Another way to tell the engines which pages not to index is with a robots meta tag. Some engines don't read meta tags, however, so certain engines would never see the instructions in a robots meta tag at all. The preferred way to give instructions that all the engines will see is the robots.txt file, not robots meta tags.
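For comparison, a robots meta tag sits inside the HTML of each individual page, for example <meta name="robots" content="noindex, nofollow">. A robot therefore has to download the page before it can read the tag. The minimal Python sketch below is built on the standard html.parser module. It shows how a robot that does honor the tag might pull the directives out of a page. The class and function names are hypothetical, and which directives a given engine obeys is up to that engine.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for token in attributes.get("content", "").split(","):
                if token.strip():
                    self.directives.add(token.strip().lower())

def robots_meta_directives(html: str) -> set[str]:
    """Return the set of robots directives found in an HTML page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives

PAGE = '<html><head><meta name="robots" content="noindex, nofollow"></head><body>Hi</body></html>'
print(sorted(robots_meta_directives(PAGE)))  # ['nofollow', 'noindex']
```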

Where you place your robots.txt file is vitally important. It must be in the root (main) directory of your site, for example http://www.example.com/robots.txt, or the search engines will not find it. The engines do not search the whole site for the file. They look only in the root directory, and if they don't find it there, they assume that no such file exists. As a result, the engine would index everything it finds on your site. Even though this file is not required by the engines, if you don't put it in the right place the search engines will likely index the entire site, including the private information you wanted to keep confidential.
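Because the location is fixed, a robot can work out where to look from any page address. It keeps the scheme and host name and replaces the path with /robots.txt. The minimal Python sketch below does exactly that; the function name robots_txt_url is hypothetical. It then uses urllib.robotparser to show the other half of the paragraph. With no rules at all, modeled here by parsing an empty file, every page on the site is fair game.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

def robots_txt_url(page_url: str) -> str:
    """Return where a crawler looks for robots.txt: the root of the page's host."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_txt_url("http://www.example.com/blog/2010/02/robots.html"))
# http://www.example.com/robots.txt  (a copy placed in /blog/ would never be requested)

# No rules at all (modeled by parsing an empty file) means everything is allowed.
no_file = RobotFileParser()
no_file.parse([])
print(no_file.can_fetch("AnyBot", "http://www.example.com/private/clients.html"))  # True
```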

The structure of the robots.txt file has little to no flexibility. Its function and structure are fairly simple to learn with a bit of study. There are also programs available online that will help you in this process: by filling in a few blanks and clicking the mouse, you can build an effective file that gives the search engines very specific instructions. Don't attempt to get creative here; it will hurt you in the long run.
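Under the hood, the fill-in-the-blanks programs mentioned above do little more than write out rigid lines of the form "User-agent: ..." and "Disallow: ...". The minimal Python sketch below is a hypothetical generator of that kind, not any particular product. It takes each robot's name and the folders it should stay out of, and writes one group per robot with a blank line between groups. An empty list produces a bare "Disallow:", which means nothing is off limits.

```python
def build_robots_txt(rules: dict[str, list[str]]) -> str:
    """Write robots.txt text from {robot name: [paths it should not crawl]}.

    Hypothetical helper: one group per robot, groups separated by a blank line.
    """
    groups = []
    for agent, paths in rules.items():
        lines = [f"User-agent: {agent}"]
        for path in paths:
            # Refuse to "get creative": every path must start at the site root.
            if not path.startswith("/"):
                raise ValueError(f"path {path!r} must start with '/'")
            lines.append(f"Disallow: {path}")
        if not paths:
            lines.append("Disallow:")  # an empty Disallow means "crawl everything"
        groups.append("\n".join(lines))
    return "\n\n".join(groups) + "\n"

print(build_robots_txt({"Googlebot-Image": ["/images/"], "*": ["/private/", "/print/"]}))
```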

When you start trying to fine-tune these files to allow or block different engines or directories, you can get into trouble rather quickly. Type your commands very carefully: check and double-check your spelling and the positioning of colons and slashes, and make sure the names of the engines are spelled correctly. Even though this file is simple in its intent, simple mistakes can be devastating. This may be where it is wise to use some form of validator to check your entries for accuracy. This author does not recommend or endorse any particular product here. Do your research, then decide whether this type of validation system is for you. Good luck in your marketing endeavors.
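A validator does not need to be elaborate to catch the mistakes listed above. Python's own urllib.robotparser, for example, silently skips a line with a misspelled directive or a missing colon. The rule simply disappears without any warning. The minimal, hypothetical checker below flags those two mistakes, plus paths that don't start with a slash. Its list of recognized field names is an assumption covering the common fields, since which fields a given engine honors varies. It also cannot tell you whether a robot's name is spelled correctly.

```python
# Assumption: the field names robots commonly recognize; engines differ.
KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}

def lint_robots_txt(text: str) -> list[str]:
    """Return human-readable warnings for common robots.txt typos."""
    problems = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line:
            continue
        if ":" not in line:
            problems.append(f"line {number}: missing colon in {line!r}")
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        if field.lower() not in KNOWN_FIELDS:
            problems.append(f"line {number}: unknown field {field!r}")
        # A leading * is accepted because some engines support wildcards.
        elif field.lower() in ("allow", "disallow") and value and not value.startswith(("/", "*")):
            problems.append(f"line {number}: path {value!r} should start with '/'")
    return problems

SAMPLE = """\
User-agent: *
Dissallow: /private/
Disallow /print/
Disallow: images/
"""
for warning in lint_robots_txt(SAMPLE):
    print(warning)
```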

The eBiz Solutions Team is standing by to assist you with any questions you may have. Call for your free 30-minute consultation today.

"Let's Build Your Business Together"

Larry L Miller SEM/SEO Consulting

Private Line: 321-594-4405
Skype: larrylmiller121

The Most Powerful Link on the Internet!
