How to Make a robots.txt file
Sometimes you want parts of your website excluded from search engine listings. There can be many reasons for this: the pages may be incomplete or not ready to be listed, or you may want to keep the information out of search results.
If you use Google AdSense and its search feature on your website, you also have to exclude the search results page from Google's indexing robot. This is a Google requirement, so at a minimum your robots.txt file should cover it.
Setting up a robots.txt file is easy, and I am going to show you exactly how to do it.
Let's Make a robots.txt file
What is a robots.txt file? It is a plain text file that you create in a simple text editor like Notepad. It contains commands that give spiders (search engine robots) instructions, such as not to crawl certain pages of the site. Well-behaved spiders follow these instructions, but the file itself does not lock anything away.
A simple robots.txt file
Create a document in Notepad and save it as robots.txt.
Where do you save it?
You save it in the root directory of your website. For example, on this site I save my robots.txt file to
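As an illustration with a placeholder domain (www.example.com is an assumption here, not this site's actual address), a file saved in the root directory would be reachable at:

```text
http://www.example.com/robots.txt
```

Robots look for the file only at this top-level location, so a copy saved inside a subfolder would not be picked up.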
About the file
Here is the simplest form of a robots.txt file:
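As a minimal sketch, assuming the standard directive names User-agent and Disallow, the file can be as short as two lines. An empty Disallow value blocks nothing yet:

```text
# Applies to every robot
User-agent: *
# Nothing listed, so no page is excluded
Disallow:
```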
The two most common commands are User-agent and Disallow. The first line, User-agent, is the robot specification line. On this line you name any robots that the commands after it should apply to. In this case the "*" means all robots. If you want a list of robots you can find it here: Robots List
The second line is the Disallow command. Here you put the URLs you want the robots to ignore.
So, here is an example:
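As a sketch, using the page name personalstuff.htm from the surrounding text and assuming it sits at the top level of the site:

```text
# Applies to every robot
User-agent: *
# Path is written relative to the root of the site
Disallow: /personalstuff.htm
```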
These two lines tell all robots not to crawl the page called personalstuff.htm. It's quite easy, but remember that you have to keep the folder structure correct. In this case the page "personalstuff.htm" is in the root folder of the website. If it were buried deeper in a folder, you would have to get that URL location correct. So if it were in a folder called "private", you would make the exclusion point to /private/personalstuff.htm.
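Written out in full, and assuming the rule is again meant for all robots, the exclusion for a page inside the "private" folder might look like this:

```text
User-agent: *
# The folder name comes first, then the page name
Disallow: /private/personalstuff.htm
```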
Want to block this whole directory? That is easy enough. All you have to do is put the directory name. So it looks like this:
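A sketch of the directory form, assuming the folder is named "private" as in the earlier example. The trailing slash matters because rules match by the start of the path, and without it the rule would also catch a page such as /private.htm:

```text
User-agent: *
# Trailing slash limits the rule to the directory and its contents
Disallow: /private/
```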
Now everything in the private directory will be skipped by the robot.
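The same commands can also be aimed at a single robot instead of all of them. The sketch below assumes you want stricter rules for Google's crawler, which identifies itself as Googlebot. The /drafts/ directory is a hypothetical name used only for illustration:

```text
# Rules for Google's crawler only
User-agent: Googlebot
Disallow: /private/
# /drafts/ is a made-up directory for this example
Disallow: /drafts/

# Rules for every other robot
User-agent: *
Disallow: /private/
```

A robot that finds a group naming it follows that group and ignores the "*" group, which is why /private/ is listed in both.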
A lot of people are afraid of the robots.txt file because it seems like programming and falls outside the normal realm of making webpages. But you should really spend a little bit of time to create and maintain one. It is one file in one location, and it is easy to update whenever you need to.
Alternative method to robots.txt
Okay, if you don't want to create this file, you can still exclude pages from the search engines by using the on-page method.
In the head section of a webpage you can put this command:
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
This tells robots not to index this page and not to follow any of the links on it. If you want search engines to follow the links on the page without indexing the content, just leave off the "NOFOLLOW" command so that the content reads "NOINDEX".
Note: The < and > are part of the command so make sure you add them in.
The same applies in reverse: if you want a spider to index the page but not follow the links, leave out the "NOINDEX" command so that the content reads "NOFOLLOW".
Remember: If you use Google AdSense on your site, and you use its search feature with a search results page on your site, it is mandatory that you have a robots.txt file that excludes the search results page from robots. This is required by Google. They don't want the search results page showing up in their own search listings, since that would be redundant.
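As an illustration only, suppose the search results page on your site were saved as searchresults.htm in the root folder. That file name is hypothetical, so substitute whatever your own page is called. The exclusion could then be sketched as:

```text
User-agent: *
# searchresults.htm is a placeholder name for your own results page
Disallow: /searchresults.htm
```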
More Resources and some help from google.com
Here is an awesome tutorial straight from Google itself. They tell you exactly what they want and explain in simple terms how to make a robots.txt file. If you work with Google (who doesn't?), you should read this material for webmasters.