Everything You Need to Know About Robots.txt

Have you experienced a drop in your traffic lately? For no visible reason, Google seems to have stopped liking your content, and you feel clueless about what is happening. There can be multiple reasons for such a drop; in this article we will look at one of the most prominent: de-indexation. Your pages are said to be de-indexed when they are removed from Google's search index.

Before we try to find the reason for de-indexation, let’s first be sure that Google has indeed de-indexed the pages on your website. Follow the steps below to find out the number of pages indexed by Google:

  1. Log into Google Webmaster Tools. (If you do not have access to Webmaster Tools, sign up today.)
  2. In the left-side menu, go to Google Index.
  3. Choose Index Status.
  4. In the Basic view, a graph shows the number of indexed pages and how it has moved over the last year. If you switch to the Advanced view, you can additionally see "Removed" and "Blocked by robots" pages.
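For a quick cross-check outside Webmaster Tools, you can also run Google's site: search operator. The count it reports is only a rough estimate, but a sudden collapse in that number is a strong hint that pages have dropped out of the index:

site:www.yoursitename.com

(As before, replace "yoursitename" with the domain name of your site.)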

Checking Your Site’s Robots.txt

One of the basic sanity checks you should perform when you encounter an unexpected drop in traffic is to look at your robots.txt file. How do you do that?

In the address bar, type www.yoursitename.com/robots.txt (replace "yoursitename" with the domain name of your site) and check the result. Make sure you type "robots.txt" in all lowercase letters.

The robots.txt file tells robots (or bots) which pages to visit and which to ignore. All well-behaved bots, including those of all major search engines, read your robots.txt first to find out which pages you want them to skip. Malware bots, given their malicious intentions, generally ignore robots.txt.
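If you would rather test your rules programmatically, Python's standard library ships a robots.txt parser. Here is a minimal sketch; the domain and paths are placeholders you should replace with your own:

from urllib.robotparser import RobotFileParser

# Fetch and parse the live robots.txt (placeholder domain)
rp = RobotFileParser("https://www.yoursitename.com/robots.txt")
rp.read()

# Ask whether a given bot may crawl a given URL
print(rp.can_fetch("Googlebot", "https://www.yoursitename.com/wp-admin/"))
print(rp.can_fetch("Googlebot", "https://www.yoursitename.com/"))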

As a rule of thumb, you should not stop a robot from accessing any page on your site unless you have a good reason to: the more of your pages get indexed, the more entry points searchers have to reach your site. You will typically encounter one of the robots.txt structures shown below.

Locating the Robots.txt File

This file is found in the root folder of your website. If you are using WordPress, you may see the content of robots.txt using the method described above yet be unable to locate the file in your root folder. This is because WordPress creates a virtual robots.txt. To modify or update it, you need to create a physical robots.txt file of your own. Just open Notepad (or any other text editor), copy the code below, and save it as "robots.txt" (the file name must be spelled exactly like that). Upload it to your root folder using FTP, cPanel, or any other means, and you are all set.

Code to be copied to robots.txt:
User-agent: *
Disallow: /wp-admin/
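Many site owners also add a Sitemap line to this file; the directive is supported by all major search engines and helps crawlers discover your pages. The URL below is a placeholder, assuming your sitemap sits at the site root:

Sitemap: https://www.yoursitename.com/sitemap.xml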

Please note that there are some WordPress plugins that let you create or update the robots.txt file. I would still recommend creating this file manually.

Structure of Robots.txt

Syntax explanation: "User-agent: *" means the advisory applies to all robots. The path after the "Disallow:" directive (or the absence of any path) tells the robots which files or folders they should not crawl. The original robots.txt standard defines no "Allow" directive, although major crawlers such as Googlebot and Bingbot support "Allow" as an extension.
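For crawlers that do honor the Allow extension (Googlebot and Bingbot among them), you can carve an exception out of a blocked folder. The folder and file names below are hypothetical:

User-agent: *
Disallow: /private/
Allow: /private/public-page.html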

  • Do not allow any robot to access any file on the server

User-agent: *

Disallow: /

  • Do not allow any robot to access certain folders on the server

User-agent: *

Disallow: /temp/

Disallow: /junk/

  • Do not allow access to a single robot

User-agent: MalwareBot

Disallow: /

(Replace “MalwareBot” with the name of the bot that you do not want to access your files)

  • Allow access to only one bot

User-agent: GoodBot
Disallow:

User-agent: *
Disallow: /

(Replace "GoodBot" with the name of the bot that you want to access your files. The empty "Disallow:" grants GoodBot access to everything, while the second record blocks every other bot.)

  • Allow access to all files (or folders) except one file (or folder)

There are two ways of pulling it off.

1. Put all files to be disallowed into a separate directory, say “no-see”.

Your code will look something like this:

User-agent: *

Disallow: /~badstuff/no-see/

Needless to say, this is the easier and faster method.

2. Alternatively, you may explicitly disallow pages one by one:

User-agent: *

Disallow: /~badstuff/junk1.html

Disallow: /~badstuff/junk2.html

Disallow: /~badstuff/junk3.html
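Major crawlers such as Googlebot and Bingbot also understand wildcard patterns in Disallow rules, an extension to the original standard, so the list above could be collapsed into a single pattern. A hypothetical equivalent:

User-agent: *
Disallow: /~badstuff/junk*.html

Keep in mind that simpler bots that follow only the original standard will ignore the wildcard, so use this form only when the crawlers you care about support it.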
