Disallow
Disallow is a directive in a robots.txt file. Before a bot or crawler indexes your website, it first checks the robots.txt file. A crawler or bot is a search engine tool that goes through your website, for example to determine how SEO-friendly your site is. A robots.txt file looks like this:
- User-agent: [name of user-agent]
- Disallow: [URL path not to be crawled]
When a rule begins with 'User-agent', the search engine bots named there know that they must abide by the rules below it. 'Disallow' indicates that crawlers should not visit a particular page, file, or folder.
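For example, a robots.txt that tells every crawler to stay out of a folder (the /admin/ folder here is purely illustrative) looks like this:
- User-agent: *
- Disallow: /admin/
The asterisk means the rule applies to all user-agents; any path that is not listed under 'Disallow' remains open for crawling.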
Disallow and Allow
You can choose to set all pages on your website to 'Allow', in which case every page will be crawled. By setting 'Disallow' you determine which pages should not be crawled. If you deploy a full 'Disallow', in which not a single page or directory may be crawled, you may end up with a lower position in Google, because search engines are no longer allowed to crawl any of your pages.
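Such a full 'Disallow' looks like the following sketch; the single slash after 'Disallow:' matches every URL on the site:
- User-agent: *
- Disallow: /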
Therefore, it is wiser to use specific 'Disallow' rules. After 'Disallow:' you can specify which folders and/or pages on your site you don't want crawled. All you have to do is enter the path, such as '/text/', after 'Disallow:'. The advantage of this is that you can control much more precisely which parts of your site are crawled. However, be sure to enter the correct pages, files, and folders so that the rest of your website is indexed properly.
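As an illustration, a robots.txt with specific rules could look like this (the folder and page names are only example values):
- User-agent: *
- Disallow: /text/
- Disallow: /private-page.html
Everything that is not listed here stays available for crawling.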
Robots.txt facts
To make good use of Disallow on your website, keep the following facts about the robots.txt file in mind:
- The robots.txt file must be placed in the root of your website, at the same level as index.php, so that search engine bots visit it first
- A page blocked with 'Disallow' can still be indexed by search engines if other pages link to it
- Pay close attention to the paths you list in the robots.txt file; a mistake can lead to ranking loss in the search engines
- In Google Search Console (formerly Google Webmaster Tools), you can test the robots.txt file for errors