Robots Exclusion Standard

Also called: Robots Exclusion Protocol or robots.txt

The Robots Exclusion Protocol offers several ways to tell search engines which pages within a website they may and may not crawl or index. These options are useful when certain pages should not appear in search results.

The "robots" are usually search engine spiders: programs that constantly scour the Web for new information on behalf of search engines. Web robots exist for other purposes as well. Whether a robot follows the instructions depends on the robot itself, so the protocol offers no guarantees. The crawlers of most major search engines (such as Google and Bing) do respect it.

A protocol with the opposite purpose, getting pages included in search engines, is the XML Sitemap.

Robots.txt

Robots.txt is a file stored in the root directory of a domain (domainname.co.uk/robots.txt) that tells search engines which locations within the website they may and may not crawl.

Example of a robots.txt file:

User-agent: *
Disallow: /cgi-bin/
Disallow: /admin/

In this example, all robots are instructed not to crawl locations within the /cgi-bin/ and /admin/ directories. The example also highlights a disadvantage of robots.txt: because the file is publicly readable, it can draw attention to exactly the locations we would rather keep out of sight.
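To see how a well-behaved robot might interpret these rules, the following minimal sketch feeds the example file to Python's standard `urllib.robotparser` module. The robot name `ExampleBot` and the tested paths are assumptions made up for illustration; they do not come from a real crawler or site.

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt from the text, held in a string instead of being
# downloaded from domainname.co.uk/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /admin/
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit user_agent to crawl url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleBot" is a hypothetical robot name used only for this sketch.
    for path in ("/index.html", "/admin/login", "/cgi-bin/form.pl"):
        print(path, is_allowed("ExampleBot", path))
```

The check is purely advisory: a robot that never runs it can still request the disallowed locations.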

Multiple sections and rules can be placed one below the other. For example, a section aimed purely at Google's crawlers starts with the line "User-agent: googlebot". The "Allow:" prefix can also be used to create exceptions: locations that may be crawled even though a broader rule disallows them.
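A sketch of such a file with a Google-specific section and an "Allow:" exception could look like the one embedded below. The /admin/public/ directory and the robot name `OtherBot` are hypothetical. The "Allow:" line is deliberately placed before the broader "Disallow:" line: Python's `urllib.robotparser` applies the first matching rule in a section, while some search engines apply the most specific matching rule, and this ordering should give the same answer under both interpretations.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with one section for Google's crawler and one for
# all other robots. The /admin/public/ path is an assumption for illustration.
GROUPED_ROBOTS_TXT = """\
User-agent: googlebot
Allow: /admin/public/
Disallow: /admin/

User-agent: *
Disallow: /cgi-bin/
Disallow: /admin/
"""


def allowed_for(user_agent: str, url: str) -> bool:
    """Return True if the grouped rules permit user_agent to crawl url."""
    parser = RobotFileParser()
    parser.parse(GROUPED_ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("googlebot", "OtherBot"):
        for path in ("/admin/public/help.html", "/admin/settings"):
            print(agent, path, allowed_for(agent, path))
```

Only the section whose "User-agent:" line matches the robot is consulted; the "*" section serves as the fallback for every other robot.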

Robots.txt only indicates which locations spiders should not crawl. In theory, a search engine can still include such a location in its search results, for example because other pages link to it; it simply has no knowledge of the page's content.

Robots meta tag

Another option for influencing spiders' behavior is the robots meta tag. This HTML tag does not prevent spiders from retrieving the content of a page, but it gives more control over what they then do with the page's location and content.

Example of a robots meta tag:

<meta name="robots" content="noindex,nofollow" />

This example instructs robots not to include the page in search results and not to follow the hyperlinks on the page. The counterparts of "noindex" and "nofollow" are "index" (do include) and "follow" (do follow links).
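How a robot could read these directives from a page can be sketched with Python's standard `html.parser` module. The class and function names below (`RobotsMetaParser`, `robots_directives`) are made up for this illustration, and the sketch assumes the page has already been retrieved; it is not how any particular search engine implements the check.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also arrive here by default.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for part in content.split(","):
            directive = part.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in html."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex,nofollow" /></head></html>'
    found = robots_directives(page)
    print("may index:", "noindex" not in found)
    print("may follow links:", "nofollow" not in found)
```

Because the tag is part of the page itself, a robot has to retrieve the page before it can see the directives, which is why a location blocked in robots.txt cannot be controlled this way.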