Google Digital Marketing | SMO & SEO Tips | Search Engine Optimization & SEM: June 2014

Monday, June 23, 2014

About the Robots.txt File

The robots.txt standard was proposed by Martijn Koster in 1994, while he was working for Nexor, on a mailing list that was the main communication channel for web-related activities at the time. Koster suggested robots.txt after a badly behaved web spider caused an inadvertent denial-of-service attack on his server.

Robots.txt is used to control the crawling of websites. If site owners want to give instructions to search engine robots about crawling, they place a text file called robots.txt in the website's root directory (e.g. www.example.com/robots.txt). You need a robots.txt file only if your site has content that you don't want search engines to crawl and index.

When a robot wants to visit a website, it first checks for robots.txt and reads the Allow and Disallow rules that apply to it.

User-agent: *
Disallow: /

The User-agent: * means this section applies to all robots. The Disallow: / tells the robot that it should not visit any pages on the site.
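Conversely, leaving the Disallow value empty gives all robots full access, which is effectively the same as having no robots.txt file at all:

User-agent: *
Disallow: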

To block the entire site:
User-agent: Googlebot
Disallow: /

To block a page:
User-agent: Googlebot
Disallow: /private-file.html

To remove a directory and everything in it from Google search:
User-agent: Googlebot
Disallow: /junk-directory/
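
Googlebot also supports an Allow directive, so you can let a single page through even though the directory it sits in is blocked. The file name below is only an illustration:

User-agent: Googlebot
Disallow: /junk-directory/
Allow: /junk-directory/public-page.html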

To block a specific image from Google Images:
User-agent: Googlebot-Image
Disallow: /images/dogs.jpg

To block all images on your site from Google Images:
User-agent: Googlebot-Image
Disallow: /

To block files of a specific file type (for example, .jpg, .gif):
User-agent: Googlebot
Disallow: /*.gif$
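
The same pattern works for any other extension: the * wildcard matches any sequence of characters and the $ sign anchors the match to the end of the URL. For instance, to also block all .jpg files:

User-agent: Googlebot
Disallow: /*.jpg$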

A well-configured robots.txt also plays an important role in fixing website crawling errors.
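To put the pieces together, a small complete robots.txt might look like the sketch below. The blocked paths and the sitemap URL are only placeholders; Google and other major search engines also read the Sitemap line:

User-agent: *
Disallow: /junk-directory/
Disallow: /private-file.html

User-agent: Googlebot-Image
Disallow: /images/

Sitemap: http://www.example.com/sitemap.xml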

Note that malware robots can ignore your /robots.txt: robots that scan the web for security vulnerabilities, and email address harvesters used by spammers, will pay no attention to it.