
Monday, June 23, 2014

About the Robots.txt file

The robots.txt standard was proposed by Martijn Koster in 1994, while he was working for Nexor, on a mailing list that was the main communication channel for web-crawler work at the time. Koster suggested robots.txt after a badly behaved web spider caused an inadvertent denial-of-service attack on his server.

Robots.txt is used to control how websites are crawled. A site owner who wants to give search engine robots instructions about crawling must place a text file called robots.txt in the website's root directory (e.g. www.example.com/robots.txt). You need a robots.txt file only if your site has content that you don't want search engines to index.
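Conversely, if you are happy for every robot to crawl everything, the whole file can be as simple as this (an empty Disallow value blocks nothing):

User-agent: *
Disallow: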

Before a robot visits a page on a website, it first fetches the site's robots.txt file and checks which Allow and Disallow rules apply to it. For example:

User-agent: *
Disallow: /

The User-agent: * means this section applies to all robots. The Disallow: / tells the robot that it should not visit any pages on the site.
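A well-behaved crawler performs exactly this check before requesting a URL. Python's standard-library urllib.robotparser module makes the check easy to reproduce; the sketch below assumes a hypothetical site www.example.com serving the rules shown above:

from urllib.robotparser import RobotFileParser

# Fetch and parse the site's robots.txt (www.example.com is a placeholder).
rp = RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()

# Ask whether a given user agent may fetch a given URL.
print(rp.can_fetch("Googlebot", "https://www.example.com/private-file.html"))
print(rp.can_fetch("*", "https://www.example.com/"))

With the rules above (User-agent: * / Disallow: /), both calls print False; a crawler should skip any URL for which can_fetch returns False.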

To block the entire site:
User-agent: Googlebot
Disallow: /

To block a single page:
User-agent: Googlebot
Disallow: /private-file.html

To remove a directory and everything in it from Google search:
User-agent: Googlebot
Disallow: /junk-directory/

To block a specific image from Google Images:
User-agent: Googlebot-Image
Disallow: /images/dogs.jpg

To block all images on your site from Google Images:
User-agent: Googlebot-Image
Disallow: /

To block all files of a specific file type (for example, .gif):
User-agent: Googlebot
Disallow: /*.gif$

Here the * matches any sequence of characters and the $ anchors the pattern to the end of the URL; these wildcards are extensions honored by Googlebot rather than part of the original robots.txt standard.
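Groups like these can be combined in a single robots.txt file. One point worth knowing: a crawler obeys only the most specific group that matches its name, so in the sketch below (paths reused from the examples above) Googlebot-Image follows its own group and ignores the * group entirely:

# Applies to every robot without a more specific group.
User-agent: *
Disallow: /junk-directory/

# Googlebot-Image matches here and ignores the group above.
User-agent: Googlebot-Image
Disallow: /images/
Disallow: /junk-directory/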

Robots.txt plays an important role in fixing crawl errors on a website.

Keep in mind that robots.txt is purely advisory. Malware robots that scan the web for security vulnerabilities, and the email-address harvesters used by spammers, will pay no attention to it.