A Basic Understanding of the robots.txt File for SEO Benefits
What is a robots.txt file?
The simple answer: a robots.txt file is a text file that tells web crawlers (such as Googlebot) which pages of your site they may crawl and which they may not. The file works from the root of your website, for example at https://example.com/robots.txt.
This means that by adding or editing a robots.txt file, you control whether web crawlers are permitted to enter and crawl each part of your site.
Two terms are closely tied to the robots.txt story: crawling and indexing. For a deeper understanding of this post, read up on the following:
Now let's take a deeper look at the robots.txt file.
A glimpse of what a robots.txt file is
Every webmaster who expects search engine traffic submits their site to well-known search engines (SE) such as Google, Yahoo, Bing, Ask.com and Yandex. When users search for content related to your site or posts, the SE shows results and gives your site a place in the SERP (Search Engine Results Page). Your page may appear anywhere, on the 1st results page or the 100th.
Now the question: how do Google and other search engines decide where your site or page ranks among millions of others for a given search keyword? The short answer: first by crawling and indexing, and then by applying a complex ranking algorithm.
QUESTION: How do Search Engines crawl and index millions of sites?
ANSWER: With the help of robots, commonly known as web crawlers, web spiders, search engine bots or internet bots, which are actually software programs.
After you submit a site to a search engine, its web crawler enters your site by following the sitemap you submitted or any link leading to the site. If you leave your robots.txt file untouched (or have none at all), the bot is free to crawl all of your posts, pages, admin panel, categories, tags, contact page and support page; in a word, the entire website.
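To make that concrete, here is a minimal Python sketch of the check a well-behaved crawler performs before fetching a URL. It uses the standard library's urllib.robotparser. The example.com URLs and the /wp-admin/ rule are illustrative assumptions. A real crawler would download the file from the site's /robots.txt instead of parsing a hard-coded list of lines.

```python
from urllib.robotparser import RobotFileParser

def crawlable(robots_lines, user_agent, urls):
    """Return the URLs a robots.txt-respecting crawler is allowed to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # rules supplied as lines instead of downloaded
    return [url for url in urls if parser.can_fetch(user_agent, url)]

# Hypothetical pages on a hypothetical site.
site = [
    "https://example.com/",
    "https://example.com/blog/my-post/",
    "https://example.com/wp-admin/",
    "https://example.com/tag/seo/",
]

# Empty robots.txt: nothing is restricted, so every URL may be crawled.
print(crawlable([], "Googlebot", site))

# A single rule keeps every crawler out of the admin panel.
print(crawlable(["User-agent: *", "Disallow: /wp-admin/"], "Googlebot", site))
```

With an empty robots.txt every URL passes the check. A single Disallow line is enough to take the admin panel out of the crawl.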
If your website is non-professional and not aimed at public engagement, it won't matter much whether you care about the robots.txt file. But if your site is SEO-sensitive, revenue-generating, a business site or an eCommerce site, the robots.txt file is indeed a factor for you.
Why add or edit a robots.txt file
QUESTION: When do I need to edit the robots.txt file?
ANSWER: The following are the situations in which you need to add a robots.txt file (if there isn't one) or edit it (if it exists):
So, in summary:
- You might have parts of your site that you don't want indexed.
- There are also files that are irrelevant and unnecessary for search engines to index.
- You may want to reduce the server load caused by crawling, or optimize your crawl budget.
QUESTION: What do ‘reduce server load’ and ‘optimize crawl budget’ mean?
ANSWER: The more time a crawler spends exploring your site, the more load it puts on your server, which can slow the server down. You can reduce this load by editing the robots.txt file.
A web crawler visits your site repeatedly, at intervals that depend on how often you publish new or updated posts. Each visit comes with a limited amount of resources allocated to your site; this is called the crawl budget. In other words, the number of pages the crawler will fetch within that limited time is capped.
If it cannot crawl all of your pages with the allocated resources, it stops, and the indexing of the remaining pages is delayed. You can ease this problem by using robots.txt to keep crawlers away from unimportant pages, so the budget is spent on the pages that matter.
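As a rough illustration, the Python sketch below models crawl budget as a fixed number of pages per visit. Real search engines set and spend crawl budget in far more complex ways. The page budget of 3, the discovery order of the URLs and the /tag/ rule are all assumptions made for this example.

```python
from urllib.robotparser import RobotFileParser

def crawl_visit(urls, robots_lines, user_agent, budget):
    """Simplified model of one crawler visit with a fixed page budget.

    URLs disallowed by robots.txt are skipped without using the budget;
    once `budget` pages have been fetched, the rest wait for a later visit.
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)
    fetched = []
    for url in urls:
        if len(fetched) == budget:
            break
        if parser.can_fetch(user_agent, url):
            fetched.append(url)
    return fetched

# Hypothetical URLs in the order the crawler happens to discover them.
queue = [
    "https://example.com/tag/seo/",
    "https://example.com/tag/wordpress/",
    "https://example.com/tag/blogging/",
    "https://example.com/blog/robots-txt-guide/",
    "https://example.com/blog/crawl-budget/",
]

# Budget of 3 pages: without rules, low-value tag pages use it all up.
print(crawl_visit(queue, [], "Googlebot", budget=3))

# Disallowing tag archives lets the budget reach the actual posts.
print(crawl_visit(queue, ["User-agent: *", "Disallow: /tag/"], "Googlebot", budget=3))
```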
QUESTION: Can I be sure web crawlers won't index the pages I restrict in robots.txt?
ANSWER: Absolutely not. Well-behaved robots follow your robots.txt rules when crawling your site. However, a search engine can still discover a restricted page through backlinks from anywhere on the internet and index its URL, even without crawling the page itself. Robots.txt is also only a request: bots that ignore the standard can crawl the page anyway.
QUESTION: Then what is the value of all this robots.txt effort?
ANSWER: Robots.txt manages crawling, not indexing. To keep a page out of search results, apply a noindex directive instead, for example with SEO plugins such as Yoast SEO or All in One SEO.
Keep in mind that disallowing a page in robots.txt and applying noindex are not the same. Many webmasters confuse the two and make mistakes. A crawler can only see a page's noindex directive if it is allowed to crawl that page, so do not disallow a page you want noindexed. You edit robots.txt to keep web crawlers focused on your essential pages only, for better, deeper and faster indexing that leads to SEO success.
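The difference between disallow and noindex can be sketched in Python as a simplified model of a compliant crawler. The page URL and HTML are hypothetical. The sketch only looks for a `<meta name="robots">` tag; search engines also accept noindex in an X-Robots-Tag HTTP header, which this sketch does not check.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

class RobotsMetaFinder(HTMLParser):
    """Collects the directives of <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))

def indexing_outcome(robots_lines, user_agent, url, html):
    """Describe what a compliant crawler can learn about one page."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    if not parser.can_fetch(user_agent, url):
        # The page is never downloaded, so its noindex tag is never seen.
        return "not crawled; URL may still be indexed via links"
    finder = RobotsMetaFinder()
    finder.feed(html)
    if "noindex" in finder.directives:
        return "crawled; kept out of the index"
    return "crawled; may be indexed"

page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
url = "https://example.com/thank-you/"

print(indexing_outcome(["User-agent: *", "Disallow: /thank-you/"], "Googlebot", url, page))
print(indexing_outcome([], "Googlebot", url, page))
```

In the first call the noindex tag is wasted because robots.txt stops the crawler from ever reading it. In the second call the page is crawlable, so the tag does its job.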
List of Search Engine Bots
- Google: Googlebot
- Yahoo: Slurp
- Bing: Bingbot
- DuckDuckGo: DuckDuckBot
- Baidu (China): Baiduspider
- Yandex (Russia): YandexBot
- Exalead (France): Exabot
- Google Image: Googlebot-Image
- Google News: Googlebot-News
- Google Video: Googlebot-Video
Understanding robots.txt file structure
A robots.txt file is a plain text file that looks like the example shown below:
The example above shows that Googlebot is permitted to crawl the plugins and uploads folders but not the wp-admin and cgi-bin folders.
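A file matching that description might look like the following. The folder paths are assumptions based on a default WordPress layout (wp-content/plugins and wp-content/uploads). The Python check uses the standard library's urllib.robotparser, which applies the first matching rule in a group rather than Google's longest-match rule. That makes no difference here because the paths do not overlap.

```python
from urllib.robotparser import RobotFileParser

# Assumed reconstruction of the example; exact paths are illustrative.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /wp-content/plugins/
Allow: /wp-content/uploads/
Disallow: /wp-admin/
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ["/wp-content/uploads/logo.png", "/wp-content/plugins/app.js",
             "/wp-admin/options.php", "/cgi-bin/script.cgi"]:
    allowed = parser.can_fetch("Googlebot", "https://example.com" + path)
    print(path, "allowed" if allowed else "disallowed")
```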
Here, User-agent: * means the rules that follow apply to all web crawlers. Allow: / means everything is allowed, and an empty Disallow: means nothing is disallowed from crawling.
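The allow-everything group described above can be checked the same way. In this sketch the bot names come from the list earlier in this post, and the page URL is hypothetical. Each of the two rule lines already permits everything on its own, as does an empty robots.txt.

```python
from urllib.robotparser import RobotFileParser

# One group that applies to every crawler and restricts nothing.
ALLOW_ALL = """\
User-agent: *
Allow: /
Disallow:
"""

parser = RobotFileParser()
parser.parse(ALLOW_ALL.splitlines())

for bot in ["Googlebot", "Bingbot", "DuckDuckBot"]:
    print(bot, parser.can_fetch(bot, "https://example.com/any/page/"))
```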
This means all bots may crawl everything, except that Bingbot is not allowed to crawl the support page.
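A file with that effect could be written as two groups. The /support/ path is an assumption for illustration. A crawler follows only the group that names it most specifically, falling back to the * group when none does. Bingbot therefore obeys its own group, while every other bot uses the * group.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: everyone may crawl everything, except Bingbot on /support/.
ROBOTS_TXT = """\
User-agent: *
Allow: /

User-agent: Bingbot
Disallow: /support/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

support = "https://example.com/support/"
print("Googlebot:", parser.can_fetch("Googlebot", support))  # True
print("Bingbot:", parser.can_fetch("Bingbot", support))      # False
print("Bingbot blog:", parser.can_fetch("Bingbot", "https://example.com/blog/"))  # True
```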
On the WordPress platform, you'll find the robots.txt file in your site's root folder. If it is not there, your site has no physical robots.txt file (WordPress may still serve a default virtual one at /robots.txt). In that case, you have to create a new one to set your own rules.
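Crawlers only look for the file at the root of each host, so you can check whether a site serves one by opening /robots.txt on its domain. The small Python helper below is an illustrative sketch that derives that address from any page URL; the function name is hypothetical. If you request the address (for example with urllib.request.urlopen) and get a 404 error, no robots.txt is being served.

```python
from urllib.parse import urljoin

def robots_txt_url(page_url):
    """Return the robots.txt address for the host that serves page_url."""
    # A leading slash resolves against the host, so path and query are dropped.
    return urljoin(page_url, "/robots.txt")

print(robots_txt_url("https://example.com/blog/my-post/?ref=home"))
# https://example.com/robots.txt
```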
In Blogspot, you can find the robots.txt setting here:
How to add or edit a robots.txt file?
In Blogspot, you can edit the robots.txt file from Settings by following the path shown above. If you use the Yoast SEO plugin for WordPress, you can edit it here:
SEO → Tools → File editor → robots.txt
Alternatively, in WordPress, open your hosting account's files with an FTP client such as FileZilla (or through cPanel) and edit the file at public_html → robots.txt. If you can't find it there, do the following to create and add a robots.txt file:
- Right-click inside the public_html folder.
- Click Add new file.
- Name the file robots.txt.
- Download it to your computer.
- Open it with a text editor such as Notepad or TextEdit (in plain-text mode).
- Write your rules and save the file.
- Upload it back to the site's root folder. You're done.
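If you prefer to prepare the file on your computer before uploading it, a short Python script can generate it. This is a sketch: the function name write_robots_txt and the example rules are hypothetical. The file is written as UTF-8 plain text, which is what search engines expect.

```python
from pathlib import Path

def write_robots_txt(path, groups):
    """Write groups of rules to a plain-text, UTF-8 robots.txt file.

    `groups` maps a user-agent to a list of (directive, path) pairs,
    e.g. {"*": [("Disallow", "/wp-admin/")]}. Returns the text written.
    """
    lines = []
    for user_agent, rules in groups.items():
        lines.append(f"User-agent: {user_agent}")
        for directive, rule_path in rules:
            lines.append(f"{directive}: {rule_path}")
        lines.append("")  # a blank line separates groups
    content = "\n".join(lines).rstrip("\n") + "\n"
    Path(path).write_text(content, encoding="utf-8")
    return content

text = write_robots_txt("robots.txt", {
    "*": [("Allow", "/")],
    "Bingbot": [("Disallow", "/support/")],
})
print(text)
```

Upload the resulting robots.txt to the site's root folder as described in the last step above.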
That's all about the robots.txt file and how to optimize it for SEO. I hope you enjoyed this post.