Google announced early this month that they will be discontinuing their unofficial support of noindex directives within robots.txt files. As of 1 September 2019, publishers who are currently relying on robots.txt crawl-delay, nofollow, and noindex directives will need to find another way to instruct search engine robots how to crawl their sites’ pages.

These policy updates mean that many web publishers will need to act quickly to preserve their approach to search engine optimisation. Are you one of them? Continue reading to learn more about Google’s major policy update and what it will mean moving forward.

What are robots.txt files?

Webmasters create robots.txt text files to instruct user agents (web crawlers) how to crawl the pages on their websites. These text files allow or disallow specified behaviour by web robots such as search engine crawlers.

A robots.txt file can be as short as two lines. The first line specifies the user agent to which the directive applies. The second gives that user agent a specific instruction, such as allow or disallow. Web crawlers will disregard robots.txt rules that are not directed at them, and some will ignore rules that are. Where noindex directives are concerned, Google will soon count itself among the latter group.
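To make the two-line structure concrete, here is a minimal sketch using Python’s standard-library urllib.robotparser module. The example.com URLs and the /private/ path are placeholders chosen for illustration, and Python’s parser is not the one Google runs, so treat the result as an approximation of how a well-behaved crawler reads the file.

```python
from urllib.robotparser import RobotFileParser

# A two-line robots.txt: line one names the user agent, line two gives the rule.
# The /private/ path is a hypothetical example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask whether a given crawler may fetch a given URL under these rules.
print(parser.can_fetch("Googlebot", "https://example.com/private/report.html"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/blog/"))                # True
```

Because the user agent is the wildcard *, the rule applies to every crawler that does not have a more specific section of its own.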

An example of noindex robots.txt directives, which Google will soon stop acknowledging.

An example of disallow robots.txt directives, which will still function.
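As a rough illustration of the difference between the two kinds of rule, the sketch below feeds a noindex-style file and a disallow-style file to Python’s standard-library parser. The file contents and the /drafts/ path are assumptions made up for this example. Python’s parser is not Google’s, but it shows the underlying point: noindex was never part of the standard rule set, so a parser that follows the standard simply skips it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file using the unofficial noindex rule.
NOINDEX_RULES = """\
User-agent: Googlebot
Noindex: /drafts/
"""

# Hypothetical file using the standard disallow rule.
DISALLOW_RULES = """\
User-agent: Googlebot
Disallow: /drafts/
"""

def crawl_allowed(robots_txt, agent, url):
    """Return True if this parser would let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

URL = "https://example.com/drafts/post.html"
print(crawl_allowed(NOINDEX_RULES, "Googlebot", URL))   # True: the Noindex line is ignored
print(crawl_allowed(DISALLOW_RULES, "Googlebot", URL))  # False: Disallow is honoured
```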

Google has long discouraged publishers from using crawl-delay, nofollow, and noindex directives within robots.txt files, but has followed most of these directives despite having no standardised policy toward them.

Why is Google ditching robots.txt now?

Google has spent years working to standardise the robots exclusion protocol in preparation for this change. This is also why Google has long encouraged publishers to find alternatives to robots.txt directives.

In their announcement, Google said they were making the Robots Exclusion Protocol (REP) an internet standard. To do so, they have open-sourced the C++ library they use to parse and match rules in robots.txt files. The 20-plus-year-old library, along with a testing tool offered by Google, can help developers create the parsing tools of the future.

How can you keep controlling crawling on your site?

Google’s decision to disregard robots.txt noindex directives does not leave publishers without means to control crawling and indexing on their sites.

Use robots meta tags to noindex

The noindex directive is supported both in HTTP response headers (as an X-Robots-Tag header) and in HTML (as a robots meta tag).
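One way to audit a page for either signal is sketched below. The function takes the response headers and the HTML body and reports whether a noindex directive appears in the X-Robots-Tag header or in a robots meta tag. The names has_noindex and RobotsMetaParser are invented for this example, and the parsing is deliberately simplified: it does not handle user-agent prefixes inside the header value, for instance.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() in ("robots", "googlebot"):
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )

def has_noindex(headers, html):
    """Return True if the header or the HTML carries a noindex directive."""
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_value = value
    header_directives = {part.strip().lower() for part in header_value.split(",")}

    meta_parser = RobotsMetaParser()
    meta_parser.feed(html)
    return "noindex" in header_directives or "noindex" in meta_parser.directives

# Header form, useful for non-HTML files such as PDFs.
print(has_noindex({"X-Robots-Tag": "noindex"}, ""))
# HTML form, placed in the page's <head>.
print(has_noindex({}, '<head><meta name="robots" content="noindex, nofollow"></head>'))
```

Note that a crawler has to be able to fetch the page to see either signal, so a page carrying noindex should not also be disallowed in robots.txt.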

Use HTTP status codes 404 and 410

These status codes tell crawlers that the page does not exist, so such URLs are dropped from Google’s index once they have been crawled and processed.
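A minimal sketch of how a site might return these codes, using Python’s built-in http.server module. The path sets are hypothetical, and a real site would normally configure this in its web server or CMS rather than in a hand-written handler.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler

# Hypothetical paths, for illustration only.
REMOVED_PATHS = {"/old-campaign/", "/discontinued-product/"}
LIVE_PATHS = {"/", "/blog/"}

def status_for(path):
    """Choose the status code that tells crawlers what happened to a page."""
    if path in REMOVED_PATHS:
        return HTTPStatus.GONE        # 410: deliberately and permanently removed
    if path in LIVE_PATHS:
        return HTTPStatus.OK          # 200: page exists
    return HTTPStatus.NOT_FOUND       # 404: no such page

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status = status_for(self.path)
        body = status.phrase.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, the Handler class can be passed to http.server.HTTPServer along with an address and port of your choosing.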

Hide content behind password protections

Unless you’ve signalled subscription or paywall content with markup, concealing content behind a login page will generally remove it from Google’s index.
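The sketch below shows the idea with HTTP Basic authentication: a request without valid credentials receives a 401 response and never sees the content, which is the position a crawler is in. The username, password, and function name are placeholders, and a production site would rely on its framework’s or server’s authentication rather than hard-coded credentials.

```python
import base64
import hmac

# Placeholder credentials, for illustration only.
USERNAME = "member"
PASSWORD = "s3cret"

def status_for_request(headers):
    """Return 200 for valid Basic credentials, otherwise 401."""
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme.lower() != "basic":
        return 401
    try:
        decoded = base64.b64decode(token, validate=True).decode("utf-8")
    except ValueError:  # covers malformed base64 and undecodable bytes
        return 401
    user, _, password = decoded.partition(":")
    # Compare as bytes in constant time to avoid leaking information via timing.
    user_ok = hmac.compare_digest(user.encode("utf-8"), USERNAME.encode("utf-8"))
    pass_ok = hmac.compare_digest(password.encode("utf-8"), PASSWORD.encode("utf-8"))
    return 200 if (user_ok and pass_ok) else 401

# A crawler sends no credentials, so it is turned away.
print(status_for_request({}))
```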

Disallow in robots.txt

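If search engines are disallowed from crawling a page, its content cannot be indexed. The URL itself may still be indexed if other pages link to it, but the content will not be visible to the search engine.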

Use the Search Console Remove URL tool

The Search Console Remove URL tool allows for the temporary removal of a URL from Google’s search results.

Adapting your noindex approach for the future

If your sites have relied on robots.txt noindex directives to avoid search engine indexing, you have until 1 September to make the necessary changes. If you’re not sure whether your site uses noindex directives, it would be wise to double-check. Having pages indexed that you meant to keep concealed can cost your website in search rankings.

For help auditing your site’s indexing and search performance, continue reading the Pure SEO blog for more information, or contact us today for the SEO support you need.

Google Announces End of Support for Robots.txt Noindex Directives

Rollan Schott
