Crawl errors are sneaky, and it can be difficult to trace back to what caused the problem in the first place. They negatively impact your overall SEO, but while they are challenging to handle, they aren't a dead end. Today, we delve deeper into what crawl errors are, why they're bad for SEO, and how to address common issues.
Crawl errors—what are they?
Search engine bots work constantly, following links and searching for public pages, eventually ending up on your website. They then crawl these pages and index the content for use in Google Search. Crawl errors are problems these bots encounter while trying to access your web pages, problems that stop them from finding or indexing your content. If you've spent a significant amount of time optimising your content but bots are having trouble opening a page or moving from one page to another, it may indicate a crawlability issue.
Why crawl errors matter
Crawl errors hinder search engine bots from reading your content and indexing your pages. When a search engine is crawling your site and encounters an error, it will turn back to find another way through the site. You can then end up with pages that aren’t getting crawled, or pages being crawled more than necessary. Uncrawled rankable content is a wasted opportunity to improve your place in the SERPs.
Common crawlability issues
The great news is crawl errors can be solved. Here’s a rundown of the most common crawl errors that you should pay attention to, and how to address each one of them.
404 errors
404 errors are probably the most common cause of crawl errors. A 404 or Not Found error when opening a web page means the server couldn't find the requested page. While Google has stated that 404 errors do not directly hurt a site's rankings because these pages will not be crawled, a large number of 404 errors can still damage the overall user experience, so it's best to keep an eye on them.
Solution: You need to redirect users away from non-existent URLs (to equivalent URLs wherever possible) to avoid a negative user experience. Review your list of 404 errors and redirect each error page to a corresponding page on the live site. Alternatively, you can serve a 410 HTTP status code to tell search engines that the page has been deleted permanently. That said, the best fix depends on the cause, so we've outlined a few extra considerations below, followed by a short example of how these responses can be set up:
- Broken links – Broken links happen when a URL you have linked to from a page on your website is changed or removed without a redirect in place. When a user clicks the old link, a 404 Not Found error pops up. This can be off-putting to users and may cause your site's rankings to drop. To avoid it, review your website and either implement 301 redirects, fix the broken links by pointing them at a live URL, or simply remove them.
- Soft 404 errors – A soft 404 error happens when a non-existent URL returns a response code other than 404 or 410. They often occur when several non-existent URLs are redirected to unrelated URLs. This leads search engines to waste time crawling and indexing non-existent URLs instead of prioritising the URLs that do exist. To resolve soft 404 errors, make your non-existent URLs return standard 404 (or 410) responses, so bots can get on with crawling and indexing your existing web pages instead.
- Custom 404 error page – Using redirects to prevent 404 errors is good practice, but a few 404 errors here and there are almost always inevitable. The best practice is to display a custom 404 error page rather than a standard "File Not Found" message. A custom 404 page helps users find what they're looking for: you can offer them a few helpful links or a site search function when they stumble onto it by accident.
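To make this more concrete, here is a minimal, hypothetical sketch of how these responses might be set up on a site built with Python (Flask). The retired URLs and their replacements are placeholders; your own CMS or web server will have its own redirect mechanism.

```python
from flask import Flask, redirect, render_template_string, request

app = Flask(__name__)

# Hypothetical mapping of retired URLs to their closest live equivalents.
REDIRECTS = {
    "/old-services-page": "/services",
    "/2019-pricing": "/pricing",
}

@app.route("/old-services-page")
@app.route("/2019-pricing")
def moved_permanently():
    # 301 tells browsers and bots the page has moved permanently.
    return redirect(REDIRECTS[request.path], code=301)

@app.route("/discontinued-product")
def gone():
    # 410 tells search engines the page has been removed for good.
    return "This page has been permanently removed.", 410

@app.errorhandler(404)
def custom_not_found(error):
    # A custom 404 page with helpful links instead of a bare "File Not Found".
    return render_template_string(
        "<h1>Page not found</h1>"
        "<p>Try our <a href='/'>homepage</a> or use the site search.</p>"
    ), 404
```

The same ideas apply on any platform: map old URLs to live ones with 301s, return 410 where nothing equivalent exists, and give the remaining 404s a helpful page.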
Page duplicates
Page duplicates are another common SEO issue that can trigger crawlability problems. Duplicates happen when the same content can be loaded from multiple URLs. For example, your website homepage may be accessible through both the www and non-www versions of your domain. While page duplicates may not bother website visitors, they can influence how a search engine sees your website: the duplicates make it harder for search engines to determine which version should be prioritised. They also cause problems because bots dedicate a limited amount of time to crawling each website. When bots index the same content over and over, you burn crawl budget that should go to important pages. Ideally, the bot would crawl each page once.
Solution: URL canonicalisation is a pragmatic way to counter page duplicates. Use the rel=canonical tag, which sits within the <head> section of the page, to tell search engines which version is the original or "canonical" page. Putting the appropriate tag on all duplicate pages tells search engines which version to index, so they don't waste crawl budget on multiple copies of the same page.
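As an illustration only, assuming a Python (Flask) site where the same product page is reachable from two URLs, the canonical tag might be emitted like this (the routes, template, and URLs are hypothetical):

```python
from flask import Flask, render_template_string

app = Flask(__name__)

PAGE = """
<html>
  <head>
    <!-- The canonical tag lives in the <head> and names the preferred URL. -->
    <link rel="canonical" href="{{ canonical }}">
    <title>Blue Widgets</title>
  </head>
  <body><h1>Blue Widgets</h1></body>
</html>
"""

# Both routes serve the same content, so both declare one canonical version.
@app.route("/products/blue-widgets")
@app.route("/shop/blue-widgets")  # legacy path that serves the same content
def blue_widgets():
    return render_template_string(
        PAGE, canonical="https://www.example.com/products/blue-widgets"
    )
```

Whichever URL a bot lands on, the head of the page points to a single preferred version.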
Robots.txt failure
Before crawling your web pages, bots will try to fetch your robots.txt file to check whether there are any areas of your website you don't want them to crawl. The problem occurs when a bot can't reach the robots.txt file at all: when this happens, Google will postpone crawling until it can fetch the file. It's therefore imperative to make sure the file is always available.
Solution: The robots.txt file must be hosted at the root of the domain, for example https://websiteurl.com/robots.txt. Each domain and subdomain needs its own robots.txt file if there are areas you don't want crawled or shown in search engine results. You should also check that the file isn't blocking resources or pages that you do want to appear in search results.
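If you can't simply upload a static file, robots.txt can also be served from the application. Below is a hypothetical Python (Flask) sketch; the disallowed paths and sitemap URL are examples only.

```python
from flask import Flask, Response

app = Flask(__name__)

# Example rules only: block a private area, allow everything else,
# and point bots at the sitemap.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Allow: /

Sitemap: https://www.example.com/sitemap.xml
"""

@app.route("/robots.txt")
def robots():
    # Must be reachable at the root of the domain,
    # e.g. https://www.example.com/robots.txt
    return Response(ROBOTS_TXT, mimetype="text/plain")
```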
Wrong pages in the sitemap
XML sitemaps help search engines crawl your website more efficiently by providing a full list of web pages that might otherwise be missed during indexing. When the wrong pages or URLs end up in your sitemap, you send bots confusing signals, such as listing pages that you have blocked from indexing or that no longer exist.
Solution: Make sure the URLs in your sitemap are relevant, up to date, and correct, with no typos or formatting errors, and remove old URLs that no longer serve a purpose. Here's what you need to know when creating XML sitemaps (a small generation sketch follows the list):
- Your sitemap shouldn’t exceed 50,000 URLs
- Your sitemap shouldn’t be larger than 50MB when uncompressed
- All URLs in the sitemap must be on the same domain and subdomain as the sitemap itself
- Your sitemap must be UTF-8 encoded
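For illustration, here is a small, hypothetical Python sketch that writes a UTF-8 encoded sitemap. The URL list is a placeholder, and in practice most CMSs and SEO plugins will generate this file for you.

```python
import xml.etree.ElementTree as ET

# Placeholder list of live, canonical URLs. In practice this would come
# from your CMS or a crawl of the site. Keep it under 50,000 URLs and
# 50MB uncompressed, and only include URLs on the same (sub)domain.
URLS = [
    "https://www.example.com/",
    "https://www.example.com/services",
    "https://www.example.com/contact",
]

def build_sitemap(urls, path="sitemap.xml"):
    urlset = ET.Element(
        "urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    )
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    # UTF-8 encoding is required by the sitemap protocol.
    ET.ElementTree(urlset).write(path, encoding="utf-8", xml_declaration=True)

build_sitemap(URLS)
```

If you have more than 50,000 URLs, split them across several sitemaps and list those files in a sitemap index.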
Slow load speed
The faster your pages load, the quicker the crawler goes through your pages. Slow load speed contributes to a poor user experience, and having multiple pages that take a long time to load means they are less likely to appear in the organic search results.
As mentioned in one of our blogs, Google announced that it will update its algorithms to accommodate a new ranking factor, page experience, starting in May 2021. This will measure your website's loading performance, responsiveness, and visual stability.
Solution: Since search engines are pushing for a positive user experience overall, you want your web pages to load as fast as possible. Try minifying your CSS and JavaScript and compressing image files to improve load speed. To check whether your pages are loading at their optimal pace, measure them with Google Lighthouse: it gives insights into a page's performance and suggests how you can improve your overall site speed.
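If you want a quick scripted first pass before running Lighthouse, a rough check like the hypothetical Python sketch below times the server response for a handful of key pages. It only measures the time to first response, not full rendering performance, so treat it as a coarse signal.

```python
import requests

# Placeholder URLs: swap in your own key pages.
PAGES = [
    "https://www.example.com/",
    "https://www.example.com/services",
]

for url in PAGES:
    response = requests.get(url, timeout=10)
    # elapsed covers the time until the response headers arrive,
    # so treat it as a rough server-speed signal, not a full page-load metric.
    print(f"{url}: {response.status_code} in {response.elapsed.total_seconds():.2f}s")
```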
Using HTTP instead of HTTPS
Server security has become an essential part of crawling and indexing. HTTP is the standard protocol used for transferring data from a web server to a browser. Its more secure counterpart is HTTPS, and the "S" stands for secure: HTTPS uses an SSL certificate to create a secure, encrypted connection between the two systems. This is important to note because Google announced in December 2015 that it was adjusting its indexing system to prioritise HTTPS pages and crawl them first by default. In other words, search engines are pushing websites to switch from HTTP to HTTPS.
Solution: We advise you to get an SSL certificate for your website and migrate entirely to HTTPS, so Google prioritises your pages when crawling and indexing them.
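If your platform doesn't already force HTTPS, the redirect can be handled at the application level. Here is a minimal, hypothetical Python (Flask) sketch; in practice this is often configured on the web server or load balancer instead.

```python
from flask import Flask, redirect, request

app = Flask(__name__)

@app.before_request
def force_https():
    # Permanently redirect any plain-HTTP request to its HTTPS equivalent.
    # (Behind a proxy, you may also need to respect the X-Forwarded-Proto header.)
    if not request.is_secure:
        return redirect(request.url.replace("http://", "https://", 1), code=301)

@app.route("/")
def home():
    return "Served over HTTPS"
```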
Your website is not mobile-friendly
In July 2018, Google rolled out mobile-first indexing. The mobile-first initiative means that Google looks at the mobile version of your website first and measures its ranking signals before the desktop version. If your website has a mobile version, that is the version that determines how your site ranks in both mobile and desktop search results. If your website has no mobile version, you will not be affected by the new mobile-first index.
Solution: To ensure optimum crawlability, you need to adopt mobile-friendly practices for your website, implement a responsive design, and ensure your pages are optimised for both mobile and desktop.
Fix all your crawl errors with Pure SEO
Crawl errors are inevitable, but they shouldn't be left unaddressed. As a leading digital marketing agency, we can help you get to the root of your crawlability issues. To partner with our SEO specialists, contact us today!