
How to download 404 errors from Google Search Console with Linking pages

The Google Search Console is of great help to webmasters, with several useful features such as sitemap submission, search analytics, links to your site, and a crawl errors report. With the GSC, SEO specialists have a powerful tool at their fingertips to direct the course of their SEO efforts.

However, the Google Search Console also has several aspects that are somewhat frustrating. For example, the “Links to your site” feature only shows a maximum of 1,000 domains. Now imagine how many domains link to a site like Moz or Buzzfeed! Do you think the Google Search Console serves them well?

The sitemap tool shows how many URLs were indexed but does not tell you which URLs were indexed and which were not. There are other similar annoyances. (Update: Google is testing a beta feature called the Index Coverage report that will show the count of indexed pages and the reasons why some pages could not be indexed. Read more on the Google Webmaster Blog.)

Another point that annoys Google Search Console (GSC) users is the Crawl Errors report. The Search Console shows us the crawl errors, the HTTP status code of each error (404, 503, etc.), and also the source of the error. The bothersome part is that you have to click on each URL to see where the error originates.

Search Console 404 Errors

 

Another problem is the 1,000 errors limit. You’ll have to fix the existing ones to view newer ones.

Solution: Download crawl errors with source using Google API Explorer

The Google API Explorer is a handy tool for communicating with numerous Google APIs, but we only need to concern ourselves with the Search Console API.

You’ll need a Search Console account with Full access to the site in order to use the API, so make sure you have both beforehand.

Once logged in, you’ll notice the Search Console API offers several services, 13 to be exact. Our focus here is the one labeled webmasters.urlcrawlerrorssamples.list, so go ahead and click on it.

The next screen will look like the image below. Beyond this point it’s a simple three-step process:

404 Crawl Errors List-Search Console Api Explorer

1. Fill in the parameters and Execute the query

The fields you notice on the next screen read as follows:

  • Site URL – Quite self-explanatory
  • Category – This field denotes the type of error you’d like to filter by. Luckily for us, it offers an easy-to-use drop-down with the possible values. For 404 errors, choose notFound as the option.
  • Platform – This refers to the user agent, or simply the type of device you wish to retrieve errors for. Again, an easy drop-down helps us out. We’ll be selecting web for now.
  • Fields – This specifies the data you want to retrieve along with the crawl errors, such as the source, the date the error was first detected, the last crawled date, the error code, etc.

Use the fields editor to select required fields. The available ones are:

  • urlCrawlErrorSample – Provides information about the sample URL and its crawl error
  • first_detected – The date when the error was first detected
  • last_crawled – The date when the URL was last crawled
  • pageUrl – The URL of the crawl error
  • responseCode – The HTTP response code returned for the URL
  • urlDetails – Retrieves more details about the error URL. Selecting it gives you two options:
    • containingSitemaps – Sitemap URLs pointing to the crawl error
    • linkedFromUrls – Source of the crawl error. The root of all the fuss!

I prefer selecting all the data and filtering it down later. You are free to choose your path.
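If you would rather script the call than click through the explorer, the parameters above map onto a single GET request. Here is a minimal sketch that only builds the request URL and does not send it. The endpoint path is my recollection of the Search Console API v3 URL pattern and should be checked against the API reference before use; `build_request_url` is a hypothetical helper name, and `https://www.example.com/` is a placeholder site.

```python
from urllib.parse import quote, urlencode

# Assumption: v3 URL pattern for webmasters.urlcrawlerrorssamples.list.
BASE = "https://www.googleapis.com/webmasters/v3/sites"


def build_request_url(site_url, category="notFound", platform="web", fields=None):
    """Build the GET URL for listing crawl error samples (hypothetical helper)."""
    # The site URL travels as a path segment, so escape every reserved character.
    path = f"{BASE}/{quote(site_url, safe='')}/urlCrawlErrorsSamples"
    params = {"category": category, "platform": platform}
    if fields:
        # Optional field mask that limits the response to the selected fields.
        params["fields"] = fields
    return f"{path}?{urlencode(params)}"


url = build_request_url(
    "https://www.example.com/",
    fields="urlCrawlErrorSample(pageUrl,urlDetails/linkedFromUrls)",
)
print(url)
```

Leaving `fields` empty matches the "select everything and filter later" approach; passing a mask is closer to ticking individual boxes in the fields editor. An actual request would still need an OAuth 2.0 access token, which this sketch leaves out.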

Oh! Do turn on the Authorize requests using OAuth 2.0 button on the top right (marked with an orange rectangle in the image). Alternatively, once all the fields are set, just click the Authorize and execute button.

Authorize Button

You’ll be greeted with a popup before the report runs, like in the image below:

OAuth 2.0 Confirmation - Search Console API

Check both options and hit the Authorize and execute button.

2. Convert the output JSON to a CSV

If everything goes well, you should see a 200 OK status message with a JSON output similar to the image below.

Crawl Errors List - JSON Output

Copy the entire output, from the first curly bracket { through the last curly bracket } at the end.

Now, head over to a JSON to CSV converter. I prefer https://konklone.io/json/ for the simple interface, a preview window, and the ability to download the CSV.

Note: I tried another converter too, but its report wasn’t 100% accurate. Test your converter before you rely on its report.

Paste the JSON output and convert it into a CSV.
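If you prefer to do the conversion locally instead of using an online converter, a few lines of Python can do the same job. The sketch below assumes the JSON has the shape implied by the field list earlier (a `urlCrawlErrorSample` array whose items may carry a `urlDetails` object). The sample data, the column order, and the ` | ` separator for multiple linking pages are my own choices for illustration, not real API output.

```python
import csv
import io
import json

# Hypothetical sample in the shape described by the field list; not real output.
SAMPLE = """
{
  "urlCrawlErrorSample": [
    {
      "pageUrl": "old-page.html",
      "first_detected": "2017-08-01T00:00:00.000Z",
      "last_crawled": "2017-08-20T00:00:00.000Z",
      "responseCode": 404,
      "urlDetails": {
        "linkedFromUrls": ["https://www.example.com/a", "https://www.example.com/b"],
        "containingSitemaps": ["https://www.example.com/sitemap.xml"]
      }
    },
    {"pageUrl": "missing.html", "responseCode": 404}
  ]
}
"""

SCALAR_COLUMNS = ["pageUrl", "responseCode", "first_detected", "last_crawled"]
LIST_COLUMNS = ["linkedFromUrls", "containingSitemaps"]


def json_to_csv(text):
    """Flatten each crawl error sample into one CSV row."""
    samples = json.loads(text).get("urlCrawlErrorSample", [])
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=SCALAR_COLUMNS + LIST_COLUMNS, lineterminator="\n"
    )
    writer.writeheader()
    for sample in samples:
        details = sample.get("urlDetails", {})
        row = {name: sample.get(name, "") for name in SCALAR_COLUMNS}
        # Lists of URLs are joined into one cell; missing details become empty cells.
        for name in LIST_COLUMNS:
            row[name] = " | ".join(details.get(name, []))
        writer.writerow(row)
    return out.getvalue()


print(json_to_csv(SAMPLE))
```

To save the result, write the returned string to a file such as `errors.csv` (a placeholder name) and open it in your spreadsheet tool of choice.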

3. Download the CSV and Fix 404 errors

This is pretty much self-explanatory! Download your CSV and start fixing those 404 errors.

Word to the wise: Google Search Console may not list sources or linking domains for all errors. So don’t worry if some rows returned are empty.
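When it comes to the actual fixing, I find it easier to work through the list one linking page at a time. Here is a small sketch that groups broken URLs by the page that links to them and sets aside the errors with no source listed. The sample entries and the `group_by_source` helper are hypothetical, and the data shape is assumed from the field list earlier.

```python
from collections import defaultdict

# Hypothetical samples in the shape described earlier; not real API output.
samples = [
    {"pageUrl": "old-page.html",
     "urlDetails": {"linkedFromUrls": ["https://www.example.com/blog/"]}},
    {"pageUrl": "old-offer.html",
     "urlDetails": {"linkedFromUrls": ["https://www.example.com/blog/",
                                       "https://partner.example.org/links"]}},
    {"pageUrl": "orphan.html"},  # no source reported for this error
]


def group_by_source(samples):
    """Map each linking page to the broken URLs it points at."""
    by_source = defaultdict(list)
    no_source = []
    for sample in samples:
        sources = sample.get("urlDetails", {}).get("linkedFromUrls", [])
        if not sources:
            # Errors without a listed source are collected separately.
            no_source.append(sample["pageUrl"])
        for source in sources:
            by_source[source].append(sample["pageUrl"])
    return dict(by_source), no_source


by_source, no_source = group_by_source(samples)
for source, broken in sorted(by_source.items()):
    print(source, "->", ", ".join(broken))
print("No source listed:", ", ".join(no_source))
```

Pages that show up with many broken links are usually the best place to start.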

Conclusion:

404 errors may not seem critical, but fixing them helps save crawl budget, improves user experience, and also helps retain link value.

You may use the same process to find and fix other errors. The API covers 9 types of Google Search Console errors that you can fix. You don’t have to rack your brain to search them out; we have listed all the error types below.

  1. authPermissions
  2. flashContent
  3. manyToOneRedirect
  4. notFollowed
  5. notFound
  6. other
  7. roboted
  8. serverError
  9. soft404

I hope you find this useful! Now head to your laptop and start fixing those 404 errors.

Rajiv Singha

I find happiness in creating things, more often from scratch. Someday, I'd like to work on a toned down version of JARVIS. Exploring tech around the world is another activity I enjoy.


9 Replies to “How to download 404 errors from Google Search Console with Linking pages”

  1. Thanks very much for this excellent post Rajiv! Just tried it and its very handy! I was just wondering when you think you’ll find a method to extract more than 1,000 URL’s as that would be ideal for larger websites! Thanks again and all the best! Alex

  2. Hi Rajiv,

    Great walk-through of the API.

    Using the Search Console interface I was able to download 1,325/5,929 errors while the API only returned 1,000. Any idea what I’ve missed?

    Thanks again – JR

    1. Hi John,

      Thank you! Unfortunately, the API explorer seems to be limited to 1000 results only. I am working on a way around this. I’ll make it another blog post once it is complete.

      Rajiv
