Strange Pages from Your Blog on Google Search Console
Crawled - Currently Not Indexed
I would like to address a common issue that many website owners encounter when using Google Search Console.
It is not uncommon to come across strange pages listed as ‘Crawled - currently not indexed’ in the console, even though these URLs do not correspond to any real pages on your website.
This can be quite confusing and leave you wondering where these pages came from.
What Is Google Search Console?
Google Search Console is a tool provided by Google that allows website owners to monitor and troubleshoot their site’s presence in Google Search results.
It shows how Google crawls and indexes your website, how your pages perform in search results, and which issues may be affecting their visibility.
One of the statuses in the Page indexing report is ‘Crawled - currently not indexed’, which lists URLs that Googlebot has crawled but that Google has not (or not yet) added to its index.
“Strange pages” in Google Search Console are URLs that Google has discovered and crawled, and in some cases indexed, even though they are unexpected or unwanted. These could be pages that were never intended for search, such as internal search result pages, duplicate content, or pages with thin or low-quality content.
Why You Might Find Strange Pages
Here are some common reasons why you might find strange pages in Google Search Console:
1. Thin Content
Pages with very little content or value may be indexed by Google, especially if they are linked to from other pages on your site.
2. Duplicate Content
If you have multiple URLs serving the same content (e.g., HTTP vs. HTTPS, www vs. non-www), Google may crawl every version and has to pick one of them as the canonical, which leads to duplicate content issues.
3. Parameterized URLs
Dynamic URLs with parameters (e.g., session IDs, tracking parameters) can create numerous variations of the same page, all of which may be indexed by Google.
4. Faceted Navigation
E-commerce sites often have faceted navigation systems that generate numerous URL variations for filtering products. These can result in a large number of indexed pages that offer little unique content.
5. Internal Search Pages
Pages generated by on-site search functionality may be indexed by Google, leading to a large number of low-quality search result pages in the index.
6. Orphaned Pages
Pages that are not linked to from anywhere on your site but can still be accessed via direct URL entry or through external links may get indexed.
7. Hacked Pages
In some cases, hackers may inject malicious content into your site, creating spammy pages that get indexed by Google.
Addressing These Issues
To address these issues, you can take several steps:
Use robots.txt
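Use the robots.txt file to block Googlebot from crawling parts of your site that you don’t want it to visit. Keep in mind that robots.txt controls crawling, not indexing: a blocked URL can still end up in the index if other pages link to it.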
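Before relying on a robots.txt rule, it can help to check which URLs it actually blocks. The following is a minimal sketch using Python's standard `urllib.robotparser`; the rules and the `example.com` URLs are hypothetical and only there for illustration. Note that this parser does simple prefix matching and may not interpret every pattern the way Googlebot does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; adjust the paths to your own site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /cart/
"""


def is_crawlable(url: str, robots_txt: str = ROBOTS_TXT, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text allows `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/search?q=shoes",
        "https://example.com/cart/checkout",
        "https://example.com/blog/my-first-post",
    ):
        print(url, "->", "crawlable" if is_crawlable(url) else "blocked")
```

Running the check against a handful of real URLs from your report is a quick way to catch a rule that blocks more, or less, than you intended.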
Use meta robots tags
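Use meta robots tags with the noindex directive to keep specific pages or sections of your site out of the index. Googlebot must be able to crawl a page to see the tag, so do not block the same page in robots.txt.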
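To confirm that a page really carries a noindex directive, you can scan its HTML for the robots meta tag. This is a rough sketch built on Python's standard `html.parser`; the class and function names are made up for this example, and it only looks at meta tags (not at the `X-Robots-Tag` HTTP header).

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in {"robots", "googlebot"}:
            for part in attr.get("content", "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the HTML asks search engines not to index the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # "none" is shorthand for "noindex, nofollow".
    return bool(parser.directives & {"noindex", "none"})


if __name__ == "__main__":
    sample = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    print(has_noindex(sample))
```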
Canonicalization
Implement canonical tags to indicate the preferred version of a page when multiple versions exist.
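As an illustration, the sketch below maps the HTTP/HTTPS and www/non-www variants of a URL to one preferred version and builds the matching canonical tag. The preferred scheme, the host names, and the function names are assumptions for this example; substitute the values for your own blog.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Assumptions for this example: HTTPS and the www host are the preferred versions.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.example.com"
KNOWN_HOSTS = {"example.com", "www.example.com"}


def canonical_url(url: str) -> str:
    """Map any scheme/host variant of a page to the single preferred URL."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in KNOWN_HOSTS:
        raise ValueError(f"{url!r} does not belong to this site")
    path = parts.path or "/"
    # Keep the query string, drop the fragment, force the preferred scheme and host.
    return urlunsplit((PREFERRED_SCHEME, PREFERRED_HOST, path, parts.query, ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for the page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}">'


if __name__ == "__main__":
    for variant in ("http://example.com/blog/post", "https://www.example.com/blog/post"):
        print(canonical_link_tag(variant))
```

Both variants in the usage example produce the same tag, which is the point: every duplicate version of a page should name the same canonical URL.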
URL Parameters
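Google retired the URL Parameters tool in Search Console in 2022, so parameter handling now has to happen on your own site: keep internal links free of tracking and session parameters, and use canonical tags so that parameterized variations point to the clean URL.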
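Parameter variations can also be collapsed on your own side, for example when generating internal links or canonical URLs. Here is a small sketch that removes tracking and session parameters from a URL; the parameter list is an assumption and should be adapted to the parameters your blog actually uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed list of parameters that never change the page content.
TRACKING_PARAMS = {
    "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content",
    "gclid", "fbclid", "sessionid",
}


def strip_tracking_params(url: str) -> str:
    """Drop tracking/session parameters and sort the rest for a stable URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    kept.sort()  # the same parameters in a different order give the same URL
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


if __name__ == "__main__":
    print(strip_tracking_params("https://example.com/post?utm_source=news&page=2&sessionid=abc"))
```

Sorting the remaining parameters is a design choice: it means two links that differ only in parameter order end up as one URL instead of two.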
Monitor Security
Regularly monitor your site for any signs of hacking or unauthorized access, and take immediate action if any issues are detected.
And of course, regular monitoring of your blog’s performance in Google Search Console can help you identify and address any issues with strange or unexpected pages in the search index.
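When the list of unexpected URLs is long, a rough first pass can sort them into groups before you look at them one by one. The sketch below assumes you have exported the URLs from the report into a plain list; the categories and URL patterns (such as the `s` and `q` search parameters or the `/feed` suffix) are guesses that fit many blogs, not rules taken from Google.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def classify(url: str) -> str:
    """Assign a URL to a rough category based on its path and parameters."""
    parts = urlsplit(url)
    path = parts.path.lower()
    params = {key.lower() for key, _ in parse_qsl(parts.query, keep_blank_values=True)}
    if path.startswith("/search") or "s" in params or "q" in params:
        return "internal search"
    if path.rstrip("/").endswith("/feed"):
        return "feed"
    if params:
        return "parameterized"
    return "review manually"


def summarize(urls):
    """Count how many URLs fall into each category."""
    return Counter(classify(url) for url in urls)


if __name__ == "__main__":
    exported = [
        "https://example.com/?s=shoes",
        "https://example.com/my-post/feed/",
        "https://example.com/my-post?utm_source=newsletter",
        "https://example.com/cheap-pills-online",
    ]
    for category, count in summarize(exported).items():
        print(f"{category}: {count}")
```

Anything that lands in the "review manually" group and that you do not recognize deserves a closer look, since that is where injected or externally linked URLs tend to show up.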
Pages That Are Not Part of Your Blog
When you come across strange pages listed under ‘Crawled - currently not indexed’, it is likely that these URLs are not actually part of your website. There are a few possible explanations for this:
Old or deleted pages
Sometimes, Googlebot may continue to crawl pages that have been removed or deleted from your website.
This can happen if these pages were previously indexed by Google and have not yet been dropped from its index. As long as the removed URLs return a 404 or 410 status, you can safely ignore them, because they no longer exist and Google will stop reporting them over time.
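Before ignoring a removed URL, it can be worth confirming what your server actually returns for it. The sketch below uses Python's standard `urllib`; the function names and the three labels are invented for this example. A deleted page that still answers with a 200 status is the case to watch for.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status for `url` (urlopen follows redirects)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses are raised as exceptions; the code is what we want.
        return err.code


def describe_status(code: int) -> str:
    """Translate a status code into a label for a page you have deleted."""
    if code in (404, 410):
        return "gone"          # the expected answer for a removed page
    if 200 <= code < 300:
        return "still served"  # the URL (or its redirect target) returns content
    return "other"             # e.g. server errors; worth a manual check


if __name__ == "__main__":
    # Hypothetical URL; replace it with one from your own report.
    url = "https://example.com/old-deleted-post"
    try:
        print(url, "->", describe_status(fetch_status(url)))
    except urllib.error.URLError as err:
        print(url, "-> could not connect:", err.reason)
```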
URL parameters
If your website uses URL parameters, such as session IDs or tracking codes, Googlebot may occasionally generate URLs with different parameter combinations.
These URLs may appear under ‘Crawled - currently not indexed’, but they are not actual pages on your site.
The URL Parameters tool that used to handle this in the Search Console was retired in 2022, so point these variations at the main page with canonical tags and use the clean URL in your internal links.
External sources
It is also possible that these strange URLs are being generated by external sources.
For example, if other websites are linking to non-existent pages on your site, Googlebot may crawl these URLs and list them in the Search Console.
In such cases, it is important to regularly monitor your backlinks and address any issues with external sites linking to incorrect URLs.
What Else Can You Do?
If you are certain that these strange pages are not a result of old or deleted pages, URL parameters, or external sources, there may be a chance that your website has been compromised.
Hackers sometimes create fake pages or inject malicious code into your website, which can lead to the creation of these strange URLs.
In such cases, it is crucial to take immediate action to secure your website and remove any malicious code.
To further investigate the issue, you can use the URL Inspection tool in the Google Search Console.
This tool allows you to analyze individual URLs and understand how Google sees them.
It reports whether the URL is indexed, when it was last crawled, any crawling or indexing issues, and which URL Google selected as the canonical. Its live test also shows how Googlebot renders the page.
Conclusion
Encountering strange pages listed as ‘Crawled - currently not indexed’ in the Google Search Console can be puzzling.
However, in most cases, these URLs are not actual pages on your website and can be safely ignored. It is important to regularly monitor your website’s performance in the Search Console and address any issues that may arise.
If you suspect that your website has been compromised, take immediate action to secure it and remove any malicious code. Remember, the Search Console is a valuable tool that can help you optimize your website’s visibility in search results.