Google Explains Why URLs Blocked By Robots.txt Can Still Be Indexed via @sejournal, @martinibuster
Google explained on March 13, 2024, why URLs blocked by robots.txt can still appear in search results. Search Console had flagged approximately 51,000 URLs as "Indexed, though blocked by robots.txt," prompting Google's clarification. The company stated that this situation is not inherently problematic and can occur when Google discovers a URL through other means, such as links from other websites or sitemaps, even though robots.txt disallows crawling it. If Google indexed the content of such a URL before the robots.txt rule was applied, it may continue to show the URL in search results, even though it can no longer recrawl or update the page. Google emphasized that robots.txt controls crawling, not indexing: blocking a URL does not guarantee its removal from the index if it has already been discovered and indexed. To get a URL out of Google's index, site owners can use the Removals tool in Search Console, which hides the URL temporarily, or implement a noindex directive in the page's HTML or HTTP headers. A noindex directive only takes effect if Googlebot can crawl the page and read it, so the URL must not be disallowed in robots.txt at the same time. Search Engine Journal reported on this explanation.
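The interaction between a robots.txt block and a noindex directive can be checked with a short script. The following is a minimal sketch in Python, not Google's own logic: it uses the standard library's urllib.robotparser, whose matching rules are simpler than Googlebot's, and the function names (can_crawl, has_noindex, removal_advice), the example.com URLs, and the sample robots.txt are all hypothetical. It also assumes the noindex arrives as a plain X-Robots-Tag header value with no per-crawler prefix.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical result labels, used only in this illustration.
NOINDEX_VISIBLE = "noindex can be read by the crawler"
NOINDEX_HIDDEN = "noindex is hidden behind the robots.txt block"
BLOCKED_ONLY = "blocked from crawling, but may still be indexed"
INDEXABLE = "crawlable and indexable"


def can_crawl(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def has_noindex(x_robots_tag: str) -> bool:
    """Return True if a comma-separated X-Robots-Tag value forbids indexing.

    Simplification: per-crawler prefixes such as "googlebot: noindex"
    are not handled here.
    """
    directives = {part.strip().lower() for part in x_robots_tag.split(",")}
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in directives or "none" in directives


def removal_advice(robots_txt: str, url: str, x_robots_tag: str = "") -> str:
    """Classify a URL by whether a crawler could ever see its noindex."""
    crawlable = can_crawl(robots_txt, url)
    noindex = has_noindex(x_robots_tag)
    if noindex:
        return NOINDEX_VISIBLE if crawlable else NOINDEX_HIDDEN
    return INDEXABLE if crawlable else BLOCKED_ONLY


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    print(removal_advice(robots, "https://example.com/private/page", "noindex"))
    print(removal_advice(robots, "https://example.com/public/page", "noindex"))
```

In this sketch the first URL is reported as having a hidden noindex: because crawling is disallowed, the directive would never be fetched, which is the case where lifting the robots.txt block is needed before noindex can work.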