Overview of crawling and indexing topics

The topics in this section describe how you can control Google's ability to find and parse your content in order to show it in Search and other Google properties, as well as how to prevent Google from crawling specific content on your site.

Here's a brief description of each page. To get an overview of crawling and indexing, read our How Search works guide.

Topics
File types indexable by Google	Google can index the content of most types of pages and files. Explore a list of the most common file types that Google Search can index.
URL structure	Consider organizing your content so that URLs are constructed logically and in a manner that is most intelligible to humans.
Sitemaps	Tell Google about pages on your site that are new or updated.
Crawler management	Ask Google to recrawl your URLs; Managing crawling of faceted navigation URLs; Large site owner's guide to managing your crawl budget; How HTTP status codes, and network and DNS errors affect Google Search; Google crawlers
robots.txt	A robots.txt file tells search engine crawlers which pages or files the crawler can or can't request from your site.
Canonicalization	Learn what URL canonicalization is and how to tell Google about any duplicate pages on your site in order to avoid excessive crawling. Learn how Google auto-detects duplicate content, how it treats duplicate content, and how it assigns a canonical URL to any duplicate page groups found.
Mobile sites	Learn how you can optimize your site for mobile devices and ensure that it's crawled and indexed properly.
AMP	If you have AMP pages, learn how AMP works in Google Search.
JavaScript	There are some differences and limitations that you need to account for when designing your pages and applications to accommodate how crawlers access and render your content.
Page and content metadata	Use valid HTML to specify page metadata; All `meta` tags that Google understands; Robots `meta` tag, `data-nosnippet`, and X-Robots-Tag specifications; Block indexing with the `noindex` `meta` tag; Make your links crawlable; Qualify your outbound links to Google with `rel` attributes
Removals	Control what you share with Google; Remove a page hosted on your site from Google; Remove images hosted on your page from search results; Keep redacted information out of Google Search
Site moves and changes	Redirects and Google Search; Site moves; Minimize A/B testing impact in Google Search; Temporarily pause or disable a website
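
To make the robots.txt topic above more concrete, here is a minimal Python sketch of how a crawler might check whether a URL is allowed before requesting it. The rules, the example.com URLs, and the `can_crawl` helper are all hypothetical. The sketch uses the `urllib.robotparser` module from the Python standard library, which is not Google's parser and may interpret some rules differently than Googlebot does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules let user_agent request url.

    This relies on Python's standard-library parser, so the result is
    only an approximation of how any particular search engine behaves.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/private/page.html",
        "https://example.com/public/page.html",
    ):
        print(url, can_crawl(ROBOTS_TXT, "Googlebot", url))
```

With these assumed rules, the helper reports the URL under `/private/` as disallowed and the other URL as allowed. Note that a robots.txt rule controls whether a page can be requested; blocking indexing is covered separately under Page and content metadata.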

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-10 UTC.
