Overview of crawling and indexing topics

The topics in this section describe how you can control Google's ability to find and parse your content in order to show it in Search and other Google properties, as well as how to prevent Google from crawling specific content on your site.

Here's a brief description of each page. To get an overview of crawling and indexing, read our How Search works guide.

Topics
File types indexable by Google: Google can index the content of most types of pages and files. Explore a list of the most common file types that Google Search can index.
URL structure: Consider organizing your content so that URLs are constructed logically and in a manner that is most intelligible to humans.
Sitemaps: Tell Google about pages on your site that are new or updated.
Crawler management
robots.txt: A robots.txt file tells search engine crawlers which pages or files the crawler can or can't request from your site.
Canonicalization: Learn what URL canonicalization is and how to tell Google about any duplicate pages on your site in order to avoid excessive crawling. Learn how Google auto-detects duplicate content, how it treats duplicate content, and how it assigns a canonical URL to any duplicate page groups found.
Mobile sites: Learn how you can optimize your site for mobile devices and ensure that it's crawled and indexed properly.
AMP: If you have AMP pages, learn how AMP works in Google Search.
JavaScript: There are some differences and limitations that you need to account for when designing your pages and applications to accommodate how crawlers access and render your content.
Page and content metadata
Removals
Site moves and changes
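
As a rough illustration of the robots.txt topic listed above, the following sketch uses Python's standard `urllib.robotparser` module to check whether a crawler may request a URL under a given set of rules. The rules, the `example.com` URLs, and the `can_fetch` helper are hypothetical and exist only for this example. Python's parser applies the first matching rule, which may differ from how Google's own crawlers resolve conflicting rules, so treat the result as an approximation rather than a statement of Googlebot's behavior.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for an imaginary site; not taken from any real robots.txt.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to request url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/blog/post.html", "/private/report.html"):
        url = f"https://example.com{path}"
        print(url, can_fetch(EXAMPLE_ROBOTS_TXT, "Googlebot", url))
```

Keeping the rules in a string makes the check easy to run locally before publishing a robots.txt file. To test a live file instead, `RobotFileParser` also provides `set_url()` and `read()`.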

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-10 UTC.