Indexing your offline-capable pages with the Content Indexing API

Enabling service workers to tell browsers which pages work offline

Jeff Posnick

Success: The Content Indexing API, part of the capabilities project, launched in Chrome 84 for Android.

What is the Content Indexing API?

Using a progressive web app means having access to the information people care about (images, videos, articles, and more) regardless of the current state of your network connection. Technologies like service workers, the Cache Storage API, and IndexedDB provide you with the building blocks for storing and serving data when folks interact directly with a PWA. But building a high-quality, offline-first PWA is only part of the story. If folks don't realize that a web app's content is available while they're offline, they won't take full advantage of the work you put into implementing that functionality.

This is a discovery problem: how can your PWA make users aware of its offline-capable content so that they can discover and view what's available? The Content Indexing API is a solution to this problem. The developer-facing portion of this solution is an extension to service workers, which allows developers to add URLs and metadata of offline-capable pages to a local index maintained by the browser. That enhancement is available in Chrome 84 and later.

Once the index is populated with content from your PWA, as well as any other installed PWAs, it will be surfaced by the browser as shown below.

First, select the Downloads menu item on Chrome's new tab page.

Media and articles that have been added to the index will be shown in the Articles for You section.

Additionally, Chrome can proactively recommend content when it detects that a user is offline.

The Content Indexing API is not an alternative way of caching content. It's a way of providing metadata about pages that are already cached by your service worker, so that the browser can surface those pages when folks are likely to want to view them. The Content Indexing API helps with discoverability of cached pages.

Note: The Content Indexing API is not a searchable index. While you can get a list of all indexed entries, there's no way to query against indexed metadata directly.

See it in action

The best way to get a feel for the Content Indexing API is to try a sample application.

  1. Make sure that you're using a supported browser and platform. Currently, that's limited to Chrome 84 or later on Android. Go to about://version to see what version of Chrome you're running.
  2. Visit https://contentindex.dev
  3. Click the + button next to one or more of the items on the list.
  4. (Optional) Disable your device's Wi-Fi and cellular data connection, or enable airplane mode to simulate taking your browser offline.
  5. Choose Downloads from Chrome's menu, and switch to the Articles for You tab.
  6. Browse through the content that you previously saved.

You can view the source of the sample application on GitHub.

Another sample application, a Scrapbook PWA, illustrates the use of the Content Indexing API with the Web Share Target API. The code demonstrates a technique for keeping the Content Indexing API in sync with items stored by a web app using the Cache Storage API.

Using the API

To use the API, your app must have a service worker and URLs that are navigable offline. If your web app does not currently have a service worker, the Workbox libraries can simplify creating one.

What type of URLs can be indexed as offline-capable?

The API supports indexing URLs corresponding to HTML documents. A URL for a cached media file, for example, can't be indexed directly. Instead, you need to provide a URL for a page that displays the media and works offline.

A recommended pattern is to create a "viewer" HTML page that accepts the underlying media URL as a query parameter and then displays the contents of the file, potentially with additional controls or content on the page.
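A minimal sketch of building such a viewer URL, assuming a hypothetical `/viewer.html` page that reads the media location from a hypothetical `media` query parameter. Neither name comes from the API; use whatever your app's viewer page expects.

```javascript
// Hypothetical path of an offline-capable "viewer" page in your app.
const VIEWER_PATH = '/viewer.html';

// Builds the value to pass as `url` to registration.index.add().
// The media file is referenced via a (hypothetical) `media` query
// parameter, which the viewer page reads when it loads.
function buildViewerUrl(mediaUrl, origin) {
  const viewer = new URL(VIEWER_PATH, origin);
  viewer.searchParams.set('media', mediaUrl);
  // Return a same-origin path plus query string.
  return viewer.pathname + viewer.search;
}

// Inside the viewer page, the reverse step recovers the media URL,
// e.g. getMediaUrlFromViewer(location.href).
function getMediaUrlFromViewer(viewerHref) {
  return new URL(viewerHref).searchParams.get('media');
}

// buildViewerUrl('/media/cat.mp4', 'https://example.com')
// → '/viewer.html?media=%2Fmedia%2Fcat.mp4'
```

Keeping the media URL in the query string means a single viewer page (cached once) can serve every indexed media file.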

Web apps can only add URLs to the content index that are under the scope of the current service worker. In other words, a web app could not add a URL belonging to a completely different domain into the content index.
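Service worker scopes are matched by URL prefix, so a page could run a rough pre-check against `registration.scope` before calling add(). The sketch below is an approximation under that assumption; `isUnderScope` is a hypothetical helper, and the browser still applies its own rules when add() is called.

```javascript
// Hypothetical pre-check: is `url` (absolute, or relative to `scope`)
// under the service worker registration's scope?
// In a page, `scope` would typically be `registration.scope`.
function isUnderScope(url, scope) {
  const scopeUrl = new URL(scope);
  const target = new URL(url, scopeUrl);
  // A different origin can never be indexed by this registration.
  if (target.origin !== scopeUrl.origin) {
    return false;
  }
  // Scopes match by string prefix of the full URL.
  return target.href.startsWith(scopeUrl.href);
}

// With a scope of 'https://example.com/app/':
// isUnderScope('/app/articles/123', 'https://example.com/app/')          → true
// isUnderScope('/blog/1', 'https://example.com/app/')                    → false
// isUnderScope('https://other.example/app/1', 'https://example.com/app/') → false
```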

Overview

The Content Indexing API supports three operations: adding, listing, and removing metadata. These methods are exposed from a new property, index, that has been added to the ServiceWorkerRegistration interface.

The first step in indexing content is getting a reference to the current ServiceWorkerRegistration. Using navigator.serviceWorker.ready is the most straightforward way:

const registration = await navigator.serviceWorker.ready;

// Remember to feature-detect before using the API:
if ('index' in registration) {
  // Your Content Indexing API code goes here!
}

If you're making calls to the Content Indexing API from within a service worker, rather than inside a web page, you can refer to the ServiceWorkerRegistration directly via registration. It will already be defined as part of the ServiceWorkerGlobalScope.
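Because the registration is reached differently in the two contexts, a small helper can hide the difference and fold in the feature detection. This is a sketch, assuming you pass in the global object (`self` works in both a page and a service worker); `getContentIndex` is a hypothetical name, not part of the API.

```javascript
// Returns the ContentIndex for the current context, or null when the
// Content Indexing API isn't available. `globalScope` is normally `self`.
async function getContentIndex(globalScope) {
  // In a service worker, the registration is a property of the global scope.
  let registration = globalScope.registration;
  if (!registration) {
    // Page context: bail out if service workers aren't supported at all.
    if (!globalScope.navigator || !('serviceWorker' in globalScope.navigator)) {
      return null;
    }
    // Resolves once a service worker registration is active
    // (it never resolves if no service worker is registered).
    registration = await globalScope.navigator.serviceWorker.ready;
  }
  // Feature-detect before using the API.
  return 'index' in registration ? registration.index : null;
}

// Usage, in a page or a service worker:
// const index = await getContentIndex(self);
// if (index) { await index.add({ /* ... */ }); }
```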

Adding to the index

Use the add() method to index URLs and their associated metadata. It's up to you to choose when items are added to the index. You might want to add to the index in response to user input, like clicking a "save offline" button. Or you might add items automatically each time cached data is updated via a mechanism like periodic background sync.

await registration.index.add({
  // Required; set to something unique within your web app.
  id: 'article-123',

  // Required; url needs to be an offline-capable HTML page.
  url: '/articles/123',

  // Required; used in user-visible lists of content.
  title: 'Article title',

  // Required; used in user-visible lists of content.
  description: 'Amazing article about things!',

  // Required; used in user-visible lists of content.
  icons: [{
    src: '/img/article-123.png',
    sizes: '64x64',
    type: 'image/png',
  }],

  // Optional; valid categories are currently:
  // 'homepage', 'article', 'video', 'audio', or '' (default).
  category: 'article',
});
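The browser validates the description when add() is called, but a page might want to catch mistakes earlier, for example before showing a "saved" confirmation. A minimal, hypothetical pre-check based on the required fields and category values listed in the comments of the add() example; `findDescriptionProblems` is not part of the API.

```javascript
// Category values accepted at the time of writing.
const VALID_CATEGORIES = ['', 'homepage', 'article', 'video', 'audio'];

// Hypothetical helper: returns a list of problems with a description
// object destined for registration.index.add(); empty means it looks OK.
function findDescriptionProblems(description) {
  const problems = [];
  for (const field of ['id', 'url', 'title', 'description']) {
    if (typeof description[field] !== 'string' || description[field] === '') {
      problems.push(`"${field}" must be a non-empty string`);
    }
  }
  if (!Array.isArray(description.icons) || description.icons.length === 0) {
    problems.push('"icons" should list at least one icon');
  }
  // category is optional and defaults to ''.
  const category = description.category ?? '';
  if (!VALID_CATEGORIES.includes(category)) {
    problems.push(`unknown category "${category}"`);
  }
  return problems;
}
```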

Adding an entry only affects the content index; it does not add anything to the cache.
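Since add() doesn't cache anything, a "save offline" handler typically does two things: it stores the page and its icon with the Cache Storage API, and only then indexes the page. A minimal sketch, assuming you pass in an open `Cache` (for example, from `caches.open('offline-articles')`, a hypothetical cache name) and the registration; `saveForOffline` and the shape of `article` are hypothetical.

```javascript
// Hypothetical "save offline" flow: cache first, then index.
// `cache` is an open Cache; `registration` supports the Content Indexing API.
async function saveForOffline(cache, registration, article) {
  // 1. Make the page and its icon available offline. addAll() fetches
  //    every URL and rejects, storing nothing, if any request fails.
  await cache.addAll([article.url, article.iconUrl]);

  // 2. Only once the content is cached, tell the browser about it.
  await registration.index.add({
    id: article.id,
    url: article.url,
    title: article.title,
    description: article.summary,
    icons: [{ src: article.iconUrl, sizes: '64x64', type: 'image/png' }],
    category: 'article',
  });
}
```

Caching before indexing avoids surfacing an entry whose page isn't actually available offline yet.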

Edge case: Call add() from the window context if your icons rely on a fetch handler

When you call add(), Chrome will make a request for each icon's URL to ensure that it has a copy of the icon to use when displaying a list of indexed content.

  • If you call add() from the window context (in other words, from your web page), this request will trigger a fetch event on your service worker.

  • If you call add() within your service worker (perhaps inside another event handler), the request will not trigger the service worker's fetch handler. The icons will be fetched directly, without any service worker involvement. Keep this in mind if your icons rely on your fetch handler, perhaps because they only exist in the local cache and not on the network. If they do, make sure that you only call add() from the window context.
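If the decision to index something is made inside the service worker (say, during a periodic background sync) but the icons depend on your fetch handler, one possible approach is to hand the add() call off to an open page. The sketch below does that with `postMessage()`; the message type `'content-index-add'` and both function names are hypothetical, and the relay only works while at least one controlled window is open.

```javascript
// Service worker side: ask an open window to call add() on our behalf,
// so the icon requests go through the fetch handler.
// `clientsApi` is normally `self.clients`. Returns false if no window is open.
async function requestIndexingFromWindow(clientsApi, description) {
  const windows = await clientsApi.matchAll({ type: 'window' });
  if (windows.length === 0) {
    return false;
  }
  windows[0].postMessage({ type: 'content-index-add', description });
  return true;
}

// Page side: perform the relayed add() from the window context.
// Wire it up with:
//   navigator.serviceWorker.addEventListener('message', async (event) => {
//     await handleIndexingMessage(await navigator.serviceWorker.ready, event.data);
//   });
async function handleIndexingMessage(registration, data) {
  if (!data || data.type !== 'content-index-add' || !('index' in registration)) {
    return false;
  }
  await registration.index.add(data.description);
  return true;
}
```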

Listing the index's contents

The getAll() method returns a promise for an iterable list of indexed entries and their metadata. Returned entries will contain all of the data saved with add().

const entries = await registration.index.getAll();
for (const entry of entries) {
  // entry.id, entry.url, etc. are all exposed.
}
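Because the index can't be queried directly, any filtering happens in your own code after getAll() resolves. A small sketch, assuming `registration` supports the API; `getEntriesByCategory` and `isIndexed` are hypothetical helpers.

```javascript
// getAll() returns every entry, so narrow the results in JavaScript,
// here by the `category` that was set when each entry was added.
async function getEntriesByCategory(registration, category) {
  const entries = await registration.index.getAll();
  return entries.filter((entry) => entry.category === category);
}

// Likewise, checking whether a given id is already in the index.
async function isIndexed(registration, id) {
  const entries = await registration.index.getAll();
  return entries.some((entry) => entry.id === id);
}

// Usage:
// const videos = await getEntriesByCategory(registration, 'video');
// console.log(videos.map((entry) => entry.title));
```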

Removing items from the index

To remove an item from the index, call delete() with the id of the item to remove:

await registration.index.delete('article-123');

Calling delete() only affects the index. It does not delete anything from the cache.
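When your own UI removes an item, you'll usually want to clear both places at once. A minimal sketch, assuming an open `Cache` that holds the page under the same URL it was indexed with; `removeSavedItem` is a hypothetical name.

```javascript
// Hypothetical "remove" flow triggered by your own UI: because delete()
// only touches the index, also remove the cached copy of the page.
async function removeSavedItem(cache, registration, { id, url }) {
  await registration.index.delete(id);
  // Cache.delete() resolves to true if a matching entry was removed.
  return cache.delete(url);
}

// Usage:
// const cache = await caches.open('offline-articles'); // hypothetical name
// await removeSavedItem(cache, registration, { id: 'article-123', url: '/articles/123' });
```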

Warning: Once indexed, entries do not automatically expire. It's up to you to either present an interface in your web app for clearing out entries, or periodically remove older entries that you know should no longer be available offline.
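One way to handle that housekeeping is to treat the cache as the source of truth and periodically drop index entries whose pages are no longer cached. A sketch under that assumption (each entry's `url` is cached under the same URL); `pruneIndex` is hypothetical, and you might call it on page load or from a periodic background sync handler.

```javascript
// Hypothetical housekeeping: remove index entries whose page is no
// longer in `cache` (an open Cache). Returns the ids that were removed.
async function pruneIndex(cache, registration) {
  const removed = [];
  const entries = await registration.index.getAll();
  for (const entry of entries) {
    // Cache.match() resolves to undefined when nothing matches.
    const cached = await cache.match(entry.url);
    if (!cached) {
      await registration.index.delete(entry.id);
      removed.push(entry.id);
    }
  }
  return removed;
}
```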

Handling a user delete event

When the browser displays the indexed content, it may include its own user interface with a Delete menu item, giving people a chance to indicate that they're done viewing previously indexed content. This is how the deletion interface looks in Chrome 80:

The delete menu item.

When someone selects that menu item, your web app's service worker will receive a contentdelete event. While handling this event is optional, it provides a chance for your service worker to "clean up" content, like locally cached media files, that someone has indicated they are done with.

You do not need to call registration.index.delete() inside your contentdelete handler; if the event has been fired, the relevant index deletion has already been performed by the browser.

self.addEventListener('contentdelete', (event) => {
  // event.id will correspond to the id value used
  // when the indexed content was added.
  // Use that value to determine what content, if any,
  // to delete from wherever your app stores it (usually
  // the Cache Storage API or perhaps IndexedDB).
});

Note: The contentdelete event is only fired when the deletion happens due to interaction with the browser's built-in user interface. It is not fired when registration.index.delete() is called. If your web app triggers the index deletion using that API method, it should also clean up cached content at the same time.
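Filling in the contentdelete handler, a service worker might map the deleted entry's id back to the cached page and remove it, keeping the event alive with `waitUntil()` until the cleanup finishes. This sketch assumes two hypothetical conventions: pages live in a cache named `'offline-articles'`, and ids like `'article-123'` correspond to URLs like `'/articles/123'`. Substitute whatever mapping your app actually uses, such as a lookup in IndexedDB.

```javascript
// Hypothetical cache name and id-to-URL convention; adjust for your app.
const OFFLINE_CACHE = 'offline-articles';

function urlForId(id) {
  // 'article-123' → '/articles/123'
  return `/articles/${id.replace(/^article-/, '')}`;
}

// Removes the cached page for a deleted index entry.
// `cacheStorage` is normally the service worker's global `caches`.
async function cleanUpDeletedContent(cacheStorage, id) {
  const cache = await cacheStorage.open(OFFLINE_CACHE);
  return cache.delete(urlForId(id));
}

// Register the listener only when running inside a service worker.
if (typeof ServiceWorkerGlobalScope !== 'undefined' &&
    self instanceof ServiceWorkerGlobalScope) {
  self.addEventListener('contentdelete', (event) => {
    // The browser has already removed the index entry; just clean up.
    event.waitUntil(cleanUpDeletedContent(caches, event.id));
  });
}
```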

Feedback about the API design

Is there something about the API that's awkward or doesn't work as expected? Or are there missing pieces that you need to implement your idea?

File an issue on the Content Indexing API explainer GitHub repo, or add your thoughts to an existing issue.

Problem with the implementation?

Did you find a bug with Chrome's implementation?

File a bug at https://new.crbug.com. Include as much detail as you can and simple instructions for reproducing, and set Components to Blink>ContentIndexing.

Planning to use the API?

Planning to use the Content Indexing API in your web app? Your public support helps Chrome prioritize features, and shows other browser vendors how critical it is to support them.

What are some security and privacy implications of content indexing?

Check out the answers provided in response to the W3C's Security and Privacy questionnaire. If you have further questions, please start a discussion via the project's GitHub repo.

Hero image by Maksym Kaharlytskyi on Unsplash.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2019-12-12 UTC.