Python 2.7 has reached end of support and will be deprecated on January 31, 2026. After deprecation, you won't be able to deploy Python 2.7 applications, even if your organization previously used an organization policy to re-enable deployments of legacy runtimes. Your existing Python 2.7 applications will continue to run and receive traffic after their deprecation date. We recommend that you migrate to the latest supported version of Python.

Search API for legacy bundled services

This API is supported for first-generation runtimes and can be used when upgrading to corresponding second-generation runtimes. If you are updating to the App Engine Python 3 runtime, refer to the migration guide to learn about your migration options for legacy bundled services.

The Search API provides a model for indexing documents that contain structured data. You can search an index, and organize and present search results. The API supports full text matching on string fields. Documents and indexes are saved in a separate persistent store optimized for search operations. The Search API can index any number of documents. The App Engine Datastore may be more appropriate for applications that need to retrieve very large result sets.

Overview

The Search API is based on four main concepts: documents, indexes, queries, and results.

Documents

A document is an object with a unique ID and a list of fields containing user data. Each field has a name and a type. There are several types of fields, identified by the kinds of values they contain:

Atom Field - an indivisible character string.
Text Field - a plain text string that can be searched word by word.
HTML Field - a string that contains HTML markup tags; only the text outside the markup tags can be searched.
Number Field - a floating point number.
Date Field - a date object with year/month/day and optional time.
Geopoint Field - a data object with latitude and longitude coordinates.

The maximum size of a document is 1 MB.

Indexes

There is no limit to the number of documents in an index or the number of indexes you can use. The total size of all the documents in a single index is limited to 10 GB by default. Those with the App Engine Admin role can submit a request from the Google Cloud console App Engine Search page to increase the size up to 200 GB.

Queries

To search an index, you construct a query, which has a query string and possibly some additional options. A query string specifies conditions for the values of one or more document fields. When you search an index you get back only those documents in the index with fields that satisfy the query.

The simplest query, sometimes called a "global search", is a string that contains only field values. This search uses a string that searches for documents that contain the words "rose" and "water":

```python
def simple_search(index):
    # Global search: match documents containing both "rose" and "water".
    return index.search('rose water')
```

This one searches for documents with date fields that contain the date July 4, 1776, or text fields that include the string "1776-07-04":

```python
def search_date(index):
    # Matches date fields equal to 1776-07-04 or text containing that string.
    return index.search('1776-07-04')
```

A query string can also be more specific. It can contain one or more terms, each naming a field and a constraint on the field's value. The exact form of a term depends on the type of the field. For instance, assuming there is a text field called "product", and a number field called "price", here's a query string with two terms:

```python
def search_terms(index):
    # Search for documents with pianos that cost less than $5000.
    return index.search("product = piano AND price < 5000")
```

Query options, as the name implies, are not required. They enable a variety of features:

Control how many documents are returned in the search results.
Specify what document fields to include in the results. The default is to include all the fields from the original document. You can specify that the results only include a subset of fields (the original document is not affected).
Sort the results.
Create "computed fields" for documents using FieldExpressions and abridged text fields using snippets.
Support paging through the search results by returning only a portion of the matched documents on each query (using offsets and cursors).

We recommend logging query strings in your application if you wish to keep a record of queries that have been executed.

Search results

A call to search() can only return a limited number of matching documents. Your search may find more documents than can be returned in a single call. Each search call returns an instance of the SearchResults class, which contains information about how many documents were found and how many were returned, along with the list of returned documents. You can repeat the same search, using cursors or offsets to retrieve the complete set of matching documents.

Additional training material

In addition to this documentation, you can read the two-part training class on the Search API at the Google Developer's Academy. The class includes a sample Python application.

Documents and fields

The Document class represents documents. Each document has a document identifier and a list of fields.

Document identifier

Every document in an index must have a unique document identifier, or doc_id. The identifier can be used to retrieve a document from an index without performing a search. By default, the Search API automatically generates a doc_id when a document is created. You can also specify the doc_id yourself when you create a document. A doc_id must contain only visible, printable ASCII characters (ASCII codes 33 through 126 inclusive) and be no longer than 500 characters. A document identifier cannot begin with an exclamation point ('!'), and it can't begin and end with double underscores ("__").
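As a rough pre-flight check, the identifier rules above can be expressed in a few lines of plain Python. This is a minimal sketch, not part of the Search API: `is_valid_doc_id` is a hypothetical helper, it assumes an empty identifier is invalid, and it reads the underscore rule as rejecting only identifiers that both begin and end with `__`.

```python
def is_valid_doc_id(doc_id):
    """Check a doc_id against the documented rules (hypothetical helper)."""
    # Assumption: an empty identifier is not allowed. Maximum length is 500.
    if not doc_id or len(doc_id) > 500:
        return False
    # Only visible, printable ASCII characters: codes 33 through 126 inclusive.
    if any(not 33 <= ord(ch) <= 126 for ch in doc_id):
        return False
    # Must not begin with an exclamation point.
    if doc_id.startswith("!"):
        return False
    # Assumption: forbidden only when it both begins and ends with "__".
    if doc_id.startswith("__") and doc_id.endswith("__"):
        return False
    return True
```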

While it is convenient to create readable, meaningful unique document identifiers, you cannot include the doc_id in a search. Consider this scenario: You have an index with documents that represent parts, using the part's serial number as the doc_id. It will be very efficient to retrieve the document for any single part, but it will be impossible to search for a range of serial numbers along with other field values, such as purchase date. Storing the serial number in an atom field solves the problem.

Document fields

A document contains fields that have a name, a type, and a single value of that type. Two or more fields can have the same name, but different types. For instance, you can define two fields with the name "age": one with a text type (the value "twenty-two"), the other with a number type (value 22).

Field names

Field names are case sensitive and can only contain ASCII characters. They must start with a letter and can contain letters, digits, or underscores. A field name cannot be longer than 500 characters.
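The field name rules can be sketched the same way. `is_valid_field_name` is a hypothetical helper that checks only the documented constraints (ASCII only, a leading letter, then letters, digits, or underscores, at most 500 characters); the service itself remains the authority.

```python
import re

# A letter followed by any mix of ASCII letters, digits, and underscores.
_FIELD_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*\Z")


def is_valid_field_name(name):
    """Check a field name against the documented rules (hypothetical helper)."""
    return len(name) <= 500 and _FIELD_NAME.match(name) is not None
```

Because names are case sensitive, "Price" and "price" would be treated as two different fields even though both pass this check.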

Multi-valued fields

A field can contain only one value, which must match the field's type. Field names do not have to be unique. A document can have multiple fields with the same name and same type, which is a way to represent a field with multiple values. (However, date and number fields with the same name can't be repeated.) A document can also contain multiple fields with the same name and different field types.
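A minimal sketch of the repetition rule, assuming a document's fields are modeled as hypothetical `(name, kind)` tuples rather than Search API field objects. It reports date or number field names that occur more than once with the same type, while allowing repeated text fields and a shared name across different types:

```python
from collections import Counter

# Field kinds that may not be repeated under the same name.
_SINGLE_VALUED_KINDS = ("date", "number")


def repeated_single_valued(fields):
    """Return sorted names of date or number fields that occur more than once.

    `fields` is a list of (name, kind) pairs, a hypothetical stand-in for
    the field objects of a document.
    """
    counts = Counter(
        (name, kind) for name, kind in fields if kind in _SINGLE_VALUED_KINDS)
    return sorted({name for (name, kind), n in counts.items() if n > 1})
```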

Field types

There are three kinds of fields that store character strings; collectively we refer to them as string fields:

Text Field: A string with maximum length 1024**2 (1,048,576) characters.
HTML Field: An HTML-formatted string with maximum length 1024**2 (1,048,576) characters.
Atom Field: A string with maximum length 500 characters.

There are also three field types that store non-textual data:

Number Field: A double precision floating point value between -2,147,483,647 and 2,147,483,647.
Date Field: A datetime.date or datetime.datetime.
Geopoint Field: A point on earth described by latitude and longitude coordinates.

The field types are specified by the classes TextField, HtmlField, AtomField, NumberField, DateField, and GeoField.
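The size and range limits listed above can be checked before a document is built. This sketch uses hypothetical lowercase kind names (`"text"`, `"html"`, `"atom"`, `"number"`, `"date"`) instead of the field classes, models an empty field as `None` (an assumption), and follows the rule that atom, text, HTML, and number fields can be empty while date fields cannot. Geopoint fields are not covered.

```python
import datetime

MAX_TEXT_CHARS = 1024 ** 2   # text and HTML fields
MAX_ATOM_CHARS = 500         # atom fields
NUMBER_LIMIT = 2147483647    # number fields lie in [-NUMBER_LIMIT, NUMBER_LIMIT]


def is_valid_field_value(kind, value):
    """Check a value against the documented limits for its field kind.

    Hypothetical helper; `None` stands in for an empty field.
    """
    if value is None:
        # Atom, text, HTML, and number fields can be empty; date fields cannot.
        return kind in ("text", "html", "atom", "number")
    if kind in ("text", "html"):
        return isinstance(value, str) and len(value) <= MAX_TEXT_CHARS
    if kind == "atom":
        return isinstance(value, str) and len(value) <= MAX_ATOM_CHARS
    if kind == "number":
        return (isinstance(value, (int, float))
                and -NUMBER_LIMIT <= value <= NUMBER_LIMIT)
    if kind == "date":
        # datetime.datetime is a subclass of datetime.date, so both pass.
        return isinstance(value, datetime.date)
    raise ValueError("unsupported field kind: %r" % (kind,))
```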

Special treatment of string and date fields

When a document with date, text, or HTML fields is added to an index, some special handling occurs. It's helpful to understand what's going on "under the hood" in order to use the Search API effectively.

Tokenizing string fields

When an HTML or text field is indexed, its contents are tokenized. The string is split into tokens wherever whitespace or special characters (punctuation marks, hash sign, backslash, etc.) appear. The index will include an entry for each token. This enables you to search for keywords and phrases comprising only part of a field's value. For instance, a search for "dark" will match a document with a text field containing the string "it was a dark and stormy night", and a search for "time" will match a document with a text field containing the string "this is a real-time system".

In HTML fields, text within markup tags is not tokenized, so a document with an HTML field containing "it was a <strong>dark</strong> night" will match a search for "night", but not for "strong". If you want to be able to search markup text, store it in a text field.

Atom fields are not tokenized. A document with an atom field that has the value "bad weather" will only match a search for the entire string "bad weather". It will not match a search for "bad" or "weather" alone.
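To make the difference between the three string field types concrete, here is a rough approximation in plain Python. It is not the service's tokenizer: it assumes that word characters and ampersands form tokens and that everything else breaks them, strips HTML tags with a crude regular expression, ignores case handling, and models atom matching as exact string equality. All function names are hypothetical.

```python
import re

# Assumption: a token is a run of word characters (letters, digits,
# underscore) or ampersands; whitespace and other punctuation break tokens.
_TOKEN = re.compile(r"[\w&]+")
# Crude markup matcher used to drop HTML tags before tokenizing.
_TAG = re.compile(r"<[^>]*>")


def text_tokens(value):
    """Approximate the tokens produced for a text field."""
    return _TOKEN.findall(value)


def html_tokens(value):
    """Approximate the tokens for an HTML field: markup is not indexed."""
    return text_tokens(_TAG.sub(" ", value))


def atom_matches(value, query):
    """Atom fields are not tokenized, so only the entire value matches."""
    return value == query
```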

Tokenizing Rules

The underscore (_) and ampersand (&) characters do not break words into tokens.
These whitespace characters always break words into tokens: space, carriage return, line feed, horizontal tab, vertical tab, form feed, and NULL.
These characters are treated as punctuation and will break words into tokens:
! " % ( )
* , - | /
[ ] ^ `
: = > ? @
{ } ~ $

The characters in the following table usually break words into tokens, but they can be handled differently depending on the context in which they appear:

Character	Rule
`<`	In an HTML field the "less than" sign indicates the start of an HTML tag which is ignored.
`+`	A string of one or more "plus" signs is treated as a part of the word if it appears at the end of the word (C++).
`#`	The "hash" sign is treated as part of the word if it is preceded by a, b, c, d, e, f, g, j, or x (a# through g# are musical notes; j# and x# are programming languages; c# is both). If a term is preceded by '#' (#google), it is treated as a hashtag and the hash becomes part of the word.
`'`	Apostrophe is a letter if it precedes the letter "s" followed by a word-break, as in "John's hat".
`.`	If a decimal point appears between digits, this is part of a number (i.e., the decimal-separator). This can also be part of a word if used in an acronym (A.B.C).
`-`	The dash is part of a word if used in an acronym (I-B-M).

Any remaining 7-bit character that is not a letter or digit ('A-Z', 'a-z', '0-9') is handled as punctuation and breaks words into tokens.
Everything else is parsed as a UTF-8 character.

Note: Non-western languages, like Japanese and Chinese, use other tokenization rules.
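A few of the context-dependent rules can be layered onto a regular expression. This partial sketch (the `tokenize` function is hypothetical) keeps `_` and `&` inside words, keeps a run of plus signs that ends a word, keeps an apostrophe followed by a final "s", and keeps a decimal point between digits. It does not attempt the `#`, `<`, dash, or acronym rules, and it is not how the service is implemented.

```python
import re

# Partial, hypothetical tokenizer for a few of the context-dependent rules.
_TOKEN = re.compile(
    r"""
    \d+(?:\.\d+)+            # a decimal point between digits stays in the number
    | [\w&]+                 # a word: letters, digits, underscore, ampersand
      (?:\++(?![\w&]))?      # keep a run of plus signs that ends the word: C++
      (?:'s\b)?              # keep an apostrophe before a final "s": John's
    """,
    re.VERBOSE,
)


def tokenize(value):
    """Split a string into tokens using the simplified rules above."""
    return _TOKEN.findall(value)
```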

Acronyms

Tokenization uses special rules to recognize acronyms (strings like "I.B.M.", "a-b-c", or "C I A"). An acronym is a string of single alphabetic characters, with the same separator character between all of them. The valid separators are the period, dash, or any number of spaces. The separator character is removed from the string when an acronym is tokenized. So the example strings mentioned above become the tokens "ibm", "abc", and "cia". The original text remains in the document field.

When dealing with acronyms, note that:

An acronym cannot contain more than 21 letters. A valid acronym string with more than 21 letters will be broken into a series of acronyms, each 21 letters or fewer.
If the letters in an acronym are separated by spaces, all the letters must be the same case. Acronyms constructed with period and dash can use mixed case letters.
When searching for an acronym, you can enter the canonical form of the acronym (the string without any separators), or the acronym punctuated with either the dash or the dot (but not both) between its letters. So the text "I.B.M" could be retrieved with any of the search terms "I-B-M", "I.B.M", or "IBM".
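The acronym rules can be approximated as follows. `canonical_acronym` and `split_long_acronym` are hypothetical helpers; the lowercase output follows the "ibm", "abc", "cia" examples above, and the way an over-long acronym is split into 21-letter pieces is an assumption, since the exact split points are not documented here.

```python
MAX_ACRONYM_LETTERS = 21


def _single_letters(parts):
    """True if there are at least two parts and each is one letter."""
    return len(parts) > 1 and all(len(p) == 1 and p.isalpha() for p in parts)


def canonical_acronym(text):
    """Return the separator-free, lowercase form of an acronym, or None."""
    text = text.strip()
    # Period or dash separators (one kind throughout); mixed case is allowed.
    for sep in (".", "-"):
        parts = text.rstrip(sep).split(sep)
        if _single_letters(parts):
            return "".join(parts).lower()
    # Space separators (any number of spaces); all letters must share a case.
    parts = text.split()
    if _single_letters(parts):
        letters = "".join(parts)
        if letters.isupper() or letters.islower():
            return letters.lower()
    return None


def split_long_acronym(letters):
    """Assumption: an over-long acronym splits into consecutive 21-letter chunks."""
    return [letters[i:i + MAX_ACRONYM_LETTERS]
            for i in range(0, len(letters), MAX_ACRONYM_LETTERS)]
```

Under this model, "I-B-M", "I.B.M", and "IBM" all reduce to the same canonical token, which is why any of them can retrieve the text "I.B.M".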

Date field accuracy

When you create a date field in a document, you set its value to a datetime.date or datetime.datetime. Note that only Python "naive" date and time objects can be used; "aware" objects are not allowed. For the purpose of indexing and searching the date field, any time component is ignored and the date is converted to the number of days since 1/1/1970 UTC. This means that even though a date field can contain a precise time value, a date query can only specify a date field value in the form yyyy-mm-dd. This also means the sorted order of date fields with the same date is not well-defined.
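The day-granularity behavior can be reproduced with the standard `datetime` module. `indexed_day` is a hypothetical helper that mirrors the description above: it rejects aware datetimes, drops any time component, and counts days from 1970-01-01 (treating naive values as UTC is an assumption).

```python
import datetime

_EPOCH = datetime.date(1970, 1, 1)


def indexed_day(value):
    """Days since 1970-01-01 used when indexing a date field (sketch).

    Any time component is dropped. Aware datetimes are rejected, mirroring
    the rule that only naive objects may be used.
    """
    # Check datetime first: datetime.datetime is a subclass of datetime.date.
    if isinstance(value, datetime.datetime):
        if value.tzinfo is not None:
            raise ValueError("aware datetime objects are not allowed")
        value = value.date()
    return (value - _EPOCH).days
```

Because `datetime(1776, 7, 4, 15, 30)` and `date(1776, 7, 4)` map to the same value, nothing in the indexed data distinguishes their order, which is why sorting date fields that share a date is not well-defined. `date.isoformat()` produces the `yyyy-mm-dd` form used in date queries.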

Linking from a document to other resources

You can use a document's doc_id and other fields as links to other resources in your application. For example, if you use Blobstore you can associate the document with a specific blob by setting the doc_id or the value of an Atom field to the BlobKey of the data.

Creating a document

The following code sample shows how to create a document object. The Document constructor is called with the fields argument set to a list of field objects. Each object in the list is created and initialized by using the constructor function of the field's class. Note the use of the GeoPoint constructor and the Python datetime class to create the appropriate types of field values.

```python
from datetime import datetime

from google.appengine.api import search


def create_document():
    document = search.Document(
        # Setting the doc_id is optional. If omitted, the search service will
        # create an identifier.
        doc_id='PA6-5000',
        fields=[
            search.TextField(name='customer', value='Joe Jackson'),
            search.HtmlField(
                name='comment', value='this is <em>marked up</em> text'),
            search.NumberField(name='number_of_visits', value=7),
            search.DateField(name='last_visit', value=datetime.now()),
            search.DateField(
                name='birthday', value=datetime(year=1960, month=6, day=19)),
            search.GeoField(
                name='home_location', value=search.GeoPoint(37.619, -122.37))
        ])
    return document
```

Note: When you create a document, you must specify all of its attributes using the Document constructor. Once the document has been created, you cannot add, remove, or modify fields, nor change the identifier or any other attribute. Date and geopoint fields must be assigned a non-null value. Atom, text, HTML, and number fields can be empty.

Working with an index

Putting documents in an index

When you put a document into an index, the document is copied to persistent storage and each of its fields is indexed according to its name, type, and the doc_id.

The following code example shows how to access an Index and put a document into it.

```python
from google.appengine.api import search


def add_document_to_index(document):
    index = search.Index('products')
    index.put(document)
```

You can pass up to 200 documents at a time to the put() method. Batching puts is more efficient than adding documents one at a time.
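One way to respect the 200-document limit is to chunk the input before calling put(). `batches` is a hypothetical helper built on `itertools.islice`; it works on any iterable and yields lists of at most 200 items.

```python
from itertools import islice

MAX_BATCH = 200  # documented per-call limit for put() and delete()


def batches(items, size=MAX_BATCH):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk
```

With the Search API, each chunk would then be passed to `index.put(chunk)`; the same helper fits `index.delete()`, which has the same 200-item limit.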

When you put a document into an index and the index already contains a document with the same doc_id, the new document replaces the old one. No warning is given. You can call Index.get(id) before creating or adding a document to an index to check whether a specific doc_id already exists.

The put method returns a list of PutResults, one for each document passed as an argument. If you did not specify the doc_id yourself, you can examine the id attribute of the result to discover the doc_id that was generated:

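```python
from google.appengine.api import search


def add_document_and_get_doc_id(documents):
    index = search.Index('products')
    results = index.put(documents)
    # Each PutResult carries the doc_id, including any generated one.
    document_ids = [result.id for result in results]
    return document_ids
```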

Note that creating an instance of the Index class does not guarantee that a persistent index actually exists. A persistent index is created the first time you add a document to it with the put method. If you want to check whether or not an index actually exists before you start to use it, use the search.get_indexes() function.

Updating documents

A document cannot be changed once you've added it to an index. You can't add or remove fields, or change a field's value. However, you can replace the document with a new document that has the same doc_id.
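Because documents are immutable, an update means building a new document with the same doc_id and putting it. The following pure-Python model illustrates that pattern. `FakeDocument`, `FakeIndex`, and `replace_field` are hypothetical stand-ins, not Search API classes; the frozen dataclass plays the role of an unmodifiable document, and `put` silently overwrites any document that has the same doc_id.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class FakeDocument:
    """Hypothetical stand-in for a document: immutable once created."""
    doc_id: str
    fields: tuple  # (name, value) pairs


class FakeIndex:
    """Hypothetical in-memory index keyed by doc_id."""

    def __init__(self):
        self._docs = {}

    def put(self, doc):
        # A document with an existing doc_id silently replaces the old one.
        self._docs[doc.doc_id] = doc

    def get(self, doc_id):
        return self._docs.get(doc_id)


def replace_field(doc, name, value):
    """Build a new document with the same doc_id and one field changed.

    Every existing field called `name` is dropped and a single new one added.
    """
    kept = tuple((n, v) for n, v in doc.fields if n != name)
    return dataclasses.replace(doc, fields=kept + ((name, value),))
```

With the real API, the equivalent step is to construct a new search.Document with the same doc_id and the complete list of fields, then pass it to Index.put().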

Retrieving documents by doc_id

There are two ways to retrieve documents from an index using document identifiers:

Use Index.get() to fetch a single document by its doc_id.
Use Index.get_range() to retrieve a group of consecutive documents ordered by doc_id.

Each call is demonstrated in the example below.

```python
from google.appengine.api import search


def get_document_by_id():
    index = search.Index('products')

    # Get a single document by ID.
    document = index.get("AZ125")

    # Get a range of documents starting with a given ID.
    documents = index.get_range(start_id="AZ125", limit=100)

    return document, documents
```

Searching for documents by their contents

To retrieve documents from an index, you construct a query string and call Index.search(). The query string can be passed directly as the argument, or you can include the string in a Query object which is passed as the argument. By default, search() returns matching documents sorted in order of decreasing rank. To control how many documents are returned, how they are sorted, or add computed fields to the results, you need to use a Query object, which contains a query string and can also specify other search and sorting options.

```python
from google.appengine.api import search


def query_index():
    index = search.Index('products')
    query_string = 'product: piano AND price < 5000'

    results = index.search(query_string)

    for scored_document in results:
        print(scored_document)
```

Deleting an index

Each index consists of its indexed documents and an index schema. To delete an index, delete all the documents in an index and then delete the index schema.

To delete documents from an index, pass the doc_id of one or more documents you wish to delete to the Index.delete() method.

```python
def delete_index(index):
    # index.get_range returns up to 100 documents at a time by default, so we
    # must loop until we've deleted all items.
    while True:
        # Use ids_only to get the list of document IDs in the index without
        # the overhead of getting the entire document.
        document_ids = [
            document.doc_id
            for document in index.get_range(ids_only=True)]

        # If no IDs were returned, we've deleted everything.
        if not document_ids:
            break

        # Delete the documents for the given IDs.
        index.delete(document_ids)

    # Delete the index schema.
    index.delete_schema()
```

You can pass up to 200 document IDs at a time to the delete() method. Batching deletes is more efficient than deleting documents one at a time.

This approach might take a long time if you need to delete a large number of search index entries. To resolve this issue, try the following:

Delete the project and its dependencies.
Request a higher quota for faster deletions.
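The loop in delete_index() has to repeat because each get_range() call returns only one page of IDs. The control flow can be checked against a small in-memory model. `FakeIdIndex` and `delete_all` are hypothetical; the page size of 100 and the 200-ID cap mirror the numbers given above, and the function returns how many delete calls were needed.

```python
class FakeIdIndex:
    """Hypothetical in-memory index that only tracks document IDs."""

    PAGE_SIZE = 100   # mirrors get_range's 100-document default
    MAX_DELETE = 200  # mirrors the 200-ID limit on delete()

    def __init__(self, doc_ids):
        self._ids = sorted(doc_ids)

    def get_range_ids(self):
        # Like get_range(ids_only=True): the first page of IDs in doc_id order.
        return self._ids[:self.PAGE_SIZE]

    def delete(self, doc_ids):
        if len(doc_ids) > self.MAX_DELETE:
            raise ValueError("at most %d IDs per delete call" % self.MAX_DELETE)
        doomed = set(doc_ids)
        self._ids = [i for i in self._ids if i not in doomed]

    def __len__(self):
        return len(self._ids)


def delete_all(index):
    """Delete documents page by page; return the number of delete calls."""
    calls = 0
    while True:
        doc_ids = index.get_range_ids()
        if not doc_ids:  # nothing left to delete
            return calls
        index.delete(doc_ids)
        calls += 1
```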

Eventual consistency

When you put, update, or delete a document in an index, the change propagates across multiple data centers. This usually happens quickly, but the time it takes can vary. The Search API guarantees eventual consistency. This means that in some cases, a search or a retrieval of one or more documents might return results that do not reflect the most recent changes.

Determining the size of an index

An index stores documents for retrieval. You can retrieve a single document by its ID, a range of documents with consecutive IDs, or all the documents in an index. You can also search an index to retrieve documents that satisfy given criteria on fields and their values, specified as a query string. You can manage groups of documents by putting them into separate indexes. There is no limit to the number of documents in an index or the number of indexes you can use. The total size of all the documents in a single index is limited to 10 GB by default but can be increased to up to 200 GB by submitting a request from the Google Cloud console App Engine Search page. The index property storage_limit is the maximum allowable size of an index.

The index property storage_usage is an estimate of the amount of storage space used by an index. This number is an estimate because the index monitoring system does not run continuously; the actual usage is computed periodically. The storage_usage is adjusted between sampling points by accounting for document additions, but not deletions.

Performing asynchronous operations

You can use asynchronous calls to execute multiple operations without blocking, and then retrieve all the results at the same time, blocking only once. For example, the following code executes multiple searches asynchronously:

```python
def async_query(index):
    futures = [index.search_async('foo'), index.search_async('bar')]
    results = [future.get_result() for future in futures]
    return results
```

Index schemas

Every index has a schema that shows all the field names and field types that appear in the documents it contains. You cannot define a schema yourself. Schemas are maintained dynamically; they are updated as documents are added to an index. A simple schema might look like this, in JSON-like form:

```python
{'comment': ['TEXT'], 'date': ['DATE'], 'author': ['TEXT'], 'count': ['NUMBER']}
```

Each key in the dictionary is the name of a document field. The key's value is a list of the field types used with that field name. If you have used the same field name with different field types, the schema will list more than one field type for a field name, like this:

```python
{'ambiguous_integer': ['TEXT', 'NUMBER', 'ATOM']}
```

Once a field appears in a schema it can never be removed. There is no way to delete a field, even if the index no longer contains any documents with that particular field name.
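The append-only nature of schemas can be modeled in a few lines. `SchemaTracker` is a hypothetical class, not a Search API type: adding a document records each new `(name, type)` pair, and removing a document deliberately changes nothing. The order in which it lists types is an assumption of the sketch.

```python
class SchemaTracker:
    """Hypothetical model of a dynamically maintained index schema."""

    def __init__(self):
        # Field name -> list of field types, in first-seen order.
        self._schema = {}

    def add_document(self, fields):
        """Record the (name, type) pairs of a document being added."""
        for name, field_type in fields:
            types = self._schema.setdefault(name, [])
            if field_type not in types:
                types.append(field_type)

    def remove_document(self, fields):
        """Deleting documents never removes schema entries."""

    def schema(self):
        return {name: list(types) for name, types in self._schema.items()}
```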

You can view the schemas for your indexes like this:

```python
import logging

from google.appengine.api import search

...
for index in search.get_indexes(fetch_schema=True):
    logging.info("index %s", index.name)
    logging.info("schema: %s", index.schema)
```

Note that a call to get_indexes cannot return more than 1000 indexes. To retrieve more indexes, call the function repeatedly using the start_index_name argument.

A schema does not define a "class" in the object-oriented programming sense. As far as the Search API is concerned, every document is unique and indexes can contain different kinds of documents. If you want to treat collections of objects with the same list of fields as instances of a class, that's an abstraction you must enforce in your code. For instance, you could ensure that all documents with the same set of fields are kept in their own index. The index schema could be seen as the class definition, and each document in the index would be an instance of the class.

Viewing indexes in the Google Cloud console

In the Google Cloud console, you can view information about your application's indexes and the documents they contain. Clicking an index name displays the documents that index contains. You'll see all the defined schema fields for the index; for each document with a field of that name, you'll see the field's value. You can also issue queries on the index data directly from the console.

Search API quotas

The Search API has several free quotas:

Resource or API call	Free Quota
Total storage (documents and indexes)	0.25 GB
Queries	1000 queries per day
Adding documents to indexes	0.01 GB per day

The Search API imposes these limits to ensure the reliability of the service. These apply to both free and paid apps:

Resource	Safety Quota
Maximum query usage	100 aggregated minutes of query execution time per minute
Maximum documents added or deleted	15,000 per minute
Maximum size per index (unlimited number of indexes allowed)	10 GB

API usage is counted in different ways depending on the type of call:

Index.search(): Each API call counts as one query; execution time is equivalent to the latency of the call.
Index.put(): When you add documents to indexes, the size of each document and the number of documents count towards the indexing quota.
All other Search API calls are counted based on the number of operations they involve:
- search.get_indexes(): 1 operation is counted for each index actually returned, or 1 operation if nothing is returned.
- Index.get() and Index.get_range(): 1 operation is counted for each document actually returned, or 1 operation if nothing is returned.
- Index.delete(): 1 operation is counted for each document in the request, or 1 operation if the request is empty.

The quota on query throughput is imposed so that a single user cannot monopolize the search service. Because queries can execute simultaneously, each application is allowed to run queries that consume up to 100 minutes of execution time per one minute of clock time. If you are running many short queries, you probably will not reach this limit. Once you exceed the quota, subsequent queries will fail until the next time slice, when your quota is restored. The quota is not strictly imposed in one-minute slices; a variation of the leaky bucket algorithm is used to control search bandwidth in five-second increments.
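The throughput quota behaves like a bucket that drains at a fixed rate. The toy model below is a generic leaky bucket, not Google's variation: 100 minutes of execution time per minute of clock time is a drain rate of 100 seconds per second, and the 500-second capacity (100 seconds per second over a five-second increment) is an assumption made for illustration. Unlike the real service, which measures execution time as call latency, the sketch assumes each query's cost is known when it is admitted.

```python
class LeakyBucket:
    """Toy leaky bucket for query execution time (illustration only).

    `rate` is how many seconds of execution time drain per second of clock
    time; `capacity` is the most execution time that can be outstanding.
    """

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.level = 0.0
        self.last = 0.0

    def _drain(self, now):
        elapsed = now - self.last
        self.level = max(0.0, self.level - elapsed * self.rate)
        self.last = now

    def try_consume(self, cost, now):
        """Admit a query costing `cost` seconds at time `now`, if it fits."""
        self._drain(now)
        if self.level + cost > self.capacity:
            return False  # over quota: the query would fail
        self.level += cost
        return True
```

A rejected query does not add to the bucket, so capacity returns as time passes rather than at a fixed one-minute boundary.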

More information on quotas can be found on the Quotas page. When an app tries to exceed these amounts, an insufficient quota error is returned.

Note that although these limits are enforced by the minute, the console displays the daily totals for each. Customers with Silver, Gold, or Platinum support can request higher throughput limits by contacting their support representative.

Search API pricing

The following charges are applied to usage beyond the free quotas:

Resource	Cost
Total storage (documents and indexes)	$0.18 per GB per month
Queries	$0.50 per 10K queries
Indexing searchable documents	$2.00 per GB

Additional information on pricing is on the Pricing page.
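A worked estimate helps connect the quota and pricing tables. The sketch below copies the free quotas and prices listed above and assumes charges scale linearly with usage beyond the free amount (including partial blocks of 10K queries); the function names are hypothetical, and the Pricing page should be treated as authoritative.

```python
# Free quotas and prices copied from the tables above; check the Pricing
# page for current values before relying on them.
FREE_STORAGE_GB = 0.25
FREE_QUERIES_PER_DAY = 1000
FREE_INDEXING_GB_PER_DAY = 0.01

STORAGE_USD_PER_GB_MONTH = 0.18
QUERY_USD_PER_10K = 0.50
INDEXING_USD_PER_GB = 2.00


def monthly_storage_cost(stored_gb):
    """Storage charge for one month beyond the free storage quota."""
    return max(0.0, stored_gb - FREE_STORAGE_GB) * STORAGE_USD_PER_GB_MONTH


def daily_query_cost(queries):
    """Query charge for one day beyond the free daily queries."""
    return max(0, queries - FREE_QUERIES_PER_DAY) / 10000 * QUERY_USD_PER_10K


def daily_indexing_cost(indexed_gb):
    """Indexing charge for one day beyond the free daily indexing quota."""
    return max(0.0, indexed_gb - FREE_INDEXING_GB_PER_DAY) * INDEXING_USD_PER_GB
```

For example, storing 1.25 GB costs (1.25 - 0.25) GB at $0.18, or $0.18 per month; 11,000 queries in a day leave 10,000 billable queries, or $0.50; and indexing 0.51 GB in a day leaves 0.50 GB billable, or $1.00.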

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.
