Create a media data store

This page explains how to create a data store for media and import data into it.

Before you begin

Make sure that you do the following:

Review the concepts related to media data and schema:
- About media documents and data stores
- Provide or auto-detect a schema
Decide whether you are using the predefined Google schema for your media data or your own schema.
If you're using your own schema, make sure your schema has fields that map well to the media properties for the custom schema: title, url, category, and so on.
Put your media documents into the JSON schema and upload the data to BigQuery or Cloud Storage.
Note: It's also possible to create a data store and upload the data directly from a local file. If you want to take that approach, see Import documents using the API. The disadvantage of this approach is that you can't edit the schema until all the data is uploaded. If you then change the schema, you have to wait until the data is reindexed before you can use the data store.
Review About media user events and prepare your user events for import. User events are required for all media apps.

Choose the procedure according to your data source

To create a media data store and import documents, go to the section for the source that you plan to use:

Import from BigQuery

Console

To use the Google Cloud console to create a media data store and import documents and user events from BigQuery, follow these steps:

In the Google Cloud console, go to the AI Applications page.
Go to the Data Stores page.
Click Create data store.
On the Source page, select BigQuery.
Select Media - BigQuery table with structured media data as the kind of data that you are importing.
In the BigQuery path field, click Browse, select the BigQuery data that you have prepared for ingesting, and then click Select. Alternatively, enter the location directly in the BigQuery path field.
If your data is in the predefined Google schema, choose Google predefined schema, click Continue, and skip to the last step, where you name the data store.
If your data is in your own schema, choose Custom schema and click Continue.
Review the detected schema and use the Key properties menu to assign properties to your schema fields.
Note: If fields are missing, click Add new fields and use those controls to add missing fields.
Click Continue.
You can't continue until the required key properties are mapped, indicated by green checkmarks instead of orange warning marks.
Enter a name for your data store and click Create.

Import from Cloud Storage

Console

To use the Google Cloud console to create a media data store and import documents from Cloud Storage, follow these steps:

In the Google Cloud console, go to the AI Applications page.
Go to the Data Stores page.
Click Create data store.
On the Source page, select Cloud Storage.
Select Structured media data (JSONL containing media files) as the kind of data that you are importing.
In the Select a folder or file you want to import section, select Folder or File.
Click Browse and choose the data that you have prepared for ingesting, and then click Select. Alternatively, enter the location directly in the gs:// field.
If your data is in the predefined Google schema, choose Google predefined schema, click Continue, and skip to the last step, where you name the data store.
If your data is in your own schema, choose Custom schema and click Continue.
Review the detected schema and use the Key properties menu to assign properties to your schema fields.
Note: If fields are missing, click Add new fields and use those controls to add missing fields.
Click Continue.
You can't continue until the required key properties are mapped, indicated by green checkmarks instead of orange warning marks.
Enter a name for your data store and click Create.

Import documents using the API

If you are using the Google predefined schema, you can import your documents by making a POST request to the Documents:import REST method, using the InlineSource object to specify your data.

For an example of the JSON document format, see JSON document format.

Import requirements

Here are the requirements for importing media documents using the API:

Each document must be on its own line.
The maximum number of documents in a single import is 100.
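The 100-document limit means a larger catalog has to be split across several import calls. The following is a minimal sketch of that splitting, assuming the documents are already loaded as a list of Python dictionaries. The helper name batch_documents and the sample IDs are illustrative and not part of the API.

```python
from typing import Iterator, List, Sequence

# Limit taken from the import requirements listed above.
MAX_DOCS_PER_IMPORT = 100


def batch_documents(
    documents: Sequence[dict], batch_size: int = MAX_DOCS_PER_IMPORT
) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most batch_size documents."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(documents), batch_size):
        yield list(documents[start:start + batch_size])


# Hypothetical catalog of 250 placeholder documents.
docs = [{"id": f"sample-{i:03d}"} for i in range(250)]
batch_sizes = [len(batch) for batch in batch_documents(docs)]
print(batch_sizes)  # [100, 100, 50]
```

Each yielded batch can then be sent as the documents list of one import request.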

Procedure

To import media documents using the API, do the following:

Create a data store.

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -H "X-Goog-User-Project: PROJECT_ID" \
  "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
  -d '{
    "displayName": "DATA_STORE_DISPLAY_NAME",
    "industryVertical": "MEDIA"
  }'

Replace the following:

PROJECT_ID: the ID of your Google Cloud project.
DATA_STORE_ID: the ID of the Vertex AI Search data store that you want to create. This ID can contain only lowercase letters, digits, underscores, and hyphens.
DATA_STORE_DISPLAY_NAME: the display name of the Vertex AI Search data store that you want to create.

Create the JSON file for your document and call it ./data.json:

{
  "inlineSource": {
    "documents": [
      { DOCUMENT_1 },
      { DOCUMENT_2 }
    ]
  }
}

Call the POST method:

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  --data @./data.json \
  "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/dataStores/DATA_STORE_ID/branches/0/documents:import"

Replace the following:

PROJECT_ID: the ID of your project.
DATA_STORE_ID: the ID of your data store.
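For readers who script the import instead of using curl, the procedure above can be sketched in Python with only the standard library. This is an illustrative sketch, not an official client. The function name build_import_request, the project and data store IDs, and the token string are placeholders. The ID check covers only the character set described above and does not check any length limit the service may enforce. The request is built but not sent.

```python
import json
import re
import urllib.request

API_ROOT = "https://discoveryengine.googleapis.com/v1"
# Character set from the DATA_STORE_ID description above (assumption: no
# further rules, such as length limits, are checked here).
DATA_STORE_ID_PATTERN = re.compile(r"[a-z0-9_-]+")


def build_import_request(project_id, data_store_id, documents, access_token):
    """Return an unsent Request that mirrors the documents:import curl call."""
    if not DATA_STORE_ID_PATTERN.fullmatch(data_store_id):
        raise ValueError(f"invalid data store ID: {data_store_id!r}")
    if len(documents) > 100:
        raise ValueError("a single import accepts at most 100 documents")
    url = (
        f"{API_ROOT}/projects/{project_id}/locations/global"
        f"/dataStores/{data_store_id}/branches/0/documents:import"
    )
    # Same body shape as ./data.json in the procedure above.
    body = json.dumps({"inlineSource": {"documents": documents}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
    )


# Placeholder values; a real token would come from
# `gcloud auth print-access-token`, as in the curl command.
request = build_import_request(
    "my-project", "my_media_store", [{"id": "sample-01"}], "PLACEHOLDER_TOKEN"
)
print(request.get_method(), request.full_url)
```

Passing the returned object to urllib.request.urlopen would send it. Keeping construction separate from sending makes the URL and body easy to inspect first.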

JSON document format

The following examples show Document entries in JSON format.

Provide each document in its entirety on a single line of its own.

Minimum required fields:

{"id":"sample-01","schemaId":"default_schema","jsonData":"{\"title\":\"Test document title\",\"categories\":[\"sports > clip\",\"sports > highlight\"],\"uri\":\"http://www.example.com\",\"media_type\":\"sports-game\",\"available_time\":\"2022-08-26T23:00:17Z\"}"}

Complete object:

{"id":"child-sample-0","schemaId":"default_schema","jsonData":"{\"title\":\"Test document title\",\"description\":\"Test document description\",\"language_code\":\"en-US\",\"categories\":[\"sports > clip\",\"sports > highlight\"],\"uri\":\"http://www.example.com\",\"images\":[{\"uri\":\"http://example.com/img1\",\"name\":\"image_1\"}],\"media_type\":\"sports-game\",\"in_languages\":[\"en-US\"],\"country_of_origin\":\"US\",\"content_index\":0,\"persons\":[{\"name\":\"sports person\",\"role\":\"player\",\"rank\":0,\"uri\":\"http://example.com/person\"}],\"organizations\":[{\"name\":\"sports team\",\"role\":\"team\",\"rank\":0,\"uri\":\"http://example.com/team\"}],\"hash_tags\":[\"tag1\"],\"filter_tags\":[\"filter_tag\"],\"production_year\":1900,\"duration\":\"100s\",\"content_rating\":[\"PG-13\"],\"aggregate_ratings\":[{\"rating_source\":\"imdb\",\"rating_score\":4.5,\"rating_count\":1250}],\"available_time\":\"2022-08-26T23:00:17Z\"}"}
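In both examples, jsonData is a JSON object serialized into a string, so it is encoded twice and easy to get wrong by hand. The sketch below builds one document line programmatically. The helper name to_document_line is hypothetical, and the REQUIRED_FIELDS tuple is inferred from the "Minimum required fields" example above, not from a schema reference.

```python
import json

# Inferred from the "Minimum required fields" example (assumption).
REQUIRED_FIELDS = ("title", "categories", "uri", "media_type", "available_time")


def to_document_line(doc_id, media_fields, schema_id="default_schema"):
    """Serialize one media document as a single JSON line."""
    missing = [name for name in REQUIRED_FIELDS if name not in media_fields]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    document = {
        "id": doc_id,
        "schemaId": schema_id,
        # First encoding: the media fields become a JSON string value.
        "jsonData": json.dumps(media_fields),
    }
    # Second encoding: the whole document, emitted without line breaks.
    return json.dumps(document)


line = to_document_line(
    "sample-01",
    {
        "title": "Test document title",
        "categories": ["sports > clip", "sports > highlight"],
        "uri": "http://www.example.com",
        "media_type": "sports-game",
        "available_time": "2022-08-26T23:00:17Z",
    },
)
print(line)
```

Letting json.dumps handle the escaping avoids hand-written backslashes and the trailing commas that make a nested payload invalid.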

Monitor import and view data

To check the status of your ingestion, go to the Data Stores page and click your data store name to see details about it on its Data page.
Click the Activity tab.
When the status column on the Activity tab changes from In progress to Import completed, the ingestion is complete.
Depending on the size of your data, ingestion can take several minutes or several hours.
Important: Wait until your document import is complete before importing user events to avoid unjoined user events.
Click Documents to view the data you imported.
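If you start the import through the API, the same wait can be scripted. The sketch below assumes the import call returns a long-running operation whose JSON representation gains "done": true when it finishes. That assumption is not stated on this page. The fetch_operation callable is a hypothetical stand-in for whatever code retrieves the current operation state, and the example feeds it canned responses, not real API calls.

```python
import time
from typing import Callable


def wait_until_done(
    fetch_operation: Callable[[], dict],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
) -> dict:
    """Poll fetch_operation until it reports done, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        operation = fetch_operation()
        if operation.get("done"):
            return operation
        if time.monotonic() >= deadline:
            raise TimeoutError("import did not finish before the timeout")
        time.sleep(poll_seconds)


# Canned responses standing in for three successive status checks.
responses = iter([{"name": "op"}, {"name": "op"}, {"name": "op", "done": True}])
result = wait_until_done(lambda: next(responses), poll_seconds=0)
print(result["done"])  # True
```

Because ingestion can take hours, a generous timeout and a polling interval of tens of seconds are reasonable starting points.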

Import user events

To import user events to your media data store:

Follow the instructions in Import historical user events.

What's next

Create a media recommendations app or a media search app.
Keep your document data fresh.
Ideally, you should update your data store daily, by importing fresh data. Scheduling periodic imports prevents model quality from degrading over time. You can use Google Cloud Scheduler to automate imports.
You can update only new or changed documents, or you can import the entire data store. If you import documents that are already in your data store, they are not added again. Any document that has changed is updated.
Keep your user-event data fresh.
It is particularly important that you keep your user events fresh. The recommendations app stops working if there aren't enough fresh user events to meet the data requirements.
For information about importing user event data in real time, see Record real-time user events.
For information about monitoring user-event requirements, see Check data quality for media recommendations.
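For the document refresh described above, one way to send only new or changed documents is to keep a fingerprint of each document from the previous import and compare on the next run. This is a sketch of one client-side approach and not something the service requires. The function names and the sample data are illustrative.

```python
import hashlib
import json


def fingerprint(document: dict) -> str:
    """Hash a canonical JSON encoding so key order does not matter."""
    canonical = json.dumps(document, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def select_changed(documents, previous_fingerprints):
    """Return documents that are new or differ from the last import."""
    changed = []
    for document in documents:
        if previous_fingerprints.get(document["id"]) != fingerprint(document):
            changed.append(document)
    return changed


# Fingerprints saved after a hypothetical previous import.
previous = {
    "a": fingerprint({"id": "a", "title": "Old"}),
    "b": fingerprint({"id": "b", "title": "Same"}),
}
current = [
    {"id": "a", "title": "New"},
    {"id": "b", "title": "Same"},
    {"id": "c", "title": "Added"},
]
changed_ids = [doc["id"] for doc in select_changed(current, previous)]
print(changed_ids)  # ['a', 'c']
```

Re-importing everything is also safe, as noted above. Filtering mainly saves request volume for large catalogs.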

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-19 UTC.
