Create a media data store
This page explains how to create a data store for media and import data into it.
Before you begin
Make sure that you do the following:
Review the concepts related to media data and schema:
Decide whether you are using the predefined Google schema for your media data or your own schema.
If you're using your own schema, make sure your schema has fields that map well to the media properties for the custom schema: title, url, category, and so on.
Put your media documents into the JSON schema and upload the data to BigQuery or Cloud Storage.
Note: It's also possible to create a data store and upload the data directly from a local file. If you want to take that approach, see Import documents using the API. The disadvantage of this approach is that you can't edit the schema until all the data is uploaded, and if you then make changes to the schema, you have to wait until it is reindexed before you can use the data store.
Review About media user events and prepare your user events for import. User events are required for all media apps.
Choose the procedure according to your data source
To create a media data store and import documents, go to the section for the source that you plan to use:
Import from BigQuery
Console
To use the Google Cloud console to create a media data store and import documents and user events from BigQuery, follow these steps:
In the Google Cloud console, go to the AI Applications page.
Go to the Data Stores page.
Click Create data store.
On the Source page, select BigQuery.
Select Media - BigQuery table with structured media data as the kind of data that you are importing.
In the BigQuery path field, click Browse, select the BigQuery data that you have prepared for ingesting, and then click Select. Alternatively, enter the location directly in the BigQuery path field.
If your data is in the predefined Google schema, choose Google predefined schema, click Continue, and skip to the final step, where you enter a name for your data store.
If your data is in your own schema, choose Custom schema and click Continue.
Review the detected schema and use the Key properties menu to assign properties to your schema fields.
Note: If fields are missing, click Add new fields and use those controls to add missing fields.
Click Continue.
You can't continue until the required key properties are mapped, indicated by green checkmarks instead of orange warning marks.
Enter a name for your data store and click Create.
Import from Cloud Storage
Console
To use the Google Cloud console to create a media data store and import documents from Cloud Storage, follow these steps:
In the Google Cloud console, go to the AI Applications page.
Go to the Data Stores page.
Click Create data store.
On the Source page, select Cloud Storage.
Select Structured media data (JSONL containing media files) as the kind of data that you are importing.
In the Select a folder or file you want to import section, select Folder or File.
Click Browse and choose the data that you have prepared for ingesting, and then click Select. Alternatively, enter the location directly in the gs:// field.
If your data is in the predefined Google schema, choose Google predefined schema, click Continue, and skip to the final step, where you enter a name for your data store.
If your data is in your own schema, choose Custom schema and click Continue.
Review the detected schema and use the Key properties menu to assign properties to your schema fields.
Note: If fields are missing, click Add new fields and use those controls to add missing fields.
Click Continue.
You can't continue until the required key properties are mapped, indicated by green checkmarks instead of orange warning marks.
Enter a name for your data store and click Create.
Import documents using the API
If you are using the Google predefined schema, you can import your documents by making a POST request to the Documents:import REST method, using the InlineSource object to specify your data.
For an example of the JSON document format, see JSON document format.
Import requirements
Here are the requirements for importing media documents using the API:
Each document must be on its own line.
The maximum number of documents in a single import is 100.
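The two requirements above can be sketched in Python as follows. This is a minimal illustration, not part of the API: the helper names `batch_documents` and `to_single_line` are hypothetical, and the only facts taken from this page are the 100-document limit and the one-document-per-line rule.

```python
import json
from typing import Iterator

# Limit stated in the import requirements on this page.
MAX_DOCS_PER_IMPORT = 100


def batch_documents(
    documents: list[dict], size: int = MAX_DOCS_PER_IMPORT
) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most `size` documents."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(documents), size):
        yield documents[start:start + size]


def to_single_line(document: dict) -> str:
    """Serialize one document as compact JSON on a single line.

    json.dumps without an indent never emits raw newlines; newlines
    inside string values are escaped.
    """
    return json.dumps(document, separators=(",", ":"))
```

Each batch produced this way can then be sent as one import request, so that no single request exceeds the limit.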
Procedure
To import media documents using the API, do the following:
Create a data store.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -H "X-Goog-User-Project: PROJECT_ID" \
  "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
  -d '{"displayName": "DATA_STORE_DISPLAY_NAME", "industryVertical": "MEDIA"}'
Replace the following:
PROJECT_ID: the ID of your Google Cloud project.
DATA_STORE_ID: the ID of the Vertex AI Search data store that you want to create. This ID can contain only lowercase letters, digits, underscores, and hyphens.
DATA_STORE_DISPLAY_NAME: the display name of the Vertex AI Search data store that you want to create.
Create the JSON file for your document and call it ./data.json:
{"inlineSource": {"documents": [{ DOCUMENT_1 }, { DOCUMENT_2 }]}}
Call the POST method:
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  --data @./data.json \
  "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/dataStores/DATA_STORE_ID/branches/0/documents:import"
Replace the following:
PROJECT_ID: the ID of your project.
DATA_STORE_ID: the ID of your data store.
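For readers who script the procedure rather than run curl by hand, the two requests could be assembled in Python roughly as follows. This is a sketch under stated assumptions: the function names are hypothetical, the access token is passed in by the caller (for example, the output of gcloud auth print-access-token), and the URLs, headers, and bodies simply mirror the curl commands in this procedure. Nothing is sent until the caller passes the request to `urllib.request.urlopen`.

```python
import json
import re
import urllib.request

API_ROOT = "https://discoveryengine.googleapis.com/v1"
# Per this page, a data store ID can contain only lowercase letters,
# digits, underscores, and hyphens.
DATA_STORE_ID_PATTERN = re.compile(r"[a-z0-9_-]+")


def build_create_request(project_id, data_store_id, display_name, token):
    """Build (but do not send) the request that creates a media data store."""
    if not DATA_STORE_ID_PATTERN.fullmatch(data_store_id):
        raise ValueError(f"invalid data store ID: {data_store_id!r}")
    url = (
        f"{API_ROOT}/projects/{project_id}/locations/global"
        f"/collections/default_collection/dataStores"
        f"?dataStoreId={data_store_id}"
    )
    body = {"displayName": display_name, "industryVertical": "MEDIA"}
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "X-Goog-User-Project": project_id,
    }
    return urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST"
    )


def build_import_request(project_id, data_store_id, documents, token):
    """Build (but do not send) the documents:import request with an inline source."""
    url = (
        f"{API_ROOT}/projects/{project_id}/locations/global"
        f"/dataStores/{data_store_id}/branches/0/documents:import"
    )
    body = {"inlineSource": {"documents": documents}}
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json; charset=utf-8",
    }
    return urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST"
    )
```

Separating request construction from sending makes it possible to inspect the URL and body before anything reaches the service.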
JSON document format
The following examples show Document entries in JSON format.
Provide an entire document on a single line. Each document should be on its own line.
Minimum required fields:
{"id":"sample-01","schemaId":"default_schema","jsonData":"{\"title\":\"Test document title\",\"categories\":[\"sports > clip\",\"sports > highlight\"],\"uri\":\"http://www.example.com\",\"media_type\":\"sports-game\",\"available_time\":\"2022-08-26T23:00:17Z\"}"}
Complete object:
{"id":"child-sample-0","schemaId":"default_schema","jsonData":"{\"title\":\"Test document title\",\"description\":\"Test document description\",\"language_code\":\"en-US\",\"categories\":[\"sports > clip\",\"sports > highlight\"],\"uri\":\"http://www.example.com\",\"images\":[{\"uri\":\"http://example.com/img1\",\"name\":\"image_1\"}],\"media_type\":\"sports-game\",\"in_languages\":[\"en-US\"],\"country_of_origin\":\"US\",\"content_index\":0,\"persons\":[{\"name\":\"sports person\",\"role\":\"player\",\"rank\":0,\"uri\":\"http://example.com/person\"}],\"organizations\":[{\"name\":\"sports team\",\"role\":\"team\",\"rank\":0,\"uri\":\"http://example.com/team\"}],\"hash_tags\":[\"tag1\"],\"filter_tags\":[\"filter_tag\"],\"production_year\":1900,\"duration\":\"100s\",\"content_rating\":[\"PG-13\"],\"aggregate_ratings\":[{\"rating_source\":\"imdb\",\"rating_score\":4.5,\"rating_count\":1250}],\"available_time\":\"2022-08-26T23:00:17Z\"}"}
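Because jsonData is itself a JSON object encoded as a string, escaping it by hand is error-prone. The following Python sketch lets json.dumps do the escaping. The helper `make_document` and the `REQUIRED_FIELDS` tuple are hypothetical; the field list is simply read off the "Minimum required fields" example above and is an assumption about what the service enforces.

```python
import json

# Field names taken from the "Minimum required fields" example on this page.
REQUIRED_FIELDS = ("title", "categories", "uri", "media_type", "available_time")


def make_document(doc_id: str, fields: dict, schema_id: str = "default_schema") -> dict:
    """Return a Document-shaped dict whose jsonData is a JSON-encoded string."""
    missing = [name for name in REQUIRED_FIELDS if name not in fields]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    return {"id": doc_id, "schemaId": schema_id, "jsonData": json.dumps(fields)}


document = make_document(
    "sample-01",
    {
        "title": "Test document title",
        "categories": ["sports > clip", "sports > highlight"],
        "uri": "http://www.example.com",
        "media_type": "sports-game",
        "available_time": "2022-08-26T23:00:17Z",
    },
)

# One document per line: json.dumps without an indent emits a single line.
line = json.dumps(document)
```

Serializing the inner object first and the outer document second yields the same doubly encoded shape as the examples above, without manually written backslashes.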
Monitor import and view data
To check the status of your ingestion, go to the Data Stores page and click your data store name to see details about it on its Data page.
Click the Activity tab.
When the status column on the Activity tab changes from In progress to Import completed, the ingestion is complete.
Depending on the size of your data, ingestion can take several minutes or several hours.
Important: Wait until your document import is complete before importing user events to avoid unjoined user events.
Click Documents to view the data you imported.
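To make the unjoined-event warning concrete, the following Python sketch checks a batch of user events against the set of document IDs already imported. It rests on two assumptions that are not stated on this page: that an unjoined event is one referencing a document ID that is absent from the data store, and that each event lists its documents under a `documents` key as objects with an `id` field. The function name `find_unjoined_events` is hypothetical.

```python
def find_unjoined_events(events: list[dict], imported_ids: set[str]) -> list[dict]:
    """Return the events that reference at least one document ID not yet imported.

    Assumes each event looks like {"documents": [{"id": "..."}], ...};
    this shape is an assumption, not taken from the page.
    """
    unjoined = []
    for event in events:
        referenced = {doc.get("id") for doc in event.get("documents", [])}
        # An event joins only if every referenced ID is already in the data store.
        if not referenced <= imported_ids:
            unjoined.append(event)
    return unjoined
```

Running a check like this before an event import gives an early signal that some documents are still missing from the data store.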
Import user events
To import user events to your media data store:
- Follow the instructions in Import historical user events.
What's next
Keep your document data fresh.
Ideally, you should update your data store daily by importing fresh data. Scheduling periodic imports prevents model quality from degrading over time. You can use Google Cloud Scheduler to automate imports.
You can update only new or changed documents, or you can import the entire data store. If you import documents that are already in your data store, they are not added again. Any document that has changed is updated.
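If you choose to send only new or changed documents, one possible approach is to keep a fingerprint of each document from the previous import and compare against it. The Python sketch below is an illustration of that idea only; the helpers `fingerprint` and `select_new_or_changed` are hypothetical, and storing the fingerprints between runs is left to the caller.

```python
import hashlib
import json


def fingerprint(document: dict) -> str:
    """Hash a document's canonical JSON form so that key order does not matter."""
    canonical = json.dumps(document, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def select_new_or_changed(documents: list[dict], previous: dict[str, str]) -> list[dict]:
    """Return documents whose ID is unknown or whose fingerprint differs.

    `previous` maps document ID to the fingerprint recorded at the last import.
    """
    return [doc for doc in documents if previous.get(doc["id"]) != fingerprint(doc)]
```

Because re-importing unchanged documents does not add them again, this filtering is an optimization rather than a correctness requirement.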
Keep your user-event data fresh.
It is particularly important that you keep your user events fresh. The recommendations app stops working if there aren't enough fresh user events to meet the data requirements.
For information about importing user event data in real time, see Record real-time user events.
For information about monitoring user-event requirements, see Check data quality for media recommendations.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2026-02-19 UTC.