Storage Insights datasets

Storage Insights datasets are available only if you've configured Storage Intelligence.

This document explains how Storage Insights datasets help you manage your Cloud Storage environment by providing visibility and insights into your data.

Storage Insights datasets create a queryable index of metadata and activity for your Cloud Storage buckets and objects across your organization, folders, projects, or specific buckets. To query the metadata and activity index, link the dataset to BigQuery. You can then use the linked BigQuery dataset to analyze, query, and visualize your data.

Storage Insights datasets are an exclusive feature of the Storage Intelligence subscription. Google Cloud offers a 30-day introductory trial for Storage Intelligence. You can enable the trial to gain insights into your Cloud Storage usage and take action. For more information about the trial, see 30-day introductory trial for Storage Intelligence.

Overview

A Storage Insights dataset provides a rolling snapshot of metadata, activity data, errors, and events for all projects, buckets, and objects within the defined scope. By continuously collecting and indexing information, the dataset creates a comprehensive view that helps you understand the state of your data, monitor your Cloud Storage resources, and gain insights to manage and optimize your storage estate.

The dataset is available as a BigQuery linked dataset, with a set of tables that have the following schemas:

- Metadata: a snapshot of metadata for projects, buckets, and objects. For details about the metadata schema, see Dataset schema of metadata.
- Activity data: mutation and error records for objects and aggregated activity insights for your buckets and projects. For details about the activity data schema, see Dataset schema of activity data.
- Errors and events: information about snapshot processing events and errors. For details about the errors and events schema, see Dataset schema of events and errors.

Use cases for Storage Insights datasets

Storage Insights datasets provide views for gaining organization-wide and granular insights about your data. The following sections describe use cases for datasets.

Understand your storage estate

You can gain insights into your data by viewing project, bucket, and object metadata. The metadata views help you with the following tasks:

- Spot anomalies, such as data in an unexpected region.
- Identify optimization opportunities, like locating temporary or duplicate files.
- Query for specific insights, such as objects created in the last 24 hours or the total count of PDF files.
- Drill down to objects you want to act on by extracting a prefix list of a set of objects based on query results. To learn about how to perform operations on billions of objects in a serverless manner, see storage batch operations.
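
The tasks above can be prototyped on a small in-memory sample before you write SQL against the linked dataset. The following Python sketch is illustrative only: the record fields (`bucket`, `name`, `content_type`, `created`) and the sample values are hypothetical stand-ins, not the actual columns of the object metadata schema. See Dataset schema of metadata for the real field names.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical sample rows. Field names are illustrative assumptions,
# not the real object metadata schema.
NOW = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
OBJECTS = [
    {"bucket": "reports", "name": "2024/q1/summary.pdf",
     "content_type": "application/pdf", "created": NOW - timedelta(hours=3)},
    {"bucket": "reports", "name": "2023/q4/summary.pdf",
     "content_type": "application/pdf", "created": NOW - timedelta(days=40)},
    {"bucket": "scratch", "name": "tmp/run-17/output.csv",
     "content_type": "text/csv", "created": NOW - timedelta(hours=30)},
    {"bucket": "scratch", "name": "tmp/run-18/output.csv",
     "content_type": "text/csv", "created": NOW - timedelta(hours=1)},
]


def created_within(objects, now, window):
    """Return objects whose creation time falls inside the trailing window."""
    cutoff = now - window
    return [o for o in objects if o["created"] >= cutoff]


def count_pdfs(objects):
    """Count objects whose content type marks them as PDF files."""
    return sum(1 for o in objects if o["content_type"] == "application/pdf")


def prefix_list(objects):
    """Collect the distinct (bucket, prefix) pairs of the given objects.

    The prefix is everything up to and including the last '/' in the name.
    """
    prefixes = set()
    for o in objects:
        head, sep, _ = o["name"].rpartition("/")
        prefixes.add((o["bucket"], head + sep))
    return sorted(prefixes)


recent = created_within(OBJECTS, NOW, timedelta(hours=24))
print(len(recent))           # 2
print(count_pdfs(OBJECTS))   # 2
print(prefix_list(recent))   # [('reports', '2024/q1/'), ('scratch', 'tmp/run-18/')]
```

A prefix list like the last result is the kind of input you could hand to a bulk operation once the query has narrowed down the objects of interest.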

Analyze activity patterns

Using the bucket activity view, project activity view, and object events view, you can do the following:

- Analyze operational patterns and identify inactive buckets.
- Monitor operations on your objects to see how your storage estate is changing over time.
- Map your most active projects, buckets, and prefixes.

Understand regional bucket activity

The bucket regional activity view displays fields like request and response bytes, which help you see the regions that frequently interact with your bucket. Analyze regional bucket activity to determine if bucket relocation is necessary:

- View the total egress and ingress for a bucket in a region to identify buckets that may be better suited for a regional, rather than multi-region, class.
- Assess total data traffic within and across all regions.
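
One way to reason about the relocation question is to compute each region's share of a bucket's total traffic. The following Python sketch assumes hypothetical rows with `bucket`, `region`, `request_bytes`, and `response_bytes` fields (the actual view columns may differ), treats request bytes as ingress and response bytes as egress, and uses an arbitrary 80% threshold chosen purely for illustration.

```python
from collections import defaultdict

# Hypothetical rows modeled loosely on the bucket regional activity view.
ROWS = [
    {"bucket": "media", "region": "us-east1",
     "request_bytes": 500, "response_bytes": 9000},
    {"bucket": "media", "region": "europe-west1",
     "request_bytes": 100, "response_bytes": 400},
    {"bucket": "logs", "region": "us-central1",
     "request_bytes": 3000, "response_bytes": 200},
    {"bucket": "logs", "region": "asia-south1",
     "request_bytes": 2500, "response_bytes": 300},
]


def traffic_by_region(rows):
    """Sum ingress plus egress bytes per bucket and region."""
    totals = defaultdict(lambda: defaultdict(int))
    for r in rows:
        totals[r["bucket"]][r["region"]] += r["request_bytes"] + r["response_bytes"]
    return totals


def dominant_region(rows, threshold=0.8):
    """Map each bucket to the region carrying at least `threshold` of its
    traffic, or to None when no single region reaches that share."""
    result = {}
    for bucket, regions in traffic_by_region(rows).items():
        total = sum(regions.values())
        region, top = max(regions.items(), key=lambda kv: kv[1])
        result[bucket] = region if total and top / total >= threshold else None
    return result


print(dominant_region(ROWS))  # {'media': 'us-east1', 'logs': None}
```

In this sample, `media` is served almost entirely from one region and might be a relocation candidate, while `logs` has traffic split across regions.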

Speed up troubleshooting

By analyzing error information in the object events view, you can inspect operations on your objects that led to errors, analyze the reason for each error, and accelerate troubleshooting. You can also detect the projects and buckets with the greatest number of errors to determine success and error rates. For example, you can troubleshoot 429 errors by identifying the affected bucket, project, and the root cause, such as resource quota or bandwidth limits.
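
The error-rate analysis described above can be sketched in a few lines of Python. The event rows and the `bucket` and `status` fields below are hypothetical; the object events view has its own schema, described in Dataset schema of activity data.

```python
from collections import Counter

# Hypothetical event rows. "status" stands in for whichever HTTP status
# field the object events view exposes.
EVENTS = [
    {"bucket": "uploads", "status": 200},
    {"bucket": "uploads", "status": 429},
    {"bucket": "uploads", "status": 429},
    {"bucket": "uploads", "status": 200},
    {"bucket": "archive", "status": 200},
    {"bucket": "archive", "status": 404},
    {"bucket": "archive", "status": 200},
    {"bucket": "archive", "status": 200},
]


def error_rates(events):
    """Return {bucket: fraction of events with a 4xx or 5xx status}."""
    totals, errors = Counter(), Counter()
    for e in events:
        totals[e["bucket"]] += 1
        if e["status"] >= 400:
            errors[e["bucket"]] += 1
    return {bucket: errors[bucket] / totals[bucket] for bucket in totals}


def buckets_with_status(events, status):
    """Count events with the given status per bucket, most affected first."""
    counts = Counter(e["bucket"] for e in events if e["status"] == status)
    return counts.most_common()


print(error_rates(EVENTS))               # {'uploads': 0.5, 'archive': 0.25}
print(buckets_with_status(EVENTS, 429))  # [('uploads', 2)]
```

Ranking buckets by a single status code, as in the second call, narrows a 429 investigation to the buckets that are actually being throttled.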

Benefits of Storage Insights datasets

Storage Insights datasets provide metadata and activity information about your storage estate in a queryable format in BigQuery. The following are the benefits of using Storage Insights datasets:

- Analyze your storage estate within a customizable scope to gain organization-wide insights, or specify folders, projects, or buckets for analysis.
- With data available in BigQuery, use SQL and natural language queries with Gemini to analyze your data. For details, see Analyze data with Gemini assistance.
- Visualize your data by connecting to a Looker dashboard. You can use the Storage Intelligence dashboard as a template that provides an example of the insights you can derive from datasets. You can use the template to connect to your datasets or add custom charts. For information about how to use the template, see Storage Intelligence dashboard connection instructions.

How Storage Insights datasets work

To use Storage Insights datasets, first configure a dataset within a project. Specify the organization, folders, or projects for which you want to track data. After creation, grant the necessary permissions to the service agent to generate the dataset. You can then link the dataset to BigQuery for querying. Once configured, the service automatically collects and ingests daily snapshots of object metadata, bucket metadata, operations, and errors into a Cloud Storage-owned BigQuery instance. The data is retained according to the configured retention period and stored in an optimized way to minimize storage and analysis costs.

In the dataset configuration, you define which data is collected, where it is stored, and how it is managed.

The following table describes the key properties you must define when configuring a dataset:

Property	Description	Details and limits
Dataset scope	Specifies the resources (organizations, projects, or folders) that contain the buckets and objects you want to include in the dataset.	You can specify projects or folders individually or using a CSV file. Each configuration allows only one dataset scope. You can specify up to `10,000` projects or folders.
Bucket filters	Filters used to include or exclude specific buckets from the dataset.	You can filter by bucket name using regular expressions or filter by bucket location.
Retention period for dataset	The number of days the dataset captures and retains metadata and activity data, including the dataset's creation date. For activity data tables, you can override the data retention period by using the Retention period for activity data property.	This retention period is a rolling window and can be up to `90` days. Datasets update with new metadata every `24` hours. The system automatically deletes data captured outside the retention window. For example, suppose you create a dataset on October 1, 2023, with a retention window of `30` days. On October 30, the dataset reflects the past `30` days of data (October 1 to October 30). On October 31, the dataset reflects the data from October 2 to October 31. You can modify the retention window at any time. By default, the retention period applies to the metadata tables and also to the activity data tables when the retention period for activity data is not specified.
Retention period for activity data	The number of days the dataset captures and retains activity data. When defined, this value overrides the Retention period for dataset.	The retention period can be up to `365 days`. The retention period for activity data applies only to the activity data tables.
Location	The BigQuery location used to store the dataset and its associated data.	Must be a location supported by BigQuery, such as `us-central1`. If you have existing BigQuery tables, we recommend selecting their location.
Service agent type	Determines the scope of the service agent that reads and writes data for the dataset configuration. This can be either a configuration-scoped service agent or a project-scoped service agent.	Project-scoped service agents can access and write datasets for all dataset configurations in the project. For example, if you have multiple dataset configurations within a project, you only need to grant the required permissions to the project-scoped service agent once. This enables it to read and write datasets for all dataset configurations within the project. When a dataset configuration is deleted, the project-scoped service agent is not deleted. Configuration-scoped service agents can only access and write the dataset generated by the particular dataset configuration. This means if you have multiple dataset configurations, you must grant the required permissions to each configuration-scoped service agent. When a dataset configuration is deleted, the configuration-scoped service agent is deleted.
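
The rolling retention window in the table can be checked with simple date arithmetic. The following Python sketch reproduces the October 2023 example and the override rule for activity data. The function names are hypothetical, and the lower bound of one day is an assumption of this sketch, not a documented limit.

```python
from datetime import date, timedelta

MAX_DATASET_RETENTION_DAYS = 90    # upper limit for the dataset retention period
MAX_ACTIVITY_RETENTION_DAYS = 365  # upper limit for the activity data retention period


def retention_window(created, today, retention_days):
    """Return the inclusive (first, last) dates the dataset covers on `today`.

    The window counts `today` as one of its days and never starts before
    the dataset's creation date.
    """
    if today < created:
        raise ValueError("today is before the dataset was created")
    first = max(created, today - timedelta(days=retention_days - 1))
    return first, today


def effective_activity_retention(dataset_days, activity_days=None):
    """Return the retention that applies to the activity data tables.

    The activity-specific period overrides the dataset period when set.
    The minimum of 1 day is an assumption made for this sketch.
    """
    if not 1 <= dataset_days <= MAX_DATASET_RETENTION_DAYS:
        raise ValueError("dataset retention must be between 1 and 90 days")
    if activity_days is None:
        return dataset_days
    if not 1 <= activity_days <= MAX_ACTIVITY_RETENTION_DAYS:
        raise ValueError("activity retention must be between 1 and 365 days")
    return activity_days


created = date(2023, 10, 1)
print(retention_window(created, date(2023, 10, 30), 30))
# (datetime.date(2023, 10, 1), datetime.date(2023, 10, 30))
print(retention_window(created, date(2023, 10, 31), 30))
# (datetime.date(2023, 10, 2), datetime.date(2023, 10, 31))
print(effective_activity_retention(30))       # 30
print(effective_activity_retention(30, 365))  # 365
```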

After specifying the configuration properties and granting the necessary permissions to the service agent, link the dataset to BigQuery for querying.

For details about the properties you set when creating or updating a dataset configuration, see the DatasetConfigs resource in the JSON API documentation.

After configuration, the service automatically collects and ingests data into a Cloud Storage-owned BigQuery instance. The timeline for data population in the datasets is as follows:

- The initial dataset load and activity data for newly added buckets or objects might take 24–48 hours to appear as a linked dataset in BigQuery.
- Activity data is typically included within four hours of the activity (latency might occasionally be higher).
- Metadata snapshots (for projects, buckets, and objects) are updated every 24 hours.

Considerations

Consider the following for dataset configurations:

- When you rename a folder in a bucket with hierarchical namespace enabled, the object names in that bucket update. When the linked dataset ingests these object snapshots, they are considered new entries.
- CRC32C checksums and MD5 hashes are not available in the object metadata table for objects encrypted with customer-managed encryption keys (CMEK).
- Datasets are supported only in the following BigQuery locations:
  - EU
  - US
  - asia-south1
  - asia-south2
  - asia-southeast1
  - europe-west1
  - us-central1
  - us-east1
  - us-east4
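
Before you create a dataset configuration, it can be useful to validate the chosen location against the list above. The following Python sketch shows one way to do that. The helper name is hypothetical, and the case-insensitive comparison is a convenience of this sketch rather than documented behavior.

```python
# Locations listed as supported for datasets in this document.
SUPPORTED_LOCATIONS = frozenset({
    "EU", "US", "asia-south1", "asia-south2", "asia-southeast1",
    "europe-west1", "us-central1", "us-east1", "us-east4",
})


def is_supported_location(location):
    """Return True if `location` matches a supported location, ignoring
    case and surrounding whitespace."""
    wanted = location.strip().lower()
    return any(wanted == known.lower() for known in SUPPORTED_LOCATIONS)


print(is_supported_location("us-central1"))   # True
print(is_supported_location("eu"))            # True
print(is_supported_location("europe-west4"))  # False
```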

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-19 UTC.
