Manage Storage Insights dataset configurations

This page shows you how to manage Storage Insights dataset configurations to control the source, scope, and retention of your data. You'll learn how to view, list, update, and delete configurations, as well as how to view, query, and unlink your linked datasets.

Get the required roles

To get the permissions that you need to manage dataset configurations, ask your administrator to grant you the following IAM roles on your source projects:

For more information about granting roles, see Manage access to projects, folders, and organizations.

These predefined roles contain the permissions required to manage dataset configurations. To see the exact permissions that are required, see the following Required permissions section:

Required permissions

The following permissions are required to manage dataset configurations:

  • View and list dataset configurations:
    • storageinsights.datasetConfigs.get
    • storageinsights.datasetConfigs.list
    • storage.buckets.getObjectInsights
  • Update and delete dataset configurations:
    • storageinsights.datasetConfigs.update
    • storageinsights.datasetConfigs.delete
    • storage.buckets.getObjectInsights
  • Unlink a BigQuery dataset: storageinsights.datasetConfigs.unlinkDataset
  • Query BigQuery linked datasets: bigquery.jobs.create or bigquery.jobs.*

You might also be able to get these permissions with custom roles or other predefined roles.
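If you audit access with a script, you can encode the permission lists above as data. The following Python sketch is a hypothetical helper and is not part of any Google Cloud library. The operation keys (`view_or_list`, `update_or_delete`, `unlink`) and the function names are illustrative. The sketch assumes you already have the set of permissions a principal holds, for example from your own IAM review.

```python
# Hypothetical helper: permission sets copied from the "Required permissions" list.
REQUIRED_PERMISSIONS: dict[str, set[str]] = {
    "view_or_list": {
        "storageinsights.datasetConfigs.get",
        "storageinsights.datasetConfigs.list",
        "storage.buckets.getObjectInsights",
    },
    "update_or_delete": {
        "storageinsights.datasetConfigs.update",
        "storageinsights.datasetConfigs.delete",
        "storage.buckets.getObjectInsights",
    },
    "unlink": {"storageinsights.datasetConfigs.unlinkDataset"},
}


def missing_permissions(operation: str, granted: set[str]) -> set[str]:
    """Return the required permissions for `operation` that are not in `granted`."""
    return REQUIRED_PERMISSIONS[operation] - granted


def can_query_linked_datasets(granted: set[str]) -> bool:
    """Querying needs bigquery.jobs.create, or bigquery.jobs.* as listed above."""
    return "bigquery.jobs.create" in granted or "bigquery.jobs.*" in granted
```

For example, `missing_permissions("view_or_list", {"storageinsights.datasetConfigs.get", "storageinsights.datasetConfigs.list"})` returns `{"storage.buckets.getObjectInsights"}`.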

View and query linked datasets

To view and query linked datasets, follow these steps:

  1. In the Google Cloud console, go to the Cloud Storage Storage Insights page.

    Go to Storage Insights

    The page lists the dataset configurations in your project.

  2. Click the BigQuery linked dataset for the dataset configuration you want to view.

    The Google Cloud console displays the BigQuery linked dataset. For information about the dataset schema of metadata, see Dataset schema of metadata.

  3. Query the tables and views in your linked dataset the same way you would query any other BigQuery table.
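The following Python sketch illustrates step 3. It builds a GoogleSQL query against a view in a linked dataset and runs it with the `bq` command-line tool. The project, dataset, and view names are placeholders, so check Dataset schema of metadata for the views your linked dataset actually contains. The sketch assumes that the `bq` tool (installed with the gcloud CLI) is on your `PATH` and that you hold `bigquery.jobs.create`.

```python
import subprocess


def build_query(project_id: str, dataset: str, view: str, limit: int = 10) -> str:
    """Return a GoogleSQL query that reads a few rows from a linked-dataset view."""
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    # Backticks quote the fully qualified project.dataset.view identifier,
    # which is required when the project ID contains hyphens.
    return f"SELECT * FROM `{project_id}.{dataset}.{view}` LIMIT {limit}"


def bq_query_argv(query: str) -> list[str]:
    """Argument list for running `query` with the bq CLI in GoogleSQL mode."""
    return ["bq", "query", "--use_legacy_sql=false", query]


def run_query(query: str) -> str:
    """Run the query and return bq's text output (needs bq and credentials)."""
    result = subprocess.run(bq_query_argv(query), capture_output=True, text=True, check=True)
    return result.stdout
```

A call such as `run_query(build_query("my-project", "my_linked_dataset", "my_view", limit=5))` uses hypothetical names. The sketch passes the query as a single argument, so it needs no shell quoting.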

Unlink a dataset

To stop the dataset configuration from publishing to the BigQuery dataset, unlink the dataset. To unlink a dataset, complete the following steps:

Console

  1. In the Google Cloud console, go to the Cloud Storage Storage Insights page.

    Go to Storage Insights

  2. Click the name of the dataset configuration that generated the dataset you want to unlink.

  3. In the BigQuery linked dataset section, click Unlink dataset.

Command line

  1. To unlink the dataset, run the gcloud storage insights dataset-configs delete-link command:

    gcloud storage insights dataset-configs delete-link DATASET_CONFIG_ID --location=LOCATION

    Replace:

    • DATASET_CONFIG_ID with the name of the dataset configuration that generated the dataset you want to unlink.

    • LOCATION with the location of your dataset and dataset configuration. For example, us-central1.

    You can also specify a full dataset configuration path. For example:

    gcloud storage insights dataset-configs delete-link projects/DESTINATION_PROJECT_ID/locations/LOCATION/datasetConfigs/DATASET_CONFIG_ID

    Replace:

    • DESTINATION_PROJECT_ID with the ID of the project that contains the dataset configuration. For more information about project IDs, see Creating and managing projects.

    • DATASET_CONFIG_ID with the name of the dataset configuration that generated the dataset you want to unlink.

    • LOCATION with the location of your dataset and dataset configuration. For example, us-central1.
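When you script the unlink step, building the full dataset configuration path in one place helps you avoid mismatched project or location values. The following Python sketch is illustrative only. The helper names are hypothetical, and the location check is a loose assumption (lowercase letters, digits, and hyphens), not an official rule. The sketch also assumes that the gcloud CLI is installed and authenticated.

```python
import re
import subprocess

# Loose check for location IDs such as "us-central1" (an assumption, not an official rule).
_LOCATION_RE = re.compile(r"[a-z0-9-]+")


def dataset_config_path(project_id: str, location: str, config_id: str) -> str:
    """Build projects/PROJECT_ID/locations/LOCATION/datasetConfigs/DATASET_CONFIG_ID."""
    if not _LOCATION_RE.fullmatch(location):
        raise ValueError(f"unexpected location: {location!r}")
    return f"projects/{project_id}/locations/{location}/datasetConfigs/{config_id}"


def delete_link_argv(config_path: str) -> list[str]:
    """Argument list for gcloud ... delete-link with a full configuration path."""
    # The full path already carries the location, so --location is not added.
    return ["gcloud", "storage", "insights", "dataset-configs", "delete-link", config_path]


def unlink_dataset(project_id: str, location: str, config_id: str) -> None:
    """Run the unlink command; raises CalledProcessError if gcloud fails."""
    path = dataset_config_path(project_id, location, config_id)
    subprocess.run(delete_link_argv(path), check=True)
```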

JSON API

  1. Have the gcloud CLI installed and initialized, which lets you generate an access token for the Authorization header.

  2. Create a JSON file that contains the following information:

    {"name":"DATASET_NAME"}

    Replace:

    DATASET_NAME with the name of the dataset you want to unlink. For example, my_project.my_dataset276daa7e_2991_4f4f_b9d4_e354b48426a2.

  3. Use cURL to call the JSON API with an unlinkDataset DatasetConfig request:

    curl --request POST --data-binary @JSON_FILE_NAME \
      "https://storageinsights.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasetConfigs/DATASET_CONFIG_ID:unlinkDataset" \
      --header "Authorization: Bearer $(gcloud auth print-access-token --impersonate-service-account=SERVICE_ACCOUNT)" \
      --header "Accept: application/json" \
      --header "Content-Type: application/json"

    Replace:

    • JSON_FILE_NAME with the path to the JSON file you created in the previous step.

    • PROJECT_ID with the ID of the project that the dataset configuration belongs to.

    • LOCATION with the location of the dataset and dataset configuration. For example, us-central1.

    • DATASET_CONFIG_ID with the name of the dataset configuration that generated the dataset you want to unlink.

    • SERVICE_ACCOUNT with the email address of the service account to impersonate. For example, test-service-account@test-project.iam.gserviceaccount.com.
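You can also make the unlinkDataset call from Python using only the standard library. This sketch mirrors the cURL request above. It gets a token with `gcloud auth print-access-token --impersonate-service-account`, then sends a POST with the `{"name": ...}` body. The function names are hypothetical. The response is returned as parsed JSON without any assumption about its shape.

```python
import json
import subprocess
import urllib.request

API_ROOT = "https://storageinsights.googleapis.com/v1"


def access_token(service_account: str) -> str:
    """Get a token from the gcloud CLI while impersonating `service_account`."""
    result = subprocess.run(
        ["gcloud", "auth", "print-access-token",
         f"--impersonate-service-account={service_account}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def build_unlink_request(project_id: str, location: str, config_id: str,
                         dataset_name: str, token: str) -> urllib.request.Request:
    """Build the POST ...:unlinkDataset request with a {"name": ...} body."""
    url = (f"{API_ROOT}/projects/{project_id}/locations/{location}"
           f"/datasetConfigs/{config_id}:unlinkDataset")
    body = json.dumps({"name": dataset_name}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and parse any JSON response body (empty body gives {})."""
    with urllib.request.urlopen(request) as response:
        text = response.read().decode("utf-8")
    return json.loads(text) if text.strip() else {}
```

Typical use is `send(build_unlink_request(PROJECT_ID, LOCATION, DATASET_CONFIG_ID, DATASET_NAME, access_token(SERVICE_ACCOUNT)))`. On a non-2xx status, `urlopen` raises `urllib.error.HTTPError`.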

View a dataset configuration

To view a dataset configuration, complete the following steps:

Console

  1. In the Google Cloud console, go to the Cloud Storage Storage Insights page.

    Go to Storage Insights

  2. Click the name of the dataset configuration you want to view.

    The dataset configuration details are displayed.

Command line

  1. To describe a dataset configuration, run the gcloud storage insights dataset-configs describe command:

    gcloud storage insights dataset-configs describe DATASET_CONFIG_ID \
      --location=LOCATION

    Replace:

    • DATASET_CONFIG_ID with the name of the dataset configuration.

    • LOCATION with the location of the dataset and dataset configuration. For example, us-central1.

    You can also specify a full dataset configuration path. For example:

    gcloud storage insights dataset-configs describe projects/DESTINATION_PROJECT_ID/locations/LOCATION/datasetConfigs/DATASET_CONFIG_ID

    Replace:

    • DESTINATION_PROJECT_ID with the ID of the project that contains the dataset configuration. For more information about project IDs, see Creating and managing projects.

    • DATASET_CONFIG_ID with the name of the dataset configuration you want to view.

    • LOCATION with the location of your dataset and dataset configuration. For example, us-central1.
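To read specific fields from a dataset configuration in a script, you can request JSON output with the gcloud global `--format=json` flag. In the following Python sketch, the field names `retentionPeriodDays` and `link.dataset` are assumptions about the JSON representation of the DatasetConfig resource. Verify them against the output of your own describe call. The helper names are hypothetical.

```python
import json
import subprocess
from typing import Any, Optional


def describe_argv(config_id: str, location: str) -> list[str]:
    """gcloud describe command with JSON output (--format is a gcloud global flag)."""
    return ["gcloud", "storage", "insights", "dataset-configs", "describe",
            config_id, f"--location={location}", "--format=json"]


def summarize(config: dict[str, Any]) -> dict[str, Optional[Any]]:
    """Pick a few fields; the key names are assumptions about the resource JSON."""
    link = config.get("link") or {}
    return {
        "name": config.get("name"),
        "retentionPeriodDays": config.get("retentionPeriodDays"),
        "linkedDataset": link.get("dataset"),
    }


def describe(config_id: str, location: str) -> dict[str, Optional[Any]]:
    """Run gcloud and summarize its JSON output (requires an authenticated gcloud)."""
    result = subprocess.run(describe_argv(config_id, location),
                            capture_output=True, text=True, check=True)
    return summarize(json.loads(result.stdout))
```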

JSON API

  1. Have the gcloud CLI installed and initialized, which lets you generate an access token for the Authorization header.

  2. Use cURL to call the JSON API with a Get DatasetConfig request:

    curl -X GET \
      "https://storageinsights.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasetConfigs/DATASET_CONFIG_ID" \
      --header "Authorization: Bearer $(gcloud auth print-access-token --impersonate-service-account=SERVICE_ACCOUNT)" \
      --header "Accept: application/json" \
      --header "Content-Type: application/json"

    Replace:

    • PROJECT_ID with the ID of the project that the dataset configuration belongs to.

    • LOCATION with the location of the dataset and dataset configuration. For example, us-central1.

    • DATASET_CONFIG_ID with the name of the dataset configuration.

    • SERVICE_ACCOUNT with the email address of the service account to impersonate. For example, test-service-account@test-project.iam.gserviceaccount.com.
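The following Python sketch is an equivalent of the Get request that uses only the standard library. It assumes you pass in an access token, for example the output of `gcloud auth print-access-token --impersonate-service-account=SERVICE_ACCOUNT`. It treats an HTTP 404 as "configuration not found" rather than as an error. The function names are hypothetical.

```python
import json
import urllib.error
import urllib.request
from typing import Optional

API_ROOT = "https://storageinsights.googleapis.com/v1"


def build_get_request(project_id: str, location: str, config_id: str,
                      token: str) -> urllib.request.Request:
    """Build the GET request for a single dataset configuration."""
    url = f"{API_ROOT}/projects/{project_id}/locations/{location}/datasetConfigs/{config_id}"
    return urllib.request.Request(
        url, method="GET",
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )


def get_dataset_config(project_id: str, location: str, config_id: str,
                       token: str) -> Optional[dict]:
    """Return the configuration as a dict, or None if the API answers 404."""
    request = build_get_request(project_id, location, config_id, token)
    try:
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise  # Other errors, such as 403 for missing permissions, propagate.
```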

List dataset configurations

To list the dataset configurations in a project, complete the following steps:

Console

  1. In the Google Cloud console, go to the Cloud Storage Storage Insights page.

    Go to Storage Insights

    The list of dataset configurations is displayed.

Command line

  1. To list dataset configurations in a project, run the gcloud storage insights dataset-configs list command:

    gcloud storage insights dataset-configs list --location=LOCATION

    Replace:

    • LOCATION with the location of the dataset and dataset configuration. For example, us-central1.

    You can use the following optional flags to control the listing call:

    • Use --page-size to specify the maximum number of results to return per page.

    • Use --filter=FILTER to filter results. For more information about the --filter flag, run gcloud topic filters.

    • Use --sort-by=SORT_BY_VALUE to specify a comma-separated list of resource field key names to sort by. For example, --sort-by=DATASET_CONFIG_ID.
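If you call the list command from a script, passing the optional flags as a list of arguments avoids shell-quoting problems with filter expressions. This Python sketch only assembles the argument list. The sort keys and filter expression in the test values (`name`, `createTime`, `description:prod`) are hypothetical, so check the fields in your own list output before you rely on them.

```python
from typing import Optional, Sequence


def list_argv(location: str, page_size: Optional[int] = None,
              filter_expr: Optional[str] = None,
              sort_by: Optional[Sequence[str]] = None) -> list[str]:
    """Build the argument list for gcloud ... dataset-configs list."""
    argv = ["gcloud", "storage", "insights", "dataset-configs", "list",
            f"--location={location}"]
    if page_size is not None:
        if page_size <= 0:
            raise ValueError("page_size must be positive")
        argv.append(f"--page-size={page_size}")
    if filter_expr:
        # Passed as one argument, so spaces and quotes need no shell escaping.
        argv.append(f"--filter={filter_expr}")
    if sort_by:
        # --sort-by takes a comma-separated list of resource field keys.
        argv.append("--sort-by=" + ",".join(sort_by))
    return argv
```

To run the command, pass the result to `subprocess.run(..., check=True)`.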

JSON API

  1. Have the gcloud CLI installed and initialized, which lets you generate an access token for the Authorization header.

  2. Use cURL to call the JSON API with a List DatasetConfigs request:

    curl -X GET \
      "https://storageinsights.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasetConfigs" \
      --header "Authorization: Bearer $(gcloud auth print-access-token --impersonate-service-account=SERVICE_ACCOUNT)" \
      --header "Accept: application/json" \
      --header "Content-Type: application/json"

    Replace:

    • PROJECT_ID with the ID of the project that the dataset configurations belong to.

    • LOCATION with the location of the dataset and dataset configuration. For example, us-central1.

    • SERVICE_ACCOUNT with the email address of the service account to impersonate. For example, test-service-account@test-project.iam.gserviceaccount.com.
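A list call can return results across several pages. The following Python sketch follows page tokens until none remain. It assumes the response uses the standard Google API list conventions: a `datasetConfigs` array, a `nextPageToken` field, and `pageSize` and `pageToken` query parameters. Confirm these names against the List DatasetConfigs reference. The HTTP call is injected as a `fetch` function so that you can test the paging logic without network access. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator, Optional

API_ROOT = "https://storageinsights.googleapis.com/v1"


def list_url(project_id: str, location: str, page_token: Optional[str] = None,
             page_size: Optional[int] = None) -> str:
    """URL for one page of results; pageSize and pageToken are assumed names."""
    base = f"{API_ROOT}/projects/{project_id}/locations/{location}/datasetConfigs"
    params = {}
    if page_size:
        params["pageSize"] = str(page_size)
    if page_token:
        params["pageToken"] = page_token
    return f"{base}?{urllib.parse.urlencode(params)}" if params else base


def http_get_json(url: str, token: str) -> dict:
    """Fetch one page with a bearer token and parse the JSON body."""
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}", "Accept": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_dataset_configs(project_id: str, location: str,
                         fetch: Callable[[str], dict],
                         page_size: Optional[int] = None) -> Iterator[dict]:
    """Yield every configuration, following nextPageToken until it is absent."""
    page_token: Optional[str] = None
    while True:
        page = fetch(list_url(project_id, location, page_token, page_size))
        yield from page.get("datasetConfigs", [])
        page_token = page.get("nextPageToken")
        if not page_token:
            return
```

Against the live API, pass `fetch=lambda url: http_get_json(url, token)`. Here, `token` is an access token you obtained yourself.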

Update a dataset configuration

To update a dataset configuration, complete the following steps:

Console

  1. In the Google Cloud console, go to the Cloud Storage Storage Insights page.

    Go to Storage Insights

  2. Click the name of the dataset configuration you want to update.

  3. In the Dataset configuration tab, click Edit to update the fields.

Command line

  1. To update a dataset configuration, run the gcloud storage insights dataset-configs update command:

    gcloud storage insights dataset-configs update DATASET_CONFIG_ID \
      --location=LOCATION

    Replace:

    • DATASET_CONFIG_ID with the name of the dataset configuration.

    • LOCATION with the location of the dataset and dataset configuration.

    Use the following flags to update properties of the dataset configuration:

    • Use --skip-verification to skip checks and failures from the verification process, which includes checks for required IAM permissions. If used, some or all buckets might be excluded from the dataset.

    • Use --retention-period-days=DAYS to specify the moving number of days of data to capture in the dataset snapshot. For example, 90.

    • Use --activity-data-retention-period-days=ACTIVITY_RETENTION_PERIOD_DAYS to specify the retention period for the activity data in the dataset. By default, activity data is included in the dataset and inherits the retention period of the dataset. To override the dataset retention period, specify the number of days to retain activity data for. To exclude activity data, set ACTIVITY_RETENTION_PERIOD_DAYS to 0.

    • Use --description=DESCRIPTION to write a description for the dataset configuration.

    • Use --organization=ORGANIZATION_ID to specify the organization ID of the source project. If unspecified, defaults to the source project's organization ID.
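You can also assemble the update flags in code. The following Python sketch builds the argument list for `gcloud storage insights dataset-configs update` and applies only basic sanity checks: a positive dataset retention period, and a non-negative activity retention period, where `0` excludes activity data. The service enforces its own limits, and this sketch does not encode them. The helper name is hypothetical.

```python
from typing import Optional


def update_argv(config_id: str, location: str,
                retention_days: Optional[int] = None,
                activity_retention_days: Optional[int] = None,
                description: Optional[str] = None,
                skip_verification: bool = False) -> list[str]:
    """Build gcloud ... dataset-configs update with only the flags you set."""
    argv = ["gcloud", "storage", "insights", "dataset-configs", "update",
            config_id, f"--location={location}"]
    if retention_days is not None:
        if retention_days <= 0:
            raise ValueError("retention_days must be positive")
        argv.append(f"--retention-period-days={retention_days}")
    if activity_retention_days is not None:
        if activity_retention_days < 0:
            raise ValueError("activity_retention_days must be 0 or more")
        # 0 excludes activity data from the dataset.
        argv.append(f"--activity-data-retention-period-days={activity_retention_days}")
    if description is not None:
        # One argument per flag, so a description with spaces needs no quoting.
        argv.append(f"--description={description}")
    if skip_verification:
        # Some or all buckets might be excluded from the dataset when this is set.
        argv.append("--skip-verification")
    return argv
```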

JSON API

  1. Have the gcloud CLI installed and initialized, which lets you generate an access token for the Authorization header.

  2. Create a JSON file that contains the following optional information. Include only the fields you want to update:

    {
      "organization_number": "ORGANIZATION_ID",
      "source_projects": {
        "project_numbers": "PROJECT_NUMBERS"
      },
      "retention_period_days": "RETENTION_PERIOD",
      "activityDataRetentionPeriodDays": "ACTIVITY_DATA_RETENTION_PERIOD_DAYS"
    }

    Replace:

    • ORGANIZATION_ID with the resource ID of the organization that the source projects belong to. If unspecified, defaults to the source project's organization ID.

    • PROJECT_NUMBERS with the project numbers to include in the dataset. You can specify one or more projects in a list format.

    • RETENTION_PERIOD with the moving number of days of data to capture in the dataset snapshot. For example, 90.

    • ACTIVITY_DATA_RETENTION_PERIOD_DAYS with the number of days of activity data to capture in the dataset snapshot. By default, activity data is included in the dataset and inherits the retention period of the dataset. To override the dataset retention period, specify the number of days to retain activity data for. To exclude activity data, set ACTIVITY_DATA_RETENTION_PERIOD_DAYS to 0.

  3. To update the dataset configuration, use cURL to call the JSON API with a Patch DatasetConfig request:

    curl -X PATCH --data-binary @JSON_FILE_NAME \
      "https://storageinsights.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasetConfigs/DATASET_CONFIG_ID?updateMask=UPDATE_MASK" \
      --header "Authorization: Bearer $(gcloud auth print-access-token --impersonate-service-account=SERVICE_ACCOUNT)" \
      --header "Accept: application/json" \
      --header "Content-Type: application/json"

    Replace:

    • JSON_FILE_NAME with the path to the JSON file you created in the previous step.

    • PROJECT_ID with the ID of the project that the dataset configuration belongs to.

    • LOCATION with the location of the dataset and dataset configuration. For example, us-central1.

    • DATASET_CONFIG_ID with the name of the dataset configuration you want to update.

    • UPDATE_MASK with the comma-separated list of field names that this request updates. The fields use the FieldMask format and are part of the DatasetConfig resource.

    • SERVICE_ACCOUNT with the email address of the service account to impersonate. For example, test-service-account@test-project.iam.gserviceaccount.com.
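The request body and `updateMask` must agree, so one option is to derive the mask from the fields you actually set. The following Python sketch does this for three of the fields in the JSON file above. It assumes that the API accepts snake_case field paths in `updateMask`, matching the body's key names, and a JSON array for `project_numbers`. The helper name is hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional, Sequence

API_ROOT = "https://storageinsights.googleapis.com/v1"


def build_patch_request(project_id: str, location: str, config_id: str, token: str,
                        retention_period_days: Optional[int] = None,
                        activity_data_retention_period_days: Optional[int] = None,
                        project_numbers: Optional[Sequence[int]] = None,
                        ) -> urllib.request.Request:
    """Build a PATCH request whose updateMask lists exactly the fields in the body."""
    body: dict = {}
    if retention_period_days is not None:
        body["retention_period_days"] = retention_period_days
    if activity_data_retention_period_days is not None:
        body["activity_data_retention_period_days"] = activity_data_retention_period_days
    if project_numbers:
        body["source_projects"] = {"project_numbers": list(project_numbers)}
    if not body:
        raise ValueError("set at least one field to update")
    # The body's top-level keys, in insertion order, become the field mask.
    query = urllib.parse.urlencode({"updateMask": ",".join(body)})
    url = (f"{API_ROOT}/projects/{project_id}/locations/{location}"
           f"/datasetConfigs/{config_id}?{query}")
    return urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), method="PATCH",
        headers={"Authorization": f"Bearer {token}",
                 "Accept": "application/json",
                 "Content-Type": "application/json"},
    )
```

To send the request, use `urllib.request.urlopen(build_patch_request(...))`. Deriving the mask from the body means a field can't be listed in the mask without a value, or sent without being listed.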

Delete a dataset configuration

To delete a dataset configuration, complete the following steps:

Console

  1. In the Google Cloud console, go to the Cloud Storage Storage Insights page.

    Go to Storage Insights

  2. Click the name of the dataset configuration you want to delete.

  3. Click Delete.

Command line

  1. To delete a dataset configuration, run the gcloud storage insights dataset-configs delete command:

    gcloud storage insights dataset-configs delete DATASET_CONFIG_ID \
      --location=LOCATION

    Replace:

    • DATASET_CONFIG_ID with the name of the dataset configuration you want to delete.

    • LOCATION with the location of the dataset and dataset configuration. For example, us-central1.

    You can use the following optional flag:

    • Use --auto-delete-link to unlink the dataset that was generated from the dataset configuration you want to delete. You must unlink a dataset before you can delete the dataset configuration that generated it.

    You can also specify a full dataset configuration path. For example:

    gcloud storage insights dataset-configs delete projects/DESTINATION_PROJECT_ID/locations/LOCATION/datasetConfigs/DATASET_CONFIG_ID
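If you automate deletion, the following Python sketch builds the delete command. It adds `--location` only when you pass a short configuration ID, because a full `projects/.../datasetConfigs/...` path already includes the location, as in the example above. Setting `auto_delete_link=True` adds `--auto-delete-link`, so the linked dataset is unlinked as part of the deletion. The helper name is hypothetical.

```python
from typing import Optional


def delete_argv(config_ref: str, location: Optional[str] = None,
                auto_delete_link: bool = False) -> list[str]:
    """Build gcloud ... dataset-configs delete for an ID or a full resource path."""
    argv = ["gcloud", "storage", "insights", "dataset-configs", "delete", config_ref]
    if not config_ref.startswith("projects/"):
        # A short ID needs an explicit location; a full path already has one.
        if location is None:
            raise ValueError("location is required with a short configuration ID")
        argv.append(f"--location={location}")
    if auto_delete_link:
        # Unlink the generated dataset so that the configuration can be deleted.
        argv.append("--auto-delete-link")
    return argv
```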

JSON API

  1. Have the gcloud CLI installed and initialized, which lets you generate an access token for the Authorization header.

  2. Use cURL to call the JSON API with a Delete DatasetConfig request:

    curl -X DELETE \
      "https://storageinsights.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasetConfigs/DATASET_CONFIG_ID" \
      --header "Authorization: Bearer $(gcloud auth print-access-token --impersonate-service-account=SERVICE_ACCOUNT)" \
      --header "Accept: application/json" \
      --header "Content-Type: application/json"

    Replace:

    • PROJECT_ID with the ID of the project that the dataset configuration belongs to.

    • LOCATION with the location of the dataset and dataset configuration. For example, us-central1.

    • DATASET_CONFIG_ID with the name of the dataset configuration you want to delete.

    • SERVICE_ACCOUNT with the email address of the service account to impersonate. For example, test-service-account@test-project.iam.gserviceaccount.com.
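A linked dataset must be unlinked before its configuration can be deleted, so a script that uses the JSON API typically sends the unlinkDataset request first. The following Python sketch shows that ordering. It assumes the unlink takes effect before the Delete request arrives. If unlinking turns out to be asynchronous in your environment, add a wait or a retry. The sending function is injected so that you can check the ordering offline. The function names are hypothetical.

```python
import json
import urllib.request
from typing import Callable, Optional

API_ROOT = "https://storageinsights.googleapis.com/v1"


def delete_with_unlink(project_id: str, location: str, config_id: str, token: str,
                       linked_dataset: Optional[str],
                       send: Callable[[urllib.request.Request], None]) -> None:
    """Unlink `linked_dataset` (if given), then delete the dataset configuration."""
    base = f"{API_ROOT}/projects/{project_id}/locations/{location}/datasetConfigs/{config_id}"
    headers = {"Authorization": f"Bearer {token}",
               "Accept": "application/json",
               "Content-Type": "application/json"}
    if linked_dataset:
        body = json.dumps({"name": linked_dataset}).encode("utf-8")
        send(urllib.request.Request(f"{base}:unlinkDataset", data=body,
                                    method="POST", headers=headers))
    send(urllib.request.Request(base, method="DELETE", headers=headers))


def urlopen_send(request: urllib.request.Request) -> None:
    """Real sender: raises urllib.error.HTTPError on a non-2xx response."""
    with urllib.request.urlopen(request) as response:
        response.read()
```

With `send=urlopen_send`, a failed unlink raises before the Delete request is sent.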



Last updated 2025-12-17 UTC.