gcloud dataplex datascans create data-discovery
- NAME
- gcloud dataplex datascans create data-discovery - create a Dataplex data discovery scan job
- SYNOPSIS
gcloud dataplex datascans create data-discovery (DATASCAN : --location=LOCATION) --data-source-resource=DATA_SOURCE_RESOURCE [--description=DESCRIPTION] [--display-name=DISPLAY_NAME] [--labels=[KEY=VALUE,…]] [--async | --validate-only] [--bigquery-publishing-connection=BIGQUERY_PUBLISHING_CONNECTION --bigquery-publishing-dataset-location=BIGQUERY_PUBLISHING_DATASET_LOCATION --bigquery-publishing-dataset-project=BIGQUERY_PUBLISHING_DATASET_PROJECT --bigquery-publishing-table-type=BIGQUERY_PUBLISHING_TABLE_TYPE --storage-exclude-patterns=[PATTERN,…] --storage-include-patterns=[PATTERN,…] --csv-delimiter=CSV_DELIMITER --csv-disable-type-inference=CSV_DISABLE_TYPE_INFERENCE --csv-encoding=CSV_ENCODING --csv-header-row-count=CSV_HEADER_ROW_COUNT --csv-quote-character=CSV_QUOTE_CHARACTER --json-disable-type-inference=JSON_DISABLE_TYPE_INFERENCE --json-encoding=JSON_ENCODING] [--on-demand=ON_DEMAND | --schedule=SCHEDULE | --one-time --ttl-after-scan-completion=TTL_AFTER_SCAN_COMPLETION] [GCLOUD_WIDE_FLAG …]
- DESCRIPTION
- Allows users to auto discover BigQuery External and BigLake tables from underlying Cloud Storage buckets.
- EXAMPLES
- To create a data discovery scan data-discovery-datascan in project test-project located in us-central1 on Cloud Storage bucket test-bucket, run:
gcloud dataplex datascans create data-discovery data-discovery-datascan --project=test-project --location=us-central1 --data-source-resource="//storage.googleapis.com/projects/test-project/buckets/test-bucket"
- POSITIONAL ARGUMENTS
- Datascan resource - Arguments and flags that define the Dataplex datascan you want to create a data discovery scan for. The arguments in this group can be used to specify the attributes of this resource. (NOTE) Some attributes are not given arguments in this group but can be set in other ways.
To set the project attribute:
- provide the argument datascan on the command line with a fully specified name;
- provide the argument --project on the command line;
- set the property core/project.
This must be specified.
DATASCAN - ID of the datascan or fully qualified identifier for the datascan.
To set the dataScans attribute:
- provide the argument datascan on the command line.
This positional argument must be specified if any of the other arguments in this group are specified.
--location=LOCATION - The location of the Dataplex resource.
To set the location attribute:
- provide the argument datascan on the command line with a fully specified name;
- provide the argument --location on the command line;
- set the property dataplex/location.
- REQUIRED FLAGS
--data-source-resource=DATA_SOURCE_RESOURCE - Fully-qualified service resource name of the cloud resource bucket that contains the data for the data discovery scan, of the form:
//storage.googleapis.com/projects/{project_id_or_number}/buckets/{bucket_id}.
- OPTIONAL FLAGS
--description=DESCRIPTION - Description of the data discovery scan.
--display-name=DISPLAY_NAME - Display name of the data discovery scan.
--labels=[KEY=VALUE,…] - List of label KEY=VALUE pairs to add.
Keys must start with a lowercase character and contain only hyphens (-), underscores (_), lowercase characters, and numbers. Values must contain only hyphens (-), underscores (_), lowercase characters, and numbers.
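For example, an environment label and a team label could be attached at creation time with a flag value along these lines (the keys and values are purely illustrative):
--labels=env=dev,team=data-platform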
At most one of these can be specified:
--async - Return immediately, without waiting for the operation in progress to complete.
--validate-only - Validate the create action, but don't actually perform it.
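For example, to create the scan without waiting for the long-running operation to finish, a command along these lines could be used (the datascan ID, project, location, and bucket are illustrative):
gcloud dataplex datascans create data-discovery my-datascan --project=test-project --location=us-central1 --data-source-resource="//storage.googleapis.com/projects/test-project/buckets/test-bucket" --async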
- Data spec for the data discovery scan.
- BigQuery publishing config arguments for the data discovery scan.
--bigquery-publishing-connection=BIGQUERY_PUBLISHING_CONNECTION - BigQuery connection to use for auto discovering cloud resource bucket to BigLake tables, in format projects/{project_id}/locations/{location_id}/connections/{connection_id}. Connection is required for the BIGLAKE BigQuery publishing table type.
--bigquery-publishing-dataset-location=BIGQUERY_PUBLISHING_DATASET_LOCATION - The location of the BigQuery dataset to publish BigLake external or non-BigLake external tables to. If not specified, the dataset location will be set to the location of the data source resource. Refer to https://cloud.google.com/bigquery/docs/locations#supportedLocations for supported locations.
--bigquery-publishing-dataset-project=BIGQUERY_PUBLISHING_DATASET_PROJECT - The project of the BigQuery dataset to publish BigLake external or non-BigLake external tables to. If not specified, the cloud resource bucket project will be used to create the dataset. The format is "projects/{project_id_or_number}".
--bigquery-publishing-table-type=BIGQUERY_PUBLISHING_TABLE_TYPE - BigQuery table type to discover the cloud resource bucket. Can be either EXTERNAL or BIGLAKE. If not specified, the table type will be set to EXTERNAL. BIGQUERY_PUBLISHING_TABLE_TYPE must be one of:
BIGLAKE - Cloud Storage bucket is discovered to BigQuery BigLake tables.
EXTERNAL - Default value. Cloud Storage bucket is discovered to BigQuery External tables.
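As an illustration, publishing discovered tables as BigLake tables through an existing connection might look like the following; the connection path, datascan ID, project, and bucket shown are placeholders:
gcloud dataplex datascans create data-discovery my-datascan --project=test-project --location=us-central1 --data-source-resource="//storage.googleapis.com/projects/test-project/buckets/test-bucket" --bigquery-publishing-table-type=BIGLAKE --bigquery-publishing-connection=projects/test-project/locations/us-central1/connections/my-connection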
- Storage config arguments for the data discovery scan.
--storage-exclude-patterns=[PATTERN,…] - List of patterns that identify the data to exclude during discovery. These patterns are interpreted as glob patterns used to match object names in the Cloud Storage bucket. Exclude patterns will be applied before include patterns.
--storage-include-patterns=[PATTERN,…] - List of patterns that identify the data to include during discovery when only a subset of the data should be considered. These patterns are interpreted as glob patterns used to match object names in the Cloud Storage bucket.
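For example, to restrict discovery to CSV objects under a sales/ prefix while skipping a temporary folder, glob patterns along these lines could be supplied (the patterns, datascan ID, project, and bucket are illustrative):
gcloud dataplex datascans create data-discovery my-datascan --project=test-project --location=us-central1 --data-source-resource="//storage.googleapis.com/projects/test-project/buckets/test-bucket" --storage-include-patterns="sales/**/*.csv" --storage-exclude-patterns="tmp/**"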
- CSV options arguments for the data discovery scan.
--csv-delimiter=CSV_DELIMITER - Delimiter used to separate values in the CSV file. If not specified, the delimiter will be set to comma (",").
--csv-disable-type-inference=CSV_DISABLE_TYPE_INFERENCE - Whether to disable the inference of data types for CSV data. If true, all columns are registered as strings.
--csv-encoding=CSV_ENCODING - Character encoding of the CSV file. If not specified, the encoding will be set to UTF-8.
--csv-header-row-count=CSV_HEADER_ROW_COUNT - The number of rows to interpret as header rows that should be skipped when reading data rows. The default value is 1.
--csv-quote-character=CSV_QUOTE_CHARACTER - The character used to quote column values. Accepts " (double quotation mark) or ' (single quotation mark). If unspecified, defaults to " (double quotation mark).
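For example, semicolon-delimited CSV files with two header rows could be described as follows (the datascan ID, project, bucket, and option values are illustrative):
gcloud dataplex datascans create data-discovery my-datascan --project=test-project --location=us-central1 --data-source-resource="//storage.googleapis.com/projects/test-project/buckets/test-bucket" --csv-delimiter=";" --csv-header-row-count=2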
- JSON options arguments for the data discovery scan.
--json-disable-type-inference=JSON_DISABLE_TYPE_INFERENCE - Whether to disable the inference of data types for JSON data. If true, all columns are registered as strings.
--json-encoding=JSON_ENCODING - Character encoding of the JSON file. If not specified, the encoding will be set to UTF-8.
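Similarly for JSON data, assuming the encoding flag accepts the literal string UTF-8 (the default named above), an explicit setting could look like this (all other values are illustrative):
gcloud dataplex datascans create data-discovery my-datascan --project=test-project --location=us-central1 --data-source-resource="//storage.googleapis.com/projects/test-project/buckets/test-bucket" --json-encoding=UTF-8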
- Data discovery scan execution settings.
- Data discovery scan scheduling and trigger settings.
At most one of these can be specified:
--on-demand=ON_DEMAND - If set, the scan runs one-time shortly after data discovery scan creation.
--schedule=SCHEDULE - Cron schedule (https://en.wikipedia.org/wiki/Cron) for running scans periodically. To explicitly set a timezone to the cron tab, apply a prefix in the cron tab: "CRON_TZ=${IANA_TIME_ZONE}" or "TZ=${IANA_TIME_ZONE}". The ${IANA_TIME_ZONE} may only be a valid string from the IANA time zone database. For example, CRON_TZ=America/New_York 1 * * * * or TZ=America/New_York 1 * * * *. This field is required for RECURRING scans.
- Data discovery scan one-time trigger settings.
--one-time - If set, the data discovery scan runs once and is auto-deleted once the ttl_after_scan_completion expires.
--ttl-after-scan-completion=TTL_AFTER_SCAN_COMPLETION - The time to live for one-time scans. Default value is 24 hours, minimum value is 0 seconds, and maximum value is 365 days. The time is calculated from the data scan job completion time. If the value is set as 0 seconds, the scan will be immediately deleted upon job completion, regardless of whether the job succeeded or failed. The value should be a number followed by a unit suffix "s". Example: "100s" for 100 seconds. The argument is only valid when --one-time is set.
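For example, a recurring scan could run daily at 03:00 New York time, or a one-time scan could clean itself up an hour after completion; the schedule, TTL, datascan ID, project, and bucket shown are illustrative:
gcloud dataplex datascans create data-discovery my-datascan --project=test-project --location=us-central1 --data-source-resource="//storage.googleapis.com/projects/test-project/buckets/test-bucket" --schedule="CRON_TZ=America/New_York 0 3 * * *"
gcloud dataplex datascans create data-discovery my-datascan --project=test-project --location=us-central1 --data-source-resource="//storage.googleapis.com/projects/test-project/buckets/test-bucket" --one-time --ttl-after-scan-completion=3600s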
- GCLOUD WIDE FLAGS
- These flags are available to all commands:
--access-token-file, --account, --billing-project, --configuration, --flags-file, --flatten, --format, --help, --impersonate-service-account, --log-http, --project, --quiet, --trace-token, --user-output-enabled, --verbosity.
Run $ gcloud help for details.
- NOTES
- This variant is also available:
gcloud alpha dataplex datascans create data-discovery