potloc/target-bigquery

Target-bigquery generated using the Meltano SDK.

Note: This repository was archived by the owner on Sep 4, 2024. It is now read-only.

target-bigquery is a Singer target for BigQuery.

The most versatile target for BigQuery. Extremely performant, resource efficient, and fast in all of its 7 configurations. Denormalized variants indicate that data is unpacked during load, with a resultant schema in BigQuery based on the tap. Non-denormalized means a fixed schema that loads all data into an unstructured JSON column. Both are useful patterns: the latter lets BigQuery work seamlessly with schemaless or rapidly changing sources such as MongoDB, while the former is faster to query.

The gap between the two methods is closed in part by this target automatically generating a VIEW that unpacks a JSON-based ingestion source for you. Unless you are operating at tens of millions of rows with objects of 300-500 keys, it is reasonably performant. It does, however, fall off at sufficient scale given the current state of BigQuery's engineering at Google. Choose wisely.

Sink names (you will most likely be configuring this target via YAML or JSON, so scroll on for the config table):

# batch job based
BigQueryBatchDenormalizedSink
BigQueryBatchSink

# gcs staging bucket -> load job
BigQueryGcsStagingDenormalizedSink
BigQueryGcsStagingSink

# streaming api
BigQueryLegacyStreamingDenormalizedSink
BigQueryLegacyStreamingSink

# storage write api
BigQueryStorageWriteSink
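Which sink runs follows from two settings in the table below: method selects the load mechanism and denormalized selects the variant. As a minimal sketch (values are illustrative, and the mapping is inferred from the sink names above), a config fragment like this should route records through BigQueryGcsStagingDenormalizedSink:

{
  "method": "gcs_stage",
  "denormalized": true
}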

Old Header (still true)

This is the first truly unstructured sink for BigQuery, leveraging the recently GA'd JSON support in BigQuery. It allows this target to load from essentially any tap regardless of the quality or explicitness of its jsonschema. Observations of existing taps note things such as patternProperties used in jsonschema objects, which break down on all existing BigQuery targets due to the previous need for strong typing. Taps such as MongoDB, which inherently deal with unstructured data, are likewise seamlessly enabled by this target.

Built with the Meltano Target SDK.

Installation

pipx install target-bigquery

Configuration

Settings

| Setting | Required | Default | Description |
|---------|----------|---------|-------------|
| credentials_path | False | None | The path to a GCP credentials JSON file. |
| credentials_json | False | None | A JSON string of your service account JSON file. |
| project | True | None | The target GCP project to materialize data into. |
| dataset | True | None | The target dataset to materialize data into. |
| batch_size | False | 250000 | The maximum number of rows to send in a single batch or commit. |
| timeout | False | 600 | Default timeout for batch_job and gcs_stage derived LoadJobs. |
| denormalized | False | 0 | Whether to denormalize the data before writing to BigQuery. A false value writes data using a fixed JSON-column-based schema, while a true value writes data using a dynamic schema derived from the tap. Denormalization is only supported for the batch_job, streaming_insert, and gcs_stage methods. |
| method | True | batch_job | The method to use for writing to BigQuery. |
| append_columns | False | 1 | In the case of a denormalized sync, whether to append new columns to the existing schema. |
| generate_view | False | 0 | Whether to generate a view based on the SCHEMA message parsed from the tap. Only valid if denormalized=false, i.e. when using the fixed JSON-column-based schema. |
| gcs_bucket | False | None | The GCS bucket to use for staging data. Only used if method is gcs_stage. |
| gcs_buffer_size | False | 15 | The size, in megabytes, of the GCS stream buffer before flushing a multipart upload chunk. Only used if method is gcs_stage. This eager flushing, in conjunction with zlib, results in very low memory usage. |
| gcs_max_file_size | False | 250 | The maximum file size in megabytes for a bucket file; used as the batch indicator for GCS-based ingestion. Only used if method is gcs_stage. |
| stream_maps | False | None | Config object for the stream maps capability. For more information, check out Stream Maps. |
| stream_map_config | False | None | User-defined config values to be used within map expressions. |
| flattening_enabled | False | None | 'True' to enable schema flattening and automatically expand nested properties. |
| flattening_max_depth | False | None | The max depth to flatten schemas. |

A full list of supported settings and capabilities is available by running:

target-bigquery --about
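For reference, a minimal config file might look like the following (a sketch; the path, project, and dataset values are placeholders):

{
  "credentials_path": "/path/to/service-account.json",
  "project": "my-gcp-project",
  "dataset": "my_dataset",
  "method": "batch_job",
  "batch_size": 250000
}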

Configure using environment variables

This Singer target will automatically import any environment variables within the working directory's .env file if --config=ENV is provided, such that config values will be considered if a matching environment variable is set either in the terminal context or in the .env file.
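For example, assuming the Meltano SDK's usual convention of upper-cased setting names prefixed with TARGET_BIGQUERY_ (an assumption; run target-bigquery --about to confirm the exact names), a .env file could look like:

TARGET_BIGQUERY_PROJECT=my-gcp-project
TARGET_BIGQUERY_DATASET=my_dataset
TARGET_BIGQUERY_METHOD=batch_job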

Source Authentication and Authorization

https://cloud.google.com/bigquery/docs/authentication

Capabilities

  • about
  • stream-maps
  • schema-flattening

Usage

You can easily run target-bigquery by itself or in a pipeline using Meltano.

Executing the Target Directly

target-bigquery --version
target-bigquery --help

# Test using the "Carbon Intensity" sample:
tap-carbon-intensity | target-bigquery --config /path/to/target-bigquery-config.json

Developer Resources

Initialize your Development Environment

pipx install poetry
poetry install

Create and Run Tests

Create tests within the target_bigquery/tests subfolder and then run:

poetry run pytest

You can also test the target-bigquery CLI interface directly using poetry run:

poetry run target-bigquery --help

Testing with Meltano

Note: This target will work in any Singer environment and does not require Meltano. Examples here are for convenience and to streamline end-to-end orchestration scenarios.

Install Meltano (if you haven't already) and any needed plugins:

# Install meltano
pipx install meltano

# Initialize meltano within this directory
cd target-bigquery
meltano install

Now you can test and orchestrate using Meltano:

# Test invocation:
meltano invoke target-bigquery --version

# OR run a test `elt` pipeline with the Carbon Intensity sample tap:
meltano elt tap-carbon-intensity target-bigquery

SDK Dev Guide

See the dev guide for more instructions on how to use the Meltano SDK to develop your own Singer taps and targets.
