Transfer data with Google Transfer Operators


Note: This page is not yet revised for Cloud Composer 3 and displays content for Cloud Composer 2.

This page demonstrates how to transfer data from other services with Google Transfer Operators in your DAGs.

About Google Transfer Operators

Google Transfer Operators are a set of Airflow operators that you can use to pull data from other services into Google Cloud.

This guide shows operators for Azure FileShare Storage and Amazon S3 that work with Cloud Storage. There are many more transfer operators that work with services within Google Cloud and with services other than Google Cloud.

Amazon S3 to Cloud Storage

This section demonstrates how to synchronize data from Amazon S3 to a Cloud Storage bucket.

Install the Amazon provider package

The apache-airflow-providers-amazon package contains the connection types and functionality that interact with Amazon S3. Install this PyPI package in your environment.

Configure a connection to Amazon S3

The Amazon provider package provides a connection type for Amazon S3. You create a connection of this type. The connection for Cloud Storage, named google_cloud_default, is already set up in your environment.

Set up a connection to Amazon S3 in the following way:

  1. In the Airflow UI, go to Admin > Connections.
  2. Create a new connection.
  3. Select Amazon S3 as the connection type.
  4. The following example uses a connection named aws_s3. You can use this name or any other name for the connection.
  5. Specify connection parameters as described in the Airflow documentation for Amazon Web Services Connection. For example, to set up a connection with AWS access keys, you generate an access key for your account on AWS, then provide the AWS access key ID as the login and the AWS secret access key as the password for the connection.
Note: We recommend storing all credentials for connections in Secret Manager. For more information, see Configure Secret Manager for your environment. For example, you can create a secret named airflow-connections-aws_s3 that stores credentials for the aws_s3 connection.
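
As a minimal sketch of what such a secret can contain, the following Python snippet builds the connection as an Airflow Connection object and prints its URI representation, which is the value you can store in the airflow-connections-aws_s3 secret. The credential values are placeholders taken from AWS documentation examples; substitute the access key that you generated for your account.

from airflow.models.connection import Connection

# Placeholder credentials (AWS documentation examples); substitute the
# access key ID and secret access key generated in the AWS console.
conn = Connection(
    conn_id='aws_s3',
    conn_type='aws',
    login='AKIAIOSFODNN7EXAMPLE',  # AWS access key ID
    password='wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',  # AWS secret access key
)

# The URI representation is the value to store in the
# airflow-connections-aws_s3 secret.
print(conn.get_uri())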

Transfer data from Amazon S3

If you want to operate on the synchronized data later in another DAG or task, pull it to the /data folder of your environment's bucket. This folder is synchronized to the Airflow workers, so that tasks in your DAG can operate on it. On Airflow workers, this folder is available at /home/airflow/gcs/data.

The following example DAG does the following:

  • Synchronizes contents of the /data-for-gcs directory from an S3 bucket to the /data/from-s3/data-for-gcs/ folder in your environment's bucket.
  • Waits two minutes for the data to synchronize to all Airflow workers in your environment.
  • Lists the files in this directory using the ls command. Replace this task with other Airflow operators that work with your data; a sketch of one such replacement follows the example.
import datetime

import airflow
from airflow.operators.bash_operator import BashOperator
from airflow.providers.google.cloud.transfers.s3_to_gcs import S3ToGCSOperator

with airflow.DAG(
    'composer_sample_aws_to_gcs',
    start_date=datetime.datetime(2022, 1, 1),
    schedule=None,
) as dag:

    # Synchronize the data-for-gcs directory from the S3 bucket to the
    # /data/from-s3/ folder of the environment's bucket.
    transfer_dir_from_s3 = S3ToGCSOperator(
        task_id='transfer_dir_from_s3',
        aws_conn_id='aws_s3',
        prefix='data-for-gcs',
        bucket='example-s3-bucket-transfer-operators',
        dest_gcs='gs://us-central1-example-environ-361f2312-bucket/data/from-s3/')

    # Wait for the data to synchronize to all Airflow workers.
    sleep_2min = BashOperator(
        task_id='sleep_2min',
        bash_command='sleep 2m')

    # List the transferred files. Replace this task with operators that
    # work with the data.
    print_dir_files = BashOperator(
        task_id='print_dir_files',
        bash_command='ls /home/airflow/gcs/data/from-s3/data-for-gcs/')

    transfer_dir_from_s3 >> sleep_2min >> print_dir_files
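
As an illustration of replacing the ls task, the following sketch uses a PythonOperator to iterate over the synchronized files in Python. The task ID, function name, and processing logic are illustrative; the task is meant to be defined inside the same DAG context as the example above, in place of print_dir_files.

import os

from airflow.operators.python import PythonOperator

def process_files():
    # The /data folder of the environment's bucket is mounted on
    # Airflow workers at /home/airflow/gcs/data.
    data_dir = '/home/airflow/gcs/data/from-s3/data-for-gcs/'
    for file_name in sorted(os.listdir(data_dir)):
        # Illustrative processing step; replace with your own logic.
        print(os.path.join(data_dir, file_name))

process_transferred_files = PythonOperator(
    task_id='process_transferred_files',
    python_callable=process_files,
)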

Azure FileShare to Cloud Storage

This section demonstrates how to synchronize data from Azure FileShare to a Cloud Storage bucket.

Install the Microsoft Azure provider package

The apache-airflow-providers-microsoft-azure package contains the connection types and functionality that interact with Microsoft Azure. Install this PyPI package in your environment.

Configure a connection to Azure FileShare

The Microsoft Azure provider package provides a connection type for Azure FileShare. You create a connection of this type. The connection for Cloud Storage, named google_cloud_default, is already set up in your environment.

Set up a connection to Azure FileShare in the following way:

  1. In the Airflow UI, go to Admin > Connections.
  2. Create a new connection.
  3. Select Azure FileShare as the connection type.
  4. The following example uses a connection named azure_fileshare. You can use this name or any other name for the connection.
  5. Specify connection parameters as described in the Airflow documentation for Microsoft Azure File Share Connection. For example, you can specify a connection string for your storage account access key.
Note: We recommend storing all credentials for connections in Secret Manager. For more information, see Configure Secret Manager for your environment. For example, you can create a secret named airflow-connections-azure_fileshare that stores credentials for the azure_fileshare connection.
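
As a minimal sketch, analogous to the Amazon S3 example, the following snippet builds the connection as an Airflow Connection object and prints the URI to store in the airflow-connections-azure_fileshare secret. The connection string is a placeholder, and the name of the extra field is an assumption that can vary between versions of the Microsoft Azure provider package, so verify it against the provider documentation.

import json

from airflow.models.connection import Connection

# Placeholder connection string; substitute the value from your Azure
# storage account. The exact name of the extra field is an assumption
# here and can differ between provider package versions.
conn = Connection(
    conn_id='azure_fileshare',
    conn_type='azure_fileshare',
    extra=json.dumps({
        'connection_string': 'DefaultEndpointsProtocol=https;AccountName=example;AccountKey=EXAMPLEKEY;EndpointSuffix=core.windows.net',
    }),
)

# The URI representation is the value to store in the
# airflow-connections-azure_fileshare secret.
print(conn.get_uri())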

Transfer data from Azure FileShare

If you want to operate on the synchronized data later in another DAG or task, pull it to the /data folder of your environment's bucket. This folder is synchronized to the Airflow workers, so that tasks in your DAG can operate on it.

The following example DAG does the following:

  • Synchronizes contents of the /data-for-gcs directory from Azure File Share to the /data/from-azure folder in your environment's bucket.
  • Waits two minutes for the data to synchronize to all Airflow workers in your environment.
  • Lists the files in this directory using the ls command. Replace this task with other Airflow operators that work with your data.
import datetime

import airflow
from airflow.operators.bash_operator import BashOperator
from airflow.providers.google.cloud.transfers.azure_fileshare_to_gcs import AzureFileShareToGCSOperator

with airflow.DAG(
    'composer_sample_azure_to_gcs',
    start_date=datetime.datetime(2022, 1, 1),
    schedule=None,
) as dag:

    # Synchronize the data-for-gcs directory from the Azure File Share
    # to the /data/from-azure/ folder of the environment's bucket.
    transfer_dir_from_azure = AzureFileShareToGCSOperator(
        task_id='transfer_dir_from_azure',
        azure_fileshare_conn_id='azure_fileshare',
        share_name='example-file-share',
        directory_name='data-for-gcs',
        dest_gcs='gs://us-central1-example-environ-361f2312-bucket/data/from-azure/')

    # Wait for the data to synchronize to all Airflow workers.
    sleep_2min = BashOperator(
        task_id='sleep_2min',
        bash_command='sleep 2m')

    # List the transferred files. Replace this task with operators that
    # work with the data.
    print_dir_files = BashOperator(
        task_id='print_dir_files',
        bash_command='ls /home/airflow/gcs/data/from-azure/')

    transfer_dir_from_azure >> sleep_2min >> print_dir_files
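
In both examples, the fixed sleep_2min task is a simple buffer for the bucket-to-worker synchronization; depending on the amount of transferred data and your environment, you may need to adjust this interval.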

