Transfer data with Google Transfer Operators
Note: This page is not yet revised for Cloud Composer 3 and displays content for Cloud Composer 2.

This page demonstrates how to transfer data from other services with Google Transfer Operators in your DAGs.
About Google Transfer Operators
Google Transfer Operators are a set of Airflow operators that you can use to pull data from other services into Google Cloud.
This guide shows operators for Azure FileShare Storage and Amazon S3 that work with Cloud Storage. There are many more transfer operators that work with services within Google Cloud and with services other than Google Cloud.
Amazon S3 to Cloud Storage
This section demonstrates how to synchronize data from Amazon S3 to aCloud Storage bucket.
Install the Amazon provider package
The apache-airflow-providers-amazon package contains the connection types and functionality that interact with Amazon S3. Install this PyPI package in your environment.
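After the package is installed, each Airflow worker must be able to import the provider. As a quick sanity check, you can probe for the module with the standard library (a sketch; `airflow.providers.amazon` is the provider's import path, and the helper name is our own):

```python
import importlib.util


def provider_installed(module_path: str) -> bool:
    """Return True if the given module can be imported in this environment."""
    try:
        return importlib.util.find_spec(module_path) is not None
    except ModuleNotFoundError:
        # A missing parent package (for example, airflow itself) also means
        # the provider is not importable.
        return False


# On a worker with apache-airflow-providers-amazon installed, this
# should report True:
print(provider_installed("airflow.providers.amazon"))
```

You could run this from a short diagnostic task (for example, a PythonOperator) before relying on the provider's operators.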
Configure a connection to Amazon S3
The Amazon provider package provides a connection type for Amazon S3. You create a connection of this type. The connection for Cloud Storage, named google_cloud_default, is already set up in your environment.
Set up a connection to Amazon S3 in the following way:
- In the Airflow UI, go to Admin > Connections.
- Create a new connection.
- Select Amazon S3 as the connection type.
- The following example uses a connection named aws_s3. You can use this name, or any other name for the connection.
- Specify connection parameters as described in the Airflow documentation for Amazon Web Services Connection. For example, to set up a connection with AWS access keys, generate an access key for your account on AWS, then provide the AWS access key ID as the login and the AWS secret access key as the password for the connection.
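As an alternative to the UI, Airflow can also read connections from environment variables named AIRFLOW_CONN_&lt;CONN_ID&gt; in Airflow's connection URI format. A minimal sketch of building such a URI for the aws_s3 connection (the key values below are placeholders; the aws:// scheme is the one the Amazon provider uses for URI-style connections):

```python
from urllib.parse import quote


def aws_conn_uri(access_key_id: str, secret_access_key: str) -> str:
    """Build an Airflow connection URI for an AWS connection.

    The key parts are URL-encoded because AWS secret keys can contain
    characters such as '/' or '+' that are not valid in a URI password.
    """
    login = quote(access_key_id, safe="")
    password = quote(secret_access_key, safe="")
    return f"aws://{login}:{password}@"


# Export the result as AIRFLOW_CONN_AWS_S3 to define the aws_s3 connection.
uri = aws_conn_uri("AKIAEXAMPLE", "secret/with+chars")
print(uri)  # → aws://AKIAEXAMPLE:secret%2Fwith%2Bchars@
```

This is convenient for automation, but note that connections defined this way do not appear in the Airflow UI's connection list.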
If your environment uses Secret Manager, you can store these credentials in a secret instead. For example, a secret named airflow-connections-aws_s3 stores credentials for the aws_s3 connection.

Transfer data from Amazon S3
If you want to operate on the synchronized data later in another DAG or task, pull it to the /data folder of your environment's bucket. This folder is synchronized to other Airflow workers, so that tasks in your DAG can operate on it.
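The example below relies on the environment's bucket being mounted on Airflow workers, so an object at gs://&lt;bucket&gt;/data/... appears locally under /home/airflow/gcs/data/... (the mount point used by the example's ls command). A small helper sketch for translating between the two forms:

```python
GCS_MOUNT = "/home/airflow/gcs"


def local_path(dest_gcs: str) -> str:
    """Map a gs://<env-bucket>/... URI to the path workers see under the mount."""
    bucket_and_key = dest_gcs.removeprefix("gs://")
    # Drop the bucket name; keep only the object prefix after the first '/'.
    _, _, key = bucket_and_key.partition("/")
    return f"{GCS_MOUNT}/{key}"


print(local_path("gs://us-central1-example-environ-361f2312-bucket/data/from-s3/"))
# → /home/airflow/gcs/data/from-s3/
```

This mapping only holds for your environment's own bucket; objects in other buckets are not mounted on the workers.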
The following example DAG does the following:
- Synchronizes contents of the /data-for-gcs directory from an S3 bucket to the /data/from-s3/data-for-gcs/ folder in your environment's bucket.
- Waits for two minutes for the data to synchronize to all Airflow workers in your environment.
- Outputs the list of files in this directory using the ls command. Replace this task with other Airflow operators that work with your data.
```python
import datetime

import airflow
from airflow.providers.google.cloud.transfers.s3_to_gcs import S3ToGCSOperator
from airflow.operators.bash import BashOperator

with airflow.DAG(
    'composer_sample_aws_to_gcs',
    start_date=datetime.datetime(2022, 1, 1),
    schedule=None,
) as dag:

    transfer_dir_from_s3 = S3ToGCSOperator(
        task_id='transfer_dir_from_s3',
        aws_conn_id='aws_s3',
        prefix='data-for-gcs',
        bucket='example-s3-bucket-transfer-operators',
        dest_gcs='gs://us-central1-example-environ-361f2312-bucket/data/from-s3/')

    sleep_2min = BashOperator(
        task_id='sleep_2min',
        bash_command='sleep 2m')

    print_dir_files = BashOperator(
        task_id='print_dir_files',
        bash_command='ls /home/airflow/gcs/data/from-s3/data-for-gcs/')

    transfer_dir_from_s3 >> sleep_2min >> print_dir_files
```

Azure FileShare to Cloud Storage
This section demonstrates how to synchronize data from Azure FileShare to aCloud Storage bucket.
Install the Microsoft Azure provider package
The apache-airflow-providers-microsoft-azure package contains the connection types and functionality that interact with Microsoft Azure. Install this PyPI package in your environment.
Configure a connection to Azure FileShare
The Microsoft Azure provider package provides a connection type for Azure FileShare. You create a connection of this type. The connection for Cloud Storage, named google_cloud_default, is already set up in your environment.
Set up a connection to Azure FileShare in the following way:
- In the Airflow UI, go to Admin > Connections.
- Create a new connection.
- Select Azure FileShare as the connection type.
- The following example uses a connection named azure_fileshare. You can use this name, or any other name for the connection.
- Specify connection parameters as described in the Airflow documentation for Microsoft Azure File Share Connection. For example, you can specify a connection string for your storage account access key.
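If you use the connection-string option, the value follows the standard Azure Storage connection string format. A sketch that assembles one from its parts (the account name and key below are hypothetical placeholders):

```python
def azure_storage_connection_string(account_name: str, account_key: str) -> str:
    """Assemble an Azure Storage connection string from its parts."""
    return (
        "DefaultEndpointsProtocol=https;"
        f"AccountName={account_name};"
        f"AccountKey={account_key};"
        "EndpointSuffix=core.windows.net"
    )


print(azure_storage_connection_string("examplestorageacct", "EXAMPLEBASE64KEY=="))
```

In practice you would copy the ready-made connection string from the Azure portal's "Access keys" page rather than build it by hand; the sketch only shows what the fields mean.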
If your environment uses Secret Manager, you can store these credentials in a secret instead. For example, a secret named airflow-connections-azure_fileshare stores credentials for the azure_fileshare connection.

Transfer data from Azure FileShare
If you want to operate on the synchronized data later in another DAG or task, pull it to the /data folder of your environment's bucket. This folder is synchronized to other Airflow workers, so that tasks in your DAG can operate on it.
The following example DAG does the following:
- Synchronizes contents of the /data-for-gcs directory from Azure File Share to the /data/from-azure folder in your environment's bucket.
- Waits for two minutes for the data to synchronize to all Airflow workers in your environment.
- Outputs the list of files in this directory using the ls command. Replace this task with other Airflow operators that work with your data.
```python
import datetime

import airflow
from airflow.providers.google.cloud.transfers.azure_fileshare_to_gcs import AzureFileShareToGCSOperator
from airflow.operators.bash import BashOperator

with airflow.DAG(
    'composer_sample_azure_to_gcs',
    start_date=datetime.datetime(2022, 1, 1),
    schedule=None,
) as dag:

    transfer_dir_from_azure = AzureFileShareToGCSOperator(
        task_id='transfer_dir_from_azure',
        azure_fileshare_conn_id='azure_fileshare',
        share_name='example-file-share',
        directory_name='data-for-gcs',
        dest_gcs='gs://us-central1-example-environ-361f2312-bucket/data/from-azure/')

    sleep_2min = BashOperator(
        task_id='sleep_2min',
        bash_command='sleep 2m')

    print_dir_files = BashOperator(
        task_id='print_dir_files',
        bash_command='ls /home/airflow/gcs/data/from-azure/')

    transfer_dir_from_azure >> sleep_2min >> print_dir_files
```

What's next
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-15 UTC.