
Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.


astronomer/astro-sdk


workflows made easy


The Astro Python SDK is a Python SDK for rapid development of extract, transform, and load workflows in Apache Airflow. It lets you express a workflow as a set of data dependencies, without having to define tasks or their ordering by hand. The Astro Python SDK is maintained by Astronomer.
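
As a rough illustration of what "a set of data dependencies" means, the sketch below derives a run order from declared inputs using Python's standard-library graphlib (Python 3.9+). The step names are hypothetical, and this is not how Airflow or the SDK schedules tasks internally; it only shows that ordering can be computed from dependencies rather than written by hand.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each step maps to the set of steps whose output it consumes.
dependencies = {
    "load_movies": set(),
    "top_five_animations": {"load_movies"},
    "export_report": {"top_five_animations"},
    "cleanup": {"export_report"},
}


def execution_order(deps):
    """Return one valid run order in which every step follows its inputs."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Because each step only names what it reads, adding a new step means adding one entry to the mapping; nothing else has to be reordered.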

Prerequisites

  • Apache Airflow >= 2.1.0.

Install

The Astro Python SDK is available at PyPI. Use the standard Python installation tools.

To install a cloud-agnostic version of the SDK, run:

pip install astro-sdk-python

You can also install dependencies for using the SDK with popular cloud providers:

pip install astro-sdk-python[amazon,google,snowflake,postgres]
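
After installing, it can be useful to confirm what is present in the environment. The sketch below is one possible check using the standard-library importlib.metadata. The helper names are hypothetical, `apache-airflow` is assumed to be the distribution name of your Airflow install, and the version comparison looks only at leading numeric components (it ignores pre-release ordering).

```python
import re
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


def _numeric_prefix(text):
    """Turn '2.5.0rc1' into (2, 5, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(found, minimum):
    """Compare numeric version components, padding the shorter one with zeros."""
    a, b = _numeric_prefix(found), _numeric_prefix(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Minimum for Airflow comes from the Prerequisites section; None means "any version".
for dist, minimum in (("apache-airflow", "2.1.0"), ("astro-sdk-python", None)):
    found = installed_version(dist)
    if found is None:
        print(f"{dist}: not installed")
    elif minimum and not meets_minimum(found, minimum):
        print(f"{dist}: {found} is older than the required {minimum}")
    else:
        print(f"{dist}: {found}")
```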

Quickstart

  1. Ensure that your Airflow environment is set up correctly by running the following commands:

    export AIRFLOW_HOME=`pwd`
    airflow db init

    Note:

    • From astro-sdk-python release 1.2 onward, AIRFLOW__CORE__ENABLE_XCOM_PICKLING no longer needs to be enabled.
    • For Airflow version < 2.5 and astro-sdk-python release < 1.3, users can either use the custom XCom backend AstroCustomXcomBackend with XCom pickling disabled, or enable XCom pickling.
    • For Airflow version >= 2.5 and astro-sdk-python release >= 1.3.3, users can either use Airflow's XCom backend with XCom pickling disabled, or enable XCom pickling.

    The data format used by pickle is Python-specific. This has the advantage that there are no restrictions imposed by external standards such as JSON or XDR (which can’t represent pointer sharing); however it means that non-Python programs may not be able to reconstruct pickled Python objects.

    Read more: enable_xcom_pickling and pickle.

  2. Create a SQLite database for the example to run with:

    # The sqlite_default connection has different host for MAC vs. Linux
    export SQL_TABLE_NAME=`airflow connections get sqlite_default -o yaml | grep host | awk '{print $2}'`
    sqlite3 "$SQL_TABLE_NAME" "VACUUM;"

  3. Copy the following workflow into a file named calculate_popular_movies.py and add it to the dags directory of your Airflow project:

    from datetime import datetime

    from airflow import DAG

    from astro import sql as aql
    from astro.files import File
    from astro.sql.table import Table


    @aql.transform()
    def top_five_animations(input_table: Table):
        return """
            SELECT Title, Rating
            FROM {{input_table}}
            WHERE Genre1=='Animation'
            ORDER BY Rating desc
            LIMIT 5;
        """


    with DAG(
        "calculate_popular_movies",
        schedule_interval=None,
        start_date=datetime(2000, 1, 1),
        catchup=False,
    ) as dag:
        imdb_movies = aql.load_file(
            File(
                "https://raw.githubusercontent.com/astronomer/astro-sdk/main/tests/data/imdb.csv"
            ),
            output_table=Table(conn_id="sqlite_default"),
        )
        top_five_animations(
            input_table=imdb_movies,
            output_table=Table(name="top_animation"),
        )
        aql.cleanup()

    Alternatively, you can download calculate_popular_movies.py:

     curl -O https://raw.githubusercontent.com/astronomer/astro-sdk/main/example_dags/calculate_popular_movies.py
  4. Run the example DAG:

    airflow dags test calculate_popular_movies `date -Iseconds`
  5. Check the result of your DAG by running:

    sqlite3 "$SQL_TABLE_NAME" "select * from top_animation;" ".exit"

    You should see the following output:

    $ sqlite3 "$SQL_TABLE_NAME" "select * from top_animation;" ".exit"
    Toy Story 3 (2010)|8.3
    Inside Out (2015)|8.2
    How to Train Your Dragon (2010)|8.1
    Zootopia (2016)|8.1
    How to Train Your Dragon 2 (2014)|7.9
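
To see what the transform's query does without running Airflow, here is a minimal standalone sketch using Python's built-in sqlite3 module and an in-memory database. It assumes a reduced table with only the three columns the query names. The first five rows reuse the titles and ratings from the expected output above; the last two rows are hypothetical, added only to exercise the filter and the limit. CREATE TABLE ... AS is a stand-in for writing to the output table, not a description of how the SDK does it.

```python
import sqlite3

rows = [
    ("Toy Story 3 (2010)", 8.3, "Animation"),
    ("Inside Out (2015)", 8.2, "Animation"),
    ("How to Train Your Dragon (2010)", 8.1, "Animation"),
    ("Zootopia (2016)", 8.1, "Animation"),
    ("How to Train Your Dragon 2 (2014)", 7.9, "Animation"),
    ("Hypothetical Drama (2000)", 9.0, "Drama"),        # hypothetical: wrong genre
    ("Hypothetical Cartoon (2001)", 5.0, "Animation"),  # hypothetical: rated too low
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (Title TEXT, Rating REAL, Genre1 TEXT)")
conn.executemany("INSERT INTO movies VALUES (?, ?, ?)", rows)

# Same SELECT as in the DAG, with the input table name filled in,
# and the result saved to a table named like the DAG's output table.
conn.execute(
    """
    CREATE TABLE top_animation AS
    SELECT Title, Rating
    FROM movies
    WHERE Genre1 = 'Animation'
    ORDER BY Rating DESC
    LIMIT 5
    """
)

top_five = conn.execute(
    "SELECT Title, Rating FROM top_animation ORDER BY Rating DESC"
).fetchall()
for title, rating in top_five:
    print(f"{title}|{rating}")
conn.close()
```

The drama row is dropped by the WHERE clause despite its high rating, and the low-rated animation falls outside LIMIT 5.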

Supported technologies

  • FileLocation: local, http, https, gs, gdrive, s3, wasb, wasbs, azure, sftp, ftp
  • FileType: csv, json, ndjson, parquet, xls, xlsx
  • Database: postgres, sqlite, delta, bigquery, snowflake, redshift, mssql, duckdb, mysql

Available operations

The following are some key functions available in the SDK:

  • load_file: Load a given file into a SQL table
  • transform: Applies a SQL select statement to a source table and saves the result to a destination table
  • drop_table: Drops a SQL table
  • run_raw_sql: Run any SQL statement without handling its output
  • append: Insert rows from the source SQL table into the destination SQL table, if there are no conflicts
  • merge: Insert rows from the source SQL table into the destination SQL table, depending on conflicts:
    • ignore: Do not add rows that already exist
    • update: Replace existing rows with new ones
  • export_file: Export SQL table rows into a destination file
  • dataframe: Export a given SQL table into an in-memory pandas DataFrame

For a full list of available operators, see the SDK reference documentation.
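
The two merge conflict strategies listed above can be pictured with plain SQLite. The following is only an analogy, written with Python's built-in sqlite3 module: the table names and rows are hypothetical, and INSERT OR IGNORE / INSERT OR REPLACE are SQLite-specific stand-ins, not the statements the SDK generates (which depend on the target database).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE target (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE source (id INTEGER PRIMARY KEY, title TEXT);
    INSERT INTO target VALUES (1, 'old title');
    INSERT INTO source VALUES (1, 'new title'), (2, 'second row');
    """
)


def snapshot():
    """Return the current contents of the target table, ordered by key."""
    return conn.execute("SELECT id, title FROM target ORDER BY id").fetchall()


# "ignore": source rows whose key already exists in the target are skipped.
conn.execute("INSERT OR IGNORE INTO target (id, title) SELECT id, title FROM source")
after_ignore = snapshot()

# "update": source rows replace target rows that share the same key.
conn.execute("INSERT OR REPLACE INTO target (id, title) SELECT id, title FROM source")
after_update = snapshot()

print("ignore:", after_ignore)
print("update:", after_update)
conn.close()
```

With "ignore", row 1 keeps its old title and only row 2 is added; with "update", row 1 takes the title from the source table.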

Documentation

The documentation is a work in progress. We aim to follow the Diátaxis system:

  • Getting Started Tutorial: A hands-on introduction to the Astro Python SDK
  • How-to guides: Simple step-by-step user guides to accomplish specific tasks
  • Reference guide: Commands, modules, classes and methods
  • Explanation: Clarification and discussion of key decisions when designing the project

Changelog

The Astro Python SDK follows semantic versioning for releases. Check the changelog for the latest changes.

Release management

To learn more about our release philosophy and steps, see Managing Releases.

Contribution guidelines

All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome.

Read the Contribution Guideline for a detailed overview of how to contribute.

Contributors and maintainers should abide by the Contributor Code of Conduct.

License

Apache License 2.0

