# iceberg-athena-redshift-demo
This is an example of a multi-engine pipeline built with Bruin, running on Athena and Redshift using Apache Iceberg.
The pipeline is rather simple:
- Ingest customer and order data from Shopify into S3 using Iceberg, Glue & Athena
- Build core entities using Athena
- Build a simple mart using Redshift
The resulting assets in this pipeline are all queryable from both Athena and Redshift, and all stored in S3 in Iceberg format.
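For instance, once the external schemas described below are in place, the same statement can be run from either engine against the same underlying Iceberg data. This is a minimal sketch; the `orders` table name is illustrative, not necessarily a table this pipeline produces:

```sql
-- From Athena: the table is resolved through the Glue catalog.
select count(*) from demo_shopify.orders;

-- From Redshift: the same name resolves through the external schema
-- defined later in this README, backed by the same Iceberg files in S3.
select count(*) from demo_shopify.orders;
```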
- Install the Bruin CLI
- Make sure you have a valid AWS account with access to Athena, Glue, Redshift and S3
For the purposes of this demo the Redshift cluster is publicly accessible; feel free to connect to your cluster in any way you want.
While Athena integrates rather seamlessly with Glue, Redshift requires a bit more work. In order to use Iceberg tables with Redshift, you need to create external schemas in Redshift that reference the Iceberg tables in S3.
In this demo we have three schemas; define them in your Redshift cluster:
```sql
create external schema demo_raw
from data catalog
database 'demo_raw'
region '<your-aws-region>'
iam_role '<your-role-arn-here>';

create external schema demo_shopify
from data catalog
database 'demo_shopify'
region '<your-aws-region>'
iam_role '<your-role-arn-here>';

create external schema demo_mart
from data catalog
database 'demo_mart'
region '<your-aws-region>'
iam_role '<your-role-arn-here>';
```
The role ARN needs to be the role that has access to the S3 bucket where the Iceberg tables are stored, as well as Glue.
Run the following command to set up your Bruin credentials:

```
bruin validate
```

This will create an empty `.bruin.yml` file in the root of the project. Fill it with the following credentials:
```yaml
default_environment: default
environments:
  default:
    connections:
      athena:
        - name: athena-default
          access_key_id: <your-access-key-id>
          secret_access_key: <your-secret-access-key>
          query_results_path: s3://<your-query-results-path>
          region: <your-aws-region>
      redshift:
        - name: redshift-default
          username: <your-redshift-username>
          password: <your-redshift-password>
          host: <your-redshift-host>
          port: <your-redshift-port>
          database: <your-redshift-database>
          pool_max_conns: <your-redshift-pool-max-conns>
      shopify:
        - name: shopify-default
          url: <your-shopify-url>
          api_key: <your-shopify-api-key>
```
If you need credentials for Shopify, you can simply create a development store in Shopify and use its credentials to pull the data.
You can run the pipeline by running the following command in the project root:

```
bruin run --start-date 2010-01-17T00:00:00.000Z --end-date 2025-03-17T23:59:59.999999999Z ./demo-pipeline
```
This will run the pipeline to get all the historical data from Shopify, and build all the tables in Athena and Redshift.
This pipeline is a demo of the kinds of multi-engine pipelines that can be built with Bruin.
Bruin enables you to build pipelines that can run on different engines, and even different cloud providers. You can use the right tool for the job, and make the most of your data while building cost-effective solutions.
Some other things you can do with Bruin:
- You can pull data from a wide variety of sources, not just Shopify.
- You can run stuff on Athena, Redshift, BigQuery, Snowflake, and more, in the same pipeline.
- You can run Python within the same pipeline, e.g. using PyIceberg to read and write Iceberg tables (see the sketch after this list).
- You can run some parts of your pipeline in cheaper engines, and only run the crucial parts in more powerful engines.
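As a rough illustration of the PyIceberg idea above, here is a minimal sketch of a Python step that reads one of the pipeline's Iceberg tables from the Glue catalog. The catalog name and table name are illustrative, not part of this repo:

```python
# A minimal sketch, assuming pyiceberg is installed with Glue support
# (pip install "pyiceberg[glue]") and your AWS credentials are configured.
from pyiceberg.catalog import load_catalog

# Connect to the AWS Glue catalog that Athena writes to.
catalog = load_catalog("glue", **{"type": "glue"})

# "demo_shopify.orders" is an illustrative name; use one of the
# tables your pipeline actually produces.
table = catalog.load_table("demo_shopify.orders")

# Scan the Iceberg table into a pandas DataFrame for further processing.
df = table.scan().to_pandas()
print(df.head())
```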
The possibilities are endless!
Hit us up if these sound interesting to you.