cuebook/cuelake (Public)

Use SQL to build ELT pipelines on a data lakehouse.

License

Apache-2.0 license

287 stars, 28 forks

Folders and files (481 commits)

.github
api
docs
ui
zeppelinConf
.gitallowed
.gitignore
CODE_OF_CONDUCT.md
Dockerfile
LICENSE
README.md
cuelake.yaml
docker-compose.dev.yaml
nginx.conf

With CueLake, you can use SQL to build ELT (Extract, Load, Transform) pipelines on a data lakehouse.

You write Spark SQL statements in Zeppelin notebooks. You then schedule these notebooks using workflows (DAGs).
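To illustrate what scheduling notebooks as a DAG implies, here is a minimal Python sketch that orders workflows so that each one runs only after the workflows it depends on. The workflow names are hypothetical, and the sketch uses the standard-library graphlib module (Python 3.9+). It is not CueLake's scheduler, only an illustration of the ordering idea.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow names. Each key maps to the set of workflows
# that must finish before it can start.
dependencies = {
    "load_orders": set(),
    "load_customers": set(),
    "build_order_facts": {"load_orders", "load_customers"},
    "build_daily_report": {"build_order_facts"},
}


def execution_order(deps):
    """Return workflows in an order where every dependency comes first."""
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

The two load workflows have no dependencies, so their relative order is not fixed. Only the constraint that dependencies precede dependents is guaranteed.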

To extract and load incremental data, you write simple select statements. CueLake executes these statements against your databases and then merges incremental data into your data lakehouse (powered by Apache Iceberg).
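The merge step can be pictured as a MERGE INTO statement wrapped around your select statement. The Python sketch below builds such a statement in Iceberg's Spark SQL syntax. The function name, table names, and key column are assumptions for illustration, and the statement CueLake actually generates may differ.

```python
def build_merge_sql(target, source_query, key_columns):
    """Build a Spark SQL MERGE INTO statement that upserts source rows into target."""
    if not key_columns:
        raise ValueError("at least one key column is required")
    # Rows match when all key columns are equal in target (t) and source (s).
    on_clause = " AND ".join(f"t.{col} = s.{col}" for col in key_columns)
    return (
        f"MERGE INTO {target} t\n"
        f"USING ({source_query}) s\n"
        f"ON {on_clause}\n"
        "WHEN MATCHED THEN UPDATE SET *\n"
        "WHEN NOT MATCHED THEN INSERT *"
    )


# Hypothetical table and column names.
sql = build_merge_sql(
    target="lake.db.orders",
    source_query="SELECT * FROM staged_orders",
    key_columns=["order_id"],
)
print(sql)
```

Rows whose key already exists in the target are updated and all other rows are inserted, which is what makes repeated incremental loads safe to rerun.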

To transform data, you write SQL statements to create views and tables in your data lakehouse.

CueLake uses Celery as the executor and celery-beat as the scheduler. Celery jobs trigger Zeppelin notebooks. Zeppelin auto-starts and stops the Spark cluster for every scheduled run of notebooks.

To learn why we are building CueLake, read our viewpoint.

Getting started

CueLake uses Kubernetes kubectl for installation. Create a namespace and then install using the cuelake.yaml file. Creating a namespace is optional. You can install in the default namespace or in any existing namespace.

In the commands below, we use cuelake as the namespace.

kubectl create namespace cuelake
kubectl apply -f https://raw.githubusercontent.com/cuebook/cuelake/main/cuelake.yaml -n cuelake
kubectl port-forward services/lakehouse 8080:80 -n cuelake

Now visit http://localhost:8080 in your browser.
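Because the namespace is optional, the same three commands can be generated for any namespace. The Python sketch below only builds and prints the command lines and does not run kubectl. The function and its parameters are hypothetical helpers, while the manifest URL, service name, and ports are the ones used in the commands above.

```python
import shlex

MANIFEST_URL = "https://raw.githubusercontent.com/cuebook/cuelake/main/cuelake.yaml"


def install_commands(namespace="cuelake", create_namespace=True, local_port=8080):
    """Return the kubectl commands for installing CueLake as argument lists."""
    commands = []
    if create_namespace:
        # Skip this step when installing into an existing namespace.
        commands.append(["kubectl", "create", "namespace", namespace])
    commands.append(["kubectl", "apply", "-f", MANIFEST_URL, "-n", namespace])
    commands.append(
        ["kubectl", "port-forward", "services/lakehouse", f"{local_port}:80", "-n", namespace]
    )
    return commands


for command in install_commands():
    print(shlex.join(command))
```

To install into the default namespace, call install_commands("default", create_namespace=False), which omits the create step.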

If you don’t want to use Kubernetes and instead want to try it out on your local machine first, we’ll soon have a docker-compose version. Let us know if you’d want that sooner.

Features

Upsert Incremental data. CueLake uses Iceberg’s merge into query to automatically merge incremental data.
Create Views in data lakehouse. CueLake enables you to create views over Iceberg tables.
Create DAGs. Group notebooks into workflows and create DAGs of these workflows.
Elastically Scale Cloud Infrastructure. CueLake uses Zeppelin to auto create and delete Kubernetes resources required to run data pipelines.
In-built Scheduler to schedule your pipelines.
Automated maintenance of Iceberg tables. CueLake expires snapshots, removes old metadata and orphan files, and compacts data files.
Monitoring. Get Slack alerts when a pipeline fails. CueLake maintains detailed logs.
Versioning in GitHub. Commit and maintain versions of your Zeppelin notebooks in GitHub.
Data Security. Your data always stays within your cloud account.
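For a sense of what the automated table maintenance above involves, Iceberg exposes Spark SQL stored procedures for expiring snapshots, removing orphan files, and compacting data files. The Python sketch below builds those CALL statements. The catalog name, table name, and seven-day retention are assumptions, and whether CueLake issues these exact procedures is not stated here. Metadata cleanup is not shown.

```python
from datetime import datetime, timedelta


def maintenance_statements(catalog, table, now, retain_days=7):
    """Build Iceberg Spark procedure calls for routine table maintenance."""
    # Snapshots older than this cutoff become candidates for expiry.
    cutoff = (now - timedelta(days=retain_days)).strftime("%Y-%m-%d %H:%M:%S")
    return [
        f"CALL {catalog}.system.expire_snapshots("
        f"table => '{table}', older_than => TIMESTAMP '{cutoff}')",
        f"CALL {catalog}.system.remove_orphan_files(table => '{table}')",
        f"CALL {catalog}.system.rewrite_data_files(table => '{table}')",
    ]


# Hypothetical catalog and table names.
statements = maintenance_statements("lake", "db.orders", datetime(2021, 7, 22))
for statement in statements:
    print(statement)
```

Each statement would be run in a Spark session that has the Iceberg SQL extensions enabled.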

Current Limitations

Supports AWS S3 as a destination. Support for ADLS and GCS is on the roadmap.
Uses Apache Iceberg as an open table format. Delta support is on the roadmap.
Uses Celery for scheduling jobs. Support for Airflow is on the roadmap.

Support

For general help using CueLake, read the documentation, or go to GitHub Discussions.

To report a bug or request a feature, open an issue.

Contributing

We'd love contributions to CueLake. Before you contribute, please first discuss the change you wish to make via an issue or a discussion. Contributors are expected to adhere to our code of conduct.

About

Use SQL to build ELT pipelines on a data lakehouse.

cuelake.cuebook.ai

Releases: 3

v0.3 Latest

Jul 22, 2021

+ 2 releases

Contributors: 6
