apache-iceberg

Here are 64 public repositories matching this topic...

matanolabs/matano

Star 1.5k

Open source security data lake for threat hunting, detection & response, and cybersecurity analytics at petabyte scale on AWS

rust aws security cloud big-data serverless alerting dfir secops cybersecurity cloud-native threat-hunting siem log-management aws-security security-tools cloud-security log-analytics apache-iceberg detection-engineering

Updated Jan 8, 2025
Rust

apache/incubator-xtable

Star 1k

Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.

apache-iceberg delta-lake apache-hudi

Updated Mar 11, 2025
Java

datazip-inc/olake

Star 585

Fastest open-source tool for replicating databases to Apache Iceberg or a data lakehouse. ⚡ Efficient, quick, and scalable data ingestion for real-time analytics. Supports Postgres, MongoDB, and MySQL.

database replication s3 parquet elt cdc data-pipeline change-data-capture apache-iceberg lakehouse

Updated Mar 25, 2025
Go

cuebook/cuelake

Star 285

Use SQL to build ELT pipelines on a data lakehouse.

sql apache-spark etl pipelines data-engineering data-lake data-transfer delta data-integration upsert elt data-pipeline datalake data-ingestion spark-sql zeppelin-notebook apache-iceberg lakehouse incremental-updates

Updated May 25, 2022
JavaScript

buster-so/buster

Star 217

The open-source, AI-native data stack

data database ai analytics dbms business-intelligence warehouse apache-iceberg lakehouse starrocks

Updated Mar 25, 2025
TypeScript

lhbench/lhbench

Star 72

Lakehouse storage system benchmark

benchmark database cidr databricks apache-iceberg delta-lake lakehouse apache-hudi

Updated Feb 22, 2023
Scala

dominikhei/Local-Data-LakeHouse

Star 63

Sample data lakehouse deployed in Docker containers using Apache Iceberg, MinIO, Trino, and a Hive Metastore. Can be used for local testing.

data-lake minio trino hive-metastore apache-iceberg lakehouse data-lakehouse

Updated Sep 2, 2023
Dockerfile

dacort/modern-data-lake-storage-layers

Star 48

Jupyter notebooks and an AWS CloudFormation template that show how Hudi, Iceberg, and Delta Lake work

aws amazon-emr iceberg hudi apache-iceberg delta-lake apache-hudi

Updated Jul 13, 2022
Jupyter Notebook

abeltavares/real-time-data-pipeline

Star 40

📡 Real-time data pipeline with Kafka, Flink, Iceberg, Trino, MinIO, and Superset. Ideal for learning data systems.

docker open-source aws big-data etl s3 data-visualization data-engineering minio apache-flink apache-kafka real-time-data data-pipeline trino streaming-analytics apache-superset apache-iceberg lakehouse sql-analytics

Updated Jan 18, 2025
Python

aws-samples/transactional-datalake-using-apache-iceberg-on-aws-glue

Star 31

Stream CDC into an Amazon S3 data lake in Apache Iceberg table format with AWS Glue Streaming and DMS

apache-spark aws-athena aws-glue aws-dms apache-iceberg

Updated Feb 15, 2025
Python

bodo-ai/denali

Star 26

An open-source, community-driven REST catalog for Apache Iceberg!

go golang catalog iceberg apache-iceberg

Updated Jun 26, 2024
Go

aws-samples/monitoring-apache-iceberg-table-metadata-layer

Star 25

Sample code to collect Apache Iceberg metrics for table monitoring

aws apache-spark monitoring aws-lambda aws-cloudwatch data-quality aws-glue sam-cli apache-iceberg pyiceberg

Updated Aug 18, 2024
Python

aws-samples/aws-glue-streaming-etl-with-apache-iceberg

Star 23

Streaming ETL job cases in AWS Glue that integrate Iceberg and create an in-place updatable data lake on Amazon S3

apache-spark aws-athena aws-glue apache-iceberg aws-glue-streaming

Updated Sep 10, 2024
Python

aws-samples/iceberg-streaming-examples

Star 22

This repo contains examples of high-throughput ingestion using Apache Spark and Apache Iceberg. The examples cover IoT and CDC scenarios using best practices. The code can be deployed to any Spark-compatible engine, such as Amazon EMR Serverless or AWS Glue. A fully local developer environment is also provided.

apache-spark structured-streaming apache-iceberg

Updated Nov 14, 2024
Java

tj---/iceberg-demo

Star 19

A sample implementation that stream-writes to an Iceberg table on GCS using Flink and reads it back using Trino

java kafka gcs apache-flink apache-kafka flink iceberg flink-stream-processing trino apache-iceberg

Updated May 30, 2022
Java

tlepple/data_origination_workshop

Star 14

Hands-on workshop with Iceberg, Redpanda, Debezium, and Kafka Connect

python postgresql pyspark minio spark-streaming kafka-connect iceberg debezium redpanda apache-iceberg debeziumkafkaconnector redpanda-console

Updated Oct 9, 2024
Shell

BauplanLabs/wap-with-bauplan-and-dbos

Star 13

Write-Audit-Publish on the lakehouse in pure Python with bauplan and DBOS

python apache-iceberg lakehouse durable-execution dbos write-audit-publish bauplan

Updated Jan 8, 2025
Python

tlepple/iceberg-intro-workshop

Star 13

Hands-on workshop with Apache Iceberg

linux big-data spark dell pyspark minio spark-streaming object-storage spark-sql apache-iceberg spark-sql-s3 dell-object-storage

Updated Mar 13, 2024
Shell

aws-samples/aws-glue-streaming-ingestion-from-kafka-to-apache-iceberg

Star 11

This is a collection of AWS CDK projects that show how to ingest streaming data directly from Amazon Managed Streaming for Apache Kafka (MSK) and MSK Serverless into an Apache Iceberg table in S3 with AWS Glue Streaming.

aws-s3 pyspark apache-kafka apache-iceberg aws-msk aws-glue-streaming aws-msk-serverless