apache-iceberg
Here are 64 public repositories matching this topic...
Open source security data lake for threat hunting, detection & response, and cybersecurity analytics at petabyte scale on AWS
- Updated Jan 8, 2025 - Rust
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
- Updated Mar 11, 2025 - Java
Fastest open-source tool for replicating databases to Apache Iceberg or a data lakehouse. ⚡ Efficient, quick, and scalable data ingestion for real-time analytics. Supports Postgres, MongoDB, and MySQL.
- Updated Mar 25, 2025 - Go
Use SQL to build ELT pipelines on a data lakehouse.
- Updated May 25, 2022 - JavaScript
The open-source, AI-native data stack
- Updated Mar 25, 2025 - TypeScript
Lakehouse storage system benchmark
- Updated Feb 22, 2023 - Scala
Sample data lakehouse deployed in Docker containers using Apache Iceberg, MinIO, Trino, and a Hive Metastore. Can be used for local testing.
- Updated Sep 2, 2023 - Dockerfile
Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work
- Updated Jul 13, 2022 - Jupyter Notebook
📡 Real-time data pipeline with Kafka, Flink, Iceberg, Trino, MinIO, and Superset. Ideal for learning data systems.
- Updated Jan 18, 2025 - Python
Stream CDC into an Amazon S3 data lake in Apache Iceberg table format with AWS Glue Streaming and DMS
- Updated Feb 15, 2025 - Python
An open-source, community-driven REST catalog for Apache Iceberg!
- Updated Jun 26, 2024 - Go
Sample code to collect Apache Iceberg metrics for table monitoring
- Updated Aug 18, 2024 - Python
Streaming ETL job examples in AWS Glue that integrate Iceberg to create an in-place updatable data lake on Amazon S3
- Updated Sep 10, 2024 - Python
This repo contains examples of high throughput ingestion using Apache Spark and Apache Iceberg. These examples cover IoT and CDC scenarios using best practices. The code can be deployed into any Spark compatible engine like Amazon EMR Serverless or AWS Glue. A fully local developer environment is also provided.
- Updated Nov 14, 2024 - Java
A sample implementation of stream writes to an Iceberg table on GCS using Flink and reading it using Trino
- Updated May 30, 2022 - Java
Hands-on workshop with Iceberg, Redpanda, Debezium and Kafka-Connect
- Updated Oct 9, 2024 - Shell
Write-Audit-Publish on the lakehouse in pure Python with bauplan and DBOS
- Updated Jan 8, 2025 - Python
Hands-on workshop with Apache Iceberg
- Updated Mar 13, 2024 - Shell
This is a collection of AWS CDK projects that show how to ingest streaming data directly from Amazon Managed Streaming for Apache Kafka (MSK) and MSK Serverless into Apache Iceberg tables in Amazon S3 with AWS Glue Streaming.
- Updated Sep 10, 2024 - Python
Miscellaneous code and writing for MLOps
- Updated Mar 25, 2025 - Jupyter Notebook