#data-lake

Here are 359 public repositories matching this topic...

lakeFS

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️

  • Updated Feb 20, 2026
  • Python

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

  • Updated Feb 14, 2026
  • Scala
Udacity-Data-Engineering-Projects

A few projects related to Data Engineering, including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake development.

  • Updated Aug 26, 2022
  • Python

BitSail is a distributed high-performance data integration engine which supports batch, streaming and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data every day.

  • Updated Jan 1, 2024
  • Java
goodreads_etl_pipeline

Lakekeeper is an Apache-Licensed, secure, fast and easy to use Apache Iceberg REST Catalog written in Rust.

  • Updated Feb 20, 2026
  • Rust

Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.

  • Updated Jan 12, 2023
  • Java

Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.

  • Updated Feb 20, 2026
  • Java

An efficient storage and compute engine for both on-prem and cloud-native data analytics.

  • Updated Feb 20, 2026
  • Java
wren-engine

🤖 The Semantic Engine for Model Context Protocol (MCP) Clients and AI Agents 🔥

  • Updated Feb 16, 2026
  • Java

Generic Data Ingestion & Dispersal Library for Hadoop

  • Updated Mar 19, 2023
  • Java
data-lakes-on-aws

Enterprise-grade, production-hardened, serverless data lake on AWS

  • Updated Oct 1, 2025
  • Python

Real Time Big Data / IoT Machine Learning (Model Training and Inference) with HiveMQ (MQTT), TensorFlow IO and Apache Kafka - no additional data store like S3, HDFS or Spark required

  • Updated Nov 5, 2020
  • Jupyter Notebook
gigapi

GigAPI is a Timeseries lakehouse for real-time data and sub-second queries, powered by DuckDB OLAP + Parquet Query Engine, Compactor w/ Cloud-Native Storage. Drop-in FDAP alternative ⭐

  • Updated Oct 20, 2025
  • Go

BtrBlocks: Efficient Columnar Compression for Data Lakes (SIGMOD 2023 Paper)

  • Updated Apr 7, 2025
  • C++
amazon-s3-find-and-forget

Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)

  • Updated Jan 24, 2026
  • Python
