big-data
Here are 5,304 public repositories matching this topic...
The Patterns of Scalable, Reliable, and Performant Large-Scale Systems
- Updated Jan 4, 2026
ClickHouse® is a real-time analytics database management system
- Updated Feb 7, 2026 - C++
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
- Updated Mar 20, 2024 - Python
Open-source IoT Platform - Device management, data collection, processing and visualization.
- Updated Feb 7, 2026 - Java
An open source cybersecurity protocol for syncing decentralized graph data.
- Updated Feb 5, 2026 - JavaScript
The official home of the Presto distributed SQL query engine for big data
- Updated Feb 7, 2026 - Java
The Data Engineering Cookbook
- Updated Jan 17, 2026 - Python
PredictionIO, a machine learning server for developers and ML engineers.
- Updated Jan 9, 2021 - Scala
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
- Updated Feb 7, 2026 - Java
A distributed, fast open-source graph database featuring horizontal scalability and high availability
- Updated Oct 22, 2025 - C++
CMAK is a tool for managing Apache Kafka clusters
- Updated Aug 2, 2023 - Scala
Open-Source Web UI for Apache Kafka Management
- Updated Jul 26, 2024 - Java
The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
- Updated Feb 7, 2026 - Java
Cloud-native search engine for observability. An open-source alternative to Datadog, Elasticsearch, Loki, and Tempo.
- Updated Feb 6, 2026 - Rust
The most widely used Python to C compiler
- Updated Feb 7, 2026 - Cython
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
- Updated Feb 7, 2026 - C++
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
- Updated Feb 7, 2026 - Scala