apache-spark

Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Here are 1,908 public repositories matching this topic...
Open source platform for the machine learning lifecycle
- Updated Mar 18, 2025 - Python
Simple and Distributed Machine Learning
- Updated Mar 12, 2025 - Scala
lakeFS - Data version control for your data lake | Git for data
- Updated Mar 18, 2025 - Go
Coolplay Spark: Spark source code analysis, Spark libraries, and more
- Updated May 18, 2022 - Scala
Interactive and Reactive Data Science using Scala and Spark.
- Updated May 16, 2023 - JavaScript
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
- Updated Mar 18, 2025 - Go
BigDL: Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray
- Updated Jan 21, 2025 - Jupyter Notebook
Apache Spark docker image
- Updated Apr 21, 2023 - Shell
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
- Updated Feb 19, 2025 - C#
Feathr – A scalable, unified data and AI engineering platform for enterprise
- Updated Apr 4, 2024 - Scala
Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning
- Updated Aug 16, 2021 - Java
A curated list of awesome Apache Spark packages and resources.
- Updated Oct 24, 2024 - Shell
SQL data analysis & visualization projects using MySQL, PostgreSQL, SQLite, Tableau, Apache Spark and PySpark.
- Updated Jul 18, 2022 - Jupyter Notebook
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
- Updated Mar 9, 2020 - Python
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
- Updated Jan 28, 2025 - Scala
PySpark + Scikit-learn = Sparkit-learn
- Updated Dec 31, 2020 - Python
(Deprecated) Scikit-learn integration package for Apache Spark
- Updated Dec 3, 2019 - Python
MapReduce, Spark, Java, and Scala for Data Algorithms Book
- Updated Oct 14, 2024 - Java
GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs
- Updated Mar 17, 2025 - Scala
Created by Matei Zaharia
Released May 26, 2014
- Followers: 426
- Repository: apache/spark
- Website: spark.apache.org
- Wikipedia