apache-hadoop
Here are 93 public repositories matching this topic...
MapReduce, Spark, Java, and Scala for Data Algorithms Book
- Updated
Oct 14, 2024 - Java
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
- Updated
Dec 4, 2025 - HTML
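hadoop-cos (CosN file system) provides integration support for big data computing frameworks such as Apache Hadoop, Spark, and Tez, so that data stored on Tencent Cloud COS can be read and written just like data on HDFS. It can also serve as Deep Storage for query and analytics engines such as Druid.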
- Updated
Dec 4, 2025 - Java
Export Hadoop YARN (resource-manager) metrics in Prometheus format
- Updated
Apr 15, 2025 - Go
Projects for a Cloud Computing course
- Updated
Sep 2, 2022 - Python
Containerized Apache Hive Metastore for horizontally scalable Hive Metastore deployments
- Updated
Jan 31, 2022 - Dockerfile
A Spark application to merge small files on Hadoop
- Updated
Sep 7, 2020 - Scala
This repository provides a guide to preprocess and analyze the network intrusion data set using NumPy, Pandas, and matplotlib, and implement a random forest classifier machine learning model using Scikit-learn.
- Updated
May 8, 2024 - Jupyter Notebook
Some simple, kinda introductory projects based on Apache Hadoop to be used as guides in order to make the MapReduce model look less weird or boring.
- Updated
May 22, 2024 - Java
A Python implementation of Minimal MapReduce Algorithms for Apache Spark
- Updated
Jun 22, 2020 - Python
A fast, scalable, and distributed community detection algorithm based on the CEIL scoring function.
- Updated
Jan 1, 2019 - Scala
An implementation of Apache Spark (combined with PySpark and Jupyter Notebook) on top of a Hadoop cluster using Docker
- Updated
May 10, 2024 - Shell
This repository showcases a Medallion Architecture Data Lakehouse designed for both batch and real-time processing of e-commerce and marketing data. It supports comprehensive data analysis, reporting, and monitoring, providing a scalable solution for deriving insights from integrated datasets.
- Updated
Sep 26, 2024 - Jupyter Notebook
- Updated
Jan 5, 2021 - HCL
Simplified Hadoop Setup and Configuration Automation
- Updated
Sep 2, 2023 - Shell
Kubernetes operator for managing the lifecycle of Apache Hadoop YARN tasks on Kubernetes.
- Updated
Jan 19, 2024 - Go
Apache Hadoop cluster configuration with the original apache/hadoop:3.4.1 Docker image (with YARN)
- Updated
Jun 30, 2025 - Shell
An email spam filter using Apache Spark’s ML library
- Updated
Apr 14, 2021 - Python