hdfs
Here are 1,051 public repositories matching this topic...
SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lakes, scaling to billions of files. The blob store offers O(1) disk seeks and cloud tiering. The Filer supports Cloud Drive, cross-data-center (xDC) replication, Kubernetes, POSIX FUSE mount, the S3 API, an S3 Gateway, Hadoop, WebDAV, encryption, and Erasure Coding. An Enterprise version is available at seaweedfs.com.
Updated Dec 18, 2025 - Go
Ceph is a distributed object, block, and file storage platform
Updated Dec 17, 2025 - C++
JuiceFS is a distributed POSIX file system built on top of Redis and S3.
Updated Dec 17, 2025 - Go
Utils for streaming large files (S3, HDFS, gzip, bz2...)
Updated Dec 1, 2025 - Python
The Universal Storage Engine
Updated Dec 17, 2025 - C++
A fast and versatile ETL tool that seamlessly transfers data between RDBMS and NoSQL databases
Updated Dec 12, 2025 - Java
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Updated Nov 6, 2025 - Python
Real Time Analytics and Data Pipelines based on Spark Streaming
Updated Oct 24, 2019 - Scala
Deprecated - See Lenses.io Community Edition
Updated May 7, 2025 - JavaScript
CloudEon uses Kubernetes to install and deploy open-source big data components, running an open-source big data platform in containers. This lets you spend less effort managing and maintaining the underlying resources.
Updated Oct 31, 2025 - FreeMarker
Fundamentals of Spark with Python (using PySpark), code examples
Updated Oct 29, 2022 - Jupyter Notebook