massive-datasets
Here are 27 public repositories matching this topic...
PolarDB-X is a cloud-native distributed SQL database designed for high-concurrency, massive-storage, and complex-query scenarios.
- Updated
Aug 27, 2025 - Java
Distributed tensors and Machine Learning framework with GPU and MPI acceleration in Python
- Updated
Nov 28, 2025 - Python
PolarDB-X is a cloud-native distributed SQL database designed for high-concurrency, massive-storage, and complex-query scenarios.
- Updated
Aug 29, 2025 - Makefile
Command line tool to quickly generate a lot of files in a lot of directories
- Updated
Feb 18, 2022 - C++
Building a Bloom Filter on English dictionary words
- Updated
Oct 7, 2021 - Jupyter Notebook
The project is based on the analysis of the "IBM Transactions for Anti Money Laundering" dataset published on Kaggle. The task is to implement a model which predicts whether or not a transaction is illicit, using the attribute "Is Laundering" as a label to be predicted.
- Updated
Aug 12, 2024 - Jupyter Notebook
Building the PageRank algorithm on the web graph around Stanford.edu using the NetworkX Python library
- Updated
Oct 7, 2021 - Jupyter Notebook
gipa -- compression/decompression tool to package, compress, and encode massive archive files with floating-point data
- Updated
Sep 14, 2017 - Python
Scalable, chunk-wise K-anonymization tool based on the Optimal Lattice Anonymization (OLA) algorithm. It is designed to handle large datasets by processing them in manageable chunks, ensuring data privacy while maintaining utility.
- Updated
Nov 11, 2025 - Python
This repository contains a LaTeX file that generates a PDF document comprising comprehensive notes for the course "Algorithms for Massive Datasets"
- Updated
Aug 12, 2024 - TeX
Opens and manipulates massive text/data files that would otherwise be impossible to open on a computer, for example a text file of 20+ GB. It lets you work with the file by fetching only the lines you need, without freezing the computer for lack of memory.
- Updated
Feb 12, 2022 - Python
Building the node2vec algorithm
- Updated
Oct 7, 2021 - Jupyter Notebook
Calculate statistical measures of one column in big data datasets with this simple Hadoop application
- Updated
Feb 24, 2017 - Java
📺 Content Recommendation System for the Netflix Prize Challenge with Collaborative Filtering.
- Updated
Feb 17, 2024 - Jupyter Notebook
Automated massive geolocator of addresses with parallel processing.
- Updated
Mar 20, 2025 - Jupyter Notebook
Word count in Spark
- Updated
Oct 6, 2021 - Jupyter Notebook
Using the PySpark framework to process and analyze enormous text data
- Updated
Jul 4, 2025 - Jupyter Notebook
Training on the MASSIVE dataset by Amazon (English-US, German-DE, and Swahili-KE)
- Updated
Oct 2, 2023 - Python
TF-Package: Multiple-Input Multiple-Output Keras Data-Generator for massive and complex datasets
- Updated
Jan 2, 2023 - Python