#massive-datasets

Here are 27 public repositories matching this topic...

polardb /polardbx-sql

PolarDB-X is a cloud-native distributed SQL database designed for high-concurrency, massive-storage, and complex-query scenarios.

mysql distributed-transactions cloud-native high-availability relational-database high-concurrency massive-datasets htap horizontal-scaling enterprise-class

Updated Aug 27, 2025
Java

helmholtz-analytics /heat

Distributed tensors and machine learning framework with GPU and MPI acceleration in Python

python data-science machine-learning hpc gpu numpy mpi pytorch distributed parallelism data-analytics tensors data-processing multi-gpu mpi4py massive-datasets multi-node-cluster array-api

Updated Nov 28, 2025
Python

polardb /polardbx

PolarDB-X is a cloud-native distributed SQL database designed for high-concurrency, massive-storage, and complex-query scenarios.

mysql distributed-transactions cloud-native high-availability relational-databases high-concurrency massive-datasets htap horizontal-scaling enterprise-class

Updated Aug 29, 2025
Makefile

joshuaboud /gen-dataset

Command line tool to quickly generate a lot of files in a lot of directories

linux benchmarking evaluation multithreading dataset dataset-generation massive-datasets cli-tool dataset-generator

Updated Feb 18, 2022
C++

rajeshidumalla /Bloom-Filter

Building a Bloom Filter on English dictionary words

python data-science machine-learning bloom-filter data-analysis nltk-library massive-datasets

Updated Oct 7, 2021
Jupyter Notebook
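
For readers unfamiliar with the data structure named in this entry, here is a minimal sketch of a Bloom filter over a word list. It is not taken from the repository: the `BloomFilter` class, its parameters, the double-hashing scheme, and the three-word list are all assumptions chosen for illustration.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Fixed-size Bloom filter (illustrative; not the repository's code)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One bit per slot, packed into a bytearray.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive k bit positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd, non-zero step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


# Hypothetical stand-in for an English dictionary word list.
words = ["apple", "banana", "cherry"]
bf = BloomFilter(num_bits=1024, num_hashes=3)
for w in words:
    bf.add(w)
```

Every added word is guaranteed to test as present. A word that was never added may occasionally test as present too, which is the false-positive trade-off that lets the filter stay small.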

FedericoBruzzone /anti-money-laundering

The project is based on the analysis of the "IBM Transactions for Anti Money Laundering" dataset published on Kaggle. The task is to implement a model which predicts whether or not a transaction is illicit, using the attribute "Is Laundering" as a label to be predicted.

machine-learning machine-learning-algorithms pyspark massive-datasets

Updated Aug 12, 2024
Jupyter Notebook

rajeshidumalla /PageRank

Building the PageRank algorithm on the web graph around Stanford.edu using the NetworkX Python library

python data-science machine-learning spark numpy pagerank-algorithm pandas data-analysis massive-datasets networkx-library

Updated Oct 7, 2021
Jupyter Notebook

gmalik9 /floating_point_data_compressor

gipa: a compression/decompression tool to package, compress, and encode massive archive files with floating-point data

compression data-visualization autoencoder compressor data-compression representation representation-learning floating-point massive-datasets

Updated Sep 14, 2017
Python

datakaveri /k-anonymisation-SKALD

Scalable, chunk-wise K-anonymization tool based on the Optimal Lattice Anonymization (OLA) algorithm. It is designed to handle large datasets by processing them in manageable chunks, ensuring data privacy while maintaining utility.

encoding chunking ola large-dataset massive-datasets k-anonymity l-diversity t-closeness skald discernibility record-suppression predictive-tagging

Updated Nov 11, 2025
Python

FedericoBruzzone /algorithms-for-massive-datasets

This repository contains a LaTeX file that generates a PDF document comprising comprehensive notes for the course "Algorithms for Massive Datasets"

deep-learning algorithms recommender-system massive-datasets unimi linkanalysis

Updated Aug 12, 2024
TeX

Alex4gtx /Massive-Data-Handler

Opens and manipulates massive text/data files that would otherwise be impossible to open on a computer, for example a text file larger than 20 GB. It lets you work with the file by fetching only the lines you need, without freezing the computer by running out of memory.

big-data dictionaries python-script massive-datasets manipulacao-arquivos

Updated Feb 12, 2022
Python
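
The idea in this entry, fetching only the needed lines of a file too large for memory, can be sketched as follows. This is a generic illustration, not the repository's code: the `read_lines` helper and the small temporary demo file are assumptions.

```python
import itertools
import os
import tempfile
from typing import Iterator


def read_lines(path: str, start: int, count: int) -> Iterator[str]:
    """Yield `count` lines beginning at 0-based line `start`.

    The file is streamed line by line, so memory use stays roughly
    constant regardless of the file's total size.
    """
    with open(path, "r", encoding="utf-8") as handle:
        for line in itertools.islice(handle, start, start + count):
            yield line.rstrip("\n")


# Small demo file standing in for a multi-gigabyte one (hypothetical data).
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.writelines(f"row {i}\n" for i in range(1000))
    demo_path = tmp.name

selected = list(read_lines(demo_path, start=10, count=3))
os.remove(demo_path)
```

Lines before `start` are still scanned in this sketch. A tool that needs repeated random access would more likely build an index of byte offsets once and then seek directly.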

diem-ai /google-bigquery

A series of SQL exercises on working with databases, using Google BigQuery to scale to massive datasets, taught by educators on Kaggle.com

python bigquery sql analytics kaggle massive-datasets

Updated Jul 9, 2019
Jupyter Notebook

rajeshidumalla /node2vec

Building the node2vec algorithm

python data-science machine-learning numpy pandas data-analysis matplotlib massive-datasets node2vec networkx-graph

Updated Oct 7, 2021
Jupyter Notebook

manuparra /hadoop-statistics

Calculate statistical measures of one column in big-data datasets with this simple Hadoop application

java hadoop bigdata max avg min standardeviation massive-datasets

Updated Feb 24, 2017
Java
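
The measures listed in this entry's tags (min, max, avg, standard deviation) can all be computed in one streaming pass over a column. The sketch below uses Welford's online algorithm in a single Python process. It is not the repository's Hadoop/Java code, and the `column_stats` function and sample values are assumptions.

```python
import math
from typing import Dict, Iterable


def column_stats(values: Iterable[float]) -> Dict[str, float]:
    """One-pass min, max, mean, and population standard deviation."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    minimum = math.inf
    maximum = -math.inf
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # Welford's update
        minimum = min(minimum, x)
        maximum = max(maximum, x)
    if count == 0:
        raise ValueError("column_stats requires at least one value")
    return {
        "min": minimum,
        "max": maximum,
        "avg": mean,
        "std": math.sqrt(m2 / count),
    }


# Hypothetical column values; any iterable works, including a generator
# that parses one field per line from a large file.
stats = column_stats([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

Because the function keeps only a handful of running totals, the column never has to fit in memory. A distributed version would also need a step that merges partial results from each worker, which this sketch leaves out.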

arhcoder /Netflix-Recommendation

📺 Content Recommendation System for the Netflix Prize Challenge with Collaborative Filtering.

python jupyter-notebook collaborative-filtering netflix recommendation-system recommendation-engine recommender-system massive-datasets netflix-prize massive-data

Updated Feb 17, 2024
Jupyter Notebook

StefanoBalbo /Geocoding

Automated massive geolocator of addresses with parallel processing.

python docker geocoding osm geospatial geolocation ssh-server jupyter-notebook nominatim jupyterlab spatial-analysis massively-parallel geopandas osm-data geopy massive-datasets massive nominatim-docker micromamba

Updated Mar 20, 2025
Jupyter Notebook

rajeshidumalla /Wordcount-in-Spark

Word count in Spark

python spark python-library pandas wordcount massive-datasets

Updated Oct 6, 2021
Jupyter Notebook

theveryhim /Massive-text-processing

Using the PySpark framework to process and analyze enormous text data

big-data pyspark data-analysis frequent-itemsets massive-datasets text-preprocessing

Updated Jul 4, 2025
Jupyter Notebook

KolwaBrad /massivedataset

Training on the MASSIVE dataset by Amazon (English-US, German-DE, and Swahili-KE)

python massive-datasets

Updated Oct 2, 2023
Python

simkarwin /mimo_keras

TF-Package: Multiple-Input Multiple-Output Keras Data-Generator for massive and complex datasets

massive-datasets keras-datagenerator mimo-models

Updated Jan 2, 2023
Python