# massive-datasets

Here are 27 public repositories matching this topic...

PolarDB-X is a cloud-native distributed SQL database designed for high-concurrency, massive-storage, and complex-query scenarios.

  • Updated Aug 27, 2025
  • Java
heat

Distributed tensors and Machine Learning framework with GPU and MPI acceleration in Python

  • Updated Nov 28, 2025
  • Python

PolarDB-X is a cloud-native distributed SQL database designed for high-concurrency, massive-storage, and complex-query scenarios.

  • Updated Aug 29, 2025
  • Makefile

Command line tool to quickly generate a lot of files in a lot of directories

  • Updated Feb 18, 2022
  • C++

Building a Bloom Filter on English dictionary words

  • Updated Oct 7, 2021
  • Jupyter Notebook
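
For readers unfamiliar with the technique named in this entry, here is a minimal sketch of a Bloom filter over a word list. It is not taken from the listed repository. The class name `BloomFilter`, the double-hashing scheme over a SHA-256 digest, and the three-word sample list are all assumptions chosen for illustration.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter (hypothetical helper, not the repository's code)."""

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01):
        # Standard sizing formulas: m = -n * ln(p) / (ln 2)^2 and k = (m / n) * ln 2.
        self.size = max(1, math.ceil(
            -expected_items * math.log(false_positive_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.size / expected_items * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


# Stand-in for a dictionary word list (assumed sample data).
words = ["apple", "banana", "cherry"]
bloom = BloomFilter(expected_items=len(words))
for word in words:
    bloom.add(word)
```

A membership test such as `"apple" in bloom` never yields a false negative for an inserted word. A word that was never inserted may occasionally test as present, at roughly the configured false-positive rate.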

The project is based on the analysis of the "IBM Transactions for Anti Money Laundering" dataset published on Kaggle. The task is to implement a model which predicts whether or not a transaction is illicit, using the attribute "Is Laundering" as a label to be predicted.

  • Updated Aug 12, 2024
  • Jupyter Notebook

Building the PageRank algorithm on the web graph around Stanford.edu using the NetworkX Python library

  • Updated Oct 7, 2021
  • Jupyter Notebook

gipa -- compression/decompression tool to package, compress, and encode massive archive files with floating-point data

  • Updated Sep 14, 2017
  • Python

Scalable, chunk-wise K-anonymization tool based on the Optimal Lattice Anonymization (OLA) algorithm. It is designed to handle large datasets by processing them in manageable chunks, ensuring data privacy while maintaining utility.

  • Updated Nov 11, 2025
  • Python

This repository contains a LaTeX file that generates a PDF document comprising comprehensive notes for the course "Algorithms for Massive Datasets"

  • Updated Aug 12, 2024
  • TeX

Opens and manipulates massive text/data files that would otherwise be impossible to open on a computer, for example a text file of more than 20 GB. It lets you work with the file by fetching only the lines you need, without freezing the computer through lack of memory.

  • Updated Feb 12, 2022
  • Python
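
The idea in this entry, fetching only the needed lines so the whole file never sits in memory, can be sketched as follows. This is not the listed repository's code. The function name `read_lines` and its parameters are hypothetical, and the sketch assumes a newline-delimited text file.

```python
from itertools import islice
from typing import Iterator


def read_lines(path: str, start: int, count: int,
               encoding: str = "utf-8") -> Iterator[str]:
    """Yield `count` lines starting at zero-based line `start`, streaming the file."""
    with open(path, "r", encoding=encoding) as handle:
        # File objects iterate lazily, so only a small buffer is held in memory
        # regardless of how large the file is.
        for line in islice(handle, start, start + count):
            yield line.rstrip("\n")
```

Calling `list(read_lines("huge.txt", 1_000_000, 10))` would return ten lines without loading the rest of the file, where `huge.txt` is a placeholder path. The skipped lines are still scanned sequentially, so reaching a late offset takes time proportional to that offset. A tool built for repeated random access would more likely keep an index of byte offsets.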

A series of SQL exercises on working with databases, using Google BigQuery to scale to massive datasets, taught by educators on Kaggle.com

  • Updated Jul 9, 2019
  • Jupyter Notebook

Calculate statistical measures of one column in big-data datasets with this simple Hadoop application

  • Updated Feb 24, 2017
  • Java

📺 Content Recommendation System for the Netflix Prize Challenge with Collaborative Filtering.

  • Updated Feb 17, 2024
  • Jupyter Notebook
Geocoding

Word count in Spark

  • Updated Oct 6, 2021
  • Jupyter Notebook

Using the PySpark framework to process and analyze enormous text data

  • Updated Jul 4, 2025
  • Jupyter Notebook

Training on the MASSIVE dataset by Amazon (English-US, German-DE, and Swahili-KE)

  • Updated Oct 2, 2023
  • Python

TF-Package: Multiple-Input Multiple-Output Keras Data-Generator for massive and complex datasets

  • Updated Jan 2, 2023
  • Python
