data-preparation
Here are 433 public repositories matching this topic...
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
- Updated Dec 2, 2024 - Python
Machine learning with dataframes
- Updated Dec 16, 2025 - Python
Visual Data Preparation and Transformation. Low-Code Python-based ETL.
- Updated Dec 8, 2025 - TypeScript
Scalable data preprocessing and curation toolkit for LLMs
- Updated Dec 18, 2025 - Python
Open source project for data preparation for GenAI applications
- Updated Dec 12, 2025 - HTML
Data Preparation for Satellite Machine Learning
- Updated Oct 3, 2023 - Python
Continuously updated paper list on advancements in Data Agents. Companion repo to our paper "A Survey of Data Agents: Emerging Paradigm or Overstated Hype?"
- Updated Dec 16, 2025 - Python
A New, Interactive Approach to Learning Data Science
- Updated Dec 15, 2025 - Jupyter Notebook
An open source book to learn data science, data analysis and machine learning, suitable for all ages!
- Updated Apr 14, 2024 - TeX
🚕 A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex)
- Updated Jul 15, 2023 - Vue
【AAAI'2021】MVFNet: Multi-View Fusion Network for Efficient Video Recognition
- Updated Mar 28, 2022 - Python
Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby
- Updated Nov 21, 2021 - Python
ABAP unit testing framework: prepare in Excel, reuse in ABAP code
- Updated Aug 13, 2025 - ABAP
Go web crawler to scrape documentation sites and convert content to clean Markdown for LLM ingestion (RAG, training data).
- Updated Jul 22, 2025 - Go
This repository contains my implementations of algorithms that MoNuSAC participants could use for data preparation to train their models at ISBI 2020.
- Updated Dec 2, 2021 - Jupyter Notebook
Accelerating AI Training and Inference from Storage Perspective (Must-read Papers on Storage for AI)
- Updated Dec 17, 2025
“Data science” is just about as broad of a term as they come. It may be easiest to describe what it is by listing its more concrete components: Data exploration & analysis. Included here: Pandas; NumPy; SciPy; a helping hand from Python's Standard Library.
- Updated Apr 15, 2020 - Jupyter Notebook
Market Mix Modelling for an eCommerce firm to estimate the impact of various marketing levers on sales
- Updated Apr 6, 2021 - R
GWAS summary statistics files QC tool
- Updated Dec 24, 2024 - Python
Data preparation for data science projects.
- Updated Sep 2, 2025 - R