data-processing
Here are 1,542 public repositories matching this topic...
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
- Updated Jul 18, 2025 - Python
A collection of handy Bash One-Liners and terminal tricks for data processing and Linux system maintenance.
- Updated Apr 3, 2025
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
- Updated Jul 11, 2025 - Go
Select, put and delete data from JSON, TOML, YAML, XML and CSV files with a single tool. Supports conversion between formats and can be used as a Go package.
- Updated Jul 10, 2025 - Go
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
- Updated Jul 14, 2025 - C++
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
- Updated Jul 18, 2025 - Python
A lightweight data processing framework built on DuckDB and 3FS.
- Updated Mar 5, 2025 - Python
A light-weight, flexible, and expressive statistical data testing library
- Updated Jul 16, 2025 - Python
Concurrent and multi-stage data ingestion and data processing with Elixir
- Updated Jun 6, 2025 - Elixir
Large-scale pretraining for dialogue
- Updated Oct 17, 2022 - Python
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/
- Updated Aug 26, 2021 - Python
Data transformation framework for AI. Ultra performant, with incremental processing.
- Updated Jul 18, 2025 - Rust
Kubernetes-native platform to run massively parallel data/streaming jobs
- Updated Jul 18, 2025 - Go
Python Stream Processing
- Updated Mar 27, 2025 - Python
Extract Transform Load for Python 3.5+
- Updated May 12, 2023 - Python
Concurrent Python made simple
- Updated Feb 4, 2025 - Python
Source code accompanying the book Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
- Updated Mar 22, 2025 - Jupyter Notebook
Data and tools for generating and inspecting OLMo pre-training data.
- Updated Jul 15, 2025 - Python
Scalable data preprocessing and curation toolkit for LLMs
- Updated Jul 18, 2025 - Python
Easy Data Preparation with latest LLMs-based Operators and Pipelines.
- Updated Jul 18, 2025 - Python