training-data
Here are 233 public repositories matching this topic...
A system for quickly generating training data with weak supervision
Updated May 2, 2024 - Python
The AI Datastore for Schemas, BLOBs, and Predictions. Use with your apps or integrate built-in Human Supervision, Data Workflow, and UI Catalog to get the most value out of your AI Data.
Updated Nov 18, 2024 - Python
Synthetic data generators for tabular and time-series data
Updated Dec 15, 2025 - Jupyter Notebook
skweak: A software toolkit for weak supervision applied to NLP tasks
Updated Sep 2, 2024 - Python
Computer vision based ML training data generation tool 🚀
Updated Feb 15, 2025 - JavaScript
A machine learning tool for automated prediction engineering. It allows you to easily structure prediction problems and generate labels for supervised learning.
Updated Mar 31, 2025 - Python
Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes
Updated Jul 20, 2025 - Python
Pure Python, lightweight, Pillow-based solver for Amazon's text captcha.
Updated Dec 8, 2025 - Python
JavaScript in-page GUI agent. Control web interfaces with natural language.
Updated Dec 17, 2025 - TypeScript
Web application for image labeling and segmentation
Updated Dec 9, 2022 - JavaScript
🏖 TagEditor - Annotation tool for spaCy
Updated Sep 23, 2022
A lightweight web application for brushing labels onto time series data; useful for building training sets.
Updated Mar 4, 2023 - JavaScript
Augmenty is an augmentation library based on spaCy for augmenting texts.
Updated May 24, 2024 - Python
Aubo i5 Dual Arm Collaborative Robot - RealSense D435 - 3D Object Pose Estimation - ROS
Updated Jun 22, 2022 - C++
Natural Language Data Augmentation Tool for Conversational Systems
Updated Dec 26, 2022 - Python
Generating training data from the Carla driving simulator in the KITTI dataset format
Updated May 21, 2019 - Python
Collection of casual conversations that can be used with the Rasa Stack
Updated May 25, 2020 - Python
SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 languages, generated using PaLM 2 and summarize-then-ask prompting.
Updated Nov 13, 2023
Convert all files in git repository to .txt files. Useful for training LLMs on your codebase.
Updated Dec 7, 2024 - Python
COVID-19 Coughs files for training AI models
Updated Oct 13, 2020 - Python