unstructured-data
Here are 209 public repositories matching this topic...
🦉 Data Versioning and ML Experiments
- Updated Dec 16, 2025 - Python
Refine high-quality datasets and visual AI models
- Updated Dec 17, 2025 - Python
No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents
- Updated Dec 17, 2025 - Python
Neo4j graph construction from unstructured data using LLMs
- Updated Dec 17, 2025 - Jupyter Notebook
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
- Updated Oct 18, 2024 - Python
A system for agentic LLM-powered data processing and ETL
- Updated Nov 29, 2025 - Python
Dealing with all unstructured data, such as reverse image search, audio search, molecular search, video analysis, question and answer systems, NLP, etc.
- Updated Dec 15, 2025 - Jupyter Notebook
🔮 Instill Core is a full-stack AI infrastructure tool for data, model and pipeline orchestration, designed to streamline every aspect of building versatile AI-first applications
- Updated Dec 5, 2025 - Python
Nomic Developer API SDK
- Updated Nov 11, 2025 - Python
An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)
- Updated Aug 25, 2025 - Python
ContextGem: Effortless LLM extraction from documents
- Updated Nov 16, 2025 - Python
A multi-modal vector database that supports upserts and vector queries using unified SQL (MySQL-Compatible) on structured and unstructured data, while meeting the requirements of high concurrency and ultra-low latency.
- Updated Dec 17, 2025 - Java
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
- Updated Dec 21, 2024 - Rust
AI-Powered Data Processing: Use LOTUS to process all of your datasets with LLMs and embeddings. Enjoy up to 1000x speedups with fast, accurate query processing that's as simple as writing Pandas code.
- Updated Dec 11, 2025 - Python
Get clean data from tricky documents, powered by vision-language models ⚡
- Updated Oct 18, 2025 - Python
A curated list of resources on Document Understanding (DU)
- Updated Jun 2, 2023
Visual Data Preparation and Transformation. Low-Code Python-based ETL.
- Updated Dec 8, 2025 - TypeScript
Interactively explore unstructured datasets from your dataframe.
- Updated Dec 17, 2025 - TypeScript
Enterprise-grade and API-first LLM workspace for unstructured documents, including data extraction, redaction, rights management, prompt playground, and more!
- Updated Dec 17, 2025 - Python
Curate better data for LLMs
- Updated Mar 19, 2024 - Python