extraction
Here are 590 public repositories matching this topic...
Transforms PDF, Documents and Images into Enriched Structured Data
- Updated Dec 3, 2023 - JavaScript
Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
- Updated Nov 5, 2025 - Python
Extract internal monitoring data from application logs for collection in a timeseries database
- Updated Oct 29, 2025 - Go
A library for audio and music analysis
- Updated May 12, 2025 - C
The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
- Updated Nov 5, 2025 - Java
Visual Novels resource browser
- Updated Jul 8, 2024 - C#
Provides functions to read and write from/to an object or array using a simple string notation
- Updated Oct 28, 2025 - PHP
Extract files from any kind of container format
- Updated Nov 3, 2025 - Python
An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)
- Updated Aug 25, 2025 - Python
Node.js module for extracting text from html, pdf, doc, docx, xls, xlsx, csv, pptx, png, jpg, gif, rtf and more!
- Updated Oct 5, 2022 - HTML
Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
- Updated Apr 14, 2025 - Python
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
- Updated Dec 21, 2024 - Rust
🦜⛏️ Did you say you like data?
- Updated Oct 14, 2025 - Rich Text Format
A C++ static library offering a clean and simple interface to the 7-zip shared libraries.
- Updated Oct 4, 2025 - C++
A program to extract files from the RPA archive format.
- Updated Jun 27, 2022 - Python
Stanford Open Information Extraction made simple!
- Updated Jan 11, 2024 - Python
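A survey of the information extraction field by the natural language processing research team at the Beihang University Advanced Innovation Center for Big Data. It covers subtasks including entity recognition, relation extraction, and attribute extraction, and surveys both academia and industry for each subtask.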
- Updated Apr 29, 2022
DataTool is a program that lets you extract models, maps, and files from Overwatch.
- Updated Oct 29, 2025 - C#
File Injector is a script that allows you to store any file in an image using steganography
- Updated Nov 18, 2022 - Python
PHP URI Template (RFC 6570) supports both URI expansion & extraction
- Updated Nov 27, 2024 - PHP