content-extraction
Here are 146 public repositories matching this topic...
🔥 Official Firecrawl MCP Server - Adds powerful web scraping and search to Cursor, Claude and any other LLM clients.
- Updated Nov 22, 2025 - JavaScript
Model Context Protocol (MCP) Server for Graphlit Platform
- Updated Dec 5, 2025 - TypeScript
A fork of Dragnet that also extracts author, headline, date, and keywords from context, with built-in metadata extraction, all in one package
- Updated May 19, 2025 - HTML
Readability2 converts HTML to plain text.
- Updated Dec 12, 2018 - TypeScript
Next.js template for seamless PDF parsing using pdf2json and FilePond. Ideal for developers seeking a ready-to-use solution for PDF content extraction in Next.js projects.
- Updated Dec 8, 2023 - TypeScript
Pure Ruby implementation of the Boilerpipe content extraction algorithm, tuned for online articles
- Updated Feb 21, 2021 - Ruby
DOM Based Content Extraction via Text Density
- Updated Sep 23, 2025 - Rust
Web content extraction using machine learning
- Updated Mar 3, 2021 - HTML
🔍 Model Context Protocol (MCP) tool for parsing websites using the Jina.ai Reader
- Updated Apr 5, 2025 - JavaScript
Tool that extracts the text from web article URLs and provides word frequencies, entity recognition, automatic summaries and more
- Updated Jan 15, 2019 - Python
Make PDF Files Accessible, Extract Data from PDF, Convert PDF to HTML, Fill-in PDF Form, Stamp PDF and more...
- Updated Oct 7, 2025 - C++
This repository houses a Python application for extracting YouTube video transcripts and summarizing their content.
- Updated Sep 29, 2023 - Python
Benson turns a list of URLs into mp3s of the contents of each web page - take control over your reading backlog!
- Updated Oct 30, 2024 - Python
Simple web crawler in Go using text density
- Updated Mar 19, 2023 - Go
Seize is a light Node or browser web-page content extractor inspired by Arc90 Readability and Safari Reader
- Updated May 20, 2017 - HTML
📸 Crawell: a free browser extension for one-click image and article extraction, Markdown conversion and bulk download, with 100% local processing.
- Updated Jul 31, 2025 - TypeScript
A userscript that adds a button to YouTube video pages for copying the transcript with or without timestamps.
- Updated Oct 11, 2025 - JavaScript
The Ultimate Web Content Extraction & Conversion Tool for AI/LLM Applications. Convert almost any web content into clean Markdown with intelligent AI processing.
- Updated Oct 8, 2025 - TypeScript
Chrome extension to copy YouTube transcripts with AI-friendly features
- Updated Aug 6, 2025 - JavaScript
Mobile First Indexing Tool
- Updated Sep 10, 2025 - Python