content-extraction
Here are 163 public repositories matching this topic...
🔥 Official Firecrawl MCP Server - Adds powerful web scraping and search to Cursor, Claude and any other LLM clients.
Updated Feb 20, 2026 - JavaScript
Open-source, production-grade web scraping engine built for LLMs. Scrape and crawl the entire web, clean markdown, ready for your agents.
Updated Feb 2, 2026 - TypeScript
Model Context Protocol (MCP) Server for Graphlit Platform
Updated Jan 12, 2026 - TypeScript
A fork of Dragnet that also extracts author, headline, date, and keywords from context, with built-in metadata extraction, all in one package.
Updated May 19, 2025 - HTML
A powerful MCP server extension providing web search and content extraction capabilities. Integrates DuckDuckGo search functionality and URL content extraction into your MCP environment, enabling AI assistants to search the web and extract webpage content programmatically.
Updated Feb 13, 2026 - JavaScript
Readability2 converts HTML to plain text.
Updated Dec 12, 2018 - TypeScript
Next.js template for seamless PDF parsing using pdf2json and FilePond. Ideal for developers seeking a ready-to-use solution for PDF content extraction in Next.js projects.
Updated Dec 8, 2023 - TypeScript
Pure Ruby implementation of the Boilerpipe content extraction algorithm, tuned for online articles
Updated Feb 21, 2021 - Ruby
DOM Based Content Extraction via Text Density
Updated Sep 23, 2025 - Rust
Web content extraction using machine learning
Updated Mar 3, 2021 - HTML
🔍 Model Context Protocol (MCP) tool for parsing websites using the Jina.ai Reader
Updated Apr 5, 2025 - JavaScript
Tool that extracts the text from web article URLs and provides word frequencies, entity recognition, automatic summaries, and more
Updated Jan 15, 2019 - Python
Make PDF Files Accessible, Extract Data from PDF, Convert PDF to HTML, Fill-in PDF Form, Stamp PDF and more...
Updated Feb 20, 2026 - C++
Benson turns a list of URLs into mp3s of the contents of each web page - take control over your reading backlog!
Updated Oct 30, 2024 - Python
This repository houses a Python application for extracting YouTube video transcripts and summarizing their content.
Updated Sep 29, 2023 - Python
A userscript that adds a button to YouTube video pages for copying the transcript with or without timestamps.
Updated Jan 25, 2026 - JavaScript
Extract meaningful content from the chaos of a web page
Updated Feb 20, 2026 - JavaScript
Simple web crawler in Go using text density
Updated Mar 19, 2023 - Go
Seize is a light Node or browser web-page content extractor inspired by Arc90 Readability and Safari Reader
Updated May 20, 2017 - HTML
📸 Crawell: a free browser extension for one-click image and article extraction, Markdown conversion, and bulk download, with 100% local processing.
Updated Jul 31, 2025 - TypeScript