#content-extraction

Here are 163 public repositories matching this topic...

🔥 Official Firecrawl MCP Server - Adds powerful web scraping and search to Cursor, Claude and any other LLM clients.

  • Updated Feb 20, 2026
  • JavaScript
reader

Open-source, production-grade web scraping engine built for LLMs. Scrape and crawl the entire web, clean markdown, ready for your agents.

  • Updated Feb 2, 2026
  • TypeScript

A fork of Dragnet that also extracts the author, headline, date, and keywords from context, with built-in metadata extraction, all in one package.

  • Updated May 19, 2025
  • HTML

A powerful MCP server extension providing web search and content extraction capabilities. Integrates DuckDuckGo search functionality and URL content extraction into your MCP environment, enabling AI assistants to search the web and extract webpage content programmatically.

  • Updated Feb 13, 2026
  • JavaScript

Readability2 converts HTML to plain text.

  • Updated Dec 12, 2018
  • TypeScript

Next.js template for seamless PDF parsing using pdf2json and FilePond. Ideal for developers seeking a ready-to-use solution for PDF content extraction in Next.js projects.

  • Updated Dec 8, 2023
  • TypeScript

Pure Ruby implementation of the Boilerpipe content extraction algorithm, tuned for online articles

  • Updated Feb 21, 2021
  • Ruby

DOM Based Content Extraction via Text Density

  • Updated Sep 23, 2025
  • Rust

Web content extraction using machine learning

  • Updated Mar 3, 2021
  • HTML

🔍 Model Context Protocol (MCP) tool for parsing websites using the Jina.ai Reader

  • Updated Apr 5, 2025
  • JavaScript

Tool that extracts the text from web article URLs and provides word frequencies, entity recognition, automatic summaries, and more

  • Updated Jan 15, 2019
  • Python

Make PDF Files Accessible, Extract Data from PDF, Convert PDF to HTML, Fill-in PDF Form, Stamp PDF and more...

  • Updated Feb 20, 2026
  • C++

Benson turns a list of URLs into mp3s of the contents of each web page - take control over your reading backlog!

  • Updated Oct 30, 2024
  • Python

This repository houses a Python application for extracting YouTube video transcripts and summarizing their content.

  • Updated Sep 29, 2023
  • Python

Extract meaningful content from the chaos of a web page

  • Updated Feb 20, 2026
  • JavaScript

Simple web crawler in Go using text density

  • Updated Mar 19, 2023
  • Go

Seize is a lightweight Node or browser web-page content extractor inspired by Arc90 Readability and Safari Reader

  • Updated May 20, 2017
  • HTML

📸 Crawell: a free browser extension for one-click image and article extraction, Markdown conversion, and bulk download, with 100% local processing.

  • Updated Jul 31, 2025
  • TypeScript
