pdf-to-markdown
Here are 42 public repositories matching this topic...
Knowledge Agents and Management in the Cloud
Updated Dec 12, 2025 - TypeScript
E2M converts various file types (doc, docx, epub, html, htm, url, pdf, ppt, pptx, mp3, m4a) into Markdown. It’s easy to install, with dedicated parsers and converters, supporting custom configs. E2M offers an all-in-one, flexible, and open-source solution.
Updated Sep 8, 2024 - Jupyter Notebook
Extract and convert data from any document, image, PDF, Word document, PPT, or URL into multiple formats (Markdown, JSON, CSV, HTML) with intelligent structured data extraction and advanced OCR.
Updated Oct 31, 2025 - Python
Safe, Open, High-Performance — PDF for AI
Updated Dec 17, 2025 - Java
Easily deployable and scalable backend server that efficiently converts various document formats (pdf, docx, pptx, html, images, etc.) into Markdown. With support for both CPU and GPU processing, it is ideal for large-scale workflows and offers text/table extraction, OCR, and batch processing with sync/async endpoints.
Updated Mar 4, 2025 - Python
Parse PDFs into markdown using Vision LLMs
Updated Oct 4, 2025 - Python
A Python package for converting PDFs to Markdown while extracting images and tables, and for generating descriptive text for the extracted tables and images using several LLM clients, among other functionality. Markdrop is available on PyPI.
Updated Jul 5, 2025 - Python
Conversion of PDF documents to structured Markdown, optimized for Retrieval Augmented Generation (RAG) and other NLP tasks. Extract text, tables, and images with preserved formatting for enhanced information retrieval and processing.
Updated Nov 22, 2024 - Python
Smart PDF to Markdown converter with intelligent heading detection, automatic header/footer removal, orphan fragment merging, and image export. Features a user-friendly GUI with preview mode, persistent settings, and per-page error recovery. Optimized for Obsidian and other Markdown-based note-taking workflows.
Updated Dec 1, 2025 - Python
smart-llm-loader is a lightweight yet powerful Python package that transforms any document into LLM-ready chunks. Spend less time on preprocessing headaches and more time building what matters. From RAG systems to chatbots to document Q&A, SmartLLMLoader handles the heavy lifting so you can focus on creating exceptional AI applications.
Updated Nov 14, 2025 - Python
URL to Markdown API is a service that converts web content into clean, structured Markdown format through a simple HTTP GET request. It's built using FastAPI and the MarkItDown library, offering a straightforward way to convert various content types (web pages, YouTube videos, PDFs, documents) into Markdown that's optimized for Large Language Mod
Updated Oct 25, 2025 - Python
Serving files for hungry LLMs
Updated Jun 3, 2025 - Python
Quick way to convert files (PDF, DOCX, HTML, PPTX, Images) to (MD, JSON, YAML) using Docling and Streamlit
Updated Jul 9, 2025 - Python
Turns supported file types (e.g. .docx) into a Markdown-structured text file. Optionally defangs indicators and extracts text from images. Built for threat intel use cases.
Updated Dec 16, 2025 - Python
RAG-Ingest: A tool for converting PDFs to markdown and indexing them for enhanced Retrieval Augmented Generation (RAG) capabilities.
Updated Nov 22, 2024 - Python
⚡ Pen2PDF Suite – an all-in-one 🚀 productivity platform ✨ with 🤖 AI-powered text extraction (PDF/Images → Markdown 📝), 📅 smart timetable management (CSV/Excel import 📊), ✅ todo lists with subtasks 📈, 🧠 AI-generated notes library 📚 and 💬 Isabella AI assistant (OpenAI/Microsoft/llama/Mistral/LongCat/Gemini models 🔄) for context-aware help 🧩.
Updated Nov 26, 2025 - JavaScript
Convert any document format into LLM-ready data format (markdown) with advanced intelligent document processing capabilities powered by pre-trained models.
Updated Aug 14, 2025 - Python
Bidirectional Markdown↔PDF converter with AI-powered vision. MD→PDF with beautiful themes, PDF→MD with LLaVA - open source & privacy-first
Updated Nov 5, 2025 - TypeScript
LangParse is a universal document parsing and text chunking engine for LLM or Agent applications — Documents In, Knowledge Out.
Updated Nov 23, 2025 - Python