pdf-parsing

Easily deployable and scalable backend server that efficiently converts various document formats (PDF, DOCX, PPTX, HTML, images, etc.) into Markdown. It supports both CPU and GPU processing and is ideal for large-scale workflows, offering text/table extraction, OCR, and batch processing with sync/async endpoints.

api markdown-parser pdf-converter pdf-conversion pdf-parsing pdf-parser fastapi pdf-chatbot pdf-to-markdown

Updated Mar 4, 2025
Python

jstockwin/py-pdf-parser

Stars: 417

A Python tool to help extract information from structured PDFs.

pdf parsing pdf-parsing py-pdf-parser

Updated Oct 6, 2025
Python

chunyenHuang/hummusRecipe

Stars: 350

A powerful PDF tool for NodeJS based on HummusJS.

nodejs pdf pdf-files pdf-generation pdf-manipulation pdf-parsing pdf-modification overlay-pdf

Updated Apr 18, 2023
JavaScript

thoqbk/traprange

Stars: 335

(Java) A method to extract tabular content from PDF files

java pdf parser pdfbox pdf-files pdf-manipulation pdf-parsing

Updated Apr 22, 2023
HTML

ck-unifr/pdf_parsing

Stars: 209

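PDF parsing (text, sections, tables, images, references), plus PDF question answering, summarization, and information extraction built on large language models (ChatGLM2-6B, RWKV) + langchain + streamlit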

python pdf information-extraction pdf-parsing streamlit llm rwkv langchain chatpdf chatglm2-6b

Updated Oct 17, 2023
Python

ScientaNL/pdf-extractor

Stars: 102

Node.js module for rendering PDF pages to images, SVGs, HTML files, text files, and JSON metadata

nodejs image-generation pdfjs html-generation pdf-parsing

Updated May 16, 2023
JavaScript

iamarunbrahma/pdf-to-markdown

Stars: 95

Conversion of PDF documents to structured Markdown, optimized for Retrieval Augmented Generation (RAG) and other NLP tasks. Extract text, tables, and images with preserved formatting for enhanced information retrieval and processing.

python information-retrieval document-conversion pdf-converter text-extraction pdf-parsing document-processing rag pdf-extraction retrieval-augmented-generation pdf-to-markdown

Updated Nov 22, 2024
Python

rostrovsky/pdf-table

Stars: 80

Java utility for parsing PDF tabular data using Apache PDFBox and OpenCV

opencv table pdfbox java8 java-library tables pdf-parsing opencv3

Updated May 9, 2023
Java

hellpanderrr/linkedin-pdf-parsing

Stars: 68

Parsing résumés exported from LinkedIn in PDF format

python linkedin resume-parser pdf-parsing

Updated Sep 30, 2016
Python

tuffstuff9/nextjs-pdf-parser

Stars: 64

Next.js template for seamless PDF parsing using pdf2json and FilePond. Ideal for developers seeking a ready-to-use solution for PDF content extraction in Next.js projects.

nextjs content-extraction pdf-parsing react-pdf pdf-parser pdf2json filepond pdf-upload pdf-parse nextjs-pdf-parser nextjs-pdf react-pdf-parser nextjs-pdf-parse nextjs-pdf-parsing

Updated Dec 8, 2023
TypeScript

dipietrantonio/pdf4py

Stars: 57

A PDF parser written in Python 3 with no external dependencies.

python pdf parser information-extraction pdf-parsing

Updated May 28, 2020
Python

abdullahshafiq-20/ResumeTex

Stars: 37

ResumeTex is an AI-powered tool that converts standard PDF resumes into professionally formatted LaTeX documents. This service helps you create elegant, structured resumes without needing to learn LaTeX syntax.

nodejs resume open-source tex automation express latex reactjs developer-tools job-application pdf-parsing document-processing tailwindcss pdf-to-latex google-generative-ai ai-resume-generator resume-converter

Updated Sep 3, 2025
JavaScript

DQ-Zhang/refchaser

Stars: 22

Written in Python for checking reference lists in systematic reviews and literature reviews. It helps with both backward and forward reference-list searching by extracting references and creating search queries, ranks articles by relevance to improve screening efficiency, and downloads full-text PDFs of research articles in batch.

text-mining systematic-literature-reviews research-paper bibliographic-references pdf-parsing systematic-reviews pdf-downloader literature-review scihub cermine evidence-based-medicine citation-managment-tool

Updated Jun 8, 2020
Python

adrienjoly/npm-pdfreader-example

Stars: 16

Example use of pdfreader: parsing a PDF résumé

example pdf-parsing

Updated May 1, 2022
JavaScript

malice-plugins/pdf

Stars: 16

Malice PDF Plugin

plugin docker pdf malware malware-analyzer malware-analysis malice pdf-parsing pdfid peepdf malice-plugin pdf-malware pdf-analyzer

Updated Jan 7, 2019
Python

aimaster-dev/chatbot-using-rag-and-langchain

Stars: 13

Chat with your PDFs using AI! This Streamlit app uses RAG, LangChain, FAISS, and OpenAI to let you ask questions and get answers with page and file references.

python nlp pdf ai chatbot embeddings openai context-aware semantic-search chat-ui pdf-parsing file-metadata document-search faiss rag streamlit llm llms langchain vector-store

Updated May 29, 2025
Python
