pdf-processing

Official Python client library for Nutrient Document Web Services API - PDF processing, OCR, watermarking, and document manipulation with automatic Office format conversion

python pdf-converter pdf-generation pdf-document-processor ocr-python pdf-processing

Updated Jul 3, 2025
Python

Govind-S-B/pdf-to-text-chroma-search

Star 23

Python scripts that convert PDF files to text, split them into chunks, and store their vector representations in a Chroma DB using GPT4All embeddings. Also includes a script that queries the Chroma DB for similarity search based on user input.

text-extraction similarity-search pdf-processing vector-embeddings chromadb

Updated Oct 23, 2023
Python

ManasMadan/pdf-actions

Star 14

An npm package built on top of pdf-lib that provides functionality such as merging, rotating, splitting, and downloading PDFs to disk, and many more.

react javascript pdf npm reactjs react-component pdf-merge pdf-split pdf-rotate pdf-merger pdf-downloader pdf-lib pdf-splitter pdf-processing pdf-download pdf-free pdf-online

Updated Oct 31, 2023
JavaScript

ranguy9304/LangGraphRAG

Star 11

LangGraphRAG: A terminal-based Retrieval-Augmented Generation system using LangGraph. Features include message history caching, query transformation, and vector database retrieval. Ideal for NLP researchers and developers working on advanced conversational AI and information retrieval systems.

python natural-language-processing information-retrieval chatbot web-scraping nlp-machine-learning rag terminal-application pdf-processing vector-database openai-api langgraph

Updated Jul 13, 2024
Python

ManasMadan/PDFActions

Star 7

Built with the pdf-actions npm package.

react pdf reactjs react-component react-components pdf-merge pdf-split pdf-rotate pdf-merger pdf-downloader pdf-lib pdf-splitter pdf-processing pdf-download

Updated May 27, 2024
JavaScript

Inc44/MaTools

Star 6

An all-in-one GUI management toolkit built with PyQt6, offering a suite of tools for file synchronization, media organization, PDF merging, code formatting, and more.

python rust productivity application gui qt ocr image-processing video-processing speech-recognition youtube-downloader file-management audio-processing pdf-processing code-formatting

Updated Mar 15, 2025
Python

enesmanan/paper-bold

Star 6

AI-powered RAG-based tool for summarizing, extracting insights, and answering questions about research papers with high accuracy

academic-paper gemini-api rag pdf-processing academic-research langchain

Updated Mar 20, 2025
HTML

DioCrafts/ai-book-summarizer

Star 4

📚 AI-Powered Book PDF Knowledge Extractor & Summarizer. Transform your PDF books into structured knowledge effortlessly! This tool leverages AI to analyze books page by page, extracting key insights, definitions, and concepts, and organizes them into Markdown summaries for easier study.

python markdown pdf machine-learning natural-language-processing automation ai text-analysis openai text-summarization document-analysis study-materials pymupdf knowledge-extraction pdf-processing book-summary educational-tools pdf-summarization ai-powered-tools

Updated Jan 2, 2025
Python

allanninal/document-summarizer

Star 4

The Document Summarizer leverages Hugging Face’s facebook/bart-large-cnn model to transform lengthy documents into concise summaries. Built with ReactJS (Vite) for the frontend and Flask for the backend, it supports PDF and text files, offering real-time summarization for researchers, students, and professionals.

nlp flask reactjs text-summarization vite huggingface pdf-processing document-summarizer ai-tools open-source-cods

Updated Dec 7, 2024
JavaScript

Remy2404/Polymind

Star 4

Polymind is a powerful multi-modal Telegram bot built with Gemini, DeepSeek, OpenRouter, and over 50 cutting-edge AI models. It offers seamless conversational intelligence, Mermaid diagram rendering, PDF/DOCX analysis, image generation, and collaborative tools—all in a single bot interface.

telegram-bot voice image-processing voice-recognition gemini multi-model pdf-processing ai-assistant openrouter mermiad deepseek-r1

Updated Jul 15, 2025
Python

Aleptonic/PdfSnipper

Star 3

PdfSnipper is a lightweight and efficient Python package designed to simplify the management of PDF files, pages, and their conversions during various NLP, Computer Vision (CV), or other data processing tasks. The package eliminates the need for repetitive code by providing intuitive, ready-to-use functions for common PDF-related operations.

utilities pdf-processing nlp-tools

Updated Feb 3, 2025
Python

Yardenrsk/PsychometryReceiverCV

Star 3

A side project for easily collecting and annotating questions and answers for the PsychometryBot project DB, using computer vision and PDF parsing.

pandas opencv-python pdf-processing

Updated Sep 18, 2022
Python

thinhuos0913/python_useful_mini_projects

Star 3

A collection of useful mini projects that I worked on while teaching myself Python programming.

python opencv ocr image-processing pdf-processing

Updated May 20, 2024
Python

gwyndolin75/Document-QA-System

Star 3

A Streamlit-based app for asking questions directly from uploaded documents using Gemini embeddings and a language model. Supports PDF, TXT, and DOCX files. Fast, simple, and powerful document-based QA.

nodejs python transformers text-analysis embeddings question-answering pdf-generation gemini-api rag pdf-processing streamlit astradb langchain-python chromadb

Updated Jul 18, 2025
Jupyter Notebook

setuc/pdf-annotation-with-azure-doc-intel

Star 2

Azure Document Intelligence Result Processor: A toolset for annotating PDFs based on Azure Document Intelligence analysis results, featuring a React web application and a standalone Python script for processing and visualizing extracted data with confidence indicators.

react javascript python vite pdf-annotation pdf-processing confidence-scores form-recognizer azure-document-intelligence

Updated Mar 12, 2025
JavaScript


Here are 129 public repositories matching this topic...

dissorial/doc-chatbot

allenai/papermage

ahmedkhemiri95/PDFs-TextExtract

postralai/masquerade

aws-samples/document-processing-pipeline-for-regulated-industries

PSPDFKit/nutrient-dws-client-python
