pdf-processing
Here are 70 public repositories matching this topic...
Document chatbot — multiple files, topics, chat windows and chat history. Powered by GPT.
- Updated
Jul 21, 2023 - TypeScript
A library supporting NLP and CV research on scientific papers.
- Updated
Nov 8, 2024 - Python
Multiple and Large PDF Documents Text Extraction.
- Updated
Feb 10, 2025 - Python
A boilerplate solution for processing image and PDF documents for regulated industries, with lineage and pipeline operations metadata services.
- Updated
Oct 25, 2021 - Python
Python scripts that convert PDF files to text, split them into chunks, and store their vector representations in a Chroma DB using GPT4All embeddings. A separate script queries the Chroma DB for similarity search based on user input.
- Updated
Oct 23, 2023 - Python
An NPM package built on top of pdf-lib that provides functionality such as merge, rotate, split, download PDF to disk, and many more...
- Updated
Oct 31, 2023 - JavaScript
Built with the pdf-actions NPM package.
- Updated
May 27, 2024 - JavaScript
LangGraphRAG: A terminal-based Retrieval-Augmented Generation system using LangGraph. Features include message history caching, query transformation, and vector database retrieval. Ideal for NLP researchers and developers working on advanced conversational AI and information retrieval systems.
- Updated
Jul 13, 2024 - Python
AI-powered RAG-based tool for summarizing, extracting insights, and answering questions about research papers with high accuracy
- Updated
Mar 20, 2025 - HTML
An all-in-one GUI management toolkit built with PyQt6, offering a suite of tools for file synchronization, media organization, PDF merging, code formatting, and more.
- Updated
Mar 15, 2025 - Python
The Document Summarizer leverages Hugging Face’s facebook/bart-large-cnn model to transform lengthy documents into concise summaries. Built with ReactJS (Vite) for the frontend and Flask for the backend, it supports PDF and text files, offering real-time summarization for researchers, students, and professionals.
- Updated
Dec 7, 2024 - JavaScript
A side project to easily collect and annotate questions and answers for the PsychometryBot project DB using computer vision and PDF parsing.
- Updated
Sep 18, 2022 - Python
PdfSnipper is a lightweight and efficient Python package designed to simplify the management of PDF files, pages, and their conversions during various NLP, Computer Vision (CV), or other data processing tasks. The package eliminates the need for repetitive code by providing intuitive, ready-to-use functions for common PDF-related operations.
- Updated
Feb 3, 2025 - Python
Some useful mini projects that I worked on while teaching myself Python programming.
- Updated
May 20, 2024 - Python
A web application for preparing books and magazines for offset printing. Automatically arranges PDF pages for commercial A3 printing, supporting both Arabic (RTL) and English (LTR) books.
- Updated
Jan 6, 2025 - Python
This project implements a Retrieval Augmented Generation (RAG) system that answers questions based on a PDF document. It uses Weaviate as a vector database to retrieve relevant information and Gemini to generate natural language responses.
- Updated
Jan 12, 2025 - Jupyter Notebook
A modern, intelligent invoice processing system with advanced multi-format data extraction capabilities. Process invoices from PDFs, Excel files, and images with smart data recognition.
- Updated
Nov 23, 2024 - JavaScript
A powerful Retrieval Augmented Generation (RAG) application built with NVIDIA AI endpoints and Streamlit. This solution enables intelligent document analysis and question-answering using state-of-the-art language models, featuring multi-PDF processing, FAISS vector store integration, and advanced prompt engineering.
- Updated
Oct 31, 2024 - Python
A statistical data display and notifier app for the COVID-19 pandemic.
- Updated
May 15, 2022 - Kotlin
Opinionated and Sophisticated Document Region Analyzer.
- Updated
Mar 4, 2025 - Python