PromtEngineer/localGPT

Chat with your documents on your local device using GPT models. No data leaves your device and 100% private.

🚀 What is LocalGPT?

LocalGPT is a fully private, on-premise Document Intelligence platform. Ask questions, summarise, and uncover insights from your files with state-of-the-art AI—no data ever leaves your machine.

More than a traditional RAG (Retrieval-Augmented Generation) tool, LocalGPT features a hybrid search engine that blends semantic similarity, keyword matching, and Late Chunking for long-context precision. A smart router automatically selects between RAG and direct LLM answering for every query, while contextual enrichment and sentence-level Context Pruning surface only the most relevant content. An independent verification pass adds an extra layer of accuracy.

The architecture is modular and lightweight: enable only the components you need. With a pure-Python core and minimal dependencies on external frameworks and libraries, LocalGPT is simple to deploy, run, and maintain on any infrastructure.

▶️ Video

Watch this video to get started with LocalGPT.

Screenshots: Home, Create Index, Chat

✨ Features

  • Utmost Privacy: Your data remains on your computer, ensuring 100% security.
  • Versatile Model Support: Seamlessly integrate a variety of open-source models via Ollama.
  • Diverse Embeddings: Choose from a range of open-source embeddings.
  • Reuse Your LLM: Once downloaded, reuse your LLM without the need for repeated downloads.
  • Chat History: Remembers your previous conversations (in a session).
  • API: LocalGPT has an API that you can use for building RAG Applications.
  • GPU, CPU, HPU & MPS Support: Supports multiple platforms out of the box; chat with your data using CUDA, CPU, HPU (Intel® Gaudi®), MPS, and more!

📖 Document Processing

  • Multi-format Support: PDF, DOCX, TXT, Markdown, and more (currently only PDF is supported)
  • Contextual Enrichment: Enhanced document understanding with AI-generated context, inspired by Contextual Retrieval
  • Batch Processing: Handle multiple documents simultaneously

🤖 AI-Powered Chat

  • Natural Language Queries: Ask questions in plain English
  • Source Attribution: Every answer includes document references
  • Smart Routing: Automatically chooses between RAG and direct LLM responses
  • Query Decomposition: Breaks complex queries into sub-questions for better answers
  • Semantic Caching: TTL-based caching with similarity matching for faster responses (see the sketch after this list)
  • Session-Aware History: Maintains conversation context across interactions
  • Answer Verification: Independent verification pass for accuracy
  • Multiple AI Models: Ollama for inference, HuggingFace for embeddings and reranking
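
The semantic cache mentioned above can be pictured as a store of (query embedding, answer, timestamp) entries: a new query is embedded, compared against cached entries by cosine similarity, and answered from cache when a fresh-enough entry is similar enough. The sketch below is purely illustrative and does not mirror LocalGPT's internal classes; the class name, threshold, and TTL value are made up for the example.

import time
import numpy as np

class SemanticCacheSketch:
    """Illustrative TTL + similarity cache (not LocalGPT's actual implementation)."""

    def __init__(self, threshold=0.92, ttl_seconds=3600.0):
        self.threshold = threshold
        self.ttl = ttl_seconds
        self.entries = []  # list of (unit-norm embedding, answer, timestamp)

    def get(self, query_emb):
        now = time.time()
        q = query_emb / np.linalg.norm(query_emb)
        for emb, answer, ts in self.entries:
            if now - ts > self.ttl:
                continue  # entry expired
            if float(np.dot(q, emb)) >= self.threshold:
                return answer  # cache hit: similar enough and still fresh
        return None

    def put(self, query_emb, answer):
        emb = query_emb / np.linalg.norm(query_emb)
        self.entries.append((emb, answer, time.time()))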

🛠️ Developer-Friendly

  • RESTful APIs: Complete API access for integration
  • Real-time Progress: Live updates during document processing
  • Flexible Configuration: Customize models, chunk sizes, and search parameters
  • Extensible Architecture: Plugin system for custom components

🎨 Modern Interface

  • Intuitive Web UI: Clean, responsive design
  • Session Management: Organize conversations by topic
  • Index Management: Easy document collection management
  • Real-time Chat: Streaming responses for immediate feedback

🚀 Quick Start

Note: The installation is currently only tested on macOS.

Prerequisites

  • Python 3.8 or higher (tested with Python 3.11.5)
  • Node.js 16+ and npm (tested with Node.js 23.10.0, npm 10.9.2)
  • Docker (optional, for containerized deployment)
  • 8GB+ RAM (16GB+ recommended)
  • Ollama (required for both deployment approaches)

NOTE

Until this branch is merged into the main branch, please clone it for installation:

git clone -b localgpt-v2 https://github.com/PromtEngineer/localGPT.git
cd localGPT

Option 1: Docker Deployment

# Clone the repository
git clone https://github.com/PromtEngineer/localGPT.git
cd localGPT

# Install Ollama locally (required even for Docker)
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull qwen3:0.6b
ollama pull qwen3:8b

# Start Ollama
ollama serve

# Start with Docker (in a new terminal)
./start-docker.sh

# Access the application
open http://localhost:3000

Docker Management Commands:

# Check container status
docker compose ps

# View logs
docker compose logs -f

# Stop containers
./start-docker.sh stop

Option 2: Direct Development (Recommended for Development)

# Clone the repository
git clone https://github.com/PromtEngineer/localGPT.git
cd localGPT

# Install Python dependencies
pip install -r requirements.txt

# Key dependencies installed:
# - torch==2.4.1, transformers==4.51.0 (AI models)
# - lancedb (vector database)
# - rank_bm25, fuzzywuzzy (search algorithms)
# - sentence_transformers, rerankers (embedding/reranking)
# - docling (document processing)
# - colpali-engine (multimodal processing - support coming soon)

# Install Node.js dependencies
npm install

# Install and start Ollama
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull qwen3:0.6b
ollama pull qwen3:8b
ollama serve

# Start the system (in a new terminal)
python run_system.py

# Access the application
open http://localhost:3000

System Management:

# Check system health (comprehensive diagnostics)
python system_health_check.py

# Check service status and health
python run_system.py --health

# Start in production mode
python run_system.py --mode prod

# Skip frontend (backend + RAG API only)
python run_system.py --no-frontend

# View aggregated logs
python run_system.py --logs-only

# Stop all services
python run_system.py --stop
# Or press Ctrl+C in the terminal running python run_system.py

Service Architecture: The run_system.py launcher manages four key services (a quick health-check sketch follows the list):

  • Ollama Server (port 11434): AI model serving
  • RAG API Server (port 8001): Document processing and retrieval
  • Backend Server (port 8000): Session management and API endpoints
  • Frontend Server (port 3000): React/Next.js web interface
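
To confirm all four services are up, a short script can probe them: the backend and RAG API expose /health and Ollama lists its models at /api/tags (see the health endpoints in the Troubleshooting section below). This is a convenience sketch, not a script shipped with the repository; the frontend is simply checked with a plain GET.

import requests

# Service endpoints as described above; the frontend check is just a page load.
CHECKS = {
    "Ollama":   "http://localhost:11434/api/tags",
    "RAG API":  "http://localhost:8001/health",
    "Backend":  "http://localhost:8000/health",
    "Frontend": "http://localhost:3000",
}

for name, url in CHECKS.items():
    try:
        status = requests.get(url, timeout=5).status_code
        print(f"{name:9s} {url} -> HTTP {status}")
    except requests.RequestException as exc:
        print(f"{name:9s} {url} -> DOWN ({exc})")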

Option 3: Manual Component Startup

# Terminal 1: Start Ollama
ollama serve

# Terminal 2: Start RAG API
python -m rag_system.api_server

# Terminal 3: Start Backend
cd backend && python server.py

# Terminal 4: Start Frontend
npm run dev

# Access at http://localhost:3000

Detailed Installation

1. Install System Dependencies

Ubuntu/Debian:

sudo apt update
sudo apt install python3.8 python3-pip nodejs npm docker.io docker-compose

macOS:

brew install python@3.8 node npm docker docker-compose

Windows:

# Install Python 3.8+, Node.js, and Docker Desktop
# Then use PowerShell or WSL2

2. Install AI Models

Install Ollama (Recommended):

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull recommended models
ollama pull qwen3:0.6b   # Fast generation model
ollama pull qwen3:8b     # High-quality generation model

3. Configure Environment

# Copy environment template
cp .env.example .env

# Edit configuration
nano .env

Key Configuration Options:

# AI Models (referenced in rag_system/main.py)
OLLAMA_HOST=http://localhost:11434

# Database Paths (used by backend and RAG system)
DATABASE_PATH=./backend/chat_data.db
VECTOR_DB_PATH=./lancedb

# Server Settings (used by run_system.py)
BACKEND_PORT=8000
FRONTEND_PORT=3000
RAG_API_PORT=8001

# Optional: Override default models
GENERATION_MODEL=qwen3:8b
ENRICHMENT_MODEL=qwen3:0.6b
EMBEDDING_MODEL=Qwen/Qwen3-Embedding-0.6B
RERANKER_MODEL=answerdotai/answerai-colbert-small-v1
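
If you want to read the same settings from your own scripts, the standard library's os.getenv works; the defaults below mirror the template values above. This sketch assumes the variables are exported into the environment (for example via a dotenv loader) and may differ from how LocalGPT itself consumes them.

import os

# Defaults mirror the .env template above; override by exporting the variables.
OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://localhost:11434")
DATABASE_PATH = os.getenv("DATABASE_PATH", "./backend/chat_data.db")
VECTOR_DB_PATH = os.getenv("VECTOR_DB_PATH", "./lancedb")
BACKEND_PORT = int(os.getenv("BACKEND_PORT", "8000"))
GENERATION_MODEL = os.getenv("GENERATION_MODEL", "qwen3:8b")
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "Qwen/Qwen3-Embedding-0.6B")

print(f"Ollama at {OLLAMA_HOST}, generating with {GENERATION_MODEL}")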

4. Initialize the System

# Run system health check
python system_health_check.py

# Initialize databases
python -c "from backend.database import ChatDatabase; ChatDatabase().init_database()"

# Test installation
python -c "from rag_system.main import get_agent; print('✅ Installation successful!')"

# Validate complete setup
python run_system.py --health

🎯 Getting Started

1. Create Your First Index

An index is a collection of processed documents that you can chat with.

Using the Web Interface:

  1. Open http://localhost:3000
  2. Click "Create New Index"
  3. Upload your documents (PDF, DOCX, TXT)
  4. Configure processing options
  5. Click "Build Index"

Using Scripts:

# Simple script approach
./simple_create_index.sh "My Documents" "path/to/document.pdf"

# Interactive script
python create_index_script.py

Using API:

# Create index
curl -X POST http://localhost:8000/indexes \
  -H "Content-Type: application/json" \
  -d '{"name": "My Index", "description": "My documents"}'

# Upload documents
curl -X POST http://localhost:8000/indexes/INDEX_ID/upload \
  -F "files=@document.pdf"

# Build index
curl -X POST http://localhost:8000/indexes/INDEX_ID/build
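
The same create → upload → build flow can be scripted with Python's requests library, wrapping the three calls above. The endpoint paths and payloads follow the curl examples; the response field names (such as the returned index id) are assumptions and may need adjusting to the actual API responses.

import requests

BASE = "http://localhost:8000"

# 1. Create the index (same payload as the curl example above)
resp = requests.post(f"{BASE}/indexes",
                     json={"name": "My Index", "description": "My documents"})
resp.raise_for_status()
index_id = resp.json().get("id")  # field name assumed; inspect the response if it differs

# 2. Upload a document to the index
with open("document.pdf", "rb") as fh:
    requests.post(f"{BASE}/indexes/{index_id}/upload",
                  files={"files": fh}).raise_for_status()

# 3. Trigger the build (processes the uploaded documents)
requests.post(f"{BASE}/indexes/{index_id}/build").raise_for_status()
print(f"Index {index_id} is building")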

2. Start Chatting

Once your index is built:

  1. Create a Chat Session: Click "New Chat" or use an existing session
  2. Select Your Index: Choose which document collection to query
  3. Ask Questions: Type natural language questions about your documents
  4. Get Answers: Receive AI-generated responses with source citations

3. Advanced Features

Custom Model Configuration

# Use different models for different tasks
curl -X POST http://localhost:8000/sessions \
  -H "Content-Type: application/json" \
  -d '{
    "title": "High Quality Session",
    "model": "qwen3:8b",
    "embedding_model": "Qwen/Qwen3-Embedding-4B"
  }'

Batch Document Processing

# Process multiple documents at once
python demo_batch_indexing.py --config batch_indexing_config.json

API Integration

import requests

# Chat with your documents via API
response = requests.post('http://localhost:8000/chat', json={
    'query': 'What are the key findings in the research papers?',
    'session_id': 'your-session-id',
    'search_type': 'hybrid',
    'retrieval_k': 20
})

print(response.json()['response'])

🔧 Configuration

Model Configuration

LocalGPT supports multiple AI model providers with centralized configuration:

Ollama Models (Local Inference)

OLLAMA_CONFIG = {
    "host": "http://localhost:11434",
    "generation_model": "qwen3:8b",   # Main text generation
    "enrichment_model": "qwen3:0.6b"  # Lightweight routing/enrichment
}

External Models (HuggingFace Direct)

EXTERNAL_MODELS = {
    "embedding_model": "Qwen/Qwen3-Embedding-0.6B",             # 1024 dimensions
    "reranker_model": "answerdotai/answerai-colbert-small-v1",  # ColBERT reranker
    "fallback_reranker": "BAAI/bge-reranker-base"               # Backup reranker
}
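
The embedding checkpoint above is a plain HuggingFace model, so it can also be exercised directly with the sentence_transformers package that is already in the requirements. The snippet below is an independent sanity check (assuming a recent sentence_transformers release that supports the Qwen3 embedding models), not a LocalGPT code path.

from sentence_transformers import SentenceTransformer

# Same checkpoint as EXTERNAL_MODELS["embedding_model"]; downloads on first use.
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")

vectors = model.encode(["What is late chunking?", "Invoice total for March"])
print(vectors.shape)  # expected (2, 1024), matching the 1024 dimensions noted above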

Pipeline Configuration

LocalGPT offers two main pipeline configurations:

Default Pipeline (Production-Ready)

"default": {"description":"Production-ready pipeline with hybrid search, AI reranking, and verification","storage": {"lancedb_uri":"./lancedb","text_table_name":"text_pages_v3","bm25_path":"./index_store/bm25"    },"retrieval": {"retriever":"multivector","search_type":"hybrid","late_chunking": {"enabled":True},"dense": {"enabled":True,"weight":0.7},"bm25": {"enabled":True}    },"reranker": {"enabled":True,"type":"ai","strategy":"rerankers-lib","model_name":"answerdotai/answerai-colbert-small-v1","top_k":10    },"query_decomposition": {"enabled":True,"max_sub_queries":3},"verification": {"enabled":True},"retrieval_k":20,"contextual_enricher": {"enabled":True,"window_size":1}}

Fast Pipeline (Speed-Optimized)

"fast": {"description":"Speed-optimized pipeline with minimal overhead","retrieval": {"search_type":"vector_only","late_chunking": {"enabled":False}    },"reranker": {"enabled":False},"query_decomposition": {"enabled":False},"verification": {"enabled":False},"retrieval_k":10,"contextual_enricher": {"enabled":False}}

Search Configuration

SEARCH_CONFIG = {
    'hybrid': {
        'dense_weight': 0.7,
        'sparse_weight': 0.3,
        'retrieval_k': 20,
        'reranker_top_k': 10
    }
}
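
Conceptually, hybrid search blends a dense (semantic) score and a sparse (BM25) score per chunk using the weights above. The function below is a simplified illustration of that weighted fusion after min-max normalisation; the actual pipeline's scoring details may differ.

def hybrid_scores(dense, sparse, dense_weight=0.7, sparse_weight=0.3):
    """Blend dense and sparse retrieval scores (illustrative, not the pipeline code)."""
    def normalise(scores):
        lo, hi = min(scores), max(scores)
        return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]

    d, s = normalise(dense), normalise(sparse)
    return [dense_weight * dv + sparse_weight * sv for dv, sv in zip(d, s)]

# Example: three candidate chunks scored by both retrievers
print(hybrid_scores(dense=[0.82, 0.40, 0.77], sparse=[12.1, 3.4, 9.8]))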

🛠️ Troubleshooting

Common Issues

Installation Problems

# Check Python version
python --version   # Should be 3.8+

# Check dependencies
pip list | grep -E "(torch|transformers|lancedb)"

# Reinstall dependencies
pip install -r requirements.txt --force-reinstall

Model Loading Issues

# Check Ollama status
ollama list
curl http://localhost:11434/api/tags

# Pull missing models
ollama pull qwen3:0.6b

Database Issues

# Check database connectivity
python -c "from backend.database import ChatDatabase; db = ChatDatabase(); print('✅ Database OK')"

# Reset database (WARNING: This deletes all data)
rm backend/chat_data.db
python -c "from backend.database import ChatDatabase; ChatDatabase().init_database()"

Performance Issues

# Check system resources
python system_health_check.py

# Monitor memory usage
htop   # or Task Manager on Windows

# Optimize for low-memory systems
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512

Getting Help

  1. Check Logs: The system creates structured logs in the logs/ directory:

    • logs/system.log: Main system events and errors
    • logs/ollama.log: Ollama server logs
    • logs/rag-api.log: RAG API processing logs
    • logs/backend.log: Backend server logs
    • logs/frontend.log: Frontend build and runtime logs
  2. System Health: Run comprehensive diagnostics:

    python system_health_check.py   # Full system diagnostics
    python run_system.py --health   # Service status check
  3. Health Endpoints: Check individual service health:

    • Backend: http://localhost:8000/health
    • RAG API: http://localhost:8001/health
    • Ollama: http://localhost:11434/api/tags
  4. Documentation: Check the Technical Documentation

  5. GitHub Issues: Report bugs and request features

  6. Community: Join our Discord/Slack community


🔗 API Reference

Core Endpoints

Chat API

# Session-based chat (recommended)
POST /sessions/{session_id}/chat
Content-Type: application/json

{
  "query": "What are the main topics discussed?",
  "search_type": "hybrid",
  "retrieval_k": 20,
  "ai_rerank": true,
  "context_window_size": 5
}

# Legacy chat endpoint
POST /chat
Content-Type: application/json

{
  "query": "What are the main topics discussed?",
  "session_id": "uuid",
  "search_type": "hybrid",
  "retrieval_k": 20
}

Index Management

# Create index
POST /indexes
Content-Type: application/json

{
  "name": "My Index",
  "description": "Description",
  "config": "default"
}

# Get all indexes
GET /indexes

# Get specific index
GET /indexes/{id}

# Upload documents to index
POST /indexes/{id}/upload
Content-Type: multipart/form-data
files: [file1.pdf, file2.pdf, ...]

# Build index (process uploaded documents)
POST /indexes/{id}/build
Content-Type: application/json

{
  "config_mode": "default",
  "enable_enrich": true,
  "chunk_size": 512
}

# Delete index
DELETE /indexes/{id}

Session Management

# Create session
POST /sessions
Content-Type: application/json

{
  "title": "My Session",
  "model": "qwen3:0.6b"
}

# Get all sessions
GET /sessions

# Get specific session
GET /sessions/{session_id}

# Get session documents
GET /sessions/{session_id}/documents

# Get session indexes
GET /sessions/{session_id}/indexes

# Link index to session
POST /sessions/{session_id}/indexes/{index_id}

# Delete session
DELETE /sessions/{session_id}

# Rename session
POST /sessions/{session_id}/rename
Content-Type: application/json

{
  "new_title": "Updated Session Name"
}
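
Putting these endpoints together, a minimal scripted workflow might create a session, link an existing index, and ask a question through the session-based chat endpoint. This is an illustrative sequence based on the routes above; response field names such as the returned session id are assumptions and should be checked against the actual responses.

import requests

BASE = "http://localhost:8000"
INDEX_ID = "your-index-id"  # id of an index you have already built

# Create a session (same payload as the example above)
session = requests.post(f"{BASE}/sessions",
                        json={"title": "My Session", "model": "qwen3:0.6b"}).json()
session_id = session.get("id")  # field name assumed

# Link the index to the session
requests.post(f"{BASE}/sessions/{session_id}/indexes/{INDEX_ID}").raise_for_status()

# Ask a question through the session-based chat endpoint
answer = requests.post(f"{BASE}/sessions/{session_id}/chat",
                       json={"query": "What are the main topics discussed?",
                             "search_type": "hybrid",
                             "retrieval_k": 20}).json()
print(answer)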

Advanced Features

Query Decomposition

The system can break complex queries into sub-questions for better answers:

POST /sessions/{session_id}/chat
Content-Type: application/json

{
  "query": "Compare the methodologies and analyze their effectiveness",
  "query_decompose": true,
  "compose_sub_answers": true
}

Answer Verification

Independent verification pass for accuracy using a separate verification model:

POST /sessions/{session_id}/chat
Content-Type: application/json

{
  "query": "What are the key findings?",
  "verify": true
}

Contextual Enrichment

Document context enrichment during indexing for better understanding:

# Enable during index building
POST /indexes/{id}/build

{
  "enable_enrich": true,
  "window_size": 2
}

Late Chunking

Better context preservation by chunking after embedding:

# Configure in pipeline
"late_chunking": {"enabled": true}
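
The idea behind late chunking is to embed a long passage once at the token level and only afterwards pool those token embeddings into chunk vectors, so each chunk keeps context from the whole passage. The sketch below illustrates just the pooling step, using random stand-in token embeddings; it is not LocalGPT's implementation.

import numpy as np

def late_chunk(token_embeddings, chunk_boundaries):
    """Mean-pool token embeddings into chunk vectors after full-context encoding."""
    return [token_embeddings[start:end].mean(axis=0) for start, end in chunk_boundaries]

# Stand-in for the output of a long-context embedding model: 300 tokens, 1024 dims
tokens = np.random.rand(300, 1024)
chunks = late_chunk(tokens, chunk_boundaries=[(0, 100), (100, 200), (200, 300)])
print(len(chunks), chunks[0].shape)  # 3 chunk vectors, each 1024-dimensional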

Streaming Chat

POST /chat/stream
Content-Type: application/json

{
  "query": "Explain the methodology",
  "session_id": "uuid",
  "stream": true
}
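
From Python, the streaming endpoint can be consumed with requests' streaming mode, assuming the backend on port 8000 serves this route. The exact wire format of the streamed chunks is not documented here, so the sketch simply prints each non-empty line as it arrives.

import requests

payload = {"query": "Explain the methodology", "session_id": "uuid", "stream": True}

with requests.post("http://localhost:8000/chat/stream", json=payload, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if line:  # skip keep-alive blank lines
            print(line)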

Batch Processing

# Using the batch indexing script
python demo_batch_indexing.py --config batch_indexing_config.json

# Example batch configuration (batch_indexing_config.json):
{
  "index_name": "Sample Batch Index",
  "index_description": "Example batch index configuration",
  "documents": [
    "./rag_system/documents/invoice_1039.pdf",
    "./rag_system/documents/invoice_1041.pdf"
  ],
  "processing": {
    "chunk_size": 512,
    "chunk_overlap": 64,
    "enable_enrich": true,
    "enable_latechunk": true,
    "enable_docling": true,
    "embedding_model": "Qwen/Qwen3-Embedding-0.6B",
    "generation_model": "qwen3:0.6b",
    "retrieval_mode": "hybrid",
    "window_size": 2
  }
}

# API endpoint for batch processing
POST /batch/index
Content-Type: application/json

{
  "file_paths": ["doc1.pdf", "doc2.pdf"],
  "config": {
    "chunk_size": 512,
    "enable_enrich": true,
    "enable_latechunk": true,
    "enable_docling": true
  }
}

For complete API documentation, see API_REFERENCE.md.


🏗️ Architecture

LocalGPT is built with a modular, scalable architecture:

graph TB
    UI[Web Interface] --> API[Backend API]
    API --> Agent[RAG Agent]
    Agent --> Retrieval[Retrieval Pipeline]
    Agent --> Generation[Generation Pipeline]
    Retrieval --> Vector[Vector Search]
    Retrieval --> BM25[BM25 Search]
    Retrieval --> Rerank[Reranking]
    Vector --> LanceDB[(LanceDB)]
    BM25 --> BM25DB[(BM25 Index)]
    Generation --> Ollama[Ollama Models]
    Generation --> HF[Hugging Face Models]
    API --> SQLite[(SQLite DB)]

Overview of the Retrieval Agent

graph TD
    classDef llmcall fill:#e6f3ff,stroke:#007bff;
    classDef pipeline fill:#e6ffe6,stroke:#28a745;
    classDef cache fill:#fff3e0,stroke:#fd7e14;
    classDef logic fill:#f8f9fa,stroke:#6c757d;
    classDef thread stroke-dasharray: 5 5;

    A(Start: Agent.run) --> B["asyncio.run(_run_async)"];
    B --> C{_run_async};
    C --> C1[Get Chat History];
    C1 --> T1[Build Triage Prompt <br/> Query + Doc Overviews ];
    T1 --> T2["(asyncio.to_thread)<br/>LLM Triage: RAG or LLM_DIRECT?"]; class T2 llmcall,thread;
    T2 --> T3{Decision?};
    T3 -- RAG --> RAG_Path;
    T3 -- LLM_DIRECT --> LLM_Path;

    subgraph RAG Path
        RAG_Path --> R1[Format Query + History];
        R1 --> R2["(asyncio.to_thread)<br/>Generate Query Embedding"]; class R2 pipeline,thread;
        R2 --> R3{{Check Semantic Cache}}; class R3 cache;
        R3 -- Hit --> R_Cache_Hit(Return Cached Result);
        R_Cache_Hit --> R_Hist_Update;
        R3 -- Miss --> R4{Decomposition <br/> Enabled?};
        R4 -- Yes --> R5["(asyncio.to_thread)<br/>Decompose Raw Query"]; class R5 llmcall,thread;
        R5 --> R6{{Run Sub-Queries <br/> Parallel RAG Pipeline}}; class R6 pipeline,thread;
        R6 --> R7[Collect Results & Docs];
        R7 --> R8["(asyncio.to_thread)<br/>Compose Final Answer"]; class R8 llmcall,thread;
        R8 --> V1(RAG Answer);
        R4 -- No --> R9["(asyncio.to_thread)<br/>Run Single Query <br/>(RAG Pipeline)"]; class R9 pipeline,thread;
        R9 --> V1;
        V1 --> V2{{Verification <br/> await verify_async}}; class V2 llmcall;
        V2 --> V3(Final RAG Result);
        V3 --> R_Cache_Store{{Store in Semantic Cache}}; class R_Cache_Store cache;
        R_Cache_Store --> FinalResult;
    end

    subgraph Direct LLM Path
        LLM_Path --> L1[Format Query + History];
        L1 --> L2["(asyncio.to_thread)<br/>Generate Direct LLM Answer <br/> (No RAG)"]; class L2 llmcall,thread;
        L2 --> FinalResult(Final Direct Result);
    end

    FinalResult --> R_Hist_Update(Update Chat History);
    R_Hist_Update --> ZZZ(End: Return Result);
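
The diagram above boils down to: triage the query (RAG vs. direct LLM), check the semantic cache, optionally decompose into sub-queries, answer, verify, then cache and record history. The asyncio sketch below paraphrases that control flow with stubbed functions; none of the names correspond to LocalGPT's actual classes or modules.

import asyncio

# --- stubs standing in for the real LLM, pipeline, and cache calls -----------
async def triage(query, history):            return "RAG"          # or "LLM_DIRECT"
async def embed(query):                      return [0.0] * 1024
async def cache_lookup(embedding):           return None           # pretend cache miss
async def decompose(query):                  return [query]        # no-op decomposition
async def run_rag_pipeline(sub_query):       return f"answer to {sub_query!r}"
async def compose(sub_answers):              return " ".join(sub_answers)
async def direct_llm_answer(query, history): return f"direct answer to {query!r}"
async def verify(answer):                    return answer
async def cache_store(embedding, answer):    pass

async def run_agent(query, history):
    route = await triage(query, history)
    if route == "LLM_DIRECT":
        result = await direct_llm_answer(query, history)
    else:
        embedding = await embed(query)
        cached = await cache_lookup(embedding)
        if cached is not None:
            result = cached
        else:
            sub_queries = await decompose(query)
            sub_answers = await asyncio.gather(*(run_rag_pipeline(q) for q in sub_queries))
            result = await verify(await compose(list(sub_answers)))
            await cache_store(embedding, result)
    history.append((query, result))
    return result

print(asyncio.run(run_agent("What are the key findings?", history=[])))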

🤝 Contributing

We welcome contributions from developers of all skill levels! LocalGPT is an open-source project that benefits from community involvement.

🚀 Quick Start for Contributors

# Fork and clone the repository
git clone https://github.com/PromtEngineer/localGPT.git
cd localGPT

# Set up development environment
pip install -r requirements.txt
npm install

# Install Ollama and models
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull qwen3:0.6b qwen3:8b

# Verify setup
python system_health_check.py
python run_system.py --mode dev

📋 How to Contribute

  1. 🐛 Report Bugs: Use our bug report template
  2. 💡 Request Features: Use our feature request template
  3. 🔧 Submit Code: Follow our development workflow
  4. 📚 Improve Docs: Help make our documentation better

📖 Detailed Guidelines

For comprehensive contributing guidelines, including:

  • Development setup and workflow
  • Coding standards and best practices
  • Testing requirements
  • Documentation standards
  • Release process

👉 See our CONTRIBUTING.md guide


📄 License

This project is licensed under the MIT License - see the LICENSE file for details. For models, please check their respective licenses.

