DeepTutor: AI-Powered Personalized Learning Assistant

Quick Start · Core Modules · FAQ
🇨🇳 中文 · 🇯🇵 日本語 · 🇪🇸 Español · 🇫🇷 Français · 🇸🇦 العربية · 🇷🇺 Русский · 🇮🇳 हिन्दी · 🇵🇹 Português
📚 Massive Document Knowledge Q&A • 🎨 Interactive Learning Visualization
🎯 Knowledge Reinforcement • 🔍 Deep Research & Idea Generation
[2026.1.3] Released DeepTutor v0.2.0 - thanks to all the contributors! ❤️
[2026.1.1] Happy New Year! Join our GitHub Discussions - shape the future of DeepTutor! 💬
[2025.12.30] Visit our Official Website for more details!
[2025.12.29] DeepTutor v0.1 is now live! ✨
- Smart Knowledge Base: Upload textbooks, research papers, technical manuals, and domain-specific documents. Build a comprehensive AI-powered knowledge repository for instant access.
- Multi-Agent Problem Solving: Dual-loop reasoning architecture with RAG, web search, and code execution -- delivering step-by-step solutions with precise citations.
- Knowledge Simplification & Explanations: Transform complex concepts and algorithms into easy-to-understand visual aids, detailed step-by-step breakdowns, and engaging interactive demonstrations.
- Personalized Q&A: Context-aware conversations that adapt to your learning progress, with interactive pages and session-based knowledge tracking.
- Intelligent Exercise Creation: Generate targeted quizzes, practice problems, and customized assessments tailored to your current knowledge level and specific learning objectives.
- Authentic Exam Simulation: Upload reference exams to generate practice questions that match the original style, format, and difficulty, giving you realistic preparation for the actual test.
- Comprehensive Research & Literature Review: Conduct in-depth topic exploration with systematic analysis. Identify patterns, connect related concepts across disciplines, and synthesize existing research findings.
- Novel Insight Discovery: Generate structured learning materials and uncover knowledge gaps. Identify promising new research directions through intelligent cross-domain knowledge synthesis.
Demo screenshots: Multi-agent Problem Solving with Exact Citations · Step-by-step Visual Explanations with Personal Q&As · Custom Questions · Mimic Questions · Personal Knowledge Base · Personal Notebook
🌙 Use DeepTutor in Dark Mode!
- Intuitive Interaction: Simple bidirectional query-response flow.
- Structured Output: Response generation that organizes complex information into actionable outputs.
- Problem Solving & Assessment: Step-by-step problem solving and custom assessment generation.
- Research & Learning: Deep Research for topic exploration and Guided Learning with visualization.
- Idea Generation: Automated and interactive concept development with multi-source insights.
- Information Retrieval: RAG hybrid retrieval, real-time web search, and academic paper databases.
- Processing & Analysis: Python code execution, query item lookup, and PDF parsing for document analysis.
- Knowledge Graph: Entity-relation mapping for semantic connections and knowledge discovery.
- Vector Store: Embedding-based semantic search for intelligent content retrieval.
- Memory System: Session state management and citation tracking for contextual continuity.
🌟 Star to follow our future updates!
- Support Local LLM Services (e.g., ollama)
- Refactor RAG Module (see Discussions)
- Deep-coding from idea generation
- Personalized Interaction with Notebook
① Clone Repository
```bash
git clone https://github.com/HKUDS/DeepTutor.git
cd DeepTutor
```
② Set Up Environment Variables
```bash
cp .env.example .env
# Edit .env file with your API keys
```
📋 Environment Variables Reference
| Variable | Required | Description |
|---|---|---|
| LLM_MODEL | Yes | Model name (e.g., gpt-4o) |
| LLM_BINDING_API_KEY | Yes | Your LLM API key |
| LLM_BINDING_HOST | Yes | API endpoint URL |
| EMBEDDING_MODEL | Yes | Embedding model name |
| EMBEDDING_BINDING_API_KEY | Yes | Embedding API key |
| EMBEDDING_BINDING_HOST | Yes | Embedding API endpoint |
| BACKEND_PORT | No | Backend port (default: 8001) |
| FRONTEND_PORT | No | Frontend port (default: 3782) |
| TTS_* | No | Text-to-Speech settings |
| PERPLEXITY_API_KEY | No | For web search |
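For orientation, a minimal .env might look like the sketch below. Treat the values as placeholders (the endpoints and model names are examples, not defaults); copy the full variable set from .env.example.

```env
# Sketch only -- start from .env.example and substitute your own values
LLM_MODEL=gpt-4o
LLM_BINDING_API_KEY=your-llm-api-key
LLM_BINDING_HOST=https://api.openai.com/v1
EMBEDDING_MODEL=text-embedding-3-large
EMBEDDING_BINDING_API_KEY=your-embedding-api-key
EMBEDDING_BINDING_HOST=https://api.openai.com/v1
# Optional
BACKEND_PORT=8001
FRONTEND_PORT=3782
PERPLEXITY_API_KEY=your-perplexity-key
```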
③ Configure Ports & LLM (Optional)
- Ports: Edit config/main.yaml → server.backend_port / server.frontend_port
- LLM: Edit config/agents.yaml → temperature / max_tokens per module
- See Configuration Docs for details; a minimal sketch of both edits follows this list
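A minimal sketch of those two edits, assuming the key names referenced above and the default ports from the variables table; consult the Configuration Docs for the full schema:

```yaml
# config/main.yaml -- ports (sketch; other settings omitted)
server:
  backend_port: 8001
  frontend_port: 3782

# config/agents.yaml -- per-module LLM parameters (sketch; values are examples)
research:
  temperature: 0.5
  max_tokens: 12000
```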
④ Try Demo Knowledge Bases (Optional)
📚 Available Demos
- Research Papers — 5 papers from our lab (AI-Researcher, LightRAG, etc.)
- Data Science Textbook — 8 chapters, 296 pages (Book Link)
- Download from Google Drive
- Extract into the data/ directory

Demo KBs use text-embedding-3-large with dimensions = 3072.
⑤ Create Your Own Knowledge Base (After Launch)
- Go to http://localhost:3782/knowledge
- Click "New Knowledge Base" → Enter name → Upload PDF/TXT/MD files
- Monitor progress in terminal
Docker (Recommended — No Python/Node.js setup)

Prerequisites: Docker & Docker Compose

Quick Start:
```bash
# Build and start (~5-10 min first run)
docker compose up --build -d
# View logs
docker compose logs -f
```

Commands:
```bash
docker compose up -d        # Start
docker compose logs -f      # Logs
docker compose down         # Stop
docker compose up --build   # Rebuild
```

Advanced:
```bash
# Build custom image
docker build -t deeptutor:latest .
# Run standalone
docker run -p 8001:8001 -p 3782:3782 \
  --env-file .env deeptutor:latest
```

Manual Setup (For development or non-Docker environments)

Prerequisites: Python 3.10+, Node.js 18+

Set Up Environment:
```bash
# Using conda (Recommended)
conda create -n deeptutor python=3.10
conda activate deeptutor
# Or using venv
python -m venv venv
source venv/bin/activate
```

Install Dependencies:
```bash
bash scripts/install_all.sh
# Or manually:
pip install -r requirements.txt
npm install --prefix web
```

Launch:
```bash
# Start web interface
python scripts/start_web.py
# Or CLI only
python scripts/start.py
# Stop: Ctrl+C
```
| Service | URL | Description |
|---|---|---|
| Frontend | http://localhost:3782 | Main web interface |
| API Docs | http://localhost:8001/docs | Interactive API documentation |
All user content and system data are stored in the data/ directory:
```
data/
├── knowledge_bases/        # Knowledge base storage
└── user/                   # User activity data
    ├── solve/              # Problem solving results and artifacts
    ├── question/           # Generated questions
    ├── research/           # Research reports and cache
    ├── co-writer/          # Interactive IdeaGen documents and audio files
    ├── notebook/           # Notebook records and metadata
    ├── guide/              # Guided learning sessions
    ├── logs/               # System logs
    └── run_code_workspace/ # Code execution workspace
```
Results are automatically saved during all activities. Directories are created automatically as needed.
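For example, a small helper like the sketch below (hypothetical, not part of the codebase) can locate the most recent timestamped run folder for a module:

```python
from pathlib import Path

def latest_run(module: str, base: str = "data/user") -> Path | None:
    """Return the most recent timestamped output folder for a module, e.g. 'solve' or 'question'."""
    runs = sorted(p for p in Path(base, module).glob("*_*") if p.is_dir())
    return runs[-1] if runs else None  # timestamp suffixes sort chronologically

print(latest_run("solve"))  # e.g. data/user/solve/solve_YYYYMMDD_HHMMSS
```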
🧠 Smart Solver
Intelligent problem-solving system based on an Analysis Loop + Solve Loop dual-loop architecture, supporting multi-mode reasoning and dynamic knowledge retrieval.
Core Features
| Feature | Description |
|---|---|
| Dual-Loop Architecture | Analysis Loop: InvestigateAgent → NoteAgent Solve Loop: PlanAgent → ManagerAgent → SolveAgent → CheckAgent → Format |
| Multi-Agent Collaboration | Specialized agents: InvestigateAgent, NoteAgent, PlanAgent, ManagerAgent, SolveAgent, CheckAgent |
| Real-time Streaming | WebSocket transmission with live reasoning process display |
| Tool Integration | RAG (naive/hybrid), Web Search, Query Item, Code Execution |
| Persistent Memory | JSON-based memory files for context preservation |
| Citation Management | Structured citations with reference tracking |
Usage
- Visit http://localhost:{frontend_port}/solver
- Select a knowledge base
- Enter your question, click "Solve"
- Watch the real-time reasoning process and final answer
Python API
```python
import asyncio
from src.agents.solve import MainSolver

async def main():
    solver = MainSolver(kb_name="ai_textbook")
    result = await solver.solve(
        question="Calculate the linear convolution of x=[1,2,3] and h=[4,5]",
        mode="auto"
    )
    print(result['formatted_solution'])

asyncio.run(main())
```
Output Location
```
data/user/solve/solve_YYYYMMDD_HHMMSS/
├── investigate_memory.json   # Analysis Loop memory
├── solve_chain.json          # Solve Loop steps & tool records
├── citation_memory.json      # Citation management
├── final_answer.md           # Final solution (Markdown)
├── performance_report.json   # Performance monitoring
└── artifacts/                # Code execution outputs
```
📝 Question Generator
Dual-mode question generation system supporting custom knowledge-based generation and reference exam paper mimicking with automatic validation.
Core Features
| Feature | Description |
|---|---|
| Custom Mode | Background Knowledge → Question Planning → Generation → Single-Pass Validation. Analyzes question relevance without rejection logic |
| Mimic Mode | PDF Upload → MinerU Parsing → Question Extraction → Style Mimicking. Generates questions based on reference exam structure |
| ReAct Engine | QuestionGenerationAgent with autonomous decision-making (think → act → observe) |
| Validation Analysis | Single-pass relevance analysis with kb_coverage and extension_points |
| Question Types | Multiple choice, fill-in-the-blank, calculation, written response, etc. |
| Batch Generation | Parallel processing with progress tracking |
| Complete Persistence | All intermediate files saved (background knowledge, plan, individual results) |
| Timestamped Output | Mimic mode creates batch folders: mimic_YYYYMMDD_HHMMSS_{pdf_name}/ |
Usage
Custom Mode:
- Visit http://localhost:{frontend_port}/question
- Fill in requirements (topic, difficulty, question type, count)
- Click "Generate Questions"
- View generated questions with validation reports
Mimic Mode:
- Visit http://localhost:{frontend_port}/question
- Switch to "Mimic Exam" tab
- Upload PDF or provide parsed exam directory
- Wait for parsing → extraction → generation
- View generated questions alongside original references
Python API
Custom Mode - Full Pipeline:
```python
import asyncio
from src.agents.question import AgentCoordinator

async def main():
    coordinator = AgentCoordinator(
        kb_name="ai_textbook",
        output_dir="data/user/question"
    )
    # Generate multiple questions from a text requirement
    result = await coordinator.generate_questions_custom(
        requirement_text="Generate 3 medium-difficulty questions about deep learning basics",
        difficulty="medium",
        question_type="choice",
        count=3
    )
    print(f"✅ Generated {result['completed']}/{result['requested']} questions")
    for q in result['results']:
        print(f"- Relevance: {q['validation']['relevance']}")

asyncio.run(main())
```
Mimic Mode - PDF Upload:
```python
import asyncio
from src.agents.question.tools.exam_mimic import mimic_exam_questions

async def main():
    result = await mimic_exam_questions(
        pdf_path="exams/midterm.pdf",
        kb_name="calculus",
        output_dir="data/user/question/mimic_papers",
        max_questions=5
    )
    print(f"✅ Generated {result['successful_generations']} questions")
    print(f"Output: {result['output_file']}")

asyncio.run(main())
```
Output Location
Custom Mode:
```
data/user/question/custom_YYYYMMDD_HHMMSS/
├── background_knowledge.json   # RAG retrieval results
├── question_plan.json          # Question planning
├── question_1_result.json      # Individual question results
├── question_2_result.json
└── ...
```
Mimic Mode:
```
data/user/question/mimic_papers/
└── mimic_YYYYMMDD_HHMMSS_{pdf_name}/
    ├── {pdf_name}.pdf                                       # Original PDF
    ├── auto/{pdf_name}.md                                   # MinerU parsed markdown
    ├── {pdf_name}_YYYYMMDD_HHMMSS_questions.json            # Extracted questions
    └── {pdf_name}_YYYYMMDD_HHMMSS_generated_questions.json  # Generated questions
```
🎓 Guided Learning
Personalized learning system based on notebook content, automatically generating progressive learning paths through interactive pages and smart Q&A.
Core Features
| Feature | Description |
|---|---|
| Multi-Agent Architecture | LocateAgent: Identifies 3-5 progressive knowledge points InteractiveAgent: Converts to visual HTML pages ChatAgent: Provides contextual Q&A SummaryAgent: Generates learning summaries |
| Smart Knowledge Location | Automatic analysis of notebook content |
| Interactive Pages | HTML page generation with bug fixing |
| Smart Q&A | Context-aware answers with explanations |
| Progress Tracking | Real-time status with session persistence |
| Cross-Notebook Support | Select records from multiple notebooks |
Usage Flow
- Select Notebook(s) — Choose one or multiple notebooks (cross-notebook selection supported)
- Generate Learning Plan — LocateAgent identifies 3-5 core knowledge points
- Start Learning — InteractiveAgent generates HTML visualization
- Learning Interaction — Ask questions, click "Next" to proceed
- Complete Learning — SummaryAgent generates learning summary
Output Location
```
data/user/guide/
└── session_{session_id}.json   # Complete session state, knowledge points, chat history
```
✏️ Interactive IdeaGen (Co-Writer)
Intelligent Markdown editor supporting AI-assisted writing, auto-annotation, and TTS narration.
Core Features
| Feature | Description |
|---|---|
| Rich Text Editing | Full Markdown syntax support with live preview |
| EditAgent | Rewrite: Custom instructions with optional RAG/web context Shorten: Compress while preserving key information Expand: Add details and context |
| Auto-Annotation | Automatic key content identification and marking |
| NarratorAgent | Script generation, TTS audio, multiple voices (Cherry, Stella, Annie, Cally, Eva, Bella) |
| Context Enhancement | Optional RAG or web search for additional context |
| Multi-Format Export | Markdown, PDF, etc. |
Usage
- Visit http://localhost:{frontend_port}/co_writer
- Enter or paste text in the editor
- Use AI features: Rewrite, Shorten, Expand, Auto Mark, Narrate
- Export to Markdown or PDF
Output Location
```
data/user/co-writer/
├── audio/                              # TTS audio files
│   └── {operation_id}.mp3
├── tool_calls/                         # Tool call history
│   └── {operation_id}_{tool_type}.json
└── history.json                        # Edit history
```
🔬 Deep Research
DR-in-KG (Deep Research in Knowledge Graph) — A systematic deep research system based on a Dynamic Topic Queue architecture, enabling multi-agent collaboration across three phases: Planning → Researching → Reporting.
Core Features
| Feature | Description |
|---|---|
| Three-Phase Architecture | Phase 1 (Planning): RephraseAgent (topic optimization) + DecomposeAgent (subtopic decomposition) Phase 2 (Researching): ManagerAgent (queue scheduling) + ResearchAgent (research decisions) + NoteAgent (info compression) Phase 3 (Reporting): Deduplication → Three-level outline generation → Report writing with citations |
| Dynamic Topic Queue | Core scheduling system with TopicBlock state management: PENDING → RESEARCHING → COMPLETED/FAILED. Supports dynamic topic discovery during research |
| Execution Modes | Series Mode: Sequential topic processing Parallel Mode: Concurrent multi-topic processing with AsyncCitationManagerWrapper for thread-safe operations |
| Multi-Tool Integration | RAG (hybrid/naive),Query Item (entity lookup),Paper Search,Web Search,Code Execution — dynamically selected by ResearchAgent |
| Unified Citation System | Centralized CitationManager as single source of truth for citation ID generation, ref_number mapping, and deduplication |
| Preset Configurations | quick: Fast research (1-2 subtopics, 1-2 iterations) medium/standard: Balanced depth (5 subtopics, 4 iterations) deep: Thorough research (8 subtopics, 7 iterations) auto: Agent autonomously decides depth |
Citation System Architecture
The citation system follows a centralized design with CitationManager as the single source of truth:
The CitationManager owns three responsibilities: ID generation (PLAN-XX and CIT-X-XX), the citation_id → ref_number map, and deduplication (papers only). DecomposeAgent, ResearchAgent, and NoteAgent request IDs from it; ReportingAgent uses the map for inline [N] citations; and the References section is rendered from the same map.
| Component | Description |
|---|---|
| ID Format | PLAN-XX (planning stage RAG queries) + CIT-X-XX (research stage, X = block number) |
| ref_number Mapping | Sequential 1-based numbers built from sorted citation IDs, with paper deduplication |
| Inline Citations | Simple [N] format in LLM output, post-processed to clickable [[N]](#ref-N) links |
| Citation Table | Clear reference table provided to the LLM: Cite as [1] → (RAG) query preview... |
| Post-processing | Automatic format conversion + validation to remove invalid citation references |
| Parallel Safety | Thread-safe async methods (get_next_citation_id_async, add_citation_async) for concurrent execution |
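As an illustration of the post-processing behavior described above (a sketch only, not the project's ReportingAgent code): convert bare [N] markers to clickable [[N]](#ref-N) links when N is a known reference, and strip markers that fail validation.

```python
import re

def postprocess_citations(text: str, valid_refs: set[int]) -> str:
    """Turn bare [N] markers into [[N]](#ref-N) links; drop markers for unknown refs.

    Illustrative sketch of the described behavior, not the research module's code.
    """
    def repl(match: re.Match) -> str:
        n = int(match.group(1))
        return f"[[{n}]](#ref-{n})" if n in valid_refs else ""
    # Only match bare [N] markers, not ones already wrapped as [[N]](#ref-N)
    return re.sub(r"(?<!\[)\[(\d+)\](?!\])", repl, text)

text = "Attention is all you need [1], see also [7]."
print(postprocess_citations(text, valid_refs={1, 2, 3}))
# [1] becomes a clickable link; [7] is not in the map and is removed
```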
Parallel Execution Architecture
When execution_mode: "parallel" is enabled, multiple topic blocks are researched concurrently:
In parallel mode, the DynamicTopicQueue holds PENDING topic blocks that are dispatched as concurrent ResearchAgent tasks, bounded by an asyncio semaphore (max = 5). Each task obtains citation IDs through the thread-safe AsyncCitationManagerWrapper (get_next_citation_id_async, add_citation_async), and queue state updates go through the AsyncManagerAgentWrapper.
| Component | Description |
|---|---|
| asyncio.Semaphore | Limits concurrent tasks to max_parallel_topics (default: 5) |
| AsyncCitationManagerWrapper | Wraps CitationManager with asyncio.Lock() for thread-safe ID generation |
| AsyncManagerAgentWrapper | Ensures queue state updates are atomic across parallel tasks |
| Real-time Progress | Live display of all active research tasks with status indicators |
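The concurrency pattern itself is standard asyncio; a self-contained sketch (class and function names here are illustrative, not DeepTutor's) showing semaphore-bounded tasks sharing a lock-protected ID counter:

```python
import asyncio

class AsyncIdCounter:
    """Toy stand-in for a lock-protected citation ID source."""
    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self._n = 0

    async def next_id(self, block: int) -> str:
        async with self._lock:  # serialize ID generation across tasks
            self._n += 1
            return f"CIT-{block}-{self._n:02d}"

async def research_topic(topic: str, block: int, sem: asyncio.Semaphore, ids: AsyncIdCounter) -> str:
    async with sem:  # at most max_parallel_topics tasks run at once
        cid = await ids.next_id(block)
        await asyncio.sleep(0.1)  # placeholder for RAG / web search / paper search calls
        return f"{topic}: researched with citation {cid}"

async def main() -> None:
    sem = asyncio.Semaphore(5)  # mirrors max_parallel_topics: 5
    ids = AsyncIdCounter()
    topics = [f"Topic {i}" for i in range(1, 6)]
    results = await asyncio.gather(
        *(research_topic(t, i, sem, ids) for i, t in enumerate(topics, 1))
    )
    print("\n".join(results))

asyncio.run(main())
```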
Agent Responsibilities
| Agent | Phase | Responsibility |
|---|---|---|
| RephraseAgent | Planning | Optimizes user input topic, supports multi-turn user interaction for refinement |
| DecomposeAgent | Planning | Decomposes topic into subtopics with RAG context, obtains citation IDs from CitationManager |
| ManagerAgent | Researching | Queue state management, task scheduling, dynamic topic addition |
| ResearchAgent | Researching | Knowledge sufficiency check, query planning, tool selection, requests citation IDs before each tool call |
| NoteAgent | Researching | Compresses raw tool outputs into summaries, creates ToolTraces with pre-assigned citation IDs |
| ReportingAgent | Reporting | Builds citation map, generates three-level outline, writes report sections with citation tables, post-processes citations |
Report Generation Pipeline
```
1. Build Citation Map   → CitationManager.build_ref_number_map()
2. Generate Outline     → Three-level headings (H1 → H2 → H3)
3. Write Sections       → LLM uses [N] citations with the provided citation table
4. Post-process         → Convert [N] → [[N]](#ref-N), validate references
5. Generate References  → Academic-style entries with collapsible source details
```
Usage
- Visit http://localhost:{frontend_port}/research
- Enter research topic
- Select research mode (quick/medium/deep/auto)
- Watch real-time progress with parallel/series execution
- View structured report with clickable inline citations
- Export as Markdown or PDF (with proper page splitting and Mermaid diagram support)
CLI
```bash
# Quick mode (fast research)
python src/agents/research/main.py --topic "Deep Learning Basics" --preset quick
# Medium mode (balanced)
python src/agents/research/main.py --topic "Transformer Architecture" --preset medium
# Deep mode (thorough research)
python src/agents/research/main.py --topic "Graph Neural Networks" --preset deep
# Auto mode (agent decides depth)
python src/agents/research/main.py --topic "Reinforcement Learning" --preset auto
```
Python API
```python
import asyncio
from src.agents.research import ResearchPipeline
from src.core.core import get_llm_config, load_config_with_main

async def main():
    # Load configuration (main.yaml merged with any module-specific overrides)
    config = load_config_with_main("research_config.yaml")
    llm_config = get_llm_config()

    # Create pipeline (agent parameters loaded from agents.yaml automatically)
    pipeline = ResearchPipeline(
        config=config,
        api_key=llm_config["api_key"],
        base_url=llm_config["base_url"],
        kb_name="ai_textbook"  # Optional: override knowledge base
    )

    # Run research
    result = await pipeline.run(topic="Attention Mechanisms in Deep Learning")
    print(f"Report saved to: {result['final_report_path']}")

asyncio.run(main())
```
Output Location
```
data/user/research/
├── reports/                           # Final research reports
│   ├── research_YYYYMMDD_HHMMSS.md    # Markdown report with clickable citations [[N]](#ref-N)
│   └── research_*_metadata.json       # Research metadata and statistics
└── cache/                             # Research process cache
    └── research_YYYYMMDD_HHMMSS/
        ├── queue.json                 # DynamicTopicQueue state (TopicBlocks + ToolTraces)
        ├── citations.json             # Citation registry with ID counters and ref_number mapping
        │                              #   - citations: {citation_id: citation_info}
        │                              #   - counters: {plan_counter, block_counters}
        ├── step1_planning.json        # Planning phase results (subtopics + PLAN-XX citations)
        ├── planning_progress.json     # Planning progress events
        ├── researching_progress.json  # Researching progress events
        ├── reporting_progress.json    # Reporting progress events
        ├── outline.json               # Three-level report outline structure
        └── token_cost_summary.json    # Token usage statistics
```
Citation File Structure (citations.json):
```json
{
  "research_id": "research_20241209_120000",
  "citations": {
    "PLAN-01": {
      "citation_id": "PLAN-01",
      "tool_type": "rag_hybrid",
      "query": "...",
      "summary": "..."
    },
    "CIT-1-01": {
      "citation_id": "CIT-1-01",
      "tool_type": "paper_search",
      "papers": [...],
      ...
    }
  },
  "counters": {
    "plan_counter": 2,
    "block_counters": { "1": 3, "2": 2 }
  }
}
```
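As a rough illustration of how ref_numbers can be derived from this file (sequential 1-based numbers over sorted citation IDs, as described in the citation table above; a sketch, not the CitationManager implementation, which also deduplicates papers):

```python
import json

def build_ref_number_map(citations_path: str) -> dict[str, int]:
    """Assign sequential 1-based ref_numbers to sorted citation IDs (illustrative sketch)."""
    with open(citations_path, encoding="utf-8") as f:
        registry = json.load(f)
    citation_ids = sorted(registry["citations"].keys())
    return {cid: n for n, cid in enumerate(citation_ids, start=1)}

# Usage (path follows the cache layout shown above):
# ref_map = build_ref_number_map("data/user/research/cache/research_YYYYMMDD_HHMMSS/citations.json")
```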
Configuration Options
Key configuration in config/main.yaml (research section) and config/agents.yaml:
```yaml
# config/agents.yaml - Agent LLM parameters
research:
  temperature: 0.5
  max_tokens: 12000

# config/main.yaml - Research settings
research:
  # Execution Mode
  researching:
    execution_mode: "parallel"     # "series" or "parallel"
    max_parallel_topics: 5         # Max concurrent topics
    max_iterations: 5              # Max iterations per topic
    # Tool Switches
    enable_rag_hybrid: true        # Hybrid RAG retrieval
    enable_rag_naive: true         # Basic RAG retrieval
    enable_paper_search: true      # Academic paper search
    enable_web_search: true        # Web search (also controlled by tools.web_search.enabled)
    enable_run_code: true          # Code execution
  # Queue Limits
  queue:
    max_length: 5                  # Maximum topics in queue
  # Reporting
  reporting:
    enable_inline_citations: true  # Enable clickable [N] citations in report
  # Presets: quick, medium, deep, auto

# Global tool switches in tools section
tools:
  web_search:
    enabled: true                  # Global web search switch (higher priority)
```
💡 Automated IdeaGen
Research idea generation system that extracts knowledge points from notebook records and generates research ideas through multi-stage filtering.
Core Features
| Feature | Description |
|---|---|
| MaterialOrganizerAgent | Extracts knowledge points from notebook records |
| Multi-Stage Filtering | Loose Filter → Explore Ideas (5+ per point) → Strict Filter → Generate Markdown |
| Idea Exploration | Innovative thinking from multiple dimensions |
| Structured Output | Organized markdown with knowledge points and ideas |
| Progress Callbacks | Real-time updates for each stage |
Usage
- Visit http://localhost:{frontend_port}/ideagen
- Select a notebook with records
- Optionally provide user thoughts/preferences
- Click "Generate Ideas"
- View generated research ideas organized by knowledge points
Python API
```python
import asyncio
from src.agents.ideagen import IdeaGenerationWorkflow, MaterialOrganizerAgent
from src.core.core import get_llm_config

async def main():
    llm_config = get_llm_config()

    # Step 1: Extract knowledge points from materials
    organizer = MaterialOrganizerAgent(
        api_key=llm_config["api_key"],
        base_url=llm_config["base_url"]
    )
    knowledge_points = await organizer.extract_knowledge_points(
        "Your learning materials or notebook content here"
    )

    # Step 2: Generate research ideas
    workflow = IdeaGenerationWorkflow(
        api_key=llm_config["api_key"],
        base_url=llm_config["base_url"]
    )
    result = await workflow.process(knowledge_points)
    print(result)  # Markdown formatted research ideas

asyncio.run(main())
```
📊 Dashboard + Knowledge Base Management
Unified system entry providing activity tracking, knowledge base management, and system status monitoring.
Key Features
| Feature | Description |
|---|---|
| Activity Statistics | Recent solving/generation/research records |
| Knowledge Base Overview | KB list, statistics, incremental updates |
| Notebook Statistics | Notebook counts, record distribution |
| Quick Actions | One-click access to all modules |
Usage
- Web Interface: Visit http://localhost:{frontend_port} to view the system overview
- Create KB: Click "New Knowledge Base", upload PDF/Markdown documents
- View Activity: Check recent learning activities on Dashboard
📓 Notebook
Unified learning record management, connecting outputs from all modules to create a personalized learning knowledge base.
Core Features
| Feature | Description |
|---|---|
| Multi-Notebook Management | Create, edit, delete notebooks |
| Unified Record Storage | Integrate solving/generation/research/Interactive IdeaGen records |
| Categorization Tags | Auto-categorize by type, knowledge base |
| Custom Appearance | Color, icon personalization |
Usage
- Visit http://localhost:{frontend_port}/notebook
- Create new notebook (set name, description, color, icon)
- After completing tasks in other modules, click "Add to Notebook"
- View and manage all records on the notebook page
| Configuration | Data Directory | API Backend | Core Utilities |
|---|---|---|---|
| Knowledge Base | Tools | Web Frontend | Solve Module |
| Question Module | Research Module | Interactive IdeaGen Module | Guide Module |
| Automated IdeaGen Module | | | |
Backend fails to start?
Checklist
- Confirm Python version >= 3.10
- Confirm all dependencies installed: pip install -r requirements.txt
- Check if port 8001 is in use (configurable in config/main.yaml)
- Check the .env file configuration
Solutions
- Change port: Edit server.backend_port in config/main.yaml
- Check logs: Review terminal error messages
Port occupied after Ctrl+C?
Problem
After pressing Ctrl+C during a running task (e.g., deep research), restarting shows "port already in use" error.
Cause
Ctrl+C sometimes only terminates the frontend process while the backend continues running in the background.
Solution
```bash
# macOS/Linux: Find and kill the process
lsof -i :8001
kill -9 <PID>

# Windows: Find and kill the process
netstat -ano | findstr :8001
taskkill /PID <PID> /F
```
Then restart the service with python scripts/start_web.py.
npm: command not found error?
Problem
Running scripts/start_web.py shows npm: command not found or exit status 127.
Checklist
- Check if npm is installed: npm --version
- Check if Node.js is installed: node --version
- Confirm conda environment is activated (if using conda)
Solutions
```bash
# Option A: Using Conda (Recommended)
conda install -c conda-forge nodejs

# Option B: Using the official installer
# Download from https://nodejs.org/

# Option C: Using nvm
nvm install 18
nvm use 18
```
Verify Installation
```bash
node --version   # Should show v18.x.x or higher
npm --version    # Should show a version number
```
Frontend cannot connect to backend?
Checklist
- Confirm backend is running (visit http://localhost:8001/docs)
- Check browser console for error messages
Solution
Create .env.local in the web directory:
```
NEXT_PUBLIC_API_BASE=http://localhost:8001
```
WebSocket connection fails?
Checklist
- Confirm backend is running
- Check firewall settings
- Confirm WebSocket URL is correct
Solution
- Check backend logs
- Confirm URL format: ws://localhost:8001/api/v1/...
Where are module outputs stored?
| Module | Output Path |
|---|---|
| Solve | data/user/solve/solve_YYYYMMDD_HHMMSS/ |
| Question | data/user/question/question_YYYYMMDD_HHMMSS/ |
| Research | data/user/research/reports/ |
| Interactive IdeaGen | data/user/co-writer/ |
| Notebook | data/user/notebook/ |
| Guide | data/user/guide/session_{session_id}.json |
| Logs | data/user/logs/ |
How to add a new knowledge base?
Web Interface
- Visit http://localhost:{frontend_port}/knowledge
- Click "New Knowledge Base"
- Enter knowledge base name
- Upload PDF/TXT/MD documents
- System will process documents in background
CLI
```bash
python -m src.knowledge.start_kb init <kb_name> --docs <pdf_path>
```
How to incrementally add documents to existing KB?
CLI (Recommended)
```bash
python -m src.knowledge.add_documents <kb_name> --docs <new_document.pdf>
```
Benefits
- Only processes new documents, saves time and API costs
- Automatically merges with existing knowledge graph
- Preserves all existing data
Numbered items extraction failed with uvloop.Loop error?
Problem
When initializing a knowledge base, you may encounter this error:
```
ValueError: Can't patch loop of type <class 'uvloop.Loop'>
```
This occurs because Uvicorn uses the uvloop event loop by default, which is incompatible with nest_asyncio.
Solution
Use one of the following methods to extract numbered items:
```bash
# Option 1: Using the shell script (recommended)
./scripts/extract_numbered_items.sh <kb_name>

# Option 2: Direct Python command
python src/knowledge/extract_numbered_items.py --kb <kb_name> --base-dir ./data/knowledge_bases
```
This will extract numbered items (Definitions, Theorems, Equations, etc.) from your knowledge base without reinitializing it.
This project is licensed under the AGPL-3.0 License.
We welcome contributions from the community! To ensure code quality and consistency, please follow the guidelines below.
Development Setup
This project uses pre-commit hooks to automatically format code and check for issues before commits.
Step 1: Install pre-commit
```bash
# Using pip
pip install pre-commit

# Or using conda
conda install -c conda-forge pre-commit
```
Step 2: Install Git hooks
```bash
cd DeepTutor
pre-commit install
```
Step 3: (Optional) Run checks on all files
pre-commit run --all-files
Every time you run git commit, pre-commit hooks will automatically:
- Format Python code with Ruff
- Format frontend code with Prettier
- Check for syntax errors
- Validate YAML/JSON files
- Detect potential security issues
| Tool | Purpose | Configuration |
|---|---|---|
| Ruff | Python linting & formatting | pyproject.toml |
| Prettier | Frontend code formatting | web/.prettierrc.json |
| detect-secrets | Security check | .secrets.baseline |
Note: The project uses Ruff format instead of Black to avoid formatting conflicts.
```bash
# Normal commit (hooks run automatically)
git commit -m "Your commit message"

# Manually check all files
pre-commit run --all-files

# Update hooks to latest versions
pre-commit autoupdate

# Skip hooks (not recommended, only for emergencies)
git commit --no-verify -m "Emergency fix"
```
- Fork and Clone: Fork the repository and clone your fork
- Create Branch: Create a feature branch from main
- Install Pre-commit: Follow the setup steps above
- Make Changes: Write your code following the project's style
- Test: Ensure your changes work correctly
- Commit: Pre-commit hooks will automatically format your code
- Push and PR: Push to your fork and create a Pull Request
- Use GitHub Issues to report bugs or suggest features
- Provide detailed information about the issue
- Include steps to reproduce if it's a bug
❤️ We thank all our contributors for their valuable contributions.
| ⚡ LightRAG | 🎨 RAG-Anything | 💻 DeepCode | 🔬 AI-Researcher |
|---|---|---|---|
| Simple and Fast RAG | Multimodal RAG | AI Code Assistant | Research Automation |
⭐ Star us · 🐛 Report a bug · 💬 Discussions
✨ Thanks for visiting DeepTutor!