- Notifications
You must be signed in to change notification settings - Fork43
Build report editing workflow for SQA by claude#48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
base:main
Are you sure you want to change the base?
Build report editing workflow for SQA by claude#48
Conversation
Implements a comprehensive edit workflow that allows users to modifyexisting research reports based on natural language instructions,instead of generating new reports from scratch.New Features:- Step 1a: Decide if new search is needed based on edit instruction- Step 1b: Conditionally search and rerank new papers- Step 2: Generate per-section edit plan with multiple action types- Step 3: Execute edits section-by-section with citation updatesEdit Actions Supported:- keep: Leave section unchanged- expand: Add more detail- add_papers: Integrate specific papers- delete: Remove section- go_deeper: Comprehensive analysis with more depth- replace: Replace with new content- modify: Make specific changesAPI Changes:- Extended ToolRequest with edit_existing, thread_id, edit_instruction, and mentioned_papers fields- Added routing logic in app.py to direct edit requests- New run_edit_pipeline() method in ScholarQAImplementation:- Maximizes reuse of existing pipeline components- New files: models_edit.py, edit_prompts.py, edit_pipeline.py- Comprehensive documentation in EDIT_WORKFLOW.mdTest Cases Covered:- Add single/multiple papers- Search and add papers by topic- Go deeper on sections- Remove and replace sections- Fix specific issues- Extract sections into new reportsAddresses user feedback about wanting follow-up reports with moredepth through the "go deeper" feature.
Complete redesign to make edit workflow a TRUE MIRROR of run_qa_pipeline.## Key Changes### 1. Extended Prompts (Not New Prompts)- SYSTEM_PROMPT_QUOTE_PER_PAPER_EDIT extends SYSTEM_PROMPT_QUOTE_PER_PAPER - Added: current_sections_summary, edit_instruction - Same quote extraction logic, edit-aware context- SYSTEM_PROMPT_QUOTE_CLUSTER_EDIT extends SYSTEM_PROMPT_QUOTE_CLUSTER - Added: current_report, edit_instruction - Same clustering logic, outputs include actions (KEEP/EXPAND/ADD_TO/etc)- PROMPT_ASSEMBLE_SUMMARY_EDIT extends PROMPT_ASSEMBLE_SUMMARY - Added: current_section_content, action - Same section generation, handles edit actions### 2. Mirrored Pipeline StructureEditPipeline mirrors MultiStepQAPipeline:- step_select_quotes_edit() mirrors step_select_quotes()- step_clustering_edit() mirrors step_clustering()- generate_iterative_summary_edit() mirrors generate_iterative_summary()Each method has SAME signature pattern + edit context parameters.### 3. Mirrored run_edit_pipeline FlowExactly parallels run_qa_pipeline (4 steps):Step 0 (NEW): Retrieve current report from thread_idStep 1: mentioned_papers + conditional search → rerank → scored_dfStep 2: step_select_quotes_edit → per_paper_summaries (WITH edit context)Step 3: step_clustering_edit → EditClusterPlan (WITH current report + actions)Step 4: extract_quote_citations → same as originalStep 5: generate_iterative_summary_edit → sections (WITH current content + actions)### 4. mentioned_papers (paper_ids) UsageClearly marked at 3 critical points in code:MARKER#1: Fetch mentioned papers metadata (lines ~120-140) - Currently uses get_paper_metadata() for abstracts - TODO: Fetch full text/snippets like search resultsMARKER#2: Convert to search format (lines ~140-165) - Creates candidates with abstracts as content - TODO: Use same rich snippet format as searchMARKER#3: Quote extraction (edit_pipeline.py) - LLM extracts from abstracts for mentioned papers - TODO: Maintain same granularity as searched papers### 5. Reused ComponentsMaximizes reuse (9 methods unchanged):✅ find_relevant_papers, rerank_and_aggregate✅ extract_quote_citations, get_json_summary✅ get_paper_metadata, gen_table_thread✅ EventTrace, CostAwareLLMCaller, state_mgrOnly 3 NEW methods in EditPipeline (all extensions).## Files Changed- api/scholarqa/llms/edit_prompts.py: Extended prompts (~300 lines)- api/scholarqa/rag/edit_pipeline.py: Mirrored pipeline (~368 lines)- api/scholarqa/scholar_qa.py: Redesigned run_edit_pipeline (~280 lines)- EDIT_WORKFLOW.md: Comprehensive documentation with: - Side-by-side comparison tables - Data flow diagrams - mentioned_papers usage markers - Architecture explanation## Files Removed- api/scholarqa/models_edit.py: No longer needed - EditClusterPlan moved to edit_pipeline.py - No separate action enums - using strings in plan## Why This Design1. Consistency: Edit and generate follow same pattern2. Maintainability: Prompts evolve together3. Reusability: Battle-tested components4. Clarity: Easy to see what's extended vs new5. Future-proof: Clear extension points## Testing✅ All files compile successfully✅ Syntax validated for all modified files✅ Imports cleaned up
Created three levels of tests to validate edit workflow:1. test_edit_workflow.py (pytest-based) - Comprehensive unit tests with mocks - Tests all EditPipeline methods - Error handling and data flow tests - Requires: pytest, pandas2. test_edit_workflow_simple.py (simplified) - Tests without pytest dependency - Mock LLM interactions - Requires: pandas, unittest.mock3. test_edit_workflow_basic.py (minimal) - Validation without any runtime dependencies - File compilation checks - Code structure validation - Runs in dev environmentResults:✅ All files compile (syntax valid)✅ App routing works (edit_existing flag checked)❌ Runtime tests blocked by missing pandas/dependenciesAdded TESTING_STATUS.md documenting:- What we validated (6 items)- What requires testing (8 items)- Confidence levels (high/medium/low)- Manual test plan for E2E validation- Next steps for full validationRecommendation: Code is structurally sound but needs runtimetesting in environment with dependencies before production use.
Document comprehensive structural validation of edit workflow including:- All compilation, prompt, and integration checks that passed- Environment limitations that blocked runtime testing- Recommendations for production deployment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Pull Request Overview
This PR implements an edit workflow for existing research reports, allowing users to modify reports through natural language instructions. The implementation mirrors the existing 4-step report generation pipeline with edit-specific extensions.
Key Changes:
- Extended
ToolRequestmodel with edit workflow fields (edit_existing,thread_id,edit_instruction,mentioned_papers) - Created
EditPipelineclass that mirrorsMultiStepQAPipelinewith edit-aware methods - Added
run_edit_pipeline()method toScholarQAthat parallels the existingrun_qa_pipeline() - Implemented routing logic in
app.pyto direct requests to the appropriate pipeline
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| api/scholarqa/models.py | Extended ToolRequest with 4 edit workflow fields |
| api/scholarqa/llms/edit_prompts.py | Added 6 edit-specific prompts that extend originals with current report context |
| api/scholarqa/rag/edit_pipeline.py | New EditPipeline class with 3 main methods mirroring MultiStepQAPipeline |
| api/scholarqa/scholar_qa.py | Added run_edit_pipeline() method (~400 lines) and _retrieve_report_from_thread() helper |
| api/scholarqa/app.py | Added routing logic based on edit_existing flag |
| api/scholarqa/test_*.py | Three test files with comprehensive test coverage |
| EDIT_WORKFLOW.md, TESTING_STATUS.md, VALIDATION_RESULTS.md | Documentation files |
Comments suppressed due to low confidence (1)
api/scholarqa/test_edit_workflow.py:1
- This inline class definition duplicates the same pattern from edit_pipeline.py. If this is testing the actual implementation behavior, consider importing or mocking the actual class to avoid drift between test and implementation.
"""💡Add Copilot custom instructions for smarter, more guided reviews.Learn how to get started.
| sections=generated_sections, | ||
| cost=event_trace.total_cost, | ||
| tokens=event_trace.tokens | ||
| ) |
CopilotAINov 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Missing blank line before method definition. PEP 8 recommends two blank lines before top-level function/method definitions in a class to improve readability.
| ) | |
| ) |
| classKeepResult: | ||
| content=current_section.text |
CopilotAINov 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Defining an ad-hoc class inside a loop is inefficient and unconventional. Consider defining KeepResult outside the loop or using a namedtuple/dataclass to avoid recreating the class on each iteration.
| ##Summary | ||
| **Date**: $(date +%Y-%m-%d) |
CopilotAINov 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
The date placeholder '$(date +%Y-%m-%d)' is a shell command that won't be evaluated in a markdown file. This should either be replaced with an actual date or removed if not needed.
| **Date**:$(date +%Y-%m-%d) | |
| **Date**:2024-06-10 |
| importpytest | ||
| importpandasaspd | ||
| fromunittest.mockimportMock,MagicMock,patch |
CopilotAINov 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Import of 'MagicMock' is not used.
| fromunittest.mockimportMock,MagicMock,patch | |
| fromunittest.mockimportMock,patch |
| importpytest | ||
| importpandasaspd | ||
| fromunittest.mockimportMock,MagicMock,patch | ||
| fromtypingimportDict,List,Any |
CopilotAINov 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Import of 'Any' is not used.
Import of 'Dict' is not used.
| fromtypingimportDict,List,Any | |
| fromtypingimportList |
| fromtypingimportDict,List,Any | ||
| fromscholarqa.modelsimportTaskResult,GeneratedSection,CitationSrc,PaperDetails,Author,ToolRequest | ||
| fromscholarqa.rag.edit_pipelineimportEditPipeline,EditClusterPlan |
CopilotAINov 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Import of 'EditClusterPlan' is not used.
| fromscholarqa.rag.edit_pipelineimportEditPipeline,EditClusterPlan | |
| fromscholarqa.rag.edit_pipelineimportEditPipeline |
| # Add parent dir to path | ||
| sys.path.insert(0,'..') | ||
| fromscholarqa.modelsimportTaskResult,GeneratedSection,CitationSrc,PaperDetails,Author,ToolRequest |
CopilotAINov 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Import of 'CitationSrc' is not used.
Import of 'PaperDetails' is not used.
Import of 'Author' is not used.
| fromscholarqa.modelsimportTaskResult,GeneratedSection,CitationSrc,PaperDetails,Author,ToolRequest | |
| fromscholarqa.modelsimportTaskResult,GeneratedSection,ToolRequest |
| sys.path.insert(0,'..') | ||
| fromscholarqa.modelsimportTaskResult,GeneratedSection,CitationSrc,PaperDetails,Author,ToolRequest | ||
| fromscholarqa.rag.edit_pipelineimportEditPipeline,EditClusterPlan |
CopilotAINov 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Import of 'EditClusterPlan' is not used.
| fromscholarqa.rag.edit_pipelineimportEditPipeline,EditClusterPlan | |
| fromscholarqa.rag.edit_pipelineimportEditPipeline |
Aman, this was written by Claude. Is it at all useful??? Don't merge but maybe you can pluck some useful code or ideas?