allenai/ai2-scholarqa-libPublic

NotificationsYou must be signed in to change notification settings
Fork43
Star233

Build report editing workflow for SQA by claude#48

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Jump to bottom

Open

sergeyf wants to merge4 commits intomain

base:main

Choose a base branch

fromclaude/sqa-edit-report-workflow-011CUpCKBFoxGFDEKHDcMJ4W

Open

Build report editing workflow for SQA by claude#48

sergeyf wants to merge4 commits intomainfromclaude/sqa-edit-report-workflow-011CUpCKBFoxGFDEKHDcMJ4W

+3,088 −1

Conversation

Copy link

Collaborator

sergeyf commentedNov 5, 2025

Aman, this was written by Claude. Is it at all useful??? Don't merge but maybe you can pluck some useful code or ideas?

claude added4 commits

November 5, 2025 05:41

Add SQA Edit Workflow for modifying existing reports

079c8b4

Implements a comprehensive edit workflow that allows users to modifyexisting research reports based on natural language instructions,instead of generating new reports from scratch.New Features:- Step 1a: Decide if new search is needed based on edit instruction- Step 1b: Conditionally search and rerank new papers- Step 2: Generate per-section edit plan with multiple action types- Step 3: Execute edits section-by-section with citation updatesEdit Actions Supported:- keep: Leave section unchanged- expand: Add more detail- add_papers: Integrate specific papers- delete: Remove section- go_deeper: Comprehensive analysis with more depth- replace: Replace with new content- modify: Make specific changesAPI Changes:- Extended ToolRequest with edit_existing, thread_id,  edit_instruction, and mentioned_papers fields- Added routing logic in app.py to direct edit requests- New run_edit_pipeline() method in ScholarQAImplementation:- Maximizes reuse of existing pipeline components- New files: models_edit.py, edit_prompts.py, edit_pipeline.py- Comprehensive documentation in EDIT_WORKFLOW.mdTest Cases Covered:- Add single/multiple papers- Search and add papers by topic- Go deeper on sections- Remove and replace sections- Fix specific issues- Extract sections into new reportsAddresses user feedback about wanting follow-up reports with moredepth through the "go deeper" feature.

REDESIGN: Mirror existing 4-step pipeline in edit workflow

2f965b5

Complete redesign to make edit workflow a TRUE MIRROR of run_qa_pipeline.## Key Changes### 1. Extended Prompts (Not New Prompts)- SYSTEM_PROMPT_QUOTE_PER_PAPER_EDIT extends SYSTEM_PROMPT_QUOTE_PER_PAPER  - Added: current_sections_summary, edit_instruction  - Same quote extraction logic, edit-aware context- SYSTEM_PROMPT_QUOTE_CLUSTER_EDIT extends SYSTEM_PROMPT_QUOTE_CLUSTER  - Added: current_report, edit_instruction  - Same clustering logic, outputs include actions (KEEP/EXPAND/ADD_TO/etc)- PROMPT_ASSEMBLE_SUMMARY_EDIT extends PROMPT_ASSEMBLE_SUMMARY  - Added: current_section_content, action  - Same section generation, handles edit actions### 2. Mirrored Pipeline StructureEditPipeline mirrors MultiStepQAPipeline:- step_select_quotes_edit() mirrors step_select_quotes()- step_clustering_edit() mirrors step_clustering()- generate_iterative_summary_edit() mirrors generate_iterative_summary()Each method has SAME signature pattern + edit context parameters.### 3. Mirrored run_edit_pipeline FlowExactly parallels run_qa_pipeline (4 steps):Step 0 (NEW): Retrieve current report from thread_idStep 1: mentioned_papers + conditional search → rerank → scored_dfStep 2: step_select_quotes_edit → per_paper_summaries (WITH edit context)Step 3: step_clustering_edit → EditClusterPlan (WITH current report + actions)Step 4: extract_quote_citations → same as originalStep 5: generate_iterative_summary_edit → sections (WITH current content + actions)### 4. mentioned_papers (paper_ids) UsageClearly marked at 3 critical points in code:MARKER#1: Fetch mentioned papers metadata (lines ~120-140)  - Currently uses get_paper_metadata() for abstracts  - TODO: Fetch full text/snippets like search resultsMARKER#2: Convert to search format (lines ~140-165)  - Creates candidates with abstracts as content  - TODO: Use same rich snippet format as searchMARKER#3: Quote extraction (edit_pipeline.py)  - LLM extracts from abstracts for mentioned papers  - TODO: Maintain same granularity as searched papers### 5. Reused ComponentsMaximizes reuse (9 methods unchanged):✅ find_relevant_papers, rerank_and_aggregate✅ extract_quote_citations, get_json_summary✅ get_paper_metadata, gen_table_thread✅ EventTrace, CostAwareLLMCaller, state_mgrOnly 3 NEW methods in EditPipeline (all extensions).## Files Changed- api/scholarqa/llms/edit_prompts.py: Extended prompts (~300 lines)- api/scholarqa/rag/edit_pipeline.py: Mirrored pipeline (~368 lines)- api/scholarqa/scholar_qa.py: Redesigned run_edit_pipeline (~280 lines)- EDIT_WORKFLOW.md: Comprehensive documentation with:  - Side-by-side comparison tables  - Data flow diagrams  - mentioned_papers usage markers  - Architecture explanation## Files Removed- api/scholarqa/models_edit.py: No longer needed  - EditClusterPlan moved to edit_pipeline.py  - No separate action enums - using strings in plan## Why This Design1. Consistency: Edit and generate follow same pattern2. Maintainability: Prompts evolve together3. Reusability: Battle-tested components4. Clarity: Easy to see what's extended vs new5. Future-proof: Clear extension points## Testing✅ All files compile successfully✅ Syntax validated for all modified files✅ Imports cleaned up

Add structural tests and validation documentation

1cb565b

Created three levels of tests to validate edit workflow:1. test_edit_workflow.py (pytest-based)   - Comprehensive unit tests with mocks   - Tests all EditPipeline methods   - Error handling and data flow tests   - Requires: pytest, pandas2. test_edit_workflow_simple.py (simplified)   - Tests without pytest dependency   - Mock LLM interactions   - Requires: pandas, unittest.mock3. test_edit_workflow_basic.py (minimal)   - Validation without any runtime dependencies   - File compilation checks   - Code structure validation   - Runs in dev environmentResults:✅ All files compile (syntax valid)✅ App routing works (edit_existing flag checked)❌ Runtime tests blocked by missing pandas/dependenciesAdded TESTING_STATUS.md documenting:- What we validated (6 items)- What requires testing (8 items)- Confidence levels (high/medium/low)- Manual test plan for E2E validation- Next steps for full validationRecommendation: Code is structurally sound but needs runtimetesting in environment with dependencies before production use.

Add validation results documentation

3e60e3a

Document comprehensive structural validation of edit workflow including:- All compilation, prompt, and integration checks that passed- Environment limitations that blocked runtime testing- Recommendations for production deployment

sergeyf requested review fromamanpreet692 andCopilot

November 5, 2025 22:27

CopilotAI reviewed

Nov 5, 2025

View reviewed changes

Copy link

Contributor

CopilotAI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Pull Request Overview

This PR implements an edit workflow for existing research reports, allowing users to modify reports through natural language instructions. The implementation mirrors the existing 4-step report generation pipeline with edit-specific extensions.

Key Changes:

ExtendedToolRequest model with edit workflow fields (edit_existing,thread_id,edit_instruction,mentioned_papers)
CreatedEditPipeline class that mirrorsMultiStepQAPipeline with edit-aware methods
Addedrun_edit_pipeline() method toScholarQA that parallels the existingrun_qa_pipeline()
Implemented routing logic inapp.py to direct requests to the appropriate pipeline

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 8 comments.

Show a summary per file

File	Description
api/scholarqa/models.py	Extended ToolRequest with 4 edit workflow fields
api/scholarqa/llms/edit_prompts.py	Added 6 edit-specific prompts that extend originals with current report context
api/scholarqa/rag/edit_pipeline.py	New EditPipeline class with 3 main methods mirroring MultiStepQAPipeline
api/scholarqa/scholar_qa.py	Added run_edit_pipeline() method (~400 lines) and _retrieve_report_from_thread() helper
api/scholarqa/app.py	Added routing logic based on edit_existing flag
api/scholarqa/test_*.py	Three test files with comprehensive test coverage
EDIT_WORKFLOW.md, TESTING_STATUS.md, VALIDATION_RESULTS.md	Documentation files

Comments suppressed due to low confidence (1)

api/scholarqa/test_edit_workflow.py:1

This inline class definition duplicates the same pattern from edit_pipeline.py. If this is testing the actual implementation behavior, consider importing or mocking the actual class to avoid drift between test and implementation.

"""

💡Add Copilot custom instructions for smarter, more guided reviews.Learn how to get started.

api/scholarqa/scholar_qa.py

		sections=generated_sections,
		cost=event_trace.total_cost,
		tokens=event_trace.tokens
		)

Copy link

CopilotAINov 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Missing blank line before method definition. PEP 8 recommends two blank lines before top-level function/method definitions in a class to improve readability.

Suggested change

	)
	)

Copilot uses AI. Check for mistakes.

api/scholarqa/rag/edit_pipeline.py

Comment on lines +275 to +276

		classKeepResult:
		content=current_section.text

Copy link

CopilotAINov 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Defining an ad-hoc class inside a loop is inefficient and unconventional. Consider defining KeepResult outside the loop or using a namedtuple/dataclass to avoid recreating the class on each iteration.

Copilot uses AI. Check for mistakes.

VALIDATION_RESULTS.md


		##Summary

		Date: $(date +%Y-%m-%d)

Copy link

CopilotAINov 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

The date placeholder '$(date +%Y-%m-%d)' is a shell command that won't be evaluated in a markdown file. This should either be replaced with an actual date or removed if not needed.

Suggested change

	Date:$(date +%Y-%m-%d)
	Date:2024-06-10

Copilot uses AI. Check for mistakes.

api/scholarqa/test_edit_workflow.py


		importpytest
		importpandasaspd
		fromunittest.mockimportMock,MagicMock,patch

Copy link

CopilotAINov 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Import of 'MagicMock' is not used.

Suggested change

	fromunittest.mockimportMock,MagicMock,patch
	fromunittest.mockimportMock,patch

Copilot uses AI. Check for mistakes.

api/scholarqa/test_edit_workflow.py

		importpytest
		importpandasaspd
		fromunittest.mockimportMock,MagicMock,patch
		fromtypingimportDict,List,Any

Copy link

CopilotAINov 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Import of 'Any' is not used.
Import of 'Dict' is not used.

Suggested change

	fromtypingimportDict,List,Any
	fromtypingimportList

Copilot uses AI. Check for mistakes.

api/scholarqa/test_edit_workflow.py

		fromtypingimportDict,List,Any

		fromscholarqa.modelsimportTaskResult,GeneratedSection,CitationSrc,PaperDetails,Author,ToolRequest
		fromscholarqa.rag.edit_pipelineimportEditPipeline,EditClusterPlan

Copy link

CopilotAINov 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Import of 'EditClusterPlan' is not used.

Suggested change

	fromscholarqa.rag.edit_pipelineimportEditPipeline,EditClusterPlan
	fromscholarqa.rag.edit_pipelineimportEditPipeline

Copilot uses AI. Check for mistakes.

api/scholarqa/test_edit_workflow_simple.py

		# Add parent dir to path
		sys.path.insert(0,'..')

		fromscholarqa.modelsimportTaskResult,GeneratedSection,CitationSrc,PaperDetails,Author,ToolRequest

Copy link

CopilotAINov 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Import of 'CitationSrc' is not used.
Import of 'PaperDetails' is not used.
Import of 'Author' is not used.

Suggested change

	fromscholarqa.modelsimportTaskResult,GeneratedSection,CitationSrc,PaperDetails,Author,ToolRequest
	fromscholarqa.modelsimportTaskResult,GeneratedSection,ToolRequest

Copilot uses AI. Check for mistakes.

api/scholarqa/test_edit_workflow_simple.py

		sys.path.insert(0,'..')

		fromscholarqa.modelsimportTaskResult,GeneratedSection,CitationSrc,PaperDetails,Author,ToolRequest
		fromscholarqa.rag.edit_pipelineimportEditPipeline,EditClusterPlan

Copy link

CopilotAINov 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Import of 'EditClusterPlan' is not used.

Suggested change

	fromscholarqa.rag.edit_pipelineimportEditPipeline,EditClusterPlan
	fromscholarqa.rag.edit_pipelineimportEditPipeline

Copilot uses AI. Check for mistakes.

Labels

None yet

Movatterモバイル変換

Build report editing workflow for SQA by claude#48

Are you sure you want to change the base?

Build report editing workflow for SQA by claude#48

Conversation

sergeyf commentedNov 5, 2025

Uh oh!

CopilotAI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

CopilotAINov 5, 2025

Choose a reason for hiding this comment

Uh oh!

CopilotAINov 5, 2025

Choose a reason for hiding this comment

Uh oh!

CopilotAINov 5, 2025

Choose a reason for hiding this comment

Uh oh!

CopilotAINov 5, 2025

Choose a reason for hiding this comment

Uh oh!

CopilotAINov 5, 2025

Choose a reason for hiding this comment

Uh oh!

CopilotAINov 5, 2025

Choose a reason for hiding this comment

Uh oh!

CopilotAINov 5, 2025

Choose a reason for hiding this comment

Uh oh!

CopilotAINov 5, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants