Package 4 Enhancement Plan — Per-Sub-Question RAG Pipeline
Source: User request (2026-04-26)
Scope: Refactor the 3-step RAG query pipeline so retrieval, filtering, and response generation are organized per sub-question instead of batch-flattened.
Status: ✅ Complete — All 7 sub-phases implemented (2026-04-26). Phase 4a Prompt Integration added (2026-04-27). Phase PX Profile Export/Import planned (2026-04-27) — see end of file.
Objective
Restructure the POST /api/v1/query pipeline so that:
- Retrieval per sub-question: Each sub-question independently retrieves `n_results` chunks from ChromaDB (instead of joining all sub-questions into one query string).
- Filtering per sub-question: Each chunk is evaluated for relevance against its own originating sub-question (not the original user question). One LLM call handles all filtering; the prompt is redesigned to group chunks by sub-question.
- Final answer organized by sub-question: Each sub-question gets its own bullet-point answer with its own sources. The frontend renders answer sections per sub-question rather than one monolithic bullet list.
Decision Register
| # | Decision | Rationale |
|---|---|---|
| 1 | Keep `QueryDecomposer` unchanged | Input/output contract is identical: decomposition still produces a flat list of sub-questions |
| 2 | Single LLM call for filtering | User explicitly requested one call. Prompt redesigned to carry sub-question context for each chunk group |
| 3 | Keep `RAGService.retrieve()` signature | Call it N times (once per sub-question) externally in the orchestrator rather than changing its internal contract |
| 4 | Add `retrieve_per_subquestion()` to `RAGService` | New method that iterates over sub-questions, calls `retrieve()` per question, and returns grouped results |
| 5 | Redesign `generate_response()` signature | Accepts structured `sub_questions: List[SubQuestionContext]` instead of flat chunk lists |
| 6 | SSE events: add `generating_subquestion` phase | Progressive streaming: the frontend sees which sub-question is being answered |
| 7 | History: change XML/JSON formats in-place | Add `<sub_q>` wrappers to `chunks_retrieved`/`chunks_filtered` XML. Add sub-question grouping to `sources` JSON. No new DB columns. |
| 8 | Final answer format: markdown sections | `## Sub-question 1` headers with inline citations. Backward-compatible with existing `ReactMarkdown` rendering |
| 9 | Deduplicate chunks within a sub-question only | The same chunk may be retrieved by multiple sub-questions. Keep such duplicates (different sub-questions need independent evaluation). ChromaDB `query()` may naturally return the same document for different queries; this is acceptable. |
| 10 | Prompt template: add generate placeholders | New placeholder `{context_sections}` replaces the single `{context}`. Filter template unchanged (sub-question injected at call site). Decompose template unchanged. |
| 11 | Progressive SSE events | Emit a `generating_subquestion` event as each sub-question's answer section is generated. Frontend renders sections one by one. |
| 12 | `retrieval_n_results` | Global: same value for all sub-questions. Use the existing `settings.retrieval_n_results` config. |
| 13 | Empty decomposition fallback | Treat the original user question as a single sub-question. Pipeline runs as the 1-sub-q case: single retrieval, no filtering needed (one sub-q = no ambiguity), flat answer with a `##` header. |
Pipeline: Before vs After
Before (Current — Flat Batch)
```text
User Question: "What are NEC4 time extension clauses?"
     │
┌────▼─────┐
│ Decompose│ LLM Call 1
│          │ → ["What are time extensions?",
│          │    "What notice is required?"]
└────┬─────┘
     │ joined: "What are time extensions? What notice is required?"
┌────▼─────┐
│ Retrieve │ 1 ChromaDB query → 10 chunks (flat, no sub-q association)
└────┬─────┘
     │ 10 chunks
┌────▼─────┐
│ Filter   │ LLM Call 2: all chunks scored against ORIGINAL question
│          │ Score > 7 → keep (flat, no sub-q association)
└────┬─────┘
     │ N filtered chunks
┌────▼─────┐
│ Generate │ LLM Call 3: flat answer from ALL filtered chunks
│          │ "• Time extensions require notice [NEC4 ACC.pdf, p3]
│          │  • The project manager must acknowledge [NEC4, p7]
│          │  • Notice is defined as..." (sources from all sub-qs mixed)
└────┬─────┘
     │ single SSE completed event
┌────▼─────┐
│ Frontend │ 1 ReactMarkdown block, 1 flat sources list
└──────────┘
```
After (Per-Sub-Question)
```text
User Question: "What are NEC4 time extension clauses?"
            │
       ┌────▼─────┐
       │ Decompose│ LLM Call 1 (UNCHANGED)
       │          │ → ["What are time extensions?",
       │          │    "What notice is required?"]
       └────┬─────┘
            │
     ┌──────┴──────┐
     │ sub_q1      │ sub_q2
┌────▼─────┐  ┌────▼─────┐
│ Retrieve │  │ Retrieve │ 2 ChromaDB queries → 10 chunks each
│ q1 → 10  │  │ q2 → 10  │ chunks tagged with sub-q index
└────┬─────┘  └────┬─────┘
     │             │
     └──────┬──────┘
            │ grouped: {sub_q0: [chunks 0-9], sub_q1: [chunks 10-19]}
       ┌────▼─────┐
       │ Filter   │ LLM Call 2 (SINGLE CALL, redesigned prompt)
       │          │ Each chunk scored against its OWN sub-question
       │          │ Returns grouped scores → filtered per sub-q
       └────┬─────┘
            │ filtered_by_subq: {0: [chunk_a, chunk_b], 1: [chunk_c]}
       ┌────▼─────┐
       │ Generate │ LLM Call 3 (redesigned prompt with per-sub-q context)
       │          │ ┌─────────────────────────────────────┐
       │          │ │ ## What are time extensions?        │
       │          │ │ - Time extensions must be notified  │
       │          │ │   [NEC4 ACC.pdf, page 3]            │
       │          │ │ - The project manager has 2 weeks   │
       │          │ │   [NEC4 Contract.pdf, page 12]      │
       │          │ │                                     │
       │          │ │ ## What notice is required?         │
       │          │ │ - Written notice must be given      │
       │          │ │   [NEC4 ACC.pdf, page 7]            │
       │          │ └─────────────────────────────────────┘
       └────┬─────┘
            │ SSE events: generating_subquestion (per sub-q) → completed
       ┌────▼─────┐
       │ Frontend │ Sections per sub-question, sources grouped per section
       └──────────┘
```
Current State (Pre-Enhancement)
Backend
| Component | File | Current Behavior |
|---|---|---|
| Decomposer | `services/query_decomposer.py` | `decompose(question) -> (List[str], prompt)`: returns 2-5 sub-questions |
| Retrieval | `services/rag.py:retrieve()` | `query_text = " ".join(query_keywords)` joins all sub-questions into ONE string; single ChromaDB query → flat chunk list |
| Filter | `services/relevance_filter.py` | `filter(question, chunks)`: ALL chunks scored against the ORIGINAL question in a single LLM call; flat output |
| Generate | `services/rag.py:generate_response()` | `generate_response(question, chunks, metadata)`: flat chunks → flat bullet answer |
| Orchestrator | `routers/query.py:_query_stream()` | Linear 4-stage pipeline: decompose → retrieve → filter → generate |
| SSE Events | `routers/query.py` | `decomposed` → `retrieving` → `filtering` → `generating` → `completed`; flat answer + sources in `completed` |
| History | `services/history_service.py` | Flat XML for `chunks_retrieved`/`chunks_filtered`. Flat JSON for `sources`. Single timing per stage. |
| Prompt templates | `prompt_service.py` + `sqlite_db.py` | 3 steps (decompose, filter, generate). Placeholders: `{question}`, `{chunks}`, `{context}` |
| Config | `core/config.py` | `retrieval_n_results=10`, `relevance_threshold=7.0` |
Frontend
| Component | File | Current Behavior |
|---|---|---|
| Types | `types/index.ts` | `QueryStreamEvent.phase`, flat `extracted_questions: string[]`, flat `answer: string`, flat `sources: SourceMetadata[]` |
| SSE Client | `lib/api.ts` | `queryDocumentStream()`: generic `JSON.parse` per `data:` line, no sub-question awareness |
| State | `lib/queries.tsx` | `QueryStreamState` with flat `answer`/`sources`/`extractedQuestions` |
| Response | `components/ResponsePanel.tsx` | Single `ReactMarkdown` block for the answer. Flat 2-column grid for sources. No sub-question grouping. |
| Questions | `components/ExtractedQuestionsDisplay.tsx` | `<ol>` list of question strings. No sources attached. |
| Citations | `utils/citationParser.ts` | Flat sources lookup: `buildCitationLookup(sources)` returns a global map |
| Progress | `components/PipelineProgress.tsx` | 4-step stepper (NOT currently wired in `LTTPage`) |
Key Test Files
| File | Lines | Status |
|---|---|---|
| `test_phase1_query_decomposer.py` | 76 | ✅ Unchanged: decomposer contract stays |
| `test_phase1_rag_service.py` | 139 | 🔴 Needs update: `retrieve()`, `generate_response()` signatures change |
| `test_phase1_relevance_filter.py` | 93 | 🟡 Needs update: one-call pattern changes to per-sub-q grouping |
| `test_phase1_query.py` | 97 | 🟢 Already skipped (SSE migration); may un-skip later |
| `test_phase3_query_history_integration.py` | 608 | 🔴 Major rewrite: pipeline simulation mirrors `_query_stream` 1:1 |
| `test_phase3_prompt_injection.py` | 238 | 🟡 Moderate: new generate template placeholder |
| `test_acceptance_phase1_rag_query.py` | 101 | 🔴 Full rewrite: already broken (SSE vs JSON), new response shape |
| `conftest.py` | 94 | 🟡 Low: may add per-sub-q mock helpers |
Implementation Tasks
Sub-Phase 4.1: Backend — Per-Sub-Question Retrieval
Test files to write first:
- `test_phase4_retrieve_per_subquestion.py`: Tests `RAGService.retrieve_per_subquestion()`
- `test_phase4_query_router_retrieval.py`: Tests that the `_query_stream` retrieval stage produces per-sub-q chunks
Task 4.1.1: Add retrieve_per_subquestion() to RAGService
File: backend/app/services/rag.py
New method signature:
```python
from typing import Any, Dict, List, Tuple  # module-level imports in rag.py

def retrieve_per_subquestion(
    self,
    sub_questions: List[str],
    n_results: int = 10,
) -> List[Tuple[str, List[Tuple[str, Dict[str, Any], float]]]]:
    """Retrieve chunks for each sub-question independently.

    Args:
        sub_questions: List of decomposed sub-questions.
        n_results: Number of chunks per sub-question.

    Returns:
        List of (sub_question, chunks) tuples.
        chunks is the standard retrieve() output: [(text, metadata, distance), ...].
    """
```
Implementation:
- Call `self.retrieve([sub_q], n_results)` for each sub-question.
- Return a list of `(sub_question, chunks)` tuples. Within one sub-question the chunks are already distinct, because a single ChromaDB query returns each record ID at most once; duplicates across sub-questions are kept (Decision 9).
- The existing `retrieve()` method is NOT modified; it continues to work as before.
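A minimal sketch of the loop described above, assuming `retrieve()` returns `[(text, metadata, distance), ...]` as documented. `FakeRAG` and its in-memory corpus are hypothetical stand-ins for `RAGService` and ChromaDB, used only so the per-sub-question grouping can run without a vector store.

```python
from typing import Any, Dict, List, Tuple

# (text, metadata, distance): the shape retrieve() is described as returning
Chunk = Tuple[str, Dict[str, Any], float]


class FakeRAG:
    """Hypothetical stand-in for RAGService backed by an in-memory dict."""

    def __init__(self, corpus: Dict[str, List[Chunk]]) -> None:
        self._corpus = corpus

    def retrieve(self, query_keywords: List[str], n_results: int = 10) -> List[Chunk]:
        # Mirrors the flat behavior: join keywords into one query string.
        query_text = " ".join(query_keywords)
        return self._corpus.get(query_text, [])[:n_results]

    def retrieve_per_subquestion(
        self, sub_questions: List[str], n_results: int = 10
    ) -> List[Tuple[str, List[Chunk]]]:
        # One retrieve() call per sub-question; retrieve() itself is unchanged.
        # Chunks shared between sub-questions are kept (Decision 9).
        return [(sub_q, self.retrieve([sub_q], n_results)) for sub_q in sub_questions]
```

Because each call passes a one-element list, the `" ".join` in `retrieve()` becomes a no-op and every sub-question is queried on its own text.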
Task 4.1.2: Update _query_stream() retrieval stage
File: backend/app/routers/query.py
Changes:
- Replace `rag.retrieve(extracted_questions, n_results)` with `rag.retrieve_per_subquestion(extracted_questions, n_results)`.
- Track per-sub-question retrieval timing (new field or combined timing).
- Format `chunks_retrieved` XML with sub-question wrappers.
New chunks_retrieved XML format:
```xml
<sub_q idx="0" question="What are time extensions?">
<chunk_1>
Filename: NEC4 ACC.pdf
Page: 3
Content: Clause 61.3 states that...
</chunk_1>
<chunk_2>
...
</chunk_2>
</sub_q>
<sub_q idx="1" question="What notice is required?">
<chunk_1>
Filename: NEC4 Contract.pdf
Page: 12
Content: Notice must be given...
</chunk_1>
...
</sub_q>
```
Task 4.1.3: Format helpers
File: backend/app/routers/query.py
New functions:
```python
from typing import List, Tuple

def format_chunks_retrieved_per_subq(results: List[Tuple[str, List]]) -> str:
    """Format per-sub-question retrieved chunks as XML."""

def format_chunks_filtered_per_subq(results: List[Tuple[str, List]]) -> str:
    """Format per-sub-question filtered chunks as XML with relevance scores."""
```
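One possible shape for the two helpers, sketched under several assumptions: chunk metadata carries `filename` and `page` keys, and filtered chunks arrive as `(text, metadata, relevance)` triples. `_format_chunk` and `_format_per_subq` are hypothetical names. The sketch also escapes the `question` attribute, which the XML format above does not specify.

```python
import html
from typing import Any, Dict, List, Optional, Tuple


def _format_chunk(i: int, text: str, meta: Dict[str, Any], relevance: Optional[float]) -> str:
    # Chunk numbering restarts at 1 inside every <sub_q> block.
    lines = [f"<chunk_{i}>", f"Filename: {meta.get('filename', 'unknown')}", f"Page: {meta.get('page', '?')}"]
    if relevance is not None:
        lines.append(f"Relevance: {relevance}")
    lines += [f"Content: {text}", f"</chunk_{i}>"]
    return "\n".join(lines)


def _format_per_subq(results: List[Tuple[str, List[tuple]]], with_relevance: bool) -> str:
    blocks = []
    for idx, (sub_q, chunks) in enumerate(results):
        # Escape quotes and ampersands so the question is a valid attribute value.
        parts = [f'<sub_q idx="{idx}" question="{html.escape(sub_q, quote=True)}">']
        for i, chunk in enumerate(chunks, start=1):
            score = chunk[2] if with_relevance else None
            parts.append(_format_chunk(i, chunk[0], chunk[1], score))
        parts.append("</sub_q>")
        blocks.append("\n".join(parts))
    return "\n".join(blocks)


def format_chunks_retrieved_per_subq(results: List[Tuple[str, List[tuple]]]) -> str:
    # results: [(sub_question, [(text, metadata, distance), ...]), ...]
    return _format_per_subq(results, with_relevance=False)


def format_chunks_filtered_per_subq(results: List[Tuple[str, List[tuple]]]) -> str:
    # Assumption: filtered chunks are (text, metadata, relevance_score) triples.
    return _format_per_subq(results, with_relevance=True)
```

Sharing one internal formatter keeps the retrieved and filtered XML structurally identical. The only difference is the extra `Relevance:` line.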
Commit: "feat: Phase 4.1 per-sub-question retrieval with grouped chunk XML"
Sub-Phase 4.2: Backend — Per-Sub-Question Filtering (Single LLM Call)
Test files to write first:
- `test_phase4_relevance_filter_per_subq.py`: Tests `RelevanceFilter.filter_per_subquestion()` with grouped chunks
- `test_phase4_query_router_filter.py`: Tests the filter stage with per-sub-q chunk groups
Task 4.2.1: Add filter_per_subquestion() to RelevanceFilter
File: backend/app/services/relevance_filter.py
New method signature:
```python
from typing import Dict, List, Tuple  # module-level imports in relevance_filter.py

async def filter_per_subquestion(
    self,
    sub_questions: List[str],
    sub_chunks: List[List[Tuple[str, Dict]]],
    threshold: float = 7.0,
) -> Tuple[List[Tuple[str, List[Tuple[str, Dict]]]], str]:
    """Filter chunks per sub-question in a single LLM call.

    Args:
        sub_questions: List of decomposed sub-questions.
        sub_chunks: List of chunk lists (one per sub-question).
        threshold: Minimum relevance score.

    Returns:
        Tuple of (filtered_results, prompt).
        filtered_results: List of (sub_question, filtered_chunks_for_that_q).
    """
```
Prompt design (single LLM call):
```text
Evaluate each chunk for relevance to its associated sub-question.
Sub-question 0: "{sub_q_0}"
Chunk 0: {chunk_0_text}
Chunk 1: {chunk_1_text}
...
Sub-question 1: "{sub_q_1}"
Chunk 0: {chunk_0_text}
Chunk 1: {chunk_1_text}
...
For each chunk, rate relevance 0-10 considering ONLY its associated sub-question.
Return a JSON object mapping sub-question indices to arrays of scores:
{"0": [8.5, 3.2, 9.0], "1": [7.0, 6.5, 9.1]}
```
Key rules:
- Each chunk is evaluated against its own sub-question (not the original user question).
- JSON keys are stringified sub-question indices (`"0"`, `"1"`, ...).
- Score arrays MUST match the chunk count for each sub-question.
- Same JSON extraction/markdown-stripping logic as the existing `filter()`.
Existing filter() method is preserved — not modified, not deprecated. The new method is additive.
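The response handling implied by these rules can be sketched as follows. `parse_grouped_scores` and `apply_grouped_scores` are hypothetical helpers, not the module's actual API. The sketch makes two assumptions. First, `threshold` is treated as inclusive, following the "Minimum relevance score" wording. Second, a missing, wrong-length, or non-numeric score array is treated like a parse failure for that sub-question, so all of its chunks are kept, which is the fallback Task 4.7.5 describes.

```python
import json
import re
from typing import Any, Dict, List, Tuple

Chunk = Tuple[str, Dict[str, Any]]

# Matches an optional ```json opening fence and a closing ``` fence.
_FENCE_RE = re.compile(r"^```(?:json)?\s*|\s*```$")


def parse_grouped_scores(raw: str) -> Dict[str, Any]:
    cleaned = _FENCE_RE.sub("", raw.strip())
    data = json.loads(cleaned)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object keyed by sub-question index")
    return data


def apply_grouped_scores(
    sub_questions: List[str],
    sub_chunks: List[List[Chunk]],
    raw_response: str,
    threshold: float = 7.0,
) -> List[Tuple[str, List[Chunk]]]:
    try:
        scores = parse_grouped_scores(raw_response)
    except ValueError:
        scores = {}  # unusable response: every sub-question falls back below
    results = []
    for idx, (sub_q, chunks) in enumerate(zip(sub_questions, sub_chunks)):
        sub_scores = scores.get(str(idx))  # keys are stringified indices
        valid = (
            isinstance(sub_scores, list)
            and len(sub_scores) == len(chunks)
            and all(isinstance(s, (int, float)) for s in sub_scores)
        )
        if not valid:
            # Fallback: keep every chunk for this sub-question (no filtering).
            results.append((sub_q, list(chunks)))
            continue
        results.append((sub_q, [c for c, s in zip(chunks, sub_scores) if s >= threshold]))
    return results
```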
Task 4.2.2: Update _query_stream() filter stage
File: backend/app/routers/query.py
Changes:
- Call `relevance_filter.filter_per_subquestion(extracted_questions, chunks_for_filter, threshold)` instead of `relevance_filter.filter(question, chunks, threshold)`.
- Build `chunks_for_filter` from the per-sub-question retrieval results.
- Track `filter_prompt` (the redesigned prompt).
- Format `chunks_filtered` XML with sub-question wrappers and `Relevance:` scores.
New chunks_filtered XML format:
```xml
<sub_q idx="0" question="What are time extensions?">
<chunk_1>
Filename: NEC4 ACC.pdf
Page: 3
Relevance: 8.5
Content: Clause 61.3 states that...
</chunk_1>
</sub_q>
<sub_q idx="1" question="What notice is required?">
<chunk_1>
Filename: NEC4 Contract.pdf
Page: 12
Relevance: 9.0
Content: Notice must be given...
</chunk_1>
</sub_q>
```
Commit: "feat: Phase 4.2 per-sub-question filtering with single LLM call"
Sub-Phase 4.3: Backend — Sub-Question-Organized Response Generation
Test files to write first:
- `test_phase4_generate_per_subq.py`: Tests `RAGService.generate_response_per_subquestion()`
- `test_phase4_response_format.py`: Tests that the final answer matches the expected format
Task 4.3.1: Redesign generate_response() → generate_response_per_subquestion()
File: backend/app/services/rag.py
New method signature:
```python
from typing import Any, Dict, List, Tuple  # module-level imports in rag.py
# SourceMetadata: the existing source model (see models/query.py)

async def generate_response_per_subquestion(
    self,
    sub_questions: List[str],
    sub_chunks: List[List[str]],
    sub_metadata: List[List[Dict[str, Any]]],
) -> Tuple[str, str, List[List[SourceMetadata]]]:
    """Generate sub-question-organized RAG response.

    Args:
        sub_questions: List of decomposed sub-questions.
        sub_chunks: List of chunk text lists (one per sub-question).
        sub_metadata: List of metadata dict lists (one per sub-question).

    Returns:
        Tuple of (answer, prompt, grouped_sources).
        answer: Markdown string with sections per sub-question.
        prompt: The rendered LLM prompt.
        grouped_sources: List of SourceMetadata lists (one per sub-question).
    """
```
New prompt template (replaces generate):
```text
You must answer each sub-question using ONLY the document chunks provided for it.
Do not use any external knowledge.
Format your answer as markdown sections — one section per sub-question.
Each section should start with "## Sub-question N: <the question>"
Each section should contain 1-5 bullet points.
Cite your sources inline using bracket labels, e.g. [filename, page N].
Place the citation at the end of each relevant bullet point.
{context_sections}
Answer:
```
Context format (replaces {context}):
```text
### Context for Sub-question 0: "What are time extensions?"
[NEC4 ACC.pdf, page 3] Source: NEC4 ACC.pdf
Summary: Clause 61.3 discusses time extensions...
Content: Clause 61.3 states that the project manager...
[NEC4 Contract.pdf, page 12] Source: NEC4 Contract.pdf
Summary: Notice requirements for time extensions...
Content: Written notice must be given within...
### Context for Sub-question 1: "What notice is required?"
[NEC4 ACC.pdf, page 7] Source: NEC4 ACC.pdf
Summary: Notice requirements...
Content: The contractor shall notify the project manager in writing...
```
Expected answer format:
```markdown
## Sub-question 1: What are time extensions?
- Time extensions must be notified to the project manager within 2 weeks [NEC4 ACC.pdf, page 3]
- The project manager must acknowledge the notice within 1 week [NEC4 Contract.pdf, page 12]
## Sub-question 2: What notice is required?
- Written notice must be given [NEC4 ACC.pdf, page 7]
```
Existing generate_response() is preserved — not modified, not deprecated.
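The following sketch shows how `{context_sections}` could be assembled from the three parallel lists. Several details are assumptions: the metadata keys (`filename`, `page`, `summary`), the blank lines between sections, and the placeholder line for a sub-question with no chunks. `build_context_sections` is a hypothetical name.

```python
from typing import Any, Dict, List


def build_context_sections(
    sub_questions: List[str],
    sub_chunks: List[List[str]],
    sub_metadata: List[List[Dict[str, Any]]],
) -> str:
    sections = []
    for idx, (question, chunks, metas) in enumerate(zip(sub_questions, sub_chunks, sub_metadata)):
        # 0-based index, matching the "Context for Sub-question 0" example.
        lines = [f'### Context for Sub-question {idx}: "{question}"']
        for text, meta in zip(chunks, metas):
            filename = meta.get("filename", "unknown")
            lines.append(f"[{filename}, page {meta.get('page', '?')}] Source: {filename}")
            lines.append(f"Summary: {meta.get('summary', '')}")
            lines.append(f"Content: {text}")
        if not chunks:
            lines.append("(no chunks passed the relevance filter)")  # hypothetical wording
        sections.append("\n".join(lines))
    return "\n\n".join(sections)
```

The result would then be substituted into the template's `{context_sections}` placeholder.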
Task 4.3.2: Update _query_stream() generate stage
File: backend/app/routers/query.py
Changes:
- Call `rag.generate_response_per_subquestion(extracted_questions, chunk_texts_by_subq, metadata_by_subq)`.
- New SSE event `generating_subquestion`: emitted before each sub-question's section (lets the frontend show a progressive build).
- The `completed` SSE event includes both `answer` (markdown string) and `sub_question_sources` (grouped sources).
New SSE event sequence:
```text
{"phase": "decomposed", "extracted_questions": ["q1", "q2"]}
{"phase": "retrieving"}
{"phase": "filtering"}
{"phase": "generating"}
{"phase": "completed", "answer": "## Sub-question 1: ...\n\n...", "sub_question_sources": [[SourceMetadata, ...], [SourceMetadata, ...]]}
{"phase": "error", "message": "..."}
```
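Each event above travels as one Server-Sent Events frame: a `data:` line followed by a blank line. This matches the frontend's per-`data:`-line `JSON.parse`. The sketch below is illustrative, and `sse_event` and `example_event_sequence` are hypothetical names. Because `json.dumps` escapes the newlines inside the markdown `answer`, each frame stays on a single `data:` line.

```python
import json
from typing import Any, Dict, Iterator, List


def sse_event(payload: Dict[str, Any]) -> str:
    # One SSE frame: a single data: line terminated by a blank line.
    return f"data: {json.dumps(payload)}\n\n"


def example_event_sequence(questions: List[str], answer: str, grouped: List[list]) -> Iterator[str]:
    # Follows the phase order listed above (error path omitted).
    yield sse_event({"phase": "decomposed", "extracted_questions": questions})
    for phase in ("retrieving", "filtering", "generating"):
        yield sse_event({"phase": phase})
    yield sse_event({"phase": "completed", "answer": answer, "sub_question_sources": grouped})
```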
New QueryResponse model:
File: backend/app/models/query.py
```python
from typing import List

from pydantic import BaseModel

# SourceMetadata: the existing source model (assumed to be defined earlier in this module)


class SubQuestionSources(BaseModel):
    sub_question_index: int
    sub_question_text: str
    sources: List[SourceMetadata]


class QueryResponse(BaseModel):
    extracted_questions: List[str]
    answer: str  # Markdown with ## sections
    sub_question_sources: List[SubQuestionSources]  # Grouped sources
    # Backward compat:
    sources: List[SourceMetadata]  # Flattened version (all sources)
```
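The backward-compatible `sources` field can be derived from the grouped form. This sketch uses plain dicts in place of the Pydantic models, and `to_sub_question_sources` and `flatten_sources` are hypothetical helper names. The flattened list keeps duplicates in sub-question order, because the model comment only says "all sources"; whether to deduplicate is left open.

```python
from typing import Any, Dict, List


def to_sub_question_sources(questions: List[str], grouped: List[List[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    # Wrap each per-sub-question list with its index and text (SubQuestionSources shape).
    return [
        {"sub_question_index": i, "sub_question_text": q, "sources": list(srcs)}
        for i, (q, srcs) in enumerate(zip(questions, grouped))
    ]


def flatten_sources(sub_question_sources: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    # Backward-compatible flat list: every source, in sub-question order.
    return [src for group in sub_question_sources for src in group["sources"]]
```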
Commit: "feat: Phase 4.3 sub-question-organized response generation"
Sub-Phase 4.4: Backend — History & Prompt Template Updates
Test files to write first:
- `test_phase4_history_format.py`: Tests new XML/JSON history formats
- `test_phase4_prompt_templates.py`: Tests the new generate template with `{context_sections}`
Task 4.4.1: Update history recording
File: backend/app/routers/query.py (the _schedule_history / _record_history helpers)
Changes:
- `chunks_retrieved`: Store the new grouped XML format (with `<sub_q>` wrappers)
- `chunks_filtered`: Store the new grouped XML format (with `<sub_q>` wrappers and `Relevance:` scores)
- `sources`: Store grouped JSON: `json.dumps([[SourceMetadata_dict, ...], [...]])` (list of lists)
- `final_answer`: Store the markdown string with `##` sections
- Existing fields (`chunks_retrieved_count`, `chunks_filtered_count`) keep total counts
- New optional fields: `chunks_retrieved_per_subq_count`, `chunks_filtered_per_subq_count` (JSON array of ints)
Task 4.4.2: Update history DB schema (minimal)
File: backend/app/core/sqlite_db.py
Add two new columns (optional, NULL-able):
```sql
ALTER TABLE query_history ADD COLUMN chunks_retrieved_per_subq_count TEXT DEFAULT NULL;
ALTER TABLE query_history ADD COLUMN chunks_filtered_per_subq_count TEXT DEFAULT NULL;
```
These store JSON arrays like [10, 8] — one count per sub-question. NULL for pre-Package-4 records.
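SQLite has no `ADD COLUMN IF NOT EXISTS`, so re-running the two `ALTER TABLE` statements on an already-migrated database fails with a duplicate-column error. A guarded version, sketched with Python's `sqlite3` module, checks `PRAGMA table_info` first. `add_per_subq_count_columns` and `per_subq_counts` are hypothetical names.

```python
import json
import sqlite3
from typing import List

NEW_COLUMNS = ("chunks_retrieved_per_subq_count", "chunks_filtered_per_subq_count")


def add_per_subq_count_columns(conn: sqlite3.Connection) -> None:
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute("PRAGMA table_info(query_history)")}
    for col in NEW_COLUMNS:
        if col not in existing:
            conn.execute(f"ALTER TABLE query_history ADD COLUMN {col} TEXT DEFAULT NULL")
    conn.commit()


def per_subq_counts(groups: List[list]) -> str:
    # e.g. two sub-questions with 10 and 8 chunks -> "[10, 8]"
    return json.dumps([len(g) for g in groups])
```

Pre-Package-4 rows simply read back `NULL` for both columns, which keeps the history API backward-compatible.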
Task 4.4.3: Update history Pydantic models
File: backend/app/models/history.py
Add optional fields to QueryHistoryRecord and QueryHistoryDetail:
```python
chunks_retrieved_per_subq_count: Optional[str] = None  # JSON array string
chunks_filtered_per_subq_count: Optional[str] = None  # JSON array string
```
Task 4.4.4: Update prompt templates
File: backend/app/core/sqlite_db.py (seed data)
New generate template:
```python
"generate": (
    "You must answer each sub-question using ONLY the document chunks provided for it.\n"
    "Do not use any external knowledge.\n"
    "Format your answer as markdown sections — one section per sub-question.\n"
    "Each section should start with \"## Sub-question N: <the question>\"\n"
    "Each section should contain 1-5 bullet points.\n"
    "Cite your sources inline using bracket labels, e.g. [filename, page N].\n"
    "Place the citation at the end of each relevant bullet point.\n\n"
    "{context_sections}\n\n"
    "Answer:"
)
```
decompose and filter templates remain unchanged (they still use {question} placeholder — the orchestrator injects the right value at call time).
Task 4.4.5: Update PromptService to handle new template placeholder
File: backend/app/services/prompt_service.py
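- Add `context_sections` as a known placeholder for the `generate` step. This is optional: substitution uses `str.replace`, which never raises on placeholders it does not recognize (unmatched `{...}` text is simply left as-is).
- The `reset_to_defaults()` method must include the new generate template.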
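`str.replace` makes the new placeholder optional because it substitutes only the keys it is given and leaves any other braces alone. `str.format`, by contrast, would reject literal JSON braces such as the filter example `{"0": [8.5, ...]}`. `render_template` is a hypothetical name, not the actual `PromptService` method.

```python
from typing import Dict


def render_template(template: str, values: Dict[str, str]) -> str:
    # Substitute only the given keys; unknown {placeholders} and literal JSON braces survive.
    out = template
    for key, value in values.items():
        out = out.replace("{" + key + "}", value)
    return out
```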
Task 4.4.6: Update history detail API response
File: backend/app/routers/history.py
GET /api/v1/history/{id} response now includes chunks_retrieved_per_subq_count and chunks_filtered_per_subq_count when they are not NULL. Backward-compatible (older records return null for these fields).
Commit: "feat: Phase 4.4 history schema, prompt templates, and Pydantic model updates"
Sub-Phase 4.5: Frontend — Types & State Management
Test files to write first:
- `test_phase4_stream_state.test.tsx`: Tests that `QueryStreamState` handles the new response shape
- `test_phase4_types.test.ts`: Tests type compatibility
Task 4.5.1: Update TypeScript types
File: frontend/src/types/index.ts
New types:
```typescript
interface SubQuestionSources {
  sub_question_index: number;
  sub_question_text: string;
  sources: SourceMetadata[];
}

interface QueryStreamCompletedEvent {
  phase: 'completed';
  answer: string; // Markdown with ## sections
  sub_question_sources: SubQuestionSources[]; // Grouped sources
}

interface QueryStreamDecomposedEvent {
  phase: 'decomposed';
  extracted_questions: string[];
}

type QueryStreamEvent =
  | QueryStreamDecomposedEvent
  | { phase: 'retrieving' | 'filtering' | 'generating' }
  | QueryStreamCompletedEvent
  | { phase: 'error'; message: string };
```
Task 4.5.2: Update QueryStreamState and mutation handler
File: frontend/src/lib/queries.tsx
Changes:
```typescript
interface QueryStreamState {
  extractedQuestions: string[] | null;
  answer: string | null; // Full markdown
  subQuestionSources: SubQuestionSources[] | null; // NEW: grouped sources
  phase: 'idle' | 'decomposing' | 'retrieving' | 'filtering' | 'generating' | 'completed' | 'error';
  error: Error | null;
}
```
In the completed case:
```typescript
case 'completed':
  setState(prev => ({
    ...prev,
    answer: event.answer,
    subQuestionSources: event.sub_question_sources,
    phase: 'completed',
  }));
  break;
```
Commit: "feat: Phase 4.5 frontend types and state management for per-sub-q responses"
Sub-Phase 4.6: Frontend — ResponsePanel & ExtractedQuestionsDisplay
Test files to write first:
- `test_phase4_response_panel.test.tsx`: Tests per-sub-question section rendering
- `test_phase4_citation_parser.test.ts`: Tests per-sub-question citation lookup
Task 4.6.1: Redesign ResponsePanel for sub-question sections
File: frontend/src/components/ResponsePanel.tsx
Current: single ReactMarkdown block + flat sources grid.
New layout:
```text
┌───────────────────────────────────────────────────────────┐
│ 📋 Response                                     [Copy All] │
├───────────────────────────────────────────────────────────┤
│                                                           │
│ ┌─ Sub-question 1: What are time extensions? ───────────┐ │
│ │                                                       │ │
│ │  • Time extensions must be notified...                │ │
│ │    [NEC4 ACC.pdf, page 3]                             │ │
│ │  • The project manager must acknowledge...            │ │
│ │    [NEC4 Contract.pdf, page 12]                       │ │
│ │                                                       │ │
│ │  Sources (2)                              [Expand ▼]  │ │
│ │  ┌────────────────────────┬────────────────────────┐  │ │
│ │  │ NEC4 ACC.pdf, Page 3   │ NEC4 Contract, p12     │  │ │
│ │  │ "Clause 61.3 states.." │ "Notice must be..."    │  │ │
│ │  └────────────────────────┴────────────────────────┘  │ │
│ └───────────────────────────────────────────────────────┘ │
│                                                           │
│ ┌─ Sub-question 2: What notice is required? ────────────┐ │
│ │                                                       │ │
│ │  • Written notice must be given...                    │ │
│ │    [NEC4 ACC.pdf, page 7]                             │ │
│ │                                                       │ │
│ │  Sources (1)                              [Expand ▼]  │ │
│ └───────────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────┘
```
Implementation approach:
- Parse the `answer` markdown into sections using `## Sub-question N:` headers
- Map each section to its `SubQuestionSources` by matching index
- Render each section as an accordion/card with:
  - Header: sub-question text (from `SubQuestionSources`)
  - Body: `ReactMarkdown` for bullet points (with inline citation links)
  - Footer: collapsible sources grid (only sources belonging to this sub-question)
- Keep the existing citation link behavior (clickable `[filename, page N]` → PDF viewer)
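The section split can be sketched as below. It is written in Python to match the other examples in this plan; the component itself is TypeScript. The sketch assumes headers follow the `## Sub-question N: <question>` format from the generate prompt. It also assumes the 1-based `N` maps to a 0-based `sub_question_index` and that any text before the first header is dropped; both are assumptions. `split_answer_sections` is a hypothetical name.

```python
import re
from typing import Any, Dict, List

# Header format from the generate prompt: "## Sub-question N: <the question>"
_HEADER_RE = re.compile(r"^## Sub-question (\d+):[ \t]*(.*)$", re.MULTILINE)


def split_answer_sections(answer: str) -> List[Dict[str, Any]]:
    matches = list(_HEADER_RE.finditer(answer))
    sections = []
    for i, m in enumerate(matches):
        # Each section body runs until the next header (or the end of the answer).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(answer)
        sections.append({
            "index": int(m.group(1)) - 1,  # assumed: 1-based header -> 0-based index
            "title": m.group(2).strip(),
            "body": answer[m.end():end].strip(),
        })
    return sections
```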
Task 4.6.2: Update citationParser.ts for per-sub-question lookup
File: frontend/src/utils/citationParser.ts
Current: buildCitationLookup(sources: SourceMetadata[]) — returns a single global map.
New: buildCitationLookup(subQuestionSources: SubQuestionSources[]) — returns a map scoped to the correct sources for each section. The citation [filename, page N] match is looked up in the relevant sub-question's source list.
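A per-sub-question lookup can be sketched as one `(filename, page)` map per section, again in Python for consistency with the other examples. The `filename`/`page` field names are assumptions, and so is the citation regex, which does not handle filenames containing commas. `build_scoped_lookup` and `resolve_citations` are hypothetical names.

```python
import re
from typing import Any, Dict, List, Optional, Tuple

# Matches inline citations like "[NEC4 ACC.pdf, page 3]".
CITATION_RE = re.compile(r"\[([^\[\],]+),\s*page\s+(\d+)\]")

Lookup = Dict[int, Dict[Tuple[str, int], Dict[str, Any]]]


def build_scoped_lookup(sub_question_sources: List[Dict[str, Any]]) -> Lookup:
    # One map per sub-question index, keyed by (filename, page).
    return {
        group["sub_question_index"]: {(s["filename"], int(s["page"])): s for s in group["sources"]}
        for group in sub_question_sources
    }


def resolve_citations(section_index: int, body: str, lookup: Lookup) -> List[Optional[Dict[str, Any]]]:
    # A citation resolves only against its own section's sources (None if absent).
    scoped = lookup.get(section_index, {})
    return [scoped.get((name.strip(), int(page))) for name, page in CITATION_RE.findall(body)]
```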
Task 4.6.3: Update ExtractedQuestionsDisplay for anchors
File: frontend/src/components/ExtractedQuestionsDisplay.tsx
Minor enhancement:
- Make each extracted question a clickable anchor that scrolls to its corresponding section in the answer
- Add `id="subq-{index}"` to each section header in `ResponsePanel`
- Keep the existing skeleton loading behavior
Commit: "feat: Phase 4.6 frontend per-sub-question response rendering"
Sub-Phase 4.7: Testing & Polish
Test files to write:
- `test_phase4_integration_query_pipeline.py`: Full integration test simulating the per-sub-q pipeline
- `test_phase4_acceptance_query.py`: Acceptance test with a real LLM (manual run)
- `test_phase4_e2e_query_flow.test.tsx`: Frontend e2e test with a mocked SSE stream
Task 4.7.1: Backend unit tests
- Run `pytest backend/app/test/test_phase4_*.py -v`; all must pass
- Verify no regressions in existing Phase 1 and Phase 3 tests
- Update `test_phase1_rag_service.py` for new method signatures
- Update `test_phase1_relevance_filter.py` for per-sub-q behavior
- Rewrite `test_phase3_query_history_integration.py` for the new pipeline flow
- Update `test_phase3_prompt_injection.py` for the new generate template
Task 4.7.2: Backend acceptance tests
- `test_phase4_acceptance_query.py`: real LLM, real ChromaDB
- Verify: answer contains `## Sub-question` headers, sources grouped by sub-question index
- Verify: each sub-question section has 1-5 bullet points
- Verify: inline citations match the correct sub-question's source list
Task 4.7.3: Frontend tests
- `test_phase4_response_panel.test.tsx`: renders per-sub-question sections, expandable sources
- `test_phase4_citation_parser.test.ts`: per-sub-question lookup returns the correct source
- `test_phase4_e2e_query_flow.test.tsx`: mocks SSE with the new event format, verifies section rendering
- Update existing `ResponsePanel.test.tsx` and `citationParser.test.ts` for the new API
Task 4.7.4: Frontend build verification
- `npm run build`: no TypeScript errors
- `npm test`: all 62 existing tests pass + new Phase 4 tests
- Verify the manual flow: ask question → see extracted questions → see per-sub-question answer sections → expand sources per section
Task 4.7.5: Error handling
- Empty decomposition: if `decompose()` returns `[]`, fall back to using the original question as the single sub-question
- Empty retrieval for some sub-questions: that sub-question gets no chunks → its section shows "No relevant information found"
- Filter failure (all chunks below threshold): that sub-question gets no answer → graceful empty section
- JSON parse failure in filter: fall back to including all chunks (no filtering) for that sub-question
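Two of these fallbacks can be sketched as small pure functions. All names are hypothetical, and skipping blank decomposition entries is an added assumption. The sketch also assumes the "No relevant information found" text is produced on the backend as a section in the 1-based header format. The plan does not say which side renders it.

```python
from typing import List, Tuple

NO_INFO = "No relevant information found"


def resolve_sub_questions(question: str, decomposed: List[str]) -> List[str]:
    # Empty decomposition: the original question becomes the only sub-question.
    cleaned = [q.strip() for q in decomposed if q and q.strip()]
    return cleaned or [question]


def split_answerable(sub_questions: List[str], filtered: List[list]) -> Tuple[List[int], List[int]]:
    # Indices with at least one surviving chunk vs. indices that need an empty section.
    answerable: List[int] = []
    empty: List[int] = []
    for idx, chunks in enumerate(filtered[: len(sub_questions)]):
        (answerable if chunks else empty).append(idx)
    return answerable, empty


def empty_section(idx: int, sub_question: str) -> str:
    return f"## Sub-question {idx + 1}: {sub_question}\n- {NO_INFO}"
```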
Task 4.7.6: Documentation
- Update `AGENTS.md` with a new pipeline architecture section
- Add docstrings to all new methods (`retrieve_per_subquestion`, `filter_per_subquestion`, `generate_response_per_subquestion`)
- Update prompt template documentation on the System Prompts page
Commit: "feat: Phase 4.7 testing, error handling, and polish for per-sub-q pipeline"
Phase 4a: Prompt Service Integration for Per-Sub-Q Filter (2026-04-27)
Root issue: filter_per_subquestion() in relevance_filter.py had a hardcoded prompt (_build_per_subq_prompt()) — completely bypassing PromptService. Users could not edit the per-sub-q filter prompt on the System Prompts page, unlike the flat filter step which was already prompt-service-driven.
Solution: Broke the per-sub-q filter prompt into 3 composable pieces, each a separately editable step on the System Prompts page:
| Step Name | Label | Placeholders | Default |
|---|---|---|---|
| `filter_intro` | Step 2.1: Filter Intro (Preamble) | (none) | "Evaluate each chunk for relevance to its associated sub-question only." |
| `filter_section` | Step 2.2: Filter Section (Per Sub-Q) | `{subq_idx}`, `{subq_question}`, `{chunks}` | `'Sub-question {subq_idx}: "{subq_question}"\n{chunks}'` |
| `filter_outro` | Step 2.3: Filter Outro (Format) | (none) | JSON format instructions + example |
The RelevanceFilter._build_per_subq_prompt() now composes them at runtime:
```text
filter_intro + [filter_section.replace(...) for each sub-q] + filter_outro
```
Falls back to built-in defaults when PromptService is unavailable.
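A sketch of this runtime composition follows. It assumes `get_template` is a callable that returns a step's template and raises `ValueError` or `KeyError` for unknown steps; `ValueError` is the error noted for the unseeded `generate_per_subq` step. The fallback `filter_outro` text is abbreviated, and the `\n\n` separator is an assumption. The sketch uses a single-pass `re.sub` instead of chained `str.replace`, so text inserted for one placeholder is never substituted again. `build_per_subq_prompt` and `FALLBACK_TEMPLATES` are hypothetical names.

```python
import re
from typing import Callable, Dict, List, Optional

# Built-in fallbacks; intro and section mirror the seeded defaults, outro is abbreviated.
FALLBACK_TEMPLATES: Dict[str, str] = {
    "filter_intro": "Evaluate each chunk for relevance to its associated sub-question only.",
    "filter_section": 'Sub-question {subq_idx}: "{subq_question}"\n{chunks}',
    "filter_outro": "Return a JSON object mapping sub-question indices to arrays of scores.",
}
_PLACEHOLDER_RE = re.compile(r"\{(subq_idx|subq_question|chunks)\}")


def build_per_subq_prompt(
    sub_questions: List[str],
    chunk_blocks: List[str],
    get_template: Optional[Callable[[str], str]] = None,
) -> str:
    def template(step: str) -> str:
        if get_template is None:
            return FALLBACK_TEMPLATES[step]
        try:
            return get_template(step)
        except (KeyError, ValueError):
            return FALLBACK_TEMPLATES[step]

    section_tmpl = template("filter_section")
    sections = []
    for idx, (question, block) in enumerate(zip(sub_questions, chunk_blocks)):
        values = {"subq_idx": str(idx), "subq_question": question, "chunks": block}
        # Single pass: text inserted for one placeholder is never re-scanned.
        sections.append(_PLACEHOLDER_RE.sub(lambda m: values[m.group(1)], section_tmpl))
    return "\n\n".join([template("filter_intro"), *sections, template("filter_outro")])
```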
Bugs Fixed
- `generate_per_subq` not seeded: `rag.py` called `get_prompt_template("generate_per_subq")`, but this step name was never added to `_VALID_STEPS`, `_SEED_STEPS`, or `_SEED_TEMPLATES`, so it would crash at runtime with `ValueError`. It is now properly seeded with the `{context_sections}` placeholder.
- `_SEED_GENERATE` placeholder mismatch from Package 4: The flat `generate_response()` expects `{question}`/`{context}` placeholders, but Package 4 changed the seed template to use `{context_sections}` (intended for the per-sub-q generate step). The flat template is restored; `generate_per_subq` now holds `{context_sections}`.
Database Backfill Migration
The existing seed_default_profiles() only inserted steps for NEWLY created profiles. Added a backfill loop that iterates ALL existing profiles and INSERT OR IGNOREs any missing step names. This ensures existing A/B/C profiles pick up filter_intro, filter_section, filter_outro, and generate_per_subq on restart.
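The backfill loop can be sketched with `sqlite3` as follows. The table and column names (`prompt_profiles`, `prompt_templates`, `profile_id`, `step_name`, `template`) are hypothetical. The sketch depends on a `UNIQUE(profile_id, step_name)` constraint, so that `INSERT OR IGNORE` skips steps a profile already has and never overwrites user edits.

```python
import sqlite3
from typing import Dict


def backfill_missing_steps(conn: sqlite3.Connection, seed_templates: Dict[str, str]) -> int:
    # Returns how many (profile, step) rows were actually inserted.
    profiles = [row[0] for row in conn.execute("SELECT id FROM prompt_profiles")]
    inserted = 0
    for profile_id in profiles:
        for step, template in seed_templates.items():
            cur = conn.execute(
                "INSERT OR IGNORE INTO prompt_templates (profile_id, step_name, template) VALUES (?, ?, ?)",
                (profile_id, step, template),
            )
            inserted += cur.rowcount  # 0 when the UNIQUE constraint caused an ignore
    conn.commit()
    return inserted
```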
System Prompts UI Restructured
The flat filter and generate steps were removed from the UI (they're unused by the current pipeline). The page now shows 5 steps:
| UI Order | Label | Step Key |
|---|---|---|
| 1 | Step 1: Query Decomposition | decompose |
| 2 | Step 2.1: Filter Intro (Preamble) | filter_intro |
| 3 | Step 2.2: Filter Section (Per Sub-Q) | filter_section |
| 4 | Step 2.3: Filter Outro (Format) | filter_outro |
| 5 | Step 3: Generate (Per-Sub-Question) | generate_per_subq |
The old filter and generate templates remain in the DB (for API backward compatibility) but are hidden from the UI.
Files Changed
| File | Change |
|---|---|
| `backend/app/core/sqlite_db.py` | 3 new seed templates + `generate_per_subq` seed; backfill migration; restored `_SEED_GENERATE` to `{question}`/`{context}` |
| `backend/app/services/prompt_service.py` | Added 4 step names to `_VALID_STEPS` |
| `backend/app/routers/prompts.py` | Added 4 step names to `_VALID_STEPS` |
| `backend/app/services/relevance_filter.py` | Refactored `_build_per_subq_prompt()` to use `PromptService` + built-in fallback constants |
| `frontend/src/components/PromptEditor.tsx` | Replaced unused flat steps with the 5-step per-sub-q layout (Step 2.1-2.3 + Step 3) |
| `frontend/src/components/PlaceholderDocs.tsx` | Added `{context_sections}`, `{subq_idx}`, `{subq_question}` docs |
| `backend/app/test/conftest.py` | Added 4 new templates to the mock |
| `backend/app/test/test_phase3_sqlite_db.py` | Updated counts (9→21 prompts) and placeholder assertions |
| `backend/app/test/test_phase3_prompt_service.py` | Updated step set + placeholder assertions |
| `backend/app/test/test_phase3_prompts_router.py` | Updated step set assertion |
| `backend/app/test/test_phase4_prompt_templates.py` | Updated for split `generate`/`generate_per_subq` |
| `frontend/src/test/components/PromptEditor.test.tsx` | Updated to 5 textareas, new labels, new placeholder layout |
| `frontend/src/test/components/PlaceholderDocs.test.tsx` | Updated to 6 placeholders |
Test Results (Post-Phase 4a)
- Backend: 295 passed, 5 skipped (pre-existing)
- Frontend: 182 passed, 1 pre-existing failure (unrelated `file-input` e2e)
Sub-Phase Summary
| Sub-Phase | Scope | Backend | Frontend | Tests | Status |
|---|---|---|---|---|---|
| 4.1 | Per-sub-q retrieval | `rag.py`, `query.py`, format helpers | None | `test_phase4_retrieve_per_subquestion.py`, `test_phase4_query_router_retrieval.py` | ✅ Complete |
| 4.2 | Per-sub-q filtering (1 LLM call) | `relevance_filter.py`, `query.py` | None | `test_phase4_relevance_filter_per_subq.py`, `test_phase4_query_router_filter.py` | ✅ Complete |
| 4.3 | Sub-q-organized response generation | `rag.py`, `query.py`, `models/query.py` | None | `test_phase4_generate_per_subq.py`, `test_phase4_response_format.py` | ✅ Complete |
| 4.4 | History schema, prompts, models | `sqlite_db.py`, `history.py` (router + models), `prompt_service.py` | None | `test_phase4_history_format.py`, `test_phase4_prompt_templates.py` | ✅ Complete |
| 4.5 | Frontend types + state | None | `types/index.ts`, `lib/queries.tsx` | `test_phase4_stream_state.test.tsx`, `test_phase4_types.test.ts` | ✅ Complete |
| 4.6 | Frontend rendering | None | `ResponsePanel.tsx`, `citationParser.ts`, `ExtractedQuestionsDisplay.tsx` | `test_phase4_response_panel.test.tsx`, `test_phase4_citation_parser.test.ts` | ✅ Complete |
| 4.7 | Testing & polish | All affected files | All affected files | Integration + acceptance + e2e tests | ✅ Complete |
| 4a | Prompt service integration for `filter_per_subq` | `sqlite_db.py`, `prompt_service.py`, `prompts.py`, `relevance_filter.py` | `PromptEditor.tsx`, `PlaceholderDocs.tsx` | Updated 7 test files, 13 total files changed | ✅ Complete |
Implementation Sequence & Dependencies
Dependency flow: 4.1 (Retrieval) ──► 4.2 (Filtering) ──► 4.3 (Generate) ──► 4.4 (History/Prompts), then 4.5 (Frontend Types/State) ──► 4.6 (Frontend Rendering). 4.7 (Testing & Polish) waits on every earlier sub-phase.
- 4.1 → 4.2 sequential: Filtering needs per-sub-q chunk structure from retrieval
- 4.2 → 4.3 sequential: Generation needs filtered chunks from filtering stage
- 4.3 → 4.4 sequential: History recording and prompt templates need final data shapes
- 4.4 → 4.5 parallel: Backend prompt/history changes don't block frontend type definitions
- 4.5 → 4.6 sequential: Rendering needs types and state management
- 4.7 blocked by all: Integration tests need everything wired together
Parallelization opportunity: 4.5 (frontend types) could start as soon as 4.3 defines the SSE contract, but it's safer to start after 4.4 confirms the final data shapes.
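The dependency bullets can be checked mechanically. The sketch below is a hypothetical planning aid, not project code. It feeds the sub-phase graph to Python's standard graphlib and groups sub-phases into batches that could run concurrently. It does this twice: once with the conservative edge (4.5 waits for 4.4) and once with the minimum edge (4.5 waits only for 4.3's SSE contract).

```python
from graphlib import TopologicalSorter

# Conservative plan: each sub-phase maps to the sub-phases it waits for.
CONSERVATIVE: dict[str, set[str]] = {
    "4.1": set(),
    "4.2": {"4.1"},
    "4.3": {"4.2"},
    "4.4": {"4.3"},
    "4.5": {"4.4"},  # start frontend types only after 4.4 confirms data shapes
    "4.6": {"4.5"},
    "4.7": {"4.1", "4.2", "4.3", "4.4", "4.5", "4.6"},
}

# Minimum constraint: 4.5 only needs the SSE contract defined in 4.3.
RELAXED = {**CONSERVATIVE, "4.5": {"4.3"}}


def parallel_batches(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group sub-phases into batches whose members could run concurrently.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    print(parallel_batches(CONSERVATIVE))
    print(parallel_batches(RELAXED))
```

Under RELAXED, 4.4 and 4.5 land in the same batch, which corresponds to the parallelization opportunity. CONSERVATIVE stays strictly sequential.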
Affected Files — Complete Inventory
Backend — New Files
| File | Sub-Phase | Purpose |
|---|---|---|
| backend/app/test/test_phase4_retrieve_per_subquestion.py | 4.1 | Unit test: retrieve_per_subquestion() |
| backend/app/test/test_phase4_query_router_retrieval.py | 4.1 | Unit test: retrieval stage in _query_stream |
| backend/app/test/test_phase4_relevance_filter_per_subq.py | 4.2 | Unit test: filter_per_subquestion() |
| backend/app/test/test_phase4_query_router_filter.py | 4.2 | Unit test: filter stage in _query_stream |
| backend/app/test/test_phase4_generate_per_subq.py | 4.3 | Unit test: generate_response_per_subquestion() |
| backend/app/test/test_phase4_response_format.py | 4.3 | Unit test: answer format validation |
| backend/app/test/test_phase4_history_format.py | 4.4 | Unit test: new XML/JSON history formats |
| backend/app/test/test_phase4_prompt_templates.py | 4.4 | Unit test: new generate template |
| backend/app/test/test_phase4_integration_query_pipeline.py | 4.7 | Integration test: full per-sub-q pipeline |
| backend/app/test/acceptance/test_phase4_acceptance_query.py | 4.7 | Acceptance test: real LLM |
Backend — Modified Files
| File | Sub-Phase | Changes |
|---|---|---|
| backend/app/services/rag.py | 4.1, 4.3 | Add retrieve_per_subquestion(), generate_response_per_subquestion() |
| backend/app/services/relevance_filter.py | 4.2 | Add filter_per_subquestion() |
| backend/app/routers/query.py | 4.1–4.4 | Refactor _query_stream(), add per-sub-q format helpers, update history recording |
| backend/app/models/query.py | 4.3 | Add SubQuestionSources model, update QueryResponse |
| backend/app/models/history.py | 4.4 | Add optional per-sub-q count fields |
| backend/app/core/sqlite_db.py | 4.4 | Add new columns, update seed generate template |
| backend/app/services/prompt_service.py | 4.4 | Update reset_to_defaults() generate template |
| backend/app/routers/history.py | 4.4 | Include new fields in detail response |
| backend/app/core/config.py | 4.1 | (Maybe) Add retrieval_n_results_per_subq setting |
Backend — Tests Needing Update
| File | Sub-Phase | Changes |
|---|---|---|
| backend/app/test/test_phase1_rag_service.py | 4.7 | Add tests for new methods; existing tests unaffected |
| backend/app/test/test_phase1_relevance_filter.py | 4.7 | Add tests for filter_per_subquestion() |
| backend/app/test/test_phase3_query_history_integration.py | 4.7 | Rewrite pipeline simulation for per-sub-q flow |
| backend/app/test/test_phase3_prompt_injection.py | 4.7 | Add tests for new generate template |
| backend/app/test/acceptance/test_acceptance_phase1_rag_query.py | 4.7 | Rewrite — SSE parsing + new response shape |
| backend/app/test/conftest.py | 4.7 | Add per-sub-q mock helpers |
Frontend — New Files
| File | Sub-Phase | Purpose |
|---|---|---|
| frontend/src/test/components/test_phase4_response_panel.test.tsx | 4.7 | Component test: per-sub-q sections |
| frontend/src/test/utils/test_phase4_citation_parser.test.ts | 4.7 | Unit test: per-sub-q citation lookup |
| frontend/src/test/e2e/test_phase4_query_flow.test.tsx | 4.7 | E2E test: mocked SSE with new format |
| frontend/src/test/lib/test_phase4_stream_state.test.tsx | 4.5 | State test: new event shapes |
| frontend/src/test/lib/test_phase4_types.test.ts | 4.5 | Type test: type compatibility |
Frontend — Modified Files
| File | Sub-Phase | Changes |
|---|---|---|
| frontend/src/types/index.ts | 4.5 | Add SubQuestionSources, update QueryStreamEvent |
| frontend/src/lib/queries.tsx | 4.5 | Update QueryStreamState, completed event handler |
| frontend/src/components/ResponsePanel.tsx | 4.6 | Redesign — per-sub-question sections with grouped sources |
| frontend/src/utils/citationParser.ts | 4.6 | Update buildCitationLookup() for per-sub-q |
| frontend/src/components/ExtractedQuestionsDisplay.tsx | 4.6 | Add anchor links to answer sections |
| frontend/src/pages/LTTPage.tsx | 4.6 | Pass new props to children |
Risk Register
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| LLM struggles with per-sub-q filtering prompt format | Medium | High — all chunks dropped | Use strong prompt constraints, validate JSON, fall back to including all chunks on parse failure |
| LLM generates answer not matching ## Sub-question N: format | Medium | Medium — frontend can't parse sections | Fall back to rendering as single block if parsing fails. Prompt engineering tuned for format compliance |
| Same chunk retrieved by multiple sub-questions → duplicated in context | High | Low — slightly larger prompt but acceptable | Accept duplicates. ChromaDB naturally returns same doc if relevant to multiple queries. Each sub-q's evaluation is independent |
| Per-sub-q retrieval = more ChromaDB queries = slower | Medium | Medium — N × retrieval latency | ChromaDB retrieval is fast (~10-50ms per query), so 5 sub-questions queried sequentially add roughly 50-250ms. Acceptable trade-off for better relevance. |
| History DB migration fails for existing records | Low | Low — new columns are NULL-able | ALTER TABLE ADD COLUMN ... DEFAULT NULL is safe. Existing records work as before — chunks_retrieved/chunks_filtered still have flat XML. |
| Frontend rendering breaks on older history records | Low | Low — answer format differs | ResponsePanel renders per-sub-q sections only when subQuestionSources is non-null. Older history records show flat answer as before. |
| Prompt template migration breaks user-customized prompts | Medium | Medium — users lose their generate template | Warn in docs. The generate template changes fundamentally (single {context} → {context_sections}). Users must re-customize. |
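The second mitigation can be sketched as follows: fall back to a single block when the `## Sub-question N:` headers are missing. Decision 10 places this split client-side, in TypeScript. This Python version only illustrates the parsing and fallback logic. The function name and the section dict shape (index, question, body) are hypothetical.

```python
import re

# A section header such as "## Sub-question 2: What changed in 2023?" on its own line.
SECTION_HEADER = re.compile(r"^## Sub-question (\d+):[ \t]*(.*)$", re.MULTILINE)


def split_answer_sections(answer: str) -> list[dict]:
    """Split a generated answer into per-sub-question sections.

    Falls back to one untitled section when no header is found, so a
    format-noncompliant LLM answer still renders as a single block.
    Any text before the first header is discarded in this sketch.
    """
    headers = list(SECTION_HEADER.finditer(answer))
    if not headers:
        return [{"index": None, "question": None, "body": answer.strip()}]

    sections = []
    for pos, match in enumerate(headers):
        # A section's body runs until the next header (or the end of the answer).
        end = headers[pos + 1].start() if pos + 1 < len(headers) else len(answer)
        sections.append(
            {
                "index": int(match.group(1)),
                "question": match.group(2).strip(),
                "body": answer[match.end():end].strip(),
            }
        )
    return sections
```

Each section keeps its own body, so its inline citations stay scoped to that sub-question.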
Acceptance Criteria
Backend
- `POST /api/v1/query` retrieves chunks per sub-question (verified by history XML showing `<sub_q>` wrappers)
- Filtering uses a single LLM call that evaluates chunks against their originating sub-question (verified by the filter prompt)
- The response answer is organized by sub-question with `## Sub-question N:` headers
- `sub_question_sources` in the SSE `completed` event is grouped by sub-question index
- History records include new grouped XML formats for `chunks_retrieved` and `chunks_filtered`
- History records include grouped `sources` JSON (list of lists)
- History records include per-sub-q chunk counts
- New `generate` prompt template uses the `{context_sections}` placeholder
- Prompt service `reset_to_defaults()` includes the new generate template
- Existing `decompose`, `filter` (old), and `generate_response` (old) methods are unchanged
- All Phase 1, Phase 3, and new Phase 4 unit tests pass (312 passed, 4 skipped)
- All acceptance tests pass with real LLM (manual run)
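Here is a minimal sketch of the grouped history XML. It assumes each chunk is a dict with hypothetical `source`, `page`, and `text` keys, and that `idx` is 1-based to line up with the `## Sub-question N:` headers. Only the `<sub_q idx="N" question="...">` wrapper comes from this plan. The inner `<chunk>` element and both function names are illustrative. The per-sub-q chunk counts fall out of the same grouping.

```python
from xml.sax.saxutils import escape, quoteattr


def chunks_to_grouped_xml(groups: list[tuple[str, list[dict]]]) -> str:
    """Serialize (sub_question, chunks) pairs into <sub_q>-wrapped XML.

    quoteattr() adds the surrounding quotes and escapes &, <, > (and picks
    a safe quote style), so questions containing quotes stay well-formed.
    """
    lines = []
    for idx, (question, chunks) in enumerate(groups, start=1):
        lines.append(f"<sub_q idx={quoteattr(str(idx))} question={quoteattr(question)}>")
        for chunk in chunks:
            lines.append(
                f"  <chunk source={quoteattr(chunk['source'])} "
                f"page={quoteattr(str(chunk['page']))}>{escape(chunk['text'])}</chunk>"
            )
        lines.append("</sub_q>")
    return "\n".join(lines)


def per_subq_counts(groups: list[tuple[str, list[dict]]]) -> list[int]:
    """Chunk count per sub-question, in sub-question order."""
    return [len(chunks) for _, chunks in groups]
```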
Frontend
- `QueryStreamState` includes a `subQuestionSources` field
- `ResponsePanel` renders per-sub-question sections with expandable source grids
- Each section's sources are scoped to that sub-question (no cross-contamination)
- Inline citations `[filename, page N]` link to the correct PDF viewer page
- `ExtractedQuestionsDisplay` shows clickable anchors to answer sections
- Copy button copies all answer text including section headers
- Loading states: skeleton per section during generation
- Empty state: "No relevant information found" per sub-question (not for the entire response)
- All 62+ existing frontend tests still pass (183 passed)
- All new Phase 4 frontend tests pass
- `npm run build` succeeds with zero TypeScript errors
- Manual verification: full query flow works end-to-end
New Dependencies
None. All changes use existing libraries (FastAPI, ChromaDB, OpenAI SDK, React, ReactMarkdown, TanStack Query).
Decisions (All Confirmed)
| # | Topic | Decision |
|---|---|---|
| 1 | Single vs multiple filter LLM calls | Single call — user explicitly requested this |
| 2 | Filter prompt design | Group chunks by sub-question in one prompt. JSON response maps sub-q indices to score arrays |
| 3 | Answer format | Markdown with ## Sub-question N: <question> headers |
| 4 | Sources grouping | sub_question_sources: [{index, text, sources}, ...] in SSE + frontend |
| 5 | History XML format | Add <sub_q idx="N" question="..."> wrappers around chunk groups |
| 6 | History DB migration | Add 2 new NULL-able columns. No data migration needed. |
| 7 | Backward compatibility | Preserve old retrieve(), filter(), generate_response() methods. New methods are additive. |
| 8 | Deduplication | None. Same chunk may appear in multiple sub-questions. Each sub-q evaluates independently. |
| 9 | Error handling | Per-sub-question graceful degradation. Filter failure → include all chunks for that sub-q. Generate failure → "Unable to generate answer for this sub-question." |
| 10 | Frontend rendering engine | Keep ReactMarkdown. Parse sections client-side by splitting on ## Sub-question N: headers. |
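Decisions 2 and 9 together describe how the single filter response is consumed. The response is a JSON object keyed by sub-question index, each key holding one score per chunk. When the response is unusable, the fallback keeps all chunks, applied per sub-question. The sketch below makes several assumptions. Indices are 1-based strings, since JSON object keys must be strings. Scores use the 0-10 scale from the filter prompt. `threshold` is a hypothetical parameter, not an existing config setting. The function name is illustrative.

```python
import json


def apply_filter_scores(
    raw_response: str, chunk_groups: list[list[str]], threshold: float = 5.0
) -> list[list[str]]:
    """Keep chunks whose score meets `threshold`, per sub-question.

    Expected shape (assumed): {"1": [8, 2], "2": [7]}, one score per chunk,
    in the same order the chunks appeared in the prompt.
    """
    try:
        scores = json.loads(raw_response)
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        scores = None
    if not isinstance(scores, dict):
        # Whole response unusable: keep every chunk for every sub-question.
        return [list(group) for group in chunk_groups]

    kept = []
    for idx, group in enumerate(chunk_groups, start=1):
        group_scores = scores.get(str(idx))
        usable = (
            isinstance(group_scores, list)
            and len(group_scores) == len(group)
            and all(isinstance(s, (int, float)) and not isinstance(s, bool) for s in group_scores)
        )
        if not usable:
            # Degrade only this sub-question: keep all of its chunks.
            kept.append(list(group))
        else:
            kept.append([chunk for chunk, score in zip(group, group_scores) if score >= threshold])
    return kept
```

A missing key or a length mismatch affects only that sub-question, which matches the per-sub-question graceful degradation in Decision 9.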
Open Questions
None — all resolved.
| # | Question | Resolution |
|---|---|---|
| 1 | Progressive SSE events? | Yes — emit generating_subquestion as each sub-question's answer is generated. Frontend renders sections progressively. |
| 2 | retrieval_n_results per sub-question or global? | Global — same value for all sub-questions. Simpler config, one setting. |
| 3 | Fallback when decomposition returns 0 sub-questions? | Fall back to original question — treat as single sub-question. Pipeline runs as 1-sub-q case (retrieval via original question, no filtering needed for single sub-q, flat answer). |
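Open questions 2 and 3 fix two details of per-sub-question retrieval. First, every sub-question uses one global `n_results`. Second, retrieval falls back to the original question when decomposition yields nothing. The sketch below is not the actual `RAGService.retrieve_per_subquestion()`. It takes a hypothetical `retrieve(query, n_results)` callable that stands in for `RAGService.retrieve()`, whose real signature may differ.

```python
from typing import Any, Callable

# Stand-in for RAGService.retrieve(); the (query, n_results) signature is assumed.
RetrieveFn = Callable[[str, int], list[Any]]


def retrieve_per_subquestion(
    sub_questions: list[str],
    original_question: str,
    retrieve: RetrieveFn,
    n_results: int,
) -> list[dict[str, Any]]:
    """Run one retrieval per sub-question and keep the results grouped."""
    questions = [q.strip() for q in sub_questions if q and q.strip()]
    if not questions:
        # Decomposition produced nothing: treat the original question as the only sub-question.
        questions = [original_question]
    # Same global n_results for every sub-question; no cross-sub-question dedup (Decision 8).
    return [
        {"index": idx, "question": question, "chunks": retrieve(question, n_results)}
        for idx, question in enumerate(questions, start=1)
    ]
```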
Test Plan Summary
Backend (New Tests)
| File | Tests | Coverage |
|---|---|---|
| test_phase4_retrieve_per_subquestion.py | ~6 | Per-sub-q retrieval, empty input, single sub-q, dedup behavior |
| test_phase4_query_router_retrieval.py | ~4 | SSE events during retrieval, chunk XML format |
| test_phase4_relevance_filter_per_subq.py | ~6 | Per-sub-q filtering, JSON response parsing, threshold behavior |
| test_phase4_query_router_filter.py | ~4 | SSE events during filtering, filtered XML format |
| test_phase4_generate_per_subq.py | ~5 | Per-sub-q generate, prompt construction, answer format |
| test_phase4_response_format.py | ~4 | Answer has ## headers, citations in correct sections |
| test_phase4_history_format.py | ~5 | New XML/JSON formats, per-sub-q counts |
| test_phase4_prompt_templates.py | ~3 | New generate template, {context_sections} placeholder |
| test_phase4_integration_query_pipeline.py | ~5 | Full pipeline simulation |
| test_phase4_acceptance_query.py | ~3 | Real LLM end-to-end (manual) |
Frontend (New Tests)
| File | Tests | Coverage |
|---|---|---|
| test_phase4_stream_state.test.tsx | ~4 | State updates for new event shapes |
| test_phase4_types.test.ts | ~2 | Type compatibility checks |
| test_phase4_response_panel.test.tsx | ~6 | Section rendering, source grouping, copy, loading |
| test_phase4_citation_parser.test.ts | ~4 | Per-sub-q lookup, cross-section isolation |
| test_phase4_e2e_query_flow.test.tsx | ~3 | Full SSE flow with mocked stream |
Phase PX: Profile Export/Import (2026-04-27)
Source: User request — "add an export and import function for setting a profile. The format is json."
Scope: Add JSON export/import capability to the System Prompts page. Users can download a profile's prompt configuration as a .json file and import it into another profile (or the same one) to transfer or back up their prompt settings.
Status: 🟡 Planned — not yet implemented.
Objective
Let users:
- Export a single profile's prompt templates as a downloadable JSON file
- Import a previously exported JSON file to overwrite a profile's prompt templates
- Optionally, export all profiles at once for full configuration backup
Decision Register
| # | Decision | Rationale |
|---|---|---|
| P1 | Export single profiles, not all-at-once by default | User asked "for setting a profile" — per-profile export/import is more practical for sharing individual configurations. Add "Export All" as secondary option. |
| P2 | Import overwrites ALL prompt steps for target profile | Simplest mental model. Import = full replace (not merge). User gets confirmation dialog before proceeding. |
| P3 | Export JSON includes all 7 steps (including legacy filter, generate) | Even though the UI hides these, the DB stores them. Export should be a complete snapshot — import restores all 7. |
| P4 | Do NOT export auto-increment IDs | id fields are not portable between databases. Import writes rows keyed on the unique (profile name, step_name) pair instead of id. |
| P5 | created_at/updated_at reset on import | Imported profiles get fresh timestamps (datetime('now')). The original export timestamp is preserved in file metadata only. |
| P6 | Active profile state NOT imported | is_active is deployment-specific. The user sets the active profile separately via the existing dropdown. Import only touches prompt_template content. |
| P7 | Validate profile name on import | Only A, B, C are allowed; an import targeting any other name is rejected. |
| P8 | JSON schema versioned | "format": "legco-reranker-profile/v1" for future-proofing. Reject unknown versions on import. |
JSON Format Specification
Single Profile Export
{
"format": "legco-reranker-profile/v1",
"profile_name": "A",
"exported_at": "2026-04-27T12:00:00Z",
"prompts": {
"decompose": "Given this question: '{question}'\n\nBreak it down into 2-5 simplified sub-questions...",
"filter": "Given question '{question}' and these document chunks:\n\n{chunks}\n\n...",
"generate": "Question: {question}\n\nContext:\n{context}\n\n...",
"generate_per_subq": "Answer each sub-question using ONLY its document chunks...",
"filter_intro": "Evaluate each chunk for relevance to its associated sub-question only.",
"filter_section": "\nSub-question {subq_idx}: \"{subq_question}\"\n{chunks}",
"filter_outro": "\nFor each chunk, rate its relevance 0-10..."
}
}
Full Backup Export (All Profiles)
{
"format": "legco-reranker-profile/v1",
"exported_at": "2026-04-27T12:00:00Z",
"active_profile": "A",
"profiles": {
"A": {
"prompts": { ... }
},
"B": {
"prompts": { ... }
},
"C": {
"prompts": { ... }
}
}
}
Import Request Format
POST /api/v1/prompts/profiles/{name}/import
Content-Type: application/json
{
"format": "legco-reranker-profile/v1",
"profile_name": "A",
"exported_at": "2026-04-27T12:00:00Z",
"prompts": {
"decompose": "...",
...
}
}
Response:
{
"status": "ok",
"profile": "B",
"imported_steps": 7,
"source_profile": "A"
}
Sub-Phase Structure
| Sub-Phase | Scope | Components | Test Files |
|---|---|---|---|
| PX.1 | Backend — Export endpoint | routers/prompts.py, models/prompts.py | test_phaseX_export.py |
| PX.2 | Backend — Import endpoint | routers/prompts.py, models/prompts.py, prompt_service.py | test_phaseX_import.py |
| PX.3 | Frontend — Export/Import UI | SystemPromptsPage.tsx, ProfileList.tsx, lib/api.ts, lib/queries.tsx, types/index.ts | test_phaseX_export_import.test.tsx |
| PX.4 | Testing & Polish | All affected files | Integration + acceptance tests |
Sub-Phase PX.1: Backend — Single Profile Export Endpoint
Test files to write first:
- backend/app/test/test_phaseX_export.py — Tests export endpoint, JSON schema validation, empty profile handling
Task PX.1.1: Add Pydantic models
File: backend/app/models/prompts.py
from pydantic import BaseModel
class ProfileExportResponse(BaseModel):
format: str = "legco-reranker-profile/v1"
profile_name: str
exported_at: str
prompts: dict[str, str]
class AllProfilesExportResponse(BaseModel):
format: str = "legco-reranker-profile/v1"
exported_at: str
active_profile: str
profiles: dict[str, dict[str, dict[str, str]]] # profile_name -> {"prompts": {step: text}}
Task PX.1.2: Add GET /api/v1/prompts/profiles/{name}/export endpoint
File: backend/app/routers/prompts.py
- Reads all 7 `system_prompts` rows for the given profile
- Returns 400 if `{name}` is not A, B, or C
- Returns `ProfileExportResponse` with `Content-Disposition: attachment; filename="legco-profile-{name}.json"`
- Uses the `application/json` content type
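Below is a framework-free sketch of what the export endpoint assembles. It assumes each row is a dict with `step_name` and `prompt_template` keys, the column names mentioned in P4 and P6. The function name is hypothetical. In the FastAPI route, the returned headers would be attached to the response and the `ValueError` translated into a 400. Dropping `id` and the DB timestamps here is what keeps the file portable (P4, P5).

```python
from datetime import datetime, timezone

FORMAT_VERSION = "legco-reranker-profile/v1"
VALID_PROFILES = {"A", "B", "C"}


def build_profile_export(profile_name: str, rows: list[dict]) -> tuple[dict, dict[str, str]]:
    """Return (JSON payload, response headers) for a single-profile export."""
    if profile_name not in VALID_PROFILES:
        raise ValueError(f"invalid profile name: {profile_name!r}")  # mapped to HTTP 400
    payload = {
        "format": FORMAT_VERSION,
        "profile_name": profile_name,
        "exported_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        # Only step name -> template text; ids and DB timestamps are left out on purpose.
        "prompts": {row["step_name"]: row["prompt_template"] for row in rows},
    }
    headers = {"Content-Disposition": f'attachment; filename="legco-profile-{profile_name}.json"'}
    return payload, headers
```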
Task PX.1.3: Add GET /api/v1/prompts/export/all endpoint (optional)
- Reads all 3 profiles + all 21 prompt rows
- Returns `AllProfilesExportResponse`
- For full backup/restore scenarios
Commit: "feat(prompts): add single-profile and full JSON export endpoints"
Sub-Phase PX.2: Backend — Single Profile Import Endpoint
Test files to write first:
- backend/app/test/test_phaseX_import.py — Tests import endpoint, validation, error cases
Task PX.2.1: Add request model
File: backend/app/models/prompts.py
class ProfileImportRequest(BaseModel):
format: str # must be "legco-reranker-profile/v1"
profile_name: str # source profile name (informational)
exported_at: str | None = None # informational timestamp
prompts: dict[str, str] # step_name -> template_text
Task PX.2.2: Add POST /api/v1/prompts/profiles/{name}/import endpoint
File: backend/app/routers/prompts.py
Validation steps:
- Check that the target `{name}` is A, B, or C → 400 if not
- Check `request.format == "legco-reranker-profile/v1"` → 400 if not
- Validate that all 7 required step keys (`decompose`, `filter`, `generate`, `generate_per_subq`, `filter_intro`, `filter_section`, `filter_outro`) are present in `request.prompts` → 400 listing the missing keys if not
- Validate that there are no extra/unknown step keys → 400 listing the unknown keys (decision: reject rather than warn)
Implementation:
- Uses `PromptService._update_all_prompts()` (the existing internal batch update) to overwrite all 7 steps
- Each step gets fresh `created_at`/`updated_at` timestamps (DB defaults)
- Returns `{"status": "ok", "profile": name, "imported_steps": len(prompts), "source_profile": request.profile_name}`
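The validation steps can be sketched without FastAPI. This is a hypothetical helper, not the router code. It takes the already-parsed request body as a dict and raises an illustrative `ImportValidationError`, which the route would turn into a 400. Missing and unknown keys are reported together so a user can fix the file in one pass.

```python
FORMAT_VERSION = "legco-reranker-profile/v1"
VALID_PROFILES = {"A", "B", "C"}
REQUIRED_STEPS = frozenset(
    {"decompose", "filter", "generate", "generate_per_subq", "filter_intro", "filter_section", "filter_outro"}
)


class ImportValidationError(ValueError):
    """Hypothetical error type; the router would turn it into an HTTP 400."""


def validate_profile_import(target: str, payload: dict) -> dict[str, str]:
    """Apply the PX.2.2 checks and return the prompts to write, or raise."""
    if target not in VALID_PROFILES:
        raise ImportValidationError(f"target profile must be A, B, or C, got {target!r}")
    if payload.get("format") != FORMAT_VERSION:
        raise ImportValidationError(f"unsupported format version: {payload.get('format')!r}")

    prompts = payload.get("prompts")
    if not isinstance(prompts, dict) or not all(isinstance(v, str) for v in prompts.values()):
        raise ImportValidationError("prompts must map step names to template strings")

    missing = sorted(REQUIRED_STEPS - set(prompts))
    unknown = sorted(set(prompts) - REQUIRED_STEPS)
    problems = []
    if missing:
        problems.append("missing steps: " + ", ".join(missing))
    if unknown:
        problems.append("unknown steps: " + ", ".join(unknown))
    if problems:
        raise ImportValidationError("; ".join(problems))
    return dict(prompts)
```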
Task PX.2.3: Add POST /api/v1/prompts/import/all endpoint (optional)
- Accepts the `AllProfilesExportResponse` format
- Imports all 3 profiles at once
- Does NOT change the active profile unless that is explicitly requested
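The optional bulk import can be sketched under two assumptions that PX.2.3 does not spell out. First, a backup must contain exactly profiles A, B, and C. Second, the per-step checks are delegated to the single-profile validator, which is omitted here so the block stands alone. The function name is hypothetical.

```python
FORMAT_VERSION = "legco-reranker-profile/v1"
VALID_PROFILES = {"A", "B", "C"}


def extract_backup_profiles(backup: dict) -> dict[str, dict[str, str]]:
    """Return {profile_name: prompts} from a full-backup file.

    The file's active_profile field is deliberately ignored, so importing
    a backup never changes which profile is active (P6).
    """
    if backup.get("format") != FORMAT_VERSION:
        raise ValueError(f"unsupported format version: {backup.get('format')!r}")
    profiles = backup.get("profiles")
    if not isinstance(profiles, dict) or set(profiles) != VALID_PROFILES:
        raise ValueError("backup must contain exactly profiles A, B, and C")

    result = {}
    for name in sorted(profiles):
        entry = profiles[name]
        prompts = entry.get("prompts") if isinstance(entry, dict) else None
        if not isinstance(prompts, dict):
            raise ValueError(f"profile {name} has no prompts object")
        # Step-key checks (7 required, none unknown) would reuse the single-profile validator.
        result[name] = dict(prompts)
    return result
```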
Commit: "feat(prompts): add single-profile JSON import endpoint with full validation"
Sub-Phase PX.3: Frontend — Export/Import UI
Test files to write first:
- frontend/src/test/components/test_phaseX_export_import.test.tsx — Tests export/import buttons, file download, file upload
Task PX.3.1: Add TypeScript types
File: frontend/src/types/index.ts
export interface ProfileExportData {
format: string
profile_name: string
exported_at: string
prompts: Record<string, string>
}
export interface ProfileImportResponse {
status: string
profile: string
imported_steps: number
source_profile: string
}
Task PX.3.2: Add API client functions
File: frontend/src/lib/api.ts
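import type { ProfileExportData, ProfileImportResponse } from '../types'
// Fetch a profile's export JSON; the caller wraps it in a Blob for the browser download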
export const exportProfile = async (name: string): Promise<ProfileExportData> => {
const resp = await apiClient.get<ProfileExportData>(`/prompts/profiles/${name}/export`)
return resp.data
}
// Import a profile from JSON
export const importProfile = async (name: string, data: ProfileExportData): Promise<ProfileImportResponse> => {
const resp = await apiClient.post<ProfileImportResponse>(`/prompts/profiles/${name}/import`, data)
return resp.data
}
Task PX.3.3: Add TanStack Query mutation for import
File: frontend/src/lib/queries.tsx
export const useImportProfile = () => {
const queryClient = useQueryClient()
return useMutation({
mutationFn: ({ name, data }: { name: string; data: ProfileExportData }) =>
importProfile(name, data),
onSuccess: () => {
queryClient.invalidateQueries({ queryKey: ['prompts'] })
},
})
}
Task PX.3.4: Add Export button to ProfileList cards
File: frontend/src/components/ProfileList.tsx
- Add an export icon button (e.g., `Download` from lucide-react) next to the "Edit" button on each card
- On click: calls `exportProfile(name)` (which fetches the export JSON) → creates a `Blob` → triggers a browser download via `URL.createObjectURL` + a temporary `<a>` click
- Filename: `legco-profile-{name}-{date}.json`
Task PX.3.5: Add Import button and dialog to SystemPromptsPage
File: frontend/src/pages/SystemPromptsPage.tsx
- Add an "Import" button in the top bar (next to the "Active Profile" dropdown)
- On click: opens a modal/dialog with:
  - File input (accept `.json`) — a hidden `<input type="file">` triggered by a styled button
  - After a file is selected: parse the JSON client-side and show a preview (source profile name, export date, step count)
  - Target profile selector (dropdown: A, B, C) — defaults to the source profile name if valid
  - "Import" button → confirmation dialog ("This will overwrite all prompts for Profile {target}. Continue?")
- On confirm: calls `importProfileMutation.mutate()`
- Success: show toast "Profile {target} imported successfully ({n} steps from Profile {source})"
- Error: show inline error message with details
Task PX.3.6: Add Export All button (optional)
File: frontend/src/pages/SystemPromptsPage.tsx
- "Export All" button in top bar
- Downloads all 3 profiles as `legco-profiles-{date}.json`
Commit: "feat(prompts): add export/import UI with file download, upload dialog, and validation"
Sub-Phase PX.4: Testing & Polish
Test files:
- backend/app/test/test_phaseX_export.py — Export endpoint: valid profile, invalid name, JSON schema validation
- backend/app/test/test_phaseX_import.py — Import endpoint: valid import, missing steps, extra steps, invalid format version, invalid target name
- frontend/src/test/components/test_phaseX_export_import.test.tsx — Export button click → download, Import dialog flow → file upload → preview → confirm → success/error
Task PX.4.1: Backend unit tests
- `test_export_profile_valid` — GET export/A returns all 7 steps with correct format version
- `test_export_profile_invalid_name` — GET export/X returns 400
- `test_export_all` — GET export/all returns 3 profiles, 21 prompts total
- `test_import_valid` — POST import/B with valid JSON → 200, verify all 7 steps updated
- `test_import_overwrites_existing` — POST import/B → verify old content replaced
- `test_import_missing_required_step` — POST import with only 6 steps → 400 with missing key listed
- `test_import_unknown_step_key` — POST import with extra step → 400
- `test_import_invalid_format_version` — POST import with `format: "v2"` → 400
- `test_import_invalid_target_name` — POST import/X → 400
- `test_import_does_not_change_active` — import into inactive profile → active profile unchanged
Task PX.4.2: Frontend tests
- Export button visible on each profile card
- Click export → fetch called, download triggered
- Import dialog opens on button click
- File selection → JSON parsed, preview shown
- Invalid JSON file → error message shown
- Target profile selector defaults to source profile
- Confirm import → mutation called, success toast
- Import error → inline error message
- Export All downloads all profiles
Task PX.4.3: Integration verification
- `npm run build` — no TypeScript errors
- `npm test` — all frontend tests pass
- `pytest backend/app/test/test_phaseX_*.py -v` — all backend tests pass
- Manual flow: export Profile A → edit Profile B → import the exported file into B → verify B's prompts match A's original
Commit: "test(prompts): add unit, integration tests for export/import"
Files Affected — Complete Inventory
Backend — New Files
| File | Sub-Phase | Purpose |
|---|---|---|
| backend/app/test/test_phaseX_export.py | PX.4 | Unit tests for export endpoint |
| backend/app/test/test_phaseX_import.py | PX.4 | Unit tests for import endpoint |
Backend — Modified Files
| File | Sub-Phase | Changes |
|---|---|---|
| backend/app/models/prompts.py | PX.1, PX.2 | Add ProfileExportResponse, AllProfilesExportResponse, ProfileImportRequest, ProfileImportResponse |
| backend/app/routers/prompts.py | PX.1, PX.2 | Add GET /export, GET /export/all, POST /import endpoints |
Frontend — New Files
| File | Sub-Phase | Purpose |
|---|---|---|
| frontend/src/test/components/test_phaseX_export_import.test.tsx | PX.4 | Component tests for export/import UI |
Frontend — Modified Files
| File | Sub-Phase | Changes |
|---|---|---|
| frontend/src/types/index.ts | PX.3 | Add ProfileExportData, ProfileImportResponse types |
| frontend/src/lib/api.ts | PX.3 | Add exportProfile(), importProfile() API functions |
| frontend/src/lib/queries.tsx | PX.3 | Add useImportProfile() mutation hook |
| frontend/src/components/ProfileList.tsx | PX.3 | Add Export button per profile card |
| frontend/src/pages/SystemPromptsPage.tsx | PX.3 | Add Import/Export All buttons, import dialog/modal |
Acceptance Criteria
Backend
- `GET /api/v1/prompts/profiles/A/export` returns JSON with all 7 steps and the correct format version
- `GET /api/v1/prompts/profiles/X/export` returns 400 (invalid profile name)
- `GET /api/v1/prompts/export/all` returns all 3 profiles plus the active profile marker
- `POST /api/v1/prompts/profiles/B/import` with a valid payload overwrites all 7 steps for Profile B
- Import rejects payloads with missing required step keys (400 + key names)
- Import rejects payloads with unknown step keys (400 + key names)
- Import rejects payloads with an unknown format version (400)
- Import does NOT change the `is_active` flag on the target profile
- Exported JSON does NOT contain internal DB IDs (`id`/`profile_id`)
- All existing prompt API endpoints still work unchanged
Frontend
- Export button visible on each profile card in ProfileList
- Clicking Export downloads a `.json` file with correct naming (`legco-profile-A-2026-04-27.json`)
- Import button visible on SystemPromptsPage top bar
- Clicking Import opens a modal with: file input, JSON preview, target profile selector, confirm button
- Selecting invalid JSON file shows error message
- Importing into a valid profile shows success confirmation with step count
- Import error from backend shows inline error message
- After successful import, profile data refreshes (query invalidation)
- All existing System Prompts functionality still works unchanged
Risk Register
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
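| JSON file too large to upload | Low | Low — 7 prompts × ~2KB = ~14KB | Enforce a 1MB limit on the import endpoint by checking Content-Length and the raw body size before parsing; a FastAPI Body(max_length=...) constraint applies to the parsed value, not to the request size |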
| User imports into wrong profile by mistake | Medium | Medium — overwrites their existing config | Confirmation dialog with source/target profile names clearly displayed before import |
| Exported file missing legacy filter/generate steps | Medium | Medium — import would fail validation | Always export all 7 steps (even hidden ones). Import validates all 7 are present. |
| Browser download API differences | Low | Low | Use standard Blob + URL.createObjectURL approach, tested across Chrome/Firefox |
| Import endpoint receives malformed JSON | Low | Low — Pydantic validation catches this | ProfileImportRequest model validates format string, dict keys, value types |
| User exports from one deployment and imports into another with different profile names | Low | Low — only 3 names (A/B/C) | Import only into A/B/C — if source was "D", user must choose target manually |
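For the first risk, a size guard has to look at the raw request rather than the parsed model. Here is a stdlib-only sketch. It assumes the route reads the body itself (for example via Starlette's `await request.body()`) and passes the Content-Length header along. The 1 MiB constant, `PayloadTooLarge`, and the function name are illustrative. The route would map `PayloadTooLarge` to HTTP 413 and any other `ValueError` to 400.

```python
import json
from typing import Optional

MAX_IMPORT_BYTES = 1024 * 1024  # 1 MiB; a full profile is about 7 prompts x ~2 KB


class PayloadTooLarge(ValueError):
    """Hypothetical error; the route would answer 413 Payload Too Large."""


def parse_import_body(raw: bytes, content_length: Optional[str] = None) -> dict:
    """Reject oversized bodies before parsing, then decode the JSON object."""
    # Cheap early check on the declared size (the header may be absent or wrong).
    if content_length is not None and content_length.isdigit() and int(content_length) > MAX_IMPORT_BYTES:
        raise PayloadTooLarge(f"declared size {content_length} exceeds {MAX_IMPORT_BYTES} bytes")
    # Authoritative check on the bytes actually received.
    if len(raw) > MAX_IMPORT_BYTES:
        raise PayloadTooLarge(f"body size {len(raw)} exceeds {MAX_IMPORT_BYTES} bytes")
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("import body must be a JSON object")
    return data
```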
New Dependencies
None. All changes use existing libraries (FastAPI, Pydantic, React, TanStack Query, lucide-react icons).
Implementation Sequence
PX.1 (Backend Export) ──► PX.2 (Backend Import) ──► PX.3 (Frontend UI) ──► PX.4 (Testing)
PX.1 and PX.2 can be done together (both in routers/prompts.py). PX.3 depends on knowing the exact API contracts from PX.1/PX.2. PX.4 runs after everything is wired.