Package 4 Enhancement Plan — Per-Sub-Question RAG Pipeline

Source: User request (2026-04-26)
Scope: Refactor the 3-step RAG query pipeline so retrieval, filtering, and response generation are organized per sub-question instead of batch-flattened.
Status: ✅ Complete — All 7 sub-phases implemented (2026-04-26). Phase 4a Prompt Integration added (2026-04-27). Phase PX Profile Export/Import planned (2026-04-27) — see end of file.

Objective

Restructure the POST /api/v1/query pipeline so that:

Retrieval per sub-question: Each sub-question independently retrieves n_results chunks from ChromaDB (instead of joining all sub-questions into one query string).
Filtering per sub-question: Each chunk is evaluated for relevance against its own originating sub-question (not the original user question). One LLM call handles all filtering — the prompt is redesigned to group chunks by sub-question.
Final answer organized by sub-question: Each sub-question gets its own bullet-point answer with its own sources. The frontend renders answer sections per sub-question rather than one monolithic bullet list.

Decision Register

#	Decision	Rationale
1	Keep `QueryDecomposer` unchanged	Input/output contract is identical — decomposition still produces a flat list of sub-questions
2	Single LLM call for filtering	User explicitly requested one call. Prompt redesigned to carry sub-question context for each chunk group
3	Keep `RAGService.retrieve()` signature	Call it N times (once per sub-question) from the new `retrieve_per_subquestion()` wrapper (decision 4) rather than changing its internal contract
4	Add `retrieve_per_subquestion()` to `RAGService`	New method that iterates over sub-questions, calls `retrieve()` per question, returns grouped results
5	Redesign `generate_response()` signature	Accepts structured `sub_questions: List[SubQuestionContext]` instead of flat chunk lists
6	SSE events: add `generating_subquestion` phase	Progressive streaming — frontend sees which sub-question is being answered
7	History: change XML/JSON formats in-place	Add `<sub_q>` wrappers to `chunks_retrieved`/`chunks_filtered` XML. Add sub-question grouping to `sources` JSON. No new DB columns for these payloads; Task 4.4.2 adds only two optional per-sub-question count columns.
8	Final answer format: markdown sections	`## Sub-question 1` headers with inline citations. Backward-compatible with existing `ReactMarkdown` rendering
9	Deduplicate chunks within a sub-question only	Same chunk may be retrieved by multiple sub-questions. Keep duplicates (different sub-questions need independent evaluation). ChromaDB `query()` naturally may return the same doc for different queries — this is acceptable.
10	Prompt template: add `generate` placeholders	New placeholder `{context_sections}` replaces single `{context}`. Filter template unchanged (sub-question injected at call site). Decompose template unchanged.
11	Progressive SSE events	Emit `generating_subquestion` event as each sub-question's answer section is generated. Frontend renders sections one by one.
12	`retrieval_n_results`	Global — same value for all sub-questions. Use existing `settings.retrieval_n_results` config.
13	Empty decomposition fallback	Treat original user question as single sub-question. Pipeline runs as 1-sub-q case — single retrieval, no filtering needed (one sub-q = no ambiguity), flat answer with `##` header.

Pipeline: Before vs After

Before (Current — Flat Batch)

User Question: "What are NEC4 time extension clauses?"
         │
    ┌────▼─────┐
    │ Decompose│  LLM Call 1
    │ → ["What are time extensions?", 
    │    "What notice is required?"]
    └────┬─────┘
         │ joined: "What are time extensions? What notice is required?"
    ┌────▼─────┐
    │ Retrieve │  1 ChromaDB query → 10 chunks (flat, no sub-q association)
    └────┬─────┘
         │ 10 chunks
    ┌────▼─────┐
    │  Filter  │  LLM Call 2 — all chunks scored against ORIGINAL question
    │          │  Score > 7 → keep (flat, no sub-q association)
    └────┬─────┘
         │ N filtered chunks
    ┌────▼─────┐
    │ Generate │  LLM Call 3 — flat answer from ALL filtered chunks
    │          │  "• Time extensions require notice [NEC4 ACC.pdf, p3]
    │          │   • The project manager must acknowledge [NEC4, p7]
    │          │   • Notice is defined as..."  (sources from all sub-qs mixed)
    └────┬─────┘
         │ single SSE completed event
    ┌────▼─────┐
    │ Frontend │  1 ReactMarkdown block, 1 flat sources list
    └──────────┘

After (Per-Sub-Question)

User Question: "What are NEC4 time extension clauses?"
         │
    ┌────▼─────┐
    │ Decompose│  LLM Call 1 (UNCHANGED)
    │ → ["What are time extensions?",
    │    "What notice is required?"]
    └────┬─────┘
         │ sub_q1                    sub_q2
    ┌────▼─────┐              ┌────▼─────┐
    │ Retrieve │              │ Retrieve │   2 ChromaDB queries → 10 chunks each
    │ q1 → 10  │              │ q2 → 10  │   chunks tagged with sub-q index
    └────┬─────┘              └────┬─────┘
         │                         │
         └─────────┬───────────────┘
                   │ grouped: {sub_q0: [chunks 0-9], sub_q1: [chunks 10-19]}
              ┌────▼─────┐
              │  Filter  │  LLM Call 2 (SINGLE CALL — redesigned prompt)
              │          │  Each chunk scored against its OWN sub-question
              │          │  Returns grouped scores → filtered per sub-q
              └────┬─────┘
                   │ filtered_by_subq: {0: [chunk_a, chunk_b], 1: [chunk_c]}
              ┌────▼─────┐
              │ Generate │  LLM Call 3 (redesigned prompt with per-sub-q context)
              │          │  ┌─────────────────────────────────────┐
              │          │  │ ## What are time extensions?         │
              │          │  │ - Time extensions must be notified   │
              │          │  │   [NEC4 ACC.pdf, page 3]             │
              │          │  │ - The project manager has 2 weeks    │
              │          │  │   [NEC4 Contract.pdf, page 12]       │
              │          │  │                                      │
              │          │  │ ## What notice is required?          │
              │          │  │ - Written notice must be given       │
              │          │  │   [NEC4 ACC.pdf, page 7]             │
              │          │  └─────────────────────────────────────┘
              └────┬─────┘
                   │ SSE events: generating_subquestion (per sub-q) → completed
              ┌────▼─────┐
              │ Frontend │  Sections per sub-question, sources grouped per section
              └──────────┘

Current State (Pre-Enhancement)

Backend

Component	File	Current Behavior
Decomposer	`services/query_decomposer.py`	`decompose(question) -> (List[str], prompt)` — returns 2-5 sub-questions
Retrieval	`services/rag.py:retrieve()`	`query_text = " ".join(query_keywords)` — joins all sub-qs into ONE string, single ChromaDB query → flat chunk list
Filter	`services/relevance_filter.py`	`filter(question, chunks)` — ALL chunks scored against ORIGINAL question, single LLM call, flat output
Generate	`services/rag.py:generate_response()`	`generate_response(question, chunks, metadata)` — flat chunks → flat bullet answer
Orchestrator	`routers/query.py:_query_stream()`	Linear 4-stage pipeline: decompose → retrieve → filter → generate
SSE Events	`routers/query.py`	`decomposed → retrieving → filtering → generating → completed` — flat answer + sources in `completed`
History	`services/history_service.py`	Flat XML for `chunks_retrieved`/`chunks_filtered`. Flat JSON for `sources`. Single timing per stage.
Prompt templates	`prompt_service.py` + `sqlite_db.py`	3 steps (`decompose`, `filter`, `generate`). Placeholders: `{question}`, `{chunks}`, `{context}`
Config	`core/config.py`	`retrieval_n_results=10`, `relevance_threshold=7.0`

Frontend

Component	File	Current Behavior
Types	`types/index.ts`	`QueryStreamEvent.phase`, flat `extracted_questions: string[]`, flat `answer: string`, flat `sources: SourceMetadata[]`
SSE Client	`lib/api.ts`	`queryDocumentStream()` — generic `JSON.parse` per `data:` line, no sub-question awareness
State	`lib/queries.tsx`	`QueryStreamState` with flat `answer`/`sources`/`extractedQuestions`
Response	`components/ResponsePanel.tsx`	Single `ReactMarkdown` block for answer. Flat 2-column grid for sources. No sub-question grouping.
Questions	`components/ExtractedQuestionsDisplay.tsx`	`<ol>` list of question strings. No sources attached.
Citations	`utils/citationParser.ts`	Flat `sources` lookup — `buildCitationLookup(sources)` returns global map
Progress	`components/PipelineProgress.tsx`	4-step stepper (NOT currently wired in LTTPage)

Key Test Files

File	Lines	Status
`test_phase1_query_decomposer.py`	76	✅ Unchanged — decomposer contract stays
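`test_phase1_rag_service.py`	139	🔴 Needs update (add coverage for `retrieve_per_subquestion()` and `generate_response_per_subquestion()`; the existing `retrieve()` and `generate_response()` signatures are preserved)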
`test_phase1_relevance_filter.py`	93	🟡 Needs update — one-call pattern changes to per-sub-q grouping
`test_phase1_query.py`	97	🟢 Already skipped (SSE migration) — may un-skip later
`test_phase3_query_history_integration.py`	608	🔴 Major rewrite — pipeline simulation mirrors `_query_stream` 1:1
`test_phase3_prompt_injection.py`	238	🟡 Moderate — new generate template placeholder
`test_acceptance_phase1_rag_query.py`	101	🔴 Full rewrite — already broken (SSE vs JSON), new response shape
`conftest.py`	94	🟡 Low — may add per-sub-q mock helpers

Implementation Tasks

Sub-Phase 4.1: Backend — Per-Sub-Question Retrieval

Test files to write first:

test_phase4_retrieve_per_subquestion.py — Tests RAGService.retrieve_per_subquestion()
test_phase4_query_router_retrieval.py — Tests _query_stream retrieval stage produces per-sub-q chunks

Task 4.1.1: Add retrieve_per_subquestion() to RAGService

File: backend/app/services/rag.py

New method signature:

def retrieve_per_subquestion(
    self,
    sub_questions: List[str],
    n_results: int = 10,
) -> List[Tuple[str, List[Tuple[str, Dict[str, Any], float]]]]:
    """Retrieve chunks for each sub-question independently.

    Args:
        sub_questions: List of decomposed sub-questions.
        n_results: Number of chunks per sub-question.

    Returns:
        List of (sub_question, chunks) tuples.
        chunks is the standard retrieve() output: [(text, metadata, distance), ...].
    """

Implementation:

Call self.retrieve([sub_q], n_results) for each sub-question
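Return list of (sub_question, chunks). Within one sub-question's results no chunk repeats, because ChromaDB IDs are unique within a collection and a single query returns each ID at most once; the same chunk may still appear under several sub-questions (see decision 9)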
Existing retrieve() method is NOT modified — it continues to work as before
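
The implementation notes above can be sketched as follows. This is a minimal illustration, not the project's code: `RAGServiceSketch` and its canned `corpus` lookup are hypothetical stand-ins for the real `RAGService` and its ChromaDB query, used only so the example runs on its own.

```python
from typing import Any, Dict, List, Tuple

Chunk = Tuple[str, Dict[str, Any], float]  # (text, metadata, distance)


class RAGServiceSketch:
    """Stand-in for RAGService; retrieve() is a fake, not the real ChromaDB call."""

    def __init__(self, corpus: Dict[str, List[Chunk]]) -> None:
        self._corpus = corpus  # hypothetical: maps a query string to canned chunks

    def retrieve(self, query_keywords: List[str], n_results: int = 10) -> List[Chunk]:
        # Mirrors the existing contract: keywords are joined into one query string
        query_text = " ".join(query_keywords)
        return self._corpus.get(query_text, [])[:n_results]

    def retrieve_per_subquestion(
        self, sub_questions: List[str], n_results: int = 10
    ) -> List[Tuple[str, List[Chunk]]]:
        # One independent retrieve() call per sub-question; input order is preserved
        return [(q, self.retrieve([q], n_results)) for q in sub_questions]


svc = RAGServiceSketch(
    {
        "What notice is required?": [
            ("Notice must be given...", {"filename": "NEC4 Contract.pdf", "page": 12}, 0.21)
        ]
    }
)
grouped = svc.retrieve_per_subquestion(
    ["What are time extensions?", "What notice is required?"]
)
```

A sub-question with no hits still gets an entry (with an empty chunk list), which keeps the result aligned index-for-index with the input list.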

Task 4.1.2: Update _query_stream() retrieval stage

File: backend/app/routers/query.py

Changes:

Replace rag.retrieve(extracted_questions, n_results) with rag.retrieve_per_subquestion(extracted_questions, n_results)
Track per-sub-question retrieval timing (new field or combined timing)
Format chunks_retrieved XML with sub-question wrappers

New chunks_retrieved XML format:

<sub_q idx="0" question="What are time extensions?">
<chunk_1>
Filename: NEC4 ACC.pdf
Page: 3
Content: Clause 61.3 states that...
</chunk_1>
<chunk_2>
...
</chunk_2>
</sub_q>
<sub_q idx="1" question="What notice is required?">
<chunk_1>
Filename: NEC4 Contract.pdf
Page: 12
Content: Notice must be given...
</chunk_1>
...
</sub_q>
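
A formatter producing the grouped XML above might look like the following sketch. The metadata keys `filename` and `page`, the `"unknown"` defaults, and the escaping of the `question` attribute are assumptions; the plan only specifies the output shape.

```python
from html import escape
from typing import Any, Dict, List, Tuple

Chunk = Tuple[str, Dict[str, Any], float]  # (text, metadata, distance)


def format_chunks_retrieved_per_subq(results: List[Tuple[str, List[Chunk]]]) -> str:
    """Render per-sub-question chunks as <sub_q> blocks with 1-based <chunk_N> tags."""
    lines: List[str] = []
    for idx, (question, chunks) in enumerate(results):
        # Escape quotes/ampersands so the attribute stays well-formed (assumption)
        lines.append(f'<sub_q idx="{idx}" question="{escape(question, quote=True)}">')
        for n, (text, meta, _distance) in enumerate(chunks, start=1):
            lines.append(f"<chunk_{n}>")
            lines.append(f"Filename: {meta.get('filename', 'unknown')}")
            lines.append(f"Page: {meta.get('page', 'unknown')}")
            lines.append(f"Content: {text}")
            lines.append(f"</chunk_{n}>")
        lines.append("</sub_q>")
    return "\n".join(lines)


xml = format_chunks_retrieved_per_subq(
    [
        (
            "What are time extensions?",
            [("Clause 61.3 states that...", {"filename": "NEC4 ACC.pdf", "page": 3}, 0.12)],
        )
    ]
)
```

Chunk numbering restarts at 1 inside every `<sub_q>` block, matching the example format.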

Task 4.1.3: Format helpers

File: backend/app/routers/query.py

New functions:

def format_chunks_retrieved_per_subq(results: List[Tuple[str, List]]) -> str:
    """Format per-sub-question retrieved chunks as XML."""
    
def format_chunks_filtered_per_subq(results: List[Tuple[str, List]]) -> str:
    """Format per-sub-question filtered chunks as XML with relevance scores."""

Commit: "feat: Phase 4.1 per-sub-question retrieval with grouped chunk XML"

Sub-Phase 4.2: Backend — Per-Sub-Question Filtering (Single LLM Call)

Test files to write first:

test_phase4_relevance_filter_per_subq.py — Tests RelevanceFilter.filter_per_subquestion() with grouped chunks
test_phase4_query_router_filter.py — Tests filter stage with per-sub-q chunk groups

Task 4.2.1: Add filter_per_subquestion() to RelevanceFilter

File: backend/app/services/relevance_filter.py

New method signature:

async def filter_per_subquestion(
    self,
    sub_questions: List[str],
    sub_chunks: List[List[Tuple[str, Dict]]],
    threshold: float = 7.0,
) -> Tuple[List[Tuple[str, List[Tuple[str, Dict]]]], str]:
    """Filter chunks per sub-question in a single LLM call.

    Args:
        sub_questions: List of decomposed sub-questions.
        sub_chunks: List of chunk lists (one per sub-question).
        threshold: Minimum relevance score.

    Returns:
        Tuple of (filtered_results, prompt).
        filtered_results: List of (sub_question, filtered_chunks_for_that_q).
    """

Prompt design (single LLM call):

Evaluate each chunk for relevance to its associated sub-question.

Sub-question 0: "{sub_q_0}"
Chunk 0: {chunk_0_text}
Chunk 1: {chunk_1_text}
...

Sub-question 1: "{sub_q_1}"
Chunk 0: {chunk_0_text}
Chunk 1: {chunk_1_text}
...

For each chunk, rate relevance 0-10 considering ONLY its associated sub-question.
Return a JSON object mapping sub-question indices to arrays of scores:
{"0": [8.5, 3.2, 9.0], "1": [7.0, 6.5, 9.1]}

Key rules:

Each chunk is evaluated against its own sub-question (not the original user question)
JSON keys are stringified sub-question indices ("0", "1", ...)
Score arrays MUST match chunk count for each sub-question
Same JSON extraction/markdown stripping logic as existing filter()

Existing filter() method is preserved — not modified, not deprecated. The new method is additive.
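
The key rules above imply a post-processing step that maps the returned JSON back onto the chunk groups. The sketch below shows one way to do it; `apply_grouped_scores` is a hypothetical helper name, the `>=` comparison is an assumption (the "Before" diagram says "Score > 7"), and treating a length mismatch like a parse failure is also an assumption.

```python
import json
from typing import Any, Dict, List, Tuple

Chunk = Tuple[str, Dict[str, Any]]  # (text, metadata)


def _strip_markdown_fence(text: str) -> str:
    """Remove an optional markdown code fence around the LLM's JSON output."""
    fence = "`" * 3
    text = text.strip()
    if text.startswith(fence):
        text = text[len(fence):]
        if text.startswith("json"):
            text = text[4:]
        if text.endswith(fence):
            text = text[: -len(fence)]
    return text.strip()


def apply_grouped_scores(
    raw_llm_output: str,
    sub_questions: List[str],
    sub_chunks: List[List[Chunk]],
    threshold: float = 7.0,
) -> List[Tuple[str, List[Chunk]]]:
    try:
        scores = json.loads(_strip_markdown_fence(raw_llm_output))
    except json.JSONDecodeError:
        scores = {}
    if not isinstance(scores, dict):
        scores = {}

    results: List[Tuple[str, List[Chunk]]] = []
    for idx, (question, chunks) in enumerate(zip(sub_questions, sub_chunks)):
        group = scores.get(str(idx))  # keys are stringified indices: "0", "1", ...
        valid = (
            isinstance(group, list)
            and len(group) == len(chunks)  # score array must match chunk count
            and all(isinstance(s, (int, float)) for s in group)
        )
        if not valid:
            # Fallback: keep every chunk for this sub-question (no filtering)
            results.append((question, list(chunks)))
            continue
        kept = [chunk for chunk, score in zip(chunks, group) if score >= threshold]
        results.append((question, kept))
    return results
```

Validating each group separately means one malformed score array degrades only its own sub-question rather than the whole response.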

Task 4.2.2: Update _query_stream() filter stage

File: backend/app/routers/query.py

Changes:

Call relevance_filter.filter_per_subquestion(extracted_questions, chunks_for_filter, threshold) instead of relevance_filter.filter(question, chunks, threshold)
Build chunks_for_filter from per-sub-question retrieval results
Track filter_prompt (the redesigned prompt)
Format chunks_filtered XML with sub-question wrappers and Relevance: scores

New chunks_filtered XML format:

<sub_q idx="0" question="What are time extensions?">
<chunk_1>
Filename: NEC4 ACC.pdf
Page: 3
Relevance: 8.5
Content: Clause 61.3 states that...
</chunk_1>
</sub_q>
<sub_q idx="1" question="What notice is required?">
<chunk_1>
Filename: NEC4 Contract.pdf
Page: 12
Relevance: 9.0
Content: Notice must be given...
</chunk_1>
</sub_q>

Commit: "feat: Phase 4.2 per-sub-question filtering with single LLM call"

Sub-Phase 4.3: Backend — Sub-Question-Organized Response Generation

Test files to write first:

test_phase4_generate_per_subq.py — Tests RAGService.generate_response_per_subquestion()
test_phase4_response_format.py — Tests the final answer matches expected format

Task 4.3.1: Redesign generate_response() → generate_response_per_subquestion()

File: backend/app/services/rag.py

New method signature:

async def generate_response_per_subquestion(
    self,
    sub_questions: List[str],
    sub_chunks: List[List[str]],
    sub_metadata: List[List[Dict[str, Any]]],
) -> Tuple[str, str, List[List[SourceMetadata]]]:
    """Generate sub-question-organized RAG response.

    Args:
        sub_questions: List of decomposed sub-questions.
        sub_chunks: List of chunk text lists (one per sub-question).
        sub_metadata: List of metadata dict lists (one per sub-question).

    Returns:
        Tuple of (answer, prompt, grouped_sources).
        answer: Markdown string with sections per sub-question.
        prompt: The rendered LLM prompt.
        grouped_sources: List of SourceMetadata lists (one per sub-question).
    """

New prompt template (replaces generate):

You must answer each sub-question using ONLY the document chunks provided for it.
Do not use any external knowledge.
Format your answer as markdown sections — one section per sub-question.
Each section should start with "## Sub-question N: <the question>"
Each section should contain 1-5 bullet points.
Cite your sources inline using bracket labels, e.g. [filename, page N].
Place the citation at the end of each relevant bullet point.

{context_sections}

Answer:

Context format (replaces {context}):

### Context for Sub-question 0: "What are time extensions?"
[NEC4 ACC.pdf, page 3] Source: NEC4 ACC.pdf
Summary: Clause 61.3 discusses time extensions...
Content: Clause 61.3 states that the project manager...

[NEC4 Contract.pdf, page 12] Source: NEC4 Contract.pdf
Summary: Notice requirements for time extensions...
Content: Written notice must be given within...

### Context for Sub-question 1: "What notice is required?"
[NEC4 ACC.pdf, page 7] Source: NEC4 ACC.pdf
Summary: Notice requirements...
Content: The contractor shall notify the project manager in writing...

Expected answer format:

## Sub-question 1: What are time extensions?
- Time extensions must be notified to the project manager within 2 weeks [NEC4 ACC.pdf, page 3]
- The project manager must acknowledge the notice within 1 week [NEC4 Contract.pdf, page 12]

## Sub-question 2: What notice is required?
- Written notice must be given [NEC4 ACC.pdf, page 7]

Existing generate_response() is preserved — not modified, not deprecated.
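
The context format shown above could be assembled roughly as follows. This is a sketch under assumptions: `build_context_sections` is a hypothetical helper name, and the metadata keys `filename`, `page`, and `summary` are guesses at how chunk metadata is stored.

```python
from typing import Any, Dict, List


def build_context_sections(
    sub_questions: List[str],
    sub_chunks: List[List[str]],
    sub_metadata: List[List[Dict[str, Any]]],
) -> str:
    """Build the value substituted for {context_sections} in the generate prompt."""
    sections: List[str] = []
    for idx, (question, chunks, metas) in enumerate(
        zip(sub_questions, sub_chunks, sub_metadata)
    ):
        lines = [f'### Context for Sub-question {idx}: "{question}"']
        for text, meta in zip(chunks, metas):
            filename = meta.get("filename", "unknown")
            label = f"[{filename}, page {meta.get('page', '?')}]"
            lines.append(f"{label} Source: {filename}")
            if meta.get("summary"):
                lines.append(f"Summary: {meta['summary']}")
            lines.append(f"Content: {text}")
            lines.append("")  # blank line between chunks
        sections.append("\n".join(lines).rstrip())
    return "\n\n".join(sections)


context = build_context_sections(
    ["What are time extensions?", "What notice is required?"],
    [["Clause 61.3 states that the project manager..."], []],
    [[{"filename": "NEC4 ACC.pdf", "page": 3, "summary": "Clause 61.3 discusses time extensions..."}], []],
)
```

A sub-question with no surviving chunks still gets its header, so the model sees that the section exists but has no supporting context.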

Task 4.3.2: Update _query_stream() generate stage

File: backend/app/routers/query.py

Changes:

Call rag.generate_response_per_subquestion(extracted_questions, chunk_texts_by_subq, metadata_by_subq)
New SSE event: generating_subquestion — emitted before each sub-question's section (lets frontend show progressive build)
completed SSE event includes both answer (markdown string) and sub_question_sources (grouped sources)

New SSE event sequence (the error event is emitted only on failure):

{"phase": "decomposed", "extracted_questions": ["q1", "q2"]}
{"phase": "retrieving"}
{"phase": "filtering"}
{"phase": "generating"}
{"phase": "completed", "answer": "## Sub-question 1: ...\n\n...", "sub_question_sources": [{"sub_question_index": 0, "sub_question_text": "q1", "sources": [SourceMetadata, ...]}, ...]}
{"phase": "error", "message": "..."}
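
Each event above travels as one JSON object on a `data:` line, which is what the frontend's per-line `JSON.parse` relies on. A minimal sketch of the framing and its inverse follows; `sse_event` and `parse_sse_line` are hypothetical helper names, not functions from `routers/query.py`.

```python
import json
from typing import Any, Dict, Optional


def sse_event(payload: Dict[str, Any]) -> str:
    """Frame one pipeline event as a Server-Sent Events message."""
    # A single "data:" line followed by a blank line terminates the event
    return f"data: {json.dumps(payload)}\n\n"


def parse_sse_line(line: str) -> Optional[Dict[str, Any]]:
    """Inverse of sse_event for one line; returns None for non-data lines."""
    if not line.startswith("data:"):
        return None
    return json.loads(line[len("data:"):].strip())


decomposed = sse_event({"phase": "decomposed", "extracted_questions": ["q1", "q2"]})
completed = sse_event(
    {
        "phase": "completed",
        "answer": "## Sub-question 1: q1\n- ...",
        "sub_question_sources": [
            {"sub_question_index": 0, "sub_question_text": "q1", "sources": []}
        ],
    }
)
```

Because `json.dumps` escapes newlines inside the markdown answer, the whole event stays on one `data:` line.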

New QueryResponse model:

File: backend/app/models/query.py

class SubQuestionSources(BaseModel):
    sub_question_index: int
    sub_question_text: str
    sources: List[SourceMetadata]

class QueryResponse(BaseModel):
    extracted_questions: List[str]
    answer: str                          # Markdown with ## sections
    sub_question_sources: List[SubQuestionSources]  # Grouped sources
    # Backward compat:
    sources: List[SourceMetadata]        # Flattened version (all sources)

Commit: "feat: Phase 4.3 sub-question-organized response generation"

Sub-Phase 4.4: Backend — History & Prompt Template Updates

Test files to write first:

test_phase4_history_format.py — Tests new XML/JSON history formats
test_phase4_prompt_templates.py — Tests new generate template with {context_sections}

Task 4.4.1: Update history recording

File: backend/app/routers/query.py (the _schedule_history / _record_history helpers)

Changes:

chunks_retrieved: Store new grouped XML format (with <sub_q> wrappers)
chunks_filtered: Store new grouped XML format (with <sub_q> wrappers and Relevance: scores)
sources: Store grouped JSON: json.dumps([[SourceMetadata_dict, ...], [...]]) (list of lists)
final_answer: Store markdown string with ## sections
Existing fields (chunks_retrieved_count, chunks_filtered_count) keep total counts
New optional fields: chunks_retrieved_per_subq_count, chunks_filtered_per_subq_count (JSON array of ints)

Task 4.4.2: Update history DB schema (minimal)

File: backend/app/core/sqlite_db.py

Add two new columns (optional, NULL-able):

ALTER TABLE query_history ADD COLUMN chunks_retrieved_per_subq_count TEXT DEFAULT NULL;
ALTER TABLE query_history ADD COLUMN chunks_filtered_per_subq_count TEXT DEFAULT NULL;

These store JSON arrays like [10, 8] — one count per sub-question. NULL for pre-Package-4 records.
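
Since SQLite's `ALTER TABLE ... ADD COLUMN` fails if the column already exists, the migration needs a guard to be safe on restart. The sketch below checks `PRAGMA table_info` first; the toy `query_history` schema and the helper names are assumptions for illustration only.

```python
import json
import sqlite3
from typing import List

NEW_COLUMNS = ("chunks_retrieved_per_subq_count", "chunks_filtered_per_subq_count")


def add_per_subq_count_columns(conn: sqlite3.Connection) -> None:
    """Add the two optional count columns if they are not present yet."""
    existing = {row[1] for row in conn.execute("PRAGMA table_info(query_history)")}
    for column in NEW_COLUMNS:
        if column not in existing:
            conn.execute(
                f"ALTER TABLE query_history ADD COLUMN {column} TEXT DEFAULT NULL"
            )
    conn.commit()


def per_subq_counts(groups: List[list]) -> str:
    """Serialize one count per sub-question as a JSON array string."""
    return json.dumps([len(group) for group in groups])


conn = sqlite3.connect(":memory:")
# Toy schema: the real query_history table has many more columns
conn.execute("CREATE TABLE query_history (id INTEGER PRIMARY KEY, question TEXT)")
add_per_subq_count_columns(conn)
add_per_subq_count_columns(conn)  # second call is a no-op
conn.execute(
    "INSERT INTO query_history (question, chunks_retrieved_per_subq_count) VALUES (?, ?)",
    ("What are NEC4 time extension clauses?", per_subq_counts([[0] * 10, [0] * 8])),
)
stored = conn.execute(
    "SELECT chunks_retrieved_per_subq_count, chunks_filtered_per_subq_count FROM query_history"
).fetchone()
```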

Task 4.4.3: Update history Pydantic models

File: backend/app/models/history.py

Add optional fields to QueryHistoryRecord and QueryHistoryDetail:

chunks_retrieved_per_subq_count: Optional[str] = None  # JSON array string
chunks_filtered_per_subq_count: Optional[str] = None    # JSON array string

Task 4.4.4: Update prompt templates

File: backend/app/core/sqlite_db.py (seed data)

New generate template:

"generate": (
    "You must answer each sub-question using ONLY the document chunks provided for it.\n"
    "Do not use any external knowledge.\n"
    "Format your answer as markdown sections — one section per sub-question.\n"
    "Each section should start with \"## Sub-question N: <the question>\"\n"
    "Each section should contain 1-5 bullet points.\n"
    "Cite your sources inline using bracket labels, e.g. [filename, page N].\n"
    "Place the citation at the end of each relevant bullet point.\n\n"
    "{context_sections}\n\n"
    "Answer:"
)

decompose and filter templates remain unchanged (they still use {question} placeholder — the orchestrator injects the right value at call time).

Task 4.4.5: Update PromptService to handle new template placeholder

File: backend/app/services/prompt_service.py

Add context_sections as a known placeholder for the generate step (optional: str.replace leaves unknown placeholders untouched, so rendering is already safe)
The reset_to_defaults() method must include the new generate template
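
The `str.replace` behaviour mentioned above can be illustrated with a small sketch. `render_template` is a hypothetical helper, not the actual `PromptService` method; it only demonstrates why replace-based rendering tolerates unknown placeholders and literal braces, where `str.format` would raise on both.

```python
from typing import Dict


def render_template(template: str, values: Dict[str, str]) -> str:
    """Substitute {key} placeholders by plain string replacement."""
    rendered = template
    for key, value in values.items():
        rendered = rendered.replace("{" + key + "}", value)
    return rendered


# Unknown placeholders are left as-is instead of raising
partial = render_template(
    "Context:\n{context_sections}\n\nQuestion: {question}",
    {"context_sections": "### Context for Sub-question 0"},
)

# Literal JSON braces in a template survive untouched
with_json = render_template(
    'Return JSON like {"0": [8.5]} for {question}',
    {"question": "What notice is required?"},
)
```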

Task 4.4.6: Update history detail API response

File: backend/app/routers/history.py

GET /api/v1/history/{id} response now includes chunks_retrieved_per_subq_count and chunks_filtered_per_subq_count. Backward-compatible: records written before Package 4 return null for both fields.

Commit: "feat: Phase 4.4 history schema, prompt templates, and Pydantic model updates"

Sub-Phase 4.5: Frontend — Types & State Management

Test files to write first:

test_phase4_stream_state.test.tsx — Tests QueryStreamState handles new response shape
test_phase4_types.test.ts — Tests type compatibility

Task 4.5.1: Update TypeScript types

File: frontend/src/types/index.ts

New types:

interface SubQuestionSources {
  sub_question_index: number;
  sub_question_text: string;
  sources: SourceMetadata[];
}

interface QueryStreamCompletedEvent {
  phase: 'completed';
  answer: string;                              // Markdown with ## sections
  sub_question_sources: SubQuestionSources[];  // Grouped sources
}

interface QueryStreamDecomposedEvent {
  phase: 'decomposed';
  extracted_questions: string[];
}

type QueryStreamEvent = 
  | QueryStreamDecomposedEvent
  | { phase: 'retrieving' | 'filtering' | 'generating' }
  | QueryStreamCompletedEvent
  | { phase: 'error'; message: string };

Task 4.5.2: Update QueryStreamState and mutation handler

File: frontend/src/lib/queries.tsx

Changes:

interface QueryStreamState {
  extractedQuestions: string[] | null;
  answer: string | null;                        // Full markdown
  subQuestionSources: SubQuestionSources[] | null;  // NEW — grouped sources
  phase: 'idle' | 'decomposing' | 'retrieving' | 'filtering' | 'generating' | 'completed' | 'error';
  error: Error | null;
}

In the completed case:

case 'completed':
  setState(prev => ({
    ...prev,
    answer: event.answer,
    subQuestionSources: event.sub_question_sources,
    phase: 'completed',
  }));
  break;

Commit: "feat: Phase 4.5 frontend types and state management for per-sub-q responses"

Sub-Phase 4.6: Frontend — ResponsePanel & ExtractedQuestionsDisplay

Test files to write first:

test_phase4_response_panel.test.tsx — Tests per-sub-question section rendering
test_phase4_citation_parser.test.ts — Tests per-sub-question citation lookup

Task 4.6.1: Redesign ResponsePanel for sub-question sections

File: frontend/src/components/ResponsePanel.tsx

Current: single ReactMarkdown block + flat sources grid.

New layout:

┌─────────────────────────────────────────────────────┐
│  📋 Response                           [Copy All]   │
├─────────────────────────────────────────────────────┤
│                                                      │
│  ┌─ Sub-question 1: What are time extensions? ─────┐│
│  │                                                    │
│  │  • Time extensions must be notified...             │
│  │    [NEC4 ACC.pdf, page 3]                          │
│  │  • The project manager must acknowledge...         │
│  │    [NEC4 Contract.pdf, page 12]                    │
│  │                                                    │
│  │  Sources (2)                          [Expand ▼]  │
│  │  ┌──────────────────────────────────────────────┐ │
│  │  │ NEC4 ACC.pdf, Page 3  │ NEC4 Contract, p12 │ │
│  │  │ "Clause 61.3 states.." │ "Notice must be..." │ │
│  │  └──────────────────────────────────────────────┘ │
│  └────────────────────────────────────────────────────┘│
│                                                      │
│  ┌─ Sub-question 2: What notice is required? ───────┐│
│  │                                                    │
│  │  • Written notice must be given...                  │
│  │    [NEC4 ACC.pdf, page 7]                           │
│  │                                                    │
│  │  Sources (1)                          [Expand ▼]  │
│  └────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────┘

Implementation approach:

Parse the answer markdown into sections using ## Sub-question N: headers
Map each section to its SubQuestionSources by matching index
Render each section as an accordion/card with:
- Header: sub-question text (from SubQuestionSources)
- Body: ReactMarkdown for bullet points (with inline citation links)
- Footer: collapsible sources grid (only sources belonging to this sub-question)
Keep the existing citation link behavior (clickable [filename, page N] → PDF viewer)

Task 4.6.2: Update citationParser.ts for per-sub-question lookup

File: frontend/src/utils/citationParser.ts

Current: buildCitationLookup(sources: SourceMetadata[]) — returns a single global map.

New: buildCitationLookup(subQuestionSources: SubQuestionSources[]) — returns a map scoped to the correct sources for each section. The citation [filename, page N] match is looked up in the relevant sub-question's source list.

Task 4.6.3: Update ExtractedQuestionsDisplay for anchors

File: frontend/src/components/ExtractedQuestionsDisplay.tsx

Minor enhancement:

Make each extracted question a clickable anchor that scrolls to its corresponding section in the answer
Add id="subq-{index}" to each section header in ResponsePanel
Keep existing skeleton loading behavior

Commit: "feat: Phase 4.6 frontend per-sub-question response rendering"

Sub-Phase 4.7: Testing & Polish

Test files to write:

test_phase4_integration_query_pipeline.py — Full integration test simulating per-sub-q pipeline
test_phase4_acceptance_query.py — Acceptance test with real LLM (manual run)
test_phase4_e2e_query_flow.test.tsx — Frontend e2e test with mocked SSE stream

Task 4.7.1: Backend unit tests

Run pytest backend/app/test/test_phase4_*.py -v — all must pass
Verify no regressions in existing Phase 1 and Phase 3 tests
Update test_phase1_rag_service.py for new method signatures
Update test_phase1_relevance_filter.py for per-sub-q behavior
Rewrite test_phase3_query_history_integration.py for new pipeline flow
Update test_phase3_prompt_injection.py for new generate template

Task 4.7.2: Backend acceptance tests

test_phase4_acceptance_query.py — real LLM, real ChromaDB
Verify: answer contains ## Sub-question headers, sources grouped by sub-question index
Verify: each sub-question section has 1-5 bullet points
Verify: inline citations match the correct sub-question's source list
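
The structural checks above could share a small parsing helper such as the sketch below. `split_answer_sections` is a hypothetical name, and it assumes the model follows the requested format exactly (`## Sub-question N:` headers and `- ` bullets); real LLM output may need a more forgiving parser.

```python
import re
from typing import List, Tuple

HEADER_RE = re.compile(r"^## Sub-question (\d+): (.*)$", re.MULTILINE)


def split_answer_sections(answer: str) -> List[Tuple[int, str, List[str]]]:
    """Return (number, question, bullets) for each '## Sub-question N:' section."""
    matches = list(HEADER_RE.finditer(answer))
    sections: List[Tuple[int, str, List[str]]] = []
    for i, match in enumerate(matches):
        # A section body runs until the next header or the end of the answer
        end = matches[i + 1].start() if i + 1 < len(matches) else len(answer)
        body = answer[match.end():end]
        bullets = [
            line[2:].strip() for line in body.splitlines() if line.startswith("- ")
        ]
        sections.append((int(match.group(1)), match.group(2).strip(), bullets))
    return sections


answer = (
    "## Sub-question 1: What are time extensions?\n"
    "- Time extensions must be notified [NEC4 ACC.pdf, page 3]\n"
    "- The project manager must acknowledge [NEC4 Contract.pdf, page 12]\n"
    "\n"
    "## Sub-question 2: What notice is required?\n"
    "- Written notice must be given [NEC4 ACC.pdf, page 7]\n"
)
sections = split_answer_sections(answer)
bullet_counts_ok = all(1 <= len(bullets) <= 5 for _, _, bullets in sections)
```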

Task 4.7.3: Frontend tests

test_phase4_response_panel.test.tsx — renders per-sub-question sections, expandable sources
test_phase4_citation_parser.test.ts — per-sub-question lookup returns correct source
test_phase4_e2e_query_flow.test.tsx — mocks SSE with new event format, verifies section rendering
Update existing ResponsePanel.test.tsx and citationParser.test.ts for new API

Task 4.7.4: Frontend build verification

npm run build — no TypeScript errors
npm test — all 62 existing tests pass + new Phase 4 tests
Verify manual flow: ask question → see extracted questions → see per-sub-question answer sections → expand sources per section

Task 4.7.5: Error handling

Empty decomposition: if decompose() returns [], fall back to using original question as single sub-question
Empty retrieval for some sub-questions: that sub-question gets no chunks → section shows "No relevant information found"
Filter failure (all chunks below threshold): that sub-question gets no answer → graceful empty section
JSON parse failure in filter: fall back to including all chunks (no filtering) for that sub-question
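
Two of the fallbacks above are simple enough to sketch directly. Both helper names are hypothetical, and the plan does not say whether the "No relevant information found" text is produced by the backend or the frontend; rendering it as a markdown bullet on the backend is an assumption made here.

```python
from typing import List

NO_INFO_MESSAGE = "No relevant information found"


def normalize_sub_questions(original_question: str, decomposed: List[str]) -> List[str]:
    """Fall back to the original question when decomposition yields nothing usable."""
    cleaned = [q.strip() for q in decomposed if q and q.strip()]
    return cleaned or [original_question]


def empty_section(number: int, question: str) -> str:
    """Placeholder section for a sub-question that ended up with no chunks."""
    return f"## Sub-question {number}: {question}\n- {NO_INFO_MESSAGE}"


fallback = normalize_sub_questions("What are NEC4 time extension clauses?", [])
kept = normalize_sub_questions("ignored", ["What notice is required? ", ""])
placeholder = empty_section(2, "What notice is required?")
```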

Task 4.7.6: Documentation

Update AGENTS.md with new pipeline architecture section
Add docstrings to all new methods (retrieve_per_subquestion, filter_per_subquestion, generate_response_per_subquestion)
Update prompt template documentation in system prompts page

Commit: "feat: Phase 4.7 testing, error handling, and polish for per-sub-q pipeline"

Phase 4a: Prompt Service Integration for Per-Sub-Q Filter (2026-04-27)

Root issue: filter_per_subquestion() in relevance_filter.py had a hardcoded prompt (_build_per_subq_prompt()) — completely bypassing PromptService. Users could not edit the per-sub-q filter prompt on the System Prompts page, unlike the flat filter step which was already prompt-service-driven.

Solution: Broke the per-sub-q filter prompt into 3 composable pieces, each a separately editable step on the System Prompts page:

Step Name	Label	Placeholders	Default
`filter_intro`	Step 2.1: Filter Intro (Preamble)	(none)	`"Evaluate each chunk for relevance to its associated sub-question only."`
`filter_section`	Step 2.2: Filter Section (Per Sub-Q)	`{subq_idx}`, `{subq_question}`, `{chunks}`	`'Sub-question {subq_idx}: "{subq_question}"\n{chunks}'`
`filter_outro`	Step 2.3: Filter Outro (Format)	(none)	JSON format instructions + example

The RelevanceFilter._build_per_subq_prompt() now composes them at runtime:

filter_intro + [filter_section.replace(...) for each sub-q] + filter_outro

Falls back to built-in defaults when PromptService is unavailable.
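
The composition and fallback described above might be sketched as follows. The `get_template` callable is a hypothetical stand-in for the `PromptService` lookup, the `Chunk N:` line format is taken from the Sub-Phase 4.2 prompt design, and `DEFAULT_OUTRO` reuses that section's wording because the seeded default text is not shown here.

```python
from typing import Callable, List, Optional

DEFAULT_INTRO = "Evaluate each chunk for relevance to its associated sub-question only."
DEFAULT_SECTION = 'Sub-question {subq_idx}: "{subq_question}"\n{chunks}'
# Assumption: wording borrowed from the 4.2 prompt design, not the seeded default
DEFAULT_OUTRO = (
    "For each chunk, rate relevance 0-10 considering ONLY its associated sub-question.\n"
    "Return a JSON object mapping sub-question indices to arrays of scores:\n"
    '{"0": [8.5, 3.2, 9.0], "1": [7.0, 6.5, 9.1]}'
)


def build_per_subq_prompt(
    sub_questions: List[str],
    sub_chunk_texts: List[List[str]],
    get_template: Optional[Callable[[str], str]] = None,
) -> str:
    def template(step: str, default: str) -> str:
        if get_template is None:
            return default
        try:
            return get_template(step)
        except Exception:
            return default  # prompt lookup unavailable: use the built-in default

    parts = [template("filter_intro", DEFAULT_INTRO)]
    section_tpl = template("filter_section", DEFAULT_SECTION)
    for idx, (question, chunks) in enumerate(zip(sub_questions, sub_chunk_texts)):
        chunk_block = "\n".join(f"Chunk {i}: {text}" for i, text in enumerate(chunks))
        parts.append(
            section_tpl.replace("{subq_idx}", str(idx))
            .replace("{subq_question}", question)
            .replace("{chunks}", chunk_block)
        )
    parts.append(template("filter_outro", DEFAULT_OUTRO))
    return "\n\n".join(parts)


prompt = build_per_subq_prompt(["q0", "q1"], [["a", "b"], ["c"]])
```

Only `filter_section` is rendered once per sub-question; the intro and outro appear exactly once, which keeps the whole filter stage to a single LLM call.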

Bugs Fixed

generate_per_subq not seeded: rag.py called get_prompt_template("generate_per_subq") but this step name was never added to _VALID_STEPS, _SEED_STEPS, or _SEED_TEMPLATES — would crash at runtime with ValueError. Now properly seeded with {context_sections} placeholder.
_SEED_GENERATE placeholder mismatch from Package 4: The flat generate_response() expects {question}/{context} placeholders, but Package 4 changed the seed template to use {context_sections} (intended for per-sub-q generate). Restored flat template; generate_per_subq now holds {context_sections}.

Database Backfill Migration

The existing seed_default_profiles() only inserted steps for NEWLY created profiles. Added a backfill loop that iterates ALL existing profiles and INSERT OR IGNOREs any missing step names. This ensures existing A/B/C profiles pick up filter_intro, filter_section, filter_outro, and generate_per_subq on restart.
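
The backfill loop can be sketched as below. The `profiles` and `prompts` tables, their columns, and the placeholder seed texts are hypothetical; the point is that `INSERT OR IGNORE` only skips existing rows when a uniqueness constraint covers the profile and step pair.

```python
import sqlite3
from typing import Dict

# Placeholder texts; only the first two mirror defaults quoted in this plan
SEEDS: Dict[str, str] = {
    "filter_intro": "Evaluate each chunk for relevance to its associated sub-question only.",
    "filter_section": 'Sub-question {subq_idx}: "{subq_question}"\n{chunks}',
    "filter_outro": "(JSON format instructions)",
    "generate_per_subq": "(generate instructions)\n\n{context_sections}\n\nAnswer:",
}


def backfill_missing_steps(conn: sqlite3.Connection, seeds: Dict[str, str]) -> None:
    """Insert any missing step for every existing profile; keep edited ones."""
    profile_ids = [row[0] for row in conn.execute("SELECT id FROM profiles")]
    for profile_id in profile_ids:
        for step, template in seeds.items():
            # Skipped silently when (profile_id, step) already exists
            conn.execute(
                "INSERT OR IGNORE INTO prompts (profile_id, step, template) VALUES (?, ?, ?)",
                (profile_id, step, template),
            )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE profiles (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE prompts (
        profile_id INTEGER, step TEXT, template TEXT,
        UNIQUE (profile_id, step)
    );
    INSERT INTO profiles (name) VALUES ('A'), ('B'), ('C');
    INSERT INTO prompts VALUES (1, 'filter_intro', 'user-edited intro');
    """
)
backfill_missing_steps(conn, SEEDS)
backfill_missing_steps(conn, SEEDS)  # running again changes nothing
```

Running the backfill on every startup is safe under this scheme because user-edited templates are never overwritten.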

System Prompts UI Restructured

The flat filter and generate steps were removed from the UI (they're unused by the current pipeline). The page now shows 5 steps:

UI Order	Label	Step Key
1	Step 1: Query Decomposition	`decompose`
2	Step 2.1: Filter Intro (Preamble)	`filter_intro`
3	Step 2.2: Filter Section (Per Sub-Q)	`filter_section`
4	Step 2.3: Filter Outro (Format)	`filter_outro`
5	Step 3: Generate (Per-Sub-Question)	`generate_per_subq`

The old filter and generate templates remain in the DB (for API backward compatibility) but are hidden from the UI.

Files Changed

File	Change
`backend/app/core/sqlite_db.py`	3 new seed templates + `generate_per_subq` seed; backfill migration; restored `_SEED_GENERATE` to `{question}`/`{context}`
`backend/app/services/prompt_service.py`	Added 4 step names to `_VALID_STEPS`
`backend/app/routers/prompts.py`	Added 4 step names to `_VALID_STEPS`
`backend/app/services/relevance_filter.py`	Refactored `_build_per_subq_prompt()` to use PromptService + built-in fallback constants
`frontend/src/components/PromptEditor.tsx`	Replaced unused flat steps with 5-step per-sub-q layout (Step 2.1-2.3 + Step 3)
`frontend/src/components/PlaceholderDocs.tsx`	Added `{context_sections}`, `{subq_idx}`, `{subq_question}` docs
`backend/app/test/conftest.py`	Added 4 new templates to mock
`backend/app/test/test_phase3_sqlite_db.py`	Updated counts (9→21 prompts) and placeholder assertions
`backend/app/test/test_phase3_prompt_service.py`	Updated step set + placeholder assertions
`backend/app/test/test_phase3_prompts_router.py`	Updated step set assertion
`backend/app/test/test_phase4_prompt_templates.py`	Updated for split generate/generate_per_subq
`frontend/src/test/components/PromptEditor.test.tsx`	Updated to 5 textareas, new labels, new placeholder layout
`frontend/src/test/components/PlaceholderDocs.test.tsx`	Updated to 6 placeholders

Test Results (Post-Phase 4a)

Backend: 295 passed, 5 skipped (pre-existing)
Frontend: 182 passed, 1 pre-existing failure (unrelated file-input e2e)

Sub-Phase Summary

Sub-Phase	Scope	Backend	Frontend	Tests	Status
4.1	Per-sub-q retrieval	`rag.py`, `query.py`, format helpers	None	`test_phase4_retrieve_per_subquestion.py`, `test_phase4_query_router_retrieval.py`	✅ Complete
4.2	Per-sub-q filtering (1 LLM call)	`relevance_filter.py`, `query.py`	None	`test_phase4_relevance_filter_per_subq.py`, `test_phase4_query_router_filter.py`	✅ Complete
4.3	Sub-q-organized response generation	`rag.py`, `query.py`, `models/query.py`	None	`test_phase4_generate_per_subq.py`, `test_phase4_response_format.py`	✅ Complete
4.4	History schema, prompts, models	`sqlite_db.py`, `history.py` (router + models), `prompt_service.py`	None	`test_phase4_history_format.py`, `test_phase4_prompt_templates.py`	✅ Complete
4.5	Frontend types + state	None	`types/index.ts`, `lib/queries.tsx`	`test_phase4_stream_state.test.tsx`, `test_phase4_types.test.ts`	✅ Complete
4.6	Frontend rendering	None	`ResponsePanel.tsx`, `citationParser.ts`, `ExtractedQuestionsDisplay.tsx`	`test_phase4_response_panel.test.tsx`, `test_phase4_citation_parser.test.ts`	✅ Complete
4.7	Testing & polish	All affected files	All affected files	Integration + acceptance + e2e tests	✅ Complete
4a	Prompt service integration for filter_per_subq	`sqlite_db.py`, `prompt_service.py`, `prompts.py`, `relevance_filter.py`	`PromptEditor.tsx`, `PlaceholderDocs.tsx`	Updated 7 test files, 13 total files changed	✅ Complete

Implementation Sequence & Dependencies

4.1 (Retrieval) ──┐
                  ├──► 4.2 (Filtering) ──► 4.3 (Generate) ──► 4.4 (History/Prompts)
                  │                                                    │
                  │                                                    ▼
                  │                                         4.5 (Frontend Types/State)
                  │                                                    │
                  │                                                    ▼
                  │                                         4.6 (Frontend Rendering)
                  │                                                    │
                  └─────────────────────────────────────────────────────▼
                                                              4.7 (Testing & Polish)

4.1 → 4.2 sequential: Filtering needs per-sub-q chunk structure from retrieval
4.2 → 4.3 sequential: Generation needs filtered chunks from filtering stage
4.3 → 4.4 sequential: History recording and prompt templates need final data shapes
4.4 → 4.5 can overlap: Backend prompt/history changes don't block frontend type definitions (the diagram shows the conservative sequential order)
4.5 → 4.6 sequential: Rendering needs types and state management
4.7 blocked by all: Integration tests need everything wired together

Parallelization opportunity: 4.5 (frontend types) could start as soon as 4.3 defines the SSE contract, but it's safer to start after 4.4 confirms the final data shapes.

Affected Files — Complete Inventory

Backend — New Files

File	Sub-Phase	Purpose
`backend/app/test/test_phase4_retrieve_per_subquestion.py`	4.1	Unit test: `retrieve_per_subquestion()`
`backend/app/test/test_phase4_query_router_retrieval.py`	4.1	Unit test: retrieval stage in `_query_stream`
`backend/app/test/test_phase4_relevance_filter_per_subq.py`	4.2	Unit test: `filter_per_subquestion()`
`backend/app/test/test_phase4_query_router_filter.py`	4.2	Unit test: filter stage in `_query_stream`
`backend/app/test/test_phase4_generate_per_subq.py`	4.3	Unit test: `generate_response_per_subquestion()`
`backend/app/test/test_phase4_response_format.py`	4.3	Unit test: answer format validation
`backend/app/test/test_phase4_history_format.py`	4.4	Unit test: new XML/JSON history formats
`backend/app/test/test_phase4_prompt_templates.py`	4.4	Unit test: new generate template
`backend/app/test/test_phase4_integration_query_pipeline.py`	4.7	Integration test: full per-sub-q pipeline
`backend/app/test/acceptance/test_phase4_acceptance_query.py`	4.7	Acceptance test: real LLM

Backend — Modified Files

File	Sub-Phase	Changes
`backend/app/services/rag.py`	4.1, 4.3	Add `retrieve_per_subquestion()`, `generate_response_per_subquestion()`
`backend/app/services/relevance_filter.py`	4.2	Add `filter_per_subquestion()`
`backend/app/routers/query.py`	4.1–4.4	Refactor `_query_stream()`, add per-sub-q format helpers, update history recording
`backend/app/models/query.py`	4.3	Add `SubQuestionSources` model, update `QueryResponse`
`backend/app/models/history.py`	4.4	Add optional per-sub-q count fields
`backend/app/core/sqlite_db.py`	4.4	Add new columns, update seed generate template
`backend/app/services/prompt_service.py`	4.4	Update `reset_to_defaults()` generate template
`backend/app/routers/history.py`	4.4	Include new fields in detail response
`backend/app/core/config.py`	4.1	(Maybe) Add `retrieval_n_results_per_subq` setting

Backend — Tests Needing Update

File	Sub-Phase	Changes
`backend/app/test/test_phase1_rag_service.py`	4.7	Add tests for new methods; existing tests unaffected
`backend/app/test/test_phase1_relevance_filter.py`	4.7	Add tests for `filter_per_subquestion()`
`backend/app/test/test_phase3_query_history_integration.py`	4.7	Rewrite pipeline simulation for per-sub-q flow
`backend/app/test/test_phase3_prompt_injection.py`	4.7	Add tests for new generate template
`backend/app/test/acceptance/test_acceptance_phase1_rag_query.py`	4.7	Rewrite — SSE parsing + new response shape
`backend/app/test/conftest.py`	4.7	Add per-sub-q mock helpers

Frontend — New Files

File	Sub-Phase	Purpose
`frontend/src/test/components/test_phase4_response_panel.test.tsx`	4.7	Component test: per-sub-q sections
`frontend/src/test/utils/test_phase4_citation_parser.test.ts`	4.7	Unit test: per-sub-q citation lookup
`frontend/src/test/e2e/test_phase4_query_flow.test.tsx`	4.7	E2E test: mocked SSE with new format
`frontend/src/test/lib/test_phase4_stream_state.test.tsx`	4.5	State test: new event shapes
`frontend/src/test/lib/test_phase4_types.test.ts`	4.5	Type test: type compatibility

Frontend — Modified Files

File	Sub-Phase	Changes
`frontend/src/types/index.ts`	4.5	Add `SubQuestionSources`, update `QueryStreamEvent`
`frontend/src/lib/queries.tsx`	4.5	Update `QueryStreamState`, `completed` event handler
`frontend/src/components/ResponsePanel.tsx`	4.6	Redesign — per-sub-question sections with grouped sources
`frontend/src/utils/citationParser.ts`	4.6	Update `buildCitationLookup()` for per-sub-q
`frontend/src/components/ExtractedQuestionsDisplay.tsx`	4.6	Add anchor links to answer sections
`frontend/src/pages/LTTPage.tsx`	4.6	Pass new props to children

Risk Register

Risk	Likelihood	Impact	Mitigation
LLM struggles with per-sub-q filtering prompt format	Medium	High — all chunks dropped	Use strong prompt constraints, validate JSON, fall back to including all chunks on parse failure
LLM generates answer not matching `## Sub-question N:` format	Medium	Medium — frontend can't parse sections	Fall back to rendering as single block if parsing fails. Prompt engineering tuned for format compliance
Same chunk retrieved by multiple sub-questions → duplicated in context	High	Low — slightly larger prompt but acceptable	Accept duplicates. ChromaDB naturally returns same doc if relevant to multiple queries. Each sub-q's evaluation is independent
Per-sub-q retrieval = more ChromaDB queries = slower	Medium	Medium — N × retrieval latency	ChromaDB retrieval is fast (~10-50ms per query). 5 sub-questions × 10-50ms = 50-250ms of total retrieval time when queries run sequentially. Acceptable trade-off for better relevance.
History DB migration fails for existing records	Low	Low — new columns are NULL-able	`ALTER TABLE ADD COLUMN ... DEFAULT NULL` is safe. Existing records work as before — `chunks_retrieved`/`chunks_filtered` still have flat XML.
Frontend rendering breaks on older history records	Low	Low — answer format differs	`ResponsePanel` renders per-sub-q sections only when `subQuestionSources` is non-null. Older history records show flat answer as before.
Prompt template migration breaks user-customized prompts	Medium	Medium — users lose their generate template	Warn in docs. The `generate` template changes fundamentally (single `{context}` → `{context_sections}`). Users must re-customize.
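
The first mitigation in the table (validate the filter JSON, include all chunks on parse failure) could look roughly like the sketch below. The response shape `{"1": [8, 2], "2": [9]}`, the function name `apply_filter_scores()`, and the default threshold of 5 are assumptions for illustration; the plan only states that the JSON maps sub-question indices to score arrays on a 0-10 scale.

```python
import json


def apply_filter_scores(
    raw_response: str,
    chunks_by_subq: dict[int, list[str]],
    threshold: float = 5.0,  # assumed cut-off on the 0-10 relevance scale
) -> dict[int, list[str]]:
    """Keep chunks scoring >= threshold; keep everything when scores are unusable."""
    try:
        parsed = json.loads(raw_response)
    except json.JSONDecodeError:
        parsed = None

    # Whole-response failure: fall back to all chunks for every sub-question.
    if not isinstance(parsed, dict):
        return {idx: list(chunks) for idx, chunks in chunks_by_subq.items()}

    kept: dict[int, list[str]] = {}
    for idx, chunks in chunks_by_subq.items():
        scores = parsed.get(str(idx))
        usable = (
            isinstance(scores, list)
            and len(scores) == len(chunks)
            and all(
                isinstance(s, (int, float)) and not isinstance(s, bool)
                for s in scores
            )
        )
        if not usable:
            # Per-sub-question degradation: keep all chunks for this one only.
            kept[idx] = list(chunks)
            continue
        kept[idx] = [c for c, s in zip(chunks, scores) if s >= threshold]
    return kept
```

Degrading per sub-question rather than for the whole response means one malformed score array does not undo the filtering for the other sub-questions.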

Acceptance Criteria

Backend

POST /api/v1/query retrieves chunks per sub-question (verified by history XML showing <sub_q> wrappers)
Filtering uses single LLM call evaluating chunks against their originating sub-question (verified by filter prompt)
Response answer is organized by sub-question with ## Sub-question N: headers
sub_question_sources in SSE completed event is grouped by sub-question index
History records include new grouped XML formats for chunks_retrieved and chunks_filtered
History records include grouped sources JSON (list of lists)
History records include per-sub-q chunk counts
New generate prompt template uses {context_sections} placeholder
Prompt service reset_to_defaults() includes new generate template
Existing decompose, filter (old), generate_response (old) methods are unchanged
All Phase 1, Phase 3, and new Phase 4 unit tests pass (312 passed, 4 skipped)
All acceptance tests pass with real LLM (manual run)
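
As an illustration of the grouped history format referenced in the criteria above, here is a small sketch that wraps chunks in `<sub_q>` elements. Only the `<sub_q idx="N" question="...">` wrapper comes from this plan; the `<chunks>` root, the `<chunk>` child element, the 1-based index, and the function name are assumptions.

```python
import xml.etree.ElementTree as ET


def chunks_to_grouped_xml(groups: list[tuple[str, list[str]]]) -> str:
    """Serialize (sub_question, chunk_texts) pairs into grouped XML."""
    root = ET.Element("chunks")  # hypothetical root element
    for idx, (question, chunk_texts) in enumerate(groups, start=1):
        # ElementTree escapes quotes and angle brackets in attribute values.
        sub_q = ET.SubElement(root, "sub_q", idx=str(idx), question=question)
        for text in chunk_texts:
            ET.SubElement(sub_q, "chunk").text = text
    return ET.tostring(root, encoding="unicode")


def per_subq_counts(groups: list[tuple[str, list[str]]]) -> list[int]:
    """Chunk count per sub-question, in sub-question order."""
    return [len(chunk_texts) for _, chunk_texts in groups]
```

Building the XML through `ElementTree` instead of string concatenation avoids broken markup when a sub-question contains quotes or `<` characters.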

Frontend

QueryStreamState includes subQuestionSources field
ResponsePanel renders per-sub-question sections with expandable source grids
Each section's sources are scoped to that sub-question (no cross-contamination)
Inline citations [filename, page N] link to the correct PDF viewer page
ExtractedQuestionsDisplay shows clickable anchors to answer sections
Copy button copies all answer text including section headers
Loading states: skeleton per section during generation
Empty state: "No relevant information found" per sub-question (not entire response)
All 62+ existing frontend tests still pass (183 passed)
All new Phase 4 frontend tests pass
npm run build succeeds with zero TypeScript errors
Manual verification: full query flow works end-to-end

New Dependencies

None. All changes use existing libraries (FastAPI, ChromaDB, OpenAI SDK, React, ReactMarkdown, TanStack Query).

Decisions (All Confirmed)

#	Topic	Decision
1	Single vs multiple filter LLM calls	Single call — user explicitly requested this
2	Filter prompt design	Group chunks by sub-question in one prompt. JSON response maps sub-q indices to score arrays
3	Answer format	Markdown with `## Sub-question N: <question>` headers
4	Sources grouping	`sub_question_sources: [{index, text, sources}, ...]` in SSE + frontend
5	History XML format	Add `<sub_q idx="N" question="...">` wrappers around chunk groups
6	History DB migration	Add 2 new NULL-able columns. No data migration needed.
7	Backward compatibility	Preserve old `retrieve()`, `filter()`, `generate_response()` methods. New methods are additive.
8	Deduplication	None. Same chunk may appear in multiple sub-questions. Each sub-q evaluates independently.
9	Error handling	Per-sub-question graceful degradation. Filter failure → include all chunks for that sub-q. Generate failure → "Unable to generate answer for this sub-question."
10	Frontend rendering engine	Keep `ReactMarkdown`. Parse sections client-side by splitting on `## Sub-question N:` headers.
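
Decisions 3 and 10 together imply a simple header-based split of the answer. The frontend does this client-side in TypeScript; the Python sketch below only shows the logic and could equally serve as a format check in backend tests. The function name, the returned dictionary keys, and the choice to drop any text before the first header are assumptions.

```python
import re

# Matches lines such as "## Sub-question 2: Why did it change?"
HEADER_RE = re.compile(r"^## Sub-question (\d+):[ \t]*(.*)$", re.MULTILINE)


def split_answer_sections(answer: str) -> list[dict]:
    """Split an answer on sub-question headers; fall back to one flat block."""
    matches = list(HEADER_RE.finditer(answer))
    if not matches:
        # No recognizable headers: render the whole answer as a single block.
        return [{"index": None, "question": None, "body": answer.strip()}]

    sections = []
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(answer)
        sections.append(
            {
                "index": int(match.group(1)),
                "question": match.group(2).strip(),
                "body": answer[match.end():end].strip(),
            }
        )
    return sections
```

Returning a single section with `index` set to `None` gives the renderer one code path for both the per-sub-question format and older flat answers.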

Open Questions

None — all resolved.

#	Question	Resolution
1	Progressive SSE events?	Yes — emit `generating_subquestion` as each sub-question's answer is generated. Frontend renders sections progressively.
2	`retrieval_n_results` per sub-question or global?	Global — same value for all sub-questions. Simpler config, one setting.
3	Fallback when decomposition returns 0 sub-questions?	Fall back to original question — treat as single sub-question. Pipeline runs as 1-sub-q case (retrieval via original question, no filtering needed for single sub-q, flat answer).
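
The zero-sub-question fallback in resolution 3 amounts to a one-line guard before retrieval. A minimal sketch, assuming a hypothetical helper name and that blank entries from the decomposer should be discarded:

```python
def effective_sub_questions(original_question: str, decomposed: list[str]) -> list[str]:
    """Return the sub-questions to run, falling back to the original question."""
    # Dropping blank entries is an assumption, not something the plan specifies.
    cleaned = [q.strip() for q in decomposed if q and q.strip()]
    return cleaned or [original_question.strip()]
```

With this guard the rest of the pipeline never has to special-case an empty list; it simply sees a one-element list.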

Test Plan Summary

Backend (New Tests)

File	Tests	Coverage
`test_phase4_retrieve_per_subquestion.py`	~6	Per-sub-q retrieval, empty input, single sub-q, dedup behavior
`test_phase4_query_router_retrieval.py`	~4	SSE events during retrieval, chunk XML format
`test_phase4_relevance_filter_per_subq.py`	~6	Per-sub-q filtering, JSON response parsing, threshold behavior
`test_phase4_query_router_filter.py`	~4	SSE events during filtering, filtered XML format
`test_phase4_generate_per_subq.py`	~5	Per-sub-q generate, prompt construction, answer format
`test_phase4_response_format.py`	~4	Answer has `##` headers, citations in correct sections
`test_phase4_history_format.py`	~5	New XML/JSON formats, per-sub-q counts
`test_phase4_prompt_templates.py`	~3	New generate template, `{context_sections}` placeholder
`test_phase4_integration_query_pipeline.py`	~5	Full pipeline simulation
`test_phase4_acceptance_query.py`	~3	Real LLM end-to-end (manual)

Frontend (New Tests)

File	Tests	Coverage
`test_phase4_stream_state.test.tsx`	~4	State updates for new event shapes
`test_phase4_types.test.ts`	~2	Type compatibility checks
`test_phase4_response_panel.test.tsx`	~6	Section rendering, source grouping, copy, loading
`test_phase4_citation_parser.test.ts`	~4	Per-sub-q lookup, cross-section isolation
`test_phase4_e2e_query_flow.test.tsx`	~3	Full SSE flow with mocked stream

Phase PX: Profile Export/Import (2026-04-27)

Source: User request — "add an export and import function for setting a profile. The format is json."

Scope: Add JSON export/import capability to the System Prompts page. Users can download a profile's prompt configuration as a .json file and import it into another profile (or the same one) to transfer or back up their prompt settings.

Status: 🟡 Planned — not yet implemented.

Objective

Let users:

Export a single profile's prompt templates as a downloadable JSON file
Import a previously exported JSON file to overwrite a profile's prompt templates
Optionally, export all profiles at once for full configuration backup

Decision Register

#	Decision	Rationale
P1	Export single profiles, not all-at-once by default	User asked "for setting a profile" — per-profile export/import is more practical for sharing individual configurations. Add "Export All" as secondary option.
P2	Import overwrites ALL prompt steps for target profile	Simplest mental model. Import = full replace (not merge). User gets confirmation dialog before proceeding.
P3	Export JSON includes all 7 steps (including legacy `filter`, `generate`)	Even though UI hides these, the DB stores them. Export should be a complete snapshot — import restores all 7.
P4	Do NOT export auto-increment IDs	`id` fields are not portable between databases. Import inserts new rows; joins on `(name, step_name)` uniqueness.
P5	`created_at`/`updated_at` reset on import	Imported profiles get fresh timestamps (`datetime('now')`). Original export timestamp preserved in file metadata only.
P6	Active profile state NOT imported	`is_active` is deployment-specific. The user sets active profile separately via the existing dropdown. Import only touches `prompt_template` content.
P7	Validate profile name on import	Only A, B, C allowed. Import into non-existent name = rejected.
P8	JSON schema versioned	`"format": "legco-reranker-profile/v1"` for future-proofing. Reject unknown versions on import.

JSON Format Specification

Single Profile Export

{
  "format": "legco-reranker-profile/v1",
  "profile_name": "A",
  "exported_at": "2026-04-27T12:00:00Z",
  "prompts": {
    "decompose": "Given this question: '{question}'\n\nBreak it down into 2-5 simplified sub-questions...",
    "filter": "Given question '{question}' and these document chunks:\n\n{chunks}\n\n...",
    "generate": "Question: {question}\n\nContext:\n{context}\n\n...",
    "generate_per_subq": "Answer each sub-question using ONLY its document chunks...",
    "filter_intro": "Evaluate each chunk for relevance to its associated sub-question only.",
    "filter_section": "\nSub-question {subq_idx}: \"{subq_question}\"\n{chunks}",
    "filter_outro": "\nFor each chunk, rate its relevance 0-10..."
  }
}

Full Backup Export (All Profiles)

{
  "format": "legco-reranker-profile/v1",
  "exported_at": "2026-04-27T12:00:00Z",
  "active_profile": "A",
  "profiles": {
    "A": {
      "prompts": { ... }
    },
    "B": {
      "prompts": { ... }
    },
    "C": {
      "prompts": { ... }
    }
  }
}

Import Request Format

POST /api/v1/prompts/profiles/{name}/import
Content-Type: application/json

{
  "format": "legco-reranker-profile/v1",
  "profile_name": "A",
  "exported_at": "2026-04-27T12:00:00Z",
  "prompts": {
    "decompose": "...",
    ...
  }
}

Response (for an import into Profile B):

{
  "status": "ok",
  "profile": "B",
  "imported_steps": 7,
  "source_profile": "A"
}

Sub-Phase Structure

Sub-Phase	Scope	Components	Test Files
PX.1	Backend — Export endpoint	`routers/prompts.py`, `models/prompts.py`	`test_phaseX_export.py`
PX.2	Backend — Import endpoint	`routers/prompts.py`, `models/prompts.py`, `prompt_service.py`	`test_phaseX_import.py`
PX.3	Frontend — Export/Import UI	`SystemPromptsPage.tsx`, `ProfileList.tsx`, `lib/api.ts`, `lib/queries.tsx`, `types/index.ts`	`test_phaseX_export_import.test.tsx`
PX.4	Testing & Polish	All affected files	Integration + acceptance tests

Sub-Phase PX.1: Backend — Single Profile Export Endpoint

Test files to write first:

backend/app/test/test_phaseX_export.py — Tests export endpoint, JSON schema validation, empty profile handling

Task PX.1.1: Add Pydantic models

File: backend/app/models/prompts.py

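from pydantic import BaseModel


class ProfileExportResponse(BaseModel):
    """Single-profile export payload; DB ids are deliberately excluded."""
    format: str = "legco-reranker-profile/v1"
    profile_name: str
    exported_at: str  # ISO 8601 UTC timestamp, e.g. "2026-04-27T12:00:00Z"
    prompts: dict[str, str]  # step_name -> template text


class AllProfilesExportResponse(BaseModel):
    """Full backup payload covering every profile."""
    format: str = "legco-reranker-profile/v1"
    exported_at: str
    active_profile: str
    profiles: dict[str, dict[str, dict[str, str]]]  # profile_name -> {"prompts": {step: text}}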

Task PX.1.2: Add GET /api/v1/prompts/profiles/{name}/export endpoint

File: backend/app/routers/prompts.py

Reads all 7 system_prompts rows for the given profile
Returns ProfileExportResponse with Content-Disposition: attachment; filename="legco-profile-{name}.json"
Uses application/json content type
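
A framework-free sketch of what the export handler needs to assemble is shown below. The function name `build_profile_export()` and the `(step_name, prompt_template)` row shape are assumptions; only the payload keys, the format string, and the `Content-Disposition` filename pattern come from this plan.

```python
from datetime import datetime, timezone

FORMAT_VERSION = "legco-reranker-profile/v1"


def build_profile_export(
    profile_name: str,
    rows: list[tuple[str, str]],
) -> tuple[dict, dict[str, str]]:
    """Build the export payload and response headers from (step, template) rows."""
    payload = {
        "format": FORMAT_VERSION,
        "profile_name": profile_name,
        "exported_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        # Only step names and template text are exported, never DB ids.
        "prompts": {step_name: template for step_name, template in rows},
    }
    headers = {
        "Content-Disposition": f'attachment; filename="legco-profile-{profile_name}.json"'
    }
    return payload, headers
```

In the router, the two return values could be passed to FastAPI's `JSONResponse(content=payload, headers=headers)`, which already sets the `application/json` content type.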

Task PX.1.3: Add GET /api/v1/prompts/export/all endpoint (optional)

Reads all 3 profiles + all 21 prompt rows
Returns AllProfilesExportResponse
For full backup/restore scenarios

Commit: "feat(prompts): add single-profile and full JSON export endpoints"

Sub-Phase PX.2: Backend — Single Profile Import Endpoint

Test files to write first:

backend/app/test/test_phaseX_import.py — Tests import endpoint, validation, error cases

Task PX.2.1: Add request model

File: backend/app/models/prompts.py

class ProfileImportRequest(BaseModel):
    format: str                                          # must be "legco-reranker-profile/v1"
    profile_name: str                                    # source profile name (informational)
    exported_at: str | None = None                       # informational timestamp
    prompts: dict[str, str]                              # step_name -> template_text

Task PX.2.2: Add POST /api/v1/prompts/profiles/{name}/import endpoint

File: backend/app/routers/prompts.py

Validation steps:

Check target {name} is A, B, or C → 400 if not
Check request.format == "legco-reranker-profile/v1" → 400 if not
Validate that all 7 required step keys (decompose, filter, generate, generate_per_subq, filter_intro, filter_section, filter_outro) are present in request.prompts → 400 with list of missing keys if not
Validate no extra/unknown step keys → reject (or warn? → decision: reject with 400, listing unknown keys)
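
The four checks above could be sketched as a pure function that the endpoint calls before touching the database. The function name and error strings are hypothetical, and unlike a first-failure approach this version collects every problem so the 400 response can list them all at once.

```python
FORMAT_VERSION = "legco-reranker-profile/v1"
VALID_PROFILES = frozenset({"A", "B", "C"})
REQUIRED_STEPS = frozenset({
    "decompose", "filter", "generate", "generate_per_subq",
    "filter_intro", "filter_section", "filter_outro",
})


def validate_import(target_name: str, payload: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the import may proceed."""
    errors: list[str] = []

    if target_name not in VALID_PROFILES:
        errors.append(f"invalid target profile: {target_name!r}")

    if payload.get("format") != FORMAT_VERSION:
        errors.append(f"unsupported format: {payload.get('format')!r}")

    prompts = payload.get("prompts")
    if not isinstance(prompts, dict):
        errors.append("prompts must be an object")
        return errors

    missing = sorted(REQUIRED_STEPS - set(prompts))
    unknown = sorted(set(prompts) - REQUIRED_STEPS)
    if missing:
        errors.append("missing steps: " + ", ".join(missing))
    if unknown:
        errors.append("unknown steps: " + ", ".join(unknown))
    return errors
```

A router could raise `HTTPException(status_code=400, detail=errors)` whenever the returned list is non-empty, which keeps the validation logic unit-testable without an HTTP client.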

Implementation:

Uses the existing internal batch-update method PromptService._update_all_prompts() to overwrite all 7 steps
Each step gets fresh created_at/updated_at timestamps (DB defaults)
Returns {"status": "ok", "profile": name, "imported_steps": len(prompts), "source_profile": request.profile_name}

Task PX.2.3: Add POST /api/v1/prompts/import/all endpoint (optional)

Accepts AllProfilesExportResponse format
Imports all 3 profiles at once
Does NOT change the active profile by default (only if a change is explicitly included in the request)

Commit: "feat(prompts): add single-profile JSON import endpoint with full validation"

Sub-Phase PX.3: Frontend — Export/Import UI

Test files to write first:

frontend/src/test/components/test_phaseX_export_import.test.tsx — Tests export/import buttons, file download, file upload

Task PX.3.1: Add TypeScript types

File: frontend/src/types/index.ts

export interface ProfileExportData {
  format: string
  profile_name: string
  exported_at: string
  prompts: Record<string, string>
}

export interface ProfileImportResponse {
  status: string
  profile: string
  imported_steps: number
  source_profile: string
}

Task PX.3.2: Add API client functions

File: frontend/src/lib/api.ts

import type { ProfileExportData, ProfileImportResponse } from '../types'

// Fetch a profile's export payload as parsed JSON; the caller builds the Blob for the browser-side save
export const exportProfile = async (name: string): Promise<ProfileExportData> => {
  const resp = await apiClient.get<ProfileExportData>(`/prompts/profiles/${name}/export`)
  return resp.data
}

// Import a previously exported payload into the target profile `name`
export const importProfile = async (name: string, data: ProfileExportData): Promise<ProfileImportResponse> => {
  const resp = await apiClient.post<ProfileImportResponse>(`/prompts/profiles/${name}/import`, data)
  return resp.data
}

Task PX.3.3: Add TanStack Query mutation for import

File: frontend/src/lib/queries.tsx

export const useImportProfile = () => {
  const queryClient = useQueryClient()
  return useMutation({
    mutationFn: ({ name, data }: { name: string; data: ProfileExportData }) =>
      importProfile(name, data),
    onSuccess: () => {
      queryClient.invalidateQueries({ queryKey: ['prompts'] })
    },
  })
}

Task PX.3.4: Add Export button to ProfileList cards

File: frontend/src/components/ProfileList.tsx

Add export icon button (e.g., Download from lucide-react) next to the "Edit" button on each card
On click: calls exportProfile(name) → serializes the result into a JSON Blob → triggers browser download via URL.createObjectURL + <a> click
Filename: legco-profile-{name}-{date}.json

Task PX.3.5: Add Import button and dialog to SystemPromptsPage

File: frontend/src/pages/SystemPromptsPage.tsx

Add "Import" button in the top bar (next to "Active Profile" dropdown)
On click: opens a modal/dialog with:
- File input (accept .json) — hidden <input type="file"> triggered by styled button
- After file selected: parse JSON client-side, show preview (source profile name, export date, step count)
- Target profile selector (dropdown: A, B, C) — defaults to source profile name if valid
- "Import" button → confirmation dialog ("This will overwrite all prompts for Profile {target}. Continue?")
- On confirm: calls importProfileMutation.mutate()
- Success: show toast "Profile {target} imported successfully ({n} steps from Profile {source})"
- Error: show inline error message with details

Task PX.3.6: Add Export All button (optional)

File: frontend/src/pages/SystemPromptsPage.tsx

"Export All" button in top bar
Downloads all 3 profiles as legco-profiles-{date}.json

Commit: "feat(prompts): add export/import UI with file download, upload dialog, and validation"

Sub-Phase PX.4: Testing & Polish

Test files:

backend/app/test/test_phaseX_export.py — Export endpoint: valid profile, invalid name, JSON schema validation
backend/app/test/test_phaseX_import.py — Import endpoint: valid import, missing steps, extra steps, invalid format version, invalid target name
frontend/src/test/components/test_phaseX_export_import.test.tsx — Export button click → download, Import dialog flow → file upload → preview → confirm → success/error

Task PX.4.1: Backend unit tests

test_export_profile_valid — GET export/A returns all 7 steps with correct format version
test_export_profile_invalid_name — GET export/X returns 400
test_export_all — GET export/all returns 3 profiles, 21 prompts total
test_import_valid — POST import/B with valid JSON → 200, verify all 7 steps updated
test_import_overwrites_existing — POST import/B → verify old content replaced
test_import_missing_required_step — POST import with only 6 steps → 400 with missing key listed
test_import_unknown_step_key — POST import with extra step → 400
test_import_invalid_format_version — POST import with format: "v2" → 400
test_import_invalid_target_name — POST import/X → 400
test_import_does_not_change_active — import into inactive profile → active profile unchanged

Task PX.4.2: Frontend tests

Export button visible on each profile card
Click export → fetch called, download triggered
Import dialog opens on button click
File selection → JSON parsed, preview shown
Invalid JSON file → error message shown
Target profile selector defaults to source profile
Confirm import → mutation called, success toast
Import error → inline error message
Export All downloads all profiles

Task PX.4.3: Integration verification

npm run build — no TypeScript errors
npm test — all frontend tests pass
pytest backend/app/test/test_phaseX_*.py -v — all backend tests pass
Manual flow: export Profile A → edit Profile B → import exported file into B → verify B's prompts match A's original

Commit: "test(prompts): add unit, integration tests for export/import"

Files Affected — Complete Inventory

Backend — New Files

File	Sub-Phase	Purpose
`backend/app/test/test_phaseX_export.py`	PX.4	Unit tests for export endpoint
`backend/app/test/test_phaseX_import.py`	PX.4	Unit tests for import endpoint

Backend — Modified Files

File	Sub-Phase	Changes
`backend/app/models/prompts.py`	PX.1, PX.2	Add `ProfileExportResponse`, `AllProfilesExportResponse`, `ProfileImportRequest`, `ProfileImportResponse`
`backend/app/routers/prompts.py`	PX.1, PX.2	Add `GET /export`, `GET /export/all`, `POST /import` endpoints

Frontend — New Files

File	Sub-Phase	Purpose
`frontend/src/test/components/test_phaseX_export_import.test.tsx`	PX.4	Component tests for export/import UI

Frontend — Modified Files

File	Sub-Phase	Changes
`frontend/src/types/index.ts`	PX.3	Add `ProfileExportData`, `ProfileImportResponse` types
`frontend/src/lib/api.ts`	PX.3	Add `exportProfile()`, `importProfile()` API functions
`frontend/src/lib/queries.tsx`	PX.3	Add `useImportProfile()` mutation hook
`frontend/src/components/ProfileList.tsx`	PX.3	Add Export button per profile card
`frontend/src/pages/SystemPromptsPage.tsx`	PX.3	Add Import/Export All buttons, import dialog/modal

Acceptance Criteria

Backend

GET /api/v1/prompts/profiles/A/export returns JSON with all 7 steps, correct format version
GET /api/v1/prompts/profiles/X/export returns 400 (invalid profile name)
GET /api/v1/prompts/export/all returns all 3 profiles, active profile marker
POST /api/v1/prompts/profiles/B/import with valid payload overwrites all 7 steps for Profile B
Import rejects payload with missing required step keys (400 + key names)
Import rejects payload with unknown step keys (400 + key names)
Import rejects payload with unknown format version (400)
Import does NOT change is_active flag on target profile
Exported JSON does NOT contain internal DB IDs (id/profile_id)
All existing prompt API endpoints still work unchanged

Frontend

Export button visible on each profile card in ProfileList
Clicking Export downloads a .json file with correct naming (legco-profile-A-2026-04-27.json)
Import button visible on SystemPromptsPage top bar
Clicking Import opens a modal with: file input, JSON preview, target profile selector, confirm button
Selecting invalid JSON file shows error message
Importing into a valid profile shows success confirmation with step count
Import error from backend shows inline error message
After successful import, profile data refreshes (query invalidation)
All existing System Prompts functionality still works unchanged

Risk Register

Risk	Likelihood	Impact	Mitigation
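JSON file too large to upload	Low	Low — 7 prompts × ~2KB = ~14KB	Add a 1MB limit on the import endpoint (e.g., reject on `Content-Length` in a dependency or middleware, or cap the body size at the reverse proxy; `Body(max_length=...)` constrains string values, not the total request size)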
User imports into wrong profile by mistake	Medium	Medium — overwrites their existing config	Confirmation dialog with source/target profile names clearly displayed before import
Exported file missing legacy `filter`/`generate` steps	Medium	Medium — import would fail validation	Always export all 7 steps (even hidden ones). Import validates all 7 are present.
Browser download API differences	Low	Low	Use standard `Blob` + `URL.createObjectURL` approach, tested across Chrome/Firefox
Import endpoint receives malformed JSON	Low	Low — Pydantic validation catches this	`ProfileImportRequest` model validates format string, dict keys, value types
User exports from one deployment and imports into another with different profile names	Low	Low — only 3 names (A/B/C)	Import only into A/B/C — if source was "D", user must choose target manually

New Dependencies

None. All changes use existing libraries (FastAPI, Pydantic, React, TanStack Query, lucide-react icons).

Implementation Sequence

PX.1 (Backend Export) ──► PX.2 (Backend Import)
                              │
                              ▼
                         PX.3 (Frontend UI)
                              │
                              ▼
                         PX.4 (Testing)

PX.1 and PX.2 can be done together (both in routers/prompts.py). PX.3 depends on knowing the exact API contracts from PX.1/PX.2. PX.4 runs after everything is wired.
