legco_ai_assistant/.plans/package4_enhancement_plan.md

Package 4 Enhancement Plan — Per-Sub-Question RAG Pipeline

Source: User request (2026-04-26)
Scope: Refactor the 3-step RAG query pipeline so retrieval, filtering, and response generation are organized per sub-question instead of batch-flattened.
Status: Complete — All 7 sub-phases implemented (2026-04-26). Phase 4a Prompt Integration added (2026-04-27). Phase PX Profile Export/Import planned (2026-04-27) — see end of file.


Objective

Restructure the POST /api/v1/query pipeline so that:

  1. Retrieval per sub-question: Each sub-question independently retrieves n_results chunks from ChromaDB (instead of joining all sub-questions into one query string).
  2. Filtering per sub-question: Each chunk is evaluated for relevance against its own originating sub-question (not the original user question). One LLM call handles all filtering — the prompt is redesigned to group chunks by sub-question.
  3. Final answer organized by sub-question: Each sub-question gets its own bullet-point answer with its own sources. The frontend renders answer sections per sub-question rather than one monolithic bullet list.

Decision Register

| # | Decision | Rationale |
|---|----------|-----------|
| 1 | Keep QueryDecomposer unchanged | Input/output contract is identical: decomposition still produces a flat list of sub-questions |
| 2 | Single LLM call for filtering | User explicitly requested one call. Prompt redesigned to carry sub-question context for each chunk group |
| 3 | Keep RAGService.retrieve() signature | Call it N times (once per sub-question) from outside rather than changing its internal contract |
| 4 | Add retrieve_per_subquestion() to RAGService | New method that iterates over sub-questions, calls retrieve() per question, returns grouped results |
| 5 | Redesign generate_response() signature | Accepts structured sub_questions: List[SubQuestionContext] instead of flat chunk lists |
| 6 | SSE events: add generating_subquestion phase | Progressive streaming: frontend sees which sub-question is being answered |
| 7 | History: change XML/JSON formats in-place | Add <sub_q> wrappers to chunks_retrieved/chunks_filtered XML. Add sub-question grouping to sources JSON. No new DB columns for the grouped content itself; the only schema change is two optional per-sub-question count columns (Task 4.4.2). |
| 8 | Final answer format: markdown sections | ## Sub-question 1 headers with inline citations. Backward-compatible with existing ReactMarkdown rendering |
| 9 | Deduplicate chunks within a sub-question only | The same chunk may be retrieved by multiple sub-questions. Keep these cross-sub-question duplicates, because each sub-question needs independent evaluation. ChromaDB query() may return the same document for different queries; this is acceptable. |
| 10 | Prompt template: add generate placeholders | New placeholder {context_sections} replaces single {context}. Filter template unchanged (sub-question injected at call site). Decompose template unchanged. |
| 11 | Progressive SSE events | Emit generating_subquestion event as each sub-question's answer section is generated. Frontend renders sections one by one. |
| 12 | retrieval_n_results | Global: same value for all sub-questions. Use existing settings.retrieval_n_results config. |
| 13 | Empty decomposition fallback | Treat the original user question as the single sub-question. Pipeline runs as the 1-sub-q case: single retrieval, no filtering needed (one sub-q = no ambiguity), flat answer with ## header. |

Pipeline: Before vs After

Before (Current — Flat Batch)

User Question: "What are NEC4 time extension clauses?"
         │
    ┌────▼─────┐
    │ Decompose│  LLM Call 1
    │ → ["What are time extensions?", 
    │    "What notice is required?"]
    └────┬─────┘
         │ joined: "What are time extensions? What notice is required?"
    ┌────▼─────┐
    │ Retrieve │  1 ChromaDB query → 10 chunks (flat, no sub-q association)
    └────┬─────┘
         │ 10 chunks
    ┌────▼─────┐
    │  Filter  │  LLM Call 2 — all chunks scored against ORIGINAL question
    │          │  Score > 7 → keep (flat, no sub-q association)
    └────┬─────┘
         │ N filtered chunks
    ┌────▼─────┐
    │ Generate │  LLM Call 3 — flat answer from ALL filtered chunks
    │          │  "• Time extensions require notice [NEC4 ACC.pdf, p3]
    │          │   • The project manager must acknowledge [NEC4, p7]
    │          │   • Notice is defined as..."  (sources from all sub-qs mixed)
    └────┬─────┘
         │ single SSE completed event
    ┌────▼─────┐
    │ Frontend │  1 ReactMarkdown block, 1 flat sources list
    └──────────┘

After (Per-Sub-Question)

User Question: "What are NEC4 time extension clauses?"
         │
    ┌────▼─────┐
    │ Decompose│  LLM Call 1 (UNCHANGED)
    │ → ["What are time extensions?",
    │    "What notice is required?"]
    └────┬─────┘
         │ sub_q1                    sub_q2
    ┌────▼─────┐              ┌────▼─────┐
    │ Retrieve │              │ Retrieve │   2 ChromaDB queries → 10 chunks each
    │ q1 → 10  │              │ q2 → 10  │   chunks tagged with sub-q index
    └────┬─────┘              └────┬─────┘
         │                         │
         └─────────┬───────────────┘
                   │ grouped: {sub_q0: [chunks 0-9], sub_q1: [chunks 10-19]}
              ┌────▼─────┐
              │  Filter  │  LLM Call 2 (SINGLE CALL — redesigned prompt)
              │          │  Each chunk scored against its OWN sub-question
              │          │  Returns grouped scores → filtered per sub-q
              └────┬─────┘
                   │ filtered_by_subq: {0: [chunk_a, chunk_b], 1: [chunk_c]}
              ┌────▼─────┐
              │ Generate │  LLM Call 3 (redesigned prompt with per-sub-q context)
              │          │  ┌─────────────────────────────────────┐
              │          │  │ ## What are time extensions?         │
              │          │  │ - Time extensions must be notified   │
              │          │  │   [NEC4 ACC.pdf, page 3]             │
              │          │  │ - The project manager has 2 weeks    │
              │          │  │   [NEC4 Contract.pdf, page 12]       │
              │          │  │                                      │
              │          │  │ ## What notice is required?          │
              │          │  │ - Written notice must be given       │
              │          │  │   [NEC4 ACC.pdf, page 7]             │
              │          │  └─────────────────────────────────────┘
              └────┬─────┘
                   │ SSE events: generating_subquestion (per sub-q) → completed
              ┌────▼─────┐
              │ Frontend │  Sections per sub-question, sources grouped per section
              └──────────┘

Current State (Pre-Enhancement)

Backend

| Component | File | Current Behavior |
|-----------|------|------------------|
| Decomposer | services/query_decomposer.py | decompose(question) -> (List[str], prompt); returns 2-5 sub-questions |
| Retrieval | services/rag.py:retrieve() | query_text = " ".join(query_keywords): joins all sub-qs into ONE string, single ChromaDB query → flat chunk list |
| Filter | services/relevance_filter.py | filter(question, chunks): ALL chunks scored against ORIGINAL question, single LLM call, flat output |
| Generate | services/rag.py:generate_response() | generate_response(question, chunks, metadata): flat chunks → flat bullet answer |
| Orchestrator | routers/query.py:_query_stream() | Linear 4-stage pipeline: decompose → retrieve → filter → generate |
| SSE Events | routers/query.py | decomposed → retrieving → filtering → generating → completed; flat answer + sources in completed |
| History | services/history_service.py | Flat XML for chunks_retrieved/chunks_filtered. Flat JSON for sources. Single timing per stage. |
| Prompt templates | prompt_service.py + sqlite_db.py | 3 steps (decompose, filter, generate). Placeholders: {question}, {chunks}, {context} |
| Config | core/config.py | retrieval_n_results=10, relevance_threshold=7.0 |

Frontend

| Component | File | Current Behavior |
|-----------|------|------------------|
| Types | types/index.ts | QueryStreamEvent.phase, flat extracted_questions: string[], flat answer: string, flat sources: SourceMetadata[] |
| SSE Client | lib/api.ts | queryDocumentStream(): generic JSON.parse per data: line, no sub-question awareness |
| State | lib/queries.tsx | QueryStreamState with flat answer/sources/extractedQuestions |
| Response | components/ResponsePanel.tsx | Single ReactMarkdown block for answer. Flat 2-column grid for sources. No sub-question grouping. |
| Questions | components/ExtractedQuestionsDisplay.tsx | <ol> list of question strings. No sources attached. |
| Citations | utils/citationParser.ts | Flat sources lookup: buildCitationLookup(sources) returns global map |
| Progress | components/PipelineProgress.tsx | 4-step stepper (NOT currently wired in LTTPage) |

Key Test Files

| File | Lines | Status |
|------|-------|--------|
| test_phase1_query_decomposer.py | 76 | Unchanged: decomposer contract stays |
| test_phase1_rag_service.py | 139 | 🔴 Needs update: retrieve(), generate_response() signatures change |
| test_phase1_relevance_filter.py | 93 | 🟡 Needs update: one-call pattern changes to per-sub-q grouping |
| test_phase1_query.py | 97 | 🟢 Already skipped (SSE migration); may un-skip later |
| test_phase3_query_history_integration.py | 608 | 🔴 Major rewrite: pipeline simulation mirrors _query_stream 1:1 |
| test_phase3_prompt_injection.py | 238 | 🟡 Moderate: new generate template placeholder |
| test_acceptance_phase1_rag_query.py | 101 | 🔴 Full rewrite: already broken (SSE vs JSON), new response shape |
| conftest.py | 94 | 🟡 Low: may add per-sub-q mock helpers |

Implementation Tasks

Sub-Phase 4.1: Backend — Per-Sub-Question Retrieval

Test files to write first:

  • test_phase4_retrieve_per_subquestion.py — Tests RAGService.retrieve_per_subquestion()
  • test_phase4_query_router_retrieval.py — Tests _query_stream retrieval stage produces per-sub-q chunks

Task 4.1.1: Add retrieve_per_subquestion() to RAGService

File: backend/app/services/rag.py

New method signature:

def retrieve_per_subquestion(
    self,
    sub_questions: List[str],
    n_results: int = 10,
) -> List[Tuple[str, List[Tuple[str, Dict[str, Any], float]]]]:
    """Retrieve chunks for each sub-question independently.

    Args:
        sub_questions: List of decomposed sub-questions.
        n_results: Number of chunks per sub-question.

    Returns:
        List of (sub_question, chunks) tuples.
        chunks is the standard retrieve() output: [(text, metadata, distance), ...].
    """

Implementation:

  • Call self.retrieve([sub_q], n_results) for each sub-question
  • Return list of (sub_question, chunks). Within a single query, ChromaDB returns each stored ID at most once, so chunks need no extra deduplication inside a sub-question
  • Existing retrieve() method is NOT modified — it continues to work as before
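
The loop described above can be sketched as follows. This is a minimal illustration, not the actual RAGService code: the retrieval function is passed in as a parameter, and fake_retrieve is a hypothetical stand-in for RAGService.retrieve() so the example runs without ChromaDB.

```python
from typing import Any, Callable, Dict, List, Tuple

Chunk = Tuple[str, Dict[str, Any], float]  # (text, metadata, distance)


def retrieve_per_subquestion(
    retrieve: Callable[[List[str], int], List[Chunk]],
    sub_questions: List[str],
    n_results: int = 10,
) -> List[Tuple[str, List[Chunk]]]:
    """Run one retrieval per sub-question and keep the results grouped."""
    grouped: List[Tuple[str, List[Chunk]]] = []
    for sub_q in sub_questions:
        # Each call receives a single-element list, so no sub-questions are joined.
        grouped.append((sub_q, retrieve([sub_q], n_results)))
    return grouped


def fake_retrieve(query_keywords: List[str], n_results: int) -> List[Chunk]:
    """Hypothetical stand-in for RAGService.retrieve(); returns canned chunks."""
    query = " ".join(query_keywords)
    return [(f"chunk {i} for {query}", {"page": i + 1}, 0.1 * i) for i in range(n_results)]


results = retrieve_per_subquestion(
    fake_retrieve,
    ["What are time extensions?", "What notice is required?"],
    n_results=2,
)
```

Because the existing retrieve() is only called, never changed, callers that still use the flat path are unaffected.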

Task 4.1.2: Update _query_stream() retrieval stage

File: backend/app/routers/query.py

Changes:

  • Replace rag.retrieve(extracted_questions, n_results) with rag.retrieve_per_subquestion(extracted_questions, n_results)
  • Track per-sub-question retrieval timing (new field or combined timing)
  • Format chunks_retrieved XML with sub-question wrappers

New chunks_retrieved XML format:

<sub_q idx="0" question="What are time extensions?">
<chunk_1>
Filename: NEC4 ACC.pdf
Page: 3
Content: Clause 61.3 states that...
</chunk_1>
<chunk_2>
...
</chunk_2>
</sub_q>
<sub_q idx="1" question="What notice is required?">
<chunk_1>
Filename: NEC4 Contract.pdf
Page: 12
Content: Notice must be given...
</chunk_1>
...
</sub_q>

Task 4.1.3: Format helpers

File: backend/app/routers/query.py

New functions:

def format_chunks_retrieved_per_subq(results: List[Tuple[str, List]]) -> str:
    """Format per-sub-question retrieved chunks as XML."""
    
def format_chunks_filtered_per_subq(results: List[Tuple[str, List]]) -> str:
    """Format per-sub-question filtered chunks as XML with relevance scores."""
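
A possible body for the first helper, producing the grouped XML layout shown in Task 4.1.2, might look like the sketch below. The metadata keys filename and page are assumptions, and escaping the question attribute with quoteattr is an addition that the plan does not specify.

```python
from typing import Any, Dict, List, Tuple
from xml.sax.saxutils import quoteattr

Chunk = Tuple[str, Dict[str, Any], float]  # (text, metadata, distance)


def format_chunks_retrieved_per_subq(results: List[Tuple[str, List[Chunk]]]) -> str:
    """Format per-sub-question retrieved chunks as XML."""
    lines: List[str] = []
    for idx, (question, chunks) in enumerate(results):
        # quoteattr adds the surrounding quotes and escapes any quotes inside.
        lines.append(f'<sub_q idx="{idx}" question={quoteattr(question)}>')
        # Chunk numbering restarts at 1 inside every sub-question group.
        for n, (text, meta, _distance) in enumerate(chunks, start=1):
            lines.append(f"<chunk_{n}>")
            lines.append(f"Filename: {meta.get('filename', '')}")  # assumed key
            lines.append(f"Page: {meta.get('page', '')}")  # assumed key
            lines.append(f"Content: {text}")
            lines.append(f"</chunk_{n}>")
        lines.append("</sub_q>")
    return "\n".join(lines)


xml = format_chunks_retrieved_per_subq(
    [
        ("What are time extensions?",
         [("Clause 61.3 states that...", {"filename": "NEC4 ACC.pdf", "page": 3}, 0.21)]),
        ("What notice is required?",
         [("Notice must be given...", {"filename": "NEC4 Contract.pdf", "page": 12}, 0.34)]),
    ]
)
```

The filtered variant would differ only by an extra Relevance: line per chunk.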

Commit: "feat: Phase 4.1 per-sub-question retrieval with grouped chunk XML"

Sub-Phase 4.2: Backend — Per-Sub-Question Filtering (Single LLM Call)

Test files to write first:

  • test_phase4_relevance_filter_per_subq.py — Tests RelevanceFilter.filter_per_subquestion() with grouped chunks
  • test_phase4_query_router_filter.py — Tests filter stage with per-sub-q chunk groups

Task 4.2.1: Add filter_per_subquestion() to RelevanceFilter

File: backend/app/services/relevance_filter.py

New method signature:

async def filter_per_subquestion(
    self,
    sub_questions: List[str],
    sub_chunks: List[List[Tuple[str, Dict]]],
    threshold: float = 7.0,
) -> Tuple[List[Tuple[str, List[Tuple[str, Dict]]]], str]:
    """Filter chunks per sub-question in a single LLM call.

    Args:
        sub_questions: List of decomposed sub-questions.
        sub_chunks: List of chunk lists (one per sub-question).
        threshold: Minimum relevance score.

    Returns:
        Tuple of (filtered_results, prompt).
        filtered_results: List of (sub_question, filtered_chunks_for_that_q).
    """

Prompt design (single LLM call):

Evaluate each chunk for relevance to its associated sub-question.

Sub-question 0: "{sub_q_0}"
Chunk 0: {chunk_0_text}
Chunk 1: {chunk_1_text}
...

Sub-question 1: "{sub_q_1}"
Chunk 0: {chunk_0_text}
Chunk 1: {chunk_1_text}
...

For each chunk, rate relevance 0-10 considering ONLY its associated sub-question.
Return a JSON object mapping sub-question indices to arrays of scores:
{"0": [8.5, 3.2, 9.0], "1": [7.0, 6.5, 9.1]}

Key rules:

  • Each chunk is evaluated against its own sub-question (not the original user question)
  • JSON keys are stringified sub-question indices ("0", "1", ...)
  • Score arrays MUST match chunk count for each sub-question
  • Same JSON extraction/markdown stripping logic as existing filter()

Existing filter() method is preserved — not modified, not deprecated. The new method is additive.
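
To make the key rules concrete, here is a hedged sketch of how the grouped scores could be mapped back onto chunks. The function name apply_grouped_scores is hypothetical, the JSON extraction (taking the outermost braces) is an assumption rather than the existing filter() logic, and treating the threshold as inclusive (>=) is also an assumption.

```python
import json
from typing import Any, Dict, List, Tuple

Chunk = Tuple[str, Dict[str, Any]]  # (text, metadata)


def apply_grouped_scores(
    sub_questions: List[str],
    sub_chunks: List[List[Chunk]],
    llm_output: str,
    threshold: float = 7.0,
) -> List[Tuple[str, List[Chunk]]]:
    """Map the LLM's grouped scores back onto each sub-question's chunks."""
    # Take the outermost {...} span so markdown fences or extra text are ignored.
    start, end = llm_output.find("{"), llm_output.rfind("}")
    try:
        scores = json.loads(llm_output[start : end + 1]) if 0 <= start < end else {}
    except json.JSONDecodeError:
        scores = {}

    filtered: List[Tuple[str, List[Chunk]]] = []
    for idx, (question, chunks) in enumerate(zip(sub_questions, sub_chunks)):
        group = scores.get(str(idx))  # keys are stringified indices
        valid = (
            isinstance(group, list)
            and len(group) == len(chunks)  # score count must match chunk count
            and all(isinstance(s, (int, float)) for s in group)
        )
        if not valid:
            # Unusable scores: keep every chunk for this sub-question.
            filtered.append((question, list(chunks)))
            continue
        kept = [chunk for chunk, score in zip(chunks, group) if score >= threshold]
        filtered.append((question, kept))
    return filtered


groups = [[("a", {}), ("b", {}), ("c", {})], [("d", {}), ("e", {})]]
ok = apply_grouped_scores(["q0", "q1"], groups, 'Scores:\n{"0": [8.5, 3.2, 9.0], "1": [6.5, 9.1]}')
bad = apply_grouped_scores(["q0", "q1"], groups, "no json here")
```

Keeping all chunks when the scores are unusable trades precision for availability: the generate step still has context to work with.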

Task 4.2.2: Update _query_stream() filter stage

File: backend/app/routers/query.py

Changes:

  • Call relevance_filter.filter_per_subquestion(extracted_questions, chunks_for_filter, threshold) instead of relevance_filter.filter(question, chunks, threshold)
  • Build chunks_for_filter from per-sub-question retrieval results
  • Track filter_prompt (the redesigned prompt)
  • Format chunks_filtered XML with sub-question wrappers and Relevance: scores

New chunks_filtered XML format:

<sub_q idx="0" question="What are time extensions?">
<chunk_1>
Filename: NEC4 ACC.pdf
Page: 3
Relevance: 8.5
Content: Clause 61.3 states that...
</chunk_1>
</sub_q>
<sub_q idx="1" question="What notice is required?">
<chunk_1>
Filename: NEC4 Contract.pdf
Page: 12
Relevance: 9.0
Content: Notice must be given...
</chunk_1>
</sub_q>

Commit: "feat: Phase 4.2 per-sub-question filtering with single LLM call"

Sub-Phase 4.3: Backend — Sub-Question-Organized Response Generation

Test files to write first:

  • test_phase4_generate_per_subq.py — Tests RAGService.generate_response_per_subquestion()
  • test_phase4_response_format.py — Tests the final answer matches expected format

Task 4.3.1: Redesign generate_response() → generate_response_per_subquestion()

File: backend/app/services/rag.py

New method signature:

async def generate_response_per_subquestion(
    self,
    sub_questions: List[str],
    sub_chunks: List[List[str]],
    sub_metadata: List[List[Dict[str, Any]]],
) -> Tuple[str, str, List[List[SourceMetadata]]]:
    """Generate sub-question-organized RAG response.

    Args:
        sub_questions: List of decomposed sub-questions.
        sub_chunks: List of chunk text lists (one per sub-question).
        sub_metadata: List of metadata dict lists (one per sub-question).

    Returns:
        Tuple of (answer, prompt, grouped_sources).
        answer: Markdown string with sections per sub-question.
        prompt: The rendered LLM prompt.
        grouped_sources: List of SourceMetadata lists (one per sub-question).
    """

New prompt template (replaces generate):

You must answer each sub-question using ONLY the document chunks provided for it.
Do not use any external knowledge.
Format your answer as markdown sections — one section per sub-question.
Each section should start with "## Sub-question N: <the question>"
Each section should contain 1-5 bullet points.
Cite your sources inline using bracket labels, e.g. [filename, page N].
Place the citation at the end of each relevant bullet point.

{context_sections}

Answer:

Context format (replaces {context}):

### Context for Sub-question 0: "What are time extensions?"
[NEC4 ACC.pdf, page 3] Source: NEC4 ACC.pdf
Summary: Clause 61.3 discusses time extensions...
Content: Clause 61.3 states that the project manager...

[NEC4 Contract.pdf, page 12] Source: NEC4 Contract.pdf
Summary: Notice requirements for time extensions...
Content: Written notice must be given within...

### Context for Sub-question 1: "What notice is required?"
[NEC4 ACC.pdf, page 7] Source: NEC4 ACC.pdf
Summary: Notice requirements...
Content: The contractor shall notify the project manager in writing...

Expected answer format:

## Sub-question 1: What are time extensions?
- Time extensions must be notified to the project manager within 2 weeks [NEC4 ACC.pdf, page 3]
- The project manager must acknowledge the notice within 1 week [NEC4 Contract.pdf, page 12]

## Sub-question 2: What notice is required?
- Written notice must be given [NEC4 ACC.pdf, page 7]

Existing generate_response() is preserved — not modified, not deprecated.
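
The context format above could be assembled roughly as follows. This is a sketch under assumptions: build_context_sections is a hypothetical helper name, and the metadata keys filename, page, and summary are guesses at the stored chunk metadata.

```python
from typing import Any, Dict, List


def build_context_sections(
    sub_questions: List[str],
    sub_chunks: List[List[str]],
    sub_metadata: List[List[Dict[str, Any]]],
) -> str:
    """Render one '### Context for Sub-question N' block per sub-question."""
    sections: List[str] = []
    for idx, (question, chunks, metas) in enumerate(
        zip(sub_questions, sub_chunks, sub_metadata)
    ):
        lines = [f'### Context for Sub-question {idx}: "{question}"']
        for text, meta in zip(chunks, metas):
            label = f"[{meta['filename']}, page {meta['page']}]"  # citation label
            lines.append(f"{label} Source: {meta['filename']}")
            lines.append(f"Summary: {meta.get('summary', '')}")
            lines.append(f"Content: {text}")
            lines.append("")  # blank line between chunks
        sections.append("\n".join(lines).rstrip())
    return "\n\n".join(sections)


context_sections = build_context_sections(
    ["What are time extensions?", "What notice is required?"],
    [["Clause 61.3 states that the project manager..."],
     ["The contractor shall notify the project manager in writing..."]],
    [[{"filename": "NEC4 ACC.pdf", "page": 3, "summary": "Clause 61.3 discusses time extensions..."}],
     [{"filename": "NEC4 ACC.pdf", "page": 7, "summary": "Notice requirements..."}]],
)
prompt = "{context_sections}\n\nAnswer:".replace("{context_sections}", context_sections)
```

The bracket label is built once per chunk so that the citation text the model is asked to reproduce matches the label it sees in the context.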

Task 4.3.2: Update _query_stream() generate stage

File: backend/app/routers/query.py

Changes:

  • Call rag.generate_response_per_subquestion(extracted_questions, chunk_texts_by_subq, metadata_by_subq)
  • New SSE event: generating_subquestion — emitted before each sub-question's section (lets frontend show progressive build)
  • completed SSE event includes both answer (markdown string) and sub_question_sources (grouped sources)

New SSE event sequence:

{"phase": "decomposed", "extracted_questions": ["q1", "q2"]}
{"phase": "retrieving"}
{"phase": "filtering"}
{"phase": "generating"}
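{"phase": "completed", "answer": "## Sub-question 1: ...\n\n...", "sub_question_sources": [{"sub_question_index": 0, "sub_question_text": "q1", "sources": [SourceMetadata, ...]}, {"sub_question_index": 1, "sub_question_text": "q2", "sources": [SourceMetadata, ...]}]}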
{"phase": "error", "message": "..."}
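
As an illustration of how these events could be serialized, the sketch below emits each payload as an SSE data: line. The helper names sse_event and event_sequence are hypothetical, the real _query_stream is expected to interleave these yields with the actual pipeline work, and the sub_question_sources entries are shaped after the SubQuestionSources model.

```python
import json
from typing import Any, Dict, Iterator, List


def sse_event(payload: Dict[str, Any]) -> str:
    """Serialize one event as a single SSE 'data:' line plus a blank line."""
    return f"data: {json.dumps(payload)}\n\n"


def event_sequence(
    extracted_questions: List[str],
    answer: str,
    sub_question_sources: List[Dict[str, Any]],
) -> Iterator[str]:
    """Yield the success-path events in the order listed above."""
    yield sse_event({"phase": "decomposed", "extracted_questions": extracted_questions})
    for phase in ("retrieving", "filtering", "generating"):
        yield sse_event({"phase": phase})
    yield sse_event(
        {
            "phase": "completed",
            "answer": answer,
            "sub_question_sources": sub_question_sources,
        }
    )


events = list(
    event_sequence(
        ["q1", "q2"],
        "## Sub-question 1: q1\n- ...",
        [{"sub_question_index": 0, "sub_question_text": "q1", "sources": []}],
    )
)
```

Each event fits on one data: line because json.dumps escapes the newlines inside the markdown answer.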

New QueryResponse model:

File: backend/app/models/query.py

class SubQuestionSources(BaseModel):
    sub_question_index: int
    sub_question_text: str
    sources: List[SourceMetadata]

class QueryResponse(BaseModel):
    extracted_questions: List[str]
    answer: str                          # Markdown with ## sections
    sub_question_sources: List[SubQuestionSources]  # Grouped sources
    # Backward compat:
    sources: List[SourceMetadata]        # Flattened version (all sources)

Commit: "feat: Phase 4.3 sub-question-organized response generation"

Sub-Phase 4.4: Backend — History & Prompt Template Updates

Test files to write first:

  • test_phase4_history_format.py — Tests new XML/JSON history formats
  • test_phase4_prompt_templates.py — Tests new generate template with {context_sections}

Task 4.4.1: Update history recording

File: backend/app/routers/query.py (the _schedule_history / _record_history helpers)

Changes:

  • chunks_retrieved: Store new grouped XML format (with <sub_q> wrappers)
  • chunks_filtered: Store new grouped XML format (with <sub_q> wrappers and Relevance: scores)
  • sources: Store grouped JSON: json.dumps([[SourceMetadata_dict, ...], [...]]) (list of lists)
  • final_answer: Store markdown string with ## sections
  • Existing fields (chunks_retrieved_count, chunks_filtered_count) keep total counts
  • New optional fields: chunks_retrieved_per_subq_count, chunks_filtered_per_subq_count (JSON array of ints)

Task 4.4.2: Update history DB schema (minimal)

File: backend/app/core/sqlite_db.py

Add two new columns (optional, NULL-able):

ALTER TABLE query_history ADD COLUMN chunks_retrieved_per_subq_count TEXT DEFAULT NULL;
ALTER TABLE query_history ADD COLUMN chunks_filtered_per_subq_count TEXT DEFAULT NULL;

These store JSON arrays like [10, 8] — one count per sub-question. NULL for pre-Package-4 records.
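
Since SQLite raises an error when a column is added twice, the migration probably needs a guard so it can run on every startup. A minimal sketch, assuming the table is named query_history as in the statements above and using a hypothetical helper name:

```python
import json
import sqlite3

NEW_COLUMNS = ("chunks_retrieved_per_subq_count", "chunks_filtered_per_subq_count")


def add_per_subq_count_columns(conn: sqlite3.Connection) -> None:
    """Add the two optional columns only if they are not present yet."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute("PRAGMA table_info(query_history)")}
    for column in NEW_COLUMNS:
        if column not in existing:
            conn.execute(f"ALTER TABLE query_history ADD COLUMN {column} TEXT DEFAULT NULL")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE query_history (id INTEGER PRIMARY KEY, question TEXT)")
conn.execute("INSERT INTO query_history (question) VALUES ('old record')")

add_per_subq_count_columns(conn)
add_per_subq_count_columns(conn)  # second run is a no-op

conn.execute(
    "INSERT INTO query_history (question, chunks_retrieved_per_subq_count) VALUES (?, ?)",
    ("new record", json.dumps([10, 8])),
)
rows = conn.execute(
    "SELECT question, chunks_retrieved_per_subq_count FROM query_history ORDER BY id"
).fetchall()
```

The pre-existing row reads back as NULL for the new columns, which matches the backward-compatibility note above.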

Task 4.4.3: Update history Pydantic models

File: backend/app/models/history.py

Add optional fields to QueryHistoryRecord and QueryHistoryDetail:

chunks_retrieved_per_subq_count: Optional[str] = None  # JSON array string
chunks_filtered_per_subq_count: Optional[str] = None    # JSON array string

Task 4.4.4: Update prompt templates

File: backend/app/core/sqlite_db.py (seed data)

New generate template:

"generate": (
    "You must answer each sub-question using ONLY the document chunks provided for it.\n"
    "Do not use any external knowledge.\n"
    "Format your answer as markdown sections — one section per sub-question.\n"
    "Each section should start with \"## Sub-question N: <the question>\"\n"
    "Each section should contain 1-5 bullet points.\n"
    "Cite your sources inline using bracket labels, e.g. [filename, page N].\n"
    "Place the citation at the end of each relevant bullet point.\n\n"
    "{context_sections}\n\n"
    "Answer:"
)

decompose and filter templates remain unchanged (they still use {question} placeholder — the orchestrator injects the right value at call time).

Task 4.4.5: Update PromptService to handle new template placeholder

File: backend/app/services/prompt_service.py

  • Add context_sections as a known placeholder for the generate step (optional, since str.replace is already safe with unknown keys)
  • The reset_to_defaults() method must include the new generate template
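
The point about str.replace can be illustrated with a small sketch. render_template is a hypothetical name, not the actual PromptService API; it only shows why replace-based rendering tolerates both unknown keys and literal braces in a template.

```python
from typing import Dict


def render_template(template: str, values: Dict[str, str]) -> str:
    """Substitute {name} placeholders one by one with str.replace."""
    rendered = template
    for name, value in values.items():
        # A key with no matching placeholder leaves the text unchanged.
        rendered = rendered.replace("{" + name + "}", value)
    return rendered


generate_template = "Answer each sub-question.\n\n{context_sections}\n\nAnswer:"
rendered = render_template(
    generate_template,
    {"context_sections": "### Context for Sub-question 0: ...", "question": "unused"},
)

# Literal braces (such as a JSON example) and unfilled placeholders survive.
json_template = 'Return JSON like {"0": [8.5]} for {question}'
untouched = render_template(json_template, {"context_sections": "x"})
```

One limitation of this approach is that a substituted value containing another placeholder name could be replaced by a later iteration.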

Task 4.4.6: Update history detail API response

File: backend/app/routers/history.py

GET /api/v1/history/{id} response now includes chunks_retrieved_per_subq_count and chunks_filtered_per_subq_count when they are not NULL. Backward-compatible (older records return null for these fields).

Commit: "feat: Phase 4.4 history schema, prompt templates, and Pydantic model updates"

Sub-Phase 4.5: Frontend — Types & State Management

Test files to write first:

  • test_phase4_stream_state.test.tsx — Tests QueryStreamState handles new response shape
  • test_phase4_types.test.ts — Tests type compatibility

Task 4.5.1: Update TypeScript types

File: frontend/src/types/index.ts

New types:

interface SubQuestionSources {
  sub_question_index: number;
  sub_question_text: string;
  sources: SourceMetadata[];
}

interface QueryStreamCompletedEvent {
  phase: 'completed';
  answer: string;                              // Markdown with ## sections
  sub_question_sources: SubQuestionSources[];  // Grouped sources
}

interface QueryStreamDecomposedEvent {
  phase: 'decomposed';
  extracted_questions: string[];
}

type QueryStreamEvent = 
  | QueryStreamDecomposedEvent
  | { phase: 'retrieving' | 'filtering' | 'generating' }
  | QueryStreamCompletedEvent
  | { phase: 'error'; message: string };

Task 4.5.2: Update QueryStreamState and mutation handler

File: frontend/src/lib/queries.tsx

Changes:

interface QueryStreamState {
  extractedQuestions: string[] | null;
  answer: string | null;                        // Full markdown
  subQuestionSources: SubQuestionSources[] | null;  // NEW — grouped sources
  phase: 'idle' | 'decomposing' | 'retrieving' | 'filtering' | 'generating' | 'completed' | 'error';
  error: Error | null;
}

In the completed case:

case 'completed':
  setState(prev => ({
    ...prev,
    answer: event.answer,
    subQuestionSources: event.sub_question_sources,
    phase: 'completed',
  }));
  break;

Commit: "feat: Phase 4.5 frontend types and state management for per-sub-q responses"

Sub-Phase 4.6: Frontend — ResponsePanel & ExtractedQuestionsDisplay

Test files to write first:

  • test_phase4_response_panel.test.tsx — Tests per-sub-question section rendering
  • test_phase4_citation_parser.test.ts — Tests per-sub-question citation lookup

Task 4.6.1: Redesign ResponsePanel for sub-question sections

File: frontend/src/components/ResponsePanel.tsx

Current: single ReactMarkdown block + flat sources grid.

New layout:

┌─────────────────────────────────────────────────────┐
│  📋 Response                           [Copy All]   │
├─────────────────────────────────────────────────────┤
│                                                      │
│  ┌─ Sub-question 1: What are time extensions? ─────┐│
│  │                                                    │
│  │  • Time extensions must be notified...             │
│  │    [NEC4 ACC.pdf, page 3]                          │
│  │  • The project manager must acknowledge...         │
│  │    [NEC4 Contract.pdf, page 12]                    │
│  │                                                    │
│  │  Sources (2)                          [Expand ▼]  │
│  │  ┌──────────────────────────────────────────────┐ │
│  │  │ NEC4 ACC.pdf, Page 3  │ NEC4 Contract, p12 │ │
│  │  │ "Clause 61.3 states.." │ "Notice must be..." │ │
│  │  └──────────────────────────────────────────────┘ │
│  └────────────────────────────────────────────────────┘│
│                                                      │
│  ┌─ Sub-question 2: What notice is required? ───────┐│
│  │                                                    │
│  │  • Written notice must be given...                  │
│  │    [NEC4 ACC.pdf, page 7]                           │
│  │                                                    │
│  │  Sources (1)                          [Expand ▼]  │
│  └────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────┘

Implementation approach:

  1. Parse the answer markdown into sections using ## Sub-question N: headers
  2. Map each section to its SubQuestionSources by matching index
  3. Render each section as an accordion/card with:
    • Header: sub-question text (from SubQuestionSources)
    • Body: ReactMarkdown for bullet points (with inline citation links)
    • Footer: collapsible sources grid (only sources belonging to this sub-question)
  4. Keep the existing citation link behavior (clickable [filename, page N] → PDF viewer)

Task 4.6.2: Update citationParser.ts for per-sub-question lookup

File: frontend/src/utils/citationParser.ts

Current: buildCitationLookup(sources: SourceMetadata[]) — returns a single global map.

New: buildCitationLookup(subQuestionSources: SubQuestionSources[]) — returns a map scoped to the correct sources for each section. The citation [filename, page N] match is looked up in the relevant sub-question's source list.

Task 4.6.3: Update ExtractedQuestionsDisplay for anchors

File: frontend/src/components/ExtractedQuestionsDisplay.tsx

Minor enhancement:

  • Make each extracted question a clickable anchor that scrolls to its corresponding section in the answer
  • Add id="subq-{index}" to each section header in ResponsePanel
  • Keep existing skeleton loading behavior

Commit: "feat: Phase 4.6 frontend per-sub-question response rendering"

Sub-Phase 4.7: Testing & Polish

Test files to write:

  • test_phase4_integration_query_pipeline.py — Full integration test simulating per-sub-q pipeline
  • test_phase4_acceptance_query.py — Acceptance test with real LLM (manual run)
  • test_phase4_e2e_query_flow.test.tsx — Frontend e2e test with mocked SSE stream

Task 4.7.1: Backend unit tests

  • Run pytest backend/app/test/test_phase4_*.py -v — all must pass
  • Verify no regressions in existing Phase 1 and Phase 3 tests
  • Update test_phase1_rag_service.py for new method signatures
  • Update test_phase1_relevance_filter.py for per-sub-q behavior
  • Rewrite test_phase3_query_history_integration.py for new pipeline flow
  • Update test_phase3_prompt_injection.py for new generate template

Task 4.7.2: Backend acceptance tests

  • test_phase4_acceptance_query.py — real LLM, real ChromaDB
  • Verify: answer contains ## Sub-question headers, sources grouped by sub-question index
  • Verify: each sub-question section has 1-5 bullet points
  • Verify: inline citations match the correct sub-question's source list
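
These checks could be automated with a small parser along the following lines. It is a sketch only: the regular expressions assume the exact header and citation formats shown in this plan (filenames containing commas would not match), and the allowed-source sets are hypothetical test data.

```python
import re
from typing import List, Tuple

HEADER = re.compile(r"^## Sub-question (\d+): (.*)$", re.MULTILINE)
CITATION = re.compile(r"\[([^\[\],]+), page (\d+)\]")


def split_sections(answer: str) -> List[Tuple[int, str, List[str]]]:
    """Return (number, question, bullets) for each '## Sub-question N:' section."""
    matches = list(HEADER.finditer(answer))
    sections = []
    for i, match in enumerate(matches):
        # A section runs until the next header or the end of the answer.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(answer)
        body = answer[match.end():end]
        bullets = [ln.strip()[2:] for ln in body.splitlines() if ln.strip().startswith("- ")]
        sections.append((int(match.group(1)), match.group(2).strip(), bullets))
    return sections


def citations(bullets: List[str]) -> List[Tuple[str, int]]:
    """Extract (filename, page) pairs from inline bracket citations."""
    return [(name, int(page)) for b in bullets for name, page in CITATION.findall(b)]


answer = (
    "## Sub-question 1: What are time extensions?\n"
    "- Time extensions must be notified to the project manager within 2 weeks [NEC4 ACC.pdf, page 3]\n"
    "- The project manager must acknowledge the notice within 1 week [NEC4 Contract.pdf, page 12]\n"
    "\n"
    "## Sub-question 2: What notice is required?\n"
    "- Written notice must be given [NEC4 ACC.pdf, page 7]\n"
)
allowed = [{("NEC4 ACC.pdf", 3), ("NEC4 Contract.pdf", 12)}, {("NEC4 ACC.pdf", 7)}]
sections = split_sections(answer)
all_cited_ok = all(
    set(citations(bullets)) <= allowed[i] for i, (_n, _q, bullets) in enumerate(sections)
)
```

With a real LLM the output will vary between runs, so an acceptance test built on this would likely assert structure (header count, 1-5 bullets, citations within the section's sources) rather than exact text.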

Task 4.7.3: Frontend tests

  • test_phase4_response_panel.test.tsx — renders per-sub-question sections, expandable sources
  • test_phase4_citation_parser.test.ts — per-sub-question lookup returns correct source
  • test_phase4_e2e_query_flow.test.tsx — mocks SSE with new event format, verifies section rendering
  • Update existing ResponsePanel.test.tsx and citationParser.test.ts for new API

Task 4.7.4: Frontend build verification

  • npm run build — no TypeScript errors
  • npm test — all 62 existing tests pass + new Phase 4 tests
  • Verify manual flow: ask question → see extracted questions → see per-sub-question answer sections → expand sources per section

Task 4.7.5: Error handling

  • Empty decomposition: if decompose() returns [], fall back to using original question as single sub-question
  • Empty retrieval for some sub-questions: that sub-question gets no chunks → section shows "No relevant information found"
  • Filter failure (all chunks below threshold): that sub-question gets no answer → graceful empty section
  • JSON parse failure in filter: fall back to including all chunks (no filtering) for that sub-question

Task 4.7.6: Documentation

  • Update AGENTS.md with new pipeline architecture section
  • Add docstrings to all new methods (retrieve_per_subquestion, filter_per_subquestion, generate_response_per_subquestion)
  • Update prompt template documentation in system prompts page

Commit: "feat: Phase 4.7 testing, error handling, and polish for per-sub-q pipeline"


Phase 4a: Prompt Service Integration for Per-Sub-Q Filter (2026-04-27)

Root issue: filter_per_subquestion() in relevance_filter.py had a hardcoded prompt (_build_per_subq_prompt()) — completely bypassing PromptService. Users could not edit the per-sub-q filter prompt on the System Prompts page, unlike the flat filter step which was already prompt-service-driven.

Solution: Broke the per-sub-q filter prompt into 3 composable pieces, each a separately editable step on the System Prompts page:

| Step Name | Label | Placeholders | Default |
|-----------|-------|--------------|---------|
| filter_intro | Step 2.1: Filter Intro (Preamble) | (none) | "Evaluate each chunk for relevance to its associated sub-question only." |
| filter_section | Step 2.2: Filter Section (Per Sub-Q) | {subq_idx}, {subq_question}, {chunks} | 'Sub-question {subq_idx}: "{subq_question}"\n{chunks}' |
| filter_outro | Step 2.3: Filter Outro (Format) | (none) | JSON format instructions + example |

The RelevanceFilter._build_per_subq_prompt() now composes them at runtime:

filter_intro + [filter_section.replace(...) for each sub-q] + filter_outro

Falls back to built-in defaults when PromptService is unavailable.
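
The composition can be sketched as below. The intro and section defaults are taken from the table in this section; the outro text, the "Chunk N:" line format, and joining the pieces with blank lines are assumptions, and build_per_subq_prompt is a simplified stand-in for RelevanceFilter._build_per_subq_prompt().

```python
from typing import List

FILTER_INTRO = "Evaluate each chunk for relevance to its associated sub-question only."
FILTER_SECTION = 'Sub-question {subq_idx}: "{subq_question}"\n{chunks}'
# Placeholder outro: the real default holds JSON format instructions and an example.
FILTER_OUTRO = "Return a JSON object mapping sub-question indices to arrays of scores."


def build_per_subq_prompt(
    sub_questions: List[str],
    sub_chunk_texts: List[List[str]],
    intro: str = FILTER_INTRO,
    section: str = FILTER_SECTION,
    outro: str = FILTER_OUTRO,
) -> str:
    """Compose intro + one rendered section per sub-question + outro."""
    parts = [intro]
    for idx, (question, chunks) in enumerate(zip(sub_questions, sub_chunk_texts)):
        chunk_block = "\n".join(f"Chunk {n}: {text}" for n, text in enumerate(chunks))
        parts.append(
            section.replace("{subq_idx}", str(idx))
            .replace("{subq_question}", question)
            .replace("{chunks}", chunk_block)
        )
    parts.append(outro)
    return "\n\n".join(parts)


prompt = build_per_subq_prompt(
    ["What are time extensions?", "What notice is required?"],
    [["Clause 61.3 states that...", "Unrelated text"], ["Notice must be given..."]],
)
```

Passing the three templates as parameters with built-in defaults mirrors the fallback behaviour: a caller with PromptService available supplies the edited templates, and otherwise the constants apply.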

Bugs Fixed

  1. generate_per_subq not seeded: rag.py called get_prompt_template("generate_per_subq") but this step name was never added to _VALID_STEPS, _SEED_STEPS, or _SEED_TEMPLATES — would crash at runtime with ValueError. Now properly seeded with {context_sections} placeholder.

  2. _SEED_GENERATE placeholder mismatch from Package 4: The flat generate_response() expects {question}/{context} placeholders, but Package 4 changed the seed template to use {context_sections} (intended for per-sub-q generate). Restored flat template; generate_per_subq now holds {context_sections}.

Database Backfill Migration

The existing seed_default_profiles() only inserted steps for NEWLY created profiles. Added a backfill loop that iterates ALL existing profiles and INSERT OR IGNOREs any missing step names. This ensures existing A/B/C profiles pick up filter_intro, filter_section, filter_outro, and generate_per_subq on restart.
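
A backfill of this kind might look like the following sketch. The table and column names (profiles, prompts, profile_id, step, template) are hypothetical, as are the shortened template bodies; the essential assumption is a UNIQUE constraint on (profile_id, step), which is what lets INSERT OR IGNORE skip steps that already exist.

```python
import sqlite3

# Step names from this plan; template bodies are shortened placeholders.
SEED_TEMPLATES = {
    "decompose": "decompose {question}",
    "filter": "filter {question} {chunks}",
    "generate": "generate {question} {context}",
    "filter_intro": "intro",
    "filter_section": 'Sub-question {subq_idx}: "{subq_question}"\n{chunks}',
    "filter_outro": "outro",
    "generate_per_subq": "generate {context_sections}",
}


def backfill_missing_steps(conn: sqlite3.Connection) -> None:
    """Insert any missing step for every existing profile, keeping edited ones."""
    profile_ids = [row[0] for row in conn.execute("SELECT id FROM profiles").fetchall()]
    for profile_id in profile_ids:
        conn.executemany(
            "INSERT OR IGNORE INTO prompts (profile_id, step, template) VALUES (?, ?, ?)",
            [(profile_id, step, template) for step, template in SEED_TEMPLATES.items()],
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profiles (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE prompts (profile_id INTEGER, step TEXT, template TEXT, "
    "UNIQUE (profile_id, step))"
)
for name in ("A", "B", "C"):
    profile_id = conn.execute("INSERT INTO profiles (name) VALUES (?)", (name,)).lastrowid
    for step in ("decompose", "filter", "generate"):
        conn.execute("INSERT INTO prompts VALUES (?, ?, ?)", (profile_id, step, f"custom {step}"))

before = conn.execute("SELECT COUNT(*) FROM prompts").fetchone()[0]
backfill_missing_steps(conn)
after = conn.execute("SELECT COUNT(*) FROM prompts").fetchone()[0]
```

With three profiles the row count goes from 9 to 21 (three profiles times seven steps), and templates a user has already customized are left as they were.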

System Prompts UI Restructured

The flat filter and generate steps were removed from the UI (they're unused by the current pipeline). The page now shows 5 steps:

| UI Order | Label | Step Key |
|----------|-------|----------|
| 1 | Step 1: Query Decomposition | decompose |
| 2 | Step 2.1: Filter Intro (Preamble) | filter_intro |
| 3 | Step 2.2: Filter Section (Per Sub-Q) | filter_section |
| 4 | Step 2.3: Filter Outro (Format) | filter_outro |
| 5 | Step 3: Generate (Per-Sub-Question) | generate_per_subq |

The old filter and generate templates remain in the DB (for API backward compatibility) but are hidden from the UI.

Files Changed

| File | Change |
|------|--------|
| backend/app/core/sqlite_db.py | 3 new seed templates + generate_per_subq seed; backfill migration; restored _SEED_GENERATE to {question}/{context} |
| backend/app/services/prompt_service.py | Added 4 step names to _VALID_STEPS |
| backend/app/routers/prompts.py | Added 4 step names to _VALID_STEPS |
| backend/app/services/relevance_filter.py | Refactored _build_per_subq_prompt() to use PromptService + built-in fallback constants |
| frontend/src/components/PromptEditor.tsx | Replaced unused flat steps with 5-step per-sub-q layout (Step 2.1-2.3 + Step 3) |
| frontend/src/components/PlaceholderDocs.tsx | Added {context_sections}, {subq_idx}, {subq_question} docs |
| backend/app/test/conftest.py | Added 4 new templates to mock |
| backend/app/test/test_phase3_sqlite_db.py | Updated counts (9→21 prompts) and placeholder assertions |
| backend/app/test/test_phase3_prompt_service.py | Updated step set + placeholder assertions |
| backend/app/test/test_phase3_prompts_router.py | Updated step set assertion |
| backend/app/test/test_phase4_prompt_templates.py | Updated for split generate/generate_per_subq |
| frontend/src/test/components/PromptEditor.test.tsx | Updated to 5 textareas, new labels, new placeholder layout |
| frontend/src/test/components/PlaceholderDocs.test.tsx | Updated to 6 placeholders |

Test Results (Post-Phase 4a)

  • Backend: 295 passed, 5 skipped (pre-existing)
  • Frontend: 182 passed, 1 pre-existing failure (unrelated file-input e2e)

Sub-Phase Summary

| Sub-Phase | Scope | Backend | Frontend | Tests | Status |
| --- | --- | --- | --- | --- | --- |
| 4.1 | Per-sub-q retrieval | rag.py, query.py, format helpers | None | test_phase4_retrieve_per_subquestion.py, test_phase4_query_router_retrieval.py | Complete |
| 4.2 | Per-sub-q filtering (1 LLM call) | relevance_filter.py, query.py | None | test_phase4_relevance_filter_per_subq.py, test_phase4_query_router_filter.py | Complete |
| 4.3 | Sub-q-organized response generation | rag.py, query.py, models/query.py | None | test_phase4_generate_per_subq.py, test_phase4_response_format.py | Complete |
| 4.4 | History schema, prompts, models | sqlite_db.py, history.py (router + models), prompt_service.py | None | test_phase4_history_format.py, test_phase4_prompt_templates.py | Complete |
| 4.5 | Frontend types + state | None | types/index.ts, lib/queries.tsx | test_phase4_stream_state.test.tsx, test_phase4_types.test.ts | Complete |
| 4.6 | Frontend rendering | None | ResponsePanel.tsx, citationParser.ts, ExtractedQuestionsDisplay.tsx | test_phase4_response_panel.test.tsx, test_phase4_citation_parser.test.ts | Complete |
| 4.7 | Testing & polish | All affected files | All affected files | Integration + acceptance + e2e tests | Complete |
| 4a | Prompt service integration for filter_per_subq | sqlite_db.py, prompt_service.py, prompts.py, relevance_filter.py | PromptEditor.tsx, PlaceholderDocs.tsx | Updated 7 test files, 13 total files changed | Complete |

Implementation Sequence & Dependencies

4.1 (Retrieval) ──┐
                  ├──► 4.2 (Filtering) ──► 4.3 (Generate) ──► 4.4 (History/Prompts)
                  │                                                    │
                  │                                                    ▼
                  │                                         4.5 (Frontend Types/State)
                  │                                                    │
                  │                                                    ▼
                  │                                         4.6 (Frontend Rendering)
                  │                                                    │
                  └─────────────────────────────────────────────────────▼
                                                              4.7 (Testing & Polish)
  • 4.1 → 4.2 sequential: Filtering needs per-sub-q chunk structure from retrieval
  • 4.2 → 4.3 sequential: Generation needs filtered chunks from filtering stage
  • 4.3 → 4.4 sequential: History recording and prompt templates need final data shapes
  • 4.4 → 4.5 parallel: Backend prompt/history changes don't block frontend type definitions
  • 4.5 → 4.6 sequential: Rendering needs types and state management
  • 4.7 blocked by all: Integration tests need everything wired together

Parallelization opportunity: 4.5 (frontend types) could start as soon as 4.3 defines the SSE contract, but it's safer to start after 4.4 confirms the final data shapes.


Affected Files — Complete Inventory

Backend — New Files

File Sub-Phase Purpose
backend/app/test/test_phase4_retrieve_per_subquestion.py 4.1 Unit test: retrieve_per_subquestion()
backend/app/test/test_phase4_query_router_retrieval.py 4.1 Unit test: retrieval stage in _query_stream
backend/app/test/test_phase4_relevance_filter_per_subq.py 4.2 Unit test: filter_per_subquestion()
backend/app/test/test_phase4_query_router_filter.py 4.2 Unit test: filter stage in _query_stream
backend/app/test/test_phase4_generate_per_subq.py 4.3 Unit test: generate_response_per_subquestion()
backend/app/test/test_phase4_response_format.py 4.3 Unit test: answer format validation
backend/app/test/test_phase4_history_format.py 4.4 Unit test: new XML/JSON history formats
backend/app/test/test_phase4_prompt_templates.py 4.4 Unit test: new generate template
backend/app/test/test_phase4_integration_query_pipeline.py 4.7 Integration test: full per-sub-q pipeline
backend/app/test/acceptance/test_phase4_acceptance_query.py 4.7 Acceptance test: real LLM

Backend — Modified Files

File Sub-Phase Changes
backend/app/services/rag.py 4.1, 4.3 Add retrieve_per_subquestion(), generate_response_per_subquestion()
backend/app/services/relevance_filter.py 4.2 Add filter_per_subquestion()
backend/app/routers/query.py 4.1-4.4 Refactor _query_stream(), add per-sub-q format helpers, update history recording
backend/app/models/query.py 4.3 Add SubQuestionSources model, update QueryResponse
backend/app/models/history.py 4.4 Add optional per-sub-q count fields
backend/app/core/sqlite_db.py 4.4 Add new columns, update seed generate template
backend/app/services/prompt_service.py 4.4 Update reset_to_defaults() generate template
backend/app/routers/history.py 4.4 Include new fields in detail response
backend/app/core/config.py 4.1 (Maybe) Add retrieval_n_results_per_subq setting

Backend — Tests Needing Update

File Sub-Phase Changes
backend/app/test/test_phase1_rag_service.py 4.7 Add tests for new methods; existing tests unaffected
backend/app/test/test_phase1_relevance_filter.py 4.7 Add tests for filter_per_subquestion()
backend/app/test/test_phase3_query_history_integration.py 4.7 Rewrite pipeline simulation for per-sub-q flow
backend/app/test/test_phase3_prompt_injection.py 4.7 Add tests for new generate template
backend/app/test/acceptance/test_acceptance_phase1_rag_query.py 4.7 Rewrite — SSE parsing + new response shape
backend/app/test/conftest.py 4.7 Add per-sub-q mock helpers

Frontend — New Files

File Sub-Phase Purpose
frontend/src/test/components/test_phase4_response_panel.test.tsx 4.7 Component test: per-sub-q sections
frontend/src/test/utils/test_phase4_citation_parser.test.ts 4.7 Unit test: per-sub-q citation lookup
frontend/src/test/e2e/test_phase4_query_flow.test.tsx 4.7 E2E test: mocked SSE with new format
frontend/src/test/lib/test_phase4_stream_state.test.tsx 4.5 State test: new event shapes
frontend/src/test/lib/test_phase4_types.test.ts 4.5 Type test: type compatibility

Frontend — Modified Files

File Sub-Phase Changes
frontend/src/types/index.ts 4.5 Add SubQuestionSources, update QueryStreamEvent
frontend/src/lib/queries.tsx 4.5 Update QueryStreamState, completed event handler
frontend/src/components/ResponsePanel.tsx 4.6 Redesign — per-sub-question sections with grouped sources
frontend/src/utils/citationParser.ts 4.6 Update buildCitationLookup() for per-sub-q
frontend/src/components/ExtractedQuestionsDisplay.tsx 4.6 Add anchor links to answer sections
frontend/src/pages/LTTPage.tsx 4.6 Pass new props to children

Risk Register

| Risk | Likelihood | Impact | Mitigation |
| --- | --- | --- | --- |
| LLM struggles with per-sub-q filtering prompt format | Medium | High: all chunks dropped | Use strong prompt constraints, validate JSON, fall back to including all chunks on parse failure |
| LLM generates answer not matching ## Sub-question N: format | Medium | Medium: frontend can't parse sections | Fall back to rendering as a single block if parsing fails. Prompt engineering tuned for format compliance |
| Same chunk retrieved by multiple sub-questions → duplicated in context | High | Low: slightly larger prompt but acceptable | Accept duplicates. ChromaDB naturally returns the same doc if it is relevant to multiple queries. Each sub-q's evaluation is independent |
| Per-sub-q retrieval = more ChromaDB queries = slower | Medium | Medium: N × retrieval latency | ChromaDB retrieval is fast (~10-50 ms per query), so 5 sub-questions cost roughly 50-250 ms in total. Acceptable trade-off for better relevance. |
| History DB migration fails for existing records | Low | Low: new columns are NULL-able | ALTER TABLE ADD COLUMN ... DEFAULT NULL is safe. Existing records work as before; chunks_retrieved/chunks_filtered still hold flat XML. |
| Frontend rendering breaks on older history records | Low | Low: answer format differs | ResponsePanel renders per-sub-q sections only when subQuestionSources is non-null. Older history records show the flat answer as before. |
| Prompt template migration breaks user-customized prompts | Medium | Medium: users lose their generate template | Warn in docs. The generate template changes fundamentally (single {context} → {context_sections}). Users must re-customize. |
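
The "fall back to including all chunks on parse failure" mitigation can be sketched like this. The response shape ({"1": [scores...], "2": [...]} keyed by 1-based sub-question index), the threshold value, and the function name are assumptions for illustration; the real parsing lives in relevance_filter.py.

```python
import json


def apply_filter_scores(
    raw_response: str,
    chunks_per_subq: list[list[str]],
    threshold: float = 5.0,  # assumed cut-off on the 0-10 scale
) -> list[list[str]]:
    """Keep chunks scoring >= threshold; fail open when scores are unusable."""
    try:
        scores = json.loads(raw_response)
        if not isinstance(scores, dict):
            raise ValueError("expected a JSON object")
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return [list(chunks) for chunks in chunks_per_subq]  # keep everything

    kept = []
    for idx, chunks in enumerate(chunks_per_subq, start=1):
        subq_scores = scores.get(str(idx))
        valid = (
            isinstance(subq_scores, list)
            and len(subq_scores) == len(chunks)
            and all(
                isinstance(s, (int, float)) and not isinstance(s, bool)
                for s in subq_scores
            )
        )
        if not valid:
            kept.append(list(chunks))  # fail open for this sub-question only
        else:
            kept.append([c for c, s in zip(chunks, subq_scores) if s >= threshold])
    return kept
```

Failing open per sub-question means one malformed score array does not discard the valid scores the model returned for the other sub-questions.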

Acceptance Criteria

Backend

  • POST /api/v1/query retrieves chunks per sub-question (verified by history XML showing <sub_q> wrappers)
  • Filtering uses single LLM call evaluating chunks against their originating sub-question (verified by filter prompt)
  • Response answer is organized by sub-question with ## Sub-question N: headers
  • sub_question_sources in SSE completed event is grouped by sub-question index
  • History records include new grouped XML formats for chunks_retrieved and chunks_filtered
  • History records include grouped sources JSON (list of lists)
  • History records include per-sub-q chunk counts
  • New generate prompt template uses {context_sections} placeholder
  • Prompt service reset_to_defaults() includes new generate template
  • Existing decompose, filter (old), generate_response (old) methods are unchanged
  • All Phase 1, Phase 3, and new Phase 4 unit tests pass (312 passed, 4 skipped)
  • All acceptance tests pass with real LLM (manual run)
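
To illustrate the grouped history XML mentioned above, here is a minimal sketch that wraps each sub-question's chunks in a <sub_q idx="N" question="..."> element. The root element name, the chunk element, and its source/page attributes are assumptions; only the sub_q wrapper and its two attributes come from this plan.

```python
import xml.etree.ElementTree as ET


def chunks_to_grouped_xml(
    sub_questions: list[str],
    chunks_per_subq: list[list[dict]],
) -> str:
    """Serialize chunks grouped under one <sub_q> wrapper per sub-question."""
    root = ET.Element("chunks")  # hypothetical root element
    for idx, (question, chunks) in enumerate(zip(sub_questions, chunks_per_subq), start=1):
        sub_q = ET.SubElement(root, "sub_q", idx=str(idx), question=question)
        for chunk in chunks:
            # Attribute and text values are escaped by ElementTree on serialization.
            el = ET.SubElement(sub_q, "chunk", source=chunk["source"], page=str(chunk["page"]))
            el.text = chunk["text"]
    return ET.tostring(root, encoding="unicode")
```

Building the tree with ElementTree instead of string concatenation keeps quotes and ampersands in question text from producing malformed XML.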

Frontend

  • QueryStreamState includes subQuestionSources field
  • ResponsePanel renders per-sub-question sections with expandable source grids
  • Each section's sources are scoped to that sub-question (no cross-contamination)
  • Inline citations [filename, page N] link to the correct PDF viewer page
  • ExtractedQuestionsDisplay shows clickable anchors to answer sections
  • Copy button copies all answer text including section headers
  • Loading states: skeleton per section during generation
  • Empty state: "No relevant information found" per sub-question (not entire response)
  • All 62+ existing frontend tests still pass (183 passed)
  • All new Phase 4 frontend tests pass
  • npm run build succeeds with zero TypeScript errors
  • Manual verification: full query flow works end-to-end

New Dependencies

None. All changes use existing libraries (FastAPI, ChromaDB, OpenAI SDK, React, ReactMarkdown, TanStack Query).


Decisions (All Confirmed)

| # | Topic | Decision |
| --- | --- | --- |
| 1 | Single vs multiple filter LLM calls | Single call (user explicitly requested this) |
| 2 | Filter prompt design | Group chunks by sub-question in one prompt. JSON response maps sub-q indices to score arrays |
| 3 | Answer format | Markdown with ## Sub-question N: <question> headers |
| 4 | Sources grouping | sub_question_sources: [{index, text, sources}, ...] in SSE + frontend |
| 5 | History XML format | Add <sub_q idx="N" question="..."> wrappers around chunk groups |
| 6 | History DB migration | Add 2 new NULL-able columns. No data migration needed. |
| 7 | Backward compatibility | Preserve old retrieve(), filter(), generate_response() methods. New methods are additive. |
| 8 | Deduplication | None. Same chunk may appear in multiple sub-questions. Each sub-q evaluates independently. |
| 9 | Error handling | Per-sub-question graceful degradation. Filter failure → include all chunks for that sub-q. Generate failure → "Unable to generate answer for this sub-question." |
| 10 | Frontend rendering engine | Keep ReactMarkdown. Parse sections client-side by splitting on ## Sub-question N: headers. |
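
Decision 10's header-based split, together with the single-block fallback, can be illustrated as below. The frontend does this in TypeScript; this Python sketch only shows the same logic, and the function name and returned dict shape are assumptions.

```python
import re

# Matches lines such as "## Sub-question 2: What is B?"
HEADER_RE = re.compile(r"^## Sub-question (\d+):[ \t]*(.*)$", re.MULTILINE)


def split_answer_sections(answer: str) -> list[dict]:
    """Split a generated answer into per-sub-question sections."""
    matches = list(HEADER_RE.finditer(answer))
    if not matches:
        # No headers found: treat the whole answer as one flat block.
        return [{"index": None, "question": None, "body": answer.strip()}]

    sections = []
    for i, match in enumerate(matches):
        # A section body runs from the end of its header to the next header (or EOF).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(answer)
        sections.append({
            "index": int(match.group(1)),
            "question": match.group(2).strip(),
            "body": answer[match.end():end].strip(),
        })
    return sections
```

Note that any text before the first header is dropped in this sketch; a real renderer may prefer to keep it as a preamble.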

Open Questions

None — all resolved.

| # | Question | Resolution |
| --- | --- | --- |
| 1 | Progressive SSE events? | Yes: emit generating_subquestion as each sub-question's answer is generated. Frontend renders sections progressively. |
| 2 | retrieval_n_results per sub-question or global? | Global: same value for all sub-questions. Simpler config, one setting. |
| 3 | Fallback when decomposition returns 0 sub-questions? | Fall back to the original question and treat it as a single sub-question. Pipeline runs as the 1-sub-q case (retrieval via original question, no filtering needed for single sub-q, flat answer). |
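
Resolutions 2 and 3 combine into a small retrieval loop, sketched here. The retrieve callable stands in for RAGService.retrieve(), whose real signature is not shown in this plan; the function signature, default n_results, and result dict shape are assumptions.

```python
from typing import Callable


def retrieve_per_subquestion(
    original_question: str,
    sub_questions: list[str],
    retrieve: Callable[[str, int], list[str]],  # stand-in for RAGService.retrieve()
    n_results: int = 5,  # one global setting shared by all sub-questions
) -> list[dict]:
    """Run one retrieval per sub-question and return the grouped results."""
    # An empty decomposition falls back to the original question as a single sub-question.
    effective = sub_questions or [original_question]
    return [
        {"index": idx, "question": question, "chunks": retrieve(question, n_results)}
        for idx, question in enumerate(effective, start=1)
    ]
```

No deduplication happens across groups, so the same chunk can appear under several sub-questions, as the decisions above intend.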

Test Plan Summary

Backend (New Tests)

File Tests Coverage
test_phase4_retrieve_per_subquestion.py ~6 Per-sub-q retrieval, empty input, single sub-q, duplicate-chunk behavior (duplicates are retained, not deduplicated)
test_phase4_query_router_retrieval.py ~4 SSE events during retrieval, chunk XML format
test_phase4_relevance_filter_per_subq.py ~6 Per-sub-q filtering, JSON response parsing, threshold behavior
test_phase4_query_router_filter.py ~4 SSE events during filtering, filtered XML format
test_phase4_generate_per_subq.py ~5 Per-sub-q generate, prompt construction, answer format
test_phase4_response_format.py ~4 Answer has ## headers, citations in correct sections
test_phase4_history_format.py ~5 New XML/JSON formats, per-sub-q counts
test_phase4_prompt_templates.py ~3 New generate template, {context_sections} placeholder
test_phase4_integration_query_pipeline.py ~5 Full pipeline simulation
test_phase4_acceptance_query.py ~3 Real LLM end-to-end (manual)

Frontend (New Tests)

File Tests Coverage
test_phase4_stream_state.test.tsx ~4 State updates for new event shapes
test_phase4_types.test.ts ~2 Type compatibility checks
test_phase4_response_panel.test.tsx ~6 Section rendering, source grouping, copy, loading
test_phase4_citation_parser.test.ts ~4 Per-sub-q lookup, cross-section isolation
test_phase4_e2e_query_flow.test.tsx ~3 Full SSE flow with mocked stream

Phase PX: Profile Export/Import (2026-04-27)

Source: User request — "add an export and import function for setting a profile. The format is json."

Scope: Add JSON export/import capability to the System Prompts page. Users can download a profile's prompt configuration as a .json file and import it into another profile (or the same one) to transfer or back up their prompt settings.

Status: 🟡 Planned — not yet implemented.


Objective

Let users:

  1. Export a single profile's prompt templates as a downloadable JSON file
  2. Import a previously exported JSON file to overwrite a profile's prompt templates
  3. Optionally, export all profiles at once for full configuration backup

Decision Register

| # | Decision | Rationale |
| --- | --- | --- |
| P1 | Export single profiles, not all-at-once by default | User asked "for setting a profile"; per-profile export/import is more practical for sharing individual configurations. Add "Export All" as a secondary option. |
| P2 | Import overwrites ALL prompt steps for target profile | Simplest mental model. Import = full replace (not merge). User gets a confirmation dialog before proceeding. |
| P3 | Export JSON includes all 7 steps (including legacy filter, generate) | Even though the UI hides these, the DB stores them. Export should be a complete snapshot; import restores all 7. |
| P4 | Do NOT export auto-increment IDs | id fields are not portable between databases. Import writes rows keyed on the unique (name, step_name) pair rather than on IDs. |
| P5 | created_at/updated_at reset on import | Imported profiles get fresh timestamps (datetime('now')). Original export timestamp preserved in file metadata only. |
| P6 | Active profile state NOT imported | is_active is deployment-specific. The user sets the active profile separately via the existing dropdown. Import only touches prompt_template content. |
| P7 | Validate profile name on import | Only A, B, C allowed. Import into a non-existent name is rejected. |
| P8 | JSON schema versioned | "format": "legco-reranker-profile/v1" for future-proofing. Reject unknown versions on import. |

JSON Format Specification

Single Profile Export

{
  "format": "legco-reranker-profile/v1",
  "profile_name": "A",
  "exported_at": "2026-04-27T12:00:00Z",
  "prompts": {
    "decompose": "Given this question: '{question}'\n\nBreak it down into 2-5 simplified sub-questions...",
    "filter": "Given question '{question}' and these document chunks:\n\n{chunks}\n\n...",
    "generate": "Question: {question}\n\nContext:\n{context}\n\n...",
    "generate_per_subq": "Answer each sub-question using ONLY its document chunks...",
    "filter_intro": "Evaluate each chunk for relevance to its associated sub-question only.",
    "filter_section": "\nSub-question {subq_idx}: \"{subq_question}\"\n{chunks}",
    "filter_outro": "\nFor each chunk, rate its relevance 0-10..."
  }
}

Full Backup Export (All Profiles)

{
  "format": "legco-reranker-profile/v1",
  "exported_at": "2026-04-27T12:00:00Z",
  "active_profile": "A",
  "profiles": {
    "A": {
      "prompts": { ... }
    },
    "B": {
      "prompts": { ... }
    },
    "C": {
      "prompts": { ... }
    }
  }
}

Import Request Format

POST /api/v1/prompts/profiles/{name}/import
Content-Type: application/json

{
  "format": "legco-reranker-profile/v1",
  "profile_name": "A",
  "exported_at": "2026-04-27T12:00:00Z",
  "prompts": {
    "decompose": "...",
    ...
  }
}

Response (example: a file exported from Profile A imported into Profile B):

{
  "status": "ok",
  "profile": "B",
  "imported_steps": 7,
  "source_profile": "A"
}

Sub-Phase Structure

| Sub-Phase | Scope | Components | Test Files |
| --- | --- | --- | --- |
| PX.1 | Backend: Export endpoint | routers/prompts.py, models/prompts.py | test_phaseX_export.py |
| PX.2 | Backend: Import endpoint | routers/prompts.py, models/prompts.py, prompt_service.py | test_phaseX_import.py |
| PX.3 | Frontend: Export/Import UI | SystemPromptsPage.tsx, ProfileList.tsx, lib/api.ts, lib/queries.tsx, types/index.ts | test_phaseX_export_import.test.tsx |
| PX.4 | Testing & Polish | All affected files | Integration + acceptance tests |

Sub-Phase PX.1: Backend — Single Profile Export Endpoint

Test files to write first:

  • backend/app/test/test_phaseX_export.py — Tests export endpoint, JSON schema validation, empty profile handling

Task PX.1.1: Add Pydantic models

File: backend/app/models/prompts.py

from pydantic import BaseModel

class ProfileExportResponse(BaseModel):
    format: str = "legco-reranker-profile/v1"
    profile_name: str
    exported_at: str                                     # ISO 8601 UTC timestamp
    prompts: dict[str, str]                              # step_name -> template_text

class AllProfilesExportResponse(BaseModel):
    format: str = "legco-reranker-profile/v1"
    exported_at: str
    active_profile: str
    profiles: dict[str, dict[str, dict[str, str]]]  # profile_name -> {"prompts": {step: text}}

Task PX.1.2: Add GET /api/v1/prompts/profiles/{name}/export endpoint

File: backend/app/routers/prompts.py

  • Reads all 7 system_prompts rows for the given profile
  • Returns ProfileExportResponse with Content-Disposition: attachment; filename="legco-profile-{name}.json"
  • Uses application/json content type
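
A minimal sketch of how the export payload might be assembled from the profile's rows before it is returned. The build_profile_export name and the row dict shape (step_name, prompt_template keys) are assumptions; the output keys follow the JSON Format Specification above.

```python
from datetime import datetime, timezone

FORMAT_VERSION = "legco-reranker-profile/v1"


def build_profile_export(profile_name: str, rows: list[dict]) -> dict:
    """Build the single-profile export payload from system_prompts rows."""
    return {
        "format": FORMAT_VERSION,
        "profile_name": profile_name,
        # UTC timestamp in the same shape as the spec example (2026-04-27T12:00:00Z).
        "exported_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        # Only step name and template text are copied, so id/profile_id never leak.
        "prompts": {row["step_name"]: row["prompt_template"] for row in rows},
    }
```

Copying an explicit whitelist of fields, rather than deleting unwanted ones, keeps future DB columns out of the export by default.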

Task PX.1.3: Add GET /api/v1/prompts/export/all endpoint (optional)

  • Reads all 3 profiles + all 21 prompt rows
  • Returns AllProfilesExportResponse
  • For full backup/restore scenarios

Commit: "feat(prompts): add single-profile and full JSON export endpoints"


Sub-Phase PX.2: Backend — Single Profile Import Endpoint

Test files to write first:

  • backend/app/test/test_phaseX_import.py — Tests import endpoint, validation, error cases

Task PX.2.1: Add request model

File: backend/app/models/prompts.py

class ProfileImportRequest(BaseModel):
    format: str                                          # must be "legco-reranker-profile/v1"
    profile_name: str                                    # source profile name (informational)
    exported_at: str | None = None                       # informational timestamp
    prompts: dict[str, str]                              # step_name -> template_text

Task PX.2.2: Add POST /api/v1/prompts/profiles/{name}/import endpoint

File: backend/app/routers/prompts.py

Validation steps:

  1. Check target {name} is A, B, or C → 400 if not
  2. Check request.format == "legco-reranker-profile/v1" → 400 if not
  3. Validate that all 7 required step keys (decompose, filter, generate, generate_per_subq, filter_intro, filter_section, filter_outro) are present in request.prompts → 400 with list of missing keys if not
  4. Validate no extra/unknown step keys → reject (or warn? → decision: reject with 400, listing unknown keys)
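
The four validation steps above can be sketched as a pure function that the endpoint would translate into a 400 response whenever the returned list is non-empty. The function name, constants, and error message wording are assumptions for illustration.

```python
FORMAT_VERSION = "legco-reranker-profile/v1"
VALID_PROFILES = {"A", "B", "C"}
REQUIRED_STEPS = {
    "decompose", "filter", "generate", "generate_per_subq",
    "filter_intro", "filter_section", "filter_outro",
}


def validate_import(target_name: str, payload: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the payload is valid."""
    if target_name not in VALID_PROFILES:
        return [f"unknown target profile: {target_name!r}"]
    if payload.get("format") != FORMAT_VERSION:
        return [f"unsupported format: {payload.get('format')!r}"]

    prompts = payload.get("prompts")
    if not isinstance(prompts, dict):
        return ["prompts must be an object"]

    errors = []
    missing = sorted(REQUIRED_STEPS - set(prompts))
    unknown = sorted(set(prompts) - REQUIRED_STEPS)
    if missing:
        errors.append("missing step keys: " + ", ".join(missing))
    if unknown:
        errors.append("unknown step keys: " + ", ".join(unknown))
    return errors
```

Missing and unknown keys are reported together so the user can fix a hand-edited file in one pass.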

Implementation:

  • Uses PromptService._update_all_prompts() (the existing internal batch-update method) to overwrite all 7 steps
  • Each step gets fresh created_at/updated_at timestamps (DB defaults)
  • Returns {"status": "ok", "profile": name, "imported_steps": len(prompts), "source_profile": request.profile_name}

Task PX.2.3: Add POST /api/v1/prompts/import/all endpoint (optional)

  • Accepts AllProfilesExportResponse format
  • Imports all 3 profiles at once
  • Does NOT change the active profile unless the payload explicitly includes active_profile

Commit: "feat(prompts): add single-profile JSON import endpoint with full validation"


Sub-Phase PX.3: Frontend — Export/Import UI

Test files to write first:

  • frontend/src/test/components/test_phaseX_export_import.test.tsx — Tests export/import buttons, file download, file upload

Task PX.3.1: Add TypeScript types

File: frontend/src/types/index.ts

// Exported so that lib/api.ts and lib/queries.tsx can import them
export interface ProfileExportData {
  format: string
  profile_name: string
  exported_at: string
  prompts: Record<string, string>
}

export interface ProfileImportResponse {
  status: string
  profile: string
  imported_steps: number
  source_profile: string
}

Task PX.3.2: Add API client functions

File: frontend/src/lib/api.ts

// Download a profile as JSON blob for browser-side save
export const exportProfile = async (name: string): Promise<ProfileExportData> => {
  const resp = await apiClient.get<ProfileExportData>(`/prompts/profiles/${name}/export`)
  return resp.data
}

// Import a profile from JSON
export const importProfile = async (name: string, data: ProfileExportData): Promise<ProfileImportResponse> => {
  const resp = await apiClient.post<ProfileImportResponse>(`/prompts/profiles/${name}/import`, data)
  return resp.data
}

Task PX.3.3: Add TanStack Query mutation for import

File: frontend/src/lib/queries.tsx

export const useImportProfile = () => {
  const queryClient = useQueryClient()
  return useMutation({
    mutationFn: ({ name, data }: { name: string; data: ProfileExportData }) =>
      importProfile(name, data),
    onSuccess: () => {
      queryClient.invalidateQueries({ queryKey: ['prompts'] })
    },
  })
}

Task PX.3.4: Add Export button to ProfileList cards

File: frontend/src/components/ProfileList.tsx

  • Add export icon button (e.g., Download from lucide-react) next to the "Edit" button on each card
  • On click: calls exportProfile(name) → serializes the returned JSON into a Blob → triggers browser download via URL.createObjectURL + <a> click
  • Filename: legco-profile-{name}-{date}.json

Task PX.3.5: Add Import button and dialog to SystemPromptsPage

File: frontend/src/pages/SystemPromptsPage.tsx

  • Add "Import" button in the top bar (next to "Active Profile" dropdown)
  • On click: opens a modal/dialog with:
    • File input (accept .json) — hidden <input type="file"> triggered by styled button
    • After file selected: parse JSON client-side, show preview (source profile name, export date, step count)
    • Target profile selector (dropdown: A, B, C) — defaults to source profile name if valid
    • "Import" button → confirmation dialog ("This will overwrite all prompts for Profile {target}. Continue?")
    • On confirm: calls importProfileMutation.mutate()
    • Success: show toast "Profile {target} imported successfully ({n} steps from Profile {source})"
    • Error: show inline error message with details

Task PX.3.6: Add Export All button (optional)

File: frontend/src/pages/SystemPromptsPage.tsx

  • "Export All" button in top bar
  • Downloads all 3 profiles as legco-profiles-{date}.json

Commit: "feat(prompts): add export/import UI with file download, upload dialog, and validation"


Sub-Phase PX.4: Testing & Polish

Test files:

  • backend/app/test/test_phaseX_export.py — Export endpoint: valid profile, invalid name, JSON schema validation
  • backend/app/test/test_phaseX_import.py — Import endpoint: valid import, missing steps, extra steps, invalid format version, invalid target name
  • frontend/src/test/components/test_phaseX_export_import.test.tsx — Export button click → download, Import dialog flow → file upload → preview → confirm → success/error

Task PX.4.1: Backend unit tests

  • test_export_profile_valid — GET export/A returns all 7 steps with correct format version
  • test_export_profile_invalid_name — GET export/X returns 400
  • test_export_all — GET export/all returns 3 profiles, 21 prompts total
  • test_import_valid — POST import/B with valid JSON → 200, verify all 7 steps updated
  • test_import_overwrites_existing — POST import/B → verify old content replaced
  • test_import_missing_required_step — POST import with only 6 steps → 400 with missing key listed
  • test_import_unknown_step_key — POST import with extra step → 400
  • test_import_invalid_format_version — POST import with format: "v2" → 400
  • test_import_invalid_target_name — POST import/X → 400
  • test_import_does_not_change_active — import into inactive profile → active profile unchanged

Task PX.4.2: Frontend tests

  • Export button visible on each profile card
  • Click export → fetch called, download triggered
  • Import dialog opens on button click
  • File selection → JSON parsed, preview shown
  • Invalid JSON file → error message shown
  • Target profile selector defaults to source profile
  • Confirm import → mutation called, success toast
  • Import error → inline error message
  • Export All downloads all profiles

Task PX.4.3: Integration verification

  • npm run build — no TypeScript errors
  • npm test — all frontend tests pass
  • pytest backend/app/test/test_phaseX_*.py -v — all backend tests pass
  • Manual flow: export Profile A → edit Profile B → import exported file into B → verify B's prompts match A's original

Commit: "test(prompts): add unit, integration tests for export/import"


Files Affected — Complete Inventory

Backend — New Files

File Sub-Phase Purpose
backend/app/test/test_phaseX_export.py PX.4 Unit tests for export endpoint
backend/app/test/test_phaseX_import.py PX.4 Unit tests for import endpoint

Backend — Modified Files

File Sub-Phase Changes
backend/app/models/prompts.py PX.1, PX.2 Add ProfileExportResponse, AllProfilesExportResponse, ProfileImportRequest, ProfileImportResponse
backend/app/routers/prompts.py PX.1, PX.2 Add GET /export, GET /export/all, POST /import endpoints

Frontend — New Files

File Sub-Phase Purpose
frontend/src/test/components/test_phaseX_export_import.test.tsx PX.4 Component tests for export/import UI

Frontend — Modified Files

File Sub-Phase Changes
frontend/src/types/index.ts PX.3 Add ProfileExportData, ProfileImportResponse types
frontend/src/lib/api.ts PX.3 Add exportProfile(), importProfile() API functions
frontend/src/lib/queries.tsx PX.3 Add useImportProfile() mutation hook
frontend/src/components/ProfileList.tsx PX.3 Add Export button per profile card
frontend/src/pages/SystemPromptsPage.tsx PX.3 Add Import/Export All buttons, import dialog/modal

Acceptance Criteria

Backend

  • GET /api/v1/prompts/profiles/A/export returns JSON with all 7 steps, correct format version
  • GET /api/v1/prompts/profiles/X/export returns 400 (invalid profile name)
  • GET /api/v1/prompts/export/all returns all 3 profiles, active profile marker
  • POST /api/v1/prompts/profiles/B/import with valid payload overwrites all 7 steps for Profile B
  • Import rejects payload with missing required step keys (400 + key names)
  • Import rejects payload with unknown step keys (400 + key names)
  • Import rejects payload with unknown format version (400)
  • Import does NOT change is_active flag on target profile
  • Exported JSON does NOT contain internal DB IDs (id/profile_id)
  • All existing prompt API endpoints still work unchanged

Frontend

  • Export button visible on each profile card in ProfileList
  • Clicking Export downloads a .json file with correct naming (legco-profile-A-2026-04-27.json)
  • Import button visible on SystemPromptsPage top bar
  • Clicking Import opens a modal with: file input, JSON preview, target profile selector, confirm button
  • Selecting invalid JSON file shows error message
  • Importing into a valid profile shows success confirmation with step count
  • Import error from backend shows inline error message
  • After successful import, profile data refreshes (query invalidation)
  • All existing System Prompts functionality still works unchanged

Risk Register

| Risk | Likelihood | Impact | Mitigation |
| --- | --- | --- | --- |
| JSON file too large to upload | Low | Low: 7 prompts × ~2KB = ~14KB | Enforce a 1MB limit on the import endpoint, e.g. by checking the Content-Length header or capping request size at the reverse proxy (FastAPI's Body(max_length=...) constrains field length, not total request size) |
| User imports into wrong profile by mistake | Medium | Medium: overwrites their existing config | Confirmation dialog with source/target profile names clearly displayed before import |
| Exported file missing legacy filter/generate steps | Medium | Medium: import would fail validation | Always export all 7 steps (even hidden ones). Import validates all 7 are present. |
| Browser download API differences | Low | Low | Use standard Blob + URL.createObjectURL approach, tested across Chrome/Firefox |
| Import endpoint receives malformed JSON | Low | Low: Pydantic validation catches this | ProfileImportRequest model validates format string, dict keys, value types |
| User exports from one deployment and imports into another with different profile names | Low | Low: only 3 names (A/B/C) | Import only into A/B/C. If the source was "D", the user must choose the target manually |

New Dependencies

None. All changes use existing libraries (FastAPI, Pydantic, React, TanStack Query, lucide-react icons).


Implementation Sequence

PX.1 (Backend Export) ──► PX.2 (Backend Import)
                              │
                              ▼
                         PX.3 (Frontend UI)
                              │
                              ▼
                         PX.4 (Testing)

PX.1 and PX.2 can be done together (both in routers/prompts.py). PX.3 depends on knowing the exact API contracts from PX.1/PX.2. PX.4 runs after everything is wired.