Package 4 Enhancement Plan — Per-Sub-Question RAG Pipeline

Source: User request (2026-04-26)
Scope: Refactor the 3-step RAG query pipeline so retrieval, filtering, and response generation are organized per sub-question instead of batch-flattened.
Status: ✅ Complete — All 7 sub-phases implemented (2026-04-26). Phase 4a Prompt Integration added (2026-04-27).

Objective

Restructure the POST /api/v1/query pipeline so that:

Retrieval per sub-question: Each sub-question independently retrieves n_results chunks from ChromaDB (instead of joining all sub-questions into one query string).
Filtering per sub-question: Each chunk is evaluated for relevance against its own originating sub-question (not the original user question). One LLM call handles all filtering — the prompt is redesigned to group chunks by sub-question.
Final answer organized by sub-question: Each sub-question gets its own bullet-point answer with its own sources. The frontend renders answer sections per sub-question rather than one monolithic bullet list.

Decision Register

#	Decision	Rationale
1	Keep `QueryDecomposer` unchanged	Input/output contract is identical — decomposition still produces a flat list of sub-questions
2	Single LLM call for filtering	User explicitly requested one call. Prompt redesigned to carry sub-question context for each chunk group
3	Keep `RAGService.retrieve()` signature	Call it N times (once per sub-question) externally in the orchestrator rather than changing its internal contract
4	Add `retrieve_per_subquestion()` to `RAGService`	New method that iterates over sub-questions, calls `retrieve()` per question, returns grouped results
5	Redesign `generate_response()` signature	Accepts structured `sub_questions: List[SubQuestionContext]` instead of flat chunk lists
6	SSE events: add `generating_subquestion` phase	Progressive streaming — frontend sees which sub-question is being answered
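7	History: change XML/JSON formats in-place	Add `<sub_q>` wrappers to `chunks_retrieved`/`chunks_filtered` XML. Add sub-question grouping to `sources` JSON. No new DB columns for the grouped data itself; the only schema change is the two optional per-sub-question count columns described in Task 4.4.2.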
8	Final answer format: markdown sections	`## Sub-question 1` headers with inline citations. Backward-compatible with existing `ReactMarkdown` rendering
9	Deduplicate chunks within a sub-question only	The same chunk may be retrieved by multiple sub-questions. Keep these cross-sub-question duplicates, because each sub-question needs its own independent relevance evaluation. ChromaDB `query()` may return the same document for different queries; this is acceptable.
10	Prompt template: add `generate` placeholders	New placeholder `{context_sections}` replaces single `{context}`. Filter template unchanged (sub-question injected at call site). Decompose template unchanged.
11	Progressive SSE events	Emit `generating_subquestion` event as each sub-question's answer section is generated. Frontend renders sections one by one.
12	`retrieval_n_results`	Global — same value for all sub-questions. Use existing `settings.retrieval_n_results` config.
13	Empty decomposition fallback	Treat original user question as single sub-question. Pipeline runs as 1-sub-q case — single retrieval, no filtering needed (one sub-q = no ambiguity), flat answer with `##` header.

Pipeline: Before vs After

Before (Current — Flat Batch)

User Question: "What are NEC4 time extension clauses?"
         │
    ┌────▼─────┐
    │ Decompose│  LLM Call 1
    │ → ["What are time extensions?", 
    │    "What notice is required?"]
    └────┬─────┘
         │ joined: "What are time extensions? What notice is required?"
    ┌────▼─────┐
    │ Retrieve │  1 ChromaDB query → 10 chunks (flat, no sub-q association)
    └────┬─────┘
         │ 10 chunks
    ┌────▼─────┐
    │  Filter  │  LLM Call 2 — all chunks scored against ORIGINAL question
    │          │  Score > 7 → keep (flat, no sub-q association)
    └────┬─────┘
         │ N filtered chunks
    ┌────▼─────┐
    │ Generate │  LLM Call 3 — flat answer from ALL filtered chunks
    │          │  "• Time extensions require notice [NEC4 ACC.pdf, p3]
    │          │   • The project manager must acknowledge [NEC4, p7]
    │          │   • Notice is defined as..."  (sources from all sub-qs mixed)
    └────┬─────┘
         │ single SSE completed event
    ┌────▼─────┐
    │ Frontend │  1 ReactMarkdown block, 1 flat sources list
    └──────────┘

After (Per-Sub-Question)

User Question: "What are NEC4 time extension clauses?"
         │
    ┌────▼─────┐
    │ Decompose│  LLM Call 1 (UNCHANGED)
    │ → ["What are time extensions?",
    │    "What notice is required?"]
    └────┬─────┘
         │ sub_q1                    sub_q2
    ┌────▼─────┐              ┌────▼─────┐
    │ Retrieve │              │ Retrieve │   2 ChromaDB queries → 10 chunks each
    │ q1 → 10  │              │ q2 → 10  │   chunks tagged with sub-q index
    └────┬─────┘              └────┬─────┘
         │                         │
         └─────────┬───────────────┘
                   │ grouped: {sub_q0: [chunks 0-9], sub_q1: [chunks 10-19]}
              ┌────▼─────┐
              │  Filter  │  LLM Call 2 (SINGLE CALL — redesigned prompt)
              │          │  Each chunk scored against its OWN sub-question
              │          │  Returns grouped scores → filtered per sub-q
              └────┬─────┘
                   │ filtered_by_subq: {0: [chunk_a, chunk_b], 1: [chunk_c]}
              ┌────▼─────┐
              │ Generate │  LLM Call 3 (redesigned prompt with per-sub-q context)
              │          │  ┌─────────────────────────────────────┐
              │          │  │ ## What are time extensions?         │
              │          │  │ - Time extensions must be notified   │
              │          │  │   [NEC4 ACC.pdf, page 3]             │
              │          │  │ - The project manager has 2 weeks    │
              │          │  │   [NEC4 Contract.pdf, page 12]       │
              │          │  │                                      │
              │          │  │ ## What notice is required?          │
              │          │  │ - Written notice must be given       │
              │          │  │   [NEC4 ACC.pdf, page 7]             │
              │          │  └─────────────────────────────────────┘
              └────┬─────┘
                   │ SSE events: generating_subquestion (per sub-q) → completed
              ┌────▼─────┐
              │ Frontend │  Sections per sub-question, sources grouped per section
              └──────────┘

Current State (Pre-Enhancement)

Backend

Component	File	Current Behavior
Decomposer	`services/query_decomposer.py`	`decompose(question) -> (List[str], prompt)` — returns 2-5 sub-questions
Retrieval	`services/rag.py:retrieve()`	`query_text = " ".join(query_keywords)` — joins all sub-qs into ONE string, single ChromaDB query → flat chunk list
Filter	`services/relevance_filter.py`	`filter(question, chunks)` — ALL chunks scored against ORIGINAL question, single LLM call, flat output
Generate	`services/rag.py:generate_response()`	`generate_response(question, chunks, metadata)` — flat chunks → flat bullet answer
Orchestrator	`routers/query.py:_query_stream()`	Linear 4-stage pipeline: decompose → retrieve → filter → generate
SSE Events	`routers/query.py`	`decomposed → retrieving → filtering → generating → completed` — flat answer + sources in `completed`
History	`services/history_service.py`	Flat XML for `chunks_retrieved`/`chunks_filtered`. Flat JSON for `sources`. Single timing per stage.
Prompt templates	`prompt_service.py` + `sqlite_db.py`	3 steps (`decompose`, `filter`, `generate`). Placeholders: `{question}`, `{chunks}`, `{context}`
Config	`core/config.py`	`retrieval_n_results=10`, `relevance_threshold=7.0`

Frontend

Component	File	Current Behavior
Types	`types/index.ts`	`QueryStreamEvent.phase`, flat `extracted_questions: string[]`, flat `answer: string`, flat `sources: SourceMetadata[]`
SSE Client	`lib/api.ts`	`queryDocumentStream()` — generic `JSON.parse` per `data:` line, no sub-question awareness
State	`lib/queries.tsx`	`QueryStreamState` with flat `answer`/`sources`/`extractedQuestions`
Response	`components/ResponsePanel.tsx`	Single `ReactMarkdown` block for answer. Flat 2-column grid for sources. No sub-question grouping.
Questions	`components/ExtractedQuestionsDisplay.tsx`	`<ol>` list of question strings. No sources attached.
Citations	`utils/citationParser.ts`	Flat `sources` lookup — `buildCitationLookup(sources)` returns global map
Progress	`components/PipelineProgress.tsx`	4-step stepper (NOT currently wired in LTTPage)

Key Test Files

File	Lines	Status
`test_phase1_query_decomposer.py`	76	✅ Unchanged — decomposer contract stays
`test_phase1_rag_service.py`	139	🔴 Needs update — `retrieve()`, `generate_response()` signatures change
`test_phase1_relevance_filter.py`	93	🟡 Needs update — one-call pattern changes to per-sub-q grouping
`test_phase1_query.py`	97	🟢 Already skipped (SSE migration) — may un-skip later
`test_phase3_query_history_integration.py`	608	🔴 Major rewrite — pipeline simulation mirrors `_query_stream` 1:1
`test_phase3_prompt_injection.py`	238	🟡 Moderate — new generate template placeholder
`test_acceptance_phase1_rag_query.py`	101	🔴 Full rewrite — already broken (SSE vs JSON), new response shape
`conftest.py`	94	🟡 Low — may add per-sub-q mock helpers

Implementation Tasks

Sub-Phase 4.1: Backend — Per-Sub-Question Retrieval

Test files to write first:

test_phase4_retrieve_per_subquestion.py — Tests RAGService.retrieve_per_subquestion()
test_phase4_query_router_retrieval.py — Tests _query_stream retrieval stage produces per-sub-q chunks

Task 4.1.1: Add retrieve_per_subquestion() to RAGService

File: backend/app/services/rag.py

New method signature:

def retrieve_per_subquestion(
    self,
    sub_questions: List[str],
    n_results: int = 10,
) -> List[Tuple[str, List[Tuple[str, Dict[str, Any], float]]]]:
    """Retrieve chunks for each sub-question independently.

    Args:
        sub_questions: List of decomposed sub-questions.
        n_results: Number of chunks per sub-question.

    Returns:
        List of (sub_question, chunks) tuples.
        chunks is the standard retrieve() output: [(text, metadata, distance), ...].
    """

Implementation:

Call self.retrieve([sub_q], n_results) for each sub-question
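Return a list of (sub_question, chunks). Chunks are unique within each sub-question because a single ChromaDB query returns each document ID at most once; the same chunk may still appear under several sub-questions (see Decision 9)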
Existing retrieve() method is NOT modified — it continues to work as before
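The implementation notes above can be sketched as follows. This is a minimal illustration, not the shipped code: `RAGServiceSketch` and its stubbed `retrieve()` are hypothetical stand-ins (the real method queries ChromaDB), and the metadata keys are assumed.

```python
from typing import Any, Dict, List, Tuple

Chunk = Tuple[str, Dict[str, Any], float]  # (text, metadata, distance)


class RAGServiceSketch:
    """Hypothetical stand-in for RAGService, reduced to the retrieval path."""

    def retrieve(self, query_keywords: List[str], n_results: int = 10) -> List[Chunk]:
        # Stub: the real method runs one ChromaDB query on the joined text.
        query_text = " ".join(query_keywords)
        return [
            (f"chunk {i} for '{query_text}'", {"filename": "doc.pdf", "page": i + 1}, 0.1 * i)
            for i in range(n_results)
        ]

    def retrieve_per_subquestion(
        self, sub_questions: List[str], n_results: int = 10
    ) -> List[Tuple[str, List[Chunk]]]:
        # One independent retrieve() call per sub-question. Input order is
        # preserved, so list position doubles as the sub-question index.
        return [(sub_q, self.retrieve([sub_q], n_results)) for sub_q in sub_questions]


grouped = RAGServiceSketch().retrieve_per_subquestion(
    ["What are time extensions?", "What notice is required?"], n_results=3
)
```

Passing a one-element list to `retrieve()` keeps its existing contract intact, which is the point of Decision 3.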

Task 4.1.2: Update _query_stream() retrieval stage

File: backend/app/routers/query.py

Changes:

Replace rag.retrieve(extracted_questions, n_results) with rag.retrieve_per_subquestion(extracted_questions, n_results)
Track per-sub-question retrieval timing (new field or combined timing)
Format chunks_retrieved XML with sub-question wrappers

New chunks_retrieved XML format:

<sub_q idx="0" question="What are time extensions?">
<chunk_1>
Filename: NEC4 ACC.pdf
Page: 3
Content: Clause 61.3 states that...
</chunk_1>
<chunk_2>
...
</chunk_2>
</sub_q>
<sub_q idx="1" question="What notice is required?">
<chunk_1>
Filename: NEC4 Contract.pdf
Page: 12
Content: Notice must be given...
</chunk_1>
...
</sub_q>

Task 4.1.3: Format helpers

File: backend/app/routers/query.py

New functions:

def format_chunks_retrieved_per_subq(results: List[Tuple[str, List]]) -> str:
    """Format per-sub-question retrieved chunks as XML."""
    
def format_chunks_filtered_per_subq(results: List[Tuple[str, List]]) -> str:
    """Format per-sub-question filtered chunks as XML with relevance scores."""
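A possible body for the first helper, shown as a sketch. The metadata keys `filename` and `page` are assumptions inferred from the sample XML, and escaping the `question` attribute is a suggestion rather than confirmed behavior.

```python
from html import escape
from typing import Any, Dict, List, Tuple

Chunk = Tuple[str, Dict[str, Any], float]  # (text, metadata, distance)


def format_chunks_retrieved_per_subq(results: List[Tuple[str, List[Chunk]]]) -> str:
    """Format per-sub-question retrieved chunks as XML-like text."""
    lines: List[str] = []
    for idx, (question, chunks) in enumerate(results):
        # Escape the question so quotes or angle brackets cannot break the attribute.
        lines.append(f'<sub_q idx="{idx}" question="{escape(question, quote=True)}">')
        # Chunk tags restart at 1 inside every sub-question, as in the sample format.
        for n, (text, metadata, _distance) in enumerate(chunks, start=1):
            lines.append(f"<chunk_{n}>")
            lines.append(f"Filename: {metadata.get('filename', 'unknown')}")
            lines.append(f"Page: {metadata.get('page', 'unknown')}")
            lines.append(f"Content: {text}")
            lines.append(f"</chunk_{n}>")
        lines.append("</sub_q>")
    return "\n".join(lines)


xml_text = format_chunks_retrieved_per_subq(
    [("What are time extensions?",
      [("Clause 61.3 states that...", {"filename": "NEC4 ACC.pdf", "page": 3}, 0.2)])]
)
```

The filtered variant would add a `Relevance:` line per chunk and otherwise follow the same loop.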

Commit: "feat: Phase 4.1 per-sub-question retrieval with grouped chunk XML"

Sub-Phase 4.2: Backend — Per-Sub-Question Filtering (Single LLM Call)

Test files to write first:

test_phase4_relevance_filter_per_subq.py — Tests RelevanceFilter.filter_per_subquestion() with grouped chunks
test_phase4_query_router_filter.py — Tests filter stage with per-sub-q chunk groups

Task 4.2.1: Add filter_per_subquestion() to RelevanceFilter

File: backend/app/services/relevance_filter.py

New method signature:

async def filter_per_subquestion(
    self,
    sub_questions: List[str],
    sub_chunks: List[List[Tuple[str, Dict]]],
    threshold: float = 7.0,
) -> Tuple[List[Tuple[str, List[Tuple[str, Dict]]]], str]:
    """Filter chunks per sub-question in a single LLM call.

    Args:
        sub_questions: List of decomposed sub-questions.
        sub_chunks: List of chunk lists (one per sub-question).
        threshold: Minimum relevance score.

    Returns:
        Tuple of (filtered_results, prompt).
        filtered_results: List of (sub_question, filtered_chunks_for_that_q).
    """

Prompt design (single LLM call):

Evaluate each chunk for relevance to its associated sub-question.

Sub-question 0: "{sub_q_0}"
Chunk 0: {chunk_0_text}
Chunk 1: {chunk_1_text}
...

Sub-question 1: "{sub_q_1}"
Chunk 0: {chunk_0_text}
Chunk 1: {chunk_1_text}
...

For each chunk, rate relevance 0-10 considering ONLY its associated sub-question.
Return a JSON object mapping sub-question indices to arrays of scores:
{"0": [8.5, 3.2, 9.0], "1": [7.0, 6.5, 9.1]}

Key rules:

Each chunk is evaluated against its own sub-question (not the original user question)
JSON keys are stringified sub-question indices ("0", "1", ...)
Score arrays MUST match chunk count for each sub-question
Same JSON extraction/markdown stripping logic as existing filter()

Existing filter() method is preserved — not modified, not deprecated. The new method is additive.
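The key rules above suggest a parsing step along these lines. This is a sketch under stated assumptions: the helper names are hypothetical, the comparison is `>=` (the plan says both "Score > 7" and "minimum relevance score"), and a score array whose length does not match is treated like a parse failure, so that sub-question keeps all its chunks.

```python
import json
from typing import Any, Dict, List, Tuple

Chunk = Tuple[str, Dict[str, Any]]  # (text, metadata)


def strip_markdown_fence(raw: str) -> str:
    """Remove a surrounding markdown code fence from an LLM reply, if present."""
    fence = "`" * 3
    text = raw.strip()
    if text.startswith(fence):
        text = text[len(fence):]
        if text.startswith("json"):
            text = text[len("json"):]
        if text.endswith(fence):
            text = text[: -len(fence)]
    return text.strip()


def apply_grouped_scores(
    sub_questions: List[str],
    sub_chunks: List[List[Chunk]],
    raw_reply: str,
    threshold: float = 7.0,
) -> List[Tuple[str, List[Chunk]]]:
    try:
        scores = json.loads(strip_markdown_fence(raw_reply))
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        scores = {}
    if not isinstance(scores, dict):
        scores = {}

    results: List[Tuple[str, List[Chunk]]] = []
    for idx, (question, chunks) in enumerate(zip(sub_questions, sub_chunks)):
        group = scores.get(str(idx))  # keys are stringified indices
        usable = (
            isinstance(group, list)
            and len(group) == len(chunks)
            and all(isinstance(s, (int, float)) for s in group)
        )
        if not usable:
            # Fallback: keep every chunk for this sub-question (no filtering).
            results.append((question, list(chunks)))
            continue
        kept = [chunk for chunk, score in zip(chunks, group) if score >= threshold]
        results.append((question, kept))
    return results


reply = '{"0": [8.5, 3.2, 9.0], "1": [7.0, 6.5, 9.1]}'
chunks = [[("a", {}), ("b", {}), ("c", {})], [("d", {}), ("e", {}), ("f", {})]]
filtered = apply_grouped_scores(["q0", "q1"], chunks, reply)
```

Because the fallback is applied per sub-question, one malformed score array does not discard the scores that did parse for the other sub-questions.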

Task 4.2.2: Update _query_stream() filter stage

File: backend/app/routers/query.py

Changes:

Call relevance_filter.filter_per_subquestion(extracted_questions, chunks_for_filter, threshold) instead of relevance_filter.filter(question, chunks, threshold)
Build chunks_for_filter from per-sub-question retrieval results
Track filter_prompt (the redesigned prompt)
Format chunks_filtered XML with sub-question wrappers and Relevance: scores

New chunks_filtered XML format:

<sub_q idx="0" question="What are time extensions?">
<chunk_1>
Filename: NEC4 ACC.pdf
Page: 3
Relevance: 8.5
Content: Clause 61.3 states that...
</chunk_1>
</sub_q>
<sub_q idx="1" question="What notice is required?">
<chunk_1>
Filename: NEC4 Contract.pdf
Page: 12
Relevance: 9.0
Content: Notice must be given...
</chunk_1>
</sub_q>

Commit: "feat: Phase 4.2 per-sub-question filtering with single LLM call"

Sub-Phase 4.3: Backend — Sub-Question-Organized Response Generation

Test files to write first:

test_phase4_generate_per_subq.py — Tests RAGService.generate_response_per_subquestion()
test_phase4_response_format.py — Tests the final answer matches expected format

Task 4.3.1: Redesign generate_response() → generate_response_per_subquestion()

File: backend/app/services/rag.py

New method signature:

async def generate_response_per_subquestion(
    self,
    sub_questions: List[str],
    sub_chunks: List[List[str]],
    sub_metadata: List[List[Dict[str, Any]]],
) -> Tuple[str, str, List[List[SourceMetadata]]]:
    """Generate sub-question-organized RAG response.

    Args:
        sub_questions: List of decomposed sub-questions.
        sub_chunks: List of chunk text lists (one per sub-question).
        sub_metadata: List of metadata dict lists (one per sub-question).

    Returns:
        Tuple of (answer, prompt, grouped_sources).
        answer: Markdown string with sections per sub-question.
        prompt: The rendered LLM prompt.
        grouped_sources: List of SourceMetadata lists (one per sub-question).
    """

New prompt template (replaces generate):

You must answer each sub-question using ONLY the document chunks provided for it.
Do not use any external knowledge.
Format your answer as markdown sections — one section per sub-question.
Each section should start with "## Sub-question N: <the question>"
Each section should contain 1-5 bullet points.
Cite your sources inline using bracket labels, e.g. [filename, page N].
Place the citation at the end of each relevant bullet point.

{context_sections}

Answer:

Context format (replaces {context}):

### Context for Sub-question 0: "What are time extensions?"
[NEC4 ACC.pdf, page 3] Source: NEC4 ACC.pdf
Summary: Clause 61.3 discusses time extensions...
Content: Clause 61.3 states that the project manager...

[NEC4 Contract.pdf, page 12] Source: NEC4 Contract.pdf
Summary: Notice requirements for time extensions...
Content: Written notice must be given within...

### Context for Sub-question 1: "What notice is required?"
[NEC4 ACC.pdf, page 7] Source: NEC4 ACC.pdf
Summary: Notice requirements...
Content: The contractor shall notify the project manager in writing...
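One way the `{context_sections}` value could be assembled from the grouped inputs is sketched below. The function name and the metadata keys (`filename`, `page`, `summary`) are assumptions inferred from the sample, not confirmed field names.

```python
from typing import Any, Dict, List


def build_context_sections(
    sub_questions: List[str],
    sub_chunks: List[List[str]],
    sub_metadata: List[List[Dict[str, Any]]],
) -> str:
    """Render one '### Context for Sub-question N' block per sub-question."""
    sections: List[str] = []
    for idx, question in enumerate(sub_questions):
        lines = [f'### Context for Sub-question {idx}: "{question}"']
        for text, meta in zip(sub_chunks[idx], sub_metadata[idx]):
            label = f"[{meta['filename']}, page {meta['page']}]"
            lines.append(f"{label} Source: {meta['filename']}")
            lines.append(f"Summary: {meta.get('summary', '')}")
            lines.append(f"Content: {text}")
            lines.append("")  # blank line between chunks
        sections.append("\n".join(lines).rstrip())
    return "\n\n".join(sections)


context = build_context_sections(
    ["What are time extensions?"],
    [["Clause 61.3 states that the project manager..."]],
    [[{"filename": "NEC4 ACC.pdf", "page": 3,
       "summary": "Clause 61.3 discusses time extensions..."}]],
)
```

A sub-question with no surviving chunks yields only its header line here; whether the real prompt adds an explicit "no chunks" marker is not specified in this plan.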

Expected answer format:

## Sub-question 1: What are time extensions?
- Time extensions must be notified to the project manager within 2 weeks [NEC4 ACC.pdf, page 3]
- The project manager must acknowledge the notice within 1 week [NEC4 Contract.pdf, page 12]

## Sub-question 2: What notice is required?
- Written notice must be given [NEC4 ACC.pdf, page 7]

Existing generate_response() is preserved — not modified, not deprecated.

Task 4.3.2: Update _query_stream() generate stage

File: backend/app/routers/query.py

Changes:

Call rag.generate_response_per_subquestion(extracted_questions, chunk_texts_by_subq, metadata_by_subq)
New SSE event: generating_subquestion — emitted before each sub-question's section (lets frontend show progressive build)
completed SSE event includes both answer (markdown string) and sub_question_sources (grouped sources)

New SSE event sequence:

{"phase": "decomposed", "extracted_questions": ["q1", "q2"]}
{"phase": "retrieving"}
{"phase": "filtering"}
{"phase": "generating"}
{"phase": "completed", "answer": "## Sub-question 1: ...\n\n...", "sub_question_sources": [{"sub_question_index": 0, "sub_question_text": "q1", "sources": [SourceMetadata, ...]}, ...]}
{"phase": "error", "message": "..."}
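For illustration, the happy-path sequence above could be produced by a generator like the one below. The function names are hypothetical; the `data: <json>` framing followed by a blank line is standard Server-Sent Events, and the `sub_question_sources` payload is passed through unchanged.

```python
import json
from typing import Any, Dict, Iterator, List


def sse_frame(event: Dict[str, Any]) -> str:
    """Serialize one event as a Server-Sent Events data frame."""
    return f"data: {json.dumps(event)}\n\n"


def event_stream(
    extracted_questions: List[str],
    answer: str,
    sub_question_sources: List[Any],
) -> Iterator[str]:
    yield sse_frame({"phase": "decomposed", "extracted_questions": extracted_questions})
    for phase in ("retrieving", "filtering", "generating"):
        yield sse_frame({"phase": phase})
    yield sse_frame(
        {"phase": "completed", "answer": answer, "sub_question_sources": sub_question_sources}
    )


frames = list(
    event_stream(
        ["q1", "q2"],
        "## Sub-question 1: q1\n- ...",
        [{"sub_question_index": 0, "sub_question_text": "q1", "sources": []}],
    )
)
```

In the real router each frame would be yielded as its stage finishes rather than all at once, and an `error` frame would replace `completed` on failure.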

New QueryResponse model:

File: backend/app/models/query.py

from typing import List

from pydantic import BaseModel

# SourceMetadata is the existing source model; it is assumed to be defined or imported in this module.

class SubQuestionSources(BaseModel):
    sub_question_index: int
    sub_question_text: str
    sources: List[SourceMetadata]

class QueryResponse(BaseModel):
    extracted_questions: List[str]
    answer: str                          # Markdown with ## sections
    sub_question_sources: List[SubQuestionSources]  # Grouped sources
    # Backward compat:
    sources: List[SourceMetadata]        # Flattened version (all sources)
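The backward-compatible `sources` field can be derived from the grouped field. The sketch below uses plain dataclasses as stand-ins for the Pydantic models so it runs without dependencies; the `SourceMetadata` fields are assumed, and keeping duplicates across sub-questions is an assumption as well.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SourceMetadata:  # hypothetical minimal fields
    filename: str
    page: int


@dataclass
class SubQuestionSources:
    sub_question_index: int
    sub_question_text: str
    sources: List[SourceMetadata] = field(default_factory=list)


def flatten_sources(groups: List[SubQuestionSources]) -> List[SourceMetadata]:
    """Concatenate grouped sources in sub-question order, keeping duplicates."""
    return [source for group in groups for source in group.sources]


groups = [
    SubQuestionSources(0, "What are time extensions?",
                       [SourceMetadata("NEC4 ACC.pdf", 3), SourceMetadata("NEC4 Contract.pdf", 12)]),
    SubQuestionSources(1, "What notice is required?", [SourceMetadata("NEC4 ACC.pdf", 7)]),
]
flat = flatten_sources(groups)
```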

Commit: "feat: Phase 4.3 sub-question-organized response generation"

Sub-Phase 4.4: Backend — History & Prompt Template Updates

Test files to write first:

test_phase4_history_format.py — Tests new XML/JSON history formats
test_phase4_prompt_templates.py — Tests new generate template with {context_sections}

Task 4.4.1: Update history recording

File: backend/app/routers/query.py (the _schedule_history / _record_history helpers)

Changes:

chunks_retrieved: Store new grouped XML format (with <sub_q> wrappers)
chunks_filtered: Store new grouped XML format (with <sub_q> wrappers and Relevance: scores)
sources: Store grouped JSON: json.dumps([[SourceMetadata_dict, ...], [...]]) (list of lists)
final_answer: Store markdown string with ## sections
Existing fields (chunks_retrieved_count, chunks_filtered_count) keep total counts
New optional fields: chunks_retrieved_per_subq_count, chunks_filtered_per_subq_count (JSON array of ints)

Task 4.4.2: Update history DB schema (minimal)

File: backend/app/core/sqlite_db.py

Add two new columns (optional, NULL-able):

ALTER TABLE query_history ADD COLUMN chunks_retrieved_per_subq_count TEXT DEFAULT NULL;
ALTER TABLE query_history ADD COLUMN chunks_filtered_per_subq_count TEXT DEFAULT NULL;

These store JSON arrays like [10, 8] — one count per sub-question. NULL for pre-Package-4 records.
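SQLite rejects `ALTER TABLE ... ADD COLUMN` when the column already exists, so a startup migration would typically check first. A minimal sketch, assuming a hypothetical helper name and a reduced `query_history` schema for the demo:

```python
import json
import sqlite3

NEW_COLUMNS = ("chunks_retrieved_per_subq_count", "chunks_filtered_per_subq_count")


def add_per_subq_count_columns(conn: sqlite3.Connection) -> None:
    """Add the two optional count columns if they are not present yet."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute("PRAGMA table_info(query_history)")}
    for column in NEW_COLUMNS:
        if column not in existing:
            conn.execute(f"ALTER TABLE query_history ADD COLUMN {column} TEXT DEFAULT NULL")
    conn.commit()


conn = sqlite3.connect(":memory:")
# Reduced, hypothetical schema; the real table has many more columns.
conn.execute("CREATE TABLE query_history (id INTEGER PRIMARY KEY, question TEXT)")
add_per_subq_count_columns(conn)
add_per_subq_count_columns(conn)  # second call is a no-op

conn.execute(
    "INSERT INTO query_history (question, chunks_retrieved_per_subq_count) VALUES (?, ?)",
    ("What are NEC4 time extension clauses?", json.dumps([10, 8])),
)
stored = conn.execute(
    "SELECT chunks_retrieved_per_subq_count, chunks_filtered_per_subq_count FROM query_history"
).fetchone()
```

The unset column reads back as `None`, matching the NULL behavior described for pre-Package-4 records.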

Task 4.4.3: Update history Pydantic models

File: backend/app/models/history.py

Add optional fields to QueryHistoryRecord and QueryHistoryDetail:

chunks_retrieved_per_subq_count: Optional[str] = None  # JSON array string
chunks_filtered_per_subq_count: Optional[str] = None    # JSON array string

Task 4.4.4: Update prompt templates

File: backend/app/core/sqlite_db.py (seed data)

New generate template:

"generate": (
    "You must answer each sub-question using ONLY the document chunks provided for it.\n"
    "Do not use any external knowledge.\n"
    "Format your answer as markdown sections — one section per sub-question.\n"
    "Each section should start with \"## Sub-question N: <the question>\"\n"
    "Each section should contain 1-5 bullet points.\n"
    "Cite your sources inline using bracket labels, e.g. [filename, page N].\n"
    "Place the citation at the end of each relevant bullet point.\n\n"
    "{context_sections}\n\n"
    "Answer:"
)

decompose and filter templates remain unchanged (they still use {question} placeholder — the orchestrator injects the right value at call time).

Task 4.4.5: Update PromptService to handle new template placeholder

File: backend/app/services/prompt_service.py

Add context_sections as a known placeholder for the generate step (optional, since str.replace leaves unknown placeholders untouched)
The reset_to_defaults() method must include the new generate template
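The `str.replace` behavior referred to above can be sketched as follows. `render_template` is a hypothetical helper, not the actual PromptService method. Unlike `str.format`, it does not treat other braces (for example a literal JSON sample inside a template) as fields.

```python
from typing import Dict


def render_template(template: str, values: Dict[str, str]) -> str:
    """Substitute only the placeholders that were supplied; leave the rest as-is."""
    rendered = template
    for key, value in values.items():
        rendered = rendered.replace("{" + key + "}", value)
    return rendered


template = 'Scores look like {"0": [8.5]}.\n\n{context_sections}\n\nAnswer:'
prompt = render_template(template, {"context_sections": "CTX"})
untouched = render_template("A {context_sections} B {unknown}", {"context_sections": "X"})
```

Here `untouched` still contains the literal `{unknown}`, which is why registering `context_sections` as a known placeholder is optional for rendering.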

Task 4.4.6: Update history detail API response

File: backend/app/routers/history.py

GET /api/v1/history/{id} response now includes chunks_retrieved_per_subq_count and chunks_filtered_per_subq_count when they are not NULL. Backward-compatible (older records return null for these fields).

Commit: "feat: Phase 4.4 history schema, prompt templates, and Pydantic model updates"

Sub-Phase 4.5: Frontend — Types & State Management

Test files to write first:

test_phase4_stream_state.test.tsx — Tests QueryStreamState handles new response shape
test_phase4_types.test.ts — Tests type compatibility

Task 4.5.1: Update TypeScript types

File: frontend/src/types/index.ts

New types:

interface SubQuestionSources {
  sub_question_index: number;
  sub_question_text: string;
  sources: SourceMetadata[];
}

interface QueryStreamCompletedEvent {
  phase: 'completed';
  answer: string;                              // Markdown with ## sections
  sub_question_sources: SubQuestionSources[];  // Grouped sources
}

interface QueryStreamDecomposedEvent {
  phase: 'decomposed';
  extracted_questions: string[];
}

type QueryStreamEvent = 
  | QueryStreamDecomposedEvent
  | { phase: 'retrieving' | 'filtering' | 'generating' }
  | QueryStreamCompletedEvent
  | { phase: 'error'; message: string };

Task 4.5.2: Update QueryStreamState and mutation handler

File: frontend/src/lib/queries.tsx

Changes:

interface QueryStreamState {
  extractedQuestions: string[] | null;
  answer: string | null;                        // Full markdown
  subQuestionSources: SubQuestionSources[] | null;  // NEW — grouped sources
  phase: 'idle' | 'decomposing' | 'retrieving' | 'filtering' | 'generating' | 'completed' | 'error';
  error: Error | null;
}

In the completed case:

case 'completed':
  setState(prev => ({
    ...prev,
    answer: event.answer,
    subQuestionSources: event.sub_question_sources,
    phase: 'completed',
  }));
  break;

Commit: "feat: Phase 4.5 frontend types and state management for per-sub-q responses"

Sub-Phase 4.6: Frontend — ResponsePanel & ExtractedQuestionsDisplay

Test files to write first:

test_phase4_response_panel.test.tsx — Tests per-sub-question section rendering
test_phase4_citation_parser.test.ts — Tests per-sub-question citation lookup

Task 4.6.1: Redesign ResponsePanel for sub-question sections

File: frontend/src/components/ResponsePanel.tsx

Current: single ReactMarkdown block + flat sources grid.

New layout:

┌─────────────────────────────────────────────────────┐
│  📋 Response                           [Copy All]   │
├─────────────────────────────────────────────────────┤
│                                                      │
│  ┌─ Sub-question 1: What are time extensions? ─────┐│
│  │                                                    │
│  │  • Time extensions must be notified...             │
│  │    [NEC4 ACC.pdf, page 3]                          │
│  │  • The project manager must acknowledge...         │
│  │    [NEC4 Contract.pdf, page 12]                    │
│  │                                                    │
│  │  Sources (2)                          [Expand ▼]  │
│  │  ┌──────────────────────────────────────────────┐ │
│  │  │ NEC4 ACC.pdf, Page 3  │ NEC4 Contract, p12 │ │
│  │  │ "Clause 61.3 states.." │ "Notice must be..." │ │
│  │  └──────────────────────────────────────────────┘ │
│  └────────────────────────────────────────────────────┘│
│                                                      │
│  ┌─ Sub-question 2: What notice is required? ───────┐│
│  │                                                    │
│  │  • Written notice must be given...                  │
│  │    [NEC4 ACC.pdf, page 7]                           │
│  │                                                    │
│  │  Sources (1)                          [Expand ▼]  │
│  └────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────┘

Implementation approach:

Parse the answer markdown into sections using ## Sub-question N: headers
Map each section to its SubQuestionSources by matching index
Render each section as an accordion/card with:
- Header: sub-question text (from SubQuestionSources)
- Body: ReactMarkdown for bullet points (with inline citation links)
- Footer: collapsible sources grid (only sources belonging to this sub-question)
Keep the existing citation link behavior (clickable [filename, page N] → PDF viewer)

Task 4.6.2: Update citationParser.ts for per-sub-question lookup

File: frontend/src/utils/citationParser.ts

Current: buildCitationLookup(sources: SourceMetadata[]) — returns a single global map.

New: buildCitationLookup(subQuestionSources: SubQuestionSources[]) — returns a map scoped to the correct sources for each section. The citation [filename, page N] match is looked up in the relevant sub-question's source list.

Task 4.6.3: Update ExtractedQuestionsDisplay for anchors

File: frontend/src/components/ExtractedQuestionsDisplay.tsx

Minor enhancement:

Make each extracted question a clickable anchor that scrolls to its corresponding section in the answer
Add id="subq-{index}" to each section header in ResponsePanel
Keep existing skeleton loading behavior

Commit: "feat: Phase 4.6 frontend per-sub-question response rendering"

Sub-Phase 4.7: Testing & Polish

Test files to write:

test_phase4_integration_query_pipeline.py — Full integration test simulating per-sub-q pipeline
test_phase4_acceptance_query.py — Acceptance test with real LLM (manual run)
test_phase4_e2e_query_flow.test.tsx — Frontend e2e test with mocked SSE stream

Task 4.7.1: Backend unit tests

Run pytest backend/app/test/test_phase4_*.py -v — all must pass
Verify no regressions in existing Phase 1 and Phase 3 tests
Update test_phase1_rag_service.py for new method signatures
Update test_phase1_relevance_filter.py for per-sub-q behavior
Rewrite test_phase3_query_history_integration.py for new pipeline flow
Update test_phase3_prompt_injection.py for new generate template

Task 4.7.2: Backend acceptance tests

test_phase4_acceptance_query.py — real LLM, real ChromaDB
Verify: answer contains ## Sub-question headers, sources grouped by sub-question index
Verify: each sub-question section has 1-5 bullet points
Verify: inline citations match the correct sub-question's source list
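The three checks above could be automated with a small parser like this sketch. The helper name and regular expressions are illustrative assumptions based on the expected answer format in Task 4.3.1.

```python
import re
from typing import Any, Dict, List

HEADER = re.compile(r"^## Sub-question (\d+): (.+)$")
CITATION = re.compile(r"\[([^\[\],]+), page (\d+)\]")


def split_sections(answer: str) -> List[Dict[str, Any]]:
    """Split the markdown answer into sections with their bullet lines."""
    sections: List[Dict[str, Any]] = []
    for line in answer.splitlines():
        match = HEADER.match(line.strip())
        if match:
            sections.append(
                {"number": int(match.group(1)), "question": match.group(2), "bullets": []}
            )
        elif sections and line.lstrip().startswith(("- ", "* ")):
            sections[-1]["bullets"].append(line.lstrip()[2:].strip())
    return sections


answer = (
    "## Sub-question 1: What are time extensions?\n"
    "- Time extensions must be notified to the project manager within 2 weeks [NEC4 ACC.pdf, page 3]\n"
    "- The project manager must acknowledge the notice within 1 week [NEC4 Contract.pdf, page 12]\n"
    "\n"
    "## Sub-question 2: What notice is required?\n"
    "- Written notice must be given [NEC4 ACC.pdf, page 7]\n"
)
sections = split_sections(answer)
bullet_counts = [len(s["bullets"]) for s in sections]
first_citations = CITATION.findall(sections[0]["bullets"][0])
```

An acceptance test could then assert that every count is between 1 and 5 and that each cited (filename, page) pair appears in the source list for the same section.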

Task 4.7.3: Frontend tests

test_phase4_response_panel.test.tsx — renders per-sub-question sections, expandable sources
test_phase4_citation_parser.test.ts — per-sub-question lookup returns correct source
test_phase4_e2e_query_flow.test.tsx — mocks SSE with new event format, verifies section rendering
Update existing ResponsePanel.test.tsx and citationParser.test.ts for new API

Task 4.7.4: Frontend build verification

npm run build — no TypeScript errors
npm test — all 62 existing tests pass + new Phase 4 tests
Verify manual flow: ask question → see extracted questions → see per-sub-question answer sections → expand sources per section

Task 4.7.5: Error handling

Empty decomposition: if decompose() returns [], fall back to using original question as single sub-question
Empty retrieval for some sub-questions: that sub-question gets no chunks → section shows "No relevant information found"
Filter failure (all chunks below threshold): that sub-question gets no answer → graceful empty section
JSON parse failure in filter: fall back to including all chunks (no filtering) for that sub-question
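The first three fallbacks can be sketched as two small helpers. The names are hypothetical, and dropping blank sub-questions is an added assumption not stated in the plan.

```python
from typing import List

NO_INFO_MESSAGE = "No relevant information found"


def normalize_sub_questions(question: str, decomposed: List[str]) -> List[str]:
    """Fall back to the original question when decomposition yields nothing."""
    cleaned = [q.strip() for q in decomposed if q and q.strip()]
    return cleaned if cleaned else [question]


def section_body(bullets: List[str]) -> str:
    """Render a section body, or the placeholder when no content survived."""
    if not bullets:
        return NO_INFO_MESSAGE
    return "\n".join(f"- {bullet}" for bullet in bullets)


fallback = normalize_sub_questions("What are NEC4 time extension clauses?", [])
empty_body = section_body([])
```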

Task 4.7.6: Documentation

Update AGENTS.md with new pipeline architecture section
Add docstrings to all new methods (retrieve_per_subquestion, filter_per_subquestion, generate_response_per_subquestion)
Update prompt template documentation in system prompts page

Commit: "feat: Phase 4.7 testing, error handling, and polish for per-sub-q pipeline"

Phase 4a: Prompt Service Integration for Per-Sub-Q Filter (2026-04-27)

Root issue: filter_per_subquestion() in relevance_filter.py had a hardcoded prompt (_build_per_subq_prompt()) — completely bypassing PromptService. Users could not edit the per-sub-q filter prompt on the System Prompts page, unlike the flat filter step which was already prompt-service-driven.

Solution: Broke the per-sub-q filter prompt into 3 composable pieces, each a separately editable step on the System Prompts page:

Step Name	Label	Placeholders	Default
`filter_intro`	Step 2.1: Filter Intro (Preamble)	(none)	`"Evaluate each chunk for relevance to its associated sub-question only."`
`filter_section`	Step 2.2: Filter Section (Per Sub-Q)	`{subq_idx}`, `{subq_question}`, `{chunks}`	`'Sub-question {subq_idx}: "{subq_question}"\n{chunks}'`
`filter_outro`	Step 2.3: Filter Outro (Format)	(none)	JSON format instructions + example

The RelevanceFilter._build_per_subq_prompt() now composes them at runtime:

filter_intro + [filter_section.replace(...) for each sub-q] + filter_outro

Falls back to built-in defaults when PromptService is unavailable.
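A sketch of the composition, using the `filter_intro` and `filter_section` defaults from the table. The `filter_outro` text here is a stand-in (its real default is not reproduced in this document), the blank-line separator and `Chunk N:` layout are assumptions, and a single-pass regex substitution is used instead of chained `replace` calls so that placeholder-like text inside a question is not substituted again.

```python
import re
from typing import List

FILTER_INTRO = "Evaluate each chunk for relevance to its associated sub-question only."
FILTER_SECTION = 'Sub-question {subq_idx}: "{subq_question}"\n{chunks}'
# Stand-in text: the actual default outro is not shown in this document.
FILTER_OUTRO = "Return a JSON object mapping sub-question indices to arrays of scores."

PLACEHOLDER = re.compile(r"\{(subq_idx|subq_question|chunks)\}")


def build_per_subq_prompt(
    sub_questions: List[str],
    sub_chunk_texts: List[List[str]],
    intro: str = FILTER_INTRO,
    section: str = FILTER_SECTION,
    outro: str = FILTER_OUTRO,
) -> str:
    parts = [intro]
    for idx, (question, texts) in enumerate(zip(sub_questions, sub_chunk_texts)):
        values = {
            "subq_idx": str(idx),
            "subq_question": question,
            "chunks": "\n".join(f"Chunk {i}: {text}" for i, text in enumerate(texts)),
        }
        # A function replacement avoids backslash processing and double substitution.
        parts.append(PLACEHOLDER.sub(lambda m: values[m.group(1)], section))
    parts.append(outro)
    return "\n\n".join(parts)


prompt = build_per_subq_prompt(
    ["What are time extensions?", "What notice is required?"],
    [["Clause 61.3 states that..."], ["Notice must be given...", "The contractor shall notify..."]],
)
```

Passing the three templates as parameters mirrors the fallback described above: the caller supplies PromptService values when available and otherwise the built-in constants apply.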

Bugs Fixed

generate_per_subq not seeded: rag.py called get_prompt_template("generate_per_subq") but this step name was never added to _VALID_STEPS, _SEED_STEPS, or _SEED_TEMPLATES — would crash at runtime with ValueError. Now properly seeded with {context_sections} placeholder.
_SEED_GENERATE placeholder mismatch from Package 4: The flat generate_response() expects {question}/{context} placeholders, but Package 4 changed the seed template to use {context_sections} (intended for per-sub-q generate). Restored flat template; generate_per_subq now holds {context_sections}.

Database Backfill Migration

The existing seed_default_profiles() only inserted steps for NEWLY created profiles. Added a backfill loop that iterates ALL existing profiles and INSERT OR IGNOREs any missing step names. This ensures existing A/B/C profiles pick up filter_intro, filter_section, filter_outro, and generate_per_subq on restart.
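The backfill loop can be illustrated with an in-memory database. The table and column names below are hypothetical (the real schema in `sqlite_db.py` is not shown here), and two of the default templates are abbreviated placeholders. `INSERT OR IGNORE` depends on a uniqueness constraint over (profile, step).

```python
import sqlite3
from typing import Dict

DEFAULT_STEPS: Dict[str, str] = {
    "filter_intro": "Evaluate each chunk for relevance to its associated sub-question only.",
    "filter_section": 'Sub-question {subq_idx}: "{subq_question}"\n{chunks}',
    "filter_outro": "<JSON format instructions>",          # placeholder, not the real default
    "generate_per_subq": "{context_sections}\n\nAnswer:",  # abbreviated placeholder
}


def backfill_missing_steps(conn: sqlite3.Connection, defaults: Dict[str, str]) -> None:
    """Insert any missing step for every existing profile; keep existing rows."""
    profile_ids = [row[0] for row in conn.execute("SELECT id FROM profiles")]
    for profile_id in profile_ids:
        conn.executemany(
            "INSERT OR IGNORE INTO prompt_templates (profile_id, step_name, template) "
            "VALUES (?, ?, ?)",
            [(profile_id, step, template) for step, template in defaults.items()],
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE profiles (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE prompt_templates (
        profile_id INTEGER, step_name TEXT, template TEXT,
        UNIQUE (profile_id, step_name)
    );
    INSERT INTO profiles (name) VALUES ('A'), ('B'), ('C');
    INSERT INTO prompt_templates VALUES (1, 'filter_intro', 'user-edited intro');
    """
)
backfill_missing_steps(conn, DEFAULT_STEPS)
backfill_missing_steps(conn, DEFAULT_STEPS)  # safe to run on every restart
row_count = conn.execute("SELECT COUNT(*) FROM prompt_templates").fetchone()[0]
edited = conn.execute(
    "SELECT template FROM prompt_templates WHERE profile_id = 1 AND step_name = 'filter_intro'"
).fetchone()[0]
```

Running the backfill twice leaves the row count unchanged and does not overwrite the user-edited template.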

System Prompts UI Restructured

The flat filter and generate steps were removed from the UI (they're unused by the current pipeline). The page now shows 5 steps:

UI Order	Label	Step Key
1	Step 1: Query Decomposition	`decompose`
2	Step 2.1: Filter Intro (Preamble)	`filter_intro`
3	Step 2.2: Filter Section (Per Sub-Q)	`filter_section`
4	Step 2.3: Filter Outro (Format)	`filter_outro`
5	Step 3: Generate (Per-Sub-Question)	`generate_per_subq`

The old filter and generate templates remain in the DB (for API backward compatibility) but are hidden from the UI.

Files Changed

File	Change
`backend/app/core/sqlite_db.py`	3 new seed templates + `generate_per_subq` seed; backfill migration; restored `_SEED_GENERATE` to `{question}`/`{context}`
`backend/app/services/prompt_service.py`	Added 4 step names to `_VALID_STEPS`
`backend/app/routers/prompts.py`	Added 4 step names to `_VALID_STEPS`
`backend/app/services/relevance_filter.py`	Refactored `_build_per_subq_prompt()` to use PromptService + built-in fallback constants
`frontend/src/components/PromptEditor.tsx`	Replaced unused flat steps with 5-step per-sub-q layout (Step 2.1-2.3 + Step 3)
`frontend/src/components/PlaceholderDocs.tsx`	Added `{context_sections}`, `{subq_idx}`, `{subq_question}` docs
`backend/app/test/conftest.py`	Added 4 new templates to mock
`backend/app/test/test_phase3_sqlite_db.py`	Updated counts (9→21 prompts) and placeholder assertions
`backend/app/test/test_phase3_prompt_service.py`	Updated step set + placeholder assertions
`backend/app/test/test_phase3_prompts_router.py`	Updated step set assertion
`backend/app/test/test_phase4_prompt_templates.py`	Updated for split generate/generate_per_subq
`frontend/src/test/components/PromptEditor.test.tsx`	Updated to 5 textareas, new labels, new placeholder layout
`frontend/src/test/components/PlaceholderDocs.test.tsx`	Updated to 6 placeholders

Test Results (Post-Phase 4a)

Backend: 295 passed, 5 skipped (pre-existing)
Frontend: 182 passed, 1 pre-existing failure (unrelated file-input e2e)

Sub-Phase Summary

Sub-Phase	Scope	Backend	Frontend	Tests	Status
4.1	Per-sub-q retrieval	`rag.py`, `query.py`, format helpers	None	`test_phase4_retrieve_per_subquestion.py`, `test_phase4_query_router_retrieval.py`	✅ Complete
4.2	Per-sub-q filtering (1 LLM call)	`relevance_filter.py`, `query.py`	None	`test_phase4_relevance_filter_per_subq.py`, `test_phase4_query_router_filter.py`	✅ Complete
4.3	Sub-q-organized response generation	`rag.py`, `query.py`, `models/query.py`	None	`test_phase4_generate_per_subq.py`, `test_phase4_response_format.py`	✅ Complete
4.4	History schema, prompts, models	`sqlite_db.py`, `history.py` (router + models), `prompt_service.py`	None	`test_phase4_history_format.py`, `test_phase4_prompt_templates.py`	✅ Complete
4.5	Frontend types + state	None	`types/index.ts`, `lib/queries.tsx`	`test_phase4_stream_state.test.tsx`, `test_phase4_types.test.ts`	✅ Complete
4.6	Frontend rendering	None	`ResponsePanel.tsx`, `citationParser.ts`, `ExtractedQuestionsDisplay.tsx`	`test_phase4_response_panel.test.tsx`, `test_phase4_citation_parser.test.ts`	✅ Complete
4.7	Testing & polish	All affected files	All affected files	Integration + acceptance + e2e tests	✅ Complete
4a	Prompt service integration for filter_per_subq	`sqlite_db.py`, `prompt_service.py`, `prompts.py`, `relevance_filter.py`	`PromptEditor.tsx`, `PlaceholderDocs.tsx`	Updated 7 test files, 13 total files changed	✅ Complete

Implementation Sequence & Dependencies

4.1 (Retrieval) ──┐
                  ├──► 4.2 (Filtering) ──► 4.3 (Generate) ──► 4.4 (History/Prompts)
                  │                                                    │
                  │                                                    ▼
                  │                                         4.5 (Frontend Types/State)
                  │                                                    │
                  │                                                    ▼
                  │                                         4.6 (Frontend Rendering)
                  │                                                    │
                  └─────────────────────────────────────────────────────▼
                                                              4.7 (Testing & Polish)

4.1 → 4.2 sequential: Filtering needs per-sub-q chunk structure from retrieval
4.2 → 4.3 sequential: Generation needs filtered chunks from filtering stage
4.3 → 4.4 sequential: History recording and prompt templates need final data shapes
4.4 → 4.5 parallelizable: Backend prompt/history changes don't block frontend type definitions, so 4.5 may overlap with 4.4 even though the diagram draws them in sequence
4.5 → 4.6 sequential: Rendering needs types and state management
4.7 blocked by all: Integration tests need everything wired together

Parallelization opportunity: 4.5 (frontend types) could start as soon as 4.3 defines the SSE contract, but it's safer to start after 4.4 confirms the final data shapes.

Affected Files — Complete Inventory

Backend — New Files

File	Sub-Phase	Purpose
`backend/app/test/test_phase4_retrieve_per_subquestion.py`	4.1	Unit test: `retrieve_per_subquestion()`
`backend/app/test/test_phase4_query_router_retrieval.py`	4.1	Unit test: retrieval stage in `_query_stream`
`backend/app/test/test_phase4_relevance_filter_per_subq.py`	4.2	Unit test: `filter_per_subquestion()`
`backend/app/test/test_phase4_query_router_filter.py`	4.2	Unit test: filter stage in `_query_stream`
`backend/app/test/test_phase4_generate_per_subq.py`	4.3	Unit test: `generate_response_per_subquestion()`
`backend/app/test/test_phase4_response_format.py`	4.3	Unit test: answer format validation
`backend/app/test/test_phase4_history_format.py`	4.4	Unit test: new XML/JSON history formats
`backend/app/test/test_phase4_prompt_templates.py`	4.4	Unit test: new generate template
`backend/app/test/test_phase4_integration_query_pipeline.py`	4.7	Integration test: full per-sub-q pipeline
`backend/app/test/acceptance/test_phase4_acceptance_query.py`	4.7	Acceptance test: real LLM

Backend — Modified Files

File	Sub-Phase	Changes
`backend/app/services/rag.py`	4.1, 4.3	Add `retrieve_per_subquestion()`, `generate_response_per_subquestion()`
`backend/app/services/relevance_filter.py`	4.2	Add `filter_per_subquestion()`
`backend/app/routers/query.py`	4.1–4.4	Refactor `_query_stream()`, add per-sub-q format helpers, update history recording
`backend/app/models/query.py`	4.3	Add `SubQuestionSources` model, update `QueryResponse`
`backend/app/models/history.py`	4.4	Add optional per-sub-q count fields
`backend/app/core/sqlite_db.py`	4.4	Add new columns, update seed generate template
`backend/app/services/prompt_service.py`	4.4	Update `reset_to_defaults()` generate template
`backend/app/routers/history.py`	4.4	Include new fields in detail response
`backend/app/core/config.py`	4.1	(Maybe) Add `retrieval_n_results_per_subq` setting

Backend — Tests Needing Update

File	Sub-Phase	Changes
`backend/app/test/test_phase1_rag_service.py`	4.7	Add tests for new methods; existing tests unaffected
`backend/app/test/test_phase1_relevance_filter.py`	4.7	Add tests for `filter_per_subquestion()`
`backend/app/test/test_phase3_query_history_integration.py`	4.7	Rewrite pipeline simulation for per-sub-q flow
`backend/app/test/test_phase3_prompt_injection.py`	4.7	Add tests for new generate template
`backend/app/test/acceptance/test_acceptance_phase1_rag_query.py`	4.7	Rewrite — SSE parsing + new response shape
`backend/app/test/conftest.py`	4.7	Add per-sub-q mock helpers

Frontend — New Files

File	Sub-Phase	Purpose
`frontend/src/test/components/test_phase4_response_panel.test.tsx`	4.7	Component test: per-sub-q sections
`frontend/src/test/utils/test_phase4_citation_parser.test.ts`	4.7	Unit test: per-sub-q citation lookup
`frontend/src/test/e2e/test_phase4_query_flow.test.tsx`	4.7	E2E test: mocked SSE with new format
`frontend/src/test/lib/test_phase4_stream_state.test.tsx`	4.5	State test: new event shapes
`frontend/src/test/lib/test_phase4_types.test.ts`	4.5	Type test: type compatibility

Frontend — Modified Files

File	Sub-Phase	Changes
`frontend/src/types/index.ts`	4.5	Add `SubQuestionSources`, update `QueryStreamEvent`
`frontend/src/lib/queries.tsx`	4.5	Update `QueryStreamState`, `completed` event handler
`frontend/src/components/ResponsePanel.tsx`	4.6	Redesign — per-sub-question sections with grouped sources
`frontend/src/utils/citationParser.ts`	4.6	Update `buildCitationLookup()` for per-sub-q
`frontend/src/components/ExtractedQuestionsDisplay.tsx`	4.6	Add anchor links to answer sections
`frontend/src/pages/LTTPage.tsx`	4.6	Pass new props to children

Risk Register

Risk	Likelihood	Impact	Mitigation
LLM struggles with per-sub-q filtering prompt format	Medium	High — all chunks dropped	Use strong prompt constraints, validate JSON, fall back to including all chunks on parse failure
LLM generates answer not matching `## Sub-question N:` format	Medium	Medium — frontend can't parse sections	Fall back to rendering as single block if parsing fails. Prompt engineering tuned for format compliance
Same chunk retrieved by multiple sub-questions → duplicated in context	High	Low — slightly larger prompt but acceptable	Accept duplicates. ChromaDB naturally returns same doc if relevant to multiple queries. Each sub-q's evaluation is independent
Per-sub-q retrieval = more ChromaDB queries = slower	Medium	Medium — N × retrieval latency	ChromaDB retrieval is fast (~10-50ms per query). 5 sub-questions × 10-50ms = 50-250ms overhead if the queries run sequentially. Acceptable trade-off for better relevance.
History DB migration fails for existing records	Low	Low — new columns are NULL-able	`ALTER TABLE ADD COLUMN ... DEFAULT NULL` is safe. Existing records work as before — `chunks_retrieved`/`chunks_filtered` still have flat XML.
Frontend rendering breaks on older history records	Low	Low — answer format differs	`ResponsePanel` renders per-sub-q sections only when `subQuestionSources` is non-null. Older history records show flat answer as before.
Prompt template migration breaks user-customized prompts	Medium	Medium — users lose their generate template	Warn in docs. The `generate` template changes fundamentally (single `{context}` → `{context_sections}`). Users must re-customize.
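The mitigation for the first risk (validate the filter JSON, fall back to including all chunks on parse failure) could look roughly like the sketch below. The function name `apply_filter_scores()`, the string-keyed JSON shape, the 0-to-1 score scale, and the `0.5` threshold are all assumptions made for illustration; the plan only states that the response maps sub-question indices to score arrays.

```python
import json


def apply_filter_scores(
    raw_response: str,
    chunks_by_subq: dict[int, list[str]],
    threshold: float = 0.5,  # hypothetical cut-off
) -> dict[int, list[str]]:
    """Keep chunks scoring at or above `threshold`; degrade gracefully on bad output."""
    try:
        scores = json.loads(raw_response)
        if not isinstance(scores, dict):
            raise ValueError("expected a JSON object keyed by sub-question index")
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        # Whole response unusable: keep every chunk for every sub-question.
        return {idx: list(chunks) for idx, chunks in chunks_by_subq.items()}

    kept: dict[int, list[str]] = {}
    for idx, chunks in chunks_by_subq.items():
        sub_scores = scores.get(str(idx))
        if not isinstance(sub_scores, list) or len(sub_scores) != len(chunks):
            # Scores missing or misaligned for this sub-question only: keep its chunks.
            kept[idx] = list(chunks)
            continue
        kept[idx] = [
            chunk
            for chunk, score in zip(chunks, sub_scores)
            if isinstance(score, (int, float)) and score >= threshold
        ]
    return kept
```

Degrading per sub-question rather than per response means one malformed score array does not discard valid filtering results for the other sub-questions.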

Acceptance Criteria

Backend

POST /api/v1/query retrieves chunks per sub-question (verified by history XML showing <sub_q> wrappers)
Filtering uses single LLM call evaluating chunks against their originating sub-question (verified by filter prompt)
Response answer is organized by sub-question with ## Sub-question N: headers
sub_question_sources in SSE completed event is grouped by sub-question index
History records include new grouped XML formats for chunks_retrieved and chunks_filtered
History records include grouped sources JSON (list of lists)
History records include per-sub-q chunk counts
New generate prompt template uses {context_sections} placeholder
Prompt service reset_to_defaults() includes new generate template
Existing decompose, filter (old), generate_response (old) methods are unchanged
All Phase 1, Phase 3, and new Phase 4 unit tests pass (312 passed, 4 skipped)
All acceptance tests pass with real LLM (manual run)
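To make the grouped history format concrete, here is a small sketch that builds and inspects `<sub_q>`-wrapped XML with the standard library. The `<sub_q idx="N" question="...">` wrapper comes from this plan; the `<chunks>` root element, the `<chunk>` child element, the 1-based index, and both function names are assumptions.

```python
import xml.etree.ElementTree as ET


def build_grouped_xml(groups: list[tuple[str, list[str]]]) -> str:
    """Serialize (question, chunk_texts) pairs into <sub_q>-wrapped XML."""
    root = ET.Element("chunks")  # hypothetical root element
    for idx, (question, chunk_texts) in enumerate(groups, start=1):
        sub_q = ET.SubElement(root, "sub_q", idx=str(idx), question=question)
        for text in chunk_texts:
            ET.SubElement(sub_q, "chunk").text = text  # hypothetical child element
    return ET.tostring(root, encoding="unicode")


def count_chunks_per_subq(xml_text: str) -> dict[int, int]:
    """Return {sub-question index: number of chunks} from grouped history XML."""
    root = ET.fromstring(xml_text)
    return {int(s.get("idx")): len(s.findall("chunk")) for s in root.iter("sub_q")}
```

A helper like `count_chunks_per_subq()` would also be one way to derive the per-sub-q chunk counts from the stored XML instead of tracking them separately.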

Frontend

QueryStreamState includes subQuestionSources field
ResponsePanel renders per-sub-question sections with expandable source grids
Each section's sources are scoped to that sub-question (no cross-contamination)
Inline citations [filename, page N] link to the correct PDF viewer page
ExtractedQuestionsDisplay shows clickable anchors to answer sections
Copy button copies all answer text including section headers
Loading states: skeleton per section during generation
Empty state: "No relevant information found" per sub-question (not entire response)
All 62+ existing frontend tests still pass (183 passed)
All new Phase 4 frontend tests pass
npm run build succeeds with zero TypeScript errors
Manual verification: full query flow works end-to-end

New Dependencies

None. All changes use existing libraries (FastAPI, ChromaDB, OpenAI SDK, React, ReactMarkdown, TanStack Query).

Decisions (All Confirmed)

#	Topic	Decision
1	Single vs multiple filter LLM calls	Single call — user explicitly requested this
2	Filter prompt design	Group chunks by sub-question in one prompt. JSON response maps sub-q indices to score arrays
3	Answer format	Markdown with `## Sub-question N: <question>` headers
4	Sources grouping	`sub_question_sources: [{index, text, sources}, ...]` in SSE + frontend
5	History XML format	Add `<sub_q idx="N" question="...">` wrappers around chunk groups
6	History DB migration	Add 2 new NULL-able columns. No data migration needed.
7	Backward compatibility	Preserve old `retrieve()`, `filter()`, `generate_response()` methods. New methods are additive.
8	Deduplication	None. Same chunk may appear in multiple sub-questions. Each sub-q evaluates independently.
9	Error handling	Per-sub-question graceful degradation. Filter failure → include all chunks for that sub-q. Generate failure → "Unable to generate answer for this sub-question."
10	Frontend rendering engine	Keep `ReactMarkdown`. Parse sections client-side by splitting on `## Sub-question N:` headers.
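Decisions 3 and 10 together imply a header-based split with a single-block fallback. The sketch below shows that logic in Python to stay consistent with the backend examples; the frontend would do the equivalent in TypeScript. The regular expression and the `split_sections()` name are assumptions, not the actual parser.

```python
import re

# Matches lines such as "## Sub-question 2: Why does Y happen?"
HEADER = re.compile(r"^## Sub-question (\d+):[ \t]*(.*)$", re.MULTILINE)


def split_sections(answer: str) -> list[dict]:
    """Split a generated answer into per-sub-question sections.

    If no header is found, return the whole answer as one section so the
    caller can render it as a single block. Text before the first header
    is dropped in this sketch.
    """
    matches = list(HEADER.finditer(answer))
    if not matches:
        return [{"index": None, "question": None, "body": answer.strip()}]

    sections = []
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(answer)
        sections.append(
            {
                "index": int(match.group(1)),
                "question": match.group(2).strip(),
                "body": answer[match.end():end].strip(),
            }
        )
    return sections
```

Each section's `index` can then be matched against the corresponding entry in `sub_question_sources` so that citations resolve only against that sub-question's sources.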

Open Questions

None — all resolved.

#	Question	Resolution
1	Progressive SSE events?	Yes — emit `generating_subquestion` as each sub-question's answer is generated. Frontend renders sections progressively.
2	`retrieval_n_results` per sub-question or global?	Global — same value for all sub-questions. Simpler config, one setting.
3	Fallback when decomposition returns 0 sub-questions?	Fall back to original question — treat as single sub-question. Pipeline runs as 1-sub-q case (retrieval via original question, no filtering needed for single sub-q, flat answer).
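The third resolution amounts to a small guard before retrieval. A minimal sketch follows, assuming decomposition returns a list of strings; `effective_sub_questions()` is a hypothetical helper, and discarding blank entries is an added assumption rather than something the plan specifies.

```python
def effective_sub_questions(
    original_question: str, sub_questions: list[str]
) -> tuple[list[str], bool]:
    """Return the sub-questions to run and whether the fallback was used."""
    # Assumption: blank or whitespace-only entries count as "no sub-question".
    cleaned = [q.strip() for q in sub_questions if q and q.strip()]
    if cleaned:
        return cleaned, False
    # Decomposition produced nothing usable: treat the original question
    # as the single sub-question.
    return [original_question], True
```

The returned flag lets the orchestrator skip the filtering step and render a flat answer when the pipeline is running in the single-question fallback mode.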

Test Plan Summary

Backend (New Tests)

File	Tests	Coverage
`test_phase4_retrieve_per_subquestion.py`	~6	Per-sub-q retrieval, empty input, single sub-q, dedup behavior
`test_phase4_query_router_retrieval.py`	~4	SSE events during retrieval, chunk XML format
`test_phase4_relevance_filter_per_subq.py`	~6	Per-sub-q filtering, JSON response parsing, threshold behavior
`test_phase4_query_router_filter.py`	~4	SSE events during filtering, filtered XML format
`test_phase4_generate_per_subq.py`	~5	Per-sub-q generate, prompt construction, answer format
`test_phase4_response_format.py`	~4	Answer has `##` headers, citations in correct sections
`test_phase4_history_format.py`	~5	New XML/JSON formats, per-sub-q counts
`test_phase4_prompt_templates.py`	~3	New generate template, `{context_sections}` placeholder
`test_phase4_integration_query_pipeline.py`	~5	Full pipeline simulation
`test_phase4_acceptance_query.py`	~3	Real LLM end-to-end (manual)

Frontend (New Tests)

File	Tests	Coverage
`test_phase4_stream_state.test.tsx`	~4	State updates for new event shapes
`test_phase4_types.test.ts`	~2	Type compatibility checks
`test_phase4_response_panel.test.tsx`	~6	Section rendering, source grouping, copy, loading
`test_phase4_citation_parser.test.ts`	~4	Per-sub-q lookup, cross-section isolation
`test_phase4_e2e_query_flow.test.tsx`	~3	Full SSE flow with mocked stream
