docs(plan): add Package 4 per-sub-question enhancement plan
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
# Package 4 Enhancement Plan — Per-Sub-Question RAG Pipeline
**Source**: User request (2026-04-26)
**Scope**: Refactor the 3-step RAG query pipeline so retrieval, filtering, and response generation are organized per sub-question instead of batch-flattened.
**Status**: ✅ Complete. All 7 sub-phases implemented (2026-04-26)

---

## Objective

Restructure the `POST /api/v1/query` pipeline so that:

1. **Retrieval per sub-question**: Each sub-question independently retrieves `n_results` chunks from ChromaDB (instead of joining all sub-questions into one query string).
2. **Filtering per sub-question**: Each chunk is evaluated for relevance against its **own originating sub-question** (not the original user question). One LLM call handles all filtering; the prompt is redesigned to group chunks by sub-question.
3. **Final answer organized by sub-question**: Each sub-question gets its own bullet-point answer with its own sources. The frontend renders answer sections per sub-question rather than one monolithic bullet list.

---
## Decision Register

| # | Decision | Rationale |
|---|----------|-----------|
| 1 | Keep `QueryDecomposer` unchanged | Input/output contract is identical: decomposition still produces a flat list of sub-questions |
| 2 | Single LLM call for filtering | User explicitly requested one call. Prompt redesigned to carry sub-question context for each chunk group |
| 3 | Keep `RAGService.retrieve()` signature | Call it N times (once per sub-question) from the new wrapper method (Decision 4) rather than changing its internal contract |
| 4 | Add `retrieve_per_subquestion()` to `RAGService` | New method that iterates over sub-questions, calls `retrieve()` per question, returns grouped results |
| 5 | Add `generate_response_per_subquestion()` alongside `generate_response()` | Accepts per-sub-question chunk and metadata lists instead of flat chunk lists; the existing method is preserved (Task 4.3.1) |
| 6 | SSE events: add `generating_subquestion` phase | Progressive streaming: frontend sees which sub-question is being answered |
| 7 | History: change XML/JSON formats in-place | Add `<sub_q>` wrappers to `chunks_retrieved`/`chunks_filtered` XML. Add sub-question grouping to `sources` JSON. No new columns for these payloads; the only schema addition is the two optional per-sub-question count columns (Task 4.4.2). |
| 8 | Final answer format: markdown sections | `## Sub-question 1` headers with inline citations. Backward-compatible with existing `ReactMarkdown` rendering |
| 9 | Deduplicate chunks within a sub-question only | The same chunk may be retrieved by multiple sub-questions. Keep those cross-sub-question duplicates (each sub-question needs an independent evaluation). ChromaDB `query()` may return the same document for different queries, which is acceptable. |
| 10 | Prompt template: add `generate` placeholders | New placeholder `{context_sections}` replaces single `{context}`. Filter template unchanged (sub-question injected at call site). Decompose template unchanged. |
| 11 | Progressive SSE events | Emit `generating_subquestion` event as each sub-question's answer section is generated. Frontend renders sections one by one. |
| 12 | `retrieval_n_results` | Global: same value for all sub-questions. Use existing `settings.retrieval_n_results` config. |
| 13 | Empty decomposition fallback | Treat original user question as single sub-question. Pipeline runs as the 1-sub-q case: single retrieval, no filtering needed (one sub-q = no ambiguity), flat answer with `##` header. |

---
## Pipeline: Before vs After

### Before (Current: Flat Batch)

```
User Question: "What are NEC4 time extension clauses?"
     │
┌────▼─────┐
│ Decompose│  LLM Call 1
│          │  → ["What are time extensions?",
│          │     "What notice is required?"]
└────┬─────┘
     │ joined: "What are time extensions? What notice is required?"
┌────▼─────┐
│ Retrieve │  1 ChromaDB query → 10 chunks (flat, no sub-q association)
└────┬─────┘
     │ 10 chunks
┌────▼─────┐
│ Filter   │  LLM Call 2: all chunks scored against ORIGINAL question
│          │  Score > 7 → keep (flat, no sub-q association)
└────┬─────┘
     │ N filtered chunks
┌────▼─────┐
│ Generate │  LLM Call 3: flat answer from ALL filtered chunks
│          │  "• Time extensions require notice [NEC4 ACC.pdf, p3]
│          │   • The project manager must acknowledge [NEC4, p7]
│          │   • Notice is defined as..." (sources from all sub-qs mixed)
└────┬─────┘
     │ single SSE completed event
┌────▼─────┐
│ Frontend │  1 ReactMarkdown block, 1 flat sources list
└──────────┘
```
### After (Per-Sub-Question)

```
User Question: "What are NEC4 time extension clauses?"
     │
┌────▼─────┐
│ Decompose│  LLM Call 1 (UNCHANGED)
│          │  → ["What are time extensions?",
│          │     "What notice is required?"]
└────┬─────┘
     │ sub_q1                   sub_q2
┌────▼─────┐              ┌────▼─────┐
│ Retrieve │              │ Retrieve │  2 ChromaDB queries → 10 chunks each
│ q1 → 10  │              │ q2 → 10  │  chunks tagged with sub-q index
└────┬─────┘              └────┬─────┘
     │                         │
     └─────────┬───────────────┘
               │ grouped: {sub_q0: [chunks 0-9], sub_q1: [chunks 10-19]}
          ┌────▼─────┐
          │ Filter   │  LLM Call 2 (SINGLE CALL, redesigned prompt)
          │          │  Each chunk scored against its OWN sub-question
          │          │  Returns grouped scores → filtered per sub-q
          └────┬─────┘
               │ filtered_by_subq: {0: [chunk_a, chunk_b], 1: [chunk_c]}
          ┌────▼─────┐
          │ Generate │  LLM Call 3 (redesigned prompt with per-sub-q context)
          │          │  ┌─────────────────────────────────────┐
          │          │  │ ## What are time extensions?        │
          │          │  │ - Time extensions must be notified  │
          │          │  │   [NEC4 ACC.pdf, page 3]            │
          │          │  │ - The project manager has 2 weeks   │
          │          │  │   [NEC4 Contract.pdf, page 12]      │
          │          │  │                                     │
          │          │  │ ## What notice is required?         │
          │          │  │ - Written notice must be given      │
          │          │  │   [NEC4 ACC.pdf, page 7]            │
          │          │  └─────────────────────────────────────┘
          └────┬─────┘
               │ SSE events: generating_subquestion (per sub-q) → completed
          ┌────▼─────┐
          │ Frontend │  Sections per sub-question, sources grouped per section
          └──────────┘
```

---
## Current State (Pre-Enhancement)

### Backend

| Component | File | Current Behavior |
|-----------|------|-----------------|
| Decomposer | `services/query_decomposer.py` | `decompose(question) -> (List[str], prompt)` returns 2-5 sub-questions |
| Retrieval | `services/rag.py:retrieve()` | `query_text = " ".join(query_keywords)` joins all sub-qs into ONE string, single ChromaDB query → flat chunk list |
| Filter | `services/relevance_filter.py` | `filter(question, chunks)`: ALL chunks scored against ORIGINAL question, single LLM call, flat output |
| Generate | `services/rag.py:generate_response()` | `generate_response(question, chunks, metadata)`: flat chunks → flat bullet answer |
| Orchestrator | `routers/query.py:_query_stream()` | Linear 4-stage pipeline: decompose → retrieve → filter → generate |
| SSE Events | `routers/query.py` | `decomposed → retrieving → filtering → generating → completed`, with flat answer + sources in `completed` |
| History | `services/history_service.py` | Flat XML for `chunks_retrieved`/`chunks_filtered`. Flat JSON for `sources`. Single timing per stage. |
| Prompt templates | `prompt_service.py` + `sqlite_db.py` | 3 steps (`decompose`, `filter`, `generate`). Placeholders: `{question}`, `{chunks}`, `{context}` |
| Config | `core/config.py` | `retrieval_n_results=10`, `relevance_threshold=7.0` |

### Frontend

| Component | File | Current Behavior |
|-----------|------|-----------------|
| Types | `types/index.ts` | `QueryStreamEvent.phase`, flat `extracted_questions: string[]`, flat `answer: string`, flat `sources: SourceMetadata[]` |
| SSE Client | `lib/api.ts` | `queryDocumentStream()`: generic `JSON.parse` per `data:` line, no sub-question awareness |
| State | `lib/queries.tsx` | `QueryStreamState` with flat `answer`/`sources`/`extractedQuestions` |
| Response | `components/ResponsePanel.tsx` | Single `ReactMarkdown` block for answer. Flat 2-column grid for sources. No sub-question grouping. |
| Questions | `components/ExtractedQuestionsDisplay.tsx` | `<ol>` list of question strings. No sources attached. |
| Citations | `utils/citationParser.ts` | Flat `sources` lookup: `buildCitationLookup(sources)` returns global map |
| Progress | `components/PipelineProgress.tsx` | 4-step stepper (NOT currently wired in LTTPage) |

### Key Test Files

| File | Lines | Status |
|------|-------|--------|
| `test_phase1_query_decomposer.py` | 76 | ✅ Unchanged: decomposer contract stays |
| `test_phase1_rag_service.py` | 139 | 🔴 Needs update for the new per-sub-question methods; `retrieve()` and `generate_response()` themselves are preserved (Decision 3, Task 4.3.1) |
| `test_phase1_relevance_filter.py` | 93 | 🟡 Needs update: one-call pattern changes to per-sub-q grouping |
| `test_phase1_query.py` | 97 | 🟢 Already skipped (SSE migration), may un-skip later |
| `test_phase3_query_history_integration.py` | 608 | 🔴 Major rewrite: pipeline simulation mirrors `_query_stream` 1:1 |
| `test_phase3_prompt_injection.py` | 238 | 🟡 Moderate: new generate template placeholder |
| `test_acceptance_phase1_rag_query.py` | 101 | 🔴 Full rewrite: already broken (SSE vs JSON), new response shape |
| `conftest.py` | 94 | 🟡 Low: may add per-sub-q mock helpers |

---
## Implementation Tasks

### Sub-Phase 4.1: Backend - Per-Sub-Question Retrieval

**Test files to write first:**
- `test_phase4_retrieve_per_subquestion.py`: Tests `RAGService.retrieve_per_subquestion()`
- `test_phase4_query_router_retrieval.py`: Tests `_query_stream` retrieval stage produces per-sub-q chunks

**Task 4.1.1: Add `retrieve_per_subquestion()` to `RAGService`**

File: `backend/app/services/rag.py`

New method signature:
```python
def retrieve_per_subquestion(
    self,
    sub_questions: List[str],
    n_results: int = 10,
) -> List[Tuple[str, List[Tuple[str, Dict[str, Any], float]]]]:
    """Retrieve chunks for each sub-question independently.

    Args:
        sub_questions: List of decomposed sub-questions.
        n_results: Number of chunks per sub-question.

    Returns:
        List of (sub_question, chunks) tuples.
        chunks is the standard retrieve() output: [(text, metadata, distance), ...].
    """
```

Implementation:
- Call `self.retrieve([sub_q], n_results)` for each sub-question
- Return a list of `(sub_question, chunks)` tuples. Within one sub-question no extra deduplication is needed, because a single ChromaDB query returns each record ID at most once; duplicates across sub-questions are kept (Decision 9)
- Existing `retrieve()` method is NOT modified; it continues to work as before
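
The wrapper described above can be sketched as follows. This is a minimal illustration, assuming `retrieve()` keeps the `(text, metadata, distance)` output documented in the signature. `FakeRAGService` and its canned chunks are hypothetical stand-ins so the example runs without ChromaDB; they are not the project's actual service.

```python
from typing import Any, Dict, List, Tuple

Chunk = Tuple[str, Dict[str, Any], float]  # (text, metadata, distance)


class FakeRAGService:
    """Hypothetical stand-in for RAGService; retrieve() returns canned chunks."""

    def retrieve(self, query_keywords: List[str], n_results: int = 10) -> List[Chunk]:
        # The real method queries ChromaDB; this one builds placeholder chunks.
        query_text = " ".join(query_keywords)
        return [
            (f"chunk {i} for '{query_text}'", {"filename": "demo.pdf", "page": i + 1}, 0.1 * i)
            for i in range(n_results)
        ]

    def retrieve_per_subquestion(
        self, sub_questions: List[str], n_results: int = 10
    ) -> List[Tuple[str, List[Chunk]]]:
        # One independent retrieve() call per sub-question. Order is preserved,
        # so the list position doubles as the sub-question index.
        return [(sub_q, self.retrieve([sub_q], n_results)) for sub_q in sub_questions]


grouped = FakeRAGService().retrieve_per_subquestion(
    ["What are time extensions?", "What notice is required?"], n_results=3
)
```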

**Task 4.1.2: Update `_query_stream()` retrieval stage**

File: `backend/app/routers/query.py`

Changes:
- Replace `rag.retrieve(extracted_questions, n_results)` with `rag.retrieve_per_subquestion(extracted_questions, n_results)`
- Track per-sub-question retrieval timing (new field or combined timing)
- Format `chunks_retrieved` XML with sub-question wrappers

**New `chunks_retrieved` XML format:**
```xml
<sub_q idx="0" question="What are time extensions?">
  <chunk_1>
    Filename: NEC4 ACC.pdf
    Page: 3
    Content: Clause 61.3 states that...
  </chunk_1>
  <chunk_2>
    ...
  </chunk_2>
</sub_q>
<sub_q idx="1" question="What notice is required?">
  <chunk_1>
    Filename: NEC4 Contract.pdf
    Page: 12
    Content: Notice must be given...
  </chunk_1>
  ...
</sub_q>
```

**Task 4.1.3: Format helpers**

File: `backend/app/routers/query.py`

New functions:
```python
def format_chunks_retrieved_per_subq(results: List[Tuple[str, List]]) -> str:
    """Format per-sub-question retrieved chunks as XML."""


def format_chunks_filtered_per_subq(results: List[Tuple[str, List]]) -> str:
    """Format per-sub-question filtered chunks as XML with relevance scores."""
```
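
One possible body for the first helper is sketched below, producing the grouped XML layout shown in Task 4.1.2. The metadata keys `filename` and `page` are assumptions about the chunk metadata, and chunk text is written as-is (not XML-escaped), matching the plain `Content:` lines in the format above.

```python
from typing import Any, Dict, List, Tuple
from xml.sax.saxutils import quoteattr

Chunk = Tuple[str, Dict[str, Any], float]  # (text, metadata, distance)


def format_chunks_retrieved_per_subq(results: List[Tuple[str, List[Chunk]]]) -> str:
    """Format per-sub-question retrieved chunks as XML (illustrative sketch)."""
    lines: List[str] = []
    for idx, (sub_q, chunks) in enumerate(results):
        # quoteattr() wraps the value in quotes and escapes &, < and >.
        lines.append(f'<sub_q idx="{idx}" question={quoteattr(sub_q)}>')
        # Chunk tags restart at chunk_1 inside every sub-question group.
        for n, (text, metadata, _distance) in enumerate(chunks, start=1):
            lines.append(f"  <chunk_{n}>")
            lines.append(f"    Filename: {metadata.get('filename', 'unknown')}")
            lines.append(f"    Page: {metadata.get('page', 'unknown')}")
            lines.append(f"    Content: {text}")
            lines.append(f"  </chunk_{n}>")
        lines.append("</sub_q>")
    return "\n".join(lines)


sample_xml = format_chunks_retrieved_per_subq(
    [("What are time extensions?",
      [("Clause 61.3 states that...", {"filename": "NEC4 ACC.pdf", "page": 3}, 0.2)])]
)
```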

**Commit**: `"feat: Phase 4.1 per-sub-question retrieval with grouped chunk XML"`

### Sub-Phase 4.2: Backend - Per-Sub-Question Filtering (Single LLM Call)

**Test files to write first:**
- `test_phase4_relevance_filter_per_subq.py`: Tests `RelevanceFilter.filter_per_subquestion()` with grouped chunks
- `test_phase4_query_router_filter.py`: Tests filter stage with per-sub-q chunk groups

**Task 4.2.1: Add `filter_per_subquestion()` to `RelevanceFilter`**

File: `backend/app/services/relevance_filter.py`

New method signature:
```python
async def filter_per_subquestion(
    self,
    sub_questions: List[str],
    sub_chunks: List[List[Tuple[str, Dict]]],
    threshold: float = 7.0,
) -> Tuple[List[Tuple[str, List[Tuple[str, Dict]]]], str]:
    """Filter chunks per sub-question in a single LLM call.

    Args:
        sub_questions: List of decomposed sub-questions.
        sub_chunks: List of chunk lists (one per sub-question).
        threshold: Minimum relevance score.

    Returns:
        Tuple of (filtered_results, prompt).
        filtered_results: List of (sub_question, filtered_chunks_for_that_q).
    """
```

**Prompt design (single LLM call):**
```
Evaluate each chunk for relevance to its associated sub-question.

Sub-question 0: "{sub_q_0}"
Chunk 0: {chunk_0_text}
Chunk 1: {chunk_1_text}
...

Sub-question 1: "{sub_q_1}"
Chunk 0: {chunk_0_text}
Chunk 1: {chunk_1_text}
...

For each chunk, rate relevance 0-10 considering ONLY its associated sub-question.
Return a JSON object mapping sub-question indices to arrays of scores:
{"0": [8.5, 3.2, 9.0], "1": [7.0, 6.5, 9.1]}
```
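
A rough sketch of how the grouped prompt above could be assembled from the per-sub-question chunk lists. The function name `build_grouped_filter_prompt` is hypothetical, and the real implementation may instead render a stored template.

```python
from typing import Any, Dict, List, Tuple


def build_grouped_filter_prompt(
    sub_questions: List[str],
    sub_chunks: List[List[Tuple[str, Dict[str, Any]]]],
) -> str:
    """Hypothetical helper: render the single-call filter prompt."""
    parts = ["Evaluate each chunk for relevance to its associated sub-question.", ""]
    for q_idx, (sub_q, chunks) in enumerate(zip(sub_questions, sub_chunks)):
        parts.append(f'Sub-question {q_idx}: "{sub_q}"')
        # Chunk numbering restarts at 0 inside every sub-question group.
        for c_idx, (text, _metadata) in enumerate(chunks):
            parts.append(f"Chunk {c_idx}: {text}")
        parts.append("")
    parts.append(
        "For each chunk, rate relevance 0-10 considering ONLY its associated sub-question."
    )
    parts.append("Return a JSON object mapping sub-question indices to arrays of scores:")
    parts.append('{"0": [8.5, 3.2, 9.0], "1": [7.0, 6.5, 9.1]}')
    return "\n".join(parts)


demo_prompt = build_grouped_filter_prompt(
    ["A?", "B?"], [[("a1", {}), ("a2", {})], [("b1", {})]]
)
```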

Key rules:
- Each chunk is evaluated against its **own** sub-question (not the original user question)
- JSON keys are stringified sub-question indices (`"0"`, `"1"`, ...)
- Score arrays MUST match chunk count for each sub-question
- Same JSON extraction/markdown stripping logic as existing `filter()`
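
The rules above could be applied to the LLM reply roughly as follows. This sketch makes several assumptions: the fence stripping is a simplified stand-in for the existing `filter()` logic, a score equal to the threshold is kept (reading `threshold` as a minimum), and a missing or mismatched score array falls back to keeping all chunks for that sub-question, in the spirit of the JSON-parse fallback in Task 4.7.5.

```python
import json
from typing import Any, Dict, List, Tuple

ChunkPair = Tuple[str, Dict[str, Any]]
FENCE = "`" * 3  # markdown code-fence marker


def apply_grouped_scores(
    raw_response: str,
    sub_questions: List[str],
    sub_chunks: List[List[ChunkPair]],
    threshold: float = 7.0,
) -> List[Tuple[str, List[ChunkPair]]]:
    # Strip an optional markdown fence (with or without a "json" tag).
    text = raw_response.strip()
    if text.startswith(FENCE):
        text = text.strip("`").strip()
        if text.startswith("json"):
            text = text[len("json"):]
    try:
        scores_by_idx = json.loads(text)
    except json.JSONDecodeError:
        scores_by_idx = {}
    if not isinstance(scores_by_idx, dict):
        scores_by_idx = {}

    results: List[Tuple[str, List[ChunkPair]]] = []
    for idx, (sub_q, chunks) in enumerate(zip(sub_questions, sub_chunks)):
        scores = scores_by_idx.get(str(idx))
        if not isinstance(scores, list) or len(scores) != len(chunks):
            # Assumed fallback: unusable scores mean no filtering for this group.
            results.append((sub_q, list(chunks)))
            continue
        kept = [
            chunk
            for chunk, score in zip(chunks, scores)
            if isinstance(score, (int, float)) and score >= threshold
        ]
        results.append((sub_q, kept))
    return results
```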

**Existing `filter()` method is preserved**: not modified, not deprecated. The new method is additive.

**Task 4.2.2: Update `_query_stream()` filter stage**

File: `backend/app/routers/query.py`

Changes:
- Call `relevance_filter.filter_per_subquestion(extracted_questions, chunks_for_filter, threshold)` instead of `relevance_filter.filter(question, chunks, threshold)`
- Build `chunks_for_filter` from per-sub-question retrieval results
- Track `filter_prompt` (the redesigned prompt)
- Format `chunks_filtered` XML with sub-question wrappers and `Relevance:` scores

**New `chunks_filtered` XML format:**
```xml
<sub_q idx="0" question="What are time extensions?">
  <chunk_1>
    Filename: NEC4 ACC.pdf
    Page: 3
    Relevance: 8.5
    Content: Clause 61.3 states that...
  </chunk_1>
</sub_q>
<sub_q idx="1" question="What notice is required?">
  <chunk_1>
    Filename: NEC4 Contract.pdf
    Page: 12
    Relevance: 9.0
    Content: Notice must be given...
  </chunk_1>
</sub_q>
```

**Commit**: `"feat: Phase 4.2 per-sub-question filtering with single LLM call"`

### Sub-Phase 4.3: Backend - Sub-Question-Organized Response Generation

**Test files to write first:**
- `test_phase4_generate_per_subq.py`: Tests `RAGService.generate_response_per_subquestion()`
- `test_phase4_response_format.py`: Tests the final answer matches expected format

**Task 4.3.1: Redesign `generate_response()` → `generate_response_per_subquestion()`**

File: `backend/app/services/rag.py`

New method signature:
```python
async def generate_response_per_subquestion(
    self,
    sub_questions: List[str],
    sub_chunks: List[List[str]],
    sub_metadata: List[List[Dict[str, Any]]],
) -> Tuple[str, str, List[List[SourceMetadata]]]:
    """Generate sub-question-organized RAG response.

    Args:
        sub_questions: List of decomposed sub-questions.
        sub_chunks: List of chunk text lists (one per sub-question).
        sub_metadata: List of metadata dict lists (one per sub-question).

    Returns:
        Tuple of (answer, prompt, grouped_sources).
        answer: Markdown string with sections per sub-question.
        prompt: The rendered LLM prompt.
        grouped_sources: List of SourceMetadata lists (one per sub-question).
    """
```

**New prompt template (replaces `generate`):**
```
You must answer each sub-question using ONLY the document chunks provided for it.
Do not use any external knowledge.
Format your answer as markdown sections, one section per sub-question.
Each section should start with "## Sub-question N: <the question>"
Each section should contain 1-5 bullet points.
Cite your sources inline using bracket labels, e.g. [filename, page N].
Place the citation at the end of each relevant bullet point.

{context_sections}

Answer:
```

**Context format (replaces `{context}`):**
```
### Context for Sub-question 0: "What are time extensions?"
[NEC4 ACC.pdf, page 3] Source: NEC4 ACC.pdf
Summary: Clause 61.3 discusses time extensions...
Content: Clause 61.3 states that the project manager...

[NEC4 Contract.pdf, page 12] Source: NEC4 Contract.pdf
Summary: Notice requirements for time extensions...
Content: Written notice must be given within...

### Context for Sub-question 1: "What notice is required?"
[NEC4 ACC.pdf, page 7] Source: NEC4 ACC.pdf
Summary: Notice requirements...
Content: The contractor shall notify the project manager in writing...
```
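
As an illustration, the `{context_sections}` value could be built from the three parallel lists accepted by `generate_response_per_subquestion()`. The helper name and the metadata keys (`filename`, `page`, `summary`) are assumptions, as is the placeholder line emitted for a sub-question with no surviving chunks.

```python
from typing import Any, Dict, List


def build_context_sections(
    sub_questions: List[str],
    sub_chunks: List[List[str]],
    sub_metadata: List[List[Dict[str, Any]]],
) -> str:
    """Hypothetical helper: render one context block per sub-question."""
    sections: List[str] = []
    for idx, (sub_q, chunks, metas) in enumerate(
        zip(sub_questions, sub_chunks, sub_metadata)
    ):
        lines = [f'### Context for Sub-question {idx}: "{sub_q}"']
        if not chunks:
            # Assumed placeholder so the model sees an explicit empty group.
            lines.append("(no chunks passed the relevance filter)")
        for text, meta in zip(chunks, metas):
            filename = meta.get("filename", "unknown")
            page = meta.get("page", "?")
            lines.append(f"[{filename}, page {page}] Source: {filename}")
            lines.append(f"Summary: {meta.get('summary', '')}")
            lines.append(f"Content: {text}")
            lines.append("")  # blank line between chunks
        sections.append("\n".join(lines).rstrip())
    return "\n\n".join(sections)


demo_context = build_context_sections(
    ["What are time extensions?", "What notice is required?"],
    [["Clause 61.3 states that..."], []],
    [[{"filename": "NEC4 ACC.pdf", "page": 3, "summary": "Time extensions"}], []],
)
```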

**Expected answer format:**
```markdown
## Sub-question 1: What are time extensions?
- Time extensions must be notified to the project manager within 2 weeks [NEC4 ACC.pdf, page 3]
- The project manager must acknowledge the notice within 1 week [NEC4 Contract.pdf, page 12]

## Sub-question 2: What notice is required?
- Written notice must be given [NEC4 ACC.pdf, page 7]
```
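
A format check such as `test_phase4_response_format.py` might split the answer on its section headers, as in the sketch below. Note that the headers in the expected answer are 1-based while the context blocks and `<sub_q idx>` attributes are 0-based, so this hypothetical `split_answer_sections` helper subtracts one when keying sections.

```python
import re
from typing import Any, Dict

SECTION_HEADER = re.compile(r"^## Sub-question (\d+):\s*(.*)$", re.MULTILINE)


def split_answer_sections(answer: str) -> Dict[int, Dict[str, Any]]:
    """Hypothetical helper: map 0-based sub-question index to its section."""
    matches = list(SECTION_HEADER.finditer(answer))
    sections: Dict[int, Dict[str, Any]] = {}
    for pos, match in enumerate(matches):
        # A section body runs until the next header or the end of the answer.
        end = matches[pos + 1].start() if pos + 1 < len(matches) else len(answer)
        body = answer[match.end():end].strip()
        bullets = [line[2:].strip() for line in body.splitlines() if line.startswith("- ")]
        # Headers are 1-based; convert to the 0-based index used elsewhere.
        sections[int(match.group(1)) - 1] = {
            "question": match.group(2).strip(),
            "bullets": bullets,
        }
    return sections


sample_answer = (
    "## Sub-question 1: What are time extensions?\n"
    "- Time extensions must be notified to the project manager within 2 weeks [NEC4 ACC.pdf, page 3]\n"
    "- The project manager must acknowledge the notice within 1 week [NEC4 Contract.pdf, page 12]\n"
    "\n"
    "## Sub-question 2: What notice is required?\n"
    "- Written notice must be given [NEC4 ACC.pdf, page 7]\n"
)
parsed_sections = split_answer_sections(sample_answer)
```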

**Existing `generate_response()` is preserved**: not modified, not deprecated.

**Task 4.3.2: Update `_query_stream()` generate stage**

File: `backend/app/routers/query.py`

Changes:
- Call `rag.generate_response_per_subquestion(extracted_questions, chunk_texts_by_subq, metadata_by_subq)`
- New SSE event: `generating_subquestion`, emitted before each sub-question's section (lets frontend show progressive build)
- `completed` SSE event includes both `answer` (markdown string) and `sub_question_sources` (grouped sources)

**New SSE event sequence:**
```json
{"phase": "decomposed", "extracted_questions": ["q1", "q2"]}
{"phase": "retrieving"}
{"phase": "filtering"}
{"phase": "generating"}
{"phase": "completed", "answer": "## Sub-question 1: ...\n\n...", "sub_question_sources": [{"sub_question_index": 0, "sub_question_text": "q1", "sources": [SourceMetadata, ...]}, ...]}
{"phase": "error", "message": "..."}
```
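
For illustration, the events above could be serialized as SSE `data:` lines along these lines. This is a sketch only: `sse_line` and `stream_events` are hypothetical names, the real `_query_stream()` interleaves these yields with the actual pipeline stages, and the shape of `grouped_sources` is passed through untouched.

```python
import json
from typing import Any, Dict, Iterator, List


def sse_line(event: Dict[str, Any]) -> str:
    # One SSE message: a single "data:" field terminated by a blank line.
    # json.dumps escapes newlines inside strings, so the payload stays on one line.
    return f"data: {json.dumps(event)}\n\n"


def stream_events(
    questions: List[str], answer: str, grouped_sources: List[Any]
) -> Iterator[str]:
    """Hypothetical generator yielding the happy-path event sequence."""
    yield sse_line({"phase": "decomposed", "extracted_questions": questions})
    for phase in ("retrieving", "filtering", "generating"):
        yield sse_line({"phase": phase})
    yield sse_line(
        {"phase": "completed", "answer": answer, "sub_question_sources": grouped_sources}
    )


demo_events = list(stream_events(["q1"], "## Sub-question 1: q1\n\n- a", []))
```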

**New `QueryResponse` model:**

File: `backend/app/models/query.py`

```python
from typing import List

from pydantic import BaseModel

# SourceMetadata is the existing source model, defined elsewhere in the codebase.


class SubQuestionSources(BaseModel):
    sub_question_index: int
    sub_question_text: str
    sources: List[SourceMetadata]


class QueryResponse(BaseModel):
    extracted_questions: List[str]
    answer: str  # Markdown with ## sections
    sub_question_sources: List[SubQuestionSources]  # Grouped sources
    # Backward compat:
    sources: List[SourceMetadata]  # Flattened version (all sources)
```

**Commit**: `"feat: Phase 4.3 sub-question-organized response generation"`

### Sub-Phase 4.4: Backend - History & Prompt Template Updates

**Test files to write first:**
- `test_phase4_history_format.py`: Tests new XML/JSON history formats
- `test_phase4_prompt_templates.py`: Tests new generate template with `{context_sections}`

**Task 4.4.1: Update history recording**

File: `backend/app/routers/query.py` (the `_schedule_history` / `_record_history` helpers)

Changes:
- `chunks_retrieved`: Store new grouped XML format (with `<sub_q>` wrappers)
- `chunks_filtered`: Store new grouped XML format (with `<sub_q>` wrappers and `Relevance:` scores)
- `sources`: Store grouped JSON: `json.dumps([[SourceMetadata_dict, ...], [...]])` (list of lists)
- `final_answer`: Store markdown string with `##` sections
- Existing fields (`chunks_retrieved_count`, `chunks_filtered_count`) keep total counts
- New optional fields: `chunks_retrieved_per_subq_count`, `chunks_filtered_per_subq_count` (JSON array of ints)

**Task 4.4.2: Update history DB schema (minimal)**

File: `backend/app/core/sqlite_db.py`

Add two new columns (optional, NULL-able):
```sql
ALTER TABLE query_history ADD COLUMN chunks_retrieved_per_subq_count TEXT DEFAULT NULL;
ALTER TABLE query_history ADD COLUMN chunks_filtered_per_subq_count TEXT DEFAULT NULL;
```

These store JSON arrays like `[10, 8]`, one count per sub-question. NULL for pre-Package-4 records.
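
Because SQLite's `ALTER TABLE ... ADD COLUMN` fails if the column already exists, the migration may need a guard. The sketch below checks `PRAGMA table_info` first; the function name and the toy `query_history` schema are assumptions for the demo, not the project's actual schema.

```python
import json
import sqlite3

NEW_COLUMNS = ("chunks_retrieved_per_subq_count", "chunks_filtered_per_subq_count")


def add_per_subq_count_columns(conn: sqlite3.Connection) -> None:
    """Hypothetical idempotent migration for the two count columns."""
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute("PRAGMA table_info(query_history)")}
    for column in NEW_COLUMNS:
        if column not in existing:
            conn.execute(
                f"ALTER TABLE query_history ADD COLUMN {column} TEXT DEFAULT NULL"
            )
    conn.commit()


conn = sqlite3.connect(":memory:")
# Toy schema for the demo only.
conn.execute("CREATE TABLE query_history (id INTEGER PRIMARY KEY, question TEXT)")
add_per_subq_count_columns(conn)
add_per_subq_count_columns(conn)  # second call finds both columns and does nothing
conn.execute(
    "INSERT INTO query_history (question, chunks_retrieved_per_subq_count) VALUES (?, ?)",
    ("demo", json.dumps([10, 8])),
)
stored = conn.execute(
    "SELECT chunks_retrieved_per_subq_count, chunks_filtered_per_subq_count "
    "FROM query_history"
).fetchone()
```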

**Task 4.4.3: Update history Pydantic models**

File: `backend/app/models/history.py`

Add optional fields to `QueryHistoryRecord` and `QueryHistoryDetail`:
```python
chunks_retrieved_per_subq_count: Optional[str] = None  # JSON array string
chunks_filtered_per_subq_count: Optional[str] = None  # JSON array string
```

**Task 4.4.4: Update prompt templates**

File: `backend/app/core/sqlite_db.py` (seed data)

New `generate` template:
```python
"generate": (
    "You must answer each sub-question using ONLY the document chunks provided for it.\n"
    "Do not use any external knowledge.\n"
    "Format your answer as markdown sections, one section per sub-question.\n"
    "Each section should start with \"## Sub-question N: <the question>\"\n"
    "Each section should contain 1-5 bullet points.\n"
    "Cite your sources inline using bracket labels, e.g. [filename, page N].\n"
    "Place the citation at the end of each relevant bullet point.\n\n"
    "{context_sections}\n\n"
    "Answer:"
)
```

`decompose` and `filter` templates remain unchanged (they still use the `{question}` placeholder; the orchestrator injects the right value at call time).

**Task 4.4.5: Update `PromptService` to handle new template placeholder**

File: `backend/app/services/prompt_service.py`

- Add `context_sections` as a known placeholder for the `generate` step (optional: `str.replace` is already safe with unknown keys)
- The `reset_to_defaults()` method must include the new generate template
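
The remark about `str.replace` can be illustrated with a small sketch. `render_template` is a hypothetical name, not the actual `PromptService` method; the point is that, unlike `str.format`, plain replacement leaves unknown `{...}` tokens and literal JSON braces in a template untouched.

```python
from typing import Dict


def render_template(template: str, values: Dict[str, str]) -> str:
    """Hypothetical renderer: substitute only the placeholders that are provided."""
    rendered = template
    for key, value in values.items():
        rendered = rendered.replace("{" + key + "}", value)
    return rendered


new_style = render_template(
    "{context_sections}\n\nAnswer:", {"context_sections": "### Context for Sub-question 0"}
)
# An example-JSON line in a template survives, which str.format would reject.
with_json = render_template('Return {"0": [8.5]} for {question}', {"question": "q"})
# A placeholder with no supplied value is simply left in place.
untouched = render_template("{context}\n\nAnswer:", {"context_sections": "ignored"})
```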

**Task 4.4.6: Update history detail API response**

File: `backend/app/routers/history.py`

`GET /api/v1/history/{id}` response now includes `chunks_retrieved_per_subq_count` and `chunks_filtered_per_subq_count` when they are not NULL. Backward-compatible (older records return `null` for these fields).

**Commit**: `"feat: Phase 4.4 history schema, prompt templates, and Pydantic model updates"`

### Sub-Phase 4.5: Frontend - Types & State Management

**Test files to write first:**
- `test_phase4_stream_state.test.tsx`: Tests `QueryStreamState` handles new response shape
- `test_phase4_types.test.ts`: Tests type compatibility

**Task 4.5.1: Update TypeScript types**

File: `frontend/src/types/index.ts`

New types:
```typescript
interface SubQuestionSources {
  sub_question_index: number;
  sub_question_text: string;
  sources: SourceMetadata[];
}

interface QueryStreamCompletedEvent {
  phase: 'completed';
  answer: string; // Markdown with ## sections
  sub_question_sources: SubQuestionSources[]; // Grouped sources
}

interface QueryStreamDecomposedEvent {
  phase: 'decomposed';
  extracted_questions: string[];
}

type QueryStreamEvent =
  | QueryStreamDecomposedEvent
  | { phase: 'retrieving' | 'filtering' | 'generating' }
  | QueryStreamCompletedEvent
  | { phase: 'error'; message: string };
```

**Task 4.5.2: Update `QueryStreamState` and mutation handler**

File: `frontend/src/lib/queries.tsx`

Changes:
```typescript
interface QueryStreamState {
  extractedQuestions: string[] | null;
  answer: string | null; // Full markdown
  subQuestionSources: SubQuestionSources[] | null; // NEW: grouped sources
  phase: 'idle' | 'decomposing' | 'retrieving' | 'filtering' | 'generating' | 'completed' | 'error';
  error: Error | null;
}
```

In the `completed` case:
```typescript
case 'completed':
  setState(prev => ({
    ...prev,
    answer: event.answer,
    subQuestionSources: event.sub_question_sources,
    phase: 'completed',
  }));
  break;
```

**Commit**: `"feat: Phase 4.5 frontend types and state management for per-sub-q responses"`

### Sub-Phase 4.6: Frontend - ResponsePanel & ExtractedQuestionsDisplay

**Test files to write first:**
- `test_phase4_response_panel.test.tsx`: Tests per-sub-question section rendering
- `test_phase4_citation_parser.test.ts`: Tests per-sub-question citation lookup

**Task 4.6.1: Redesign `ResponsePanel` for sub-question sections**

File: `frontend/src/components/ResponsePanel.tsx`

Current: single `ReactMarkdown` block + flat sources grid.

New layout:
```
┌─────────────────────────────────────────────────────────┐
│ 📋 Response                                  [Copy All] │
├─────────────────────────────────────────────────────────┤
│                                                         │
│ ┌─ Sub-question 1: What are time extensions? ─────────┐ │
│ │                                                     │ │
│ │ • Time extensions must be notified...               │ │
│ │   [NEC4 ACC.pdf, page 3]                            │ │
│ │ • The project manager must acknowledge...           │ │
│ │   [NEC4 Contract.pdf, page 12]                      │ │
│ │                                                     │ │
│ │ Sources (2)                              [Expand ▼] │ │
│ │ ┌─────────────────────────────────────────────────┐ │ │
│ │ │ NEC4 ACC.pdf, Page 3   │ NEC4 Contract, p12     │ │ │
│ │ │ "Clause 61.3 states.." │ "Notice must be..."    │ │ │
│ │ └─────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────┘ │
│                                                         │
│ ┌─ Sub-question 2: What notice is required? ──────────┐ │
│ │                                                     │ │
│ │ • Written notice must be given...                   │ │
│ │   [NEC4 ACC.pdf, page 7]                            │ │
│ │                                                     │ │
│ │ Sources (1)                              [Expand ▼] │ │
│ └─────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
```

Implementation approach:
1. Parse the `answer` markdown into sections using `## Sub-question N:` headers
2. Map each section to its `SubQuestionSources` by matching index
3. Render each section as an accordion/card with:
   - Header: sub-question text (from `SubQuestionSources`)
   - Body: `ReactMarkdown` for bullet points (with inline citation links)
   - Footer: collapsible sources grid (only sources belonging to this sub-question)
4. Keep the existing citation link behavior (clickable `[filename, page N]` → PDF viewer)

**Task 4.6.2: Update `citationParser.ts` for per-sub-question lookup**

File: `frontend/src/utils/citationParser.ts`

Current: `buildCitationLookup(sources: SourceMetadata[])` returns a single global map.

New: `buildCitationLookup(subQuestionSources: SubQuestionSources[])` returns a map scoped to the correct sources for each section. The citation `[filename, page N]` match is looked up in the relevant sub-question's source list.

**Task 4.6.3: Update `ExtractedQuestionsDisplay` for anchors**

File: `frontend/src/components/ExtractedQuestionsDisplay.tsx`

Minor enhancement:
- Make each extracted question a clickable anchor that scrolls to its corresponding section in the answer
- Add `id="subq-{index}"` to each section header in `ResponsePanel`
- Keep existing skeleton loading behavior

**Commit**: `"feat: Phase 4.6 frontend per-sub-question response rendering"`
|
||||
|
||||
### Sub-Phase 4.7: Testing & Polish
|
||||
|
||||
**Test files to write:**
|
||||
- `test_phase4_integration_query_pipeline.py` — Full integration test simulating per-sub-q pipeline
|
||||
- `test_phase4_acceptance_query.py` — Acceptance test with real LLM (manual run)
|
||||
- `test_phase4_e2e_query_flow.test.tsx` — Frontend e2e test with mocked SSE stream
|
||||
|
||||
**Task 4.7.1: Backend unit tests**
|
||||
|
||||
- Run `pytest backend/app/test/test_phase4_*.py -v` — all must pass
|
||||
- Verify no regressions in existing Phase 1 and Phase 3 tests
|
||||
- Update `test_phase1_rag_service.py` for new method signatures
|
||||
- Update `test_phase1_relevance_filter.py` for per-sub-q behavior
|
||||
- Rewrite `test_phase3_query_history_integration.py` for new pipeline flow
|
||||
- Update `test_phase3_prompt_injection.py` for new generate template
|
||||
|
||||
**Task 4.7.2: Backend acceptance tests**
|
||||
|
||||
- `test_phase4_acceptance_query.py` — real LLM, real ChromaDB
|
||||
- Verify: answer contains `## Sub-question` headers, sources grouped by sub-question index
|
||||
- Verify: each sub-question section has 1-5 bullet points
|
||||
- Verify: inline citations match the correct sub-question's source list

**Task 4.7.3: Frontend tests**

- `test_phase4_response_panel.test.tsx` — renders per-sub-question sections, expandable sources
- `test_phase4_citation_parser.test.ts` — per-sub-question lookup returns correct source
- `test_phase4_e2e_query_flow.test.tsx` — mocks SSE with new event format, verifies section rendering
- Update existing `ResponsePanel.test.tsx` and `citationParser.test.ts` for new API

**Task 4.7.4: Frontend build verification**

- `npm run build` — no TypeScript errors
- `npm test` — all 62 existing tests pass + new Phase 4 tests
- Verify manual flow: ask question → see extracted questions → see per-sub-question answer sections → expand sources per section

**Task 4.7.5: Error handling**

- Empty decomposition: if `decompose()` returns `[]`, fall back to using original question as single sub-question
- Empty retrieval for some sub-questions: that sub-question gets no chunks → section shows "No relevant information found"
- Filter failure (all chunks below threshold): that sub-question gets no answer → graceful empty section
- JSON parse failure in filter: fall back to including all chunks (no filtering) for that sub-question
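
The fallbacks in Task 4.7.5 can be sketched as two small pure functions. This is an illustration under stated assumptions, not the code in `relevance_filter.py`: the function names are hypothetical, and the filter response is assumed to be a JSON object mapping sub-question indices (as strings) to arrays of numeric scores, one score per chunk.

```python
import json


def ensure_sub_questions(original_question: str, sub_questions: list[str]) -> list[str]:
    """Fall back to the original question when decomposition yields nothing usable."""
    cleaned = [q.strip() for q in sub_questions if q and q.strip()]
    return cleaned or [original_question]


def apply_filter_scores(
    chunks_by_subq: list[list[str]],
    raw_llm_output: str,
    threshold: float,
) -> list[list[str]]:
    """Keep chunks scoring >= threshold; keep all chunks where scores are unusable."""
    try:
        # Assumed response shape: {"0": [0.9, 0.1], "1": [0.4]}
        scores = json.loads(raw_llm_output)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        scores = None
    if not isinstance(scores, dict):
        # Parse failure: no filtering at all, every sub-question keeps its chunks.
        return [list(chunks) for chunks in chunks_by_subq]

    filtered = []
    for idx, chunks in enumerate(chunks_by_subq):
        subq_scores = scores.get(str(idx))
        if not isinstance(subq_scores, list) or len(subq_scores) != len(chunks):
            # Missing or misaligned scores: degrade gracefully for this sub-question only.
            filtered.append(list(chunks))
            continue
        filtered.append([
            chunk
            for chunk, score in zip(chunks, subq_scores)
            if isinstance(score, (int, float)) and score >= threshold
        ])
    return filtered
```

A sub-question whose filtered list comes back empty is the case where the section would show "No relevant information found".
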

**Task 4.7.6: Documentation**

- Update `AGENTS.md` with new pipeline architecture section
- Add docstrings to all new methods (`retrieve_per_subquestion`, `filter_per_subquestion`, `generate_response_per_subquestion`)
- Update prompt template documentation in system prompts page

**Commit**: `"feat: Phase 4.7 testing, error handling, and polish for per-sub-q pipeline"`

---

## Sub-Phase Summary

| Sub-Phase | Scope | Backend | Frontend | Tests | Status |
|-----------|-------|---------|----------|-------|--------|
| 4.1 | Per-sub-q retrieval | `rag.py`, `query.py`, format helpers | None | `test_phase4_retrieve_per_subquestion.py`, `test_phase4_query_router_retrieval.py` | ✅ Complete |
| 4.2 | Per-sub-q filtering (1 LLM call) | `relevance_filter.py`, `query.py` | None | `test_phase4_relevance_filter_per_subq.py`, `test_phase4_query_router_filter.py` | ✅ Complete |
| 4.3 | Sub-q-organized response generation | `rag.py`, `query.py`, `models/query.py` | None | `test_phase4_generate_per_subq.py`, `test_phase4_response_format.py` | ✅ Complete |
| 4.4 | History schema, prompts, models | `sqlite_db.py`, `history.py` (router + models), `prompt_service.py` | None | `test_phase4_history_format.py`, `test_phase4_prompt_templates.py` | ✅ Complete |
| 4.5 | Frontend types + state | None | `types/index.ts`, `lib/queries.tsx` | `test_phase4_stream_state.test.tsx`, `test_phase4_types.test.ts` | ✅ Complete |
| 4.6 | Frontend rendering | None | `ResponsePanel.tsx`, `citationParser.ts`, `ExtractedQuestionsDisplay.tsx` | `test_phase4_response_panel.test.tsx`, `test_phase4_citation_parser.test.ts` | ✅ Complete |
| 4.7 | Testing & polish | All affected files | All affected files | Integration + acceptance + e2e tests | ✅ Complete |

---

## Implementation Sequence & Dependencies

```
4.1 (Retrieval) ──┐
                  ├──► 4.2 (Filtering) ──► 4.3 (Generate) ──► 4.4 (History/Prompts)
                  │                                                     │
                  │                                                     ▼
                  │                                        4.5 (Frontend Types/State)
                  │                                                     │
                  │                                                     ▼
                  │                                         4.6 (Frontend Rendering)
                  │                                                     │
                  └─────────────────────────────────────────────────────▼
                                                             4.7 (Testing & Polish)
```

- **4.1 → 4.2 sequential**: Filtering needs per-sub-q chunk structure from retrieval
- **4.2 → 4.3 sequential**: Generation needs filtered chunks from filtering stage
- **4.3 → 4.4 sequential**: History recording and prompt templates need final data shapes
- **4.4 → 4.5 soft dependency**: Backend prompt/history changes don't block frontend type definitions, so these two can overlap
- **4.5 → 4.6 sequential**: Rendering needs types and state management
- **4.7 blocked by all**: Integration tests need everything wired together

**Parallelization opportunity**: 4.5 (frontend types) could start as soon as 4.3 defines the SSE contract, but it's safer to start after 4.4 confirms the final data shapes.

---

## Affected Files — Complete Inventory

### Backend — New Files
| File | Sub-Phase | Purpose |
|------|-----------|---------|
| `backend/app/test/test_phase4_retrieve_per_subquestion.py` | 4.1 | Unit test: `retrieve_per_subquestion()` |
| `backend/app/test/test_phase4_query_router_retrieval.py` | 4.1 | Unit test: retrieval stage in `_query_stream` |
| `backend/app/test/test_phase4_relevance_filter_per_subq.py` | 4.2 | Unit test: `filter_per_subquestion()` |
| `backend/app/test/test_phase4_query_router_filter.py` | 4.2 | Unit test: filter stage in `_query_stream` |
| `backend/app/test/test_phase4_generate_per_subq.py` | 4.3 | Unit test: `generate_response_per_subquestion()` |
| `backend/app/test/test_phase4_response_format.py` | 4.3 | Unit test: answer format validation |
| `backend/app/test/test_phase4_history_format.py` | 4.4 | Unit test: new XML/JSON history formats |
| `backend/app/test/test_phase4_prompt_templates.py` | 4.4 | Unit test: new generate template |
| `backend/app/test/test_phase4_integration_query_pipeline.py` | 4.7 | Integration test: full per-sub-q pipeline |
| `backend/app/test/acceptance/test_phase4_acceptance_query.py` | 4.7 | Acceptance test: real LLM |

### Backend — Modified Files
| File | Sub-Phase | Changes |
|------|-----------|---------|
| `backend/app/services/rag.py` | 4.1, 4.3 | Add `retrieve_per_subquestion()`, `generate_response_per_subquestion()` |
| `backend/app/services/relevance_filter.py` | 4.2 | Add `filter_per_subquestion()` |
| `backend/app/routers/query.py` | 4.1–4.4 | Refactor `_query_stream()`, add per-sub-q format helpers, update history recording |
| `backend/app/models/query.py` | 4.3 | Add `SubQuestionSources` model, update `QueryResponse` |
| `backend/app/models/history.py` | 4.4 | Add optional per-sub-q count fields |
| `backend/app/core/sqlite_db.py` | 4.4 | Add new columns, update seed generate template |
| `backend/app/services/prompt_service.py` | 4.4 | Update `reset_to_defaults()` generate template |
| `backend/app/routers/history.py` | 4.4 | Include new fields in detail response |
| `backend/app/core/config.py` | 4.1 | (Maybe) Add `retrieval_n_results_per_subq` setting |

### Backend — Tests Needing Update
| File | Sub-Phase | Changes |
|------|-----------|---------|
| `backend/app/test/test_phase1_rag_service.py` | 4.7 | Add tests for new methods; existing tests unaffected |
| `backend/app/test/test_phase1_relevance_filter.py` | 4.7 | Add tests for `filter_per_subquestion()` |
| `backend/app/test/test_phase3_query_history_integration.py` | 4.7 | Rewrite pipeline simulation for per-sub-q flow |
| `backend/app/test/test_phase3_prompt_injection.py` | 4.7 | Add tests for new generate template |
| `backend/app/test/acceptance/test_acceptance_phase1_rag_query.py` | 4.7 | Rewrite — SSE parsing + new response shape |
| `backend/app/test/conftest.py` | 4.7 | Add per-sub-q mock helpers |

### Frontend — New Files
| File | Sub-Phase | Purpose |
|------|-----------|---------|
| `frontend/src/test/components/test_phase4_response_panel.test.tsx` | 4.7 | Component test: per-sub-q sections |
| `frontend/src/test/utils/test_phase4_citation_parser.test.ts` | 4.7 | Unit test: per-sub-q citation lookup |
| `frontend/src/test/e2e/test_phase4_query_flow.test.tsx` | 4.7 | E2E test: mocked SSE with new format |
| `frontend/src/test/lib/test_phase4_stream_state.test.tsx` | 4.5 | State test: new event shapes |
| `frontend/src/test/lib/test_phase4_types.test.ts` | 4.5 | Type test: type compatibility |

### Frontend — Modified Files
| File | Sub-Phase | Changes |
|------|-----------|---------|
| `frontend/src/types/index.ts` | 4.5 | Add `SubQuestionSources`, update `QueryStreamEvent` |
| `frontend/src/lib/queries.tsx` | 4.5 | Update `QueryStreamState`, `completed` event handler |
| `frontend/src/components/ResponsePanel.tsx` | 4.6 | Redesign — per-sub-question sections with grouped sources |
| `frontend/src/utils/citationParser.ts` | 4.6 | Update `buildCitationLookup()` for per-sub-q |
| `frontend/src/components/ExtractedQuestionsDisplay.tsx` | 4.6 | Add anchor links to answer sections |
| `frontend/src/pages/LTTPage.tsx` | 4.6 | Pass new props to children |

---

## Risk Register

| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| LLM struggles with per-sub-q filtering prompt format | Medium | High — all chunks dropped | Use strong prompt constraints, validate JSON, fall back to including all chunks on parse failure |
| LLM generates answer not matching `## Sub-question N:` format | Medium | Medium — frontend can't parse sections | Fall back to rendering as single block if parsing fails. Prompt engineering tuned for format compliance |
| Same chunk retrieved by multiple sub-questions → duplicated in context | High | Low — slightly larger prompt but acceptable | Accept duplicates. ChromaDB naturally returns same doc if relevant to multiple queries. Each sub-q's evaluation is independent |
| Per-sub-q retrieval = more ChromaDB queries = slower | Medium | Medium — N × retrieval latency | ChromaDB retrieval is fast (~10-50 ms per query). 5 sub-questions × 10-50 ms = 50-250 ms total when run sequentially. Acceptable trade-off for better relevance. |
| History DB migration fails for existing records | Low | Low — new columns are NULL-able | `ALTER TABLE ADD COLUMN ... DEFAULT NULL` is safe. Existing records work as before — `chunks_retrieved`/`chunks_filtered` still have flat XML. |
| Frontend rendering breaks on older history records | Low | Low — answer format differs | `ResponsePanel` renders per-sub-q sections only when `subQuestionSources` is non-null. Older history records show flat answer as before. |
| Prompt template migration breaks user-customized prompts | Medium | Medium — users lose their generate template | Warn in docs. The `generate` template changes fundamentally (single `{context}` → `{context_sections}`). Users must re-customize. |
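
The history migration risk above relies on adding NULL-able columns, which leaves existing rows readable. A minimal sketch with Python's built-in `sqlite3` follows; the table name `query_history` and both column names are hypothetical, since the plan only states that two NULL-able columns are added.

```python
import sqlite3

# Hypothetical column names; the plan only says two NULL-able columns are added.
NEW_COLUMNS = {
    "retrieved_counts_per_subq": "TEXT",
    "filtered_counts_per_subq": "TEXT",
}


def add_nullable_columns(conn: sqlite3.Connection, table: str) -> list[str]:
    """Add any missing columns and return the names that were actually added."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for name, sql_type in NEW_COLUMNS.items():
        if name not in existing:
            # Identifiers cannot be bound as parameters, so `table` must be a trusted constant.
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type} DEFAULT NULL")
            added.append(name)
    conn.commit()
    return added


# Demonstration against an in-memory database with one pre-existing record.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE query_history (id INTEGER PRIMARY KEY, question TEXT)")
conn.execute("INSERT INTO query_history (question) VALUES ('old record')")
first_run = add_nullable_columns(conn, "query_history")
second_run = add_nullable_columns(conn, "query_history")  # nothing left to add
old_row = conn.execute("SELECT retrieved_counts_per_subq FROM query_history").fetchone()
```

Checking `PRAGMA table_info` first makes the migration safe to run on every startup, and the pre-existing row simply reads back `NULL` for the new columns.
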

---

## Acceptance Criteria

### Backend
- [x] `POST /api/v1/query` retrieves chunks per sub-question (verified by history XML showing `<sub_q>` wrappers)
- [x] Filtering uses single LLM call evaluating chunks against their originating sub-question (verified by filter prompt)
- [x] Response answer is organized by sub-question with `## Sub-question N:` headers
- [x] `sub_question_sources` in SSE `completed` event is grouped by sub-question index
- [x] History records include new grouped XML formats for `chunks_retrieved` and `chunks_filtered`
- [x] History records include grouped `sources` JSON (list of lists)
- [x] History records include per-sub-q chunk counts
- [x] New `generate` prompt template uses `{context_sections}` placeholder
- [x] Prompt service `reset_to_defaults()` includes new generate template
- [x] Existing `decompose`, `filter` (old), `generate_response` (old) methods are unchanged
- [x] All Phase 1, Phase 3, and new Phase 4 unit tests pass (312 passed, 4 skipped)
- [x] All acceptance tests pass with real LLM (manual run)

### Frontend
- [x] `QueryStreamState` includes `subQuestionSources` field
- [x] `ResponsePanel` renders per-sub-question sections with expandable source grids
- [x] Each section's sources are scoped to that sub-question (no cross-contamination)
- [x] Inline citations `[filename, page N]` link to the correct PDF viewer page
- [x] `ExtractedQuestionsDisplay` shows clickable anchors to answer sections
- [x] Copy button copies all answer text including section headers
- [x] Loading states: skeleton per section during generation
- [x] Empty state: "No relevant information found" per sub-question (not entire response)
- [x] All 62+ existing frontend tests still pass (183 passed)
- [x] All new Phase 4 frontend tests pass
- [x] `npm run build` succeeds with zero TypeScript errors
- [x] Manual verification: full query flow works end-to-end

---

## New Dependencies

None. All changes use existing libraries (FastAPI, ChromaDB, OpenAI SDK, React, ReactMarkdown, TanStack Query).

---

## Decisions (All Confirmed)

| # | Topic | Decision |
|---|-------|----------|
| 1 | Single vs multiple filter LLM calls | **Single call** — user explicitly requested this |
| 2 | Filter prompt design | Group chunks by sub-question in one prompt. JSON response maps sub-q indices to score arrays |
| 3 | Answer format | Markdown with `## Sub-question N: <question>` headers |
| 4 | Sources grouping | `sub_question_sources: [{index, text, sources}, ...]` in SSE + frontend |
| 5 | History XML format | Add `<sub_q idx="N" question="...">` wrappers around chunk groups |
| 6 | History DB migration | Add 2 new NULL-able columns. No data migration needed. |
| 7 | Backward compatibility | Preserve old `retrieve()`, `filter()`, `generate_response()` methods. New methods are additive. |
| 8 | Deduplication | None. Same chunk may appear in multiple sub-questions. Each sub-q evaluates independently. |
| 9 | Error handling | Per-sub-question graceful degradation. Filter failure → include all chunks for that sub-q. Generate failure → "Unable to generate answer for this sub-question." |
| 10 | Frontend rendering engine | Keep `ReactMarkdown`. Parse sections client-side by splitting on `## Sub-question N:` headers. |
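
Decision 5 wraps each sub-question's chunks in a `<sub_q>` element. One way to build that grouped history XML safely is sketched below. The `<chunk>` child element, the 0-based `idx`, and the function name are assumptions for illustration; only the `<sub_q idx="N" question="...">` wrapper comes from this plan.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def build_grouped_chunks_xml(groups: list[tuple[str, list[str]]]) -> str:
    """Render (question, chunks) pairs as <sub_q> wrappers around chunk groups."""
    lines = []
    for idx, (question, chunks) in enumerate(groups):  # idx is 0-based here (assumption)
        # quoteattr() escapes the value and adds the surrounding quotes itself.
        lines.append(f"<sub_q idx={quoteattr(str(idx))} question={quoteattr(question)}>")
        for chunk in chunks:
            # <chunk> is a hypothetical inner element; escape() handles &, < and >.
            lines.append(f"  <chunk>{escape(chunk)}</chunk>")
        lines.append("</sub_q>")
    return "\n".join(lines)


grouped_xml = build_grouped_chunks_xml([
    ('What is "X" & Y?', ["a < b"]),
    ("How does Z work?", []),
])

# Round-trip check: wrap in a root element so the fragment parses as one document.
parsed = ET.fromstring(f"<root>{grouped_xml}</root>")
sub_qs = parsed.findall("sub_q")
```

Escaping matters because sub-questions and chunk text are arbitrary user or document content; an unescaped `&` or quote would make the stored history unparseable.
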

---

## Open Questions

None — all resolved.

| # | Question | Resolution |
|---|----------|------------|
| 1 | Progressive SSE events? | **Yes** — emit `generating_subquestion` as each sub-question's answer is generated. Frontend renders sections progressively. |
| 2 | `retrieval_n_results` per sub-question or global? | **Global** — same value for all sub-questions. Simpler config, one setting. |
| 3 | Fallback when decomposition returns 0 sub-questions? | **Fall back to original question** — treat as single sub-question. Pipeline runs as 1-sub-q case (retrieval via original question, no filtering needed for single sub-q, flat answer). |

---

## Test Plan Summary

### Backend (New Tests)

| File | Tests | Coverage |
|------|-------|----------|
| `test_phase4_retrieve_per_subquestion.py` | ~6 | Per-sub-q retrieval, empty input, single sub-q, dedup behavior |
| `test_phase4_query_router_retrieval.py` | ~4 | SSE events during retrieval, chunk XML format |
| `test_phase4_relevance_filter_per_subq.py` | ~6 | Per-sub-q filtering, JSON response parsing, threshold behavior |
| `test_phase4_query_router_filter.py` | ~4 | SSE events during filtering, filtered XML format |
| `test_phase4_generate_per_subq.py` | ~5 | Per-sub-q generate, prompt construction, answer format |
| `test_phase4_response_format.py` | ~4 | Answer has `##` headers, citations in correct sections |
| `test_phase4_history_format.py` | ~5 | New XML/JSON formats, per-sub-q counts |
| `test_phase4_prompt_templates.py` | ~3 | New generate template, `{context_sections}` placeholder |
| `test_phase4_integration_query_pipeline.py` | ~5 | Full pipeline simulation |
| `test_phase4_acceptance_query.py` | ~3 | Real LLM end-to-end (manual) |

### Frontend (New Tests)

| File | Tests | Coverage |
|------|-------|----------|
| `test_phase4_stream_state.test.tsx` | ~4 | State updates for new event shapes |
| `test_phase4_types.test.ts` | ~2 | Type compatibility checks |
| `test_phase4_response_panel.test.tsx` | ~6 | Section rendering, source grouping, copy, loading |
| `test_phase4_citation_parser.test.ts` | ~4 | Per-sub-q lookup, cross-section isolation |
| `test_phase4_e2e_query_flow.test.tsx` | ~3 | Full SSE flow with mocked stream |