legco_ai_assistant/.plans/package5_enhancement_plan.md

# Package 5 Enhancement Plan — Structured Output + Robust Citation Linking

**Source**: User request (2026-04-28)
**Scope**:
- Phase 5.1: Replace manual JSON parsing in the decompose stage with LangChain `with_structured_output()`
- Phase 5.2: Fix missing PDF links in citations and improve citation robustness

**Status**: Phases 5.1 ✅, 5.2 ✅, 5.3 ✅; 5.4 Planned (2026-04-28)

**LangChain version**: 1.2.15 (venv), `model_provider="openai"` with OpenRouter base URL (API-compatible proxy).

**Test results**:
- Backend: 115 passed, 0 failed (Phase 5.1 + Phase 5.2 + all integration/regression tests)
- Frontend: 187 passed, 1 failed (pre-existing e2e test failure unrelated to these changes)

---

## Objective

1. **Decompose structured output**: Eliminate `json.JSONDecodeError` failures in `QueryDecomposer.decompose()` by integrating LangChain's `with_structured_output()` to enforce a Pydantic schema at the API level. A successful structured call returns a validated `SubQuestions` object, with no manual `json.loads()` and no regex markdown stripping on that path. Failures are logged instead of being swallowed silently; the legacy parser survives only as a fallback (Decision #10).

2. **Robust citation linking**: Fix the citation→PDF link pipeline so that:
   - `document_id` flows through to the frontend for fallback document-level links
   - `chunk_file_path` is always available (generate per-chunk PDFs for DOCX/TXT too, or provide a document-level PDF fallback)
   - Citation matching in `citationParser.ts` handles fuzzy filename matching (strips extensions, tolerates whitespace variations)
   - Frontend provides fallback "View Document" links when chunk-level PDF is unavailable

---

## Decision Register

| # | Decision | Rationale |
|---|----------|-----------|
| 1 | Use LangChain `with_structured_output()` (not OpenAI `response_format` directly) | User explicitly chose Option B. Provides cleaner API, auto-retry on validation failure, and future flexibility for other pipeline stages (filter, generate). |
| 2 | Add `langchain` + `langchain-openai` to `requirements.txt` | Required dependencies for `init_chat_model()` and `with_structured_output()`. Originally planned against `langchain` ~0.3.x for a stable API; the venv currently runs 1.2.x (see the version note above). |
| 3 | Define `SubQuestions` Pydantic model with `questions: list[str]` | LangChain's `with_structured_output()` requires a wrapper Pydantic model — bare `list[str]` is unsupported by provider-native schema enforcement. |
| 4 | Keep `LLMClient` as the central LLM access layer, add LangChain-based `complete_structured()` method | Minimizes refactoring. `QueryDecomposer` calls `llm_client.complete_structured(prompt, SubQuestions)` instead of `llm_client.complete(prompt)`. Other callers (filter, generate) remain unchanged. |
| 5 | Run decomposition at `temperature=0.0` (was `0.7`) | Structured output benefits from deterministic behavior. Lower temperature = more reliable schema compliance. |
| 6 | Add `document_id` to `SourceMetadata` Pydantic model and frontend type | `document_id` is already stored in ChromaDB metadata (`metadata.py:70`) but is discarded during serialization. Adding it enables document-level fallback links. |
| 7 | ~~Generate **monolithic** PDFs for DOCX/TXT documents~~ → **DEFERRED** | More complex than needed. Instead, use fallback document-level links via `document_id` when `chunk_file_path` is null. DOCX/TXT PDF generation deferred to Phase 5.3. |
| 8 | Fuzzy citation matching: strip extensions, trim whitespace | `citationParser.ts` currently requires exact filename match. LLM may shorten `NEC4 ACC.pdf` to `NEC4 ACC` in citations. |
| 9 | Fallback "View Document" link when `chunk_file_path` is null | Even after Decision #7, network failures or edge cases may leave null paths. The frontend should show a document-level PDF link as fallback. |
| 10 | Keep `_extract_json_from_markdown()` as a fallback for backward compatibility | During a transition period (or if `with_structured_output()` fails), the existing regex-based extraction serves as a safety net. Log a warning when fallback is used. |
| 11 | Add `logger.warning` for JSON parse failures before returning empty | The biggest blind spot today: JSON parse failures are silent. Log the raw LLM response (truncated) so operators can debug. |
| 12 | Keep `QueryDecomposer.decompose()` return type as `Tuple[List[str], str]` | Existing callers unpack the tuple. Adding `Tuple[List[str], str, SubQuestions \| None]` would break tests unnecessarily. The Pydantic model is internal to `complete_structured()`. |
| 13 | Spike-test LangChain structured output with OpenRouter BEFORE implementation | 2-minute test calling `init_chat_model().with_structured_output().ainvoke()` through OpenRouter to confirm `response_format={"type": "json_schema"}` is proxied correctly. If not, fall back to `method="function_calling"`. |
| 14 | Tighten `generate_per_subq` prompt alongside frontend fuzzy matching | Add "Copy the exact bracket labels shown in the document chunks — do not modify filenames or add/remove extensions." to seed template. Two-layer defense: prompt reduces hallucinations + fuzzy matching catches remaining cases. No separate task — folded into Task 5.2.3. |

---

## Phase 5.1 — Structured Output for Decompose

### Test Files (write BEFORE implementation)

| # | Test File | Coverage |
|---|-----------|----------|
| T5.1.1 | `backend/app/test/test_phase5_llm_client_structured.py` | `LLMClient.complete_structured()` with mock LangChain model. Tests: valid Pydantic return, validation error → retry, empty questions list, non-JSON fallback. |
| T5.1.2 | `backend/app/test/test_phase5_query_decomposer_structured.py` | `QueryDecomposer.decompose()` using `MockLLMClient.complete_structured()`. Tests: valid SubQuestions, empty questions, LLM error fallback, prompt service integration. |
| T5.1.3 | `backend/app/test/test_phase5_subquestions_model.py` | `SubQuestions` Pydantic model validation. Tests: valid input, empty list, too many questions, non-string items rejected. |
| T5.1.4 | `backend/app/test/test_phase5_decompose_logging.py` | Verify `logger.warning` is emitted when JSON parse fallback is triggered (backward-compat path). |

### Acceptance Tests

| # | Test File | Coverage |
|---|-----------|----------|
| AT5.1.1 | `backend/app/test/acceptance/test_acceptance_phase5_structured_decompose.py` | Real LLM call with structured output. Tests: Cantonese question → valid sub-questions, English question → valid sub-questions, very short question → 1 sub-question, very long question → ≤5 sub-questions. |

### Implementation Tasks

#### Task 5.1.1: Add LangChain dependencies

- [ ] Add `langchain>=0.3.0,<0.4.0` and `langchain-openai>=0.3.0,<0.4.0` to `backend/requirements.txt`
- [ ] Run `pip install -r backend/requirements.txt` in dev venv
- **Test file**: `test_phase5_subquestions_model.py` (can run immediately after install)

#### Task 5.1.2: Define `SubQuestions` Pydantic model

- [ ] Create `backend/app/models/decompose.py` with:
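  ```python
  from pydantic import BaseModel, Field


  class SubQuestions(BaseModel):
      # Wrapper model: the list must live inside an object to be schema-enforced.
      questions: list[str] = Field(
          description="2-5 simplified sub-questions, each focused on one aspect",
          min_length=1,
          max_length=5,
      )
  ```
- [ ] Add `min_length=1` and `max_length=5` Pydantic constraints. The upper bound matches the decompose prompt's "2-5"; the lower bound is deliberately looser so that a very short question can yield a single sub-question (see AT5.1.1).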
- **Test file**: `test_phase5_subquestions_model.py`

#### Task 5.1.3: Add `complete_structured()` method to `LLMClient`

- [ ] In `llm_client.py`, import `init_chat_model` from `langchain.chat_models`
- [ ] Add `self._langchain_model` attribute (lazy-init from settings)
- [ ] Add `async complete_structured(prompt, pydantic_model, step_name) -> BaseModel` method:
  1. Calls `self._langchain_model.with_structured_output(pydantic_model, method="json_schema").ainvoke(prompt)`
  2. Returns the validated Pydantic model instance
  3. Logs timing (same pattern as existing `complete()`)
  4. Wraps errors in `LLMClientError`
- [ ] Use `temperature=0.0` via model config for structured calls
- **Test file**: `test_phase5_llm_client_structured.py`
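
The four steps of Task 5.1.3 could look roughly like the sketch below. It is written as a free function that duck-types the model (anything exposing `with_structured_output(schema, method=...)` whose result has `ainvoke()`), so it runs without LangChain installed. The `LLMClientError` class here is a stand-in for the project's existing exception, and the log format is an assumption, not the real `complete()` pattern.

```python
import asyncio
import logging
import time
from typing import Any, Type

logger = logging.getLogger(__name__)


class LLMClientError(Exception):
    """Stand-in for the project's existing LLM error type (assumption)."""


async def complete_structured(
    model: Any, prompt: str, schema: Type[Any], step_name: str = "LLM"
) -> Any:
    """Invoke `model` with schema enforcement and return the parsed object.

    `model` is assumed to behave like a LangChain chat model: it exposes
    with_structured_output(schema, method=...) and the returned runnable
    exposes an async ainvoke(prompt).
    """
    started = time.perf_counter()
    try:
        runnable = model.with_structured_output(schema, method="json_schema")
        result = await runnable.ainvoke(prompt)
    except Exception as exc:
        # Normalise every failure to the single error type callers already handle.
        raise LLMClientError(f"{step_name}: structured call failed: {exc}") from exc
    finally:
        # Timing is logged on both success and failure.
        elapsed = time.perf_counter() - started
        logger.info("%s structured call took %.2fs", step_name, elapsed)
    return result
```

In the real client, `model` would presumably be the lazily created `self._langchain_model` built by `init_chat_model(...)` with `model_provider="openai"` and `temperature=0.0`; that wiring is omitted here.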

#### Task 5.1.4: Refactor `QueryDecomposer.decompose()` to use structured output

- [ ] Change `decompose()` to call `self.llm_client.complete_structured(prompt, SubQuestions, step_name="QueryDecomposer")`
- [ ] Add fallback path: if `complete_structured()` raises → log warning → attempt legacy `complete()` + `json.loads()` → if that works, log info "structured output failed, fallback succeeded"
- [ ] Add `logger.warning("Decompose JSON parse failed, raw response (first 500 chars): %s", response[:500])` when both paths fail
- [ ] Keep return type `Tuple[List[str], str]` unchanged
- [ ] Keep `_extract_json_from_markdown()` for backward-compat fallback path
- **Test file**: `test_phase5_query_decomposer_structured.py` and `test_phase5_decompose_logging.py`
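
A minimal sketch of the two-stage fallback described in Task 5.1.4, under several assumptions: `llm_client` is any object with async `complete_structured()` and `complete()` methods, `extract_json_from_markdown()` is a hypothetical stand-in for the existing `_extract_json_from_markdown()`, and the second tuple element is returned as the prompt purely as a placeholder because the plan does not say what that string holds.

```python
import asyncio
import json
import logging
import re
from typing import Any, List, Tuple

logger = logging.getLogger(__name__)

# Matches a Markdown code fence (three backticks), optionally tagged "json".
_FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def extract_json_from_markdown(text: str) -> str:
    """Hypothetical stand-in for the legacy helper: strip a code fence if present."""
    match = _FENCE.search(text)
    return match.group(1).strip() if match else text.strip()


async def decompose(llm_client: Any, prompt: str, schema: Any) -> Tuple[List[str], str]:
    """Try structured output, then the legacy JSON path, then return an empty list."""
    try:
        result = await llm_client.complete_structured(
            prompt, schema, step_name="QueryDecomposer"
        )
        return list(result.questions), prompt
    except Exception as exc:
        logger.warning("Structured decompose failed, trying legacy path: %s", exc)

    response = ""
    try:
        response = await llm_client.complete(prompt)
        questions = json.loads(extract_json_from_markdown(response))
        if not isinstance(questions, list):
            raise ValueError("expected a JSON array of strings")
        logger.info("structured output failed, fallback succeeded")
        return [str(q) for q in questions], prompt
    except Exception:
        # Both paths failed: surface the raw text so operators can debug.
        logger.warning(
            "Decompose JSON parse failed, raw response (first 500 chars): %s",
            response[:500],
        )
        return [], prompt
```

Keeping the tuple shape identical on all three exits is what lets existing callers keep unpacking the result unchanged.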

#### Task 5.1.5: Update prompt template for structured output

- [ ] Update `_SEED_DECOMPOSE` in `sqlite_db.py` to instruct the LLM about the expected structure
- [ ] New seed prompt: mention that output will be validated against a schema — more explicit about JSON array of strings requirement
- [ ] Run `seed_default_profiles()` to backfill existing profiles
- **Test file**: Existing `test_phase3_prompt_service.py` should continue to pass

#### Task 5.1.6: Integration test — end-to-end query pipeline

- [ ] Verify existing integration tests still pass (`test_integration_phase1.py`, `test_phase4_integration_query_pipeline.py`)
- [ ] Verify acceptance test passes with real LLM (`test_acceptance_phase1_rag_query.py`)
- [ ] Run full test suite: `cd backend && pytest app/test/test_phase5*.py app/test/test_phase4*.py app/test/test_phase3*.py -v`

---

## Phase 5.2 — Robust Citation Linking

### Test Files (write BEFORE implementation)

| # | Test File | Coverage |
|---|-----------|----------|
| T5.2.1 | `backend/app/test/test_phase5_source_metadata.py` | `SourceMetadata` model with `document_id`. Tests: serialization includes document_id, backward compat (old data without document_id). |
| T5.2.2 | `backend/app/test/test_phase5_docx_pdf_generation.py` | DOCX/TXT ingestion now sets `chunk_file_path`. Tests: DOCX ingestion produces chunk PDFs, TXT ingestion produces chunk PDFs, PDF generation errors are handled gracefully. |
| T5.2.3 | `frontend/src/test/utils/test_phase5_citation_parser_fuzzy.test.ts` | Fuzzy citation matching. Tests: citation `[NEC4 ACC]` matches source `NEC4 ACC.pdf`, citation `[nec4 acc.pdf, page 3]` matches after whitespace trim, citation `[NEC4 ACC.PDF]` matches case-insensitively, fallback "View Document" link shown when `chunk_file_path` is null. |
| T5.2.4 | `frontend/src/test/utils/test_phase5_citation_fallback_link.test.ts` | Fallback document link rendering. Tests: chunk with `chunk_file_path: null` but `document_id` present → renders "View Document" link, chunk with both null → remains plain text, chunk with `chunk_file_path` → renders page-level PDF link. |

### Acceptance Tests

| # | Test File | Coverage |
|---|-----------|----------|
| AT5.2.1 | `backend/app/test/acceptance/test_acceptance_phase5_citation_links.py` | Real LLM query with DOCX and PDF documents. Verify citations in the answer are clickable in the SSE response (sources include document_id and chunk_file_path). |

### Implementation Tasks

#### Task 5.2.1: Add `document_id` to `SourceMetadata` model

- [ ] In `backend/app/models/common.py`, add `document_id: Optional[str] = None` to `SourceMetadata`
- [ ] In `backend/app/routers/query.py` lines 310-319, include `document_id=meta.get("document_id")` when building `SourceMetadata` objects
- [ ] In `frontend/src/types/index.ts`, add `document_id: string | null` to `SourceMetadata` interface
- **Test file**: `test_phase5_source_metadata.py`

#### Task 5.2.2: Generate PDFs for DOCX/TXT documents during ingestion

- [ ] Add `reportlab` to `backend/requirements.txt` (lightweight, pure Python PDF generation, no external binaries)
- [ ] In `backend/app/routers/ingest.py` DOCX and TXT branches, add PDF generation logic:
  1. After chunking, generate one PDF per chunk from that chunk's text
  2. Store `chunk_filename = f"{stem}_chunk_{idx}.pdf"` for each chunk
  3. Set `chunk_file_paths` list and pass to `extract_metadata()`
- [ ] Add error handling: if PDF generation fails, `chunk_file_path` stays `None` (graceful degradation)
- [ ] Use `logger.warning` on generation failure
- **Test file**: `test_phase5_docx_pdf_generation.py`

#### Task 5.2.3: Improve `citationParser.ts` with fuzzy matching

- [ ] Add extension-stripping helper: `stripExtension(filename: string): string` — removes `.pdf`, `.docx`, `.txt`
- [ ] Modify `buildCitationLookup()` to register both `filename` and `stripExtension(filename)` as lookup keys
- [ ] Add trim-whitespace normalization on citation text before lookup
- [ ] Add test for LLM-common variations: `NEC4 ACC.pdf` vs `NEC4 ACC` vs `NEC4_acc.pdf`
- **Test file**: `test_phase5_citation_parser_fuzzy.test.ts`
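
The lookup logic in Task 5.2.3 is sketched below in Python for consistency with the other examples; the real helpers would live in `citationParser.ts`. The `filename` key on each source, the `normalize_key()` and `find_source()` names, and the choice to collapse internal whitespace and lowercase are assumptions drawn from the test cases listed in T5.2.3. Parsing of suffixes such as `, page 3` is out of scope here.

```python
import re
from typing import Dict, Iterable, Optional

_KNOWN_EXTENSIONS = (".pdf", ".docx", ".txt")


def strip_extension(filename: str) -> str:
    """Remove one trailing .pdf/.docx/.txt suffix, ignoring case."""
    lowered = filename.lower()
    for ext in _KNOWN_EXTENSIONS:
        if lowered.endswith(ext):
            return filename[: -len(ext)]
    return filename


def normalize_key(label: str) -> str:
    """Trim, collapse runs of whitespace to one space, and lowercase."""
    return re.sub(r"\s+", " ", label.strip()).lower()


def build_citation_lookup(sources: Iterable[dict]) -> Dict[str, dict]:
    """Register each source under both its full filename and its stem."""
    lookup: Dict[str, dict] = {}
    for source in sources:
        filename = source["filename"]  # assumed field name
        for key in (filename, strip_extension(filename)):
            # setdefault keeps the first source if two files share a stem.
            lookup.setdefault(normalize_key(key), source)
    return lookup


def find_source(lookup: Dict[str, dict], citation_label: str) -> Optional[dict]:
    """Look up a citation label, retrying with its extension removed."""
    key = normalize_key(citation_label)
    if key in lookup:
        return lookup[key]
    return lookup.get(normalize_key(strip_extension(citation_label)))
```

Registering both keys up front keeps the per-citation lookup a plain dictionary access, so the fuzzy behaviour costs nothing at render time.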

#### Task 5.2.4: Add fallback "View Document" link in frontend

- [ ] In `citationParser.ts` `replaceCitationPatterns()`, when `source?.chunk_file_path` is null but `source?.document_id` exists:
  1. Build a URL to the document chunk list page: `/rag-database?document=${source.document_id}` (the `?document=` parameter is the one `RAGDatabasePage` reads to auto-expand a document)
  2. Return `[${trimmed}](${url})` with a different CSS class (e.g., `text-green-600` for document-level vs `text-blue-600` for page-level)
- [ ] In `ResponsePanel.tsx`, update `CitationLink` component to accept a `variant` prop for visual differentiation
- **Test file**: `test_phase5_citation_fallback_link.test.ts`
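
The three rendering cases from T5.2.4 can be expressed as one small decision function. This is a Python sketch of logic that would actually sit in `replaceCitationPatterns()`; the `/files/...` URL for chunk PDFs is a hypothetical layout (the plan does not state the real one), and the fallback uses the `?document=` query parameter named in the completion checklist.

```python
from typing import Optional
from urllib.parse import quote


def citation_markdown(label: str, source: Optional[dict]) -> str:
    """Return a Markdown link for a citation, or plain text if nothing is linkable."""
    if not source:
        return f"[{label}]"

    chunk_path = source.get("chunk_file_path")
    if chunk_path:
        # Page-level link. The /files/ prefix is a placeholder, not the real route.
        return f"[{label}](/files/{quote(chunk_path)})"

    document_id = source.get("document_id")
    if document_id:
        # Document-level fallback when no chunk PDF exists.
        return f"[{label}](/rag-database?document={quote(str(document_id), safe='')})"

    # Neither a chunk path nor a document id: leave the citation as plain text.
    return f"[{label}]"
```

Styling the two link kinds differently (the `variant` prop on `CitationLink`) is a presentation concern layered on top of this decision and is not shown.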

#### Task 5.2.5: Integration and regression testing

- [ ] Verify all existing citation parser tests still pass: `cd frontend && npx vitest run src/test/utils/citationParser.test.ts`
- [ ] Verify ResponsePanel tests still pass: `npx vitest run src/test/components/ResponsePanel.test.tsx`
- [ ] Run full frontend test suite: `npm test`
- [ ] Verify SSE streaming integration: query with a mix of PDF and DOCX documents, confirm citations are clickable

---

## Dependency Graph

```
Phase 5.1 (Structured Output)
  Task 5.1.1 (add deps) ──┬── Task 5.1.2 (SubQuestions model) ── Task 5.1.3 (complete_structured)
                           │                                           │
                           │                                           ▼
                           │                              Task 5.1.4 (refactor decompose)
                           │                                           │
                           │                              Task 5.1.5 (update prompt template)
                           │                                           │
                           │                                           ▼
                           │                              Task 5.1.6 (integration tests)
                           │
Phase 5.2 (Citation Linking) — independent, can run in parallel with 5.1
  Task 5.2.1 (document_id in model) ──┬── Task 5.2.3 (fuzzy matching)
  Task 5.2.2 (DOCX/TXT PDF gen)    ──┤
                                      ├── Task 5.2.4 (fallback link)
                                      │
                                      ▼
                              Task 5.2.5 (integration tests)
```

---

## Acceptance Criteria

### Phase 5.1 Completion Checklist

- [x] `LLMClient.complete_structured()` returns validated `SubQuestions` Pydantic model — no `json.JSONDecodeError` possible
- [x] `QueryDecomposer.decompose()` never returns `[]` due to JSON parse failure
- [x] Fallback path (legacy `json.loads()`) logs a warning when triggered
- [x] Existing decompose tests pass (`test_phase1_query_decomposer.py`)
- [x] New structured output tests pass (`test_phase5_*.py`) — 33 tests
- [x] Spike test passed: Cantonese + English → valid sub-questions
- [x] `SQLite` seed templates updated and backfilled to all profiles
- [x] `langchain` and `langchain-openai` installed in venv (1.2.x)

### Phase 5.2 Completion Checklist

- [x] `SourceMetadata` includes `document_id` in both backend and frontend types
- [ ] ~~DOCX/TXT ingestion generates per-chunk PDF files~~ → **DEFERRED** to Phase 5.3
- [x] `citationParser.ts` matches `[NEC4 ACC]` to source `NEC4 ACC.pdf` (fuzzy matching)
- [x] `citationParser.ts` renders fallback link to `/rag-database?document=xxx` when `chunk_file_path` is null but `document_id` exists
- [x] `RAGDatabasePage` auto-expands document from `?document=` URL param
- [x] All existing citation parser tests pass (14 tests)
- [x] All existing ResponsePanel tests pass
- [x] `generate_per_subq` seed prompt tightened: "Copy the exact bracket labels shown"

---

## Rollback Plan

If `with_structured_output()` causes issues in production:
1. The `complete_structured()` method wraps errors in `LLMClientError` — same exception type as existing `complete()`
2. `QueryDecomposer.decompose()` has a fallback to legacy `complete()` + `json.loads()` path
3. The `_extract_json_from_markdown()` function is preserved for backward compatibility
4. If LangChain is a complete failure, revert the LangChain-specific changes (`requirements.txt`, `llm_client.py`, and the `QueryDecomposer.decompose()` call site), keeping the Pydantic model and improved logging

---

## Phase 5.3 — DOCX/TXT PDF Generation ✅

Generate per-chunk PDF files for DOCX/TXT documents at ingestion time so they have the same `chunk_file_path` → PDF viewer flow as PDF documents.

**Status**: Complete (2026-04-28). Implemented in commit `25b26c9`.
- `reportlab==4.2.5` added to `requirements.txt`
- New `backend/app/utils/text_to_pdf.py`: renders chunk text as simple PDFs with word wrapping
- `ingest.py` DOCX/TXT branches: generates `{stem}_chunk_{idx}.pdf` per chunk, passes `chunk_file_paths` to `extract_metadata()`
- Graceful degradation: `chunk_file_path` stays `None` on generation failure (logged as warning)
- Tests: `test_phase5_docx_pdf_generation.py` (5 tests), updated `test_phase1_ingest_page_aware.py` (2 assertions)

---

## Phase 5.4 — Sentence-Level Highlighting (PLANNED)

### Problem

When a user clicks a citation link to view a cited chunk, they see the full chunk text (up to ~1000 tokens). They have to manually scan to find which sentences actually drove the relevance. This is especially painful for long, dense chunks.

### Solution

**On-the-fly highlighted HTML chunk views** served by the backend. When a citation link is clicked, the frontend passes the sub-question that retrieved that chunk. The backend splits the chunk into sentences, computes embedding similarity of each sentence to the sub-question, and returns a styled HTML page with relevant sentences highlighted.

### Why HTML, not PDF?

| Approach | Complexity | Works for all doc types? | Preserves original formatting? |
|---|---|---|---|
| Highlighted HTML page | **Low** | ✅ Yes (uses chunk text) | ❌ Plain text only |
| Highlighted PDF via reportlab | Medium | ✅ Yes (new PDF) | ❌ Plain text only |
| Overlay highlights on existing PDF | High | ⚠️ PDF only | ✅ Yes |

**Recommendation: HTML page.** Simple, fast, works uniformly for PDF/DOCX/TXT chunks. Original formatting is preserved in the existing PDF viewer (`chunk_file_path` link) — the highlighted HTML view is a **supplementary** view reached via a separate button/link. The two views coexist: "View Original PDF" vs "View Highlighted Text".

### How It Works (No LLM Needed)

```
User clicks citation [NEC4 ACC, chunk 3]
       │
       ▼
Frontend sends: GET /api/v1/chunks/highlight?document_id=abc&chunk_index=2&sub_question=...
       │
       ▼
Backend:
  1. Fetch chunk text from ChromaDB                          [chromadb get()]
  2. Split into sentences                                    [nltk.sent_tokenize or regex]
  3. Embed sub-question                                      [existing embedding model]
  4. Embed each sentence (batch, parallel)                   [same model]
  5. Compute cosine similarity per sentence vs sub-question  [numpy]
  6. Return HTML with yellow background on sentences > threshold
       │
       ▼
Frontend renders HTML in an iframe or new tab
```

### What Gets Highlighted

```
┌──────────────────────────────────────────────────────────┐
│ Chunk: NEC4 ACC, page 12          [View Original PDF →]  │
├──────────────────────────────────────────────────────────┤
│                                                            │
│ The programme shall be prepared in a form acceptable to   │
│ the Project Manager. It shall include:                    │
│                                                            │
│ ████████████████████████████████████████████████████████ │
│ █ The starting date, access dates, and Key Dates.       █ │  ← High similarity
│ ████████████████████████████████████████████████████████ │
│                                                            │
│ The Contractor shall submit a first programme within      │
│ ████████████████████████████████████████████████████████ │
│ █ two weeks of the starting date.                       █ │  ← High similarity
│ ████████████████████████████████████████████████████████ │
│                                                            │
│ The Project Manager may instruct the Contractor to        │
│ submit a revised programme showing the effects of a       │
│ compensation event. This does not affect the Contractor's │
│ right to be paid for preparing the programme.             │  ← Low similarity (no highlight)
│                                                            │
└──────────────────────────────────────────────────────────┘
```

### Key Design Decisions

| # | Decision | Rationale |
|---|---|---|
| 1 | HTML page, not PDF | Zero dependency (`reportlab` not needed). Faster to generate. CSS-based highlighting is more flexible. Original PDF view remains available separately. |
| 2 | Embedding similarity, not LLM | No LLM call, so no generation cost and far less latency. The embedding model is already running, and cosine similarity is cheap. |
| 3 | Sentence-level granularity | Paragraph-level is too coarse (whole paragraph might be dimly relevant). Word/phrase-level is too noisy. Sentences are the natural unit of meaning. |
| 4 | Embed sentences in batch | A 1000-token chunk has ~8-12 sentences. One batch embedding call is fast (single API round-trip). |
| 5 | Configurable threshold (env var) | `HIGHLIGHT_SIMILARITY_THRESHOLD` (default 0.5). Tune per embedding model. |
| 6 | Cache sentence embeddings per chunk | A chunk may be cited in multiple queries. Cache sentence embeddings in ChromaDB metadata or SQLite to avoid recomputation. |
| 7 | Graceful degradation | If embedding fails → return plain text chunk view. If sentence splitting fails → highlight entire chunk. |
| 8 | Frontend: "View Highlighted" link alongside "View PDF" | The existing PDF viewer link (`chunk_file_path`) stays. A second link opens the highlighted HTML view. Both visible, user chooses. |

### Implementation Tasks

#### Task 5.4.1: Backend — Sentence splitting utility

- [ ] Create `backend/app/utils/sentence_splitter.py`
- [ ] Function `split_sentences(text: str) -> list[dict]` returns `[{text, start_char, end_char}, ...]`
- [ ] Use `nltk.sent_tokenize` with fallback to regex (`re.split(r'(?<=[.!?])\s+', text)`)
- [ ] NLTK punkt data auto-downloaded on first use (or bundled)
- [ ] Handle edge cases: empty text, single sentence, lists/bullets
- **Test file**: `test_phase5_sentence_splitter.py`
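
A sketch of the regex fallback only, assuming the return shape listed above (`text`, `start_char`, `end_char`). It uses `finditer` instead of `re.split` so character offsets can be recorded; the `_append` helper is hypothetical. The pattern only recognises `.`, `!` and `?`, so text using other sentence terminators would come back as a single sentence.

```python
import re
from typing import Dict, List, Union

# A sentence boundary is whitespace that follows ., ! or ?
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")

Sentence = Dict[str, Union[str, int]]


def _append(out: List[Sentence], text: str, start: int, end: int) -> None:
    """Add text[start:end] to `out`, trimmed, with offsets into the original text."""
    segment = text[start:end]
    stripped = segment.strip()
    if not stripped:
        return  # skip empty or whitespace-only segments
    offset = start + (len(segment) - len(segment.lstrip()))
    out.append(
        {"text": stripped, "start_char": offset, "end_char": offset + len(stripped)}
    )


def split_sentences(text: str) -> List[Sentence]:
    """Split `text` into sentences, keeping character offsets for each one."""
    sentences: List[Sentence] = []
    start = 0
    for boundary in _BOUNDARY.finditer(text):
        _append(sentences, text, start, boundary.start())
        start = boundary.end()
    _append(sentences, text, start, len(text))  # trailing sentence, if any
    return sentences
```

Because `end_char` is exclusive, `text[s["start_char"]:s["end_char"]]` reproduces each sentence exactly, which is what a highlighter needs to wrap spans in the original chunk.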

#### Task 5.4.2: Backend — Highlighted chunk endpoint

- [ ] New endpoint: `GET /api/v1/chunks/highlight`
- [ ] Query params: `document_id`, `chunk_index`, `sub_question`
- [ ] Returns `text/html` (not JSON)
- [ ] Logic in `backend/app/services/chunk_highlight_service.py`:
  1. Fetch chunk from ChromaDB by `document_id` + `chunk_index`
  2. Split into sentences via `split_sentences()`
  3. Get embedding for `sub_question` via existing embedding model
  4. Get embeddings for all sentences in one batch call
  5. Compute cosine similarity: `np.dot(q_emb, s_emb) / (norm(q_emb) * norm(s_emb))`
  6. Mark sentences with similarity > threshold as highlighted
  7. Render HTML template with inline CSS (yellow background, subtle border)
- **Test file**: `test_phase5_chunk_highlight.py`
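
Steps 5 to 7 can be illustrated without any embedding service by passing precomputed vectors in. The sketch below uses plain Python arithmetic in place of NumPy so it has no dependencies; the function names, the `<mark>` markup, and the colour are illustrative assumptions, and fetching the chunk and calling the embedding model are left out.

```python
import html
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def render_highlighted(
    sentences: Sequence[str],
    sentence_embs: Sequence[Sequence[float]],
    question_emb: Sequence[float],
    threshold: float = 0.5,
) -> str:
    """Wrap sentences whose similarity exceeds `threshold` in a highlighted <mark>."""
    parts: List[str] = []
    for sentence, emb in zip(sentences, sentence_embs):
        safe = html.escape(sentence)  # chunk text is untrusted: escape before embedding in HTML
        if cosine_similarity(question_emb, emb) > threshold:
            parts.append(f'<mark style="background: #fff59d">{safe}</mark>')
        else:
            parts.append(f"<span>{safe}</span>")
    return "<p>" + " ".join(parts) + "</p>"
```

Escaping every sentence matters because the endpoint returns `text/html` built from document content, so unescaped chunk text could otherwise inject markup into the page.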

#### Task 5.4.3: Frontend — "View Highlighted" link in citations and sources

- [ ] In `citationParser.ts` and `ResponsePanel.tsx`, add a "🔍" or "View Highlighted" link next to each source
- [ ] Link target: `/api/v1/chunks/highlight?document_id=...&chunk_index=...&sub_question=...`
- [ ] The sub-question is the one that retrieved this chunk (already available in the sources structure: `source.sub_question_index` → look up sub-question text)
- [ ] Open in new tab or modal
- **Test file**: Update `citationParser.test.ts` and `ResponsePanel.test.tsx`
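
Since the sub-question travels in the query string, it has to be URL-encoded, which matters for questions containing spaces, `?`, or non-ASCII text such as Cantonese. A minimal Python sketch of building the link target follows; the function name is hypothetical, and in the frontend the same result would come from something like `URLSearchParams`.

```python
from urllib.parse import urlencode


def build_highlight_url(document_id: str, chunk_index: int, sub_question: str) -> str:
    """Build the highlight endpoint URL with all query parameters percent-encoded."""
    query = urlencode(
        {
            "document_id": document_id,
            "chunk_index": chunk_index,
            "sub_question": sub_question,
        }
    )
    return f"/api/v1/chunks/highlight?{query}"
```

For example, `build_highlight_url("abc", 2, "What is a Key Date?")` yields `/api/v1/chunks/highlight?document_id=abc&chunk_index=2&sub_question=What+is+a+Key+Date%3F`.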

#### Task 5.4.4: Integration testing

- [ ] Verify highlight endpoint returns 200 with valid HTML for all doc types (PDF, DOCX, TXT)
- [ ] Verify the highlighted sentences are the ones most relevant to the sub-question (spot-check manually)
- [ ] Verify caching works (second request for same chunk is faster)
- [ ] Verify graceful degradation (embedding API down → plain text still served)
- [ ] Run full test suite

### Test Files

| # | Test File | Coverage |
|---|-----------|----------|
| T5.4.1 | `backend/app/test/test_phase5_sentence_splitter.py` | Sentence splitting: English, mixed punctuation, empty, single sentence, bullet lists |
| T5.4.2 | `backend/app/test/test_phase5_chunk_highlight.py` | Highlight endpoint: valid request → HTML with highlights, threshold filtering, no sentences above threshold → all plain, missing document/chunk → 404, embedding failure → fallback plain text |
| T5.4.3 | `frontend/src/test/utils/citationParser.test.ts` (update) | Citation links include highlight URL when sub-question context available |
| T5.4.4 | `frontend/src/test/components/ResponsePanel.test.tsx` (update) | Sources section renders "View Highlighted" link alongside "View PDF" |

### Acceptance Tests

| # | Test File | Coverage |
|---|-----------|----------|
| AT5.4.1 | `backend/app/test/acceptance/test_acceptance_phase5_highlight.py` | Real LLM query → real embeddings → open highlighted view → verify yellow spans exist on relevant sentences |

---

## Commit Plan

| Commit | Message | Scope |
|--------|---------|-------|
| 1 | `feat: add LangChain deps and SubQuestions Pydantic model` | Tasks 5.1.1 + 5.1.2 + tests |
| 2 | `feat: add LLMClient.complete_structured() with LangChain` | Task 5.1.3 + tests |
| 3 | `feat: refactor QueryDecomposer to use structured output with fallback` | Task 5.1.4 + tests |
| 4 | `chore: update decompose seed prompt for structured output` | Task 5.1.5 |
| 5 | `feat: add document_id to SourceMetadata model` | Task 5.2.1 + tests |
| 6 | `feat: fuzzy citation matching and document fallback links` | Tasks 5.2.3 + 5.2.4 + tests |
| 7 | `feat: sentence-level chunk highlighting via embedding similarity` | Phase 5.4 (all tasks) |