From 8370f4963154706e9990fa025a4de240ed6b808b Mon Sep 17 00:00:00 2001
From: Woody
Date: Fri, 15 May 2026 09:58:07 +0800
Subject: [PATCH] =?UTF-8?q?docs:=20Package=207=20=E2=80=94=20switch=20comp?=
 =?UTF-8?q?act=20citations=20to=20sequential=20[1]=20[2]=20[3]=20numbering?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 .plans/package7_enhancement_plan.md | 85 +++++++++++++++++++----------
 1 file changed, 57 insertions(+), 28 deletions(-)

diff --git a/.plans/package7_enhancement_plan.md b/.plans/package7_enhancement_plan.md
index cfc92c7..3e780e7 100644
--- a/.plans/package7_enhancement_plan.md
+++ b/.plans/package7_enhancement_plan.md
@@ -1,7 +1,7 @@
 # Package 7 Enhancement Plan — Response Highlighting & Compact Citations

 **Source**: User request (2026-05-15)
-**Scope**: Two enhancements to the final RAG response: (1) yellow-highlight figures/dates in the answer using LLM-identified terms, (2) replace verbose `[filename.pdf, page N]` citations with compact `[ref]` clickable links.
+**Scope**: Two enhancements to the final RAG response: (1) yellow-highlight figures/dates in the answer using LLM-identified terms, (2) replace verbose `[filename.pdf, page N]` citations with compact sequentially-numbered `[1] [2] [3]` clickable links.
 **Status**: Draft

 ---
@@ -12,7 +12,7 @@
 Ask the LLM to identify monetary figures, percentages, statistics, and dates in the final response. Return these as a separate list. The frontend then searches the rendered answer and wraps matching terms in yellow `<mark>` highlights.

 ### Feature 2: Compact Citation Links
-Replace the current inline citation display `[document_file_name.pdf, page N]` with a compact `[ref]` clickable link. The full source details remain visible in the collapsible source cards below each sub-question section.
+Replace the current inline citation display `[document_file_name.pdf, page N]` with compact sequentially-numbered links: `[1]`, `[2]`, `[3]`, etc. The full source details remain visible in the collapsible source cards below each sub-question section.

 **Non-goals**: The underlying citation URL logic (PDF viewer vs highlight page routing) remains unchanged. Source cards are not modified.

@@ -93,14 +93,15 @@ ReactMarkdown with custom mark component:

 ```
 Current: "...according to the report [NEC4 ACC.pdf, page 3]..."
-Desired: "...according to the report [ref]..."
+Desired: "...according to the report [1] and further noted in [2]..."

 Implementation: In replaceCitationPatterns() (citationParser.ts:105-131),
-replace the output format from [trimmed](url) to [ref](url).
+add a closure counter. Each matched citation gets a sequential number:
+  [trimmed](url) → [1](url), [2](url), [3](url) ...
 ```

 The citation URL, source lookup, and "View PDF" source cards are unchanged.
-The compact `[ref]` label is clickable and opens the same linked page as before.
+Each `[N]` label is clickable and opens the same linked page as before.

 ---

@@ -112,8 +113,8 @@ The compact `[ref]` label is clickable and opens the same linked page as before.
 | 2 | **Prompt-only change for LLM** (no backend code change for parsing) | The highlight instruction is added to the `generate_per_subq` prompt template only. `rag.py` requires zero changes — the `==term==` markers are part of the answer string and transparent to existing code. |
 | 3 | **Custom `<mark>` component in ReactMarkdown** (not raw HTML injection) | `ReactMarkdown` strips raw HTML by default. Using `components={{ mark: HighlightMark }}` is the proper React way. Styling via Tailwind classes: `bg-yellow-200 rounded px-0.5`. |
 | 4 | **`==term==` syntax choice** | `==...==` is used in many wiki/markdown dialects for highlighting (Obsidian, Markdown-it-mark). It's visually distinct from `**bold**`, `*italic*`, and `~~strikethrough~~`. No risk of colliding with existing markdown in LLM output. |
-| 5 | **`[ref]` label for all citations** (not `[ref N]` sequential) | User explicitly requested `[ref]`. The source cards below each sub-question section already provide full source details. Multiple `[ref]` links on a page are distinguishable by their target URL (shown on hover in most browsers). If needed, sequential numbering (`[1]`, `[2]`) can be added later. |
-| 6 | **Pure frontend change for compact citations** (no backend changes) | Citations are parsed purely on the frontend in `citationParser.ts`. The LLM still produces `[filename, page N]` — the frontend converts to `[ref]` during `replaceCitationPatterns()`. Backend is untouched. |
+| 5 | **Sequential numbering `[1]` `[2]` `[3]`** (not a single `[ref]` label) | User requested sequential numbering for better visual clarity. Each citation in the answer gets a unique number (`[1]`, `[2]`, `[3]`...), making it easy to distinguish multiple references at a glance. The source cards below still provide full details. Implemented via a closure counter in `replaceCitationPatterns()`. |
+| 6 | **Pure frontend change for compact citations** (no backend changes) | Citations are parsed purely on the frontend in `citationParser.ts`. The LLM still produces `[filename, page N]` — the frontend converts to sequential `[1]`, `[2]`, `[3]` during `replaceCitationPatterns()`. Backend is untouched. |
 | 7 | **`processCitations` before highlight preprocessing** | Run citation processing first (convert `[filename]` to markdown links), then highlight preprocessing (convert `==term==` to `<mark>`). This order ensures `==` markers inside citation brackets don't interfere with citation regex and vice versa. |
 | 8 | **Seed template update only** (not database migration) | The `_SEED_GENERATE_PER_SUBQ` template in `sqlite_db.py` is updated. Existing databases will NOT be auto-migrated — users must reset prompts or manually update via the API. This matches the existing pattern (all prompt changes are seed-only). |
 | 9 | **Yellow highlight color: `bg-yellow-200`** | Tailwind's `yellow-200` (`#FEF08A`) provides a soft, readable yellow that works on both light backgrounds and is distinct from the `text-blue-600` citation links. Add `rounded px-0.5` for visual polish. |
@@ -125,7 +126,7 @@ The compact `[ref]` label is clickable and opens the same linked page as before.
 | # | File | Purpose |
 |---|------|---------|
 | F1 | `frontend/src/test/utils/highlightParser.test.ts` | Unit tests for `highlightTerms()` function |
-| F2 | `frontend/src/test/utils/citationCompactRef.test.ts` | Unit tests for compact `[ref]` citation format |
+| F2 | `frontend/src/test/utils/citationCompactRef.test.ts` | Unit tests for compact sequential `[1] [2] [3]` citation format |
 | F3 | `frontend/src/test/components/ResponsePanel_highlights.test.tsx` | Integration: ResponsePanel renders highlights and compact refs |

 ---
@@ -135,10 +136,10 @@ The compact `[ref]` label is clickable and opens the same linked page as before.
 | # | File | Change |
 |---|------|--------|
 | M1 | `backend/app/core/sqlite_db.py` | Update `_SEED_GENERATE_PER_SUBQ` template (lines 42–53): add `==term==` instruction for figures/dates |
-| M2 | `frontend/src/utils/citationParser.ts` | (a) Change `replaceCitationPatterns()` output from `[trimmed](url)` to `[ref](url)`. (b) Add `highlightTerms(markdown: string): string` function: regex `==(.+?)==` → `<mark>$1</mark>` |
+| M2 | `frontend/src/utils/citationParser.ts` | (a) Add closure counter in `replaceCitationPatterns()` to output `[1](url)`, `[2](url)` instead of `[trimmed](url)`. (b) Add `highlightTerms(markdown: string): string` function: regex `==(.+?)==` → `<mark>$1</mark>` |
 | M3 | `frontend/src/components/ResponsePanel.tsx` | (a) Add `HighlightMark` component. (b) Add `highlightTerms()` call in `SubQuestionSection` and `FlatResponse` before ReactMarkdown. (c) Add `mark` to ReactMarkdown `components`. |
 | M4 | `frontend/src/styles.css` | Add `.prose mark { background-color: #FEF08A; border-radius: 0.125rem; padding: 0 0.125rem; }` |
-| M5 | `frontend/src/test/utils/citationParser.test.ts` | Update existing citation tests to expect `[ref](url)` output format |
+| M5 | `frontend/src/test/utils/citationParser.test.ts` | Update existing citation tests to expect sequential `[1](url)`, `[2](url)` output format |

 ---

@@ -177,16 +178,43 @@ The compact `[ref]` label is clickable and opens the same linked page as before.
 - [ ] Export from `citationParser.ts` (add to existing exports)
 - [ ] **Test file**: `frontend/src/test/utils/highlightParser.test.ts`

-### Task 7.3: Change citation output to compact `[ref]`
+### Task 7.3: Change citation output to sequential `[1] [2] [3]`

-- [ ] In `frontend/src/utils/citationParser.ts`, `replaceCitationPatterns()` (line 125):
+- [ ] In `frontend/src/utils/citationParser.ts`, `replaceCitationPatterns()` (lines 105–131):
+  - Add a `let refCounter = 0` before the `.replace()` call
+  - Increment counter on each matched citation and output `[${refCounter}](${url})`:
   ```typescript
-  // Before:
-  return `[${trimmed}](${url})`
-  // After:
-  return `[ref](${url})`
+  function replaceCitationPatterns(
+    text: string,
+    lookup: Map<…>,
+    highlightKeys?: Set<…>
+  ): string {
+    const citationPattern = /(?…
+    let refCounter = 0
+
+    return text.replace(citationPattern, (fullMatch, content) => {
+      const trimmed = content.trim()
+      const source = findSource(trimmed, lookup)
+
+      if (source) {
+        let isReady = false
+        if (highlightKeys && source.document_id && source.sub_question_text) {
+          isReady = highlightKeys.has(
+            `${source.document_id}_${source.chunk_index}_${encodeURIComponent(source.sub_question_text)}`
+          )
+        }
+        const url = buildCitationUrl(source, isReady)
+        if (url) {
+          refCounter++
+          return `[${refCounter}](${url})`
+        }
+      }
+
+      return fullMatch
+    })
+  }
   ```
-- [ ] Update existing tests in `frontend/src/test/utils/citationParser.test.ts` to expect `[ref]` output
+- [ ] Update existing tests in `frontend/src/test/utils/citationParser.test.ts` to expect sequential `[1](url)`, `[2](url)` output
 - [ ] **Test file**: `frontend/src/test/utils/citationCompactRef.test.ts` (optional — existing tests cover this after update)

 ### Task 7.4: Wire highlighting into ResponsePanel
@@ -229,9 +257,9 @@ The compact `[ref]` label is clickable and opens the same linked page as before.

 - [ ] Run frontend tests: `cd frontend && pnpm test`
 - [ ] Run backend tests: `cd backend && pytest app/test/ -v` (no regressions)
-- [ ] Verify existing citation tests pass with `[ref]` output format
+- [ ] Verify existing citation tests pass with sequential `[1](url)` `[2](url)` output format
 - [ ] Verify new highlight tests pass
-- [ ] Visual manual test: ask a question with figures/dates, verify yellow highlights appear and `[ref]` links work
+- [ ] Visual manual test: ask a question with figures/dates, verify yellow highlights appear and sequential `[1] [2]` links work

 ---

@@ -240,8 +268,8 @@ The compact `[ref]` label is clickable and opens the same linked page as before.
 | # | Test File | Type | Coverage |
 |---|-----------|------|----------|
 | T7.2 | `frontend/src/test/utils/highlightParser.test.ts` | Unit | `highlightTerms()`: basic `==term==` → `<mark>`, multiple highlights, no false positives on `==` in code, edge cases (empty, no markers, adjacent markers) |
-| T7.3 | `frontend/src/test/utils/citationParser.test.ts` (update) | Unit | Existing 16 tests updated to expect `[ref](url)` output. Add test: multiple citations all render as `[ref]` |
-| T7.4 | `frontend/src/test/components/ResponsePanel_highlights.test.tsx` | Integration | Full `ResponsePanel` with mock answer containing `==figure==` markers and `[citation]` brackets: verifies yellow `<mark>` elements render, `[ref]` links are clickable, source cards unchanged |
+| T7.3 | `frontend/src/test/utils/citationParser.test.ts` (update) | Unit | Existing 16 tests updated to expect sequential `[1](url)`, `[2](url)` output. Add test: multiple citations render as `[1]`, `[2]`, `[3]` |
+| T7.4 | `frontend/src/test/components/ResponsePanel_highlights.test.tsx` | Integration | Full `ResponsePanel` with mock answer containing `==figure==` markers and `[citation]` brackets: verifies yellow `<mark>` elements render, `[1] [2]` links are clickable, source cards unchanged |

 ---

@@ -252,7 +280,7 @@ Task 7.1 (update seed template)
      │
      │  (independent — prompt template change tested implicitly)
      │
-Task 7.2 (highlightTerms function)    Task 7.3 (compact [ref] output)
+Task 7.2 (highlightTerms function)    Task 7.3 (sequential citation refs)
      │                                      │
      └──────────────┬───────────────────────┘
                     │
@@ -274,15 +302,16 @@ Tasks 7.2 and 7.3 are independent and can run in parallel.
 - [ ] `highlightTerms()` correctly converts all `==term==` patterns to `<mark>term</mark>`
 - [ ] Yellow highlights render in the browser for monetary amounts (e.g., `HK$1,000,000`), percentages (e.g., `35%`), and dates (e.g., `1 January 2024`)
 - [ ] Highlights do NOT appear inside code blocks or inline code
-- [ ] Highlights work correctly alongside citation links (`[ref]`)
+- [ ] Highlights work correctly alongside citation links (`[1] [2] [3]`)
 - [ ] Highlights work in both sub-question mode and flat response mode
 - [ ] No regressions in existing tests

 ### Feature 2: Compact Citations
-- [ ] All inline citations display as `[ref]` instead of `[filename.pdf, page N]`
-- [ ] `[ref]` links are clickable and navigate to the correct PDF viewer or highlight page
+- [ ] All inline citations display as sequential `[1]`, `[2]`, `[3]` instead of `[filename.pdf, page N]`
+- [ ] Sequential numbers increment correctly per answer section (reset per sub-question section)
+- [ ] `[1]` `[2]` links are clickable and navigate to the correct PDF viewer or highlight page
 - [ ] Source cards below each section still show full filename, page, date, and summary
-- [ ] Existing citation tests pass with updated `[ref]` output format
+- [ ] Existing citation tests pass with updated sequential `[1](url)` output format
 - [ ] No regressions in existing tests

 ---
@@ -297,7 +326,7 @@ Tasks 7.2 and 7.3 are independent and can run in parallel.
 5. `highlightTerms()` function can remain in `citationParser.ts` (no harm)

 ### Feature 2 (Compact Citations):
-1. Revert `citationParser.ts` line 125 from `[ref](${url})` back to `[${trimmed}](${url})`
+1. Revert `citationParser.ts` line 125 from `[${refCounter}](${url})` back to `[${trimmed}](${url})` and remove the counter
 2. Update test expectations back to full citation text

 Both features are independent — can roll back one without affecting the other.
@@ -310,7 +339,7 @@ Both features are independent — can roll back one without affecting the other.
 - ❌ Do NOT change the SSE event schema (no new fields in `completed` event)
 - ❌ Do NOT change the citation URL routing logic (`buildCitationUrl()` stays as-is)
 - ❌ Do NOT modify source cards (`SubQuestionSourceCard`) — they still show full details
-- ❌ Do NOT add tooltips or popovers on `[ref]` links (future enhancement)
+- ❌ Do NOT add tooltips or popovers on `[N]` links (future enhancement)
 - ❌ Do NOT add per-term highlight metadata (type: figure vs date, color coding)
 - ❌ Do NOT add configuration UI for highlight colors
 - ❌ Do NOT modify the non-sub-question fallback `generate_response()` (legacy flat mode — highlight markers work from prompt template alone)
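
The numbering behaviour that Task 7.3 and the Feature 2 acceptance criteria describe (numbers increment within one answer section and restart for the next) can be illustrated with a minimal sketch. This is a simplified stand-in, not the code in `citationParser.ts`: the function name `numberCitationsSketch`, the `urlByCitation` map, and the bracket regex are assumptions made for illustration, and the real function resolves sources through `findSource()` and `buildCitationUrl()` instead of a plain map lookup.

```typescript
// Hypothetical, simplified stand-in for replaceCitationPatterns().
// urlByCitation is an assumed lookup from citation text to an already-built URL.
function numberCitationsSketch(
  text: string,
  urlByCitation: Map<string, string>
): string {
  // The counter is local to this call, so numbering restarts whenever the
  // function is invoked for another answer section.
  let refCounter = 0

  // Assumed pattern: a bracketed label that is not already a markdown link.
  const bracketPattern = /\[([^\]]+)\](?!\()/g

  return text.replace(bracketPattern, (fullMatch: string, content: string) => {
    const url = urlByCitation.get(content.trim())
    if (!url) {
      // Unresolved citation: keep the original text and do not use up a number.
      return fullMatch
    }
    refCounter += 1
    return `[${refCounter}](${url})`
  })
}
```

Under these assumptions, calling the function once per sub-question section yields `[1]`, `[2]`, ... within each section, and a citation that cannot be resolved leaves no gap in the sequence. A source cited twice in the same section would receive two different numbers, which appears to match the "each citation gets a unique number" wording in design decision 5.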