docs: Package 7 — switch compact citations to sequential [1] [2] [3] numbering

Woody 2026-05-15 09:58:07 +08:00
parent 29d2920b32
commit 8370f49631
1 changed file with 57 additions and 28 deletions


@ -1,7 +1,7 @@
# Package 7 Enhancement Plan — Response Highlighting & Compact Citations # Package 7 Enhancement Plan — Response Highlighting & Compact Citations
**Source**: User request (2026-05-15) **Source**: User request (2026-05-15)
**Scope**: Two enhancements to the final RAG response: (1) yellow-highlight figures/dates in the answer using LLM-identified terms, (2) replace verbose `[filename.pdf, page N]` citations with compact `[ref]` clickable links. **Scope**: Two enhancements to the final RAG response: (1) yellow-highlight figures/dates in the answer using LLM-identified terms, (2) replace verbose `[filename.pdf, page N]` citations with compact sequentially-numbered `[1] [2] [3]` clickable links.
**Status**: Draft **Status**: Draft
--- ---
@ -12,7 +12,7 @@
Ask the LLM to identify monetary figures, percentages, statistics, and dates in the final response. Return these as a separate list. The frontend then searches the rendered answer and wraps matching terms in yellow `<mark>` highlights. Ask the LLM to identify monetary figures, percentages, statistics, and dates in the final response. Return these as a separate list. The frontend then searches the rendered answer and wraps matching terms in yellow `<mark>` highlights.
### Feature 2: Compact Citation Links ### Feature 2: Compact Citation Links
Replace the current inline citation display `[document_file_name.pdf, page N]` with a compact `[ref]` clickable link. The full source details remain visible in the collapsible source cards below each sub-question section. Replace the current inline citation display `[document_file_name.pdf, page N]` with compact sequentially-numbered links: `[1]`, `[2]`, `[3]`, etc. The full source details remain visible in the collapsible source cards below each sub-question section.
**Non-goals**: The underlying citation URL logic (PDF viewer vs highlight page routing) remains unchanged. Source cards are not modified. **Non-goals**: The underlying citation URL logic (PDF viewer vs highlight page routing) remains unchanged. Source cards are not modified.
@ -93,14 +93,15 @@ ReactMarkdown with custom mark component:
``` ```
Current: "...according to the report [NEC4 ACC.pdf, page 3]..." Current: "...according to the report [NEC4 ACC.pdf, page 3]..."
Desired: "...according to the report [ref]..." Desired: "...according to the report [1] and further noted in [2]..."
Implementation: In replaceCitationPatterns() (citationParser.ts:105-131), Implementation: In replaceCitationPatterns() (citationParser.ts:105-131),
replace the output format from [trimmed](url) to [ref](url). add a closure counter. Each matched citation gets a sequential number:
[trimmed](url) → [1](url), [2](url), [3](url) ...
``` ```
The citation URL, source lookup, and "View PDF" source cards are unchanged. The citation URL, source lookup, and "View PDF" source cards are unchanged.
The compact `[ref]` label is clickable and opens the same linked page as before. Each `[N]` label is clickable and opens the same linked page as before.
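The closure-counter idea can be tried in isolation with the minimal sketch below. It is not the project's implementation: `numberCitations`, `UrlLookup`, and the `/pdf/...` URLs are hypothetical stand-ins for the real source lookup and `buildCitationUrl()`, and only the bracket regex is taken from the plan.

```typescript
// Hypothetical stand-in for the real source lookup plus buildCitationUrl():
// maps the trimmed bracket text straight to a URL.
type UrlLookup = Map<string, string>

function numberCitations(text: string, urls: UrlLookup): string {
  // Bracket pattern from the plan: skips images (![...]) and existing links ([...](...)).
  const citationPattern = /(?<!!)\[([^\]]+)\](?!\()/g
  let refCounter = 0 // lives in the closure, so every call starts again at 1
  return text.replace(citationPattern, (fullMatch: string, content: string) => {
    const url = urls.get(content.trim())
    if (!url) return fullMatch // unresolved brackets stay as-is and consume no number
    refCounter++
    return `[${refCounter}](${url})`
  })
}

// Assumed example data; the URLs are made up for illustration.
const demoUrls: UrlLookup = new Map([
  ["NEC4 ACC.pdf, page 3", "/pdf/nec4-acc#page=3"],
  ["NEC4 ACC.pdf, page 7", "/pdf/nec4-acc#page=7"],
])

const numbered = numberCitations(
  "Stated in [NEC4 ACC.pdf, page 3], see [note], and again in [NEC4 ACC.pdf, page 7].",
  demoUrls
)
console.log(numbered)
```

Because the counter is declared inside the function, numbering restarts on every call, which matches a per-section reset if the function is invoked once per sub-question section. In this sketch a source cited twice receives two different numbers, since the counter advances per match rather than per unique source.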
--- ---
@ -112,8 +113,8 @@ The compact `[ref]` label is clickable and opens the same linked page as before.
| 2 | **Prompt-only change for LLM** (no backend code change for parsing) | The highlight instruction is added to the `generate_per_subq` prompt template only. `rag.py` requires zero changes — the `==term==` markers are part of the answer string and transparent to existing code. | | 2 | **Prompt-only change for LLM** (no backend code change for parsing) | The highlight instruction is added to the `generate_per_subq` prompt template only. `rag.py` requires zero changes — the `==term==` markers are part of the answer string and transparent to existing code. |
| 3 | **Custom `<mark>` component in ReactMarkdown** (not raw HTML injection) | `ReactMarkdown` strips raw HTML by default. Using `components={{ mark: HighlightMark }}` is the proper React way. Styling via Tailwind classes: `bg-yellow-200 rounded px-0.5`. | | 3 | **Custom `<mark>` component in ReactMarkdown** (not raw HTML injection) | `ReactMarkdown` strips raw HTML by default. Using `components={{ mark: HighlightMark }}` is the proper React way. Styling via Tailwind classes: `bg-yellow-200 rounded px-0.5`. |
| 4 | **`==term==` syntax choice** | `==...==` is used in many wiki/markdown dialects for highlighting (Obsidian, Markdown-it-mark). It's visually distinct from `**bold**`, `*italic*`, and `~~strikethrough~~`. No risk of colliding with existing markdown in LLM output. | | 4 | **`==term==` syntax choice** | `==...==` is used in many wiki/markdown dialects for highlighting (Obsidian, Markdown-it-mark). It's visually distinct from `**bold**`, `*italic*`, and `~~strikethrough~~`. No risk of colliding with existing markdown in LLM output. |
| 5 | **`[ref]` label for all citations** (not `[ref N]` sequential) | User explicitly requested `[ref]`. The source cards below each sub-question section already provide full source details. Multiple `[ref]` links on a page are distinguishable by their target URL (shown on hover in most browsers). If needed, sequential numbering (`[1]`, `[2]`) can be added later. | | 5 | **Sequential numbering `[1]` `[2]` `[3]`** (not a single `[ref]` label) | User requested sequential numbering for better visual clarity. Each citation in the answer gets a unique number (`[1]`, `[2]`, `[3]`...), making it easy to distinguish multiple references at a glance. The source cards below still provide full details. Implemented via a closure counter in `replaceCitationPatterns()`. |
| 6 | **Pure frontend change for compact citations** (no backend changes) | Citations are parsed purely on the frontend in `citationParser.ts`. The LLM still produces `[filename, page N]` — the frontend converts to `[ref]` during `replaceCitationPatterns()`. Backend is untouched. | | 6 | **Pure frontend change for compact citations** (no backend changes) | Citations are parsed purely on the frontend in `citationParser.ts`. The LLM still produces `[filename, page N]` — the frontend converts to sequential `[1]`, `[2]`, `[3]` during `replaceCitationPatterns()`. Backend is untouched. |
| 7 | **`processCitations` before highlight preprocessing** | Run citation processing first (convert `[filename]` to markdown links), then highlight preprocessing (convert `==term==` to `<mark>`). This order ensures `==` markers inside citation brackets don't interfere with citation regex and vice versa. | | 7 | **`processCitations` before highlight preprocessing** | Run citation processing first (convert `[filename]` to markdown links), then highlight preprocessing (convert `==term==` to `<mark>`). This order ensures `==` markers inside citation brackets don't interfere with citation regex and vice versa. |
| 8 | **Seed template update only** (not database migration) | The `_SEED_GENERATE_PER_SUBQ` template in `sqlite_db.py` is updated. Existing databases will NOT be auto-migrated — users must reset prompts or manually update via the API. This matches the existing pattern (all prompt changes are seed-only). | | 8 | **Seed template update only** (not database migration) | The `_SEED_GENERATE_PER_SUBQ` template in `sqlite_db.py` is updated. Existing databases will NOT be auto-migrated — users must reset prompts or manually update via the API. This matches the existing pattern (all prompt changes are seed-only). |
| 9 | **Yellow highlight color: `bg-yellow-200`** | Tailwind's `yellow-200` (`#FEF08A`) provides a soft, readable yellow that works on light backgrounds and is distinct from the `text-blue-600` citation links. Add `rounded px-0.5` for visual polish. | | 9 | **Yellow highlight color: `bg-yellow-200`** | Tailwind's `yellow-200` (`#FEF08A`) provides a soft, readable yellow that works on light backgrounds and is distinct from the `text-blue-600` citation links. Add `rounded px-0.5` for visual polish. |
@ -125,7 +126,7 @@ The compact `[ref]` label is clickable and opens the same linked page as before.
| # | File | Purpose | | # | File | Purpose |
|---|------|---------| |---|------|---------|
| F1 | `frontend/src/test/utils/highlightParser.test.ts` | Unit tests for `highlightTerms()` function | | F1 | `frontend/src/test/utils/highlightParser.test.ts` | Unit tests for `highlightTerms()` function |
| F2 | `frontend/src/test/utils/citationCompactRef.test.ts` | Unit tests for compact `[ref]` citation format | | F2 | `frontend/src/test/utils/citationCompactRef.test.ts` | Unit tests for compact sequential `[1] [2] [3]` citation format |
| F3 | `frontend/src/test/components/ResponsePanel_highlights.test.tsx` | Integration: ResponsePanel renders highlights and compact refs | | F3 | `frontend/src/test/components/ResponsePanel_highlights.test.tsx` | Integration: ResponsePanel renders highlights and compact refs |
--- ---
@ -135,10 +136,10 @@ The compact `[ref]` label is clickable and opens the same linked page as before.
| # | File | Change | | # | File | Change |
|---|------|--------| |---|------|--------|
| M1 | `backend/app/core/sqlite_db.py` | Update `_SEED_GENERATE_PER_SUBQ` template (lines 42-53): add `==term==` instruction for figures/dates | | M1 | `backend/app/core/sqlite_db.py` | Update `_SEED_GENERATE_PER_SUBQ` template (lines 42-53): add `==term==` instruction for figures/dates |
| M2 | `frontend/src/utils/citationParser.ts` | (a) Change `replaceCitationPatterns()` output from `[trimmed](url)` to `[ref](url)`. (b) Add `highlightTerms(markdown: string): string` function: regex `==(.+?)==` → `<mark>$1</mark>` | | M2 | `frontend/src/utils/citationParser.ts` | (a) Add closure counter in `replaceCitationPatterns()` to output `[1](url)`, `[2](url)` instead of `[trimmed](url)`. (b) Add `highlightTerms(markdown: string): string` function: regex `==(.+?)==` → `<mark>$1</mark>` |
| M3 | `frontend/src/components/ResponsePanel.tsx` | (a) Add `HighlightMark` component. (b) Add `highlightTerms()` call in `SubQuestionSection` and `FlatResponse` before ReactMarkdown. (c) Add `mark` to ReactMarkdown `components`. | | M3 | `frontend/src/components/ResponsePanel.tsx` | (a) Add `HighlightMark` component. (b) Add `highlightTerms()` call in `SubQuestionSection` and `FlatResponse` before ReactMarkdown. (c) Add `mark` to ReactMarkdown `components`. |
| M4 | `frontend/src/styles.css` | Add `.prose mark { background-color: #FEF08A; border-radius: 0.125rem; padding: 0 0.125rem; }` | | M4 | `frontend/src/styles.css` | Add `.prose mark { background-color: #FEF08A; border-radius: 0.125rem; padding: 0 0.125rem; }` |
| M5 | `frontend/src/test/utils/citationParser.test.ts` | Update existing citation tests to expect `[ref](url)` output format | | M5 | `frontend/src/test/utils/citationParser.test.ts` | Update existing citation tests to expect sequential `[1](url)`, `[2](url)` output format |
--- ---
@ -177,16 +178,43 @@ The compact `[ref]` label is clickable and opens the same linked page as before.
- [ ] Export from `citationParser.ts` (add to existing exports) - [ ] Export from `citationParser.ts` (add to existing exports)
- [ ] **Test file**: `frontend/src/test/utils/highlightParser.test.ts` - [ ] **Test file**: `frontend/src/test/utils/highlightParser.test.ts`
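One way `highlightTerms()` could satisfy both the `==(.+?)==` → `<mark>$1</mark>` rule and the requirement that highlights never appear inside code is sketched below. This is an assumption-laden illustration, not the planned implementation: the name `highlightTermsSketch` and the code-span regex are hypothetical, and the sketch emits a bare `<mark>` on the assumption that styling comes from the `HighlightMark` component or the `.prose mark` CSS rule.

```typescript
// Sketch only: convert ==term== to <mark>term</mark>, leaving fenced code
// blocks and inline code untouched.
function highlightTermsSketch(markdown: string): string {
  // One capturing group, so split() returns prose at even indices and the
  // matched code spans at odd indices.
  const codeSpan = /(```[\s\S]*?```|`[^`\n]*`)/g
  return markdown
    .split(codeSpan)
    .map((part, i) =>
      i % 2 === 1
        ? part // code span: keep verbatim
        : part.replace(/==(.+?)==/g, "<mark>$1</mark>")
    )
    .join("")
}

const highlighted = highlightTermsSketch(
  "Paid ==HK$1,000,000== on ==1 January 2024==, see `a == b == c`."
)
console.log(highlighted)
```

The lazy `.+?` keeps adjacent markers such as `==35%====2024==` from merging into a single highlight, and because `.` does not match newlines a stray `==` cannot open a highlight that runs across lines.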
### Task 7.3: Change citation output to compact `[ref]` ### Task 7.3: Change citation output to sequential `[1] [2] [3]`
- [ ] In `frontend/src/utils/citationParser.ts`, `replaceCitationPatterns()` (line 125): - [ ] In `frontend/src/utils/citationParser.ts`, `replaceCitationPatterns()` (lines 105-131):
- Add a `let refCounter = 0` before the `.replace()` call
- Increment counter on each matched citation and output `[${refCounter}](${url})`:
```typescript ```typescript
// Before: function replaceCitationPatterns(
return `[${trimmed}](${url})` text: string,
// After: lookup: Map<string, SourceMetadata>,
return `[ref](${url})` highlightKeys?: Set<string>
): string {
  const citationPattern = /(?<!!)\[([^\]]+)\](?!\()/g
  let refCounter = 0
  return text.replace(citationPattern, (fullMatch, content: string) => {
    const trimmed = content.trim()
    const source = findSource(trimmed, lookup)
    if (source) {
      let isReady = false
      if (highlightKeys && source.document_id && source.sub_question_text) {
        isReady = highlightKeys.has(
          `${source.document_id}_${source.chunk_index}_${encodeURIComponent(source.sub_question_text)}`
        )
      }
      const url = buildCitationUrl(source, isReady)
      if (url) {
        refCounter++
        return `[${refCounter}](${url})`
      }
    }
    return fullMatch
  })
}
``` ```
- [ ] Update existing tests in `frontend/src/test/utils/citationParser.test.ts` to expect `[ref]` output - [ ] Update existing tests in `frontend/src/test/utils/citationParser.test.ts` to expect sequential `[1](url)`, `[2](url)` output
- [ ] **Test file**: `frontend/src/test/utils/citationCompactRef.test.ts` (optional — existing tests cover this after update) - [ ] **Test file**: `frontend/src/test/utils/citationCompactRef.test.ts` (optional — existing tests cover this after update)
### Task 7.4: Wire highlighting into ResponsePanel ### Task 7.4: Wire highlighting into ResponsePanel
@ -229,9 +257,9 @@ The compact `[ref]` label is clickable and opens the same linked page as before.
- [ ] Run frontend tests: `cd frontend && pnpm test` - [ ] Run frontend tests: `cd frontend && pnpm test`
- [ ] Run backend tests: `cd backend && pytest app/test/ -v` (no regressions) - [ ] Run backend tests: `cd backend && pytest app/test/ -v` (no regressions)
- [ ] Verify existing citation tests pass with `[ref]` output format - [ ] Verify existing citation tests pass with sequential `[1](url)` `[2](url)` output format
- [ ] Verify new highlight tests pass - [ ] Verify new highlight tests pass
- [ ] Visual manual test: ask a question with figures/dates, verify yellow highlights appear and `[ref]` links work - [ ] Visual manual test: ask a question with figures/dates, verify yellow highlights appear and sequential `[1] [2]` links work
--- ---
@ -240,8 +268,8 @@ The compact `[ref]` label is clickable and opens the same linked page as before.
| # | Test File | Type | Coverage | | # | Test File | Type | Coverage |
|---|-----------|------|----------| |---|-----------|------|----------|
| T7.2 | `frontend/src/test/utils/highlightParser.test.ts` | Unit | `highlightTerms()`: basic `==term==` → `<mark>`, multiple highlights, no false positives on `==` in code, edge cases (empty, no markers, adjacent markers) | | T7.2 | `frontend/src/test/utils/highlightParser.test.ts` | Unit | `highlightTerms()`: basic `==term==` → `<mark>`, multiple highlights, no false positives on `==` in code, edge cases (empty, no markers, adjacent markers) |
| T7.3 | `frontend/src/test/utils/citationParser.test.ts` (update) | Unit | Existing 16 tests updated to expect `[ref](url)` output. Add test: multiple citations all render as `[ref]` | | T7.3 | `frontend/src/test/utils/citationParser.test.ts` (update) | Unit | Existing 16 tests updated to expect sequential `[1](url)`, `[2](url)` output. Add test: multiple citations render as `[1]`, `[2]`, `[3]` |
| T7.4 | `frontend/src/test/components/ResponsePanel_highlights.test.tsx` | Integration | Full `ResponsePanel` with mock answer containing `==figure==` markers and `[citation]` brackets: verifies yellow `<mark>` elements render, `[ref]` links are clickable, source cards unchanged | | T7.4 | `frontend/src/test/components/ResponsePanel_highlights.test.tsx` | Integration | Full `ResponsePanel` with mock answer containing `==figure==` markers and `[citation]` brackets: verifies yellow `<mark>` elements render, `[1] [2]` links are clickable, source cards unchanged |
--- ---
@ -252,7 +280,7 @@ Task 7.1 (update seed template)
│ (independent — prompt template change tested implicitly) │ (independent — prompt template change tested implicitly)
Task 7.2 (highlightTerms function) Task 7.3 (compact [ref] output) Task 7.2 (highlightTerms function) Task 7.3 (sequential citation refs)
│ │ │ │
└──────────────┬───────────────────────┘ └──────────────┬───────────────────────┘
@ -274,15 +302,16 @@ Tasks 7.2 and 7.3 are independent and can run in parallel.
- [ ] `highlightTerms()` correctly converts all `==term==` patterns to `<mark class="bg-yellow-200 rounded px-0.5">term</mark>` - [ ] `highlightTerms()` correctly converts all `==term==` patterns to `<mark class="bg-yellow-200 rounded px-0.5">term</mark>`
- [ ] Yellow highlights render in the browser for monetary amounts (e.g., `HK$1,000,000`), percentages (e.g., `35%`), and dates (e.g., `1 January 2024`) - [ ] Yellow highlights render in the browser for monetary amounts (e.g., `HK$1,000,000`), percentages (e.g., `35%`), and dates (e.g., `1 January 2024`)
- [ ] Highlights do NOT appear inside code blocks or inline code - [ ] Highlights do NOT appear inside code blocks or inline code
- [ ] Highlights work correctly alongside citation links (`[ref]`) - [ ] Highlights work correctly alongside citation links (`[1] [2] [3]`)
- [ ] Highlights work in both sub-question mode and flat response mode - [ ] Highlights work in both sub-question mode and flat response mode
- [ ] No regressions in existing tests - [ ] No regressions in existing tests
### Feature 2: Compact Citations ### Feature 2: Compact Citations
- [ ] All inline citations display as `[ref]` instead of `[filename.pdf, page N]` - [ ] All inline citations display as sequential `[1]`, `[2]`, `[3]` instead of `[filename.pdf, page N]`
- [ ] `[ref]` links are clickable and navigate to the correct PDF viewer or highlight page - [ ] Sequential numbers increment correctly per answer section (reset per sub-question section)
- [ ] `[1]` `[2]` links are clickable and navigate to the correct PDF viewer or highlight page
- [ ] Source cards below each section still show full filename, page, date, and summary - [ ] Source cards below each section still show full filename, page, date, and summary
- [ ] Existing citation tests pass with updated `[ref]` output format - [ ] Existing citation tests pass with updated sequential `[1](url)` output format
- [ ] No regressions in existing tests - [ ] No regressions in existing tests
--- ---
@ -297,7 +326,7 @@ Tasks 7.2 and 7.3 are independent and can run in parallel.
5. `highlightTerms()` function can remain in `citationParser.ts` (no harm) 5. `highlightTerms()` function can remain in `citationParser.ts` (no harm)
### Feature 2 (Compact Citations): ### Feature 2 (Compact Citations):
1. Revert `citationParser.ts` line 125 from `[ref](${url})` back to `[${trimmed}](${url})` 1. Revert `citationParser.ts` line 125 from `[${refCounter}](${url})` back to `[${trimmed}](${url})` and remove the counter
2. Update test expectations back to full citation text 2. Update test expectations back to full citation text
Both features are independent — can roll back one without affecting the other. Both features are independent — can roll back one without affecting the other.
@ -310,7 +339,7 @@ Both features are independent — can roll back one without affecting the other.
- ❌ Do NOT change the SSE event schema (no new fields in `completed` event) - ❌ Do NOT change the SSE event schema (no new fields in `completed` event)
- ❌ Do NOT change the citation URL routing logic (`buildCitationUrl()` stays as-is) - ❌ Do NOT change the citation URL routing logic (`buildCitationUrl()` stays as-is)
- ❌ Do NOT modify source cards (`SubQuestionSourceCard`) — they still show full details - ❌ Do NOT modify source cards (`SubQuestionSourceCard`) — they still show full details
- ❌ Do NOT add tooltips or popovers on `[ref]` links (future enhancement) - ❌ Do NOT add tooltips or popovers on `[N]` links (future enhancement)
- ❌ Do NOT add per-term highlight metadata (type: figure vs date, color coding) - ❌ Do NOT add per-term highlight metadata (type: figure vs date, color coding)
- ❌ Do NOT add configuration UI for highlight colors - ❌ Do NOT add configuration UI for highlight colors
- ❌ Do NOT modify the non-sub-question fallback `generate_response()` (legacy flat mode — highlight markers work from prompt template alone) - ❌ Do NOT modify the non-sub-question fallback `generate_response()` (legacy flat mode — highlight markers work from prompt template alone)