docs: Package 7 — switch compact citations to sequential [1] [2] [3] numbering
parent 29d2920b32
commit 8370f49631
@@ -1,7 +1,7 @@
 # Package 7 Enhancement Plan — Response Highlighting & Compact Citations

 **Source**: User request (2026-05-15)
-**Scope**: Two enhancements to the final RAG response: (1) yellow-highlight figures/dates in the answer using LLM-identified terms, (2) replace verbose `[filename.pdf, page N]` citations with compact `[ref]` clickable links.
+**Scope**: Two enhancements to the final RAG response: (1) yellow-highlight figures/dates in the answer using LLM-identified terms, (2) replace verbose `[filename.pdf, page N]` citations with compact sequentially-numbered `[1] [2] [3]` clickable links.
 **Status**: Draft

 ---
@@ -12,7 +12,7 @@
 Ask the LLM to identify monetary figures, percentages, statistics, and dates in the final response. Return these as a separate list. The frontend then searches the rendered answer and wraps matching terms in yellow `<mark>` highlights.

 ### Feature 2: Compact Citation Links
-Replace the current inline citation display `[document_file_name.pdf, page N]` with a compact `[ref]` clickable link. The full source details remain visible in the collapsible source cards below each sub-question section.
+Replace the current inline citation display `[document_file_name.pdf, page N]` with compact sequentially-numbered links: `[1]`, `[2]`, `[3]`, etc. The full source details remain visible in the collapsible source cards below each sub-question section.

 **Non-goals**: The underlying citation URL logic (PDF viewer vs highlight page routing) remains unchanged. Source cards are not modified.

@@ -93,14 +93,15 @@ ReactMarkdown with custom mark component:

 ```
 Current: "...according to the report [NEC4 ACC.pdf, page 3]..."
-Desired: "...according to the report [ref]..."
+Desired: "...according to the report [1] and further noted in [2]..."

 Implementation: In replaceCitationPatterns() (citationParser.ts:105-131),
-replace the output format from [trimmed](url) to [ref](url).
+add a closure counter. Each matched citation gets a sequential number:
+[trimmed](url) → [1](url), [2](url), [3](url) ...
 ```

 The citation URL, source lookup, and "View PDF" source cards are unchanged.
-The compact `[ref]` label is clickable and opens the same linked page as before.
+Each `[N]` label is clickable and opens the same linked page as before.

 ---

@@ -112,8 +113,8 @@ The compact `[ref]` label is clickable and opens the same linked page as before.
 | 2 | **Prompt-only change for LLM** (no backend code change for parsing) | The highlight instruction is added to the `generate_per_subq` prompt template only. `rag.py` requires zero changes — the `==term==` markers are part of the answer string and transparent to existing code. |
 | 3 | **Custom `<mark>` component in ReactMarkdown** (not raw HTML injection) | `ReactMarkdown` strips raw HTML by default. Using `components={{ mark: HighlightMark }}` is the proper React way. Styling via Tailwind classes: `bg-yellow-200 rounded px-0.5`. |
 | 4 | **`==term==` syntax choice** | `==...==` is used in many wiki/markdown dialects for highlighting (Obsidian, Markdown-it-mark). It's visually distinct from `**bold**`, `*italic*`, and `~~strikethrough~~`. No risk of colliding with existing markdown in LLM output. |
-| 5 | **`[ref]` label for all citations** (not `[ref N]` sequential) | User explicitly requested `[ref]`. The source cards below each sub-question section already provide full source details. Multiple `[ref]` links on a page are distinguishable by their target URL (shown on hover in most browsers). If needed, sequential numbering (`[1]`, `[2]`) can be added later. |
-| 6 | **Pure frontend change for compact citations** (no backend changes) | Citations are parsed purely on the frontend in `citationParser.ts`. The LLM still produces `[filename, page N]` — the frontend converts to `[ref]` during `replaceCitationPatterns()`. Backend is untouched. |
+| 5 | **Sequential numbering `[1]` `[2]` `[3]`** (not a single `[ref]` label) | User requested sequential numbering for better visual clarity. Each citation in the answer gets a unique number (`[1]`, `[2]`, `[3]`...), making it easy to distinguish multiple references at a glance. The source cards below still provide full details. Implemented via a closure counter in `replaceCitationPatterns()`. |
+| 6 | **Pure frontend change for compact citations** (no backend changes) | Citations are parsed purely on the frontend in `citationParser.ts`. The LLM still produces `[filename, page N]` — the frontend converts to sequential `[1]`, `[2]`, `[3]` during `replaceCitationPatterns()`. Backend is untouched. |
 | 7 | **`processCitations` before highlight preprocessing** | Run citation processing first (convert `[filename]` to markdown links), then highlight preprocessing (convert `==term==` to `<mark>`). This order ensures `==` markers inside citation brackets don't interfere with citation regex and vice versa. |
 | 8 | **Seed template update only** (not database migration) | The `_SEED_GENERATE_PER_SUBQ` template in `sqlite_db.py` is updated. Existing databases will NOT be auto-migrated — users must reset prompts or manually update via the API. This matches the existing pattern (all prompt changes are seed-only). |
 | 9 | **Yellow highlight color: `bg-yellow-200`** | Tailwind's `yellow-200` (`#FEF08A`) provides a soft, readable yellow that works on light backgrounds and is distinct from the `text-blue-600` citation links. Add `rounded px-0.5` for visual polish. |
@@ -125,7 +126,7 @@ The compact `[ref]` label is clickable and opens the same linked page as before.
 | # | File | Purpose |
 |---|------|---------|
 | F1 | `frontend/src/test/utils/highlightParser.test.ts` | Unit tests for `highlightTerms()` function |
-| F2 | `frontend/src/test/utils/citationCompactRef.test.ts` | Unit tests for compact `[ref]` citation format |
+| F2 | `frontend/src/test/utils/citationCompactRef.test.ts` | Unit tests for compact sequential `[1] [2] [3]` citation format |
 | F3 | `frontend/src/test/components/ResponsePanel_highlights.test.tsx` | Integration: ResponsePanel renders highlights and compact refs |

 ---
@@ -135,10 +136,10 @@ The compact `[ref]` label is clickable and opens the same linked page as before.
 | # | File | Change |
 |---|------|--------|
 | M1 | `backend/app/core/sqlite_db.py` | Update `_SEED_GENERATE_PER_SUBQ` template (lines 42–53): add `==term==` instruction for figures/dates |
-| M2 | `frontend/src/utils/citationParser.ts` | (a) Change `replaceCitationPatterns()` output from `[trimmed](url)` to `[ref](url)`. (b) Add `highlightTerms(markdown: string): string` function: regex `==(.+?)==` → `<mark>$1</mark>` |
+| M2 | `frontend/src/utils/citationParser.ts` | (a) Add closure counter in `replaceCitationPatterns()` to output `[1](url)`, `[2](url)` instead of `[trimmed](url)`. (b) Add `highlightTerms(markdown: string): string` function: regex `==(.+?)==` → `<mark>$1</mark>` |
 | M3 | `frontend/src/components/ResponsePanel.tsx` | (a) Add `HighlightMark` component. (b) Add `highlightTerms()` call in `SubQuestionSection` and `FlatResponse` before ReactMarkdown. (c) Add `mark` to ReactMarkdown `components`. |
 | M4 | `frontend/src/styles.css` | Add `.prose mark { background-color: #FEF08A; border-radius: 0.125rem; padding: 0 0.125rem; }` |
-| M5 | `frontend/src/test/utils/citationParser.test.ts` | Update existing citation tests to expect `[ref](url)` output format |
+| M5 | `frontend/src/test/utils/citationParser.test.ts` | Update existing citation tests to expect sequential `[1](url)`, `[2](url)` output format |

 ---

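Reviewer note on M2(b): the plan gives only the regex for `highlightTerms()`, while the acceptance criteria also require that highlights do not appear inside code. The following is a minimal sketch of one way to satisfy both, assuming code spans are detected by splitting on backtick-delimited segments; the splitting approach and the sample strings are illustrative assumptions, not the committed implementation.

```typescript
// Hypothetical sketch of highlightTerms(); only the `==(.+?)==` regex and the
// `<mark>$1</mark>` replacement come from the plan.
function highlightTerms(markdown: string): string {
  // Split on fenced blocks and inline code. Because the pattern has one
  // capture group, the code segments are kept in the array at odd indexes.
  const parts = markdown.split(/(```[\s\S]*?```|`[^`\n]*`)/g)

  return parts
    .map((part, i) =>
      // Odd indexes are code: leave them untouched.
      i % 2 === 1 ? part : part.replace(/==(.+?)==/g, "<mark>$1</mark>")
    )
    .join("")
}

// Sample inputs (made up for illustration).
const prose = highlightTerms(
  "Budget of ==HK$1,000,000== approved on ==1 January 2024=="
)
const withCode = highlightTerms("Compare with `a == b == c` in code")

console.log(prose)
console.log(withCode)
```

Without the split, the inline code sample would be rewritten because `== b ==` matches the highlight regex, which is the false positive that test T7.2 is meant to catch.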
@@ -177,16 +178,43 @@ The compact `[ref]` label is clickable and opens the same linked page as before.
 - [ ] Export from `citationParser.ts` (add to existing exports)
 - [ ] **Test file**: `frontend/src/test/utils/highlightParser.test.ts`

-### Task 7.3: Change citation output to compact `[ref]`
+### Task 7.3: Change citation output to sequential `[1] [2] [3]`

-- [ ] In `frontend/src/utils/citationParser.ts`, `replaceCitationPatterns()` (line 125):
+- [ ] In `frontend/src/utils/citationParser.ts`, `replaceCitationPatterns()` (lines 105–131):
+  - Add a `let refCounter = 0` before the `.replace()` call
+  - Increment counter on each matched citation and output `[${refCounter}](${url})`:
 ```typescript
-// Before:
-return `[${trimmed}](${url})`
-// After:
-return `[ref](${url})`
+function replaceCitationPatterns(
+  text: string,
+  lookup: Map<string, SourceMetadata>,
+  highlightKeys?: Set<string>
+): string {
+  const citationPattern = /(?<!!)\[([^\]]+)\](?!\()/g
+  let refCounter = 0
+
+  return text.replace(citationPattern, (fullMatch, content: string) => {
+    const trimmed = content.trim()
+    const source = findSource(trimmed, lookup)
+
+    if (source) {
+      let isReady = false
+      if (highlightKeys && source.document_id && source.sub_question_text) {
+        isReady = highlightKeys.has(
+          `${source.document_id}_${source.chunk_index}_${encodeURIComponent(source.sub_question_text)}`
+        )
+      }
+      const url = buildCitationUrl(source, isReady)
+      if (url) {
+        refCounter++
+        return `[${refCounter}](${url})`
+      }
+    }
+
+    return fullMatch
+  })
+}
 ```
-- [ ] Update existing tests in `frontend/src/test/utils/citationParser.test.ts` to expect `[ref]` output
+- [ ] Update existing tests in `frontend/src/test/utils/citationParser.test.ts` to expect sequential `[1](url)`, `[2](url)` output
 - [ ] **Test file**: `frontend/src/test/utils/citationCompactRef.test.ts` (optional — existing tests cover this after update)

 ### Task 7.4: Wire highlighting into ResponsePanel
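Reviewer note on Task 7.3: the counter behaviour can be checked in isolation. The sketch below is a simplified stand-in for `replaceCitationPatterns()`, assuming a hypothetical `urlFor` callback in place of `findSource()` plus `buildCitationUrl()`; the function name `numberCitations`, the URLs, and the sample text are made up for illustration. It shows the two properties the acceptance criteria rely on: numbering restarts on every call (so it resets per sub-question section), and bracketed text with no known source is left untouched and does not consume a number.

```typescript
// Simplified, hypothetical stand-in for replaceCitationPatterns().
// `urlFor` replaces the findSource() + buildCitationUrl() steps of the plan.
function numberCitations(
  text: string,
  urlFor: (label: string) => string | undefined
): string {
  // Same pattern as the plan: a [bracketed] label that is not an image
  // (no leading "!") and not already a markdown link (no trailing "(").
  const citationPattern = /(?<!!)\[([^\]]+)\](?!\()/g
  let refCounter = 0 // created per call, so each section starts again at 1

  return text.replace(citationPattern, (fullMatch: string, content: string) => {
    const url = urlFor(content.trim())
    if (!url) return fullMatch // unknown bracket text stays as written
    refCounter += 1
    return `[${refCounter}](${url})`
  })
}

// Made-up lookup table; real URLs come from buildCitationUrl().
const urls = new Map<string, string>([
  ["NEC4 ACC.pdf, page 3", "/pdf/nec4-acc#page=3"],
  ["NEC4 ACC.pdf, page 7", "/pdf/nec4-acc#page=7"],
])
const lookupUrl = (label: string): string | undefined => urls.get(label)

const sectionA = numberCitations(
  "See [NEC4 ACC.pdf, page 3] and [note] and [NEC4 ACC.pdf, page 7].",
  lookupUrl
)
const sectionB = numberCitations("Again [NEC4 ACC.pdf, page 7].", lookupUrl)

console.log(sectionA)
console.log(sectionB)
```

Note that the same source cited twice receives two different numbers under this scheme, since the counter tracks occurrences rather than distinct sources.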
@@ -229,9 +257,9 @@ The compact `[ref]` label is clickable and opens the same linked page as before.

 - [ ] Run frontend tests: `cd frontend && pnpm test`
 - [ ] Run backend tests: `cd backend && pytest app/test/ -v` (no regressions)
-- [ ] Verify existing citation tests pass with `[ref]` output format
+- [ ] Verify existing citation tests pass with sequential `[1](url)` `[2](url)` output format
 - [ ] Verify new highlight tests pass
-- [ ] Visual manual test: ask a question with figures/dates, verify yellow highlights appear and `[ref]` links work
+- [ ] Visual manual test: ask a question with figures/dates, verify yellow highlights appear and sequential `[1] [2]` links work

 ---

@@ -240,8 +268,8 @@ The compact `[ref]` label is clickable and opens the same linked page as before.
 | # | Test File | Type | Coverage |
 |---|-----------|------|----------|
 | T7.2 | `frontend/src/test/utils/highlightParser.test.ts` | Unit | `highlightTerms()`: basic `==term==` → `<mark>`, multiple highlights, no false positives on `==` in code, edge cases (empty, no markers, adjacent markers) |
-| T7.3 | `frontend/src/test/utils/citationParser.test.ts` (update) | Unit | Existing 16 tests updated to expect `[ref](url)` output. Add test: multiple citations all render as `[ref]` |
-| T7.4 | `frontend/src/test/components/ResponsePanel_highlights.test.tsx` | Integration | Full `ResponsePanel` with mock answer containing `==figure==` markers and `[citation]` brackets: verifies yellow `<mark>` elements render, `[ref]` links are clickable, source cards unchanged |
+| T7.3 | `frontend/src/test/utils/citationParser.test.ts` (update) | Unit | Existing 16 tests updated to expect sequential `[1](url)`, `[2](url)` output. Add test: multiple citations render as `[1]`, `[2]`, `[3]` |
+| T7.4 | `frontend/src/test/components/ResponsePanel_highlights.test.tsx` | Integration | Full `ResponsePanel` with mock answer containing `==figure==` markers and `[citation]` brackets: verifies yellow `<mark>` elements render, `[1] [2]` links are clickable, source cards unchanged |

 ---

@@ -252,7 +280,7 @@ Task 7.1 (update seed template)
      │
      │  (independent — prompt template change tested implicitly)
      │
-Task 7.2 (highlightTerms function)    Task 7.3 (compact [ref] output)
+Task 7.2 (highlightTerms function)    Task 7.3 (sequential citation refs)
      │                                      │
      └──────────────┬───────────────────────┘
                     │
@@ -274,15 +302,16 @@ Tasks 7.2 and 7.3 are independent and can run in parallel.
 - [ ] `highlightTerms()` correctly converts all `==term==` patterns to `<mark class="bg-yellow-200 rounded px-0.5">term</mark>`
 - [ ] Yellow highlights render in the browser for monetary amounts (e.g., `HK$1,000,000`), percentages (e.g., `35%`), and dates (e.g., `1 January 2024`)
 - [ ] Highlights do NOT appear inside code blocks or inline code
-- [ ] Highlights work correctly alongside citation links (`[ref]`)
+- [ ] Highlights work correctly alongside citation links (`[1] [2] [3]`)
 - [ ] Highlights work in both sub-question mode and flat response mode
 - [ ] No regressions in existing tests

 ### Feature 2: Compact Citations
-- [ ] All inline citations display as `[ref]` instead of `[filename.pdf, page N]`
-- [ ] `[ref]` links are clickable and navigate to the correct PDF viewer or highlight page
+- [ ] All inline citations display as sequential `[1]`, `[2]`, `[3]` instead of `[filename.pdf, page N]`
+- [ ] Sequential numbers increment correctly per answer section (reset per sub-question section)
+- [ ] `[1]` `[2]` links are clickable and navigate to the correct PDF viewer or highlight page
 - [ ] Source cards below each section still show full filename, page, date, and summary
-- [ ] Existing citation tests pass with updated `[ref]` output format
+- [ ] Existing citation tests pass with updated sequential `[1](url)` output format
 - [ ] No regressions in existing tests

 ---
@@ -297,7 +326,7 @@ Tasks 7.2 and 7.3 are independent and can run in parallel.
 5. `highlightTerms()` function can remain in `citationParser.ts` (no harm)

 ### Feature 2 (Compact Citations):
-1. Revert `citationParser.ts` line 125 from `[ref](${url})` back to `[${trimmed}](${url})`
+1. Revert `citationParser.ts` line 125 from `[${refCounter}](${url})` back to `[${trimmed}](${url})` and remove the counter
 2. Update test expectations back to full citation text

 Both features are independent — can roll back one without affecting the other.
@@ -310,7 +339,7 @@ Both features are independent — can roll back one without affecting the other.
 - ❌ Do NOT change the SSE event schema (no new fields in `completed` event)
 - ❌ Do NOT change the citation URL routing logic (`buildCitationUrl()` stays as-is)
 - ❌ Do NOT modify source cards (`SubQuestionSourceCard`) — they still show full details
-- ❌ Do NOT add tooltips or popovers on `[ref]` links (future enhancement)
+- ❌ Do NOT add tooltips or popovers on `[N]` links (future enhancement)
 - ❌ Do NOT add per-term highlight metadata (type: figure vs date, color coding)
 - ❌ Do NOT add configuration UI for highlight colors
 - ❌ Do NOT modify the non-sub-question fallback `generate_response()` (legacy flat mode — highlight markers work from prompt template alone)