Package 2 Enhancement Plan
Source: User request (2026-04-24)
Scope: LTT page UX improvements + PDF preview + inline citations
Status: 🔄 In Progress (2.1 ✅, 2.2 ✅, 2.3 ✅, 2.4 ✅)
Objective
Enhance the LTT (Legislative Transcript Tracker) page and RAG Database page with 6 features, prioritized by difficulty (easiest first):
- Remove Upload from LTT — Remove IngestPanel from LTT page (upload lives on RAG Database page)
- Display Question Near Submit — Show the submitted question next to the submit button after clicking
- Extracted Questions — Rename "Keywords" to "Extracted Questions"; decompose into simplified sub-questions instead of keywords
- Adjustable Split Panel — Make the upper/lower LTT layout resizable by dragging a divider
- PDF Viewer — Preview PDFs in a new browser tab on both LTT and RAG Database pages instead of downloading
- Inline Citations: Replace `[1]` citations with `[filename, page N]` format with clickable links to source PDF
Current State
What Exists
- LTT page at `/` with CSS Grid layout: `grid-rows-[30%_1fr] grid-cols-2`
  - Top-Left: Video placeholder (Phase 2)
  - Top-Right: QueryInput → KeywordsDisplay → IngestPanel (vertical stack)
  - Bottom (full width): ResponsePanel (answer + sources)
- RAG Database page at `/rag-database` with document management + upload
- QueryDecomposer extracts keywords (short phrases) from question
- RAG prompt generates `[1]`, `[2]` style citations in context chunks
- LLM instructed: "Cite the source name in [ ] for each point"
- Frontend renders citations as plain text via ReactMarkdown, with no linking
- Source cards in ResponsePanel show filename, page number, "View PDF" link (opens in a new tab, which triggers a download)
- ChunkList on RAG Database page has "View PDF" links (also open in a new tab)
What's Missing (Gaps This Plan Fills)
- Upload button still on LTT page (redundant with RAG Database page)
- No display of the submitted question after clicking submit
- Keywords are not useful — user wants decomposed sub-questions
- LTT layout is fixed (30%/70% split) — not adjustable
- PDF links download the file instead of previewing in browser
- Citations like `[1]` are meaningless to users: no traceability to source
Sub-Phase Breakdown
| Sub-Phase | Feature | Difficulty | Backend | Frontend | Status |
|---|---|---|---|---|---|
| 2.1 | Remove Upload from LTT | ⭐ Easy | None | Remove IngestPanel from LTTPage | ✅ Done |
| 2.2 | Display Question Near Submit | ⭐ Easy | None | Show submitted question text | ✅ Done |
| 2.3 | Extracted Questions | ⭐⭐ Medium | Prompt change | Rename + display questions | ✅ Done |
| 2.4 | Adjustable Split Panel | ⭐⭐ Medium | None | react-resizable-panels | ✅ Done |
| 2.5 | PDF Viewer (New Tab) | ⭐⭐ Medium | None | PdfViewerPage route + react-pdf | 📋 Pending |
| 2.6 | Inline Citations | ⭐⭐⭐ Hard | Prompt + response | Citation parser + rendering | 📋 Pending |
Dependency Graph
2.1 (Remove Upload) ─────┐
2.2 (Question Display) ──┤
                         ├─► 2.4 (Resizable Layout) ─┐
2.3 (Questions) ─────────┤                           ├─► 2.6 (Inline Citations)
                         │                           │
                         └─► 2.5 (PDF Viewer) ───────┘
- 2.1, 2.2, 2.3 are independent — can run in parallel
- 2.4 should wait for 2.1 (layout changes after removing IngestPanel)
- 2.5 is independent but benefits from 2.4's layout work
- 2.6 is the hardest — benefits from 2.5 (PDF viewer) being available for linked citations
Sub-Phase 2.1: Remove Upload from LTT ⭐ Easy
Objective
Remove the IngestPanel from LTT page since document upload now exists on the RAG Database page.
Changes Required
Frontend only — No backend changes.
| File | Change |
|---|---|
| `frontend/src/pages/LTTPage.tsx` | Remove `IngestPanel` import and component; remove `useIngestDocument` hook |
Current (LTTPage.tsx):
import { IngestPanel } from '../components/IngestPanel'
// ...
const ingestMutation = useIngestDocument()
// ...
<IngestPanel
  onUpload={handleFileUpload}
  isLoading={ingestMutation.isPending}
  success={...}
  error={...}
/>
After: Remove all of the above. The top-right section becomes just QueryInput + KeywordsDisplay.
Acceptance Criteria
- LTT page no longer shows upload button
- RAG Database page upload still works
- All existing tests pass (update LTTPage tests if needed)
Sub-Phase 2.2: Display Question Near Submit ⭐ Easy
Objective
After the user clicks submit, display the original question text next to/near the submit button so they remember what they asked.
Changes Required
Frontend only — No backend changes.
| File | Change |
|---|---|
| `frontend/src/components/QueryInput.tsx` | Display the last submitted question below the input after submission |
Implementation approach:
- After successful submit, show the submitted question as static text below the input area
- Clear when a new question is being typed
- Style: subtle gray text, italic, with a small label like "Your question:"
┌──────────────────────────────────────────────┐
│ [textarea: type question here] [Submit] │
│ │
│ Your question: "What is the NEC4 clause │
│ about time extensions?" │
└──────────────────────────────────────────────┘
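The show-then-clear behavior described above can be sketched as a small reducer. This is only an illustration: the state shape, the action names, and the choice to empty the textarea on submit are assumptions, not taken from `QueryInput.tsx`.

```typescript
// Hypothetical state for the query input; names are illustrative only.
interface QueryInputState {
  draft: string;            // current textarea contents
  submitted: string | null; // last submitted question, or null when hidden
}

type QueryInputAction =
  | { type: "type"; value: string }
  | { type: "submit" };

function queryInputReducer(
  state: QueryInputState,
  action: QueryInputAction
): QueryInputState {
  switch (action.type) {
    case "type":
      // Typing a new question hides the previously submitted one.
      return { draft: action.value, submitted: null };
    case "submit": {
      const question = state.draft.trim();
      // Ignore blank submissions so the label never shows an empty question.
      if (question.length === 0) return state;
      // Assumption: the textarea is emptied once the question is submitted.
      return { draft: "", submitted: question };
    }
  }
}
```

In a component this could back a `useReducer` call, with the "Your question:" line rendered only while `submitted` is not null.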
Acceptance Criteria
- After submit, the submitted question appears below the input
- Text clears when user starts typing a new question
- Works correctly during loading state
- No layout shift or jank
Sub-Phase 2.3: Extracted Questions ⭐⭐ Medium
Objective
Rename "Keywords" to "Extracted Questions" and change the backend to decompose the user's question into simplified sub-questions instead of keyword phrases.
Current Behavior
- Backend `QueryDecomposer.decompose()` prompt: `"Given question: '{question}', extract key search keywords as JSON array"`
- Returns: `["NEC4", "time extension", "clause"]`
- Frontend: `KeywordsDisplay` component shows blue pills labeled "Extracted Keywords:"
New Behavior
- Backend prompt: Decompose into 2-5 simplified sub-questions
- Returns: `["What are the time extension provisions?", "What notice is required?", "How is extended time calculated?"]`
- Frontend: Rename to "Extracted Questions:" and display as numbered list
Backend Changes
| File | Change |
|---|---|
| `backend/app/services/query_decomposer.py` | Change prompt to generate sub-questions instead of keywords |
| `backend/app/models/query.py` | Rename `keywords` field to `extracted_questions` (or keep `keywords` but add alias) |
| `backend/app/routers/query.py` | Update variable naming if model changes |
New prompt (replace line 54 in query_decomposer.py):
prompt = (
    f"Given this question: '{question}'\n\n"
    f"Break it down into 2-5 simplified sub-questions that would help "
    f"search for relevant information. Each sub-question should be short "
    f"and focused on one aspect. Return as a JSON array of strings."
)
Decision confirmed: Rename API field from keywords to extracted_questions (Decision 1B).
Backend Changes
| File | Change |
|---|---|
| `backend/app/services/query_decomposer.py` | Change prompt to generate sub-questions instead of keywords |
| `backend/app/models/query.py` | Rename `keywords: List[str]` to `extracted_questions: List[str]` |
| `backend/app/routers/query.py` | Update `QueryResponse` usage: `keywords` → `extracted_questions` |
Frontend Changes
| File | Change |
|---|---|
| `frontend/src/components/KeywordsDisplay.tsx` | Rename to `ExtractedQuestionsDisplay.tsx`, change pill style to numbered list style |
| `frontend/src/lib/api.ts` | Update `QueryResponse` type: `keywords` → `extracted_questions` |
| `frontend/src/lib/queries.tsx` | Update `QueryResponse` type reference |
| `frontend/src/types/index.ts` | Rename `keywords` to `extracted_questions` in `QueryResponse` |
| `frontend/src/pages/LTTPage.tsx` | Update prop: `keywords` → `extracted_questions` |
Rename:
- "Extracted Keywords:" → "Extracted Questions:"
- `data-testid="keywords-*"` → `data-testid="extracted-questions-*"`
- Pill badges → numbered question list with subtle styling
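An empty or malformed list from the LLM needs a defined fallback before the numbered list is rendered. The sketch below shows one possible guard; the helper name, the cap of five items, and the choice to fall back to the original question are all assumptions, and the same logic could equally live in the Python decomposer.

```typescript
// Hypothetical helper, not part of the existing codebase.
// Cleans the extracted_questions payload and falls back to the original
// question when nothing usable is returned (an assumed fallback policy).
function normalizeExtractedQuestions(
  raw: unknown,
  originalQuestion: string
): string[] {
  const items: unknown[] = Array.isArray(raw) ? raw : [];
  const cleaned: string[] = [];
  for (const item of items) {
    if (typeof item !== "string") continue;
    const text = item.trim();
    // Skip blank entries and exact duplicates.
    if (text.length > 0 && cleaned.indexOf(text) === -1) cleaned.push(text);
  }
  if (cleaned.length === 0) {
    const fallback = originalQuestion.trim();
    return fallback.length > 0 ? [fallback] : [];
  }
  // The plan targets 2-5 sub-questions, so keep at most five.
  return cleaned.slice(0, 5);
}
```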
Acceptance Criteria
- Backend returns 2-5 sub-questions instead of keywords
- Frontend displays "Extracted Questions:" label
- Questions display as a numbered list (1. 2. 3.)
- Graceful fallback if LLM returns empty list
- Existing query pipeline still works (retrieve uses these as search terms)
Sub-Phase 2.4: Adjustable Split Panel ⭐⭐ Medium
Objective
Replace the fixed CSS Grid layout on LTT page with a resizable split panel, allowing users to drag the divider between the upper section (video + query) and lower section (response).
New Dependency
| Package | Purpose | Weekly Downloads | Notes |
|---|---|---|---|
| `react-resizable-panels` | Resizable split panels | ~34.7M | By Brian Vaughn (React core team), zero deps, MIT |
npm install react-resizable-panels
Changes Required
Frontend only — No backend changes.
| File | Change |
|---|---|
| `frontend/src/pages/LTTPage.tsx` | Replace CSS Grid with `PanelGroup` + `Panel` + `PanelResizeHandle` |
| `frontend/package.json` | Add `react-resizable-panels` |
Current layout (LTTPage.tsx):
<div className="h-full grid grid-rows-[30%_1fr] grid-cols-2 bg-gray-50">
  <div className="border-r border-b ...">VideoPlaceholder</div>
  <div className="border-b ...">QueryInput + KeywordsDisplay + IngestPanel</div>
  <div className="col-span-2 ...">ResponsePanel</div>
</div>
New layout:
import { Panel, PanelGroup, PanelResizeHandle } from 'react-resizable-panels'

<div className="h-full flex flex-col bg-gray-50">
  <PanelGroup direction="vertical">
    {/* Upper section: video + query */}
    <Panel defaultSize={30} minSize={15} maxSize={60}>
      <div className="h-full grid grid-cols-2">
        <div className="border-r ...">VideoPlaceholder</div>
        <div className="...">QueryInput + KeywordsDisplay</div>
      </div>
    </Panel>
    {/* Draggable divider */}
    <PanelResizeHandle className="h-2 bg-gray-200 hover:bg-blue-300 cursor-row-resize transition-colors" />
    {/* Lower section: response */}
    <Panel minSize={20}>
      <div className="h-full overflow-y-auto">
        <ResponsePanel ... />
      </div>
    </Panel>
  </PanelGroup>
</div>
Design decisions:
- Default split: 30% upper / 70% lower (same as current fixed layout)
- Min/Max: Upper panel 15-60%, lower panel 20%+ (prevents collapsing)
- Handle style: 8px horizontal bar, gray by default, blue on hover, cursor row-resize
- Persistence: None. The page always starts at the default 30/70 split (Decision 5B); saving the ratio to localStorage was only a nice-to-have and is out of scope
Acceptance Criteria
- LTT page has a draggable divider between upper and lower sections
- Default split matches current 30/70 layout
- Divider is visually obvious and provides cursor feedback on hover
- Both sections enforce minimum sizes (no collapsing)
- Content in each section scrolls independently
- Layout works on window resize
Sub-Phase 2.5: PDF Viewer in New Tab ⭐⭐ Medium
Objective
Preview PDF files in-browser via a dedicated viewer page that opens in a new browser tab, on both the LTT page (source cards) and RAG Database page (chunk list).
Design Decision (Confirmed)
PDF viewer is a dedicated route (/pdf-viewer) that opens in a new browser tab. This avoids modal state management and provides a clean full-page PDF viewing experience. All "View PDF" links and inline citations open this page via target="_blank".
New Dependencies
| Package | Purpose | Weekly Downloads | Notes |
|---|---|---|---|
| `react-pdf` | PDF rendering via PDF.js | ~1M | MIT, lightweight (309KB), Vite compatible |
| `pdfjs-dist` | PDF.js worker | (peer dep) | Required by react-pdf |
npm install react-pdf pdfjs-dist
Changes Required
Frontend only — No backend changes (the existing GET /chunks/{file_path}/pdf endpoint serves PDFs).
| File | Change |
|---|---|
| `frontend/src/pages/PdfViewerPage.tsx` | NEW: Dedicated full-page PDF viewer route |
| `frontend/src/App.tsx` | Add `/pdf-viewer` route |
| `frontend/src/components/ResponsePanel.tsx` | "View PDF" links point to `/pdf-viewer?url=...&page=N` with `target="_blank"` |
| `frontend/src/components/ChunkList.tsx` | Same as ResponsePanel |
| `frontend/src/lib/api.ts` | Add `getPdfViewerUrl()` helper |
| `frontend/vite.config.ts` | Configure PDF.js worker |
| `frontend/package.json` | Add `react-pdf`, `pdfjs-dist` |
PdfViewerPage design:
┌──────────────────────────────────────────────────────┐
│ ◀ Back to LTT NEC4 ACC — Page 3 of 97 ▶ │
├──────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────┐ │
│ │ │ │
│ │ PDF rendered │ │
│ │ page content │ │
│ │ │ │
│ │ │ │
│ └─────────────────────┘ │
│ │
│ [Zoom -] 100% [Zoom +] │
└──────────────────────────────────────────────────────┘
Query params: ?url=<encoded PDF URL>&page=<page number>&title=<filename>
Implementation:
- Standalone page at `/pdf-viewer` route (not embedded in main layout)
- Uses `react-pdf` `<Document>` + `<Page>` components
- PDF URL from query param, validated against backend endpoint
- Page navigation controls (prev/next) with page number display
- Zoom controls (+/-)
- Title shows filename from query param
- "Back to LTT" link to return to the app
- No NavBar on this page (clean viewer experience)
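Reading and validating the viewer's query params could look roughly like the sketch below. The function name and the `allowedPrefix` argument (for example the API base followed by `/chunks/`) are hypothetical; the plan only states that the URL is validated against the backend endpoint, so the exact rule here is an assumption.

```typescript
interface PdfViewerParams {
  pdfUrl: string;
  page: number;
  title: string;
}

// Hypothetical parser for "?url=...&page=N&title=...".
// Returns null when the PDF URL is missing, malformed, or not under allowedPrefix.
function parsePdfViewerParams(
  search: string,
  allowedPrefix: string
): PdfViewerParams | null {
  const values: { [key: string]: string } = {};
  const query = search.charAt(0) === "?" ? search.slice(1) : search;
  for (const pair of query.split("&")) {
    if (pair.length === 0) continue;
    const eq = pair.indexOf("=");
    const key = eq === -1 ? pair : pair.slice(0, eq);
    const rawValue = eq === -1 ? "" : pair.slice(eq + 1);
    try {
      // "+" is treated as a space, as in form-style query encoding.
      values[decodeURIComponent(key)] = decodeURIComponent(rawValue.replace(/\+/g, " "));
    } catch {
      return null; // malformed percent-encoding
    }
  }
  const pdfUrl = values["url"];
  // Assumed rule: the URL must start with the allowed prefix and end in "/pdf".
  if (pdfUrl === undefined || pdfUrl.indexOf(allowedPrefix) !== 0 || !/\/pdf$/.test(pdfUrl)) {
    return null;
  }
  const parsedPage = parseInt(values["page"] ?? "1", 10);
  // Missing or invalid page numbers fall back to the first page.
  const page = isNaN(parsedPage) || parsedPage < 1 ? 1 : parsedPage;
  return { pdfUrl, page, title: values["title"] ?? "Document" };
}
```

A prefix check like this is deliberately simple; it does not replace server-side checks on the requested file path.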
Integration points:
- `ResponsePanel.tsx`: Change `href` from `getChunkPdfUrl()` to `getPdfViewerUrl()` with `target="_blank"`
- `ChunkList.tsx`: Same change
- Inline citations (Sub-Phase 2.6): Same URL pattern
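A minimal sketch of the `getPdfViewerUrl()` helper follows. The plan names the helper but not its signature, so the parameter list is an assumption, as is omitting `page` when the source has no page number.

```typescript
// Sketch of getPdfViewerUrl(); the parameters are assumed, not confirmed.
// chunkPdfUrl is expected to be whatever getChunkPdfUrl() already returns.
function getPdfViewerUrl(
  chunkPdfUrl: string,
  pageNumber: number | null,
  title: string
): string {
  const parts = ["url=" + encodeURIComponent(chunkPdfUrl)];
  // Only include a page when the source has one (DOCX sources may not).
  if (pageNumber !== null && pageNumber >= 1) {
    parts.push("page=" + Math.floor(pageNumber));
  }
  parts.push("title=" + encodeURIComponent(title));
  return "/pdf-viewer?" + parts.join("&");
}
```

For example, `getPdfViewerUrl("/chunks/NEC4 ACC/pdf", 3, "NEC4 ACC.pdf")` yields `/pdf-viewer?url=%2Fchunks%2FNEC4%20ACC%2Fpdf&page=3&title=NEC4%20ACC.pdf`, which keeps spaces and slashes in the nested URL from breaking the outer query string.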
Acceptance Criteria
- Clicking "View PDF" opens a new browser tab with the PDF viewer page
- PDF renders correctly with original formatting preserved
- Page navigation (prev/next) works for multi-page PDFs
- Zoom controls work
- Page title shows filename and page number
- "Back to LTT" link returns to the app
- Works from both LTT and RAG Database pages
- URL query params properly encode/decode special characters
- PDF.js worker loads correctly in Vite build
Sub-Phase 2.6: Inline Citations with Clickable Links ⭐⭐⭐ Hard
Objective
Replace the current [1], [2] citation format in RAG responses with [filename, page N] format that includes a clickable link to the source chunk PDF.
Current Citation Flow
- Backend (`rag.py`): Context chunks labeled `[1]`, `[2]`, `[3]`...
- LLM prompt: "Cite the source name in [ ] for each point"
- LLM output: Answer text contains `[1]`, `[NEC4 ACC]`, or similar bracketed citations
- Frontend: ReactMarkdown renders citations as plain text, with no linking
New Citation Flow
- Backend: Context chunks labeled `[filename, page N]` with strict format
- LLM prompt: "Cite sources as `[filename, page N]`" (strict, Decision 3A)
- LLM output: Answer text contains `[NEC4 ACC.pdf, page 3]` format citations
- Frontend: Custom ReactMarkdown renderer parses strict citation patterns, replaces with clickable links
- Click: Opens PDF viewer page in new tab (Decision 4B) at the cited page
Backend Changes
| File | Change |
|---|---|
| `backend/app/services/rag.py` | Change context chunk labeling and citation instruction in prompt |
| `backend/app/models/common.py` | Add `chunk_index` mapping info if needed |
| `backend/app/models/query.py` | (Possibly) Add citation map to response |
rag.py — Context building change (lines 94-102):
Current:
context_parts.append(
    f"[{i + 1}] Source: {source}\n"
    f"Summary: {summary}\n"
    f"Content: {chunk}\n"
)
New:
source_name = meta.get("filename", "unknown")
page_num = meta.get("page_number")
citation_label = f"{source_name}, page {page_num}" if page_num else source_name
context_parts.append(
    f"[{citation_label}] Source: {source_name}\n"
    f"Summary: {summary}\n"
    f"Content: {chunk}\n"
)
rag.py — Prompt change (lines 106-113):
Current:
f"Cite the source name in [ ] for each point.\n\n"
New:
f"Cite your sources inline using the exact bracket labels provided, "
f"e.g. [filename, page N]. Place the citation at the end of each relevant point.\n\n"
Important design decision: We need to pass citation metadata (filename → chunk_file_path mapping) to the frontend so it can create clickable links. Two approaches:
- Option A: Include `chunk_file_path` in the source metadata (already done!). The frontend parses `[filename, page N]` from answer text and looks up the matching source in the `sources[]` array to get `chunk_file_path`
- Option B: Add a `citation_map: Dict[str, CitationInfo]` to `QueryResponse`. More structured but adds complexity
Recommendation: Option A — simpler, uses existing sources[] data. Frontend regex matches [filename, page N] patterns in answer text and cross-references with sources array by filename + page_number. Links open in new tab to PDF viewer page (Decision 4B).
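The Option A lookup can be sketched as a plain search over `sources[]`. The `SourceRef` shape below is assumed from the field names used in this plan (`filename`, `page_number`, `chunk_file_path`) and is not verified against the API; matching filenames case-insensitively is also an assumption.

```typescript
// Assumed shape of one entry in sources[].
interface SourceRef {
  filename: string;
  page_number: number | null;
  chunk_file_path: string;
}

// Hypothetical lookup: returns the source whose filename and page match
// the citation, or null so the caller can render the citation as plain text.
function findSourceForCitation(
  sources: SourceRef[],
  filename: string,
  pageNumber: number | null
): SourceRef | null {
  const wanted = filename.trim().toLowerCase();
  for (const source of sources) {
    const sameFile = source.filename.trim().toLowerCase() === wanted;
    if (sameFile && source.page_number === pageNumber) return source;
  }
  return null;
}
```

Returning null instead of a best guess keeps invented citations from turning into links to the wrong chunk.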
Frontend Changes
| File | Change |
|---|---|
| `frontend/src/components/ResponsePanel.tsx` | Custom markdown renderer that replaces citation patterns with clickable links |
| `frontend/src/utils/citationParser.ts` | NEW: Regex parser for citation patterns |
Citation parser (citationParser.ts):
// Matches patterns like [NEC4 ACC.pdf, page 3] or [meeting_notes.docx].
// Group 1 captures the filename; group 2 captures the page number when present.
const CITATION_PATTERN = /\[([^\[\]]+?)(?:,\s*page\s+(\d+))?\]/g
interface ParsedCitation {
  label: string                // "NEC4 ACC.pdf, page 3"
  filename: string             // "NEC4 ACC.pdf"
  pageNumber: number | null    // 3, or null when the citation has no page
  chunkFilePath: string | null // from sources lookup; null when no source matches
}
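Filling `filename` and `pageNumber` from a citation label could be done as in the sketch below. The function name is hypothetical, and requiring a file extension when no page is given is an assumption meant to keep numeric markers such as `[1]` from being treated as citations.

```typescript
// Hypothetical helper: splits the text inside the brackets into its parts.
// Returns null when the label does not look like a strict citation.
function parseCitationLabel(
  label: string
): { filename: string; pageNumber: number | null } | null {
  const trimmed = label.trim();
  const withPage = /^(.+?),\s*page\s+(\d+)$/i.exec(trimmed);
  if (withPage !== null) {
    return {
      filename: (withPage[1] ?? "").trim(),
      pageNumber: parseInt(withPage[2] ?? "", 10),
    };
  }
  // No page part: accept only labels ending in a 2-5 character extension.
  if (/\.[A-Za-z0-9]{2,5}$/.test(trimmed)) {
    return { filename: trimmed, pageNumber: null };
  }
  return null;
}
```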
Rendering approach:
- Use ReactMarkdown's `components` prop with a custom text renderer
- Intercept text nodes, split on citation pattern (strict format, Decision 3A)
- Replace `[filename, page N]` with `<a>` elements opening PDF viewer in new tab
- URL format: `/pdf-viewer?url={chunkPdfUrl}&page={N}&title={filename}`
- Fallback: If no matching source found, render as plain text (graceful degradation)
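The split step can be sketched independently of how it is wired into ReactMarkdown (a component override or a plugin, depending on the library version in use). The function and type names are hypothetical, and the pattern assumes cited filenames end in a 2-5 character extension.

```typescript
type AnswerSegment =
  | { kind: "text"; value: string }
  | { kind: "citation"; label: string };

// Splits answer text into plain-text runs and strict citations.
// Bracketed text that is not in strict form (e.g. "[1]") stays in the text runs.
function splitOnCitations(text: string): AnswerSegment[] {
  // Created per call so the global regex never carries lastIndex between calls.
  const pattern = /\[([^\[\]]+?\.[A-Za-z0-9]{2,5}(?:,\s*page\s+\d+)?)\]/g;
  const segments: AnswerSegment[] = [];
  let cursor = 0;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    if (match.index > cursor) {
      segments.push({ kind: "text", value: text.slice(cursor, match.index) });
    }
    segments.push({ kind: "citation", label: match[1] ?? "" });
    cursor = match.index + match[0].length;
  }
  if (cursor < text.length) {
    segments.push({ kind: "text", value: text.slice(cursor) });
  }
  return segments;
}
```

Each `citation` segment would then be matched against `sources[]`; a segment with no matching source is rendered with its brackets as plain text, which gives the fallback behavior listed above.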
Example rendered answer:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
• The estimated value threshold is HK$1,000,000
[NEC4 ACC.pdf, page 3] ←── clickable link (opens new tab)
• Prior instructions from the Client may override
this threshold [NEC4 ACC.pdf, page 5] ←── clickable
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Acceptance Criteria
- Backend context labels use `[filename, page N]` format instead of `[1]`
- LLM prompt instructs to use the new citation format
- Frontend parses `[filename, page N]` patterns from answer text
- Parsed citations are rendered as clickable links (blue, underlined)
- Clicking a citation opens the PDF viewer page in a new browser tab at the cited page
- If citation can't be matched to a source, renders as plain text (no broken links)
- The Sources section below the answer still shows full source cards
- Citation pattern handles: spaces in filenames, missing page numbers (DOCX), long filenames
Risks & Mitigations
| Risk | Impact | Mitigation |
|---|---|---|
| LLM doesn't follow citation format exactly | Citations not parsed | Use flexible regex; fallback to plain text for unrecognized patterns |
| LLM invents citations not in sources | Clickable link to non-existent source | Only linkify citations that match an actual source; show others as text |
| Filenames with special characters break regex | Citation not detected | Use lazy regex matching; test with real filenames (spaces, dots, hyphens) |
| Long filenames make answer text hard to read | Poor UX | Optionally truncate filename in display (e.g., "NEC4 AC...pdf, page 3") |
| Page number is null (DOCX files) | Citation shows just filename | Show [meeting_notes.docx] without page number; still clickable for text preview |
New Dependencies
Frontend
| Package | Purpose | Version | Size |
|---|---|---|---|
| `react-resizable-panels` | Resizable split panels (2.4) | ^4.10.0 | ~517KB |
| `react-pdf` | PDF rendering via PDF.js (2.5) | ^10.4.1 | ~309KB |
| `pdfjs-dist` | PDF.js worker (peer dep of react-pdf) | ^5.3.31 | - |
Backend
| Package | Purpose | Already installed? |
|---|---|---|
| (none) | All backend changes use existing LLM client | ✅ |
Implementation Sequence
Recommended Order (easiest first)
Phase 2.1 (Remove Upload)    ─┐
Phase 2.2 (Question Display) ─┤── Can run in PARALLEL
Phase 2.3 (Questions)        ─┘
        │
        ▼
Phase 2.4 (Resizable Layout) ─── After 2.1 (layout changes)
        │
        ▼
Phase 2.5 (PDF Viewer) ───────── After 2.4 (standalone route; sequenced after the layout work)
        │
        ▼
Phase 2.6 (Inline Citations) ─── After 2.5 (links to PDF viewer)
Parallelization Opportunities
- 2.1, 2.2, 2.3 can run in parallel (independent frontend/backend changes)
- 2.4 depends on 2.1 (layout changes affect same file)
- 2.5 is scheduled after 2.4 but does not strictly depend on it (the viewer is a standalone route outside the main layout)
- 2.6 depends on 2.5 (citations link to PDF viewer page)
Test Plan
Backend Tests (New/Modified Files)
| File | Coverage |
|---|---|
| `test_phase1_query_decomposer.py` | Update: verify sub-question generation instead of keywords |
Frontend Tests (New/Modified Files)
| File | Coverage |
|---|---|
| `QueryInput.test.tsx` | Submitted question display, clear on new input |
| `KeywordsDisplay.test.tsx` | Update: "Extracted Questions" label, numbered list rendering |
| `LTTPage.test.tsx` | Update: no IngestPanel, resizable layout |
| `PdfViewerPage.test.tsx` | NEW: PDF rendering, page nav, zoom, query params |
| `citationParser.test.ts` | NEW: Citation regex parsing, edge cases (spaces, nulls, special chars) |
| `ResponsePanel.test.tsx` | Update: inline citation rendering, clickable links |
Acceptance Tests
| File | Coverage |
|---|---|
| `test_acceptance_package2_questions.py` | Real LLM generates sub-questions from complex question |
| `test_acceptance_package2_citations.py` | Full flow: upload → query → answer has `[filename, page N]` citations |
Decisions (Confirmed)
| # | Question | Decision |
|---|---|---|
| 1 | API field name for extracted questions | B: Rename to extracted_questions — cleaner naming, breaking change acceptable |
| 2 | PDF viewer integration | New browser tab — Dedicated viewer page opens in a new tab with react-pdf rendering |
| 3 | Citation pattern strictness | A: Strict [filename, page N] — enforce exact format in backend prompt and frontend parser |
| 4 | Citation link target | B: Open new tab — same dedicated PDF viewer page as "View PDF" links |
| 5 | Resizable panels persistence | B: Always default 30/70 — no localStorage, keep simple |
| 6 | Citation display format in answer | A: Full [filename, page N] — most traceable for users |
Decision 2 & 4 Implications
PDF viewer is a dedicated route page (/pdf-viewer) that opens in a new browser tab. This means:
- No modal component or state management needed
- No context/callback props threaded through components
- All PDF links (View PDF, inline citations) use `target="_blank"` pointing to the viewer page
- Viewer page receives PDF URL, page number, and title via query params
- `react-pdf` renders directly on this standalone page
Open Questions
None — all decisions confirmed.