# Package 2 Enhancement Plan
**Source**: User request (2026-04-24)
**Scope**: LTT page UX improvements + PDF preview + inline citations
**Status**: 🔄 In Progress (2.1 ✅, 2.2 ✅, 2.3 ✅, 2.4 ✅, 2.5 ✅, 2.6 📋)

---
## Objective
Enhance the LTT (Legislative Transcript Tracker) page and RAG Database page with 6 features, prioritized by difficulty (easiest first):
1. **Remove Upload from LTT** — Remove IngestPanel from LTT page (upload lives on RAG Database page)
2. **Display Question Near Submit** — Show the submitted question next to the submit button after clicking
3. **Extracted Questions** — Rename "Keywords" to "Extracted Questions"; decompose into simplified sub-questions instead of keywords
4. **Adjustable Split Panel** — Make the upper/lower LTT layout resizable by dragging a divider
5. **PDF Viewer** — Preview PDFs in a new browser tab on both LTT and RAG Database pages instead of downloading
6. **Inline Citations** — Replace `[1]`-style citations with `[filename, page N]` citations that link to the source PDF
---
## Current State
### What Exists
- LTT page at `/` with CSS Grid layout: `grid-rows-[30%_1fr] grid-cols-2`
  - Top-Left: Video placeholder (Phase 2)
  - Top-Right: QueryInput → KeywordsDisplay → IngestPanel (vertical stack)
  - Bottom (full width): ResponsePanel (answer + sources)
- RAG Database page at `/rag-database` with document management + upload
- QueryDecomposer extracts keywords (short phrases) from the question
- RAG prompt labels context chunks `[1]`, `[2]`, ... before sending them to the LLM
- LLM instructed: "Cite the source name in [ ] for each point"
- Frontend renders citations as plain text via ReactMarkdown — NO linking
- Source cards in ResponsePanel show filename, page number, and a "View PDF" link (opens in a new tab, where the browser downloads the file)
- ChunkList on RAG Database page has "View PDF" links (also opens in new tab)
### What's Missing (Gaps This Plan Fills)
- Upload button still on LTT page (redundant with RAG Database page)
- No display of the submitted question after clicking submit
- Keywords are not useful — user wants decomposed sub-questions
- LTT layout is fixed (30%/70% split) — not adjustable
- PDF links download the file instead of previewing in browser
- Citations like `[1]` are meaningless to users — no traceability to source
---
## Sub-Phase Breakdown
| Sub-Phase | Feature | Difficulty | Backend | Frontend | Status |
|-----------|---------|-----------|---------|----------|--------|
| 2.1 | Remove Upload from LTT | ⭐ Easy | None | Remove IngestPanel from LTTPage | ✅ Done |
| 2.2 | Display Question Near Submit | ⭐ Easy | None | Show submitted question text | ✅ Done |
| 2.3 | Extracted Questions | ⭐⭐ Medium | Prompt change | Rename + display questions | ✅ Done |
| 2.4 | Adjustable Split Panel | ⭐⭐ Medium | None | react-resizable-panels | ✅ Done |
| 2.5 | PDF Viewer (New Tab) | ⭐⭐ Medium | None | PdfViewerPage route + react-pdf | ✅ Done |
| 2.6 | Inline Citations | ⭐⭐⭐ Hard | Prompt + response | Citation parser + rendering | 📋 Pending |
### Dependency Graph
```
2.1 (Remove Upload) ─────┐
2.2 (Question Display) ──┤
                         ├─► 2.4 (Resizable Layout) ─┐
2.3 (Questions) ─────────┤                           ├─► 2.6 (Inline Citations)
                         │                           │
                         └─► 2.5 (PDF Viewer) ───────┘
```
- **2.1, 2.2, 2.3** are independent — can run in parallel
- **2.4** should wait for 2.1 (layout changes after removing IngestPanel)
- **2.5** is independent but benefits from 2.4's layout work
- **2.6** is the hardest — benefits from 2.5 (PDF viewer) being available for linked citations
---
## Sub-Phase 2.1: Remove Upload from LTT ⭐ Easy
### Objective
Remove the IngestPanel from LTT page since document upload now exists on the RAG Database page.
### Changes Required
**Frontend only** — No backend changes.
| File | Change |
|------|--------|
| `frontend/src/pages/LTTPage.tsx` | Remove IngestPanel import and component; remove `useIngestDocument` hook |
**Current** (LTTPage.tsx):
```tsx
import { IngestPanel } from '../components/IngestPanel'
// ...
const ingestMutation = useIngestDocument()
// ...
```
**After**: Remove all of the above. The top-right section becomes just QueryInput + KeywordsDisplay.
### Acceptance Criteria
- [ ] LTT page no longer shows upload button
- [ ] RAG Database page upload still works
- [ ] All existing tests pass (update LTTPage tests if needed)
---
## Sub-Phase 2.2: Display Question Near Submit ⭐ Easy
### Objective
After the user clicks submit, display the submitted question text near the submit button so they can see what they asked.
### Changes Required
**Frontend only** — No backend changes.
| File | Change |
|------|--------|
| `frontend/src/components/QueryInput.tsx` | Display the last submitted question below the input after submission |
**Implementation approach**:
- After successful submit, show the submitted question as static text below the input area
- Clear it as soon as the user starts typing a new question
- Style: subtle gray text, italic, with a small label like "Your question:"
```
┌──────────────────────────────────────────────┐
│ [textarea: type question here]      [Submit] │
│                                              │
│ Your question: "What is the NEC4 clause      │
│                 about time extensions?"      │
└──────────────────────────────────────────────┘
```
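One way to get the "show on submit, clear on typing" behavior is a tiny reducer for `QueryInput`. This is a sketch only: the state shape and action names are hypothetical, and it assumes the textarea keeps its text after submit. Because the submitted question lives in local component state rather than request state, it stays visible while the query is loading.

```typescript
// Hypothetical state for QueryInput; field and action names are illustrative.
interface QueryInputState {
  draft: string                    // current textarea contents
  submittedQuestion: string | null // shown as "Your question: ..." when set
}

type QueryInputAction =
  | { type: 'typed'; value: string }
  | { type: 'submitted' }

function queryInputReducer(state: QueryInputState, action: QueryInputAction): QueryInputState {
  if (action.type === 'typed') {
    // Any new typing hides the previous "Your question:" line.
    return { draft: action.value, submittedQuestion: null }
  }
  // On submit, remember the trimmed question; ignore blank submissions.
  // Assumption: the textarea keeps its text after submit (draft is unchanged).
  const question = state.draft.trim()
  return question ? { ...state, submittedQuestion: question } : state
}
```

Inside the component this could be wired up with React's `useReducer(queryInputReducer, { draft: '', submittedQuestion: null })`, rendering the "Your question:" line only when `submittedQuestion` is non-null.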
### Acceptance Criteria
- [ ] After submit, the submitted question appears below the input
- [ ] Text clears when user starts typing a new question
- [ ] Submitted question stays visible while the response is loading
- [ ] No layout shift or jank
---
## Sub-Phase 2.3: Extracted Questions ⭐⭐ Medium
### Objective
Rename "Keywords" to "Extracted Questions" and change the backend to decompose the user's question into simplified sub-questions instead of keyword phrases.
### Current Behavior
- Backend `QueryDecomposer.decompose()` prompt: `"Given question: '{question}', extract key search keywords as JSON array"`
- Returns: `["NEC4", "time extension", "clause"]`
- Frontend: `KeywordsDisplay` component shows blue pills labeled "Extracted Keywords:"
### New Behavior
- Backend prompt: Decompose into 2-5 simplified sub-questions
- Returns: `["What are the time extension provisions?", "What notice is required?", "How is extended time calculated?"]`
- Frontend: Rename to "Extracted Questions:" and display as numbered list
### Prompt Change
**New prompt** (replace line 54 in query_decomposer.py):
```python
prompt = (
f"Given this question: '{question}'\n\n"
f"Break it down into 2-5 simplified sub-questions that would help "
f"search for relevant information. Each sub-question should be short "
f"and focused on one aspect. Return as a JSON array of strings."
)
```
**Decision confirmed**: Rename API field from `keywords` to `extracted_questions` (Decision 1B).
### Backend Changes
| File | Change |
|------|--------|
| `backend/app/services/query_decomposer.py` | Change prompt to generate sub-questions instead of keywords |
| `backend/app/models/query.py` | Rename `keywords: List[str]` to `extracted_questions: List[str]` |
| `backend/app/routers/query.py` | Update `QueryResponse` usage: `keywords` → `extracted_questions` |
### Frontend Changes
| File | Change |
|------|--------|
| `frontend/src/components/KeywordsDisplay.tsx` | Rename to `ExtractedQuestionsDisplay.tsx`, change pill style to numbered list style |
| `frontend/src/lib/api.ts` | Update `QueryResponse` type: `keywords` → `extracted_questions` |
| `frontend/src/lib/queries.tsx` | Update `QueryResponse` type reference |
| `frontend/src/types/index.ts` | Rename `keywords` to `extracted_questions` in `QueryResponse` |
| `frontend/src/pages/LTTPage.tsx` | Update prop: `keywords` → `extracted_questions` |
**Rename**:
- "Extracted Keywords:" → "Extracted Questions:"
- `data-testid="keywords-*"` → `data-testid="extracted-questions-*"`
- Pill badges → numbered question list with subtle styling
### Acceptance Criteria
- [ ] Backend returns 2-5 sub-questions instead of keywords
- [ ] Frontend displays "Extracted Questions:" label
- [ ] Questions display as a numbered list (1. 2. 3.)
- [ ] Graceful fallback if LLM returns empty list
- [ ] Existing query pipeline still works (retrieval uses the sub-questions as search terms)
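For the empty-list fallback, the frontend can normalize `extracted_questions` before rendering. A minimal sketch with a hypothetical helper name: it tolerates a missing or malformed field and drops blank entries, leaving the component to show a short fallback message when the result is empty.

```typescript
// Hypothetical helper for ExtractedQuestionsDisplay (name is illustrative).
// Returns a clean list of sub-questions, or [] if the field is missing or malformed.
function normalizeExtractedQuestions(raw: unknown): string[] {
  if (!Array.isArray(raw)) return []
  return raw
    .filter((item): item is string => typeof item === 'string')
    .map((q) => q.trim())
    .filter((q) => q.length > 0) // drop blank entries the LLM may emit
}
```

A non-empty result maps directly onto an `<ol>`, which supplies the 1. 2. 3. numbering.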
---
## Sub-Phase 2.4: Adjustable Split Panel ⭐⭐ Medium
### Objective
Replace the fixed CSS Grid layout on LTT page with a resizable split panel, allowing users to drag the divider between the upper section (video + query) and lower section (response).
### New Dependency
| Package | Purpose | Weekly Downloads | Notes |
|---------|---------|------------------|-------|
| `react-resizable-panels` | Resizable split panels | ~34.7M | By Brian Vaughn (former React core team member), zero deps, MIT |
```bash
npm install react-resizable-panels
```
### Changes Required
**Frontend only** — No backend changes.
| File | Change |
|------|--------|
| `frontend/src/pages/LTTPage.tsx` | Replace CSS Grid with `PanelGroup` + `Panel` + `PanelResizeHandle` |
| `frontend/package.json` | Add `react-resizable-panels` |
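**Current layout** (LTTPage.tsx, simplified; props omitted):
```tsx
<div className="grid grid-rows-[30%_1fr] grid-cols-2">
  <VideoPlaceholder />
  <div><QueryInput /><KeywordsDisplay /><IngestPanel /></div>
  <div className="col-span-2"><ResponsePanel /></div>
</div>
```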
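**New layout** (simplified, props omitted; check component names and size units against the installed `react-resizable-panels` major version):
```tsx
import { Panel, PanelGroup, PanelResizeHandle } from 'react-resizable-panels'
<PanelGroup direction="vertical">
  {/* Upper section: video + query */}
  <Panel defaultSize={30} minSize={15} maxSize={60}>
    <div className="grid grid-cols-2 h-full">
      <VideoPlaceholder />
      <div className="overflow-auto"><QueryInput /><KeywordsDisplay /></div>
    </div>
  </Panel>
  {/* Draggable divider: 8px bar, gray, blue on hover */}
  <PanelResizeHandle className="h-2 bg-gray-300 hover:bg-blue-500 cursor-row-resize" />
  {/* Lower section: response */}
  <Panel defaultSize={70} minSize={20}><div className="h-full overflow-auto"><ResponsePanel /></div></Panel>
</PanelGroup>
```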
**Design decisions**:
- **Default split**: 30% upper / 70% lower (same as current fixed layout)
- **Min/Max**: Upper panel 15-60%, lower panel 20%+ (prevents collapsing; with the upper panel capped at 60%, the lower panel never drops below 40%)
- **Handle style**: 8px horizontal bar, gray by default, blue on hover, cursor row-resize
- **Persistence**: None; the split always starts at 30/70 and is not saved to localStorage (Decision 5B)
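To make the size constraints concrete, here is the arithmetic of a drag under the 15-60% upper bound and 20% lower floor. This illustrates the design decisions only; it is not how `react-resizable-panels` implements resizing, and the constant and function names are hypothetical.

```typescript
// Illustrative arithmetic only, not react-resizable-panels internals.
const UPPER_MIN_PCT = 15
const UPPER_MAX_PCT = 60
const LOWER_MIN_PCT = 20

// Converts a vertical drag (in px) into a new upper/lower split in percent.
function resizeSplit(startUpperPct: number, dragPx: number, containerPx: number): { upper: number; lower: number } {
  if (containerPx <= 0) return { upper: startUpperPct, lower: 100 - startUpperPct }
  const proposed = startUpperPct + (dragPx / containerPx) * 100
  // The lower floor only binds if it is stricter than 100 - UPPER_MAX_PCT.
  const maxUpper = Math.min(UPPER_MAX_PCT, 100 - LOWER_MIN_PCT)
  const upper = Math.min(maxUpper, Math.max(UPPER_MIN_PCT, proposed))
  return { upper, lower: 100 - upper }
}
```

Because the split is expressed in percentages, resizing the window keeps the same ratio, which is what the "Layout works on window resize" criterion needs.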
### Acceptance Criteria
- [ ] LTT page has a draggable divider between upper and lower sections
- [ ] Default split matches current 30/70 layout
- [ ] Divider is visually obvious and provides cursor feedback on hover
- [ ] Both sections enforce minimum sizes (no collapsing)
- [ ] Content in each section scrolls independently
- [ ] Layout works on window resize
---
## Sub-Phase 2.5: PDF Viewer in New Tab ⭐⭐ Medium
### Objective
Preview PDF files in-browser via a dedicated viewer page that opens in a new browser tab, on both the LTT page (source cards) and RAG Database page (chunk list).
### Design Decision (Confirmed)
PDF viewer is a **dedicated route** (`/pdf-viewer`) that opens in a new browser tab. This avoids modal state management and provides a clean full-page PDF viewing experience. All "View PDF" links and inline citations open this page via `target="_blank"`.
### New Dependencies
| Package | Purpose | Weekly Downloads | Notes |
|---------|---------|------------------|-------|
| `react-pdf` | PDF rendering via PDF.js | ~1M | MIT, lightweight (309KB), Vite compatible |
| `pdfjs-dist` | PDF.js worker | (peer dep) | Required by react-pdf |
```bash
npm install react-pdf pdfjs-dist
```
### Changes Required
**Frontend only** — No backend changes (the existing `GET /chunks/{file_path}/pdf` endpoint serves PDFs).
| File | Change |
|------|--------|
| `frontend/src/pages/PdfViewerPage.tsx` | **NEW** — Dedicated full-page PDF viewer route |
| `frontend/src/App.tsx` | Add `/pdf-viewer` route |
| `frontend/src/components/ResponsePanel.tsx` | "View PDF" links point to `/pdf-viewer?url=...&page=N` with `target="_blank"` |
| `frontend/src/components/ChunkList.tsx` | Same as ResponsePanel |
| `frontend/src/lib/api.ts` | Add `getPdfViewerUrl()` helper |
| `frontend/vite.config.ts` | Configure PDF.js worker |
| `frontend/package.json` | Add react-pdf, pdfjs-dist |
**PdfViewerPage design**:
```
┌──────────────────────────────────────────────────────┐
│ ◀ Back to LTT        NEC4 ACC - Page 3 of 97       ▶ │
├──────────────────────────────────────────────────────┤
│                                                      │
│                ┌─────────────────────┐               │
│                │                     │               │
│                │    PDF rendered     │               │
│                │    page content     │               │
│                │                     │               │
│                │                     │               │
│                └─────────────────────┘               │
│                                                      │
│               [Zoom -]  100%  [Zoom +]               │
└──────────────────────────────────────────────────────┘
```
**Query params**: `?url={chunkPdfUrl}&page={N}&title={filename}` (each value URL-encoded)
**Implementation**:
- Standalone page at `/pdf-viewer` route (not embedded in main layout)
- Uses `react-pdf`'s `<Document>` + `<Page>` components
- PDF URL from the `url` query param, validated to point at the backend's `GET /chunks/{file_path}/pdf` endpoint before loading
- Page navigation controls (prev/next) with page number display
- Zoom controls (+/-)
- Title shows filename from query param
- "Back to LTT" link to return to the app
- No NavBar on this page (clean viewer experience)
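A minimal sketch of how `PdfViewerPage` might read and validate its query params. It assumes the PDF endpoint lives on a known API origin (`apiOrigin` is a hypothetical parameter; behind a Vite dev proxy it could simply be `window.location.origin`) and rejects any `url` on another origin or not shaped like `/chunks/{file_path}/pdf`. Missing or invalid page numbers fall back to page 1.

```typescript
interface ViewerParams {
  url: string   // absolute URL handed to react-pdf
  page: number  // 1-based page to open
  title: string // shown in the header bar
}

// Hypothetical: `apiOrigin` is where the backend serves GET /chunks/{file_path}/pdf.
function parseViewerParams(search: string, apiOrigin: string): ViewerParams | null {
  const params = new URLSearchParams(search) // decodes %XX escapes and '+'
  const raw = params.get('url')
  if (!raw) return null

  let pdfUrl: URL
  try {
    pdfUrl = new URL(raw, apiOrigin) // relative paths resolve against the API origin
  } catch {
    return null
  }
  // Only accept same-origin URLs shaped like the existing chunk PDF endpoint.
  if (pdfUrl.origin !== new URL(apiOrigin).origin) return null
  if (!/\/chunks\/.+\/pdf$/.test(pdfUrl.pathname)) return null

  const requested = Number(params.get('page'))
  const page = Number.isInteger(requested) && requested > 0 ? requested : 1
  return { url: pdfUrl.toString(), page, title: params.get('title') || 'PDF' }
}
```

Returning `null` lets the page render an error state instead of handing an arbitrary URL to PDF.js.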
**Integration points**:
- `ResponsePanel.tsx`: Change `href` from `getChunkPdfUrl()` to `getPdfViewerUrl()` with `target="_blank"`
- `ChunkList.tsx`: Same change
- Inline citations (Sub-Phase 2.6): Same URL pattern
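A sketch of the `getPdfViewerUrl()` helper named in the file table. The signature is an assumption (the real helper may take a chunk path and build the PDF URL itself). It relies on `URLSearchParams` to percent-encode each value, so spaces (encoded as `+`), commas, and slashes in filenames survive the round trip.

```typescript
// Hypothetical signature; the helper in frontend/src/lib/api.ts may differ.
function getPdfViewerUrl(pdfUrl: string, page?: number | null, title?: string): string {
  const params = new URLSearchParams({ url: pdfUrl })
  // Only emit a page param for positive integers; the viewer defaults otherwise.
  if (page != null && Number.isInteger(page) && page > 0) params.set('page', String(page))
  if (title) params.set('title', title)
  return `/pdf-viewer?${params.toString()}`
}
```

A link would then be `<a href={getPdfViewerUrl(pdfUrl, page, filename)} target="_blank">View PDF</a>`; reading the values back with `URLSearchParams.get()` decodes them.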
### Acceptance Criteria
- [ ] Clicking "View PDF" opens a new browser tab with the PDF viewer page
- [ ] PDF renders correctly with original formatting preserved
- [ ] Page navigation (prev/next) works for multi-page PDFs
- [ ] Zoom controls work
- [ ] Page title shows filename and page number
- [ ] "Back to LTT" link returns to the app
- [ ] Works from both LTT and RAG Database pages
- [ ] URL query params properly encode/decode special characters
- [ ] PDF.js worker loads correctly in Vite build
---
## Sub-Phase 2.6: Inline Citations with Clickable Links ⭐⭐⭐ Hard
### Objective
Replace the current `[1]`, `[2]` citation format in RAG responses with `[filename, page N]` format that includes a clickable link to the source chunk PDF.
### Current Citation Flow
1. **Backend** (`rag.py`): Context chunks labeled `[1]`, `[2]`, `[3]`...
2. **LLM prompt**: "Cite the source name in [ ] for each point"
3. **LLM output**: Answer text contains `[1]`, `[NEC4 ACC]`, or similar bracketed citations
4. **Frontend**: ReactMarkdown renders citations as plain text — no linking
### New Citation Flow
1. **Backend**: Context chunks labeled `[filename, page N]` with strict format
2. **LLM prompt**: "Cite sources as `[filename, page N]`" (strict — Decision 3A)
3. **LLM output**: Answer text contains `[NEC4 ACC.pdf, page 3]` format citations
4. **Frontend**: Custom ReactMarkdown renderer parses strict citation patterns, replaces with clickable links
5. **Click**: Opens PDF viewer page in new tab (Decision 4B) at the cited page
### Backend Changes
| File | Change |
|------|--------|
| `backend/app/services/rag.py` | Change context chunk labeling and citation instruction in prompt |
| `backend/app/models/common.py` | Add `chunk_index` mapping info if needed (not required under Option A below) |
| `backend/app/models/query.py` | Add a citation map to the response (only under Option B below) |
**`rag.py` — Context building change** (lines 94-102):
Current:
```python
context_parts.append(
    f"[{i + 1}] Source: {source}\n"
    f"Summary: {summary}\n"
    f"Content: {chunk}\n"
)
```
New:
```python
# assumes `meta` is the current chunk's metadata dict inside the context loop
source_name = meta.get("filename", "unknown")
page_num = meta.get("page_number")  # None for unpaged files such as DOCX
citation_label = (
    f"{source_name}, page {page_num}" if page_num is not None else source_name
)
context_parts.append(
    f"[{citation_label}] Source: {source_name}\n"
    f"Summary: {summary}\n"
    f"Content: {chunk}\n"
)
```
**`rag.py` — Prompt change** (lines 106-113):
Current:
```python
f"Cite the source name in [ ] for each point.\n\n"
```
New:
```python
f"Cite your sources inline using the exact bracket labels provided, "
f"e.g. [filename, page N]. Place the citation at the end of each relevant point.\n\n"
```
**Important design decision**: We need to pass citation metadata (filename → chunk_file_path mapping) to the frontend so it can create clickable links. Two approaches:
- **Option A**: Include `chunk_file_path` in the source metadata (already done!) — frontend parses `[filename, page N]` from answer text, looks up matching source in `sources[]` array to get `chunk_file_path`
- **Option B**: Add a `citation_map: Dict[str, CitationInfo]` to `QueryResponse` — more structured but adds complexity
**Recommendation**: Option A — simpler, uses existing `sources[]` data. Frontend regex matches `[filename, page N]` patterns in answer text and cross-references with `sources` array by filename + page_number. Links open in new tab to PDF viewer page (Decision 4B).
### Frontend Changes
| File | Change |
|------|--------|
| `frontend/src/components/ResponsePanel.tsx` | Custom markdown renderer that replaces citation patterns with clickable links |
| `frontend/src/utils/citationParser.ts` | **NEW** — Regex parser for citation patterns |
**Citation parser** (`citationParser.ts`):
```typescript
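// Matches strict citations such as [NEC4 ACC.pdf, page 3] and page-less ones
// such as [meeting_notes.docx]. Group 1 = filename, group 2 = page number.
// It also matches things like [1]; those stay plain text when no source matches.
export const CITATION_PATTERN = /\[([^\[\]\n]+?)(?:,\s*page\s+(\d+))?\]/g

export interface ParsedCitation {
  label: string                // "NEC4 ACC.pdf, page 3"
  filename: string             // "NEC4 ACC.pdf"
  pageNumber: number | null    // 3, or null for unpaged files (DOCX)
  chunkFilePath: string | null // from the sources[] lookup; null if unmatched
}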
```
**Rendering approach**:
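- Use ReactMarkdown's `components` prop to override the elements that hold answer text (e.g. `p`, `li`) and process their string children; react-markdown has no dedicated text-node component, so a small rehype plugin that rewrites text nodes is the alternative
- Split those strings on the citation pattern (strict format, Decision 3A)
- Replace each matched `[filename, page N]` with an `<a target="_blank">` element that opens the PDF viewer in a new tab
- URL format: `/pdf-viewer?url={chunkPdfUrl}&page={N}&title={filename}` (each value URL-encoded)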
- Fallback: If no matching source found, render as plain text (graceful degradation)
```
Example rendered answer:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
• The estimated value threshold is HK$1,000,000
  [NEC4 ACC.pdf, page 3]  ←── clickable link (opens new tab)
• Prior instructions from the Client may override
  this threshold [NEC4 ACC.pdf, page 5]  ←── clickable
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
```
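A minimal sketch of the Option A lookup behind this rendering, written as a pure function so it can be unit-tested apart from ReactMarkdown. It splits answer text into plain-text and citation segments and links a citation only when it matches an entry in `sources[]` by filename and page, so invented or malformed citations stay plain text. `SourceRef`, `Segment`, and `splitCitations` are hypothetical names; the snake_case fields mirror the source metadata this plan mentions (`page_number`, `chunk_file_path`), `toHref` stands in for a URL builder such as `getPdfViewerUrl()`, and the block defines its own copy of a citation regex with separate filename and page groups so it runs on its own.

```typescript
// Hypothetical shapes: only the fields the lookup needs are modeled.
interface SourceRef {
  filename: string
  page_number: number | null // null for unpaged files such as DOCX
  chunk_file_path: string
}

type Segment =
  | { kind: 'text'; text: string }
  | { kind: 'citation'; label: string; href: string }

// Group 1 = filename, group 2 = optional page number.
const CITATION = /\[([^\[\]\n]+?)(?:,\s*page\s+(\d+))?\]/g

function splitCitations(
  text: string,
  sources: SourceRef[],
  toHref: (source: SourceRef, page: number | null) => string,
): Segment[] {
  const segments: Segment[] = []
  let last = 0 // end of the text already emitted
  for (const match of text.matchAll(CITATION)) {
    const [whole, rawName, pageStr] = match
    const filename = rawName.trim()
    const page = pageStr ? Number(pageStr) : null
    // Page-less citations match any chunk from that file (a leniency choice).
    const source = sources.find(
      (s) => s.filename === filename && (page === null || s.page_number === page),
    )
    if (!source) continue // unmatched or invented citation: leave it in the plain text
    const start = match.index ?? 0
    if (start > last) segments.push({ kind: 'text', text: text.slice(last, start) })
    segments.push({ kind: 'citation', label: whole.slice(1, -1), href: toHref(source, page) })
    last = start + whole.length
  }
  if (last < text.length) segments.push({ kind: 'text', text: text.slice(last) })
  return segments
}
```

A custom `p`/`li` renderer would map `citation` segments to `<a target="_blank">` elements and `text` segments to plain strings. Letting page-less citations such as `[meeting_notes.docx]` match the first source with that filename is a leniency choice worth revisiting under Decision 3A.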
### Acceptance Criteria
- [ ] Backend context labels use `[filename, page N]` format instead of `[1]`
- [ ] LLM prompt instructs to use the new citation format
- [ ] Frontend parses `[filename, page N]` patterns from answer text
- [ ] Parsed citations are rendered as clickable links (blue, underlined)
- [ ] Clicking a citation opens the PDF viewer page in a new browser tab at the cited page
- [ ] If citation can't be matched to a source, renders as plain text (no broken links)
- [ ] The Sources section below the answer still shows full source cards
- [ ] Citation pattern handles: spaces in filenames, missing page numbers (DOCX), long filenames
### Risks & Mitigations
| Risk | Impact | Mitigation |
|------|--------|------------|
| LLM doesn't follow citation format exactly | Citations not parsed | Keep the strict pattern (Decision 3A) and render anything it doesn't match as plain text |
| LLM invents citations not in sources | Clickable link to non-existent source | Only linkify citations that match an actual source; show others as text |
| Filenames with special characters break regex | Citation not detected | Use lazy regex matching; test with real filenames (spaces, dots, hyphens) |
| Long filenames make answer text hard to read | Poor UX | Decision 6A keeps full labels; display truncation (e.g., "NEC4 AC...pdf, page 3") is not planned |
| Page number is null (DOCX files) | Citation shows just filename | Show `[meeting_notes.docx]` without page number; still clickable for text preview |
---
## New Dependencies
### Frontend
| Package | Purpose | Version | Size |
|---------|---------|---------|------|
| `react-resizable-panels` | Resizable split panels (2.4) | `^4.10.0` | ~517KB |
| `react-pdf` | PDF rendering via PDF.js (2.5) | `^10.4.1` | ~309KB |
| `pdfjs-dist` | PDF.js worker (peer dep of react-pdf) | `^5.3.31` | — |
### Backend
| Package | Purpose | Already installed? |
|---------|---------|--------------------|
| (none) | All backend changes use existing LLM client | ✅ |
---
## Implementation Sequence
### Recommended Order (difficulty-first)
```
Phase 2.1 (Remove Upload)    ─┐
Phase 2.2 (Question Display) ─┤── Can run in PARALLEL
Phase 2.3 (Questions)        ─┘
                              │
                              ▼
Phase 2.4 (Resizable Layout) ─── After 2.1 (layout changes)
                              │
                              ▼
Phase 2.5 (PDF Viewer) ───────── After 2.4 (sequencing only; standalone route)
                              │
                              ▼
Phase 2.6 (Inline Citations) ─── After 2.5 (links to PDF viewer)
```
### Parallelization Opportunities
- **2.1, 2.2, 2.3** can run in parallel (independent frontend/backend changes)
- **2.4** depends on 2.1 (layout changes affect same file)
- **2.5** is sequenced after 2.4 but has no hard dependency on it (the viewer is a standalone route outside the main layout)
- **2.6** depends on 2.5 (citations link to PDF viewer page)
---
## Test Plan
### Backend Tests (New/Modified Files)
| File | Coverage |
|------|----------|
| `test_phase1_query_decomposer.py` | Update: verify sub-question generation instead of keywords |
### Frontend Tests (New/Modified Files)
| File | Coverage |
|------|----------|
| `QueryInput.test.tsx` | Submitted question display, clear on new input |
| `ExtractedQuestionsDisplay.test.tsx` (renamed from `KeywordsDisplay.test.tsx`) | Update: "Extracted Questions" label, numbered list rendering |
| `LTTPage.test.tsx` | Update: no IngestPanel, resizable layout |
| `PdfViewerPage.test.tsx` | **NEW** — PDF rendering, page nav, zoom, query params |
| `citationParser.test.ts` | **NEW** — Citation regex parsing, edge cases (spaces, nulls, special chars) |
| `ResponsePanel.test.tsx` | Update: inline citation rendering, clickable links |
### Acceptance Tests
| File | Coverage |
|------|----------|
| `test_acceptance_package2_questions.py` | Real LLM generates sub-questions from complex question |
| `test_acceptance_package2_citations.py` | Full flow: upload → query → answer has `[filename, page N]` citations |
---
## Decisions (Confirmed)
| # | Question | Decision |
|---|----------|----------|
| 1 | API field name for extracted questions | **B**: Rename to `extracted_questions` — cleaner naming, breaking change acceptable |
| 2 | PDF viewer integration | **New browser tab** — Dedicated viewer page opens in a new tab with react-pdf rendering |
| 3 | Citation pattern strictness | **A**: Strict `[filename, page N]` — enforce exact format in backend prompt and frontend parser |
| 4 | Citation link target | **B**: Open new tab — same dedicated PDF viewer page as "View PDF" links |
| 5 | Resizable panels persistence | **B**: Always default 30/70 — no localStorage, keep simple |
| 6 | Citation display format in answer | **A**: Full `[filename, page N]` — most traceable for users |
### Decision 2 & 4 Implications
PDF viewer is a **dedicated route page** (`/pdf-viewer`) that opens in a new browser tab. This means:
- No modal component or state management needed
- No context/callback props threaded through components
- All PDF links (View PDF, inline citations) use `target="_blank"` pointing to the viewer page
- Viewer page receives PDF URL, page number, and title via query params
- `react-pdf` renders directly on this standalone page
## Open Questions
None — all decisions confirmed.