# Package 2 Enhancement Plan

**Source**: User request (2026-04-24)
**Scope**: LTT page UX improvements + PDF preview + inline citations
**Status**: 🔄 In Progress (2.1 ✅, 2.2 ✅)

---

## Objective

Enhance the LTT (Legislative Transcript Tracker) page and RAG Database page with 6 features, prioritized by difficulty (easiest first):

1. **Remove Upload from LTT**: Remove IngestPanel from the LTT page (upload lives on the RAG Database page)
2. **Display Question Near Submit**: Show the submitted question next to the submit button after clicking
3. **Extracted Questions**: Rename "Keywords" to "Extracted Questions"; decompose into simplified sub-questions instead of keywords
4. **Adjustable Split Panel**: Make the upper/lower LTT layout resizable by dragging a divider
5. **PDF Viewer**: Preview PDFs in a new browser tab on both LTT and RAG Database pages instead of downloading
6. **Inline Citations**: Replace `[1]` citations with `[filename, page N]` format with clickable links to the source PDF

---

## Current State

### What Exists
- LTT page at `/` with CSS Grid layout: `grid-rows-[30%_1fr] grid-cols-2`
  - Top-Left: Video placeholder (Phase 2)
  - Top-Right: QueryInput → KeywordsDisplay → IngestPanel (vertical stack)
  - Bottom (full width): ResponsePanel (answer + sources)
- RAG Database page at `/rag-database` with document management + upload
- QueryDecomposer extracts keywords (short phrases) from the question
- RAG prompt generates `[1]`, `[2]` style citations in context chunks
- LLM instructed: "Cite the source name in [ ] for each point"
- Frontend renders citations as plain text via ReactMarkdown (no linking)
- Source cards in ResponsePanel show filename, page number, "View PDF" link (opens in new tab = download)
- ChunkList on RAG Database page has "View PDF" links (also opens in new tab)

### What's Missing (Gaps This Plan Fills)
- Upload button still on LTT page (redundant with RAG Database page)
- No display of the submitted question after clicking submit
- Keywords are not useful; the user wants decomposed sub-questions
- LTT layout is fixed (30%/70% split) and not adjustable
- PDF links download the file instead of previewing in browser
- Citations like `[1]` are meaningless to users (no traceability to source)

---

## Sub-Phase Breakdown

| Sub-Phase | Feature | Difficulty | Backend | Frontend | Status |
|-----------|---------|-----------|---------|----------|--------|
| 2.1 | Remove Upload from LTT | ⭐ Easy | None | Remove IngestPanel from LTTPage | ✅ Done |
| 2.2 | Display Question Near Submit | ⭐ Easy | None | Show submitted question text | ✅ Done |
| 2.3 | Extracted Questions | ⭐⭐ Medium | Prompt change | Rename + display questions | 📋 Pending |
| 2.4 | Adjustable Split Panel | ⭐⭐ Medium | None | react-resizable-panels | 📋 Pending |
| 2.5 | PDF Viewer (New Tab) | ⭐⭐ Medium | None | PdfViewerPage route + react-pdf | 📋 Pending |
| 2.6 | Inline Citations | ⭐⭐⭐ Hard | Prompt + response | Citation parser + rendering | 📋 Pending |

### Dependency Graph

```
2.1 (Remove Upload) ──────┐
2.2 (Question Display) ───┤
                          ├─► 2.4 (Resizable Layout) ─┐
2.3 (Questions) ──────────┤                           ├─► 2.6 (Inline Citations)
                          │                           │
                          └─► 2.5 (PDF Viewer) ───────┘
```

- **2.1, 2.2, 2.3** are independent and can run in parallel
- **2.4** should wait for 2.1 (layout changes after removing IngestPanel)
- **2.5** is independent but benefits from 2.4's layout work
- **2.6** is the hardest; it benefits from 2.5 (PDF viewer) being available for linked citations

---

## Sub-Phase 2.1: Remove Upload from LTT ⭐ Easy

### Objective
Remove the IngestPanel from the LTT page since document upload now exists on the RAG Database page.

### Changes Required

**Frontend only**: No backend changes.

| File | Change |
|------|--------|
| `frontend/src/pages/LTTPage.tsx` | Remove IngestPanel import and component; remove `useIngestDocument` hook |

**Current** (LTTPage.tsx):
```tsx
import { IngestPanel } from '../components/IngestPanel'
// ...
const ingestMutation = useIngestDocument()
// ...
```

**After**: Remove all of the above. The top-right section becomes just QueryInput + KeywordsDisplay.

### Acceptance Criteria
- [ ] LTT page no longer shows upload button
- [ ] RAG Database page upload still works
- [ ] All existing tests pass (update LTTPage tests if needed)

---

## Sub-Phase 2.2: Display Question Near Submit ⭐ Easy

### Objective
After the user clicks submit, display the original question text near the submit button so they remember what they asked.

### Changes Required

**Frontend only**: No backend changes.

| File | Change |
|------|--------|
| `frontend/src/components/QueryInput.tsx` | Display the last submitted question below the input after submission |

**Implementation approach**:
- After successful submit, show the submitted question as static text below the input area
- Clear when a new question is being typed
- Style: subtle gray text, italic, with a small label like "Your question:"

```
┌──────────────────────────────────────────────┐
│ [textarea: type question here]     [Submit]  │
│                                              │
│ Your question: "What is the NEC4 clause      │
│ about time extensions?"                      │
└──────────────────────────────────────────────┘
```

### Acceptance Criteria
- [ ] After submit, the submitted question appears below the input
- [ ] Text clears when user starts typing a new question
- [ ] Works correctly during loading state
- [ ] No layout shift or jank

---

## Sub-Phase 2.3: Extracted Questions ⭐⭐ Medium

### Objective
Rename "Keywords" to "Extracted Questions" and change the backend to decompose the user's question into simplified sub-questions instead of keyword phrases.

### Current Behavior
- Backend `QueryDecomposer.decompose()` prompt: `"Given question: '{question}', extract key search keywords as JSON array"`
- Returns: `["NEC4", "time extension", "clause"]`
- Frontend: `KeywordsDisplay` component shows blue pills labeled "Extracted Keywords:"

### New Behavior
- Backend prompt: decompose the question into 2-5 simplified sub-questions
- Returns: `["What are the time extension provisions?", "What notice is required?", "How is extended time calculated?"]`
- Frontend: rename the label to "Extracted Questions:" and display the items as a numbered list

**New prompt** (replace line 54 in query_decomposer.py):
```python
prompt = (
    f"Given this question: '{question}'\n\n"
    f"Break it down into 2-5 simplified sub-questions that would help "
    f"search for relevant information. Each sub-question should be short "
    f"and focused on one aspect. Return as a JSON array of strings."
)
```

**Decision confirmed**: Rename API field from `keywords` to `extracted_questions` (Decision 1B).
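
The plan asks for 2-5 sub-questions and a graceful fallback when the LLM returns an empty list, but it does not say what the fallback is. The sketch below shows one way the frontend could guard the renamed `extracted_questions` field. The helper name `normalizeExtractedQuestions` and the choice to fall back to the original question are assumptions for illustration, not part of the existing codebase.

```typescript
// Hypothetical helper: the name and the fallback policy are assumptions.
const MAX_QUESTIONS = 5; // upper bound taken from the "2-5 sub-questions" prompt

function normalizeExtractedQuestions(raw: unknown, originalQuestion: string): string[] {
  // Keep only non-empty strings; the LLM output may be malformed.
  const cleaned: string[] = Array.isArray(raw)
    ? raw
        .filter((q): q is string => typeof q === "string")
        .map((q) => q.trim())
        .filter((q) => q.length > 0)
    : [];

  // Drop duplicates while keeping first-seen order.
  const unique = cleaned.filter((q, i) => cleaned.indexOf(q) === i);

  // Assumed fallback: when nothing usable came back, use the original question.
  if (unique.length === 0) {
    const fallback = originalQuestion.trim();
    return fallback.length > 0 ? [fallback] : [];
  }

  // Cap the list at the prompt's upper bound.
  return unique.slice(0, MAX_QUESTIONS);
}
```

The same list can then feed both the numbered "Extracted Questions:" display and any code that expects at least one search term.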

### Backend Changes

| File | Change |
|------|--------|
| `backend/app/services/query_decomposer.py` | Change prompt to generate sub-questions instead of keywords |
| `backend/app/models/query.py` | Rename `keywords: List[str]` to `extracted_questions: List[str]` |
| `backend/app/routers/query.py` | Update `QueryResponse` usage: `keywords` → `extracted_questions` |

### Frontend Changes

| File | Change |
|------|--------|
| `frontend/src/components/KeywordsDisplay.tsx` | Rename to `ExtractedQuestionsDisplay.tsx`, change pill style to numbered list style |
| `frontend/src/lib/api.ts` | Update `QueryResponse` type: `keywords` → `extracted_questions` |
| `frontend/src/lib/queries.tsx` | Update `QueryResponse` type reference |
| `frontend/src/types/index.ts` | Rename `keywords` to `extracted_questions` in `QueryResponse` |
| `frontend/src/pages/LTTPage.tsx` | Update prop: `keywords` → `extracted_questions` |

**Rename**:
- "Extracted Keywords:" → "Extracted Questions:"
- `data-testid="keywords-*"` → `data-testid="extracted-questions-*"`
- Pill badges → numbered question list with subtle styling

### Acceptance Criteria
- [ ] Backend returns 2-5 sub-questions instead of keywords
- [ ] Frontend displays "Extracted Questions:" label
- [ ] Questions display as a numbered list (1. 2. 3.)
- [ ] Graceful fallback if LLM returns empty list
- [ ] Existing query pipeline still works (retrieve uses these as search terms)

---

## Sub-Phase 2.4: Adjustable Split Panel ⭐⭐ Medium

### Objective
Replace the fixed CSS Grid layout on the LTT page with a resizable split panel, allowing users to drag the divider between the upper section (video + query) and lower section (response).

### New Dependency

| Package | Purpose | Weekly Downloads | Notes |
|---------|---------|------------------|-------|
| `react-resizable-panels` | Resizable split panels | ~34.7M | By Brian Vaughn (React core team), zero deps, MIT |

```bash
npm install react-resizable-panels
```

### Changes Required

**Frontend only**: No backend changes.

| File | Change |
|------|--------|
| `frontend/src/pages/LTTPage.tsx` | Replace CSS Grid with `PanelGroup` + `Panel` + `PanelResizeHandle` |
| `frontend/package.json` | Add `react-resizable-panels` |

**Current layout** (LTTPage.tsx):
```tsx
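<div className="grid grid-rows-[30%_1fr] grid-cols-2">
  <div>VideoPlaceholder</div>
  <div>QueryInput + KeywordsDisplay + IngestPanel</div>
  <div className="col-span-2">ResponsePanel</div>
</div>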
```

**New layout**:
```tsx
import { Panel, PanelGroup, PanelResizeHandle } from 'react-resizable-panels'
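<PanelGroup direction="vertical">
  {/* Upper section: video + query */}
  <Panel defaultSize={30} minSize={15} maxSize={60}>
    <div className="grid grid-cols-2 h-full">
      <div>VideoPlaceholder</div>
      <div>QueryInput + KeywordsDisplay</div>
    </div>
  </Panel>
  {/* Draggable divider */}
  <PanelResizeHandle />
  {/* Lower section: response */}
  <Panel defaultSize={70} minSize={20}>
    <div>ResponsePanel</div>
  </Panel>
</PanelGroup>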
```

**Design decisions**:
- **Default split**: 30% upper / 70% lower (same as current fixed layout)
- **Min/Max**: Upper panel 15-60%, lower panel 20%+ (prevents collapsing)
- **Handle style**: 8px horizontal bar, gray by default, blue on hover, cursor row-resize
- **Persistence**: None. The split always starts at the default 30/70 and is not saved to localStorage (Decision 5B)

### Acceptance Criteria
- [ ] LTT page has a draggable divider between upper and lower sections
- [ ] Default split matches current 30/70 layout
- [ ] Divider is visually obvious and provides cursor feedback on hover
- [ ] Both sections enforce minimum sizes (no collapsing)
- [ ] Content in each section scrolls independently
- [ ] Layout works on window resize

---

## Sub-Phase 2.5: PDF Viewer in New Tab ⭐⭐ Medium

### Objective
Preview PDF files in-browser via a dedicated viewer page that opens in a new browser tab, on both the LTT page (source cards) and RAG Database page (chunk list).

### Design Decision (Confirmed)
PDF viewer is a **dedicated route** (`/pdf-viewer`) that opens in a new browser tab. This avoids modal state management and provides a clean full-page PDF viewing experience. All "View PDF" links and inline citations open this page via `target="_blank"`.

### New Dependencies

| Package | Purpose | Weekly Downloads | Notes |
|---------|---------|------------------|-------|
| `react-pdf` | PDF rendering via PDF.js | ~1M | MIT, lightweight (309KB), Vite compatible |
| `pdfjs-dist` | PDF.js worker | (peer dep) | Required by react-pdf |

```bash
npm install react-pdf pdfjs-dist
```

### Changes Required

**Frontend only**: No backend changes (the existing `GET /chunks/{file_path}/pdf` endpoint serves PDFs).
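
This sub-phase adds a `getPdfViewerUrl()` helper and passes the PDF URL, page number, and title to the viewer as query params. A minimal sketch of that round trip follows. The function signatures, the `parsePdfViewerParams` name, and the default of page 1 are assumptions; only the `/pdf-viewer` route and the `url`, `page`, and `title` param names come from this plan.

```typescript
// Sketch only: signatures are assumptions, not the project's actual API.
function getPdfViewerUrl(pdfUrl: string, page?: number | null, title?: string): string {
  const params = new URLSearchParams();
  params.set("url", pdfUrl); // URLSearchParams percent-encodes special characters
  if (typeof page === "number" && page >= 1 && Math.floor(page) === page) {
    params.set("page", String(page));
  }
  if (title) {
    params.set("title", title);
  }
  return "/pdf-viewer?" + params.toString();
}

// Hypothetical inverse for the viewer page: reads the params back.
function parsePdfViewerParams(search: string): { url: string | null; page: number; title: string } {
  const params = new URLSearchParams(search);
  const parsedPage = parseInt(params.get("page") ?? "", 10);
  return {
    url: params.get("url"),
    // Assumed default: open on page 1 when the param is missing or invalid.
    page: !isNaN(parsedPage) && parsedPage >= 1 ? parsedPage : 1,
    title: params.get("title") ?? "",
  };
}
```

Building the query string with `URLSearchParams` rather than string concatenation covers the acceptance criterion about encoding and decoding special characters, for example spaces in filenames such as `NEC4 ACC.pdf`.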

| File | Change |
|------|--------|
| `frontend/src/pages/PdfViewerPage.tsx` | **NEW**: Dedicated full-page PDF viewer route |
| `frontend/src/App.tsx` | Add `/pdf-viewer` route |
| `frontend/src/components/ResponsePanel.tsx` | "View PDF" links point to `/pdf-viewer?url=...&page=N` with `target="_blank"` |
| `frontend/src/components/ChunkList.tsx` | Same as ResponsePanel |
| `frontend/src/lib/api.ts` | Add `getPdfViewerUrl()` helper |
| `frontend/vite.config.ts` | Configure PDF.js worker |
| `frontend/package.json` | Add react-pdf, pdfjs-dist |

**PdfViewerPage design**:
```
┌──────────────────────────────────────────────────────┐
│  ◀ Back to LTT      NEC4 ACC - Page 3 of 97      ▶   │
├──────────────────────────────────────────────────────┤
│                                                      │
│              ┌─────────────────────┐                 │
│              │                     │                 │
│              │   PDF rendered      │                 │
│              │   page content      │                 │
│              │                     │                 │
│              │                     │                 │
│              └─────────────────────┘                 │
│                                                      │
│              [Zoom -]  100%  [Zoom +]                │
└──────────────────────────────────────────────────────┘
```

**Query params**: `?url={chunkPdfUrl}&page={N}&title={filename}`

**Implementation**:
- Standalone page at `/pdf-viewer` route (not embedded in main layout)
- Uses `react-pdf` `<Document>` + `<Page>` components
- PDF URL from query param, validated against backend endpoint
- Page navigation controls (prev/next) with page number display
- Zoom controls (+/-)
- Title shows filename from query param
- "Back to LTT" link to return to the app
- No NavBar on this page (clean viewer experience)

**Integration points**:
- `ResponsePanel.tsx`: Change `href` from `getChunkPdfUrl()` to `getPdfViewerUrl()` with `target="_blank"`
- `ChunkList.tsx`: Same change
- Inline citations (Sub-Phase 2.6): Same URL pattern

### Acceptance Criteria
- [ ] Clicking "View PDF" opens a new browser tab with the PDF viewer page
- [ ] PDF renders correctly with original formatting preserved
- [ ] Page navigation (prev/next) works for multi-page PDFs
- [ ] Zoom controls work
- [ ] Page title shows filename and page number
- [ ] "Back to LTT" link returns to the app
- [ ] Works from both LTT and RAG Database pages
- [ ] URL query params properly encode/decode special characters
- [ ] PDF.js worker loads correctly in Vite build

---

## Sub-Phase 2.6: Inline Citations with Clickable Links ⭐⭐⭐ Hard

### Objective
Replace the current `[1]`, `[2]` citation format in RAG responses with `[filename, page N]` format that includes a clickable link to the source chunk PDF.

### Current Citation Flow
1. **Backend** (`rag.py`): Context chunks labeled `[1]`, `[2]`, `[3]`...
2. **LLM prompt**: "Cite the source name in [ ] for each point"
3. **LLM output**: Answer text contains `[1]`, `[NEC4 ACC]`, or similar bracketed citations
4. **Frontend**: ReactMarkdown renders citations as plain text (no linking)

### New Citation Flow
1. **Backend**: Context chunks labeled `[filename, page N]` with strict format
2. **LLM prompt**: "Cite sources as `[filename, page N]`" (strict, Decision 3A)
3. **LLM output**: Answer text contains `[NEC4 ACC.pdf, page 3]` format citations
4. **Frontend**: Custom ReactMarkdown renderer parses strict citation patterns, replaces with clickable links
5. **Click**: Opens PDF viewer page in new tab (Decision 4B) at the cited page

### Backend Changes

| File | Change |
|------|--------|
| `backend/app/services/rag.py` | Change context chunk labeling and citation instruction in prompt |
| `backend/app/models/common.py` | Add `chunk_index` mapping info if needed |
| `backend/app/models/query.py` | (Possibly) Add citation map to response |

**`rag.py`: Context building change** (lines 94-102):

Current:
```python
context_parts.append(
    f"[{i + 1}] Source: {source}\n"
    f"Summary: {summary}\n"
    f"Content: {chunk}\n"
)
```

New:
```python
source_name = meta.get("filename", "unknown")
page_num = meta.get("page_number")
citation_label = f"{source_name}, page {page_num}" if page_num else source_name
context_parts.append(
    f"[{citation_label}] Source: {source_name}\n"
    f"Summary: {summary}\n"
    f"Content: {chunk}\n"
)
```

**`rag.py`: Prompt change** (lines 106-113):

Current:
```python
f"Cite the source name in [ ] for each point.\n\n"
```

New:
```python
f"Cite your sources inline using the exact bracket labels provided, "
f"e.g. [filename, page N]. Place the citation at the end of each relevant point.\n\n"
```

**Important design decision**: We need to pass citation metadata (filename → chunk_file_path mapping) to the frontend so it can create clickable links. Two approaches:

- **Option A**: Include `chunk_file_path` in the source metadata (already done!). The frontend parses `[filename, page N]` from the answer text and looks up the matching source in the `sources[]` array to get `chunk_file_path`
- **Option B**: Add a `citation_map: Dict[str, CitationInfo]` to `QueryResponse`. More structured, but adds complexity

**Recommendation**: Option A. It is simpler and uses existing `sources[]` data. Frontend regex matches `[filename, page N]` patterns in answer text and cross-references with the `sources` array by filename + page_number. Links open in a new tab to the PDF viewer page (Decision 4B).
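
The Option A lookup can be sketched as a pure function that splits the answer into plain-text and citation segments, attaching a source only when filename and page both match. This is an illustrative sketch: `SourceRef`, `CitationSegment`, and `splitCitations` are hypothetical names, and the source shape is assumed from the `filename`, `page_number`, and `chunk_file_path` fields mentioned above.

```typescript
// Assumed shape of one entry in sources[]; field names follow this plan.
interface SourceRef {
  filename: string;
  page_number: number | null;
  chunk_file_path: string;
}

// One piece of the answer: plain text, or a citation with its matched source.
interface CitationSegment {
  text: string;
  source: SourceRef | null; // null for plain text and for unmatched citations
}

function splitCitations(answer: string, sources: SourceRef[]): CitationSegment[] {
  // Group 1: filename. Group 2: page number (absent for e.g. DOCX sources).
  const pattern = /\[([^\[\]]+?)(?:,\s*page\s+(\d+))?\]/g;
  const segments: CitationSegment[] = [];
  let last = 0;
  let m: RegExpExecArray | null;

  while ((m = pattern.exec(answer)) !== null) {
    const filename = m[1].trim();
    const page = m[2] ? parseInt(m[2], 10) : null;

    // Cross-reference by filename + page_number; unmatched citations stay plain text.
    let matched: SourceRef | null = null;
    for (const s of sources) {
      if (s.filename === filename && s.page_number === page) {
        matched = s;
        break;
      }
    }

    if (m.index > last) {
      segments.push({ text: answer.slice(last, m.index), source: null });
    }
    segments.push({ text: m[0], source: matched });
    last = m.index + m[0].length;
  }

  if (last < answer.length) {
    segments.push({ text: answer.slice(last), source: null });
  }
  return segments;
}
```

A renderer could then map segments with a non-null `source` to links that open in a new tab and emit every other segment as text, so citations the LLM invents never become broken links.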

### Frontend Changes

| File | Change |
|------|--------|
| `frontend/src/components/ResponsePanel.tsx` | Custom markdown renderer that replaces citation patterns with clickable links |
| `frontend/src/utils/citationParser.ts` | **NEW**: Regex parser for citation patterns |

**Citation parser** (`citationParser.ts`):
```typescript
// Matches patterns like [NEC4 ACC.pdf, page 3] or [meeting_notes.docx].
// Group 1 captures the filename; group 2 captures the page number when present.
const CITATION_PATTERN = /\[([^\[\]]+?)(?:,\s*page\s+(\d+))?\]/g

interface ParsedCitation {
  label: string                 // "NEC4 ACC.pdf, page 3"
  filename: string              // "NEC4 ACC.pdf"
  pageNumber: number | null     // 3
  chunkFilePath: string | null  // from sources lookup
}
```

**Rendering approach**:
- Use ReactMarkdown's `components` prop with a custom text renderer
- Intercept text nodes, split on citation pattern (strict format, Decision 3A)
- Replace `[filename, page N]` with `<a>` elements opening the PDF viewer in a new tab
- URL format: `/pdf-viewer?url={chunkPdfUrl}&page={N}&title={filename}`
- Fallback: If no matching source is found, render as plain text (graceful degradation)

```
Example rendered answer:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
• The estimated value threshold is HK$1,000,000
  [NEC4 ACC.pdf, page 3]  ←── clickable link (opens new tab)

• Prior instructions from the Client may override
  this threshold [NEC4 ACC.pdf, page 5]  ←── clickable
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
```

### Acceptance Criteria
- [ ] Backend context labels use `[filename, page N]` format instead of `[1]`
- [ ] LLM prompt instructs to use the new citation format
- [ ] Frontend parses `[filename, page N]` patterns from answer text
- [ ] Parsed citations are rendered as clickable links (blue, underlined)
- [ ] Clicking a citation opens the PDF viewer page in a new browser tab at the cited page
- [ ] If citation can't be matched to a source, renders as plain text (no broken links)
- [ ] The Sources section below the answer still shows full source cards
- [ ] Citation pattern handles: spaces in filenames, missing page numbers (DOCX), long filenames

### Risks & Mitigations

| Risk | Impact | Mitigation |
|------|--------|------------|
| LLM doesn't follow citation format exactly | Citations not parsed | Use flexible regex; fallback to plain text for unrecognized patterns |
| LLM invents citations not in sources | Clickable link to non-existent source | Only linkify citations that match an actual source; show others as text |
| Filenames with special characters break regex | Citation not detected | Use lazy regex matching; test with real filenames (spaces, dots, hyphens) |
| Long filenames make answer text hard to read | Poor UX | Optionally truncate filename in display (e.g., "NEC4 AC...pdf, page 3") |
| Page number is null (DOCX files) | Citation shows just filename | Show `[meeting_notes.docx]` without page number; still clickable for text preview |

---

## New Dependencies

### Frontend

| Package | Purpose | Version | Size |
|---------|---------|---------|------|
| `react-resizable-panels` | Resizable split panels (2.4) | `^4.10.0` | ~517KB |
| `react-pdf` | PDF rendering via PDF.js (2.5) | `^10.4.1` | ~309KB |
| `pdfjs-dist` | PDF.js worker (peer dep of react-pdf) | `^5.3.31` | n/a |

### Backend

| Package | Purpose | Already installed? |
|---------|---------|--------------------|
| (none) | All backend changes use existing LLM client | ✅ |

---

## Implementation Sequence

### Recommended Order (easiest first)

```
Phase 2.1 (Remove Upload)     ─┐
Phase 2.2 (Question Display)  ─┤── Can run in PARALLEL
Phase 2.3 (Questions)         ─┘
         │
         ▼
Phase 2.4 (Resizable Layout)  ─── After 2.1 (layout changes)
         │
         ▼
Phase 2.5 (PDF Viewer)        ─── After 2.4 (recommended order; standalone route)
         │
         ▼
Phase 2.6 (Inline Citations)  ─── After 2.5 (links to PDF viewer)
```

### Parallelization Opportunities
- **2.1, 2.2, 2.3** can run in parallel (independent frontend/backend changes)
- **2.4** depends on 2.1 (layout changes affect same file)
- **2.5** follows 2.4 in the recommended order but has no hard dependency on it (the viewer is a standalone route)
- **2.6** depends on 2.5 (citations link to PDF viewer page)

---

## Test Plan

### Backend Tests (New/Modified Files)

| File | Coverage |
|------|----------|
| `test_phase1_query_decomposer.py` | Update: verify sub-question generation instead of keywords |

### Frontend Tests (New/Modified Files)

| File | Coverage |
|------|----------|
| `QueryInput.test.tsx` | Submitted question display, clear on new input |
| `KeywordsDisplay.test.tsx` | Update: "Extracted Questions" label, numbered list rendering |
| `LTTPage.test.tsx` | Update: no IngestPanel, resizable layout |
| `PdfViewerPage.test.tsx` | **NEW**: PDF rendering, page nav, zoom, query params |
| `citationParser.test.ts` | **NEW**: Citation regex parsing, edge cases (spaces, nulls, special chars) |
| `ResponsePanel.test.tsx` | Update: inline citation rendering, clickable links |

### Acceptance Tests

| File | Coverage |
|------|----------|
| `test_acceptance_package2_questions.py` | Real LLM generates sub-questions from complex question |
| `test_acceptance_package2_citations.py` | Full flow: upload → query → answer has `[filename, page N]` citations |

---

## Decisions (Confirmed)

| # | Question | Decision |
|---|----------|----------|
| 1 | API field name for extracted questions | **B**: Rename to `extracted_questions`; cleaner naming, breaking change acceptable |
| 2 | PDF viewer integration | **New browser tab**: Dedicated viewer page opens in a new tab with react-pdf rendering |
| 3 | Citation pattern strictness | **A**: Strict `[filename, page N]`; enforce exact format in backend prompt and frontend parser |
| 4 | Citation link target | **B**: Open new tab; same dedicated PDF viewer page as "View PDF" links |
| 5 | Resizable panels persistence | **B**: Always default 30/70; no localStorage, keep simple |
| 6 | Citation display format in answer | **A**: Full `[filename, page N]`; most traceable for users |

### Decision 2 & 4 Implications

PDF viewer is a **dedicated route page** (`/pdf-viewer`) that opens in a new browser tab. This means:
- No modal component or state management needed
- No context/callback props threaded through components
- All PDF links (View PDF, inline citations) use `target="_blank"` pointing to the viewer page
- Viewer page receives PDF URL, page number, and title via query params
- `react-pdf` renders directly on this standalone page

## Open Questions

None. All decisions are confirmed.