legco_ai_assistant/.plans/package2_enhancement_plan.md

Package 2 Enhancement Plan

Source: User request (2026-04-24)
Scope: LTT page UX improvements + PDF preview + inline citations
Status: Complete (2.1, 2.2, 2.3, 2.4, 2.5, 2.6)


Objective

Enhance the LTT (Legislative Transcript Tracker) page and RAG Database page with 6 features, prioritized by difficulty (easiest first):

  1. Remove Upload from LTT — Remove IngestPanel from LTT page (upload lives on RAG Database page)
  2. Display Question Near Submit — Show the submitted question next to the submit button after clicking
  3. Extracted Questions — Rename "Keywords" to "Extracted Questions"; decompose into simplified sub-questions instead of keywords
  4. Adjustable Split Panel — Make the upper/lower LTT layout resizable by dragging a divider
  5. PDF Viewer — Preview PDFs in a new browser tab on both LTT and RAG Database pages instead of downloading
  6. Inline Citations — Replace [1] citations with [filename, page N] format with clickable links to source PDF

Current State

What Exists

  • LTT page at / with CSS Grid layout: grid-rows-[30%_1fr] grid-cols-2
    • Top-Left: Video placeholder (Phase 2)
    • Top-Right: QueryInput → KeywordsDisplay → IngestPanel (vertical stack)
    • Bottom (full width): ResponsePanel (answer + sources)
  • RAG Database page at /rag-database with document management + upload
  • QueryDecomposer extracts keywords (short phrases) from question
  • RAG prompt generates [1], [2] style citations in context chunks
  • LLM instructed: "Cite the source name in [ ] for each point"
  • Frontend renders citations as plain text via ReactMarkdown — NO linking
  • Source cards in ResponsePanel show filename, page number, "View PDF" link (opens in new tab = download)
  • ChunkList on RAG Database page has "View PDF" links (also opens in new tab)

What's Missing (Gaps This Plan Fills)

  • Upload button still on LTT page (redundant with RAG Database page)
  • No display of the submitted question after clicking submit
  • Keywords are not useful — user wants decomposed sub-questions
  • LTT layout is fixed (30%/70% split) — not adjustable
  • PDF links download the file instead of previewing in browser
  • Citations like [1] are meaningless to users — no traceability to source

Sub-Phase Breakdown

| Sub-Phase | Feature | Difficulty | Backend | Frontend | Status |
| --- | --- | --- | --- | --- | --- |
| 2.1 | Remove Upload from LTT | Easy | None | Remove IngestPanel from LTTPage | Done |
| 2.2 | Display Question Near Submit | Easy | None | Show submitted question text | Done |
| 2.3 | Extracted Questions | Medium | Prompt change | Rename + display questions | Done |
| 2.4 | Adjustable Split Panel | Medium | None | react-resizable-panels | Done |
| 2.5 | PDF Viewer (New Tab) | Medium | None | PdfViewerPage route + react-pdf | Done |
| 2.6 | Inline Citations | Hard | Prompt + response | Citation parser + rendering | Done |

Dependency Graph

2.1 (Remove Upload) ─────┐
2.2 (Question Display) ──┤
                          ├─► 2.4 (Resizable Layout) ─┐
2.3 (Questions) ──────────┤                            ├─► 2.6 (Inline Citations)
                          │                            │
                          └─► 2.5 (PDF Viewer) ───────┘
  • 2.1, 2.2, 2.3 are independent — can run in parallel
  • 2.4 should wait for 2.1 (layout changes after removing IngestPanel)
  • 2.5 is independent but benefits from 2.4's layout work
  • 2.6 is the hardest — benefits from 2.5 (PDF viewer) being available for linked citations

Sub-Phase 2.1: Remove Upload from LTT Easy

Objective

Remove the IngestPanel from LTT page since document upload now exists on the RAG Database page.

Changes Required

Frontend only — No backend changes.

| File | Change |
| --- | --- |
| frontend/src/pages/LTTPage.tsx | Remove IngestPanel import and component; remove useIngestDocument hook |

Current (LTTPage.tsx):

import { IngestPanel } from '../components/IngestPanel'
// ...
const ingestMutation = useIngestDocument()
// ...
<IngestPanel
  onUpload={handleFileUpload}
  isLoading={ingestMutation.isPending}
  success={...}
  error={...}
/>

After: Remove all of the above. The top-right section becomes just QueryInput + KeywordsDisplay.

Acceptance Criteria

  • LTT page no longer shows upload button
  • RAG Database page upload still works
  • All existing tests pass (update LTTPage tests if needed)

Sub-Phase 2.2: Display Question Near Submit Easy

Objective

After the user clicks submit, display the original question text next to/near the submit button so they remember what they asked.

Changes Required

Frontend only — No backend changes.

| File | Change |
| --- | --- |
| frontend/src/components/QueryInput.tsx | Display the last submitted question below the input after submission |

Implementation approach:

  • After successful submit, show the submitted question as static text below the input area
  • Clear when a new question is being typed
  • Style: subtle gray text, italic, with a small label like "Your question:"
┌──────────────────────────────────────────────┐
│ [textarea: type question here]      [Submit] │
│                                               │
│ Your question: "What is the NEC4 clause      │
│ about time extensions?"                       │
└──────────────────────────────────────────────┘
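
The show-then-clear behaviour described above can be sketched as a small pure reducer. This is only an illustration: the state shape, the action names, and the choice to empty the textarea on submit are assumptions, not taken from QueryInput.tsx.

```typescript
// Hypothetical state for the "Your question:" label; not the actual component state.
interface QuestionDisplayState {
  draft: string;                    // current textarea content
  submittedQuestion: string | null; // rendered as "Your question: ..." when set
}

type QuestionDisplayAction =
  | { type: "typed"; text: string }
  | { type: "submitted" };

const initialQuestionState: QuestionDisplayState = {
  draft: "",
  submittedQuestion: null,
};

function questionDisplayReducer(
  state: QuestionDisplayState,
  action: QuestionDisplayAction,
): QuestionDisplayState {
  switch (action.type) {
    case "typed":
      // Typing a new question clears the previously submitted one.
      return { draft: action.text, submittedQuestion: null };
    case "submitted": {
      const question = state.draft.trim();
      // Ignore empty submissions so the label never shows a blank question.
      if (question === "") return state;
      // Assumption: the textarea is emptied once the question is submitted.
      return { draft: "", submittedQuestion: question };
    }
  }
}
```

Keeping this logic in a pure function would let QueryInput.test.tsx cover "clear on new input" without rendering the component; inside the component it could be wired up with React's useReducer.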

Acceptance Criteria

  • After submit, the submitted question appears below the input
  • Text clears when user starts typing a new question
  • Works correctly during loading state
  • No layout shift or jank

Sub-Phase 2.3: Extracted Questions Medium

Objective

Rename "Keywords" to "Extracted Questions" and change the backend to decompose the user's question into simplified sub-questions instead of keyword phrases.

Current Behavior

  • Backend QueryDecomposer.decompose() prompt: "Given question: '{question}', extract key search keywords as JSON array"
  • Returns: ["NEC4", "time extension", "clause"]
  • Frontend: KeywordsDisplay component shows blue pills labeled "Extracted Keywords:"

New Behavior

  • Backend prompt: Decompose into 2-5 simplified sub-questions
  • Returns: ["What are the time extension provisions?", "What notice is required?", "How is extended time calculated?"]
  • Frontend: Rename to "Extracted Questions:" and display as numbered list

Backend Changes

| File | Change |
| --- | --- |
| backend/app/services/query_decomposer.py | Change prompt to generate sub-questions instead of keywords |
| backend/app/models/query.py | Rename keywords field to extracted_questions (or keep keywords but add alias) |
| backend/app/routers/query.py | Update variable naming if model changes |

New prompt (replace line 54 in query_decomposer.py):

prompt = (
    f"Given this question: '{question}'\n\n"
    f"Break it down into 2-5 simplified sub-questions that would help "
    f"search for relevant information. Each sub-question should be short "
    f"and focused on one aspect. Return as a JSON array of strings."
)

Decision confirmed: Rename API field from keywords to extracted_questions (Decision 1B).

Backend Changes

| File | Change |
| --- | --- |
| backend/app/services/query_decomposer.py | Change prompt to generate sub-questions instead of keywords |
| backend/app/models/query.py | Rename keywords: List[str] to extracted_questions: List[str] |
| backend/app/routers/query.py | Update QueryResponse usage: keywords → extracted_questions |

Frontend Changes

| File | Change |
| --- | --- |
| frontend/src/components/KeywordsDisplay.tsx | Rename to ExtractedQuestionsDisplay.tsx, change pill style to numbered list style |
| frontend/src/lib/api.ts | Update QueryResponse type: keywords → extracted_questions |
| frontend/src/lib/queries.tsx | Update QueryResponse type reference |
| frontend/src/types/index.ts | Rename keywords to extracted_questions in QueryResponse |
| frontend/src/pages/LTTPage.tsx | Update prop: keywords → extracted_questions |

Rename:

  • "Extracted Keywords:" → "Extracted Questions:"
  • data-testid="keywords-*" → data-testid="extracted-questions-*"
  • Pill badges → numbered question list with subtle styling

Acceptance Criteria

  • Backend returns 2-5 sub-questions instead of keywords
  • Frontend displays "Extracted Questions:" label
  • Questions display as a numbered list (1. 2. 3.)
  • Graceful fallback if LLM returns empty list
  • Existing query pipeline still works (retrieve uses these as search terms)
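
One way to read the "graceful fallback" criterion is a small normalisation step applied to whatever list the LLM returns. The sketch below is hypothetical: the helper name, the cap of five (taken from the 2-5 target above), and the fallback to the original question are assumptions, and the same logic could equally live in the backend decomposer.

```typescript
// Hypothetical helper, not part of the existing codebase.
const MAX_EXTRACTED_QUESTIONS = 5; // mirrors the "2-5 sub-questions" target

function normalizeExtractedQuestions(raw: unknown, originalQuestion: string): string[] {
  const candidates: unknown[] = Array.isArray(raw) ? raw : [];
  const cleaned: string[] = [];
  for (const item of candidates) {
    if (typeof item !== "string") continue; // drop non-string entries
    const text = item.trim();
    // Skip blanks and exact duplicates.
    if (text !== "" && cleaned.indexOf(text) === -1) cleaned.push(text);
  }
  if (cleaned.length === 0) {
    // Assumed fallback: search with the original question when nothing usable came back.
    const fallback = originalQuestion.trim();
    return fallback === "" ? [] : [fallback];
  }
  return cleaned.slice(0, MAX_EXTRACTED_QUESTIONS);
}
```

With this shape, the numbered list always has something to render and the retrieval step always has at least one search term, provided the user's question is non-empty.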

Sub-Phase 2.4: Adjustable Split Panel Medium

Objective

Replace the fixed CSS Grid layout on LTT page with a resizable split panel, allowing users to drag the divider between the upper section (video + query) and lower section (response).

New Dependency

| Package | Purpose | Weekly Downloads | Notes |
| --- | --- | --- | --- |
| react-resizable-panels | Resizable split panels | ~34.7M | By Brian Vaughn (React core team), zero deps, MIT |

npm install react-resizable-panels

Changes Required

Frontend only — No backend changes.

| File | Change |
| --- | --- |
| frontend/src/pages/LTTPage.tsx | Replace CSS Grid with PanelGroup + Panel + PanelResizeHandle |
| frontend/package.json | Add react-resizable-panels |

Current layout (LTTPage.tsx):

<div className="h-full grid grid-rows-[30%_1fr] grid-cols-2 bg-gray-50">
  <div className="border-r border-b ...">VideoPlaceholder</div>
  <div className="border-b ...">QueryInput + KeywordsDisplay + IngestPanel</div>
  <div className="col-span-2 ...">ResponsePanel</div>
</div>

New layout:

import { Panel, PanelGroup, PanelResizeHandle } from 'react-resizable-panels'

<div className="h-full flex flex-col bg-gray-50">
  <PanelGroup direction="vertical">
    {/* Upper section: video + query */}
    <Panel defaultSize={30} minSize={15} maxSize={60}>
      <div className="h-full grid grid-cols-2">
        <div className="border-r ...">VideoPlaceholder</div>
        <div className="...">QueryInput + KeywordsDisplay</div>
      </div>
    </Panel>
    
    {/* Draggable divider */}
    <PanelResizeHandle className="h-2 bg-gray-200 hover:bg-blue-300 cursor-row-resize transition-colors" />
    
    {/* Lower section: response */}
    <Panel minSize={20}>
      <div className="h-full overflow-y-auto">
        <ResponsePanel ... />
      </div>
    </Panel>
  </PanelGroup>
</div>

Design decisions:

  • Default split: 30% upper / 70% lower (same as current fixed layout)
  • Min/Max: Upper panel 15-60%, lower panel 20%+ (prevents collapsing)
  • Handle style: 8px horizontal bar, gray by default, blue on hover, cursor row-resize
  • Persistence: None. The split always starts at the default 30/70 and is not saved to localStorage (Decision 5)
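
The size limits above interact in a way that is easy to check with a few lines of arithmetic. The function below is only a sketch of the constraint logic; the library enforces its own minSize/maxSize props, and the function name is hypothetical.

```typescript
// Percentages from the design decisions above.
const UPPER_MIN = 15;
const UPPER_MAX = 60;
const LOWER_MIN = 20;

// Hypothetical helper: returns the split that results from dragging the divider
// so that the upper panel would take `requestedUpper` percent of the height.
function clampUpperPanelSize(requestedUpper: number): { upper: number; lower: number } {
  // The upper panel can never grow past whichever limit is tighter.
  const upperCeiling = Math.min(UPPER_MAX, 100 - LOWER_MIN);
  const upper = Math.min(Math.max(requestedUpper, UPPER_MIN), upperCeiling);
  return { upper, lower: 100 - upper };
}
```

With these values the upper maximum (60%) is the binding limit: the lower panel never drops below 40%, so its 20% minimum only acts as a safety net if the upper limit is later relaxed.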

Acceptance Criteria

  • LTT page has a draggable divider between upper and lower sections
  • Default split matches current 30/70 layout
  • Divider is visually obvious and provides cursor feedback on hover
  • Both sections enforce minimum sizes (no collapsing)
  • Content in each section scrolls independently
  • Layout works on window resize

Sub-Phase 2.5: PDF Viewer in New Tab Medium

Objective

Preview PDF files in-browser via a dedicated viewer page that opens in a new browser tab, on both the LTT page (source cards) and RAG Database page (chunk list).

Design Decision (Confirmed)

PDF viewer is a dedicated route (/pdf-viewer) that opens in a new browser tab. This avoids modal state management and provides a clean full-page PDF viewing experience. All "View PDF" links and inline citations open this page via target="_blank".

New Dependencies

| Package | Purpose | Weekly Downloads | Notes |
| --- | --- | --- | --- |
| react-pdf | PDF rendering via PDF.js | ~1M | MIT, lightweight (309KB), Vite compatible |
| pdfjs-dist | PDF.js worker (peer dep) | | Required by react-pdf |

npm install react-pdf pdfjs-dist

Changes Required

Frontend only — No backend changes (the existing GET /chunks/{file_path}/pdf endpoint serves PDFs).

| File | Change |
| --- | --- |
| frontend/src/pages/PdfViewerPage.tsx | NEW: Dedicated full-page PDF viewer route |
| frontend/src/App.tsx | Add /pdf-viewer route |
| frontend/src/components/ResponsePanel.tsx | "View PDF" links point to /pdf-viewer?url=...&page=N with target="_blank" |
| frontend/src/components/ChunkList.tsx | Same as ResponsePanel |
| frontend/src/lib/api.ts | Add getPdfViewerUrl() helper |
| frontend/vite.config.ts | Configure PDF.js worker |
| frontend/package.json | Add react-pdf, pdfjs-dist |

PdfViewerPage design:

┌──────────────────────────────────────────────────────┐
│  ◀ Back to LTT          NEC4 ACC — Page 3 of 97  ▶  │
├──────────────────────────────────────────────────────┤
│                                                       │
│           ┌─────────────────────┐                     │
│           │                     │                     │
│           │    PDF rendered     │                     │
│           │    page content     │                     │
│           │                     │                     │
│           │                     │                     │
│           └─────────────────────┘                     │
│                                                       │
│              [Zoom -] 100% [Zoom +]                   │
└──────────────────────────────────────────────────────┘

Query params: ?url=<encoded PDF URL>&page=<page number>&title=<filename>

Implementation:

  • Standalone page at /pdf-viewer route (not embedded in main layout)
  • Uses react-pdf <Document> + <Page> components
  • PDF URL from query param, validated against backend endpoint
  • Page navigation controls (prev/next) with page number display
  • Zoom controls (+/-)
  • Title shows filename from query param
  • "Back to LTT" link to return to the app
  • No NavBar on this page (clean viewer experience)
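
The query-param contract and the URL validation step could look roughly like the following. This is a sketch under assumptions: the plan names getPdfViewerUrl() but not its parameters, so the signature here is a guess, and parsePdfViewerParams and its allowedPrefix argument (a stand-in for the backend PDF endpoint base) are hypothetical.

```typescript
interface PdfViewerParams {
  url: string;
  page: number;
  title: string;
}

// Builds the viewer link from the documented query params (url, page, title).
// Assumed signature; a missing page number defaults to page 1.
function getPdfViewerUrl(pdfUrl: string, page: number | null, title: string): string {
  const params = new URLSearchParams();
  params.set("url", pdfUrl);
  params.set("page", String(page ?? 1));
  params.set("title", title);
  return `/pdf-viewer?${params.toString()}`;
}

// Parses the viewer page's query string. Returns null when the PDF URL is
// missing or does not start with the allowed backend prefix.
function parsePdfViewerParams(search: string, allowedPrefix: string): PdfViewerParams | null {
  const params = new URLSearchParams(search);
  const url = params.get("url");
  if (url === null || !url.startsWith(allowedPrefix)) return null;
  const rawPage = Number.parseInt(params.get("page") ?? "1", 10);
  // Fall back to the first page for missing, non-numeric, or non-positive values.
  const page = Number.isInteger(rawPage) && rawPage >= 1 ? rawPage : 1;
  return { url, page, title: params.get("title") ?? "Document" };
}
```

URLSearchParams handles the encoding and decoding of spaces and special characters in filenames, which is the part the acceptance criteria single out. On the viewer page, the `search` argument would typically come from window.location.search.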

Integration points:

  • ResponsePanel.tsx: Change href from getChunkPdfUrl() to getPdfViewerUrl() with target="_blank"
  • ChunkList.tsx: Same change
  • Inline citations (Sub-Phase 2.6): Same URL pattern

Acceptance Criteria

  • Clicking "View PDF" opens a new browser tab with the PDF viewer page
  • PDF renders correctly with original formatting preserved
  • Page navigation (prev/next) works for multi-page PDFs
  • Zoom controls work
  • Page title shows filename and page number
  • "Back to LTT" link returns to the app
  • Works from both LTT and RAG Database pages
  • URL query params properly encode/decode special characters
  • PDF.js worker loads correctly in Vite build

Sub-Phase 2.6: Inline Citations Hard

Objective

Replace the current [1], [2] citation format in RAG responses with [filename, page N] format that includes a clickable link to the source chunk PDF.

Current Citation Flow

  1. Backend (rag.py): Context chunks labeled [1], [2], [3]...
  2. LLM prompt: "Cite the source name in [ ] for each point"
  3. LLM output: Answer text contains [1], [NEC4 ACC], or similar bracketed citations
  4. Frontend: ReactMarkdown renders citations as plain text — no linking

New Citation Flow

  1. Backend: Context chunks labeled [filename, page N] with strict format
  2. LLM prompt: "Cite sources as [filename, page N]" (strict — Decision 3A)
  3. LLM output: Answer text contains [NEC4 ACC.pdf, page 3] format citations
  4. Frontend: Custom ReactMarkdown renderer parses strict citation patterns, replaces with clickable links
  5. Click: Opens PDF viewer page in new tab (Decision 4B) at the cited page

Backend Changes

| File | Change |
| --- | --- |
| backend/app/services/rag.py | Change context chunk labeling and citation instruction in prompt |
| backend/app/models/common.py | Add chunk_index mapping info if needed |
| backend/app/models/query.py | (Possibly) Add citation map to response |

rag.py — Context building change (lines 94-102):

Current:

context_parts.append(
    f"[{i + 1}] Source: {source}\n"
    f"Summary: {summary}\n"
    f"Content: {chunk}\n"
)

New:

source_name = meta.get("filename", "unknown")
page_num = meta.get("page_number")
# Compare against None explicitly so a legitimate page number of 0 is not dropped.
citation_label = f"{source_name}, page {page_num}" if page_num is not None else source_name

context_parts.append(
    f"[{citation_label}] Source: {source_name}\n"
    f"Summary: {summary}\n"
    f"Content: {chunk}\n"
)

rag.py — Prompt change (lines 106-113):

Current:

f"Cite the source name in [ ] for each point.\n\n"

New:

f"Cite your sources inline using the exact bracket labels provided, "
f"e.g. [filename, page N]. Place the citation at the end of each relevant point.\n\n"

Important design decision: We need to pass citation metadata (filename → chunk_file_path mapping) to the frontend so it can create clickable links. Two approaches:

  • Option A: Include chunk_file_path in the source metadata (already done!) — frontend parses [filename, page N] from answer text, looks up matching source in sources[] array to get chunk_file_path
  • Option B: Add a citation_map: Dict[str, CitationInfo] to QueryResponse — more structured but adds complexity

Recommendation: Option A — simpler, uses existing sources[] data. Frontend regex matches [filename, page N] patterns in answer text and cross-references with sources array by filename + page_number. Links open in new tab to PDF viewer page (Decision 4B).

Frontend Changes

| File | Change |
| --- | --- |
| frontend/src/components/ResponsePanel.tsx | Custom markdown renderer that replaces citation patterns with clickable links |
| frontend/src/utils/citationParser.ts | NEW: Regex parser for citation patterns |

Citation parser (citationParser.ts):

// Matches patterns like [NEC4 ACC.pdf, page 3] or [meeting_notes.docx]
const CITATION_PATTERN = /\[([^\]]+?(?:,\s*page\s+\d+)?)\]/g

interface ParsedCitation {
  label: string        // "NEC4 ACC.pdf, page 3"
  filename: string     // "NEC4 ACC.pdf"
  pageNumber: number | null  // 3
  chunkFilePath: string | null  // from sources lookup
}

Rendering approach:

  • Use ReactMarkdown's components prop with a custom text renderer
  • Intercept text nodes, split on citation pattern (strict format — Decision 3A)
  • Replace [filename, page N] with <a> elements opening PDF viewer in new tab
  • URL format: /pdf-viewer?url={chunkPdfUrl}&page={N}&title={filename}
  • Fallback: If no matching source found, render as plain text (graceful degradation)
Example rendered answer:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
• The estimated value threshold is HK$1,000,000 
  [NEC4 ACC.pdf, page 3] ←── clickable link (opens new tab)
• Prior instructions from the Client may override 
  this threshold [NEC4 ACC.pdf, page 5] ←── clickable
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
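
The parse-and-lookup step recommended under Option A can be sketched as a pure function that splits the answer into text and citation segments. This is an illustration rather than the actual citationParser.ts: the Source shape is inferred from the fields the plan mentions, the function and type names are hypothetical, and the regex here captures filename and page separately, which differs from the pattern shown earlier.

```typescript
// Assumed shape of an entry in sources[]; field names follow the plan's wording.
interface Source {
  filename: string;
  page_number: number | null;
  chunk_file_path: string;
}

type Segment =
  | { kind: "text"; text: string }
  | { kind: "citation"; label: string; source: Source };

// Strict format: [filename] or [filename, page N]. Filenames may contain spaces and dots.
const CITATION_SOURCE = "\\[([^\\[\\]]+?)(?:,\\s*page\\s+(\\d+))?\\]";

function splitAnswerIntoSegments(answer: string, sources: Source[]): Segment[] {
  const segments: Segment[] = [];
  // Append plain text, merging with a preceding text segment if there is one.
  const pushText = (text: string): void => {
    if (text === "") return;
    const last = segments[segments.length - 1];
    if (last && last.kind === "text") last.text += text;
    else segments.push({ kind: "text", text });
  };

  const pattern = new RegExp(CITATION_SOURCE, "g"); // fresh instance, no shared lastIndex
  let cursor = 0;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(answer)) !== null) {
    pushText(answer.slice(cursor, match.index));
    const filename = match[1].trim();
    const page = match[2] !== undefined ? Number(match[2]) : null;
    const source = sources.find(
      (s) => s.filename === filename && s.page_number === page,
    );
    if (source) {
      segments.push({ kind: "citation", label: match[0].slice(1, -1), source });
    } else {
      // Unmatched or invented citations stay as plain text (no broken links).
      pushText(match[0]);
    }
    cursor = match.index + match[0].length;
  }
  pushText(answer.slice(cursor));
  return segments;
}
```

A renderer would then map each citation segment to an `<a target="_blank">` pointing at the PDF viewer URL built from `source.chunk_file_path`, and emit text segments unchanged. Because only citations that match an entry in sources[] become links, legacy `[1]` markers and hallucinated filenames degrade to plain text.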

Acceptance Criteria

  • Backend context labels use [filename, page N] format instead of [1]
  • LLM prompt instructs to use the new citation format
  • Frontend parses [filename, page N] patterns from answer text
  • Parsed citations are rendered as clickable links (blue, underlined)
  • Clicking a citation opens the PDF viewer page in a new browser tab at the cited page
  • If citation can't be matched to a source, renders as plain text (no broken links)
  • The Sources section below the answer still shows full source cards
  • Citation pattern handles: spaces in filenames, missing page numbers (DOCX), long filenames

Risks & Mitigations

| Risk | Impact | Mitigation |
| --- | --- | --- |
| LLM doesn't follow citation format exactly | Citations not parsed | Use flexible regex; fallback to plain text for unrecognized patterns |
| LLM invents citations not in sources | Clickable link to non-existent source | Only linkify citations that match an actual source; show others as text |
| Filenames with special characters break regex | Citation not detected | Use lazy regex matching; test with real filenames (spaces, dots, hyphens) |
| Long filenames make answer text hard to read | Poor UX | Optionally truncate filename in display (e.g., "NEC4 AC...pdf, page 3") |
| Page number is null (DOCX files) | Citation shows just filename | Show [meeting_notes.docx] without page number; still clickable for text preview |

New Dependencies

Frontend

| Package | Purpose | Version | Size |
| --- | --- | --- | --- |
| react-resizable-panels | Resizable split panels (2.4) | ^4.10.0 | ~517KB |
| react-pdf | PDF rendering via PDF.js (2.5) | ^10.4.1 | ~309KB |
| pdfjs-dist | PDF.js worker (peer dep of react-pdf) | ^5.3.31 | |

Backend

Package Purpose Already installed?
(none) All backend changes use existing LLM client

Implementation Sequence

Phase 2.1 (Remove Upload)     ─┐
Phase 2.2 (Question Display)  ─┤── Can run in PARALLEL
Phase 2.3 (Questions)         ─┘
         │
         ▼
Phase 2.4 (Resizable Layout) ─── After 2.1 (layout changes)
         │
         ▼
Phase 2.5 (PDF Viewer) ───────── After 2.4 (uses new layout)
         │
         ▼
Phase 2.6 (Inline Citations) ─── After 2.5 (links to PDF viewer)

Parallelization Opportunities

  • 2.1, 2.2, 2.3 can run in parallel (independent frontend/backend changes)
  • 2.4 depends on 2.1 (layout changes affect same file)
  • 2.5 is sequenced after 2.4 (independent, since the viewer is a standalone page, but it benefits from 2.4's layout work)
  • 2.6 depends on 2.5 (citations link to PDF viewer page)

Test Plan

Backend Tests (New/Modified Files)

| File | Coverage |
| --- | --- |
| test_phase1_query_decomposer.py | Update: verify sub-question generation instead of keywords |

Frontend Tests (New/Modified Files)

| File | Coverage |
| --- | --- |
| QueryInput.test.tsx | Submitted question display, clear on new input |
| KeywordsDisplay.test.tsx | Update: "Extracted Questions" label, numbered list rendering |
| LTTPage.test.tsx | Update: no IngestPanel, resizable layout |
| PdfViewerPage.test.tsx | NEW: PDF rendering, page nav, zoom, query params |
| citationParser.test.ts | NEW: Citation regex parsing, edge cases (spaces, nulls, special chars) |
| ResponsePanel.test.tsx | Update: inline citation rendering, clickable links |

Acceptance Tests

| File | Coverage |
| --- | --- |
| test_acceptance_package2_questions.py | Real LLM generates sub-questions from complex question |
| test_acceptance_package2_citations.py | Full flow: upload → query → answer has [filename, page N] citations |

Decisions (Confirmed)

| # | Question | Decision |
| --- | --- | --- |
| 1 | API field name for extracted questions | B: Rename to extracted_questions (cleaner naming, breaking change acceptable) |
| 2 | PDF viewer integration | New browser tab: dedicated viewer page opens in a new tab with react-pdf rendering |
| 3 | Citation pattern strictness | A: Strict [filename, page N]; enforce exact format in backend prompt and frontend parser |
| 4 | Citation link target | B: Open new tab; same dedicated PDF viewer page as "View PDF" links |
| 5 | Resizable panels persistence | B: Always default 30/70; no localStorage, keep simple |
| 6 | Citation display format in answer | A: Full [filename, page N]; most traceable for users |

Decision 2 & 4 Implications

PDF viewer is a dedicated route page (/pdf-viewer) that opens in a new browser tab. This means:

  • No modal component or state management needed
  • No context/callback props threaded through components
  • All PDF links (View PDF, inline citations) use target="_blank" pointing to the viewer page
  • Viewer page receives PDF URL, page number, and title via query params
  • react-pdf renders directly on this standalone page

Open Questions

None — all decisions confirmed.