# Package 2 Enhancement Plan

**Source**: User request (2026-04-24)

**Scope**: LTT page UX improvements + PDF preview + inline citations

**Status**: 🔄 In Progress (2.1 ✅, 2.2 ✅, 2.3 ✅, 2.4 ✅, 2.5 ✅, 2.6 📋 Pending)

---

## Objective

Enhance the LTT (Legislative Transcript Tracker) page and RAG Database page with 6 features, prioritized by difficulty (easiest first):

1. **Remove Upload from LTT** — Remove IngestPanel from LTT page (upload lives on RAG Database page)
2. **Display Question Near Submit** — Show the submitted question next to the submit button after clicking
3. **Extracted Questions** — Rename "Keywords" to "Extracted Questions"; decompose into simplified sub-questions instead of keywords
4. **Adjustable Split Panel** — Make the upper/lower LTT layout resizable by dragging a divider
5. **PDF Viewer** — Preview PDFs in a new browser tab on both LTT and RAG Database pages instead of downloading
6. **Inline Citations** — Replace `[1]` citations with `[filename, page N]` format with clickable links to source PDF

---

## Current State

### What Exists

- LTT page at `/` with CSS Grid layout: `grid-rows-[30%_1fr] grid-cols-2`
  - Top-Left: Video placeholder (Phase 2)
  - Top-Right: QueryInput → KeywordsDisplay → IngestPanel (vertical stack)
  - Bottom (full width): ResponsePanel (answer + sources)
- RAG Database page at `/rag-database` with document management + upload
- QueryDecomposer extracts keywords (short phrases) from the question
- RAG prompt labels context chunks with `[1]`, `[2]`-style markers
- LLM instructed: "Cite the source name in [ ] for each point"
- Frontend renders citations as plain text via ReactMarkdown — NO linking
- Source cards in ResponsePanel show filename, page number, and a "View PDF" link (opens in a new tab, which downloads the file)
- ChunkList on RAG Database page has "View PDF" links (same new-tab download behavior)

### What's Missing (Gaps This Plan Fills)

- Upload button still on LTT page (redundant with RAG Database page)
- No display of the submitted question after clicking submit
- Keywords are not useful — user wants decomposed sub-questions
- LTT layout is fixed (30%/70% split) — not adjustable
- PDF links download the file instead of previewing in browser
- Citations like `[1]` are meaningless to users — no traceability to source

---

## Sub-Phase Breakdown

| Sub-Phase | Feature | Difficulty | Backend | Frontend | Status |
|-----------|---------|------------|---------|----------|--------|
| 2.1 | Remove Upload from LTT | ⭐ Easy | None | Remove IngestPanel from LTTPage | ✅ Done |
| 2.2 | Display Question Near Submit | ⭐ Easy | None | Show submitted question text | ✅ Done |
| 2.3 | Extracted Questions | ⭐⭐ Medium | Prompt change | Rename + display questions | ✅ Done |
| 2.4 | Adjustable Split Panel | ⭐⭐ Medium | None | react-resizable-panels | ✅ Done |
| 2.5 | PDF Viewer (New Tab) | ⭐⭐ Medium | None | PdfViewerPage route + react-pdf | ✅ Done |
| 2.6 | Inline Citations | ⭐⭐⭐ Hard | Prompt + response | Citation parser + rendering | 📋 Pending |

### Dependency Graph

```
2.1 (Remove Upload) ─────┐
2.2 (Question Display) ──┤
                         ├─► 2.4 (Resizable Layout) ─┐
2.3 (Questions) ─────────┤                           ├─► 2.6 (Inline Citations)
                         │                           │
                         └─► 2.5 (PDF Viewer) ───────┘
```

- **2.1, 2.2, 2.3** are independent — can run in parallel
- **2.4** should wait for 2.1 (layout changes after removing IngestPanel)
- **2.5** does not depend on 2.4: the viewer is a standalone route outside the LTT layout
- **2.6** is the hardest — benefits from 2.5 (PDF viewer) being available for linked citations

---

## Sub-Phase 2.1: Remove Upload from LTT ⭐ Easy

### Objective

Remove the IngestPanel from LTT page since document upload now exists on the RAG Database page.

### Changes Required

**Frontend only** — No backend changes.

| File | Change |
|------|--------|
| `frontend/src/pages/LTTPage.tsx` | Remove IngestPanel import and component; remove `useIngestDocument` hook |

**Current** (LTTPage.tsx):

```tsx
import { IngestPanel } from '../components/IngestPanel'
// ...
const ingestMutation = useIngestDocument()
// ...
<IngestPanel
  onUpload={handleFileUpload}
  isLoading={ingestMutation.isPending}
  success={...}
  error={...}
/>
```

**After**: Remove all of the above. The top-right section becomes just QueryInput + KeywordsDisplay.

### Acceptance Criteria

- [ ] LTT page no longer shows upload button
- [ ] RAG Database page upload still works
- [ ] All existing tests pass (update LTTPage tests if needed)

---

## Sub-Phase 2.2: Display Question Near Submit ⭐ Easy

### Objective

After the user clicks submit, display the original question text next to or near the submit button so they remember what they asked.

### Changes Required

**Frontend only** — No backend changes.

| File | Change |
|------|--------|
| `frontend/src/components/QueryInput.tsx` | Display the last submitted question below the input after submission |

**Implementation approach**:

- After successful submit, show the submitted question as static text below the input area
- Clear when a new question is being typed
- Style: subtle gray text, italic, with a small label like "Your question:"

```
┌──────────────────────────────────────────────┐
│ [textarea: type question here]      [Submit] │
│                                              │
│ Your question: "What is the NEC4 clause      │
│                 about time extensions?"      │
└──────────────────────────────────────────────┘
```
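The clear-on-typing behavior above can be modeled as a small reducer. This is a minimal sketch, assuming QueryInput owns the textarea draft; the state shape and the names `queryInputReducer`, `QueryInputState`, and `QueryInputAction` are hypothetical, not the component's current API.

```typescript
// Hypothetical state model for QueryInput (names are illustrative, not the real component API).
export interface QueryInputState {
  draft: string                    // current textarea contents
  submittedQuestion: string | null // last submitted question, shown under "Your question:"
}

export type QueryInputAction =
  | { type: 'typed'; value: string }
  | { type: 'submitted' }

export const initialQueryInputState: QueryInputState = { draft: '', submittedQuestion: null }

export function queryInputReducer(state: QueryInputState, action: QueryInputAction): QueryInputState {
  if (action.type === 'submitted') {
    const question = state.draft.trim()
    // Ignore empty submissions so the label never echoes an empty string.
    if (question === '') return state
    return { ...state, submittedQuestion: question }
  }
  // Any keystroke after a submit starts a new question, so the echo is cleared.
  return { draft: action.value, submittedQuestion: null }
}
```

In the component this would be wired with React's `useReducer(queryInputReducer, initialQueryInputState)`. Because nothing clears `submittedQuestion` until the next keystroke, the echo stays visible while the query is loading.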

### Acceptance Criteria

- [ ] After submit, the submitted question appears below the input
- [ ] Text clears when user starts typing a new question
- [ ] Works correctly during loading state
- [ ] No layout shift or jank

---

## Sub-Phase 2.3: Extracted Questions ⭐⭐ Medium

### Objective

Rename "Keywords" to "Extracted Questions" and change the backend to decompose the user's question into simplified sub-questions instead of keyword phrases.

### Current Behavior

- Backend `QueryDecomposer.decompose()` prompt: `"Given question: '{question}', extract key search keywords as JSON array"`
- Returns: `["NEC4", "time extension", "clause"]`
- Frontend: `KeywordsDisplay` component shows blue pills labeled "Extracted Keywords:"

### New Behavior

- Backend prompt: Decompose into 2-5 simplified sub-questions
- Returns: `["What are the time extension provisions?", "What notice is required?", "How is extended time calculated?"]`
- Frontend: Rename to "Extracted Questions:" and display as numbered list

### Backend Changes

**Decision confirmed**: Rename API field from `keywords` to `extracted_questions` (Decision 1B).

| File | Change |
|------|--------|
| `backend/app/services/query_decomposer.py` | Change prompt to generate sub-questions instead of keywords |
| `backend/app/models/query.py` | Rename `keywords: List[str]` to `extracted_questions: List[str]` |
| `backend/app/routers/query.py` | Update `QueryResponse` usage: `keywords` → `extracted_questions` |

**New prompt** (replace line 54 in `query_decomposer.py`):

```python
# `question` is the user's original question string.
prompt = (
    f"Given this question: '{question}'\n\n"
    "Break it down into 2-5 simplified sub-questions that would help "
    "search for relevant information. Each sub-question should be short "
    "and focused on one aspect. Return as a JSON array of strings."
)
```

### Frontend Changes

| File | Change |
|------|--------|
| `frontend/src/components/KeywordsDisplay.tsx` | Rename to `ExtractedQuestionsDisplay.tsx`, change pill style to numbered list style |
| `frontend/src/lib/api.ts` | Update `QueryResponse` type: `keywords` → `extracted_questions` |
| `frontend/src/lib/queries.tsx` | Update `QueryResponse` type reference |
| `frontend/src/types/index.ts` | Rename `keywords` to `extracted_questions` in `QueryResponse` |
| `frontend/src/pages/LTTPage.tsx` | Update prop: `keywords` → `extracted_questions` |

**Rename**:

- "Extracted Keywords:" → "Extracted Questions:"
- `data-testid="keywords-*"` → `data-testid="extracted-questions-*"`
- Pill badges → numbered question list with subtle styling
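The numbered-list rendering and the empty-list fallback stay simple if the frontend normalizes the field before rendering. A minimal sketch, assuming `extracted_questions` may arrive missing, empty, or with blank or duplicate entries; `normalizeExtractedQuestions` is a hypothetical helper, and the cap of 5 mirrors the 2-5 range in the prompt.

```typescript
// Hypothetical helper: tidies the `extracted_questions` array before it is rendered.
const MAX_QUESTIONS = 5 // mirrors the 2-5 range requested in the backend prompt

export function normalizeExtractedQuestions(raw: unknown): string[] {
  if (!Array.isArray(raw)) return [] // missing or malformed field -> empty list
  const seen = new Set<string>()
  const questions: string[] = []
  for (const item of raw) {
    if (typeof item !== 'string') continue // drop non-string entries
    const question = item.trim()
    const key = question.toLowerCase()
    if (question === '' || seen.has(key)) continue // drop blanks and case-insensitive repeats
    seen.add(key)
    questions.push(question)
    if (questions.length === MAX_QUESTIONS) break
  }
  return questions
}
```

Rendering the result inside an `<ol>` gives the 1. 2. 3. numbering without string formatting, and an empty array is the signal to show a fallback line instead of an empty label.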

### Acceptance Criteria

- [ ] Backend returns 2-5 sub-questions instead of keywords
- [ ] Frontend displays "Extracted Questions:" label
- [ ] Questions display as a numbered list (1. 2. 3.)
- [ ] Graceful fallback if LLM returns empty list
- [ ] Existing query pipeline still works (retrieval uses these sub-questions as search terms)

---

## Sub-Phase 2.4: Adjustable Split Panel ⭐⭐ Medium

### Objective

Replace the fixed CSS Grid layout on LTT page with a resizable split panel, allowing users to drag the divider between the upper section (video + query) and lower section (response).

### New Dependency

| Package | Purpose | Weekly Downloads | Notes |
|---------|---------|------------------|-------|
| `react-resizable-panels` | Resizable split panels | ~34.7M | By Brian Vaughn (React core team), zero deps, MIT |

```bash
npm install react-resizable-panels
```

### Changes Required

**Frontend only** — No backend changes.

| File | Change |
|------|--------|
| `frontend/src/pages/LTTPage.tsx` | Replace CSS Grid with `PanelGroup` + `Panel` + `PanelResizeHandle` |
| `frontend/package.json` | Add `react-resizable-panels` |

**Current layout** (LTTPage.tsx):

```tsx
<div className="h-full grid grid-rows-[30%_1fr] grid-cols-2 bg-gray-50">
  <div className="border-r border-b ...">VideoPlaceholder</div>
  <div className="border-b ...">QueryInput + KeywordsDisplay + IngestPanel</div>
  <div className="col-span-2 ...">ResponsePanel</div>
</div>
```

**New layout**:

```tsx
import { Panel, PanelGroup, PanelResizeHandle } from 'react-resizable-panels'

<div className="h-full flex flex-col bg-gray-50">
  <PanelGroup direction="vertical">
    {/* Upper section: video + query */}
    <Panel defaultSize={30} minSize={15} maxSize={60}>
      <div className="h-full grid grid-cols-2">
        <div className="border-r ...">VideoPlaceholder</div>
        <div className="...">QueryInput + KeywordsDisplay</div>
      </div>
    </Panel>

    {/* Draggable divider */}
    <PanelResizeHandle className="h-2 bg-gray-200 hover:bg-blue-300 cursor-row-resize transition-colors" />

    {/* Lower section: response */}
    <Panel minSize={20}>
      <div className="h-full overflow-y-auto">
        <ResponsePanel ... />
      </div>
    </Panel>
  </PanelGroup>
</div>
```

Export and prop names in `react-resizable-panels` have changed between major versions, so confirm that `PanelGroup`, `PanelResizeHandle`, and the `direction` prop match the installed `^4.10.0` release before reusing this snippet.

**Design decisions**:

- **Default split**: 30% upper / 70% lower (same as current fixed layout)
- **Min/Max**: Upper panel 15-60%, lower panel 20%+ (prevents collapsing)
- **Handle style**: 8px horizontal bar, gray by default, blue on hover, cursor row-resize
- **Persistence**: None; the split always opens at the 30/70 default (Decision 5B, no localStorage)
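The min/max values above interact, which is worth checking before tuning them. This sketch only illustrates the arithmetic (the library applies `minSize`/`maxSize` itself); `clampSplit` and `SplitLimits` are hypothetical names, and sizes are percentages of the vertical group.

```typescript
// Illustrative arithmetic only: react-resizable-panels applies minSize/maxSize itself.
// Sizes are percentages of the vertical group; names are hypothetical.
export interface SplitLimits {
  upperMin: number
  upperMax: number
  lowerMin: number
}

export const LTT_LIMITS: SplitLimits = { upperMin: 15, upperMax: 60, lowerMin: 20 }

// Returns [upper, lower] after applying both panels' constraints to a requested upper size.
export function clampSplit(requestedUpper: number, limits: SplitLimits = LTT_LIMITS): [number, number] {
  // The lower panel's minimum also caps how far the upper panel can grow.
  const effectiveMax = Math.min(limits.upperMax, 100 - limits.lowerMin)
  const upper = Math.min(Math.max(requestedUpper, limits.upperMin), effectiveMax)
  return [upper, 100 - upper]
}
```

With these numbers the upper `maxSize={60}` is the binding limit, so the lower panel never shrinks below 40% and its `minSize={20}` would only come into play if the upper maximum were raised above 80.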

### Acceptance Criteria

- [ ] LTT page has a draggable divider between upper and lower sections
- [ ] Default split matches current 30/70 layout
- [ ] Divider is visually obvious and provides cursor feedback on hover
- [ ] Both sections enforce minimum sizes (no collapsing)
- [ ] Content in each section scrolls independently
- [ ] Layout works on window resize

---

## Sub-Phase 2.5: PDF Viewer in New Tab ⭐⭐ Medium

### Objective

Preview PDF files in-browser via a dedicated viewer page that opens in a new browser tab, on both the LTT page (source cards) and RAG Database page (chunk list).

### Design Decision (Confirmed)

PDF viewer is a **dedicated route** (`/pdf-viewer`) that opens in a new browser tab. This avoids modal state management and provides a clean full-page PDF viewing experience. All "View PDF" links and inline citations open this page via `target="_blank"`.

### New Dependencies

| Package | Purpose | Weekly Downloads | Notes |
|---------|---------|------------------|-------|
| `react-pdf` | PDF rendering via PDF.js | ~1M | MIT, lightweight (309KB), Vite compatible |
| `pdfjs-dist` | PDF.js worker | (peer dep) | Required by react-pdf |

```bash
npm install react-pdf pdfjs-dist
```

The PDF.js worker must come from the same `pdfjs-dist` version that `react-pdf` uses; if the two drift apart, PDF.js fails to load documents with an API/worker version-mismatch error. react-pdf's documentation points PDF.js at the worker file via `pdfjs.GlobalWorkerOptions.workerSrc`.

### Changes Required

**Frontend only** — No backend changes (the existing `GET /chunks/{file_path}/pdf` endpoint serves PDFs).

| File | Change |
|------|--------|
| `frontend/src/pages/PdfViewerPage.tsx` | **NEW** — Dedicated full-page PDF viewer route |
| `frontend/src/App.tsx` | Add `/pdf-viewer` route |
| `frontend/src/components/ResponsePanel.tsx` | "View PDF" links point to `/pdf-viewer?url=...&page=N` with `target="_blank"` |
| `frontend/src/components/ChunkList.tsx` | Same as ResponsePanel |
| `frontend/src/lib/api.ts` | Add `getPdfViewerUrl()` helper |
| `frontend/vite.config.ts` | Configure PDF.js worker |
| `frontend/package.json` | Add react-pdf, pdfjs-dist |

**PdfViewerPage design**:

```
┌──────────────────────────────────────────────────────┐
│ ◀ Back to LTT        NEC4 ACC — Page 3 of 97       ▶ │
├──────────────────────────────────────────────────────┤
│                                                      │
│              ┌─────────────────────┐                 │
│              │                     │                 │
│              │    PDF rendered     │                 │
│              │    page content     │                 │
│              │                     │                 │
│              │                     │                 │
│              └─────────────────────┘                 │
│                                                      │
│              [Zoom -]  100%  [Zoom +]                │
└──────────────────────────────────────────────────────┘
```
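The prev/next and zoom controls in the mockup reduce to two clamped updates. A sketch under assumptions: the zoom bounds (50% to 300%) and the 25% step are illustrative values this plan does not fix, and `clampPage`/`stepScale` are hypothetical helpers. `numPages` would come from react-pdf's `Document` `onLoadSuccess` callback, and the results would feed the `pageNumber` and `scale` props of `Page`.

```typescript
// Hypothetical viewer-state helpers; the zoom bounds and step are illustrative values.
const MIN_SCALE = 0.5
const MAX_SCALE = 3
const SCALE_STEP = 0.25

// Keep a requested page inside [1, numPages]; an unknown page count is treated as 0.
export function clampPage(page: number, numPages: number): number {
  if (numPages < 1) return 1
  return Math.min(Math.max(Math.trunc(page), 1), numPages)
}

// Move one zoom step in or out, staying within [MIN_SCALE, MAX_SCALE].
export function stepScale(scale: number, direction: 1 | -1): number {
  // Rounding to two decimals keeps the value tidy if the step is ever changed to e.g. 0.1.
  const next = Math.round((scale + direction * SCALE_STEP) * 100) / 100
  return Math.min(Math.max(next, MIN_SCALE), MAX_SCALE)
}
```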

**Query params**: `?url=<encoded PDF URL>&page=<page number>&title=<filename>`
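A possible shape for the `getPdfViewerUrl()` helper listed for `api.ts`, building the query string above. The plan names the helper but not its signature, so the `PdfViewerParams` object is an assumption; `pdfUrl` stands for whatever `getChunkPdfUrl()` returns today.

```typescript
// Sketch of the getPdfViewerUrl() helper; the parameter object shape is an assumption.
export interface PdfViewerParams {
  pdfUrl: string       // chunk PDF URL, e.g. what getChunkPdfUrl() returns
  page?: number | null // 1-based page to open; omitted when unknown (e.g. DOCX)
  title?: string       // filename shown in the viewer header
}

export function getPdfViewerUrl({ pdfUrl, page, title }: PdfViewerParams): string {
  // URLSearchParams encodes '&', '#', '/' and spaces (as '+'), so values round-trip intact.
  const params = new URLSearchParams({ url: pdfUrl })
  if (page != null && Number.isInteger(page) && page > 0) params.set('page', String(page))
  if (title) params.set('title', title)
  return `/pdf-viewer?${params.toString()}`
}
```

Links would then use `<a href={getPdfViewerUrl({ pdfUrl, page, title })} target="_blank" rel="noopener noreferrer">`. Letting `URLSearchParams` do the encoding is what makes filenames with spaces, `&`, or `#` survive the trip to the viewer.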

**Implementation**:

- Standalone page at `/pdf-viewer` route (not embedded in main layout)
- Uses `react-pdf` `<Document>` + `<Page>` components
- PDF URL from query param, validated against backend endpoint
- Page navigation controls (prev/next) with page number display
- Zoom controls (+/-)
- Title shows filename from query param
- "Back to LTT" link to return to the app
- No NavBar on this page (clean viewer experience)
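On the viewer side, "validated against backend endpoint" could mean rejecting any `url` parameter that does not resolve to the chunk-PDF route on an allowed origin. A minimal sketch, assuming chunk PDFs are served under a path ending in `/chunks/<file_path>/pdf` (from `GET /chunks/{file_path}/pdf`) and that `allowedBase` is the app origin when the API is proxied, or the API base URL otherwise; `parseViewerQuery` and `ViewerQuery` are hypothetical names.

```typescript
// Hypothetical query parser for PdfViewerPage. Assumes chunk PDFs live at .../chunks/<file_path>/pdf.
export interface ViewerQuery {
  pdfUrl: string
  page: number
  title: string
}

const CHUNK_PDF_PATH = /\/chunks\/.+\/pdf$/

export function parseViewerQuery(search: string, allowedBase: string): ViewerQuery | null {
  const params = new URLSearchParams(search) // a leading '?' is ignored
  const raw = params.get('url')
  if (!raw) return null
  const allowedOrigin = new URL(allowedBase).origin
  let target: URL
  try {
    target = new URL(raw, allowedBase) // relative URLs resolve against the allowed base
  } catch {
    return null // not a parseable URL
  }
  // Refuse PDFs from other hosts or from paths other than the chunk-PDF endpoint.
  if (target.origin !== allowedOrigin || !CHUNK_PDF_PATH.test(target.pathname)) return null
  const page = Number.parseInt(params.get('page') ?? '1', 10)
  return {
    pdfUrl: target.toString(),
    page: Number.isFinite(page) && page > 0 ? page : 1,
    title: params.get('title') ?? 'Document', // placeholder fallback title
  }
}
```

A `null` result would render an error message instead of handing an untrusted URL to react-pdf's `Document`.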

**Integration points**:

- `ResponsePanel.tsx`: Change `href` from `getChunkPdfUrl()` to `getPdfViewerUrl()` with `target="_blank"`
- `ChunkList.tsx`: Same change
- Inline citations (Sub-Phase 2.6): Same URL pattern

### Acceptance Criteria

- [ ] Clicking "View PDF" opens a new browser tab with the PDF viewer page
- [ ] PDF renders correctly with original formatting preserved
- [ ] Page navigation (prev/next) works for multi-page PDFs
- [ ] Zoom controls work
- [ ] Page title shows filename and page number
- [ ] "Back to LTT" link returns to the app
- [ ] Works from both LTT and RAG Database pages
- [ ] URL query params properly encode/decode special characters
- [ ] PDF.js worker loads correctly in Vite build

---

## Sub-Phase 2.6: Inline Citations with Clickable Links ⭐⭐⭐ Hard

### Objective

Replace the current `[1]`, `[2]` citation format in RAG responses with `[filename, page N]` format that includes a clickable link to the source chunk PDF.

### Current Citation Flow

1. **Backend** (`rag.py`): Context chunks labeled `[1]`, `[2]`, `[3]`...
2. **LLM prompt**: "Cite the source name in [ ] for each point"
3. **LLM output**: Answer text contains `[1]`, `[NEC4 ACC]`, or similar bracketed citations
4. **Frontend**: ReactMarkdown renders citations as plain text — no linking

### New Citation Flow

1. **Backend**: Context chunks labeled `[filename, page N]` with strict format
2. **LLM prompt**: "Cite sources as `[filename, page N]`" (strict — Decision 3A)
3. **LLM output**: Answer text contains `[NEC4 ACC.pdf, page 3]` format citations
4. **Frontend**: Custom ReactMarkdown renderer parses strict citation patterns, replaces with clickable links
5. **Click**: Opens PDF viewer page in new tab (Decision 4B) at the cited page

### Backend Changes

| File | Change |
|------|--------|
| `backend/app/services/rag.py` | Change context chunk labeling and citation instruction in prompt |
| `backend/app/models/common.py` | Add `chunk_index` mapping info if needed |
| `backend/app/models/query.py` | (Possibly) Add citation map to response; only needed for Option B below |

**`rag.py` — Context building change** (lines 94-102):

Current:

```python
context_parts.append(
    f"[{i + 1}] Source: {source}\n"
    f"Summary: {summary}\n"
    f"Content: {chunk}\n"
)
```

New:

```python
source_name = meta.get("filename", "unknown")
page_num = meta.get("page_number")
# Page-less files (e.g. DOCX) get a filename-only label.
citation_label = f"{source_name}, page {page_num}" if page_num else source_name

context_parts.append(
    f"[{citation_label}] Source: {source_name}\n"
    f"Summary: {summary}\n"
    f"Content: {chunk}\n"
)
```

**`rag.py` — Prompt change** (lines 106-113):

Current:

```python
f"Cite the source name in [ ] for each point.\n\n"
```

New:

```python
f"Cite your sources inline using the exact bracket labels provided, "
f"e.g. [filename, page N]. Place the citation at the end of each relevant point.\n\n"
```

**Important design decision**: We need to pass citation metadata (filename → chunk_file_path mapping) to the frontend so it can create clickable links. Two approaches:

- **Option A**: Include `chunk_file_path` in the source metadata (already done!) — frontend parses `[filename, page N]` from answer text, looks up matching source in `sources[]` array to get `chunk_file_path`
- **Option B**: Add a `citation_map: Dict[str, CitationInfo]` to `QueryResponse` — more structured but adds complexity

**Recommendation**: Option A — simpler, uses existing `sources[]` data. Frontend regex matches `[filename, page N]` patterns in answer text and cross-references with `sources` array by filename + page_number. Links open in new tab to PDF viewer page (Decision 4B).
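Option A's cross-reference can be done by indexing `sources[]` once per response, keyed on filename plus page number. This is a sketch, assuming each source exposes `filename`, `page_number`, and `chunk_file_path` as named in this plan; verify the field names against the frontend `QueryResponse` type. `buildCitationIndex` and `lookupCitation` are hypothetical.

```typescript
// Sketch of Option A's lookup; field names follow this plan and are assumptions to verify.
export interface Source {
  filename: string
  page_number: number | null
  chunk_file_path: string
}

// Filename + page key, so page 3 and page 5 of the same file stay distinct.
function citationKey(filename: string, page: number | null): string {
  return `${filename.trim()}\u0000${page ?? ''}`
}

export function buildCitationIndex(sources: Source[]): Map<string, Source> {
  const index = new Map<string, Source>()
  for (const source of sources) {
    const key = citationKey(source.filename, source.page_number)
    if (!index.has(key)) index.set(key, source) // first chunk wins when a page has several
  }
  return index
}

export function lookupCitation(
  index: Map<string, Source>,
  filename: string,
  page: number | null,
): Source | undefined {
  return index.get(citationKey(filename, page))
}
```

Keeping the page in the key matters because one document usually contributes chunks from several pages; a filename-only key would send every citation for that file to the same page.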

### Frontend Changes

| File | Change |
|------|--------|
| `frontend/src/components/ResponsePanel.tsx` | Custom markdown renderer that replaces citation patterns with clickable links |
| `frontend/src/utils/citationParser.ts` | **NEW** — Regex parser for citation patterns |

**Citation parser** (`citationParser.ts`):

```typescript
// Strict format (Decision 3A): [<filename>.<ext>] or [<filename>.<ext>, page <N>]
// Group 1 = filename (must end in a file extension), group 2 = optional page number.
// Matches [NEC4 ACC.pdf, page 3] and [meeting_notes.docx]; ignores [1] or [see above].
const CITATION_PATTERN = /\[([^\[\]]+?\.[A-Za-z][A-Za-z0-9]{1,4})(?:,\s*page\s+(\d+))?\]/g

interface ParsedCitation {
  label: string                // "NEC4 ACC.pdf, page 3"
  filename: string             // "NEC4 ACC.pdf" (group 1)
  pageNumber: number | null    // 3 (group 2); null for page-less files such as DOCX
  chunkFilePath: string | null // from sources lookup; null renders as plain text
}
```

**Rendering approach**:

- Use ReactMarkdown's `components` prop to override the elements that hold answer text (e.g. `p`, `li`); recent react-markdown versions key `components` by element name, so there is no separate text-node renderer
- In those overrides, split string children on the strict citation pattern (Decision 3A)
- Replace `[filename, page N]` with `<a>` elements opening PDF viewer in new tab
- URL format: `/pdf-viewer?url={chunkPdfUrl}&page={N}&title={filename}`
- Fallback: If no matching source found, render as plain text (graceful degradation)

```
Example rendered answer:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
• The estimated value threshold is HK$1,000,000
  [NEC4 ACC.pdf, page 3]  ←── clickable link (opens new tab)
• Prior instructions from the Client may override
  this threshold [NEC4 ACC.pdf, page 5]  ←── clickable
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
```
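The splitting step in the rendering approach can be sketched independently of React: turn a string into text and citation segments, linkifying only citations that resolve to a known source. Assumptions: the strict pattern requires a file extension plus an optional `, page N` suffix, the `lookup` callback returns a `chunk_file_path` (or `undefined` when no source matches), and `splitCitations`/`Segment` are hypothetical names.

```typescript
// Sketch of the text-splitting step; names are hypothetical.
// Strict pattern: [<name>.<ext>] or [<name>.<ext>, page <N>], e.g. [NEC4 ACC.pdf, page 3].
const STRICT_CITATION = /\[([^\[\]]+?\.[A-Za-z][A-Za-z0-9]{1,4})(?:,\s*page\s+(\d+))?\]/

export type Segment =
  | { kind: 'text'; text: string }
  | { kind: 'citation'; label: string; filename: string; pageNumber: number | null; chunkFilePath: string }

export function splitCitations(
  text: string,
  lookup: (filename: string, page: number | null) => string | undefined, // -> chunk_file_path
): Segment[] {
  const re = new RegExp(STRICT_CITATION.source, 'g') // fresh regex, so lastIndex never leaks between calls
  const segments: Segment[] = []
  let last = 0
  let m: RegExpExecArray | null
  while ((m = re.exec(text)) !== null) {
    const filename = m[1]
    const pageNumber = m[2] !== undefined ? Number(m[2]) : null
    const chunkFilePath = lookup(filename, pageNumber)
    if (chunkFilePath === undefined) continue // unmatched: stays in the surrounding plain text
    if (m.index > last) segments.push({ kind: 'text', text: text.slice(last, m.index) })
    segments.push({ kind: 'citation', label: m[0].slice(1, -1), filename, pageNumber, chunkFilePath })
    last = m.index + m[0].length
  }
  if (last < text.length) segments.push({ kind: 'text', text: text.slice(last) })
  return segments
}
```

A `p` or `li` override would map `text` segments to strings and `citation` segments to `<a target="_blank">` elements pointing at the PDF viewer. Unmatched brackets such as `[1]` or invented filenames stay inside the plain-text segments, which gives the no-broken-links fallback.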

### Acceptance Criteria

- [ ] Backend context labels use `[filename, page N]` format instead of `[1]`
- [ ] LLM prompt instructs to use the new citation format
- [ ] Frontend parses `[filename, page N]` patterns from answer text
- [ ] Parsed citations are rendered as clickable links (blue, underlined)
- [ ] Clicking a citation opens the PDF viewer page in a new browser tab at the cited page
- [ ] If a citation can't be matched to a source, it renders as plain text (no broken links)
- [ ] The Sources section below the answer still shows full source cards
- [ ] Citation pattern handles: spaces in filenames, missing page numbers (DOCX), long filenames

### Risks & Mitigations

| Risk | Impact | Mitigation |
|------|--------|------------|
| LLM doesn't follow citation format exactly | Citations not parsed | Keep the strict parser (Decision 3A); render any non-matching bracket text as plain text |
| LLM invents citations not in sources | Clickable link to non-existent source | Only linkify citations that match an actual source; show others as text |
| Filenames with special characters break regex | Citation not detected | Use lazy regex matching; test with real filenames (spaces, dots, hyphens) |
| Long filenames make answer text hard to read | Poor UX | Keep full labels per Decision 6A; revisit truncation (e.g., "NEC4 AC...pdf, page 3") only if readability becomes a problem |
| Page number is null (DOCX files) | Citation shows just filename | Show `[meeting_notes.docx]` without page number; still clickable for text preview |

---

## New Dependencies

### Frontend

| Package | Purpose | Version | Size |
|---------|---------|---------|------|
| `react-resizable-panels` | Resizable split panels (2.4) | `^4.10.0` | ~517KB |
| `react-pdf` | PDF rendering via PDF.js (2.5) | `^10.4.1` | ~309KB |
| `pdfjs-dist` | PDF.js worker (peer dep of react-pdf) | `^5.3.31` | — |

### Backend

| Package | Purpose | Already installed? |
|---------|---------|--------------------|
| (none) | All backend changes use existing LLM client | ✅ |

---

## Implementation Sequence

### Recommended Order (difficulty-first)

```
Phase 2.1 (Remove Upload)    ─┐
Phase 2.2 (Question Display) ─┤── Can run in PARALLEL
Phase 2.3 (Questions)        ─┘
          │
          ▼
Phase 2.4 (Resizable Layout) ─── After 2.1 (layout changes)
          │
          ▼
Phase 2.5 (PDF Viewer) ───────── After 2.4 (sequencing only)
          │
          ▼
Phase 2.6 (Inline Citations) ─── After 2.5 (links to PDF viewer)
```

### Parallelization Opportunities

- **2.1, 2.2, 2.3** can run in parallel (independent frontend/backend changes)
- **2.4** depends on 2.1 (layout changes affect same file)
- **2.5** is sequenced after 2.4 but has no code dependency on it (the viewer is a standalone route outside the LTT layout)
- **2.6** depends on 2.5 (citations link to PDF viewer page)

---

## Test Plan

### Backend Tests (New/Modified Files)

| File | Coverage |
|------|----------|
| `test_phase1_query_decomposer.py` | Update: verify sub-question generation instead of keywords |

### Frontend Tests (New/Modified Files)

| File | Coverage |
|------|----------|
| `QueryInput.test.tsx` | Submitted question display, clear on new input |
| `ExtractedQuestionsDisplay.test.tsx` (renamed from `KeywordsDisplay.test.tsx`) | Update: "Extracted Questions" label, numbered list rendering |
| `LTTPage.test.tsx` | Update: no IngestPanel, resizable layout |
| `PdfViewerPage.test.tsx` | **NEW** — PDF rendering, page nav, zoom, query params |
| `citationParser.test.ts` | **NEW** — Citation regex parsing, edge cases (spaces, nulls, special chars) |
| `ResponsePanel.test.tsx` | Update: inline citation rendering, clickable links |

### Acceptance Tests

| File | Coverage |
|------|----------|
| `test_acceptance_package2_questions.py` | Real LLM generates sub-questions from complex question |
| `test_acceptance_package2_citations.py` | Full flow: upload → query → answer has `[filename, page N]` citations |

---

## Decisions (Confirmed)

| # | Question | Decision |
|---|----------|----------|
| 1 | API field name for extracted questions | **B**: Rename to `extracted_questions` — cleaner naming, breaking change acceptable |
| 2 | PDF viewer integration | **New browser tab** — Dedicated viewer page opens in a new tab with react-pdf rendering |
| 3 | Citation pattern strictness | **A**: Strict `[filename, page N]` — enforce exact format in backend prompt and frontend parser |
| 4 | Citation link target | **B**: Open new tab — same dedicated PDF viewer page as "View PDF" links |
| 5 | Resizable panels persistence | **B**: Always default 30/70 — no localStorage, keep simple |
| 6 | Citation display format in answer | **A**: Full `[filename, page N]` — most traceable for users |

### Decision 2 & 4 Implications

PDF viewer is a **dedicated route page** (`/pdf-viewer`) that opens in a new browser tab. This means:

- No modal component or state management needed
- No context/callback props threaded through components
- All PDF links (View PDF, inline citations) use `target="_blank"` pointing to the viewer page
- Viewer page receives PDF URL, page number, and title via query params
- `react-pdf` renders directly on this standalone page

## Open Questions

None — all decisions confirmed.