feat(frontend): add nav bar with routing, markdown rendering, and enhancement plan
- Add react-router-dom with NavBar component (LTT + RAG Database tabs)
- Extract AppContent into LTTPage, add RAGDatabasePage placeholder
- Refactor App.tsx to BrowserRouter + Routes layout
- Switch ResponsePanel to react-markdown for rich formatting
- Fix ResponsePanel test for markdown rendering
- Update RAG prompt to cite source name instead of number
- Save Phase 1 enhancement plan (.plans/phase1_enhancement_plan.md)
parent
029a0e490f
commit
52c09b86cb
|
|
@@ -0,0 +1,637 @@
|
||||||
|
# Phase 1 Enhancement Plan
|
||||||
|
|
||||||
|
**Source**: User request (2026-04-23)
|
||||||
|
**Scope**: Frontend navigation + RAG Database management page + page-aware chunking with chunk PDFs
|
||||||
|
**Status**: 🔄 In Progress — Feature 1 ✅ Complete, Features 2-3 pending
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Objective
|
||||||
|
|
||||||
|
Enhance the existing Phase 1 application with three features:
|
||||||
|
|
||||||
|
1. **Navigation Bar** — Top nav bar with two pages: "LTT" (current query page) and "RAG Database"
|
||||||
|
2. **RAG Database Page** — View/manage ChromaDB documents (list, delete, upload)
|
||||||
|
3. **Page-Aware Chunking** — Chunks tagged with page numbers, saved as PDFs in `document_chunk/`, with clickable links in RAG responses
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Current State (Pre-Enhancement)
|
||||||
|
|
||||||
|
### What Exists
|
||||||
|
- Multi-page React app with react-router-dom routing
|
||||||
|
- Nav bar with "LTT" and "RAG Database" tabs
|
||||||
|
- "LTT" page at `/` with current query interface
|
||||||
|
- "RAG Database" placeholder page at `/rag-database`
|
||||||
|
- 2 API endpoints: `POST /api/v1/ingest`, `POST /api/v1/query`
|
||||||
|
- Flat chunking: PDF text extracted page-by-page but concatenated into one string before chunking
|
||||||
|
- Metadata per chunk: `filename`, `upload_date`, `content_summary`, `chunk_index`
|
||||||
|
- ChromaDB collection `documents` with UUID-based IDs (`{document_id}_{chunk_index}`)
|
||||||
|
- Frontend: pages in `pages/`, components in `components/`, TanStack Query, react-markdown
|
||||||
|
|
||||||
|
### What's Missing (Gaps This Plan Fills)
|
||||||
|
- ~~No routing or multi-page support~~ ✅ Done in Feature 1
|
||||||
|
- No way to view what's stored in ChromaDB
|
||||||
|
- No way to delete documents or chunks
|
||||||
|
- No page-level awareness in chunking (all pages flattened before token splitting)
|
||||||
|
- No persistent chunk files (chunks only exist as ChromaDB document text)
|
||||||
|
- No clickable links in RAG responses to view source chunks
|
||||||
|
- ~~Upload only via IngestPanel on the main query page~~ (IngestPanel stays on LTT, upload also coming to RAG DB page)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Feature 1: Navigation Bar & Multi-Page Layout ✅ COMPLETE
|
||||||
|
|
||||||
|
**Completed**: 2026-04-23
|
||||||
|
|
||||||
|
### 1.1 Changes Required
|
||||||
|
|
||||||
|
**Frontend**:
|
||||||
|
- ~~Install `react-router-dom`~~ ✅
|
||||||
|
- ~~Create `frontend/src/components/NavBar.tsx` — top navigation bar~~ ✅
|
||||||
|
- ~~Create `frontend/src/pages/LTTPage.tsx` — move current App.tsx content here~~ ✅
|
||||||
|
- ~~Create `frontend/src/pages/RAGDatabasePage.tsx` — placeholder, fleshed out in Feature 2~~ ✅
|
||||||
|
- ~~Refactor `frontend/src/App.tsx` — Router + NavBar + route definitions~~ ✅
|
||||||
|
|
||||||
|
**Backend**: None
|
||||||
|
|
||||||
|
### 1.2 Nav Bar Design
|
||||||
|
|
||||||
|
```
|
||||||
|
┌─────────────────────────────────────────────────┐
|
||||||
|
│ 🔍 LTT (active) │ 📚 RAG Database │ ← top nav bar (fixed)
|
||||||
|
├─────────────────────────────────────────────────┤
|
||||||
|
│ │
|
||||||
|
│ [Page content: LTT or RAG Database] │
|
||||||
|
│ │
|
||||||
|
└─────────────────────────────────────────────────┘
|
||||||
|
```
|
||||||
|
|
||||||
|
- Fixed top bar, full width
|
||||||
|
- Two tabs: "LTT" (current query page) and "RAG Database"
|
||||||
|
- Active tab highlighted
|
||||||
|
- "LTT" route: `/` (default)
|
||||||
|
- "RAG Database" route: `/rag-database`
|
||||||
|
|
||||||
|
### 1.3 Implementation Tasks
|
||||||
|
|
||||||
|
| Task | Description | Files | Status |
|
||||||
|
|------|-------------|-------|--------|
|
||||||
|
| Install react-router-dom | `npm install react-router-dom` | `package.json` | ✅ |
|
||||||
|
| Create NavBar component | Horizontal nav with two links | `frontend/src/components/NavBar.tsx` | ✅ |
|
||||||
|
| Create LTTPage | Extract current AppContent into page component | `frontend/src/pages/LTTPage.tsx` | ✅ |
|
||||||
|
| Create RAGDatabasePage | Placeholder page (scaffold for Feature 2) | `frontend/src/pages/RAGDatabasePage.tsx` | ✅ |
|
||||||
|
| Refactor App.tsx | BrowserRouter + Routes + NavBar wrapper | `frontend/src/App.tsx` | ✅ |
|
||||||
|
|
||||||
|
### 1.4 Acceptance Criteria
|
||||||
|
|
||||||
|
- [x] Nav bar visible at top of every page
|
||||||
|
- [x] Clicking "LTT" navigates to `/` and shows current query interface
|
||||||
|
- [x] Clicking "RAG Database" navigates to `/rag-database`
|
||||||
|
- [x] Current page highlighted in nav bar
|
||||||
|
- [x] All existing functionality preserved (query, ingest, response display)
|
||||||
|
- [x] Build passes, no TypeScript errors
|
||||||
|
- [x] 62/62 frontend tests pass
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Feature 2: RAG Database Management Page
|
||||||
|
|
||||||
|
### 2.1 Overview
|
||||||
|
|
||||||
|
A dedicated page to view and manage all documents/chunks stored in ChromaDB.
|
||||||
|
|
||||||
|
### 2.2 Backend Changes
|
||||||
|
|
||||||
|
**New API Endpoints**:
|
||||||
|
|
||||||
|
| Method | Path | Description |
|
||||||
|
|--------|------|-------------|
|
||||||
|
| `GET` | `/api/v1/documents` | List all documents with chunk counts |
|
||||||
|
| `GET` | `/api/v1/documents/{document_id}/chunks` | List all chunks for a document |
|
||||||
|
| `DELETE` | `/api/v1/documents/{document_id}` | Delete all chunks for a document |
|
||||||
|
| `DELETE` | `/api/v1/chunks/{chunk_id}` | Delete a single chunk |
|
||||||
|
|
||||||
|
**New/Modified Files**:
|
||||||
|
|
||||||
|
| File | Change |
|
||||||
|
|------|--------|
|
||||||
|
| `backend/app/routers/documents.py` | **NEW** — CRUD endpoints for documents/chunks |
|
||||||
|
| `backend/app/services/rag.py` | Add `list_documents()`, `list_chunks()`, `delete_document()`, `delete_chunk()` methods |
|
||||||
|
| `backend/app/models/documents.py` | **NEW** — Pydantic schemas for document/chunk listing |
|
||||||
|
| `backend/app/main.py` | Register `documents` router |
|
||||||
|
|
||||||
|
**New Pydantic Schemas** (`models/documents.py`):
|
||||||
|
|
||||||
|
```python
from typing import List

from pydantic import BaseModel


class DocumentInfo(BaseModel):
    document_id: str
    filename: str
    chunk_count: int
    upload_date: str


class ChunkInfo(BaseModel):
    chunk_id: str
    chunk_index: int
    content_summary: str
    page_number: int | None = None  # Added by Feature 3
    chunk_file_path: str | None = None  # Added by Feature 3


class DocumentListResponse(BaseModel):
    documents: List[DocumentInfo]
    total_documents: int
    total_chunks: int


class DeleteResponse(BaseModel):
    deleted: bool
    message: str
```
|
||||||
|
|
||||||
|
**Implementation Notes for ChromaDB Operations**:
|
||||||
|
|
||||||
|
- `list_documents()`: ChromaDB has no native "group by document" — need to `collection.get(include=["metadatas"])`, then group by `filename` and extract `document_id` from chunk IDs (format: `{document_id}_{chunk_index}`)
|
||||||
|
- `delete_document()`: Use `collection.delete(where={"filename": "..."})` or collect all chunk IDs matching the document_id prefix and call `collection.delete(ids=[...])`
|
||||||
|
- `delete_chunk()`: Use `collection.delete(ids=[chunk_id])`
|
||||||
|
- **Important**: When deleting a document, also clean up associated chunk PDF files from `document_chunk/` (Feature 3)
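
The grouping step behind `list_documents()` can be sketched without ChromaDB itself. This is a minimal sketch, assuming the `ids` and `metadatas` lists returned by `collection.get(include=["metadatas"])` and the `{document_id}_{chunk_index}` ID format noted above. The helper name `group_chunks_by_document` is hypothetical and not part of the plan.

```python
from typing import Any, Dict, List


def group_chunks_by_document(
    ids: List[str], metadatas: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Group chunk records into per-document summaries (hypothetical helper)."""
    docs: Dict[str, Dict[str, Any]] = {}
    for chunk_id, meta in zip(ids, metadatas):
        # Chunk IDs look like "{document_id}_{chunk_index}", so split on the
        # last underscore to recover the document_id.
        document_id = chunk_id.rsplit("_", 1)[0]
        entry = docs.setdefault(
            document_id,
            {
                "document_id": document_id,
                "filename": meta.get("filename", ""),
                "upload_date": meta.get("upload_date", ""),
                "chunk_count": 0,
            },
        )
        entry["chunk_count"] += 1
    # Dicts keep insertion order, so documents appear in first-seen order.
    return list(docs.values())
```

Each returned dict carries the same fields as `DocumentInfo`, so the router could pass them straight into the schema. Summing `chunk_count` over the result would give `total_chunks`.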
|
||||||
|
|
||||||
|
### 2.3 Frontend Changes
|
||||||
|
|
||||||
|
**New/Modified Files**:
|
||||||
|
|
||||||
|
| File | Change |
|
||||||
|
|------|--------|
|
||||||
|
| `frontend/src/pages/RAGDatabasePage.tsx` | Full implementation |
|
||||||
|
| `frontend/src/components/DocumentList.tsx` | **NEW** — Document table/cards |
|
||||||
|
| `frontend/src/components/ChunkList.tsx` | **NEW** — Chunk table for selected document |
|
||||||
|
| `frontend/src/components/DocumentUpload.tsx` | **NEW** — Upload form (can reuse IngestPanel logic) |
|
||||||
|
| `frontend/src/lib/api.ts` | Add `listDocuments()`, `deleteDocument()`, `deleteChunk()` |
|
||||||
|
| `frontend/src/lib/queries.tsx` | Add TanStack Query hooks for new endpoints |
|
||||||
|
| `frontend/src/types/index.ts` | Add `DocumentInfo`, `ChunkInfo`, `DeleteResponse` types |
|
||||||
|
|
||||||
|
### 2.4 Page Layout
|
||||||
|
|
||||||
|
```
|
||||||
|
┌──────────────────────────────────────────────────┐
|
||||||
|
│ RAG Database [Upload] │
|
||||||
|
├──────────────────────────────────────────────────┤
|
||||||
|
│ Total: 5 documents, 342 chunks │
|
||||||
|
│ │
|
||||||
|
│ ┌──────────────────────────────────────────────┐ │
|
||||||
|
│ │ 📄 NEC4 ACC.pdf │ 101 chunks │ 2026-04-23 │ │
|
||||||
|
│ │ [View Chunks] [Delete] │ │
|
||||||
|
│ ├──────────────────────────────────────────────┤ │
|
||||||
|
│ │ 📄 meeting_notes.docx │ 45 chunks │ 2026-04-22│ │
|
||||||
|
│ │ [View Chunks] [Delete] │ │
|
||||||
|
│ ├──────────────────────────────────────────────┤ │
|
||||||
|
│ │ 📄 budget_report.txt │ 28 chunks │ 2026-04-21│ │
|
||||||
|
│ │ [View Chunks] [Delete] │ │
|
||||||
|
│ └──────────────────────────────────────────────┘ │
|
||||||
|
│ │
|
||||||
|
│ [Expanded chunk view when "View Chunks" clicked] │
|
||||||
|
│ ┌──────────────────────────────────────────────┐ │
|
||||||
|
│ │ Chunk 0 │ p.3 │ "Discussion of budget..." │ │
|
||||||
|
│ │ │ [View PDF] [Delete Chunk] │ │
|
||||||
|
│ ├──────────────────────────────────────────────┤ │
|
||||||
|
│ │ Chunk 1 │ p.4 │ "Allocation for Q4..." │ │
|
||||||
|
│ │ │ [View PDF] [Delete Chunk] │ │
|
||||||
|
│ └──────────────────────────────────────────────┘ │
|
||||||
|
└──────────────────────────────────────────────────┘
|
||||||
|
```
|
||||||
|
|
||||||
|
### 2.5 Acceptance Criteria
|
||||||
|
|
||||||
|
- [ ] `GET /api/v1/documents` returns all documents with chunk counts
|
||||||
|
- [ ] `DELETE /api/v1/documents/{document_id}` removes all chunks from ChromaDB + associated chunk PDFs
|
||||||
|
- [ ] `DELETE /api/v1/chunks/{chunk_id}` removes a single chunk
|
||||||
|
- [ ] RAG Database page shows all documents with chunk counts
|
||||||
|
- [ ] User can expand a document to see its chunks
|
||||||
|
- [ ] User can delete a document (with confirmation)
|
||||||
|
- [ ] User can delete individual chunks (with confirmation)
|
||||||
|
- [ ] User can upload documents from this page
|
||||||
|
- [ ] Stats displayed: total documents, total chunks
|
||||||
|
- [ ] Uploading a file with existing filename triggers automatic replacement (old data deleted first)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Feature 3: Page-Aware Chunking & Chunk PDF Storage
|
||||||
|
|
||||||
|
### 3.1 Overview
|
||||||
|
|
||||||
|
When a document is uploaded:
|
||||||
|
1. Parse it page-by-page (PDF) or section-by-section (DOCX)
|
||||||
|
2. Each chunk is tagged with its source page number
|
||||||
|
3. Each chunk's source page is saved as a PDF in `document_chunk/`
|
||||||
|
4. RAG responses include clickable links to the chunk PDF
|
||||||
|
|
||||||
|
### 3.2 Backend Changes
|
||||||
|
|
||||||
|
#### 3.2.1 Page-Aware PDF Parsing
|
||||||
|
|
||||||
|
**Current**: `parse_pdf()` concatenates all pages into one string, losing page boundaries.
|
||||||
|
|
||||||
|
**New**: `parse_pdf_by_page()` returns `List[Tuple[int, str]]` — list of (page_number, page_text) tuples.
|
||||||
|
|
||||||
|
**Modified Files**:
|
||||||
|
|
||||||
|
| File | Change |
|
||||||
|
|------|--------|
|
||||||
|
| `backend/app/utils/pdf_parser.py` | Add `parse_pdf_by_page()` function |
|
||||||
|
|
||||||
|
```python
from typing import List, Tuple

from pypdf import PdfReader


def parse_pdf_by_page(file_path: str) -> List[Tuple[int, str]]:
    """Parse PDF and return per-page text with page numbers (1-indexed)."""
    reader = PdfReader(file_path)
    pages = []
    for i, page in enumerate(reader.pages, start=1):
        text = page.extract_text()
        # Pages with no extractable text (e.g. scanned images) are skipped,
        # so the returned page numbers may have gaps.
        if text and text.strip():
            pages.append((i, text.strip()))
    return pages
```
|
||||||
|
|
||||||
|
**DOCX Note**: DOCX files don't have true page numbers. For DOCX, we can use paragraph-based indexing or skip page tracking. Suggested approach: chunk DOCX normally, set `page_number = None` in metadata.
|
||||||
|
|
||||||
|
#### 3.2.2 Page-Aware Chunking
|
||||||
|
|
||||||
|
**Current**: `TokenChunkingStrategy.chunk(text)` takes a flat string and splits by tokens.
|
||||||
|
|
||||||
|
**New**: Page-as-chunk-unit with overlap context from adjacent pages.
|
||||||
|
|
||||||
|
**Chunking Algorithm (confirmed)**:
|
||||||
|
```
For page N (1-indexed):
    overlap_before = last 200 tokens of page N-1 text (or empty if page 1)
    overlap_after  = first 200 tokens of page N+1 text (or empty if last page)
    chunk_text     = overlap_before + page_N_text + overlap_after
```
|
||||||
|
|
||||||
|
- One chunk per page — **never split** a page even if it exceeds 1000 tokens
|
||||||
|
- Overlap provides surrounding context for better embedding/retrieval
|
||||||
|
- The `page_number` metadata always refers to the main page (N), not the overlap pages
|
||||||
|
|
||||||
|
**Modified Files**:
|
||||||
|
|
||||||
|
| File | Change |
|
||||||
|
|------|--------|
|
||||||
|
| `backend/app/utils/chunking.py` | Add `chunk_pages()` method to `TokenChunkingStrategy` |
|
||||||
|
|
||||||
|
```python
def chunk_pages(
    self, pages: List[Tuple[int, str]], overlap_tokens: int = 200
) -> List[Tuple[str, int]]:
    """Chunk page-segmented text with overlap from adjacent pages.

    For each page, creates one chunk containing:
    [last overlap_tokens of previous page] + [full current page] + [first overlap_tokens of next page]

    Args:
        pages: List of (page_number, page_text) tuples. 1-indexed.
        overlap_tokens: Number of tokens to include from adjacent pages.

    Returns:
        List of (chunk_text, page_number) tuples. One chunk per page.
    """
    if not pages:
        return []

    # Tokenize all pages upfront
    tokenized = [
        (page_num, self._encoding.encode(page_text), page_text)
        for page_num, page_text in pages
    ]

    chunks = []
    for i, (page_num, _tokens, page_text) in enumerate(tokenized):
        parts = []

        # Overlap from previous page (last N tokens).
        # Guard overlap_tokens == 0: tokens[-0:] would return the whole page.
        if i > 0 and overlap_tokens > 0:
            overlap = tokenized[i - 1][1][-overlap_tokens:]
            if overlap:
                parts.append(self._encoding.decode(overlap))

        # Full current page text (use original text, not re-decoded)
        parts.append(page_text)

        # Overlap from next page (first N tokens)
        if i < len(tokenized) - 1 and overlap_tokens > 0:
            overlap = tokenized[i + 1][1][:overlap_tokens]
            if overlap:
                parts.append(self._encoding.decode(overlap))

        chunks.append(("\n".join(parts), page_num))

    return chunks
```
|
||||||
|
|
||||||
|
#### 3.2.3 Chunk PDF Generation & Storage
|
||||||
|
|
||||||
|
**New directory**: `document_chunk/` (at project root, alongside `chroma_db/`)
|
||||||
|
|
||||||
|
**Naming convention**: `{original_filename_without_ext}_page_{page_number}.pdf`
|
||||||
|
|
||||||
|
Example: `NEC4 ACC_page_3.pdf`
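
The naming convention can be captured in one small function. The sketch below assumes the upload's original filename is available as a string. The helper name `chunk_pdf_name` is hypothetical and does not appear elsewhere in this plan.

```python
from pathlib import Path


def chunk_pdf_name(original_filename: str, page_number: int) -> str:
    """Build the chunk PDF file name for one page (hypothetical helper)."""
    if page_number < 1:
        raise ValueError("page_number is 1-indexed")
    # Path(...).name drops any directory part a client may have sent;
    # .stem then removes only the final extension.
    stem = Path(Path(original_filename).name).stem
    return f"{stem}_page_{page_number}.pdf"
```

Spaces in the filename are kept as-is, which matches the example above. Whether names should be sanitized further is not decided in this plan.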
|
||||||
|
|
||||||
|
**One file per page** — multiple chunks never exist for a page (decision: never split a page), so deduplication is not needed.
|
||||||
|
|
||||||
|
**Content**: The actual page extracted from the source PDF — preserves original formatting, layout, tables, images. Not a generated text PDF.
|
||||||
|
|
||||||
|
**Modified/New Files**:
|
||||||
|
|
||||||
|
| File | Change |
|
||||||
|
|------|--------|
|
||||||
|
| `backend/app/utils/pdf_extractor.py` | **NEW** — Extract and save individual PDF pages |
|
||||||
|
| `backend/app/core/config.py` | Add `DOCUMENT_CHUNK_PATH` setting (default: `./document_chunk`) |
|
||||||
|
|
||||||
|
```python
# pdf_extractor.py
from pypdf import PdfReader, PdfWriter


def extract_page_as_pdf(source_path: str, page_number: int, output_path: str) -> str:
    """Extract a single page from a PDF and save as a new PDF file.

    Args:
        source_path: Path to original PDF
        page_number: 1-indexed page number
        output_path: Where to save the extracted page PDF

    Returns:
        The output_path of the saved PDF
    """
    reader = PdfReader(source_path)
    writer = PdfWriter()
    writer.add_page(reader.pages[page_number - 1])  # 0-indexed in reader
    with open(output_path, "wb") as f:
        writer.write(f)
    return output_path
```
|
||||||
|
|
||||||
|
**Note**: For DOCX files, chunk PDF generation is skipped (set `chunk_file_path = None` in metadata). Only PDFs support page extraction.
|
||||||
|
|
||||||
|
#### 3.2.4 Enhanced Metadata
|
||||||
|
|
||||||
|
**Current metadata**:
|
||||||
|
```python
|
||||||
|
{
|
||||||
|
"filename": "report.pdf",
|
||||||
|
"upload_date": "2026-04-23T...",
|
||||||
|
"content_summary": "First 200 chars...",
|
||||||
|
"chunk_index": 0,
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Enhanced metadata**:
|
||||||
|
```python
|
||||||
|
{
|
||||||
|
"filename": "report.pdf",
|
||||||
|
"upload_date": "2026-04-23T...",
|
||||||
|
"content_summary": "First 200 chars...",
|
||||||
|
"chunk_index": 0,
|
||||||
|
"page_number": 3, # NEW
|
||||||
|
"chunk_file_path": "report_page_3.pdf", # NEW (relative path)
|
||||||
|
"document_id": "uuid-string", # NEW (for grouping)
|
||||||
|
}
|
||||||
|
```
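
One way to assemble the enhanced metadata is a single builder that handles both PDF and DOCX chunks. This is a sketch under assumptions: the helper name `build_chunk_metadata` is hypothetical, and optional fields are omitted instead of being stored as `None`, on the assumption that the vector store may reject null metadata values. Verify that against the installed ChromaDB version before relying on it.

```python
from typing import Any, Dict, Optional


def build_chunk_metadata(
    filename: str,
    upload_date: str,
    chunk_text: str,
    chunk_index: int,
    document_id: str,
    page_number: Optional[int] = None,
    chunk_file_path: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble the enhanced metadata dict for one chunk (hypothetical helper)."""
    metadata: Dict[str, Any] = {
        "filename": filename,
        "upload_date": upload_date,
        "content_summary": chunk_text[:200],  # first 200 characters
        "chunk_index": chunk_index,
        "document_id": document_id,
    }
    # Assumption: optional fields are left out when unknown (DOCX chunks),
    # so readers should use metadata.get("page_number").
    if page_number is not None:
        metadata["page_number"] = page_number
    if chunk_file_path is not None:
        metadata["chunk_file_path"] = chunk_file_path
    return metadata
```

With this shape, a DOCX chunk simply has no `page_number` key, and `metadata.get("page_number")` still yields `None` for the response schema.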
|
||||||
|
|
||||||
|
**Modified Files**:
|
||||||
|
|
||||||
|
| File | Change |
|
||||||
|
|------|--------|
|
||||||
|
| `backend/app/utils/metadata.py` | Add `page_number`, `chunk_file_path`, `document_id` to metadata |
|
||||||
|
| `backend/app/models/common.py` | Add new fields to `SourceMetadata` |
|
||||||
|
|
||||||
|
#### 3.2.5 Chunk File Serving Endpoint
|
||||||
|
|
||||||
|
**New endpoint**:
|
||||||
|
|
||||||
|
| Method | Path | Description |
|
||||||
|
|--------|------|-------------|
|
||||||
|
| `GET` | `/api/v1/chunks/{file_path}/pdf` | Serve chunk PDF file |
|
||||||
|
|
||||||
|
**Modified Files**:
|
||||||
|
|
||||||
|
| File | Change |
|
||||||
|
|------|--------|
|
||||||
|
| `backend/app/routers/documents.py` | Add `GET /chunks/{file_path}/pdf` endpoint |
|
||||||
|
|
||||||
|
```python
@router.get("/chunks/{file_path}/pdf")
async def get_chunk_pdf(file_path: str):
    """Serve a chunk PDF file from document_chunk/ directory."""
    # Validate path to prevent directory traversal
    # Return FileResponse from DOCUMENT_CHUNK_PATH / file_path
```
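
The path validation step in the stub can be sketched with the standard library alone. The helper name `resolve_chunk_pdf` is hypothetical. The sketch assumes the endpoint passes in the configured `DOCUMENT_CHUNK_PATH` as `base_dir`, and it leaves the file-existence check (a 404 response) to the caller.

```python
from pathlib import Path


def resolve_chunk_pdf(base_dir: str, file_path: str) -> Path:
    """Resolve file_path inside base_dir or raise ValueError (hypothetical helper)."""
    base = Path(base_dir).resolve()
    candidate = (base / file_path).resolve()
    # After resolving ".." segments and symlinks, the result must still
    # live under the chunk directory. Absolute inputs replace the base
    # entirely when joined, so they fail this check too.
    if base not in candidate.parents:
        raise ValueError("path escapes the chunk directory")
    if candidate.suffix.lower() != ".pdf":
        raise ValueError("only .pdf files are served")
    return candidate
```

Comparing resolved paths avoids relying on string checks for `..` alone, which can miss encoded or symlinked escapes.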
|
||||||
|
|
||||||
|
#### 3.2.6 Ingestion Pipeline Refactor
|
||||||
|
|
||||||
|
The entire ingestion flow needs to be updated:
|
||||||
|
|
||||||
|
**Current flow**:
|
||||||
|
```
|
||||||
|
Upload → parse_pdf() → flat text → chunk() → metadata → store in ChromaDB
|
||||||
|
```
|
||||||
|
|
||||||
|
**New flow**:
|
||||||
|
```
|
||||||
|
Upload → check if filename exists → YES: delete old chunks + chunk PDFs (full replacement)
|
||||||
|
→ parse_pdf_by_page() → per-page text
|
||||||
|
→ chunk_pages() with 200-token overlap from adjacent pages
|
||||||
|
→ for each page: extract page as PDF → save to document_chunk/
|
||||||
|
→ enhanced metadata (page_number, chunk_file_path, document_id)
|
||||||
|
→ store in ChromaDB
|
||||||
|
```
|
||||||
|
|
||||||
|
**Same-filename replacement** (confirmed):
|
||||||
|
- On upload, query ChromaDB for existing chunks with matching `filename`
|
||||||
|
- If found: delete old chunk IDs from collection, delete old PDFs from `document_chunk/`
|
||||||
|
- Create new `document_id`, ingest fresh
|
||||||
|
- This ensures clean replacement without orphaned data
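
The replacement steps above can be sketched as a single cleanup function that runs before ingestion. It assumes a ChromaDB-style collection exposing `get(where=..., include=...)` and `delete(ids=...)`, and that each chunk's metadata stores a relative `chunk_file_path`. The helper name `delete_existing_document` is hypothetical.

```python
from pathlib import Path
from typing import Any, List


def delete_existing_document(collection: Any, chunk_dir: str, filename: str) -> List[str]:
    """Remove all chunks and chunk PDFs stored for filename (hypothetical helper).

    Returns the chunk IDs that were deleted (empty if the file is new).
    """
    existing = collection.get(where={"filename": filename}, include=["metadatas"])
    ids = list(existing["ids"])
    if not ids:
        return []
    # Delete the stored chunks first; files are removed only after that
    # succeeds, so a failure leaves orphan files rather than broken links.
    collection.delete(ids=ids)
    for meta in existing["metadatas"]:
        rel_path = (meta or {}).get("chunk_file_path")
        if rel_path:
            Path(chunk_dir, rel_path).unlink(missing_ok=True)
    return ids
```

The same function could back `delete_document()` in Feature 2 if it filtered on `document_id` instead of `filename`.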
|
||||||
|
|
||||||
|
**Modified Files**:
|
||||||
|
|
||||||
|
| File | Change |
|
||||||
|
|------|--------|
|
||||||
|
| `backend/app/routers/ingest.py` | Refactor: page-aware parsing, chunk PDF generation, enhanced metadata, same-filename replacement |
|
||||||
|
|
||||||
|
### 3.3 Frontend Changes
|
||||||
|
|
||||||
|
**Modified Files**:
|
||||||
|
|
||||||
|
| File | Change |
|
||||||
|
|------|--------|
|
||||||
|
| `frontend/src/types/index.ts` | Add `page_number`, `chunk_file_path` to `SourceMetadata` |
|
||||||
|
| `frontend/src/components/ResponsePanel.tsx` | Render `chunk_file_path` as clickable link in sources |
|
||||||
|
| `frontend/src/components/ChunkList.tsx` | Show page number, link to chunk PDF |
|
||||||
|
|
||||||
|
**Source Card Update**:
|
||||||
|
|
||||||
|
Current source card shows: `filename`, `upload_date`, `content_summary`, `chunk_index`
|
||||||
|
|
||||||
|
Enhanced source card adds: `page_number` (e.g., "Page 3"), clickable "View Source" link opening chunk PDF
|
||||||
|
|
||||||
|
### 3.4 Directory Structure After Enhancement
|
||||||
|
|
||||||
|
```
|
||||||
|
legco_reranker/
|
||||||
|
├── app/
|
||||||
|
│ ├── backend/...
|
||||||
|
│ ├── frontend/...
|
||||||
|
│ └── chroma_db/ # Existing
|
||||||
|
├── document_chunk/ # NEW — chunk PDF files
|
||||||
|
│ ├── NEC4 ACC_page_1.pdf
|
||||||
|
│ ├── NEC4 ACC_page_2.pdf
|
||||||
|
│ ├── NEC4 ACC_page_3.pdf
|
||||||
|
│ └── meeting_notes_page_5.pdf
|
||||||
|
├── .plans/
|
||||||
|
└── ...
|
||||||
|
```
|
||||||
|
|
||||||
|
### 3.5 Acceptance Criteria
|
||||||
|
|
||||||
|
- [ ] PDF uploads produce page-aware chunks: 1 chunk per page with 200-token overlap from adjacent pages
|
||||||
|
- [ ] Each page is saved as a separate PDF (original page, not generated text) in `document_chunk/`
|
||||||
|
- [ ] Chunk PDF filename follows convention: `{original_filename_without_ext}_page_{page_number}.pdf`
|
||||||
|
- [ ] Page numbers are sequential index (1, 2, 3...), not PDF internal labels
|
||||||
|
- [ ] Oversized pages are kept as single chunks (never split)
|
||||||
|
- [ ] `GET /api/v1/chunks/{file_path}/pdf` serves the original chunk PDF
|
||||||
|
- [ ] RAG response sources include `page_number` and `chunk_file_path`
|
||||||
|
- [ ] Frontend source cards show page number and clickable link
|
||||||
|
- [ ] Clicking source link opens/downloads the original chunk PDF
|
||||||
|
- [ ] DOCX uploads work without page numbers (graceful degradation, no chunk PDFs)
|
||||||
|
- [ ] Uploading a file with same filename replaces existing document (old chunks + PDFs deleted, new document_id)
|
||||||
|
- [ ] `document_chunk/` is `.gitignore`d
|
||||||
|
- [ ] Deleting a document also removes its chunk PDFs from `document_chunk/`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Implementation Sequence
|
||||||
|
|
||||||
|
The three features have dependencies. Recommended order:
|
||||||
|
|
||||||
|
```
|
||||||
|
Feature 1 (Nav + Routing) ← No backend changes, enables Feature 2
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
Feature 2 (RAG Database Page) ← Needs Feature 1 for page routing
|
||||||
|
│ But backend CRUD endpoints are independent
|
||||||
|
▼
|
||||||
|
Feature 3 (Page-Aware Chunking) ← Modifies ingestion pipeline
|
||||||
|
Enhances Feature 2 (chunk file links)
|
||||||
|
Enhances ResponsePanel (clickable sources)
|
||||||
|
```
|
||||||
|
|
||||||
|
### Sub-Phase Breakdown
|
||||||
|
|
||||||
|
| Sub-Phase | Feature | Scope | Backend | Frontend | Status |
|
||||||
|
|-----------|---------|-------|---------|----------|--------|
|
||||||
|
| 1.5.1 | 1 | Nav bar + routing + page scaffold | None | NavBar, LTTPage, RAGDatabasePage, App.tsx refactor | ✅ Complete |
|
||||||
|
| 1.5.2 | 2 | Backend CRUD for documents/chunks | documents router, RAGService methods, schemas | None | 📋 Pending |
|
||||||
|
| 1.5.3 | 2 | Frontend RAG Database page | None | RAGDatabasePage, DocumentList, ChunkList, DocumentUpload, API hooks | 📋 Pending |
|
||||||
|
| 1.5.4 | 3 | Page-aware parsing & chunking | pdf_parser, chunking, metadata enhancements | None | 📋 Pending |
|
||||||
|
| 1.5.5 | 3 | Chunk PDF generation & storage | pdf_extractor, config, ingest pipeline refactor | None | 📋 Pending |
|
||||||
|
| 1.5.6 | 3 | Chunk file serving + frontend links | documents router endpoint | ResponsePanel clickable links, ChunkList updates | 📋 Pending |
|
||||||
|
|
||||||
|
### Parallelization Opportunities
|
||||||
|
|
||||||
|
- **1.5.1 and 1.5.2 can run in parallel** — Frontend routing changes and backend CRUD are independent
|
||||||
|
- **1.5.3 blocked by 1.5.1 + 1.5.2** — Needs both routing and backend endpoints
|
||||||
|
- **1.5.4 and 1.5.5 are sequential** — 1.5.5 depends on 1.5.4's page-aware parsing
|
||||||
|
- **1.5.6 blocked by 1.5.3 + 1.5.5** — Needs both frontend page and backend chunk serving
|
||||||
|
|
||||||
|
```
|
||||||
|
1.5.1 (Nav+Routing) ─┐
|
||||||
|
├─► 1.5.3 (RAG DB Page) ─┐
|
||||||
|
1.5.2 (Backend CRUD) ─┘ │
|
||||||
|
├─► 1.5.6 (Links)
|
||||||
|
1.5.4 (Page-Aware) ──► 1.5.5 (Chunk PDFs) ─────┘
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## New Dependencies
|
||||||
|
|
||||||
|
### Backend
|
||||||
|
| Package | Purpose | Already installed? |
|
||||||
|
|---------|---------|--------------------|
|
||||||
|
| (none) | pypdf already supports page extraction | ✅ |
|
||||||
|
|
||||||
|
### Frontend
|
||||||
|
| Package | Purpose | Already installed? |
|
||||||
|
|---------|---------|--------------------|
|
||||||
|
| `react-router-dom` | Client-side routing | ✅ Installed |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## .gitignore Updates
|
||||||
|
|
||||||
|
```gitignore
|
||||||
|
# Chunk PDF storage
|
||||||
|
document_chunk/
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Risks & Mitigations
|
||||||
|
|
||||||
|
| Risk | Impact | Mitigation |
|
||||||
|
|------|--------|------------|
|
||||||
|
| ChromaDB has no native "group by document" query | `list_documents()` needs manual grouping from all metadata | Group by `filename` + `document_id` in metadata. Cache result if slow. |
|
||||||
|
| Large PDFs → many chunk PDF files | Disk usage grows | One PDF per page, matching the one-chunk-per-page rule, so the file count equals the page count. |
|
||||||
|
| Chunk includes overlap text from adjacent pages | Ambiguous page assignment | Tag chunk with its main page (N) only; overlap pages are not recorded. Note in UI. |
|
||||||
|
| DOCX has no page numbers | `page_number` is None for DOCX chunks | Graceful degradation — show "N/A" or hide page info for DOCX. |
|
||||||
|
| Deleting documents must clean up chunk files | Orphan files if deletion fails | Delete files after successful ChromaDB deletion. Log failures for manual cleanup. |
|
||||||
|
| Path traversal in chunk PDF endpoint | Security risk | Validate `file_path` doesn't contain `..` or absolute paths. Use whitelist of known files. |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Decisions (Confirmed)
|
||||||
|
|
||||||
|
| # | Question | Decision |
|
||||||
|
|---|----------|----------|
|
||||||
|
| 1 | Chunk algorithm | **Page-as-chunk-unit**. Each chunk = `[last 200 tokens of prev page] + [full current page text] + [first 200 tokens of next page]`. One chunk per page — never split a page even if oversized. |
|
||||||
|
| 2 | DOCX chunk PDFs | **No**. Only PDFs get chunk PDFs. DOCX chunks show text preview only. |
|
||||||
|
| 3 | IngestPanel placement | **Keep on LTT page** + also add upload on RAG Database page. |
|
||||||
|
| 4 | Re-ingestion / same filename | **Full replacement**. Delete old chunks + old chunk PDFs + create new `document_id`. |
|
||||||
|
| 5 | Chunk PDF content | **Original page from source PDF**. Extract actual page — preserves formatting, tables, images. |
|
||||||
|
| 6 | Page numbering | **Sequential index** (1, 2, 3...). Not PDF internal labels. |
|
||||||
|
| 7 | Oversized pages | **Never split**. One chunk per page regardless of token count. |
|
||||||
|
|
||||||
|
## Open Questions
|
||||||
|
|
||||||
|
None — all resolved.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Test Plan
|
||||||
|
|
||||||
|
### Backend Tests (New Files)
|
||||||
|
|
||||||
|
| File | Coverage |
|
||||||
|
|------|----------|
|
||||||
|
| `test_phase1_documents_router.py` | GET /documents, DELETE /documents/{id}, DELETE /chunks/{id} |
|
||||||
|
| `test_phase1_pdf_parser_pages.py` | parse_pdf_by_page() — multi-page PDFs, single-page, empty |
|
||||||
|
| `test_phase1_page_aware_chunking.py` | chunk_pages() — cross-page chunks, single-page chunks |
|
||||||
|
| `test_phase1_pdf_extractor.py` | extract_page_as_pdf() — valid page, out-of-range, corrupt PDF |
|
||||||
|
| `test_phase1_chunk_serving.py` | GET /chunks/{path}/pdf — valid file, missing file, path traversal |
|
||||||
|
|
||||||
|
### Frontend Tests (New Files)
|
||||||
|
|
||||||
|
| File | Coverage |
|
||||||
|
|------|----------|
|
||||||
|
| `NavBar.test.tsx` | Navigation links, active state |
|
||||||
|
| `RAGDatabasePage.test.tsx` | Document list, delete, upload |
|
||||||
|
| `DocumentList.test.tsx` | Document cards, expand/collapse |
|
||||||
|
| `ChunkList.test.tsx` | Chunk table, page numbers, PDF links |
|
||||||
|
|
||||||
|
### Acceptance Tests
|
||||||
|
|
||||||
|
| File | Coverage |
|
||||||
|
|------|----------|
|
||||||
|
| `test_acceptance_phase1_documents_crud.py` | Real ChromaDB CRUD with list, delete |
|
||||||
|
| `test_acceptance_phase1_page_chunking.py` | Real PDF upload → page-aware chunks → chunk PDFs exist |
|
||||||
|
| `test_acceptance_phase1_chunk_links.py` | Full flow: upload → query → response has clickable chunk links |
|
||||||
|
|
@@ -107,7 +107,7 @@ class RAGService:
 f"Answer the question using ONLY these document chunks. "
 f"Do not use any external knowledge. "
 f"Format your answer as bullet points. "
-f"Cite the source number [N] for each point.\n\n"
+f"Cite the source name in [ ] for each point.\n\n"
 f"Document chunks:\n{context}\n\n"
 f"Answer:"
 )
File diff suppressed because it is too large
@@ -16,6 +16,8 @@
 "lucide-react": "^0.190.0",
 "react": "^18.2.0",
 "react-dom": "^18.2.0",
+"react-markdown": "^10.1.0",
+"react-router-dom": "^7.14.2",
 "tailwindcss": "^3.4.0"
 },
 "devDependencies": {
@@ -1,69 +1,27 @@
|
||||||
import React from 'react'
|
import React from 'react'
|
||||||
import { QueryClientProvider } from '@tanstack/react-query'
|
import { BrowserRouter, Routes, Route } from 'react-router-dom'
|
||||||
import { queryClient, useQueryDocument, useIngestDocument } from './lib/queries'
|
import { AppQueryProvider } from './lib/queries'
|
||||||
import { Film } from 'lucide-react'
|
|
||||||
import { QueryInput } from './components/QueryInput'
|
|
||||||
import { KeywordsDisplay } from './components/KeywordsDisplay'
|
|
||||||
import { ResponsePanel } from './components/ResponsePanel'
|
|
||||||
import { IngestPanel } from './components/IngestPanel'
|
|
||||||
import { ErrorBoundary } from './components/ErrorBoundary'
|
import { ErrorBoundary } from './components/ErrorBoundary'
|
||||||
|
import { NavBar } from './components/NavBar'
|
||||||
const VideoPlaceholder: React.FC = () => {
|
import { LTTPage } from './pages/LTTPage'
|
||||||
return (
|
import { RAGDatabasePage } from './pages/RAGDatabasePage'
|
||||||
<div className="h-full flex items-center justify-center bg-white/50">
|
|
||||||
<div className="text-center space-y-2">
|
|
||||||
<Film className="mx-auto w-12 h-12 text-gray-600" />
|
|
||||||
<div className="text-gray-700 font-semibold">Video upload coming in Phase 2</div>
|
|
||||||
</div>
|
|
||||||
</div>
|
|
||||||
)
|
|
||||||
}
|
|
||||||
|
|
||||||
const AppContent: React.FC = () => {
|
|
||||||
const queryMutation = useQueryDocument()
|
|
||||||
const ingestMutation = useIngestDocument()
|
|
||||||
|
|
||||||
const handleQuerySubmit = (question: string): void => {
|
|
||||||
queryMutation.mutate({ question })
|
|
||||||
}
|
|
||||||
|
|
||||||
const handleFileUpload = (file: File): void => {
|
|
||||||
ingestMutation.mutate(file)
|
|
||||||
}
|
|
||||||
|
|
||||||
return (
|
|
||||||
<div className="h-screen grid grid-rows-[30%_1fr] grid-cols-2 bg-gray-50">
|
|
||||||
<div className="border-r border-b border-gray-200 p-4 min-h-0 overflow-hidden">
|
|
||||||
<VideoPlaceholder />
|
|
||||||
</div>
|
|
||||||
<div className="border-b border-gray-200 p-6 flex flex-col gap-4 overflow-y-auto min-h-0">
|
|
||||||
<QueryInput onSubmit={handleQuerySubmit} isLoading={queryMutation.isPending} />
|
|
||||||
<KeywordsDisplay keywords={queryMutation.data?.keywords} isLoading={queryMutation.isPending} />
|
|
||||||
<IngestPanel
|
|
||||||
onUpload={handleFileUpload}
|
|
||||||
isLoading={ingestMutation.isPending}
|
|
||||||
success={ingestMutation.isSuccess ? ingestMutation.data?.filename ?? null : null}
|
|
||||||
error={ingestMutation.isError ? (ingestMutation.error instanceof Error ? ingestMutation.error.message : 'Upload failed') : null}
|
|
||||||
/>
|
|
||||||
</div>
|
|
||||||
<div className="col-span-2 p-6 border-t border-gray-200 overflow-y-auto min-h-0">
|
|
||||||
<ResponsePanel
|
|
||||||
answer={queryMutation.data?.answer ?? null}
|
|
||||||
sources={queryMutation.data?.sources ?? []}
|
|
||||||
isLoading={queryMutation.isPending}
|
|
||||||
error={queryMutation.isError ? (queryMutation.error instanceof Error ? queryMutation.error.message : 'Query failed') : null}
|
|
||||||
/>
|
|
||||||
</div>
|
|
||||||
</div>
|
|
||||||
)
|
|
||||||
}
|
|
||||||
|
|
||||||
export default function App(): JSX.Element {
|
export default function App(): JSX.Element {
|
||||||
return (
|
return (
|
||||||
<QueryClientProvider client={queryClient}>
|
<BrowserRouter>
|
||||||
<ErrorBoundary>
|
<AppQueryProvider>
|
||||||
<AppContent />
|
<ErrorBoundary>
|
||||||
</ErrorBoundary>
|
<div className="h-screen flex flex-col">
|
||||||
</QueryClientProvider>
|
<NavBar />
|
||||||
|
<div className="flex-1 overflow-auto">
|
||||||
|
<Routes>
|
||||||
|
<Route path="/" element={<LTTPage />} />
|
||||||
|
<Route path="/rag-database" element={<RAGDatabasePage />} />
|
||||||
|
</Routes>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</ErrorBoundary>
|
||||||
|
</AppQueryProvider>
|
||||||
|
</BrowserRouter>
|
||||||
)
|
)
|
||||||
}
|
}
|
||||||
|
|
|
||||||
|
|
@@ -0,0 +1,35 @@
+import React from 'react'
+import { NavLink } from 'react-router-dom'
+
+export const NavBar: React.FC = () => {
+  return (
+    <nav className="h-12 flex-shrink-0 bg-white border-b border-gray-200 px-4 flex items-center">
+      <div className="flex gap-6">
+        <NavLink
+          to="/"
+          className={({ isActive }) =>
+            `text-sm font-medium transition-colors ${
+              isActive
+                ? 'text-gray-900 border-b-2 border-gray-900'
+                : 'text-gray-500 hover:text-gray-700 border-b-2 border-transparent'
+            }`
+          }
+        >
+          LTT
+        </NavLink>
+        <NavLink
+          to="/rag-database"
+          className={({ isActive }) =>
+            `text-sm font-medium transition-colors ${
+              isActive
+                ? 'text-gray-900 border-b-2 border-gray-900'
+                : 'text-gray-500 hover:text-gray-700 border-b-2 border-transparent'
+            }`
+          }
+        >
+          RAG Database
+        </NavLink>
+      </div>
+    </nav>
+  )
+}
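Both `NavLink` elements in `NavBar` pass the same inline `className` callback. One possible follow-up, sketched here and not part of this commit, is to lift that callback into a single function. The names `NavLinkState`, `BASE`, `ACTIVE`, `INACTIVE`, and `navLinkClass` are hypothetical; the class strings are copied from the component above.

```typescript
// Only the field this sketch reads from the object that React Router
// passes to a NavLink className callback.
interface NavLinkState {
  isActive: boolean
}

// Tailwind classes shared by both tabs, copied from NavBar.
const BASE = 'text-sm font-medium transition-colors'
const ACTIVE = 'text-gray-900 border-b-2 border-gray-900'
const INACTIVE = 'text-gray-500 hover:text-gray-700 border-b-2 border-transparent'

// Hypothetical shared callback: returns the class string for one tab.
function navLinkClass({ isActive }: NavLinkState): string {
  return `${BASE} ${isActive ? ACTIVE : INACTIVE}`
}
```

Each tab could then be written as `<NavLink to="/" className={navLinkClass}>`, so a styling change only needs to be made in one place.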
@@ -1,5 +1,6 @@
 import React, { useState } from 'react'
 import { MessageSquare, AlertCircle, Copy, ChevronDown, ChevronRight } from 'lucide-react'
+import ReactMarkdown from 'react-markdown'
 import type { SourceMetadata } from '../types'
 
 interface ResponsePanelProps {
@@ -106,21 +107,8 @@ export const ResponsePanel: React.FC<ResponsePanelProps> = ({
               <span className="text-sm">{copied ? 'Copied!' : 'Copy'}</span>
             </button>
           </div>
-          <div className="space-y-2 transition-all duration-300">
-            {answer
-              ?.split('\n')
-              .map((line, index) => {
-                const trimmedLine = line.trim()
-                if (trimmedLine.startsWith('-') || trimmedLine.startsWith('•')) {
-                  const content = trimmedLine.replace(/^[-•]\s*/, '')
-                  return (
-                    <li key={index} className="ml-4">
-                      {content}
-                    </li>
-                  )
-                }
-                return <p key={index}>{trimmedLine}</p>
-              })}
+          <div className="prose prose-sm max-w-none text-gray-800">
+            <ReactMarkdown>{answer ?? ''}</ReactMarkdown>
           </div>
         </div>
 
@@ -0,0 +1,57 @@
+import React from 'react'
+import { Film } from 'lucide-react'
+import { useQueryDocument, useIngestDocument } from '../lib/queries'
+import { QueryInput } from '../components/QueryInput'
+import { KeywordsDisplay } from '../components/KeywordsDisplay'
+import { ResponsePanel } from '../components/ResponsePanel'
+import { IngestPanel } from '../components/IngestPanel'
+
+const VideoPlaceholder: React.FC = () => {
+  return (
+    <div className="h-full flex items-center justify-center bg-white/50">
+      <div className="text-center space-y-2">
+        <Film className="mx-auto w-12 h-12 text-gray-600" />
+        <div className="text-gray-700 font-semibold">Video upload coming in Phase 2</div>
+      </div>
+    </div>
+  )
+}
+
+export const LTTPage: React.FC = () => {
+  const queryMutation = useQueryDocument()
+  const ingestMutation = useIngestDocument()
+
+  const handleQuerySubmit = (question: string): void => {
+    queryMutation.mutate({ question })
+  }
+
+  const handleFileUpload = (file: File): void => {
+    ingestMutation.mutate(file)
+  }
+
+  return (
+    <div className="h-full grid grid-rows-[30%_1fr] grid-cols-2 bg-gray-50">
+      <div className="border-r border-b border-gray-200 p-4 min-h-0 overflow-hidden">
+        <VideoPlaceholder />
+      </div>
+      <div className="border-b border-gray-200 p-6 flex flex-col gap-4 overflow-y-auto min-h-0">
+        <QueryInput onSubmit={handleQuerySubmit} isLoading={queryMutation.isPending} />
+        <KeywordsDisplay keywords={queryMutation.data?.keywords} isLoading={queryMutation.isPending} />
+        <IngestPanel
+          onUpload={handleFileUpload}
+          isLoading={ingestMutation.isPending}
+          success={ingestMutation.isSuccess ? ingestMutation.data?.filename ?? null : null}
+          error={ingestMutation.isError ? (ingestMutation.error instanceof Error ? ingestMutation.error.message : 'Upload failed') : null}
+        />
+      </div>
+      <div className="col-span-2 p-6 border-t border-gray-200 overflow-y-auto min-h-0">
+        <ResponsePanel
+          answer={queryMutation.data?.answer ?? null}
+          sources={queryMutation.data?.sources ?? []}
+          isLoading={queryMutation.isPending}
+          error={queryMutation.isError ? (queryMutation.error instanceof Error ? queryMutation.error.message : 'Query failed') : null}
+        />
+      </div>
+    </div>
+  )
+}
@@ -0,0 +1,13 @@
+import React from 'react'
+import { Database } from 'lucide-react'
+
+export const RAGDatabasePage: React.FC = () => {
+  return (
+    <div className="h-full flex items-center justify-center bg-white/50">
+      <div className="text-center space-y-2">
+        <Database className="mx-auto w-12 h-12 text-gray-600" />
+        <div className="text-gray-700 font-semibold">RAG Database Management — Coming Soon</div>
+      </div>
+    </div>
+  )
+}
@@ -53,8 +53,8 @@ describe('ResponsePanel', () => {
     expect(screen.getByText(/Failed to fetch answer/i)).toBeInTheDocument()
   })
 
-  it('renders answer text as bullet points', () => {
-    const answer = `- First point\n- Second point\n• Third point\nPlain text line`
+  it('renders answer text as bullet points via markdown', () => {
+    const answer = `- First point\n- Second point\n- Third point\n\nPlain text line`
     render(<ResponsePanel answer={answer} sources={[]} isLoading={false} error={null} />)
     expect(screen.getByText('First point')).toBeInTheDocument()
     expect(screen.getByText('Second point')).toBeInTheDocument()
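The test fixture changed in two ways: `•` became `-`, and a blank line now precedes the plain-text line. A likely reason, stated here as an assumption, is that CommonMark only treats `-`, `+`, and `*` as bullet list markers, and that a non-blank line directly after a list item is folded into that item as a continuation. The sketch below illustrates the first rule with a hypothetical `isBulletListItem` check. It is a rough approximation that ignores thematic breaks (`***`, `- - -`) and setext heading underlines.

```typescript
// Hypothetical helper (not in this commit): rough check for whether a line
// opens a CommonMark bullet list item. Up to three spaces of indentation,
// then one of - + *, then a space, a tab, or the end of the line.
function isBulletListItem(line: string): boolean {
  return /^ {0,3}[-+*]( |\t|$)/.test(line)
}

// The old hand-written renderer accepted '•'; a markdown parser does not.
const oldFixtureLine = '• Third point'
const newFixtureLine = '- Third point'
const oldLineIsBullet = isBulletListItem(oldFixtureLine) // false
const newLineIsBullet = isBulletListItem(newFixtureLine) // true
```

Model output that uses `•` bullets would therefore render as ordinary paragraph text under `react-markdown`, which is worth keeping in mind alongside the prompt instruction to format answers as bullet points.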