legco_ai_assistant/.plans/package3_enhancement_plan.md

# Package 3 Enhancement Plan

**Source**: User request (2026-04-25)
**Scope**: System Prompt Configuration Page + Query History Page
**Status**: 🔧 In Progress (3.1 ✅, 3.2 ✅, 3.3 ✅, next: 3.4)

---

## Objective

Add two new features that give users visibility and control over the RAG pipeline:

1. **System Prompt Configuration Page** — Users can view/edit the full prompt templates for all 3 LLM calls (Decomposer, Relevance Filter, Response Generator). Templates support placeholders (`{question}`, `{chunks}`, `{context}`) that are replaced at query time. Supports 3 profiles (A, B, C) that users switch between with a single click.

2. **Query History Page** — Records every query with full detail: input text, extracted questions, timing per pipeline stage (decompose, retrieve, filter, generate), chunks retrieved/filtered counts, final answer, sources, total time, and which profile was used.

---

## Current State

### What Exists

**LLM Pipeline** (3 calls, prompt templates hardcoded in service files):

| Call | Service | File:Line | Current Prompt Template | Temp | Placeholders |
|------|---------|-----------|------------------------|------|--------------|
| 1 | `QueryDecomposer` | `services/query_decomposer.py:54-59` | `"Given this question: '{question}'\n\nBreak it down into 2-5 simplified sub-questions..."` | default (0.7) | `{question}` |
| 2 | `RelevanceFilter` | `services/relevance_filter.py:36-39` | `"Given question '{question}' and these document chunks, rate each 0-10 for relevance. Return JSON array of scores.\n{chunks_string}"` | 0.0 | `{question}`, `{chunks}` |
| 3 | `RAGService` | `services/rag.py:108-117` | `"Question: {question}\n\nAnswer the question using ONLY these document chunks...bullet points...cite sources...\n\nDocument chunks:\n{context}\n\nAnswer:"` | 0.3 | `{question}`, `{context}` |

- `LLMClient.complete(prompt, temperature, step_name)` — single method, sends prompt as `[{"role": "user", "content": prompt}]`
- All 3 prompts are f-strings built inline in the service methods — no template abstraction exists
- The `step_name` parameter is only used for log labels

**Data Storage:**
- **No SQL database exists.** ChromaDB is the only persistent store (vector database).
- Config is `.env`-driven via `pydantic-settings.BaseSettings` (flat key-value, not user-editable at runtime).
- Logging exists (`RotatingFileHandler` to `backend/app/log/backend.log`): timing data is written to the log file but not stored in any queryable form.

**Frontend:**
- 3 pages: `LTTPage` (/), `RAGDatabasePage` (/rag-database), `PdfViewerPage` (/pdf-viewer)
- NavBar has "LTT" and "RAG Database" links
- No history page, no settings/configuration page
- No shadcn/ui — all components are custom Tailwind

**Query Pipeline (SSE streaming)**:
```
POST /api/v1/query
  → QueryDecomposer.decompose()     [LLM Call 1, timing logged only]
  → RAGService.retrieve()           [ChromaDB, no timing capture]
  → RelevanceFilter.filter()        [LLM Call 2, timing logged only]
  → RAGService.generate_response()  [LLM Call 3, timing logged only]
  → SSE: completed event with answer + sources
```

### What's Missing (Gaps This Plan Fills)

- No way for users to customize LLM prompts
- No persistence of query history — all queries are ephemeral
- No record of how long each pipeline stage takes
- No way to review past queries and answers
- No user-facing configuration page of any kind
- Hardcoded prompt templates can't be tuned without changing source code

---

## Feature 1: System Prompt Configuration (Full Template Editing)

### 1.1 Overview

Users edit the **complete prompt template** for each of the 3 LLM calls. Templates contain placeholder variables (e.g., `{question}`, `{chunks}`, `{context}`) that are replaced with actual data at query time. Three profiles (A, B, C) let users save and switch between different prompt sets.

**Design Decision**: Unlike the original plan (system role prefix + hardcoded user template), users edit the ENTIRE prompt. This gives full control over LLM instructions, output format, and behavior. The page documents exactly which placeholders are available for each step so users know what they can use.

### 1.2 Database Schema

**Database**: `backend/data/prompts.db` (SQLite, stdlib `sqlite3`)

```sql
CREATE TABLE IF NOT EXISTS system_prompt_profiles (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL UNIQUE,           -- "A" | "B" | "C"
    is_active INTEGER DEFAULT 0,         -- only ONE row has is_active = 1
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE TABLE IF NOT EXISTS system_prompts (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    profile_id INTEGER NOT NULL,
    step_name TEXT NOT NULL,             -- "decompose" | "filter" | "generate"
    prompt_template TEXT NOT NULL,       -- full prompt with {placeholder} variables
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    updated_at TEXT NOT NULL DEFAULT (datetime('now')),
    FOREIGN KEY (profile_id) REFERENCES system_prompt_profiles(id) ON DELETE CASCADE,
    UNIQUE(profile_id, step_name)
);
```

**Default seed data** (3 profiles × 3 prompts = 9 rows, Profile A active by default):

All 3 profiles start with the **same defaults** (the current hardcoded prompts). Users customize from there.

| Profile | Step | Placeholder | Seed Template |
|---------|------|-------------|---------------|
| A | decompose | `{question}` | `"Given this question: '{question}'\n\nBreak it down into 2-5 simplified sub-questions that would help search for relevant information. Each sub-question should be short and focused on one aspect. Return as a JSON array of strings."` |
| A | filter | `{question}`, `{chunks}` | `"Given question '{question}' and these document chunks, rate each 0-10 for relevance.\nReturn JSON array of scores.\n{chunks}\n"` |
| A | generate | `{question}`, `{context}` | `"Question: {question}\n\nAnswer the question using ONLY these document chunks. Do not use any external knowledge. Format your answer as bullet points. Cite your sources inline using the exact bracket labels provided, e.g. [filename, page N]. Place the citation at the end of each relevant point.\n\nDocument chunks:\n{context}\n\nAnswer:"` |
| B | decompose | `{question}` | (same as A) |
| B | filter | `{question}`, `{chunks}` | (same as A) |
| B | generate | `{question}`, `{context}` | (same as A) |
| C | decompose | `{question}` | (same as A) |
| C | filter | `{question}`, `{chunks}` | (same as A) |
| C | generate | `{question}`, `{context}` | (same as A) |
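
The schema comment states that only one row has `is_active = 1`, but the DDL itself does not enforce that. The sketch below shows one possible way to seed idempotently and to switch the active profile inside a single transaction. The partial unique index, the `activate_profile` helper, and the abbreviated `SEED_TEMPLATES` values are assumptions made for illustration; they are not part of the plan.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS system_prompt_profiles (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL UNIQUE,
    is_active INTEGER DEFAULT 0,
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE TABLE IF NOT EXISTS system_prompts (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    profile_id INTEGER NOT NULL,
    step_name TEXT NOT NULL,
    prompt_template TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    updated_at TEXT NOT NULL DEFAULT (datetime('now')),
    FOREIGN KEY (profile_id) REFERENCES system_prompt_profiles(id) ON DELETE CASCADE,
    UNIQUE(profile_id, step_name)
);
-- Assumption (not in the plan): reject a second active row at the DB level.
CREATE UNIQUE INDEX IF NOT EXISTS idx_one_active_profile
    ON system_prompt_profiles(is_active) WHERE is_active = 1;
"""

# Abbreviated stand-ins for the full seed templates listed in the table above.
SEED_TEMPLATES = {
    "decompose": "Given this question: '{question}' ...",
    "filter": "Given question '{question}' ... {chunks}",
    "generate": "Question: {question} ... {context} ...",
}


def seed_default_profiles(conn: sqlite3.Connection) -> None:
    """Insert profiles A/B/C with 3 templates each; safe to call on every startup."""
    with conn:  # one transaction: commit on success, roll back on error
        for name in ("A", "B", "C"):
            conn.execute(
                "INSERT OR IGNORE INTO system_prompt_profiles (name) VALUES (?)",
                (name,),
            )
            profile_id = conn.execute(
                "SELECT id FROM system_prompt_profiles WHERE name = ?", (name,)
            ).fetchone()[0]
            conn.executemany(
                "INSERT OR IGNORE INTO system_prompts "
                "(profile_id, step_name, prompt_template) VALUES (?, ?, ?)",
                [(profile_id, step, text) for step, text in SEED_TEMPLATES.items()],
            )
        # Only default to A when no profile is active, so restarts keep the user's choice.
        any_active = conn.execute(
            "SELECT 1 FROM system_prompt_profiles WHERE is_active = 1"
        ).fetchone()
        if any_active is None:
            conn.execute("UPDATE system_prompt_profiles SET is_active = 1 WHERE name = 'A'")


def activate_profile(conn: sqlite3.Connection, name: str) -> None:
    """Make `name` the only active profile (hypothetical helper)."""
    with conn:
        exists = conn.execute(
            "SELECT 1 FROM system_prompt_profiles WHERE name = ?", (name,)
        ).fetchone()
        if exists is None:
            raise ValueError(f"unknown profile: {name!r}")
        # Deactivate first so the unique index never sees two active rows.
        conn.execute(
            "UPDATE system_prompt_profiles "
            "SET is_active = 0, updated_at = datetime('now') WHERE is_active = 1"
        )
        conn.execute(
            "UPDATE system_prompt_profiles "
            "SET is_active = 1, updated_at = datetime('now') WHERE name = ?",
            (name,),
        )
```

Note that `DEFAULT (datetime('now'))` only applies on insert, which is why the updates set `updated_at` explicitly.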

### 1.3 Available Placeholders (per step)

These are documented on the frontend edit page so users know exactly what they can insert:

| Step | Placeholder | What It Contains | Example Replacement |
|------|-------------|------------------|---------------------|
| **Decompose** | `{question}` | The user's original input text | `"What is the NEC4 clause about time extensions?"` |
| **Filter** | `{question}` | The user's original input text | (same) |
| | `{chunks}` | Numbered list of all retrieved chunks: `Chunk 1: <text>\nChunk 2: <text>...` | `"Chunk 1: The NEC4 clause 61.3 states that time extensions...\nChunk 2: Notice must be given..."` |
| **Generate** | `{question}` | The user's original input text | (same) |
| | `{context}` | Formatted chunks with citation labels: `[filename, page N] Source: ...\nSummary: ...\nContent: ...` | `"[NEC4 ACC.pdf, page 3] Source: NEC4 ACC.pdf\nSummary: Discussion of time extension provisions...\nContent: Clause 61.3 states..."` |

**Placeholder syntax**: `{variable_name}` — must match exactly. Unknown placeholders are left as-is (not replaced). If a user removes a required placeholder (e.g., `{question}`), the LLM won't see the question — the UI warns but doesn't block.
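
The warning rules above (missing required placeholder, unknown placeholder) can be expressed as a small pure function. This is a sketch only: `check_placeholders` and `ALLOWED` are hypothetical names, and whether the check runs in the backend, the frontend, or both is left open by the plan.

```python
import re

# Placeholders each step may use, taken from the table above.
ALLOWED = {
    "decompose": {"question"},
    "filter": {"question", "chunks"},
    "generate": {"question", "context"},
}

# Matches {identifier}; JSON-style braces such as {"a": 1} are not treated as placeholders.
PLACEHOLDER_RE = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def check_placeholders(step: str, template: str) -> tuple[list[str], list[str]]:
    """Return (missing, unknown) placeholder names for a step's template."""
    found = set(PLACEHOLDER_RE.findall(template))
    allowed = ALLOWED[step]
    return sorted(allowed - found), sorted(found - allowed)


# Example: a filter template that dropped {question} and added {foo}.
missing, unknown = check_placeholders("filter", "Rate these chunks:\n{chunks}\n{foo}")
```

Both lists being empty means no warning is needed; neither result blocks saving.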

### 1.4 Backend Architecture

#### New Files

| File | Purpose |
|------|---------|
| `backend/app/core/sqlite_db.py` | SQLite connection factory (shared by prompts + history) |
| `backend/app/services/prompt_service.py` | CRUD for prompt profiles and templates; template formatting |
| `backend/app/routers/prompts.py` | REST API endpoints for prompt management |
| `backend/app/models/prompts.py` | Pydantic schemas for prompt request/response |

#### Modified Files

| File | Change |
|------|--------|
| `backend/app/core/config.py` | Add `prompts_db_path` and `history_db_path` |
| `backend/app/core/dependencies.py` | Add DI factory: `get_prompt_service()` |
| `backend/app/main.py` | Register `prompts` router; startup: create tables + seed 3 default profiles |
| `backend/app/services/query_decomposer.py` | `decompose()` fetches template from prompt service, formats with `{question}`, sends to LLM |
| `backend/app/services/relevance_filter.py` | `filter()` fetches template from prompt service, formats with `{question}` and `{chunks}`, sends to LLM |
| `backend/app/services/rag.py` | `generate_response()` fetches template from prompt service, formats with `{question}` and `{context}`, sends to LLM |
| `backend/app/routers/query.py` | Pass `PromptService` to pipeline; record active profile name for history |

#### How Template Formatting Works

Each service method changes from building a hardcoded prompt to fetching and formatting a template:

**Before** (query_decomposer.py):
```python
prompt = (
    f"Given this question: '{question}'\n\n"
    f"Break it down into 2-5 simplified sub-questions..."
)
response = await self.llm_client.complete(prompt, step_name="QueryDecomposer")
```

**After** (query_decomposer.py):
```python
template = self.prompt_service.get_prompt_template(step="decompose")
prompt = template.replace("{question}", question)
response = await self.llm_client.complete(prompt, step_name="QueryDecomposer")
```

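**`PromptService.get_prompt_template()`** fetches the template for the currently active profile and the given step. The caller then substitutes placeholders with Python `str.replace()`, which is simple and predictable and avoids the `str.format()` failure modes when a template contains literal curly braces (for example, a JSON sample). One caveat: chained `replace()` calls re-scan text that has already been substituted, so a question that happens to contain the literal text `{chunks}` would be expanded by the second call.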

**Note**: `LLMClient.complete()` does NOT change — no `system_prompt` parameter is added. Templates remain single user-role messages, same as today. The only difference is the prompt text comes from the DB instead of being hardcoded.
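
If re-scanning of substituted text ever becomes a concern (for instance, a user question that contains the literal string `{chunks}`), a single-pass substitution is one alternative to chained `str.replace()` calls. The sketch below is an assumption, not the plan's chosen implementation; `render_template` is a hypothetical name.

```python
import re

PLACEHOLDER_RE = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def render_template(template: str, values: dict[str, str]) -> str:
    """Replace each known {placeholder} once; leave unknown ones as-is."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        # Unknown placeholder: return the original text, braces included.
        return values.get(name, match.group(0))

    # re.sub scans the template once, so inserted values are never re-scanned.
    return PLACEHOLDER_RE.sub(substitute, template)


template = "Q: {question}\n{chunks}\n{foo}"
values = {"question": "what is {chunks}?", "chunks": "Chunk 1: x"}

single_pass = render_template(template, values)
chained = template.replace("{question}", values["question"]).replace(
    "{chunks}", values["chunks"]
)
```

Here `single_pass` keeps the `{chunks}` inside the question text untouched, while `chained` expands it a second time.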

#### API Endpoints (5 total — fixed 3 profiles, no create/delete)

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/v1/prompts/profiles` | List all 3 profiles with active status: `[{name: "A", is_active: true}, ...]` |
| `PUT` | `/api/v1/prompts/profiles/{name}/activate` | Activate a profile by name (e.g., `PUT /profiles/B/activate`). Validates name is A/B/C. |
| `GET` | `/api/v1/prompts/profiles/{name}` | Get all 3 prompt templates for a profile |
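| `PUT` | `/api/v1/prompts/profiles/{name}/all` | Batch update all 3 prompt templates for a profile. If the framework matches routes in declaration order (as FastAPI does), declare this route and `/activate` before `/{name}/{step}` so that `all` and `activate` are not captured as a step name. |
| `PUT` | `/api/v1/prompts/profiles/{name}/{step}` | Update a single prompt template. Validates step is decompose/filter/generate. |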

**Why fixed 3 profiles (no create/delete)**:
- Simplest mental model: 3 slots, name them A/B/C
- No duplicate name conflicts, no "delete last profile" edge case
- "Reset to Defaults" restores the seed template for a profile

### 1.5 Frontend Design

**New page**: `/system-prompts`
**New NavBar link**: "System Prompts"

```
┌──────────────────────────────────────────────────────────┐
│  System Prompts                                          │
├──────────────────────────────────────────────────────────┤
│                                                           │
│  Active Profile: [A ▼]  [Set Active]                     │
│                                                           │
│  ┌────────────────────────────────────────────────────┐  │
│  │ ● Profile A  (active)     [Edit]                   │  │
│  ├────────────────────────────────────────────────────┤  │
│  │ ○ Profile B               [Edit]                   │  │
│  ├────────────────────────────────────────────────────┤  │
│  │ ○ Profile C               [Edit]                   │  │
│  └────────────────────────────────────────────────────┘  │
│                                                           │
│  ── Editing Profile A ─────────────────────────────────  │
│                                                           │
│  Available placeholders:                                  │
│  ┌────────────────────────────────────────────────────┐  │
│  │  {question}  — The user's input question           │  │
│  │  {chunks}    — Retrieved document chunks (filter)  │  │
│  │  {context}   — Formatted chunks with citations     │  │
│  └────────────────────────────────────────────────────┘  │
│                                                           │
│  Step 1: Query Decomposition                             │
│  Placeholders: {question}                                │
│  ┌────────────────────────────────────────────────────┐  │
│  │ Given this question: '{question}'                  │  │
│  │                                                    │  │
│  │ Break it down into 2-5 simplified sub-questions    │  │
│  │ that would help search for relevant information.   │  │
│  │ Each sub-question should be short and focused on   │  │
│  │ one aspect. Return as a JSON array of strings.     │  │
│  └────────────────────────────────────────────────────┘  │
│  ───────────────────────────────────────────────────────  │
│  Step 2: Relevance Filtering                             │
│  Placeholders: {question}, {chunks}                     │
│  ┌────────────────────────────────────────────────────┐  │
│  │ Given question '{question}' and these document     │  │
│  │ chunks, rate each 0-10 for relevance.              │  │
│  │ Return JSON array of scores.                       │  │
│  │ {chunks}                                           │  │
│  └────────────────────────────────────────────────────┘  │
│  ───────────────────────────────────────────────────────  │
│  Step 3: Response Generation                             │
│  Placeholders: {question}, {context}                    │
│  ┌────────────────────────────────────────────────────┐  │
│  │ Question: {question}                               │  │
│  │                                                    │  │
│  │ Answer the question using ONLY these document      │  │
│  │ chunks. Do not use any external knowledge.         │  │
│  │ Format your answer as bullet points.               │  │
│  │ Cite your sources inline...                        │  │
│  │                                                    │  │
│  │ Document chunks:                                   │  │
│  │ {context}                                          │  │
│  │                                                    │  │
│  │ Answer:                                            │  │
│  └────────────────────────────────────────────────────┘  │
│                                                           │
│  [Save Changes]  [Reset All to Defaults]  [Cancel]      │
│                                                           │
└──────────────────────────────────────────────────────────┘
```

**Component tree**:
```
SystemPromptsPage
├── ProfileSelector (dropdown A/B/C + "Set Active" button)
├── ProfileList (3 cards, active indicator)
│   └── ProfileCard × 3 (name, active indicator, Edit button)
├── PlaceholderDocs (info box showing available placeholders per step)
└── PromptEditor (shown when editing a profile)
    ├── PromptTextArea × 3 (labeled with step name + available placeholders)
    │   └── Per-step reset icon (↺) next to each textarea label
    └── ActionBar (Save, Reset All to Defaults, Cancel)
```

**Placeholder documentation in UI**: The page shows an "Available Placeholders" info box listing all placeholder variables and what they expand to. Each textarea has a subtle label showing which placeholders are valid for that step (e.g., "Placeholders: `{question}`, `{chunks}`"). Unknown placeholders in the template are left as-is by the backend; the UI shows a soft warning if the template references one, but doesn't block saving.

**API hooks** (new in `lib/queries.tsx`):
```typescript
usePromptProfiles()              // useQuery: GET /prompts/profiles
usePromptProfile(name)           // useQuery: GET /prompts/profiles/{name}
useActivateProfile(name)         // useMutation: PUT /prompts/profiles/{name}/activate
useUpdatePrompt(name, step)      // useMutation: PUT /prompts/profiles/{name}/{step}
useUpdateAllPrompts(name)        // useMutation: PUT /prompts/profiles/{name}/all
```

**Edge cases handled**:
- Empty prompt template: allowed (LLM call proceeds with empty prompt — LLM will likely error or return nothing)
- Removed `{question}` placeholder: soft warning shown; LLM won't see the question — user's choice
- Unknown placeholder in template (e.g., `{foo}`): left as-is, UI shows warning badge
- Very long templates: textarea with vertical scroll, character count
- Unsaved changes: warn before navigating away
- Loading state: skeleton cards
- Error state: red error banner with retry

### 1.6 Acceptance Criteria
- [ ] `/system-prompts` page accessible via NavBar link
- [ ] 3 profiles (A/B/C) shown with active indicator (● / ○)
- [ ] "Set Active" switches which profile is used for queries
- [ ] Editing a profile shows 3 labeled textareas pre-filled with current templates
- [ ] Each textarea shows its available placeholders
- [ ] "Save Changes" persists templates to DB
- [ ] Per-step reset icon (↺) restores the seed template for that individual step
- [ ] "Reset All to Defaults" restores all 3 templates for the profile at once
- [ ] "Cancel" reverts unsaved edits
- [ ] Changing a template affects the NEXT query (fetched fresh each time)
- [ ] Placeholder docs visible on the page
- [ ] `pytest` backend tests pass (new + existing)
- [ ] `npm test` frontend tests pass (new + existing)

---

## Feature 2: Query History

### 2.1 Overview

Every query submitted through the LTT page is recorded in a history database with detailed timing per pipeline stage. Users can browse past queries, see timing breakdowns, and review answers.

### 2.2 Database Schema

**Database**: `backend/data/history.db` (SQLite, separate from prompts.db)

```sql
CREATE TABLE IF NOT EXISTS query_history (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    input_text TEXT NOT NULL,               -- original user input
    extracted_questions TEXT DEFAULT NULL,   -- JSON array of sub-questions
    decomposer_time_ms INTEGER DEFAULT 0,   -- LLM Call 1 duration
    retriever_time_ms INTEGER DEFAULT 0,    -- ChromaDB retrieval duration
    chunks_retrieved INTEGER DEFAULT 0,     -- chunks from ChromaDB
    filter_time_ms INTEGER DEFAULT 0,       -- LLM Call 2 duration
    chunks_filtered INTEGER DEFAULT 0,      -- chunks after relevance filtering
    generator_time_ms INTEGER DEFAULT 0,    -- LLM Call 3 duration
    total_time_ms INTEGER DEFAULT 0,        -- input received → final response sent
    final_answer TEXT DEFAULT NULL,          -- full RAG answer text
    sources TEXT DEFAULT NULL,               -- JSON array of SourceMetadata
    profile_used TEXT DEFAULT NULL,          -- "A", "B", or "C"
    created_at TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE INDEX IF NOT EXISTS idx_query_history_created_at ON query_history(created_at DESC);
```

### 2.3 Backend Architecture

#### New Files

| File | Purpose |
|------|---------|
| `backend/app/services/history_service.py` | CRUD for query history records |
| `backend/app/routers/history.py` | REST API endpoints for history browsing |
| `backend/app/models/history.py` | Pydantic schemas for history request/response |

#### Modified Files

| File | Change |
|------|--------|
| `backend/app/core/sqlite_db.py` | Uses `get_history_db()` and `init_history_db()` (created in 3.1 alongside the prompts factories) |
| `backend/app/core/config.py` | Uses `history_db_path` (added in 3.1 together with `prompts_db_path`) |
| `backend/app/core/dependencies.py` | Add `get_history_service()` |
| `backend/app/main.py` | Register `history` router; startup: create history table |
| `backend/app/routers/query.py` | Time each pipeline stage and the overall request with `time.perf_counter()`; record history via `asyncio.create_task()` |

#### Timing Capture (in `_query_stream()`)

```python
async def _query_stream(request: QueryRequest):
    overall_start = time.perf_counter()
    question = request.question  # used by every stage below

    # Fetch prompt templates for active profile
    decompose_template = prompt_service.get_prompt_template("decompose")
    filter_template = prompt_service.get_prompt_template("filter")
    generate_template = prompt_service.get_prompt_template("generate")
    active_profile = prompt_service.get_active_profile_name()  # "A", "B", or "C"

    # Stage 1: Decompose
    stage_start = time.perf_counter()
    prompt = decompose_template.replace("{question}", question)
    response = await llm_client.complete(prompt, step_name="QueryDecomposer")
    decomposer_time_ms = int((time.perf_counter() - stage_start) * 1000)
    questions = parse_questions(response)
    yield sse_event("decomposed", ...)

    # Stage 2: Retrieve
    stage_start = time.perf_counter()
    chunks, metadata = await rag.retrieve(question_texts=questions, ...)
    retriever_time_ms = int((time.perf_counter() - stage_start) * 1000)
    chunks_retrieved = len(chunks)
    yield sse_event("retrieving", ...)

    # Stage 3: Filter
    stage_start = time.perf_counter()
    prompt = filter_template.replace("{question}", question)
    prompt = prompt.replace("{chunks}", format_chunks(chunks))
    response = await llm_client.complete(prompt, temperature=0.0, step_name="RelevanceFilter")
    filter_time_ms = int((time.perf_counter() - stage_start) * 1000)
    filtered = parse_scores(response, chunks, threshold)
    chunks_filtered = len(filtered)
    yield sse_event("filtering", ...)

    # Stage 4: Generate
    stage_start = time.perf_counter()
    prompt = generate_template.replace("{question}", question)
    prompt = prompt.replace("{context}", format_context(filtered, metadata))
    answer = await llm_client.complete(prompt, temperature=0.3, step_name="ResponseGeneration")
    generator_time_ms = int((time.perf_counter() - stage_start) * 1000)

    total_time_ms = int((time.perf_counter() - overall_start) * 1000)

    # Record history (fire-and-forget)
    asyncio.create_task(history_service.record(QueryHistoryRecord(
        input_text=request.question,
        extracted_questions=json.dumps(questions),
        decomposer_time_ms=decomposer_time_ms,
        retriever_time_ms=retriever_time_ms,
        chunks_retrieved=chunks_retrieved,
        filter_time_ms=filter_time_ms,
        chunks_filtered=chunks_filtered,
        generator_time_ms=generator_time_ms,
        total_time_ms=total_time_ms,
        final_answer=answer,
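        sources=json.dumps([s.model_dump() for s in sources]),  # sources: the SourceMetadata list sent in the completed event; model_dump() is Pydantic v2 (.dict() on v1)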
        profile_used=active_profile,
    )))

    yield sse_event("completed", ...)
```
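
The four stages above repeat the same start/stop arithmetic. One possible way to factor it out is a small context manager; `stage_timer` and the `timings` dict are hypothetical names used only for this sketch.

```python
import time
from contextlib import contextmanager


@contextmanager
def stage_timer(timings: dict[str, int], name: str):
    """Store the elapsed wall time of the with-block in timings[name], in whole ms."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the stage raises, so partial timings survive failures.
        timings[name] = int((time.perf_counter() - start) * 1000)


timings: dict[str, int] = {}
with stage_timer(timings, "decomposer_time_ms"):
    time.sleep(0.01)  # stands in for the awaited LLM call
```

A synchronous context manager like this can wrap `await` expressions inside an async function; the keys can then be passed straight into the history record.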

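**Fire-and-forget**: `asyncio.create_task()` schedules history recording without blocking the SSE stream. If recording fails, the query completes normally; history is best-effort. Two caveats apply. The event loop keeps only a weak reference to a task, so the returned task must be stored somewhere until it finishes or it may be garbage-collected mid-write. And `sqlite3` calls are blocking, so `record()` should run the insert off the event loop (e.g., via `asyncio.to_thread()`) and catch its own exceptions so that failures are logged rather than surfacing as "Task exception was never retrieved".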
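
A minimal sketch of best-effort background recording, assuming Python 3.9+ for `asyncio.to_thread()`. It keeps a strong reference to each task until it finishes, runs the blocking `sqlite3` insert in a worker thread, and logs failures instead of raising. The helper names (`record_in_background`, `record_history`, `_write_record`) and the two-column insert are illustrative assumptions.

```python
import asyncio
import logging
import sqlite3

logger = logging.getLogger(__name__)

# Strong references to in-flight tasks; the event loop only holds weak ones.
_background_tasks: set[asyncio.Task] = set()


def _write_record(db_path: str, input_text: str, total_time_ms: int) -> None:
    """Blocking insert; opens and closes its own connection in the worker thread."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO query_history (input_text, total_time_ms) VALUES (?, ?)",
                (input_text, total_time_ms),
            )
    finally:
        conn.close()


async def record_history(db_path: str, input_text: str, total_time_ms: int) -> None:
    """Best-effort write: failures are logged and never propagate to the caller."""
    try:
        await asyncio.to_thread(_write_record, db_path, input_text, total_time_ms)
    except Exception:
        logger.exception("history recording failed")


def record_in_background(db_path: str, input_text: str, total_time_ms: int) -> asyncio.Task:
    """Schedule the write and return immediately; must be called from a running loop."""
    task = asyncio.create_task(record_history(db_path, input_text, total_time_ms))
    _background_tasks.add(task)
    task.add_done_callback(_background_tasks.discard)  # drop the reference when done
    return task
```

Inside `_query_stream()`, the call would sit just before the final `yield`, with no `await` on the returned task.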

#### API Endpoints

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/v1/history` | List query history (paginated, newest first). Query params: `limit` (default 50), `offset` (default 0) |
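| `GET` | `/api/v1/history/stats` | Aggregate stats: total queries, avg time, avg chunks, most used profile. If the framework matches routes in declaration order (as FastAPI does), declare this route before `/{query_id}` so that `stats` is not parsed as a query ID. |
| `GET` | `/api/v1/history/{query_id}` | Get full detail for a single query |
| `DELETE` | `/api/v1/history/{query_id}` | Delete a history record |
| `DELETE` | `/api/v1/history` | Clear all history |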
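
The list and stats endpoints map onto two small queries against `query_history`. The following is a sketch under assumptions: the function names are hypothetical, the connection is expected to use `sqlite3.Row` as its row factory, and `id DESC` is added as a tie-breaker because `datetime('now')` only has one-second resolution.

```python
import sqlite3


def list_history(conn: sqlite3.Connection, limit: int = 50, offset: int = 0) -> dict:
    """One page of summaries, newest first, plus the total row count."""
    total = conn.execute("SELECT COUNT(*) FROM query_history").fetchone()[0]
    rows = conn.execute(
        """
        SELECT id, substr(input_text, 1, 100) AS input_text, total_time_ms,
               chunks_retrieved, chunks_filtered, profile_used, created_at
        FROM query_history
        ORDER BY created_at DESC, id DESC
        LIMIT ? OFFSET ?
        """,
        (limit, offset),
    ).fetchall()
    return {
        "queries": [dict(r) for r in rows],
        "total": total,
        "limit": limit,
        "offset": offset,
    }


def history_stats(conn: sqlite3.Connection) -> dict:
    """Aggregate stats; averages are None when the table is empty."""
    row = conn.execute(
        """
        SELECT COUNT(*) AS total_queries,
               AVG(total_time_ms) AS avg_time_ms,
               AVG(chunks_retrieved) AS avg_chunks_retrieved,
               AVG(chunks_filtered) AS avg_chunks_filtered
        FROM query_history
        """
    ).fetchone()
    top = conn.execute(
        """
        SELECT profile_used, COUNT(*) AS n
        FROM query_history
        WHERE profile_used IS NOT NULL
        GROUP BY profile_used
        ORDER BY n DESC, profile_used
        LIMIT 1
        """
    ).fetchone()
    stats = dict(row)
    stats["most_used_profile"] = top["profile_used"] if top else None
    return stats
```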

#### Response Schemas

```python
class QueryHistorySummary(BaseModel):
    id: int
    input_text: str                        # truncated to 100 chars
    total_time_ms: int
    chunks_retrieved: int
    chunks_filtered: int
    profile_used: str | None               # "A", "B", or "C"
    created_at: str

class QueryHistoryDetail(BaseModel):
    id: int
    input_text: str                        # full text
    extracted_questions: list[str]
    decomposer_time_ms: int
    retriever_time_ms: int
    filter_time_ms: int
    generator_time_ms: int
    total_time_ms: int
    chunks_retrieved: int
    chunks_filtered: int
    final_answer: str
    sources: list[SourceMetadata]
    profile_used: str | None
    created_at: str

class QueryHistoryList(BaseModel):
    queries: list[QueryHistorySummary]
    total: int
    limit: int
    offset: int
```
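
The detail schema declares `extracted_questions`, `sources`, and `final_answer` as non-optional, while the corresponding columns default to `NULL`. One way to bridge that gap when building the detail response is sketched below; `row_to_detail` is a hypothetical helper, and mapping `NULL` to an empty list or empty string is an assumption rather than a decision recorded in the plan.

```python
import json
from typing import Any, Mapping


def row_to_detail(row: Mapping[str, Any]) -> dict[str, Any]:
    """Convert a query_history row into the shape of QueryHistoryDetail."""
    detail = {key: row[key] for key in row.keys()}
    # JSON columns are stored as text; NULL becomes an empty list.
    detail["extracted_questions"] = json.loads(row["extracted_questions"] or "[]")
    detail["sources"] = json.loads(row["sources"] or "[]")
    # A query that failed before generation has no answer; use an empty string.
    detail["final_answer"] = row["final_answer"] or ""
    return detail
```

The function accepts either a `sqlite3.Row` or a plain dict, since both expose `keys()` and item access.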

### 2.4 Frontend Design

**New page**: `/history`
**New NavBar link**: "History"

```
┌──────────────────────────────────────────────────────────┐
│  Query History                    Total: 42 queries       │
├──────────────────────────────────────────────────────────┤
│                                                           │
│  ┌────────────────────────────────────────────────────┐  │
│  │ 📊 Stats                                           │  │
│  │ Avg time: 3.2s · Avg chunks: 8.5 → 4.2 filtered    │  │
│  │ Most used: Profile A (35 queries)                  │  │
│  └────────────────────────────────────────────────────┘  │
│                                                           │
│  ┌────────────────────────────────────────────────────┐  │
│  │ #42 · 2026-04-25 14:32 · 3.8s · Profile A          │  │
│  │ "What is the NEC4 clause about time extensions?"   │  │
│  │ 8 chunks → 4 filtered · [Expand ▼]                  │  │
│  ├────────────────────────────────────────────────────┤  │
│  │ #41 · 2026-04-25 14:15 · 2.1s · Profile B          │  │
│  │ "How does arbitration work under the contract?"    │  │
│  │ 10 chunks → 3 filtered · [Expand ▼]                 │  │
│  ├────────────────────────────────────────────────────┤  │
│  │ #40 · 2026-04-25 13:50 · 4.5s · Profile A          │  │
│  │ "Explain the payment mechanism and valuation..."   │  │
│  │ 12 chunks → 6 filtered · [Expand ▼]                 │  │
│  └────────────────────────────────────────────────────┘  │
│                                                           │
│  [Load More]                              [Clear All]    │
│                                                           │
│  ── Expanded: #42 ─────────────────────────────────────  │
│                                                           │
│  ⏱ Pipeline Timing:                                      │
│  ┌────────────────────────────────────────────────────┐  │
│  │ Decompose  ████████░░░░░░░░░  0.8s                 │  │
│  │ Retrieve   ██░░░░░░░░░░░░░░░  0.2s  (8 chunks)    │  │
│  │ Filter     ██████████░░░░░░░░  1.1s  (4 kept)      │  │
│  │ Generate   ██████████████████  1.7s                 │  │
│  │ ──────────────────────────────────                  │  │
│  │ Total      ██████████████████  3.8s                 │  │
│  └────────────────────────────────────────────────────┘  │
│                                                           │
│  📝 Extracted Questions:                                  │
│  1. What are the time extension provisions?              │
│  2. What notice is required for time extensions?        │
│  3. How is extended time calculated under NEC4?         │
│                                                           │
│  💬 Answer:                                              │
│  ┌────────────────────────────────────────────────────┐  │
│  │ • The time extension provisions are outlined in    │  │
│  │   clause 61.3 [NEC4 ACC.pdf, page 3]               │  │
│  │ • Notice must be given within 8 weeks [NEC4 ACC... │  │
│  │ ...                                                 │  │
│  └────────────────────────────────────────────────────┘  │
│                                                           │
│  📎 Sources (4):  · NEC4 ACC.pdf, page 3  · ...          │
│  📋 Profile used: A                                      │
│                                                           │
└──────────────────────────────────────────────────────────┘
```

**Component tree**:
```
HistoryPage
├── HistoryStats (summary bar: total queries, avg time, avg chunks, most used profile)
├── HistoryList (scrollable list)
│   └── HistoryCard × N (collapsed: date, time, question preview, profile badge)
│       └── HistoryDetail (expanded: timing bars, questions, answer, sources)
├── LoadMoreButton
└── ClearAllButton (with confirmation dialog)
```

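**Timing bars**: Pure CSS. Each bar's width is the stage's share of the total time, e.g. for the Decompose bar: ``<div className="h-4 rounded bg-blue-400" style={{width: `${(time/total)*100}%`}} />``. Render a zero-width bar when `total` is 0 to avoid dividing by zero. Color-coded: Decompose (blue-400), Retrieve (green-400), Filter (amber-400), Generate (purple-400).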
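
As a worked example of the width arithmetic, the sketch below computes the percentages for the stage timings shown in the mockup (0.8s, 0.2s, 1.1s, 1.7s). It is written in Python for consistency with the backend snippets; the real computation would live in the React component. Using the sum of the stages as the denominator (rather than `total_time_ms`) is an assumption that keeps the four bars adding up to 100%.

```python
def stage_widths(stage_ms: dict[str, int]) -> dict[str, float]:
    """Return each stage's share of the summed stage time, as a percentage."""
    total = sum(stage_ms.values())
    if total <= 0:
        # No timing data: draw empty bars instead of dividing by zero.
        return {name: 0.0 for name in stage_ms}
    return {name: round(ms / total * 100, 1) for name, ms in stage_ms.items()}


widths = stage_widths({"decompose": 800, "retrieve": 200, "filter": 1100, "generate": 1700})
```

For these inputs the shares come out to roughly 21%, 5%, 29%, and 45%.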

**API hooks**:
```typescript
useQueryHistory(limit, offset)     // useQuery: GET /history
useQueryHistoryDetail(id)          // useQuery: GET /history/{id}
useDeleteHistoryRecord(id)         // useMutation: DELETE /history/{id}
useClearHistory()                  // useMutation: DELETE /history
useHistoryStats()                  // useQuery: GET /history/stats
```

### 2.5 Acceptance Criteria
- [ ] Every query creates a history record with all timing and data fields
- [ ] `GET /api/v1/history?limit=20&offset=0` returns paginated results (newest first)
- [ ] `GET /api/v1/history/{id}` returns full detail with parsed JSON fields
- [ ] `DELETE /api/v1/history/{id}` removes one record
- [ ] `DELETE /api/v1/history` clears all records
- [ ] `GET /api/v1/history/stats` returns aggregate statistics
- [ ] History recording is fire-and-forget — never blocks query response
- [ ] History page accessible via NavBar link
- [ ] Timing bars accurately represent stage proportions
- [ ] Expanded detail shows answer rendered as markdown with citation links
- [ ] Sources show clickable links to PDF viewer
- [ ] All states: loading, empty, error, success
- [ ] Profile used is shown for each query
- [ ] All backend + frontend tests pass

---

## Sub-Phase Breakdown

| Sub-Phase | Feature | Difficulty | Backend | Frontend | Depends On |
|-----------|---------|-----------|---------|----------|------------|
| 3.1 | SQLite Infrastructure | ⭐⭐ Medium | sqlite_db.py (dual-DB factories), config, table creation, seed data | None | — |
| 3.2 | Prompt Backend | ⭐⭐⭐ Hard | prompt_service.py, prompts router, models, template formatting | None | 3.1 |
| 3.3 | Prompt Frontend Page | ⭐⭐ Medium | None | SystemPromptsPage, ProfileList, PromptEditor, placeholder docs | 3.2 |
| 3.4 | Service Refactoring (Template Injection) | ⭐⭐⭐ Hard | query_decomposer, relevance_filter, rag.py, query.py | None | 3.2 |
| 3.5 | History Backend | ⭐⭐⭐ Hard | history_service.py, history router, models, query.py timing capture | None | 3.1, 3.4 |
| 3.6 | History Frontend Page | ⭐⭐ Medium | None | HistoryPage, HistoryList, HistoryDetail, timing bars | 3.5 |

### Dependency Graph

```
3.1 (SQLite Infra)
 │
 ├──► 3.2 (Prompt Backend)
 │       │
 │       ├──► 3.3 (Prompt Frontend)     ← parallel with 3.4
 │       │
 │       └──► 3.4 (Service Refactoring)
 │               │
 │               └──► 3.5 (History Backend)
 │                       │
 │                       └──► 3.6 (History Frontend)
```

- **3.1** is the foundation
- **3.2** blocks 3.3 and 3.4 (both need the prompt service)
- **3.3 and 3.4 run in PARALLEL** after 3.2
- **3.5** needs 3.1 (history DB) AND 3.4 (refactored pipeline for timing capture)
- **3.6** needs 3.5 (history API)

---

## Sub-Phase 3.1: SQLite Infrastructure ⭐⭐ Medium

### Objective
Introduce SQLite with two separate databases: `prompts.db` for prompt templates and `history.db` for query history. Create connection factories, table schemas, and default seed data.

### Database Technology

**Decision**: `sqlite3` from the standard library, so no new dependencies. The workload is light (a few template reads and one history write per query), which is adequate for a single-user desktop app.

### Changes Required

| File | Change |
|------|--------|
| `backend/app/core/sqlite_db.py` | **NEW** — `get_prompts_db()` and `get_history_db()` connection factories; `init_prompts_db()`, `init_history_db()` table creation; `seed_default_profiles()` |
| `backend/app/core/config.py` | Add `prompts_db_path: str = "./data/prompts.db"` and `history_db_path: str = "./data/history.db"` |
| `backend/app/main.py` | Startup event: create `data/` dir, init both DBs, seed default profiles |
| `backend/.env.example` | Add `PROMPTS_DB_PATH` and `HISTORY_DB_PATH` |
| `backend/.gitignore` | Add `data/` directory |

**`sqlite_db.py` design**:
```python
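import os
import sqlite3

from app.core.config import get_settings


def _get_db(db_path: str) -> sqlite3.Connection:
    """Shared connection factory (caller must close)."""
    # dirname is "" for a bare filename, and os.makedirs("") raises, so guard it.
    db_dir = os.path.dirname(db_path)
    if db_dir:
        os.makedirs(db_dir, exist_ok=True)
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    conn.execute("PRAGMA journal_mode=WAL")  # stored in the DB file once set
    conn.execute("PRAGMA foreign_keys=ON")  # per-connection, so set on every connect
    return conn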

def get_prompts_db() -> sqlite3.Connection:
    return _get_db(get_settings().prompts_db_path)

def get_history_db() -> sqlite3.Connection:
    return _get_db(get_settings().history_db_path)
```
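
Because the factory leaves closing to the caller, a call site might wrap the connection in `contextlib.closing`. The sketch below is illustrative only: `open_db` is a simplified stand-in for `_get_db()`, the path is a temporary one, and the `profiles` table with its columns is a hypothetical placeholder for whatever schema `init_prompts_db()` actually creates.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def open_db(db_path: str) -> sqlite3.Connection:
    """Simplified stand-in for _get_db(); the caller must close the result."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA foreign_keys=ON")
    return conn


def count_profiles(db_path: str) -> int:
    # closing() guarantees conn.close(); the inner `with conn` commits on
    # success and rolls back on an exception, but does not close.
    with closing(open_db(db_path)) as conn:
        with conn:
            # Hypothetical table, created here only so the sketch runs.
            conn.execute(
                "CREATE TABLE IF NOT EXISTS profiles ("
                "name TEXT PRIMARY KEY, is_active INTEGER NOT NULL DEFAULT 0)"
            )
            # INSERT OR IGNORE keeps seeding idempotent across restarts.
            conn.executemany(
                "INSERT OR IGNORE INTO profiles (name, is_active) VALUES (?, ?)",
                [("A", 1), ("B", 0), ("C", 0)],
            )
        row = conn.execute("SELECT COUNT(*) AS n FROM profiles").fetchone()
        return row["n"]


demo_path = os.path.join(tempfile.mkdtemp(), "prompts.db")
profile_count = count_profiles(demo_path)
```

Calling `count_profiles` a second time on the same file leaves the row count unchanged, which is the property the startup seeding step needs.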

### Acceptance Criteria
- [ ] `backend/data/prompts.db` created on first startup with profile + prompt tables
- [ ] `backend/data/history.db` created on first startup with query_history table + index
- [ ] 3 profiles (A/B/C) seeded with current hardcoded prompts as default templates
- [ ] Profile A active by default
- [ ] `data/` directory gitignored
- [ ] Both DB paths configurable via `.env`
- [ ] Existing `pytest` tests still pass

---

## Sub-Phase 3.2: Prompt Backend ⭐⭐⭐ Hard

### Objective
Create the prompt service layer: Pydantic models, CRUD service, template formatting, REST API endpoints.

### Changes Required

| File | Change |
|------|--------|
| `backend/app/models/prompts.py` | **NEW** — `PromptProfile`, `PromptSetResponse` (3 prompts), `PromptUpdateRequest` |
| `backend/app/services/prompt_service.py` | **NEW** — `PromptService`: get_profile, list_profiles, activate, get_template, update_prompt, update_all, format_prompt |
| `backend/app/routers/prompts.py` | **NEW** — 5 endpoints on `/api/v1/prompts` |
| `backend/app/core/dependencies.py` | Add `get_prompt_service()` |
| `backend/app/main.py` | Register `prompts` router |

**`PromptService` key methods**:
```python
class PromptService:
    def get_active_profile_name(self) -> str:
        """Return "A", "B", or "C" — which profile is active."""

    def get_prompt_template(self, step: str) -> str:
        """Get the template for the active profile + given step ("decompose"/"filter"/"generate")."""

    def list_profiles(self) -> list[dict]:
        """Return [{"name": "A", "is_active": True}, ...]."""

    def activate_profile(self, name: str) -> None:
        """Set is_active=1 for name, is_active=0 for others. Validates name in {A, B, C}."""

    def get_profile_prompts(self, name: str) -> dict:
        """Return {"decompose": "...", "filter": "...", "generate": "..."}."""

    def update_prompt(self, name: str, step: str, template: str) -> None:
        """Update single template. Validates step in {decompose, filter, generate}."""

    def update_all_prompts(self, name: str, prompts: dict) -> None:
        """Batch update all 3 templates."""

    def reset_to_defaults(self, name: str, step: str | None = None) -> None:
        """Restore seed template. If step is None, reset all 3 steps. Otherwise reset only that step."""
```
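
As an illustration of the "only one active profile at a time" rule, here is a minimal sketch of how `activate_profile` could be written. It is not the actual implementation: the standalone functions, the `VALID_PROFILES` constant, and the `profiles(name, is_active)` table are assumptions made for the example, and an in-memory database stands in for `prompts.db`.

```python
import sqlite3

VALID_PROFILES = {"A", "B", "C"}


def activate_profile(conn: sqlite3.Connection, name: str) -> None:
    """Make `name` the only active profile (hypothetical `profiles` table)."""
    if name not in VALID_PROFILES:
        raise ValueError(f"Unknown profile: {name!r}")
    # A single UPDATE flips every row at once, so there is never a moment
    # with zero or two active profiles. `with conn` commits or rolls back.
    with conn:
        conn.execute(
            "UPDATE profiles SET is_active = CASE WHEN name = ? THEN 1 ELSE 0 END",
            (name,),
        )


def get_active_profile_name(conn: sqlite3.Connection) -> str:
    row = conn.execute("SELECT name FROM profiles WHERE is_active = 1").fetchone()
    return row[0]


# Demo setup with an in-memory database and the default seed (A active).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profiles (name TEXT PRIMARY KEY, is_active INTEGER NOT NULL)")
conn.executemany("INSERT INTO profiles VALUES (?, ?)", [("A", 1), ("B", 0), ("C", 0)])
conn.commit()

activate_profile(conn, "B")
active_name = get_active_profile_name(conn)
```

A router built on top of this could translate the `ValueError` into the 400 response listed in the acceptance criteria.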

### Acceptance Criteria
- [ ] `GET /api/v1/prompts/profiles` returns A/B/C with active status
- [ ] `PUT /api/v1/prompts/profiles/B/activate` switches active profile (only one at a time)
- [ ] `PUT /api/v1/prompts/profiles/A/decompose` updates template and persists across restarts
- [ ] `PUT /api/v1/prompts/profiles/A/all` batch-updates all 3 templates
- [ ] Invalid profile name (e.g., "D") returns 400
- [ ] Invalid step name (e.g., "summarize") returns 400
- [ ] Active profile is fetched fresh per query (no caching)
- [ ] All tests pass: `test_phase3_prompt_service.py`, `test_phase3_prompts_router.py`

---

## Sub-Phase 3.3: Prompt Frontend Page ⭐⭐ Medium

### Objective
Build the System Prompts page at `/system-prompts` with profile switching and full template editing.

### Changes Required

| File | Change |
|------|--------|
| `frontend/src/pages/SystemPromptsPage.tsx` | **NEW** |
| `frontend/src/components/ProfileList.tsx` | **NEW** — 3 cards (A/B/C) |
| `frontend/src/components/PromptEditor.tsx` | **NEW** — 3 textareas + placeholder docs + save/reset/cancel |
| `frontend/src/components/PlaceholderDocs.tsx` | **NEW** — info box listing available placeholders |
| `frontend/src/lib/api.ts` | Add 5 prompt API functions |
| `frontend/src/lib/queries.tsx` | Add TanStack Query hooks |
| `frontend/src/types/index.ts` | Add prompt-related types |
| `frontend/src/App.tsx` | Add `/system-prompts` route |
| `frontend/src/components/NavBar.tsx` | Add "System Prompts" nav link |

### Acceptance Criteria
- [ ] Page accessible via NavBar
- [ ] 3 profiles shown: A (active ●), B (○), C (○)
- [ ] "Set Active" switches active profile
- [ ] Editing a profile shows 3 labeled textareas with current templates
- [ ] Each textarea labeled with available placeholders
- [ ] Placeholder docs info box visible
- [ ] "Save Changes" persists; "Reset to Defaults" restores seed template; "Cancel" reverts
- [ ] Soft warning if template references unknown placeholder
- [ ] All states: loading, error, success
- [ ] Frontend tests pass

---

## Sub-Phase 3.4: Service Refactoring (Template Injection) ⭐⭐⭐ Hard

### Objective
Refactor all 3 LLM-calling services to fetch prompt templates from the DB instead of using hardcoded strings. Wire the query router to pass `PromptService` through the pipeline.

### Changes Required

| File | Change |
|------|--------|
| `backend/app/services/query_decomposer.py` | Accept `PromptService`; `decompose()` fetches template, replaces `{question}`, calls LLM |
| `backend/app/services/relevance_filter.py` | Accept `PromptService`; `filter()` fetches template, replaces `{question}` and `{chunks}`, calls LLM |
| `backend/app/services/rag.py` | Accept `PromptService`; `generate_response()` fetches template, replaces `{question}` and `{context}`, calls LLM |
| `backend/app/routers/query.py` | Instantiate `PromptService` at pipeline start; pass to all services; capture `active_profile_name` |
| `backend/app/test/conftest.py` | Add `mock_prompt_service` fixture |
| `backend/app/test/test_phase1_query_decomposer.py` | Update tests for PromptService dependency |
| `backend/app/test/test_phase1_relevance_filter.py` | Update tests |
| `backend/app/test/test_phase1_rag_service.py` | Update tests |

**Before/After per service**:

| Service | Before (hardcoded) | After (template from DB) |
|---------|-------------------|-------------------------|
| `QueryDecomposer.decompose()` | `f"Given this question: '{question}'\n\nBreak it down..."` | `template.replace("{question}", question)` |
| `RelevanceFilter._build_prompt()` | `f"Given question '{question}'...{chunks_formatted}"` | `template.replace("{question}", question).replace("{chunks}", chunks_formatted)` |
| `RAGService.generate_response()` | `f"Question: {question}\n\nAnswer...{context}\n\nAnswer:"` | `template.replace("{question}", question).replace("{context}", context)` |

**`LLMClient.complete()` — NO CHANGES.** Templates remain single user-role messages.
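
The substitution in the table can be sketched as one small helper. This is an illustration under assumptions, not the service code: `render_template`, `unknown_placeholders`, and `ALLOWED_PLACEHOLDERS` are hypothetical names, and the allowed sets are copied from the pipeline table in "Current State". The second helper shows one way the soft warning for unknown placeholders could be computed.

```python
import re

# Placeholders each step supplies, per the pipeline table in this plan.
ALLOWED_PLACEHOLDERS = {
    "decompose": {"question"},
    "filter": {"question", "chunks"},
    "generate": {"question", "context"},
}

# Matches {identifier} only, so literal JSON braces are not reported.
_PLACEHOLDER_RE = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def render_template(template: str, values: dict[str, str]) -> str:
    """Replace each {name} token with its value using plain str.replace()."""
    rendered = template
    for name, value in values.items():
        rendered = rendered.replace("{" + name + "}", value)
    return rendered


def unknown_placeholders(template: str, step: str) -> set[str]:
    """Return {name} tokens that the given step does not supply."""
    found = set(_PLACEHOLDER_RE.findall(template))
    return found - ALLOWED_PLACEHOLDERS[step]


# Demo: a filter template with literal JSON braces and a mis-cased placeholder.
demo_template = 'Question: {question}\nReturn JSON such as {"scores": [7, 2]}.\n{Chunks}'
rendered = render_template(demo_template, {"question": "What is X?", "chunks": "1. foo"})
unknown = unknown_placeholders(demo_template, "filter")
```

The literal JSON braces pass through untouched and `{Chunks}` is left unresolved, so it shows up in `unknown`. Calling `demo_template.format(question=...)` on the same string raises instead, because `str.format()` tries to interpret the JSON braces as a replacement field.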

### Acceptance Criteria
- [ ] All 3 LLM calls use templates from the active profile in the DB
- [ ] Placeholders correctly replaced: `{question}` → user input, `{chunks}` → numbered list, `{context}` → formatted chunks with citations
- [ ] Switching active profile changes prompts for NEXT query
- [ ] If template is empty string, LLM call proceeds with empty prompt (LLM error is acceptable)
- [ ] All existing tests pass (updated for PromptService dependency)
- [ ] New tests: `test_phase3_prompt_injection.py`

---

## Sub-Phase 3.5: History Backend ⭐⭐⭐ Hard

### Objective
Capture timing and data from every pipeline stage and persist to `history.db`. Expose REST API for browsing.

### Changes Required

| File | Change |
|------|--------|
| `backend/app/models/history.py` | **NEW** — `QueryHistoryRecord`, `QueryHistorySummary`, `QueryHistoryDetail`, `QueryHistoryList` |
| `backend/app/services/history_service.py` | **NEW** — `HistoryService`: record, list (paginated), get, delete, clear_all, get_stats |
| `backend/app/routers/history.py` | **NEW** — 5 endpoints on `/api/v1/history` |
| `backend/app/routers/query.py` | Add `time.perf_counter()` around each stage; `asyncio.create_task(history_service.record(...))` at end |
| `backend/app/core/dependencies.py` | Add `get_history_service()` |
| `backend/app/main.py` | Register `history` router |

**Timing stages captured**: decompose, retrieve, filter, generate, total.

### Acceptance Criteria
- [ ] Every query creates a history record with all fields
- [ ] All 5 history API endpoints work correctly
- [ ] Pagination: `limit` + `offset`, newest first
- [ ] Stats endpoint: total queries, avg times, avg chunks, most used profile
- [ ] History recording is fire-and-forget (never blocks query)
- [ ] History persists across restarts
- [ ] All tests pass: `test_phase3_history_service.py`, `test_phase3_history_router.py`, `test_phase3_query_history_integration.py`
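
The pagination criterion (`limit` + `offset`, newest first) could be met with a query along these lines. This is a sketch under assumptions: the column names (`id`, `input_text`, `created_at`), the index name, the response shape, and the maximum page size of 100 are all illustrative; only the `query_history` table name comes from this plan.

```python
import sqlite3


def list_history(conn: sqlite3.Connection, limit: int = 20, offset: int = 0) -> dict:
    """Return one page of history rows, newest first, plus the total count."""
    limit = max(1, min(limit, 100))  # clamp to an assumed maximum page size
    offset = max(0, offset)
    rows = conn.execute(
        "SELECT id, input_text, created_at FROM query_history "
        "ORDER BY created_at DESC, id DESC LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()
    total = conn.execute("SELECT COUNT(*) FROM query_history").fetchone()[0]
    return {
        "items": [dict(row) for row in rows],
        "total": total,
        "limit": limit,
        "offset": offset,
    }


# Demo data in an in-memory database (hypothetical columns).
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    "CREATE TABLE query_history ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "input_text TEXT NOT NULL, created_at TEXT NOT NULL)"
)
conn.execute("CREATE INDEX idx_history_created_at ON query_history (created_at DESC)")
conn.executemany(
    "INSERT INTO query_history (input_text, created_at) VALUES (?, ?)",
    [(f"query {i}", f"2026-04-25T10:00:0{i}") for i in range(5)],
)
conn.commit()

first_page = list_history(conn, limit=2, offset=0)
last_page = list_history(conn, limit=2, offset=4)
```

The secondary `id DESC` sort key keeps the order stable when two records share a timestamp, so "Load More" does not repeat or skip rows for that reason.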

---

## Sub-Phase 3.6: History Frontend Page ⭐⭐ Medium

### Objective
Build the History page at `/history` with scrollable list, expandable detail, timing bars, and stats.

### Changes Required

| File | Change |
|------|--------|
| `frontend/src/pages/HistoryPage.tsx` | **NEW** |
| `frontend/src/components/HistoryList.tsx` | **NEW** |
| `frontend/src/components/HistoryCard.tsx` | **NEW** — collapsed card + expandable detail |
| `frontend/src/components/TimingBar.tsx` | **NEW** — CSS-width proportional bars |
| `frontend/src/lib/api.ts` | Add 5 history API functions |
| `frontend/src/lib/queries.tsx` | Add TanStack Query hooks |
| `frontend/src/types/index.ts` | Add history types |
| `frontend/src/App.tsx` | Add `/history` route |
| `frontend/src/components/NavBar.tsx` | Add "History" nav link |

### Acceptance Criteria
- [ ] Page accessible via NavBar
- [ ] Stats bar: total, avg time, avg chunks, most used profile
- [ ] History list: paginated, newest first, shows date/time/duration/input preview/profile badge
- [ ] Expand card: timing bars, extracted questions, full answer (markdown), sources (clickable)
- [ ] "Load More" pagination
- [ ] "Clear All" with confirmation
- [ ] Individual delete with confirmation
- [ ] All states: loading skeleton, empty "No queries yet", error with retry
- [ ] Frontend tests pass

---

## New Dependencies

**Zero.** `sqlite3` is Python stdlib. All UI is custom Tailwind. No new npm or pip packages.

---

## Directory Structure After Package 3

```
legco_reranker/
├── backend/
│   ├── app/
│   │   ├── core/
│   │   │   ├── config.py              # + prompts_db_path, history_db_path
│   │   │   ├── database.py            # (unchanged - ChromaDB)
│   │   │   ├── dependencies.py        # + get_prompt_service, get_history_service
│   │   │   └── sqlite_db.py           # NEW - dual-DB connection factories
│   │   ├── models/
│   │   │   ├── history.py             # NEW
│   │   │   └── prompts.py             # NEW
│   │   ├── routers/
│   │   │   ├── history.py             # NEW
│   │   │   ├── prompts.py             # NEW
│   │   │   └── query.py               # MODIFIED - timing capture + template injection
│   │   ├── services/
│   │   │   ├── history_service.py     # NEW
│   │   │   ├── prompt_service.py      # NEW - template storage + formatting
│   │   │   ├── query_decomposer.py    # MODIFIED - use PromptService for templates
│   │   │   ├── rag.py                 # MODIFIED - use PromptService for templates
│   │   │   └── relevance_filter.py    # MODIFIED - use PromptService for templates
│   │   ├── test/
│   │   │   ├── test_phase3_prompt_service.py       # NEW
│   │   │   ├── test_phase3_prompts_router.py       # NEW
│   │   │   ├── test_phase3_prompt_injection.py     # NEW
│   │   │   ├── test_phase3_history_service.py      # NEW
│   │   │   ├── test_phase3_history_router.py       # NEW
│   │   │   ├── test_phase3_query_history_integration.py  # NEW
│   │   │   ├── test_phase1_query_decomposer.py      # MODIFIED
│   │   │   ├── test_phase1_relevance_filter.py      # MODIFIED
│   │   │   └── test_phase1_rag_service.py           # MODIFIED
│   │   └── main.py                    # MODIFIED - startup init + new routers
│   ├── data/                          # NEW (gitignored)
│   │   ├── prompts.db
│   │   └── history.db
│   └── .env.example                   # + PROMPTS_DB_PATH, HISTORY_DB_PATH
├── frontend/src/
│   ├── components/
│   │   ├── HistoryCard.tsx            # NEW
│   │   ├── HistoryList.tsx            # NEW
│   │   ├── NavBar.tsx                 # MODIFIED - +2 nav links
│   │   ├── PlaceholderDocs.tsx        # NEW
│   │   ├── ProfileList.tsx            # NEW
│   │   ├── PromptEditor.tsx           # NEW
│   │   └── TimingBar.tsx              # NEW
│   ├── pages/
│   │   ├── HistoryPage.tsx            # NEW
│   │   └── SystemPromptsPage.tsx      # NEW
│   ├── lib/
│   │   ├── api.ts                     # MODIFIED - +history +prompts endpoints
│   │   └── queries.tsx               # MODIFIED - +history +prompts hooks
│   ├── types/index.ts                 # MODIFIED - +history +prompts types
│   └── App.tsx                        # MODIFIED - +2 routes
└── .gitignore                         # + data/
```

---

## Test Plan

### Backend Tests (New)

| File | Coverage | Sub-Phase |
|------|----------|-----------|
| `test_phase3_prompt_service.py` | Prompt CRUD, activation, template formatting, edge cases | 3.2 |
| `test_phase3_prompts_router.py` | All 5 HTTP endpoints, error codes, validation | 3.2 |
| `test_phase3_prompt_injection.py` | Templates fetched from DB, placeholders replaced, end-to-end query uses templates | 3.4 |
| `test_phase3_history_service.py` | History CRUD, pagination, stats, edge cases | 3.5 |
| `test_phase3_history_router.py` | All 5 HTTP endpoints, pagination bounds, empty DB | 3.5 |
| `test_phase3_query_history_integration.py` | Full SSE query → history record created with correct data | 3.5 |

### Backend Tests (Modified)

| File | Change | Sub-Phase |
|------|--------|-----------|
| `test_phase1_query_decomposer.py` | Add PromptService dependency to test setup | 3.4 |
| `test_phase1_relevance_filter.py` | Add PromptService dependency | 3.4 |
| `test_phase1_rag_service.py` | Add PromptService dependency | 3.4 |
| `conftest.py` | Add `mock_prompt_service` fixture | 3.4 |

### Frontend Tests (New)

| File | Coverage | Sub-Phase |
|------|----------|-----------|
| `SystemPromptsPage.test.tsx` | Page render, profile list, activation, edit flows | 3.3 |
| `ProfileList.test.tsx` | A/B/C cards, active indicator, edit button | 3.3 |
| `PromptEditor.test.tsx` | 3 textareas, placeholder docs, save/reset/cancel | 3.3 |
| `HistoryPage.test.tsx` | Page render, stats, pagination, clear all | 3.6 |
| `HistoryCard.test.tsx` | Collapsed/expanded states, timing bars, answer, sources | 3.6 |
| `TimingBar.test.tsx` | Proportional widths, zero-time stages, color mapping | 3.6 |

### Acceptance Tests

| File | Coverage | Sub-Phase |
|------|----------|-----------|
| `test_acceptance_package3_prompts.py` | Select profile → edit templates → activate → query uses new templates | 3.2-3.4 |
| `test_acceptance_package3_history.py` | Multiple queries → history shows correct records with timing + profile | 3.5 |

---

## Risks & Mitigations

| Risk | Impact | Mitigation |
|------|--------|------------|
| User removes `{question}` placeholder → LLM doesn't see the question | LLM returns irrelevant or empty response | UI shows soft warning; user's choice — they can always reset to defaults |
| `str.replace()` is case-sensitive → `{Question}` not recognized | Placeholder left as-is in prompt | UI documents exact placeholder names; preview mode could highlight unresolved placeholders |
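| `sqlite3` sync calls block async event loop | Slow responses under load | Operations are trivial (single-row lookups), so each call blocks only briefly. History recording is fire-and-forget, which takes it off the response path but not off the event loop. WAL mode lets reads proceed while a write is in progress. |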
| History DB grows unbounded | Disk usage | Manual cleanup via "Clear All" button. Future: auto-prune config. |
| `data/` directory not created on startup | SQLite connection fails | `os.makedirs(dirname, exist_ok=True)` in connection factory |
| User expects `{chunks}` or `{context}` to work in every step | Placeholder is left unresolved when used in a step that does not supply it | Placeholder docs on page show exactly which placeholders are valid per step |
| Two separate DB files complicate backups | User might backup one but not the other | Use same `data/` directory — easy to back up as one folder |
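
If the blocking risk in the table ever becomes measurable, one option is to run the SQLite call in a worker thread. The sketch below is a possible mitigation and not part of the planned design: `_insert_record` and `record_async` are hypothetical names, the two-column `query_history` schema is a simplified stand-in, and `asyncio.to_thread` requires Python 3.9+.

```python
import asyncio
import os
import sqlite3
import tempfile


def _insert_record(db_path: str, input_text: str) -> int:
    """Blocking insert; opens and closes its own connection in the worker thread."""
    # Creating the connection inside the thread avoids sqlite3's default
    # restriction on sharing one connection across threads.
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commit on success, rollback on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS query_history "
                "(id INTEGER PRIMARY KEY, input_text TEXT NOT NULL)"
            )
            cursor = conn.execute(
                "INSERT INTO query_history (input_text) VALUES (?)", (input_text,)
            )
        return cursor.lastrowid
    finally:
        conn.close()


async def record_async(db_path: str, input_text: str) -> int:
    # The event loop stays free while the worker thread does the disk I/O.
    return await asyncio.to_thread(_insert_record, db_path, input_text)


demo_db = os.path.join(tempfile.mkdtemp(), "history.db")
first_id = asyncio.run(record_async(demo_db, "demo question"))
second_id = asyncio.run(record_async(demo_db, "another question"))
```

For the single-row lookups described in this plan the extra thread hop is probably unnecessary; it matters mainly if history writes grow large or the disk is slow.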

---

## Decisions

| # | Question | Decision |
|---|----------|----------|
| 1 | Template editing scope | **Full prompt template** with `{placeholder}` variables — users edit the entire message sent to LLM |
| 2 | System role vs user role | **User role only** — no system prompt concept. Templates are the full user message (same as current). |
| 3 | Number of profiles | **Fixed 3** (A, B, C) — no create/delete. Simplest mental model. |
| 4 | Database separation | **Two files**: `prompts.db` and `history.db` — independent concerns |
| 5 | Database technology | **sqlite3 stdlib** — zero new dependencies |
| 6 | Placeholder syntax | **`{variable_name}`** with `str.replace()` — simple, predictable. No `str.format()` edge cases. |
| 7 | History recording reliability | **Fire-and-forget** (`asyncio.create_task`) — never blocks query response |
| 8 | History data retention | **Manual cleanup only** in Package 3 |
| 9 | Timing capture location | **Inline in query.py** — centralized, one file changes |
| 10 | Frontend timing visualization | **CSS width bars** — no charting library |
| 11 | History pagination | **Offset-based** (`limit` + `offset`) |
| 12 | NavBar order | **LTT · RAG Database · System Prompts · History** |
| 13 | Default seed templates | **All 3 profiles start identical** (current hardcoded prompts) — users customize from a common baseline |
| 14 | Reset button granularity | **Both** — per-step reset icon (↺) on each textarea label, plus "Reset All to Defaults" button in the action bar |

---

## Pre-Implementation Checklist

Before starting implementation, verify:
- [ ] All existing backend tests pass (`cd backend && pytest app/test/ -v`)
- [ ] All existing frontend tests pass (`cd frontend && npm test`)
- [ ] AGENTS.md updated to reflect current project state (no longer "Greenfield")
- [ ] Plan reviewed and approved by user