Package 3 Enhancement Plan

Source: User request (2026-04-25)
Scope: System Prompt Configuration Page + Query History Page
Status: ✅ Package 3 Complete (3.1 ✅, 3.2 ✅, 3.3 ✅, 3.4 ✅, 3.5 ✅, 3.6 ✅)

Objective

Add two new features that give users visibility and control over the RAG pipeline:

System Prompt Configuration Page — Users can view/edit the full prompt templates for all 3 LLM calls (Decomposer, Relevance Filter, Response Generator). Templates support placeholders ({question}, {chunks}, {context}) that are replaced at query time. Supports 3 profiles (A, B, C) that users switch between with a single click.
Query History Page — Records every query with full detail: input text, extracted questions, timing per pipeline stage (decompose, retrieve, filter, generate), actual LLM prompts sent for all 3 calls, chunks retrieved/filtered as full XML-tagged data (filename, page, content, relevance scores), final answer, sources, total time, and which profile was used.

Current State

What Exists

LLM Pipeline (3 calls, prompt templates hardcoded in service files):

Call	Service	File:Line	Current Prompt Template	Temp	Placeholders
1	`QueryDecomposer`	`services/query_decomposer.py:54-59`	`"Given this question: '{question}'\n\nBreak it down into 2-5 simplified sub-questions..."`	default (0.7)	`{question}`
2	`RelevanceFilter`	`services/relevance_filter.py:36-39`	`"Given question '{question}' and these document chunks, rate each 0-10 for relevance. Return JSON array of scores.\n{chunks_string}"`	0.0	`{question}`, `{chunks}` (today the f-string variable `chunks_string`)
3	`RAGService`	`services/rag.py:108-117`	`"Question: {question}\n\nAnswer the question using ONLY these document chunks...bullet points...cite sources...\n\nDocument chunks:\n{context}\n\nAnswer:"`	0.3	`{question}`, `{context}`

LLMClient.complete(prompt, temperature, step_name) — single method, sends prompt as [{"role": "user", "content": prompt}]
All 3 prompts are f-strings built inline in the service methods — no template abstraction exists
The step_name parameter is only used for log labels

Data Storage:

No SQL database exists. ChromaDB is the only persistent store (vector database).
Config is .env-driven via pydantic-settings.BaseSettings (flat key-value, not user-editable at runtime).
Logging exists (RotatingFileHandler to backend/app/log/backend.log) — timing data is logged but never persisted.

Frontend:

3 pages: LTTPage (/), RAGDatabasePage (/rag-database), PdfViewerPage (/pdf-viewer)
NavBar has "LTT" and "RAG Database" links
No history page, no settings/configuration page
No shadcn/ui — all components are custom Tailwind

Query Pipeline (SSE streaming):

POST /api/v1/query
  → QueryDecomposer.decompose()     [LLM Call 1, timing logged only]
  → RAGService.retrieve()           [ChromaDB, no timing capture]
  → RelevanceFilter.filter()        [LLM Call 2, timing logged only]
  → RAGService.generate_response()  [LLM Call 3, timing logged only]
  → SSE: completed event with answer + sources

What's Missing (Gaps This Plan Fills)

No way for users to customize LLM prompts
No persistence of query history — all queries are ephemeral
No record of how long each pipeline stage takes
No record of the actual LLM prompts sent during each query
No record of the full chunk data (text, metadata, scores) used at each stage
No way to review past queries and answers
No user-facing configuration page of any kind
Hardcoded prompt templates can't be tuned without changing source code

Feature 1: System Prompt Configuration (Full Template Editing)

1.1 Overview

Users edit the complete prompt template for each of the 3 LLM calls. Templates contain placeholder variables (e.g., {question}, {chunks}, {context}) that are replaced with actual data at query time. Three profiles (A, B, C) let users save and switch between different prompt sets.

Design Decision: Unlike the original plan (system role prefix + hardcoded user template), users edit the ENTIRE prompt. This gives full control over LLM instructions, output format, and behavior. The page documents exactly which placeholders are available for each step so users know what they can use.

1.2 Database Schema

Database: backend/data/prompts.db (SQLite, stdlib sqlite3)

CREATE TABLE IF NOT EXISTS system_prompt_profiles (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL UNIQUE,           -- "A" | "B" | "C"
    is_active INTEGER DEFAULT 0,         -- only ONE row has is_active = 1
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE TABLE IF NOT EXISTS system_prompts (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    profile_id INTEGER NOT NULL,
    step_name TEXT NOT NULL,             -- "decompose" | "filter" | "generate"
    prompt_template TEXT NOT NULL,       -- full prompt with {placeholder} variables
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    updated_at TEXT NOT NULL DEFAULT (datetime('now')),
    FOREIGN KEY (profile_id) REFERENCES system_prompt_profiles(id) ON DELETE CASCADE,
    UNIQUE(profile_id, step_name)
);

Default seed data (3 profiles × 3 prompts = 9 rows, Profile A active by default):

All 3 profiles start with the same defaults (the current hardcoded prompts). Users customize from there.

Profile	Step	Placeholder	Seed Template
A	decompose	`{question}`	`"Given this question: '{question}'\n\nBreak it down into 2-5 simplified sub-questions that would help search for relevant information. Each sub-question should be short and focused on one aspect. Return as a JSON array of strings."`
A	filter	`{question}`, `{chunks}`	`"Given question '{question}' and these document chunks, rate each 0-10 for relevance.\nReturn JSON array of scores.\n{chunks}\n"`
A	generate	`{question}`, `{context}`	`"Question: {question}\n\nAnswer the question using ONLY these document chunks. Do not use any external knowledge. Format your answer as bullet points. Cite your sources inline using the exact bracket labels provided, e.g. [filename, page N]. Place the citation at the end of each relevant point.\n\nDocument chunks:\n{context}\n\nAnswer:"`
B	decompose	`{question}`	(same as A)
B	filter	`{question}`, `{chunks}`	(same as A)
B	generate	`{question}`, `{context}`	(same as A)
C	decompose	`{question}`	(same as A)
C	filter	`{question}`, `{chunks}`	(same as A)
C	generate	`{question}`, `{context}`	(same as A)
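
The schema and seed data above could be bootstrapped at startup roughly as follows. This is a minimal sketch using stdlib sqlite3, not the plan's actual implementation: the function name `init_prompts_db` and the abbreviated `SEED_TEMPLATES` dict are illustrative assumptions. Note that SQLite only honours the `ON DELETE CASCADE` clause when foreign keys are enabled on the connection.

```python
import sqlite3

# Abbreviated stand-ins for the three seed templates in the table above.
SEED_TEMPLATES = {
    "decompose": "Given this question: '{question}'\n\nBreak it down into 2-5 simplified sub-questions...",
    "filter": "Given question '{question}' and these document chunks, rate each 0-10 for relevance.\n"
              "Return JSON array of scores.\n{chunks}\n",
    "generate": "Question: {question}\n\n...\n\nDocument chunks:\n{context}\n\nAnswer:",
}

SCHEMA = """
CREATE TABLE IF NOT EXISTS system_prompt_profiles (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL UNIQUE,
    is_active INTEGER DEFAULT 0,
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE TABLE IF NOT EXISTS system_prompts (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    profile_id INTEGER NOT NULL,
    step_name TEXT NOT NULL,
    prompt_template TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    updated_at TEXT NOT NULL DEFAULT (datetime('now')),
    FOREIGN KEY (profile_id) REFERENCES system_prompt_profiles(id) ON DELETE CASCADE,
    UNIQUE(profile_id, step_name)
);
"""


def init_prompts_db(conn: sqlite3.Connection) -> None:
    """Create the tables and seed profiles A/B/C; safe to call on every startup."""
    # SQLite ignores FOREIGN KEY actions unless this pragma is set per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    with conn:  # one transaction: commit on success, roll back on error
        for name in ("A", "B", "C"):
            # INSERT OR IGNORE leaves existing (possibly user-edited) rows untouched.
            conn.execute(
                "INSERT OR IGNORE INTO system_prompt_profiles (name, is_active) VALUES (?, ?)",
                (name, 1 if name == "A" else 0),
            )
            (profile_id,) = conn.execute(
                "SELECT id FROM system_prompt_profiles WHERE name = ?", (name,)
            ).fetchone()
            for step, template in SEED_TEMPLATES.items():
                conn.execute(
                    "INSERT OR IGNORE INTO system_prompts "
                    "(profile_id, step_name, prompt_template) VALUES (?, ?, ?)",
                    (profile_id, step, template),
                )
```

Because every insert is `INSERT OR IGNORE`, a restart neither duplicates rows nor overwrites templates the user has already edited.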

1.3 Available Placeholders (per step)

These are documented on the frontend edit page so users know exactly what they can insert:

Step	Placeholder	What It Contains	Example Replacement
Decompose	`{question}`	The user's original input text	`"What is the NEC4 clause about time extensions?"`
Filter	`{question}`	The user's original input text	(same)
	`{chunks}`	Numbered list of all retrieved chunks: `Chunk 1: <text>\nChunk 2: <text>...`	`"Chunk 1: The NEC4 clause 61.3 states that time extensions...\nChunk 2: Notice must be given..."`
Generate	`{question}`	The user's original input text	(same)
	`{context}`	Formatted chunks with citation labels: `[filename, page N] Source: ...\nSummary: ...\nContent: ...`	`"[NEC4 ACC.pdf, page 3] Source: NEC4 ACC.pdf\nSummary: Discussion of time extension provisions...\nContent: Clause 61.3 states..."`

Placeholder syntax: {variable_name} — must match exactly. Unknown placeholders are left as-is (not replaced). If a user removes a required placeholder (e.g., {question}), the LLM won't see the question — the UI warns but doesn't block.
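
The substitution and warning rules above can be sketched as follows. The helper names (`render_template`, `check_placeholders`) and the `ALLOWED_PLACEHOLDERS` mapping are hypothetical; the plan only specifies the behaviour. This variant substitutes in a single regex pass instead of chained `str.replace()` calls, so placeholder-like text inside a substituted value is not expanded a second time.

```python
import re

# Placeholders permitted per step, taken from the table above.
ALLOWED_PLACEHOLDERS = {
    "decompose": {"question"},
    "filter": {"question", "chunks"},
    "generate": {"question", "context"},
}

_PLACEHOLDER_RE = re.compile(r"\{(\w+)\}")


def render_template(template: str, values: dict[str, str]) -> str:
    """Replace known {placeholders} in one pass; unknown ones are left as-is."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        # Fall back to the original text (e.g. "{foo}") when the name is unknown.
        return values.get(name, match.group(0))

    return _PLACEHOLDER_RE.sub(substitute, template)


def check_placeholders(step: str, template: str) -> dict[str, set[str]]:
    """Report what the UI might warn about; the result never blocks saving."""
    used = set(_PLACEHOLDER_RE.findall(template))
    allowed = ALLOWED_PLACEHOLDERS[step]
    return {"unknown": used - allowed, "missing": allowed - used}


# A question that happens to contain "{chunks}" is passed through untouched.
rendered = render_template("Q: {question} {foo}", {"question": "What is {chunks}?"})
```

Here `rendered` is `"Q: What is {chunks}? {foo}"`: the known placeholder is filled in, the unknown `{foo}` survives, and the braces inside the question are not treated as a placeholder.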

1.4 Backend Architecture

New Files

File	Purpose
`backend/app/core/sqlite_db.py`	SQLite connection factory (shared by prompts + history)
`backend/app/services/prompt_service.py`	CRUD for prompt profiles and templates; template formatting
`backend/app/routers/prompts.py`	REST API endpoints for prompt management
`backend/app/models/prompts.py`	Pydantic schemas for prompt request/response

Modified Files

File	Change
`backend/app/core/config.py`	Add `prompts_db_path` and `history_db_path`
`backend/app/core/dependencies.py`	Add DI factory: `get_prompt_service()`
`backend/app/main.py`	Register `prompts` router; startup: create tables + seed 3 default profiles
`backend/app/services/query_decomposer.py`	`decompose()` fetches template from prompt service, formats with `{question}`, sends to LLM
`backend/app/services/relevance_filter.py`	`filter()` fetches template from prompt service, formats with `{question}` and `{chunks}`, sends to LLM
`backend/app/services/rag.py`	`generate_response()` fetches template from prompt service, formats with `{question}` and `{context}`, sends to LLM
`backend/app/routers/query.py`	Pass `PromptService` to pipeline; record active profile name for history

How Template Formatting Works

Each service method changes from building a hardcoded prompt to fetching and formatting a template:

Before (query_decomposer.py):

prompt = (
    f"Given this question: '{question}'\n\n"
    f"Break it down into 2-5 simplified sub-questions..."
)
response = await self.llm_client.complete(prompt, step_name="QueryDecomposer")

After (query_decomposer.py):

template = self.prompt_service.get_prompt_template(step="decompose")
prompt = template.replace("{question}", question)
response = await self.llm_client.complete(prompt, step_name="QueryDecomposer")

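PromptService.get_prompt_template() fetches the template for the currently active profile and the given step. The calling service then substitutes placeholders with Python's str.replace(), which is simple and predictable and avoids the str.format() failure mode where literal curly braces in a template (for example a JSON sample) raise KeyError or ValueError. One caveat: when replace() calls are chained, placeholder-like text inside an already substituted value (say, a question containing the literal text {chunks}) is expanded by a later call, so substitute user-supplied values last or perform the substitution in a single pass.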
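
A minimal sketch of what the lookup inside `PromptService` might look like, assuming the two tables from section 1.2 and a connection injected through the constructor. The constructor signature and the `LookupError` behaviour are assumptions; only the method names `get_prompt_template()` and `get_active_profile_name()` come from this plan.

```python
import sqlite3


class PromptService:
    """Hypothetical minimal PromptService; only the read path is shown."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def get_active_profile_name(self) -> str:
        row = self._conn.execute(
            "SELECT name FROM system_prompt_profiles WHERE is_active = 1"
        ).fetchone()
        if row is None:
            raise LookupError("no active prompt profile")
        return row[0]

    def get_prompt_template(self, step: str) -> str:
        """Fetch the template for the active profile and the given step."""
        row = self._conn.execute(
            """
            SELECT p.prompt_template
            FROM system_prompts AS p
            JOIN system_prompt_profiles AS pr ON pr.id = p.profile_id
            WHERE pr.is_active = 1 AND p.step_name = ?
            """,
            (step,),
        ).fetchone()
        if row is None:
            raise LookupError(f"no template for step {step!r} in the active profile")
        return row[0]
```

Since the query runs on every call and nothing is cached, an edited template or a newly activated profile takes effect on the next query.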

Note: LLMClient.complete() does NOT change — no system_prompt parameter is added. Templates remain single user-role messages, same as today. The only difference is the prompt text comes from the DB instead of being hardcoded.

API Endpoints (5 total — fixed 3 profiles, no create/delete)

Method	Path	Description
`GET`	`/api/v1/prompts/profiles`	List all 3 profiles with active status: `[{name: "A", is_active: true}, ...]`
`PUT`	`/api/v1/prompts/profiles/{name}/activate`	Activate a profile by name (e.g., `PUT /profiles/B/activate`). Validates name is A/B/C.
`GET`	`/api/v1/prompts/profiles/{name}`	Get all 3 prompt templates for a profile
`PUT`	`/api/v1/prompts/profiles/{name}/{step}`	Update a single prompt template. Validates step is decompose/filter/generate.
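`PUT`	`/api/v1/prompts/profiles/{name}/all`	Batch update all 3 prompt templates for a profile. Register this route before `/{name}/{step}`: with in-order route matching (as in FastAPI-style routers), `all` would otherwise be captured as a step name and rejected by step validation.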

Why fixed 3 profiles (no create/delete):

Simplest mental model: 3 slots, name them A/B/C
No duplicate name conflicts, no "delete last profile" edge case
"Reset to Defaults" restores the seed template for a profile
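
With a fixed set of profiles, activation reduces to flipping the `is_active` flag inside one transaction. The following is a sketch under the assumption that the service talks to SQLite directly; `activate_profile` and `VALID_PROFILES` are hypothetical names.

```python
import sqlite3

VALID_PROFILES = ("A", "B", "C")


def activate_profile(conn: sqlite3.Connection, name: str) -> None:
    """Make `name` the only active profile."""
    if name not in VALID_PROFILES:
        raise ValueError(f"profile must be one of {VALID_PROFILES}, got {name!r}")
    with conn:  # both UPDATEs commit together, or roll back together on error
        conn.execute("UPDATE system_prompt_profiles SET is_active = 0 WHERE is_active = 1")
        cursor = conn.execute(
            "UPDATE system_prompt_profiles "
            "SET is_active = 1, updated_at = datetime('now') WHERE name = ?",
            (name,),
        )
        if cursor.rowcount != 1:
            # Raising inside the `with` block rolls back the deactivation above.
            raise LookupError(f"profile {name!r} is not seeded")
```

Running both statements in one transaction means a concurrent reader never observes zero or two active profiles.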

1.5 Frontend Design

New page: /system-prompts
New NavBar link: "System Prompts"

┌──────────────────────────────────────────────────────────┐
│  System Prompts                                          │
├──────────────────────────────────────────────────────────┤
│                                                           │
│  Active Profile: [A ▼]  [Set Active]                     │
│                                                           │
│  ┌────────────────────────────────────────────────────┐  │
│  │ ● Profile A  (active)     [Edit]                   │  │
│  ├────────────────────────────────────────────────────┤  │
│  │ ○ Profile B               [Edit]                   │  │
│  ├────────────────────────────────────────────────────┤  │
│  │ ○ Profile C               [Edit]                   │  │
│  └────────────────────────────────────────────────────┘  │
│                                                           │
│  ── Editing Profile A ─────────────────────────────────  │
│                                                           │
│  Available placeholders:                                  │
│  ┌────────────────────────────────────────────────────┐  │
│  │  {question}  — The user's input question           │  │
│  │  {chunks}    — Retrieved document chunks (filter)  │  │
│  │  {context}   — Formatted chunks with citations     │  │
│  └────────────────────────────────────────────────────┘  │
│                                                           │
│  Step 1: Query Decomposition                             │
│  Placeholders: {question}                                │
│  ┌────────────────────────────────────────────────────┐  │
│  │ Given this question: '{question}'                  │  │
│  │                                                    │  │
│  │ Break it down into 2-5 simplified sub-questions    │  │
│  │ that would help search for relevant information.   │  │
│  │ Each sub-question should be short and focused on   │  │
│  │ one aspect. Return as a JSON array of strings.     │  │
│  └────────────────────────────────────────────────────┘  │
│  ───────────────────────────────────────────────────────  │
│  Step 2: Relevance Filtering                             │
│  Placeholders: {question}, {chunks}                     │
│  ┌────────────────────────────────────────────────────┐  │
│  │ Given question '{question}' and these document     │  │
│  │ chunks, rate each 0-10 for relevance.              │  │
│  │ Return JSON array of scores.                       │  │
│  │ {chunks}                                           │  │
│  └────────────────────────────────────────────────────┘  │
│  ───────────────────────────────────────────────────────  │
│  Step 3: Response Generation                             │
│  Placeholders: {question}, {context}                    │
│  ┌────────────────────────────────────────────────────┐  │
│  │ Question: {question}                               │  │
│  │                                                    │  │
│  │ Answer the question using ONLY these document      │  │
│  │ chunks. Do not use any external knowledge.         │  │
│  │ Format your answer as bullet points.               │  │
│  │ Cite your sources inline...                        │  │
│  │                                                    │  │
│  │ Document chunks:                                   │  │
│  │ {context}                                          │  │
│  │                                                    │  │
│  │ Answer:                                            │  │
│  └────────────────────────────────────────────────────┘  │
│                                                           │
│  [Save Changes]  [Reset All to Defaults]  [Cancel]      │
│                                                           │
└──────────────────────────────────────────────────────────┘

Component tree:

SystemPromptsPage
├── ProfileSelector (dropdown A/B/C + "Set Active" button)
├── ProfileList (3 cards, active indicator)
│   └── ProfileCard × 3 (name, active indicator, Edit button)
├── PlaceholderDocs (info box showing available placeholders per step)
└── PromptEditor (shown when editing a profile)
    ├── PromptTextArea × 3 (labeled with step name + available placeholders)
    │   └── Per-step reset icon (↺) next to each textarea label
    └── ActionBar (Save, Reset All to Defaults, Cancel)

Placeholder documentation in UI: The page shows an "Available Placeholders" info box listing all placeholder variables and what they expand to. Each textarea has a subtle label showing which placeholders are valid for that step (e.g., "Placeholders: {question}, {chunks}"). Unknown placeholders in the template are left as-is by the backend; the UI shows a soft warning if the template references one, but doesn't block saving.

API hooks (new in lib/queries.tsx):

usePromptProfiles()              // useQuery: GET /prompts/profiles
usePromptProfile(name)           // useQuery: GET /prompts/profiles/{name}
useActivateProfile(name)         // useMutation: PUT /prompts/profiles/{name}/activate
useUpdatePrompt(name, step)      // useMutation: PUT /prompts/profiles/{name}/{step}
useUpdateAllPrompts(name)        // useMutation: PUT /prompts/profiles/{name}/all

Edge cases handled:

Empty prompt template: allowed (LLM call proceeds with empty prompt — LLM will likely error or return nothing)
Removed {question} placeholder: soft warning shown; LLM won't see the question — user's choice
Unknown placeholder in template (e.g., {foo}): left as-is, UI shows warning badge
Very long templates: textarea with vertical scroll, character count
Unsaved changes: warn before navigating away
Loading state: skeleton cards
Error state: red error banner with retry

1.6 Acceptance Criteria

/system-prompts page accessible via NavBar link
3 profiles (A/B/C) shown with active indicator (● / ○)
"Set Active" switches which profile is used for queries
Editing a profile shows 3 labeled textareas pre-filled with current templates
Each textarea shows its available placeholders
"Save Changes" persists templates to DB
Per-step reset icon (↺) restores the seed template for that individual step
"Reset All to Defaults" restores all 3 templates for the profile at once
"Cancel" reverts unsaved edits
Changing a template affects the NEXT query (fetched fresh each time)
Placeholder docs visible on the page
pytest backend tests pass (new + existing)
npm test frontend tests pass (new + existing)

Feature 2: Query History

2.1 Overview

Every query submitted through the LTT page is recorded in a history database with detailed timing per pipeline stage. Users can browse past queries, see timing breakdowns, and review answers.

2.2 Database Schema

Database: backend/data/history.db (SQLite, separate from prompts.db)

CREATE TABLE IF NOT EXISTS query_history (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    input_text TEXT NOT NULL,               -- original user input
    extracted_questions TEXT DEFAULT NULL,   -- JSON array of sub-questions
    decompose_prompt TEXT DEFAULT NULL,      -- actual prompt sent to LLM Call 1
    decomposer_time_ms INTEGER DEFAULT 0,   -- LLM Call 1 duration
    retriever_time_ms INTEGER DEFAULT 0,    -- ChromaDB retrieval duration
    chunks_retrieved TEXT DEFAULT NULL,      -- XML-tagged full chunk data (filename, page, content)
    chunks_retrieved_count INTEGER DEFAULT 0, -- count of retrieved chunks (for list view)
    filter_prompt TEXT DEFAULT NULL,         -- actual prompt sent to LLM Call 2
    filter_time_ms INTEGER DEFAULT 0,       -- LLM Call 2 duration
    chunks_filtered TEXT DEFAULT NULL,       -- XML-tagged filtered chunks (filename, page, relevance, content)
    chunks_filtered_count INTEGER DEFAULT 0, -- count of filtered chunks (for list view)
    generate_prompt TEXT DEFAULT NULL,       -- actual prompt sent to LLM Call 3
    generator_time_ms INTEGER DEFAULT 0,    -- LLM Call 3 duration
    total_time_ms INTEGER DEFAULT 0,        -- input received → final response sent
    final_answer TEXT DEFAULT NULL,          -- full RAG answer text
    sources TEXT DEFAULT NULL,               -- JSON array of SourceMetadata
    profile_used TEXT DEFAULT NULL,          -- "A", "B", or "C"
    created_at TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE INDEX IF NOT EXISTS idx_query_history_created_at ON query_history(created_at DESC);

Chunk XML format — chunks_retrieved and chunks_filtered store full chunk data as XML-tagged strings:

chunks_retrieved example:

<chunk_1>
Filename: NEC4 ACC.pdf
Page: 3
Content: Clause 61.3 states that time extensions...
</chunk_1>
<chunk_2>
Filename: NEC4 Contract.pdf
Page: 12
Content: Notice must be given within 8 weeks...
</chunk_2>

chunks_filtered example (includes relevance score):

<chunk_1>
Filename: NEC4 ACC.pdf
Page: 3
Relevance: 8.5
Content: Clause 61.3 states that time extensions...
</chunk_1>
<chunk_2>
Filename: NEC4 Contract.pdf
Page: 12
Relevance: 9.0
Content: Notice must be given within 8 weeks...
</chunk_2>

Note: When page_number is None/missing, the Page: line is omitted from the XML.

Prompt capture approach: Each service returns its built prompt alongside the result (e.g., decompose() returns (questions, prompt_used) instead of just questions). query.py captures from return values — no separate build_prompt() method needed. Services remain black-box (build + call internally).

Relevance score storage: Instead of changing the shape of the filtered-chunk list, the relevance score is embedded into the metadata dict (meta["relevance_score"] = score). The list therefore stays typed as List[Tuple[str, Dict]], so code that consumes the chunks is unaffected; callers of RelevanceFilter.filter() only need to unpack the additional prompt value described above. The XML formatter reads meta.get("relevance_score").
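
A sketch of how the score might be attached while filtering. The helper name `attach_scores` is hypothetical, and comparing with `>=` against the threshold is an assumption, since the plan does not say whether the cut-off is inclusive.

```python
from typing import Dict, List, Tuple


def attach_scores(
    chunks: List[Tuple[str, Dict]],
    scores: List[float],
    threshold: float,
) -> List[Tuple[str, Dict]]:
    """Keep chunks scoring at or above `threshold`, recording the score in metadata."""
    kept = []
    for (text, meta), score in zip(chunks, scores):
        if score >= threshold:  # assumption: the threshold is inclusive
            # Copy the dict so the caller's metadata is not mutated in place.
            kept.append((text, {**meta, "relevance_score": score}))
    return kept
```

Copying the metadata keeps the retrieved-chunk records free of scores, so the retrieved and filtered XML dumps stay distinct.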

2.3 Backend Architecture

New Files

File	Purpose
`backend/app/services/history_service.py`	CRUD for query history records
`backend/app/routers/history.py`	REST API endpoints for history browsing
`backend/app/models/history.py`	Pydantic schemas for history request/response

Modified Files

File	Change
`backend/app/core/sqlite_db.py`	Add `get_prompts_db()` and `get_history_db()` connection factories
`backend/app/core/config.py`	Add `prompts_db_path` and `history_db_path`
`backend/app/core/dependencies.py`	Add `get_history_service()`
`backend/app/main.py`	Register `history` router; startup: create history table
`backend/app/routers/query.py`	Wrap pipeline in `time.perf_counter()`; record history via `asyncio.create_task()`

Timing Capture (in `_query_stream()`)

async def _query_stream(request: QueryRequest):
    overall_start = time.perf_counter()
    
    # Fetch prompt templates for active profile
    active_profile = prompt_service.get_active_profile_name()  # "A", "B", or "C"
    
    # Stage 1: Decompose
    stage_start = time.perf_counter()
    questions, decompose_prompt = await decomposer.decompose(request.question)  # now returns (questions, prompt)
    decomposer_time_ms = int((time.perf_counter() - stage_start) * 1000)
    yield sse_event("decomposed", ...)
    
    # Stage 2: Retrieve
    stage_start = time.perf_counter()
    chunks = rag.retrieve(question_texts=questions, ...)
    retriever_time_ms = int((time.perf_counter() - stage_start) * 1000)
    chunks_retrieved_count = len(chunks)
    chunks_retrieved = format_chunks_retrieved_xml(chunks)  # XML-tagged string
    yield sse_event("retrieving", ...)
    
    # Stage 3: Filter
    stage_start = time.perf_counter()
    chunks_for_filter = [(text, meta) for text, meta, _dist in chunks]
    filtered, filter_prompt = await relevance_filter.filter(  # now returns (filtered, prompt)
        request.question, chunks_for_filter, threshold=settings.relevance_threshold
    )
    filter_time_ms = int((time.perf_counter() - stage_start) * 1000)
    chunks_filtered_count = len(filtered)
    chunks_filtered = format_chunks_filtered_xml(filtered)  # XML-tagged string with scores
    yield sse_event("filtering", ...)
    
    # Stage 4: Generate
    stage_start = time.perf_counter()
    chunk_texts = [chunk for chunk, _meta in filtered]
    chunk_metadata = [meta for _chunk, meta in filtered]
    answer, generate_prompt = await rag.generate_response(  # now returns (answer, prompt)
        request.question, chunk_texts, chunk_metadata
    )
    generator_time_ms = int((time.perf_counter() - stage_start) * 1000)
    
    total_time_ms = int((time.perf_counter() - overall_start) * 1000)
    
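    # NOTE: `sources` is assumed to have been built earlier from the filtered
    # chunks' metadata; that step is not shown in this excerpt.
    # Record history (fire-and-forget)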
    asyncio.create_task(history_service.record(QueryHistoryRecord(
        input_text=request.question,
        extracted_questions=json.dumps(questions),
        decompose_prompt=decompose_prompt,
        decomposer_time_ms=decomposer_time_ms,
        retriever_time_ms=retriever_time_ms,
        chunks_retrieved=chunks_retrieved,
        chunks_retrieved_count=chunks_retrieved_count,
        filter_prompt=filter_prompt,
        filter_time_ms=filter_time_ms,
        chunks_filtered=chunks_filtered,
        chunks_filtered_count=chunks_filtered_count,
        generate_prompt=generate_prompt,
        generator_time_ms=generator_time_ms,
        total_time_ms=total_time_ms,
        final_answer=answer,
        sources=json.dumps([s.model_dump() for s in sources]),  # pydantic v2; use s.dict() on v1
        profile_used=active_profile,
    )))
    
    yield sse_event("completed", ...)
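
The repeated `perf_counter()` bookkeeping above could be factored into a small context manager. This is an optional sketch, not part of the plan; `stage_timer` is a hypothetical helper.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def stage_timer(timings: Dict[str, int], key: str) -> Iterator[None]:
    """Store the elapsed wall-clock time of the `with` body in timings[key], in ms."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the stage raises, so partial timings survive failures.
        timings[key] = int((time.perf_counter() - start) * 1000)


# Usage: the dict keys mirror the *_time_ms columns of query_history.
timings: Dict[str, int] = {}
with stage_timer(timings, "retriever_time_ms"):
    time.sleep(0.01)  # stand-in for rag.retrieve(...)
```

A plain (non-async) context manager also works around `await` expressions, so the same helper could wrap the three LLM calls.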

Helper functions for XML formatting:

def format_chunks_retrieved_xml(chunks: List[Tuple[str, Dict, float]]) -> str:
    """Format retrieved chunks as XML-tagged string.
    
    chunks = [(text, metadata, distance), ...] from RAGService.retrieve()
    """
    parts = []
    for i, (text, meta, _dist) in enumerate(chunks, start=1):
        lines = [f"<chunk_{i}>"]
        lines.append(f"Filename: {meta.get('filename', 'unknown')}")
        page = meta.get("page_number")
        if page is not None:
            lines.append(f"Page: {page}")
        lines.append(f"Content: {text}")
        lines.append(f"</chunk_{i}>")
        parts.append("\n".join(lines))
    return "\n".join(parts)


def format_chunks_filtered_xml(filtered: List[Tuple[str, Dict]]) -> str:
    """Format filtered chunks as XML-tagged string with relevance scores.
    
    filtered = [(text, meta), ...] — score embedded in meta["relevance_score"]
    """
    parts = []
    for i, (text, meta) in enumerate(filtered, start=1):
        lines = [f"<chunk_{i}>"]
        lines.append(f"Filename: {meta.get('filename', 'unknown')}")
        page = meta.get("page_number")
        if page is not None:
            lines.append(f"Page: {page}")
        score = meta.get("relevance_score")
        if score is not None:
            lines.append(f"Relevance: {score}")
        lines.append(f"Content: {text}")
        lines.append(f"</chunk_{i}>")
        parts.append("\n".join(lines))
    return "\n".join(parts)

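Fire-and-forget: asyncio.create_task() schedules history recording without awaiting it, so it does not delay the final SSE event. Three caveats apply. history_service.record() must be a coroutine. Stdlib sqlite3 calls are blocking, so the write should run off the event loop (for example via asyncio.to_thread()); otherwise it still stalls other requests. The event loop keeps only a weak reference to tasks, so hold a reference until the task finishes. If recording fails, the query completes normally; history is best-effort, and the failure should be logged rather than raised.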

API Endpoints

Method	Path	Description
`GET`	`/api/v1/history`	List query history (paginated, newest first). Query params: `limit` (default 50), `offset` (default 0)
`GET`	`/api/v1/history/{query_id}`	Get full detail for a single query
`DELETE`	`/api/v1/history/{query_id}`	Delete a history record
`DELETE`	`/api/v1/history`	Clear all history
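`GET`	`/api/v1/history/stats`	Aggregate stats: total queries, avg time, avg chunks, most used profile. Register this route before `/api/v1/history/{query_id}`; with in-order route matching (as in FastAPI-style routers), `stats` would otherwise be captured as a `query_id`.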
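
The list and stats endpoints map onto two small queries. The sketch below assumes direct sqlite3 access; the function names and the keys of the stats dict are illustrative, not taken from the plan.

```python
import sqlite3
from typing import Any, Dict

_SUMMARY_KEYS = ("id", "input_text", "total_time_ms", "chunks_retrieved_count",
                 "chunks_filtered_count", "profile_used", "created_at")


def list_history(conn: sqlite3.Connection, limit: int = 50, offset: int = 0) -> Dict[str, Any]:
    """Return one page of summaries, newest first, plus the total row count."""
    rows = conn.execute(
        """
        SELECT id, substr(input_text, 1, 100), total_time_ms,
               chunks_retrieved_count, chunks_filtered_count, profile_used, created_at
        FROM query_history
        ORDER BY created_at DESC, id DESC  -- id breaks ties within the same second
        LIMIT ? OFFSET ?
        """,
        (limit, offset),
    ).fetchall()
    (total,) = conn.execute("SELECT COUNT(*) FROM query_history").fetchone()
    return {"queries": [dict(zip(_SUMMARY_KEYS, row)) for row in rows],
            "total": total, "limit": limit, "offset": offset}


def history_stats(conn: sqlite3.Connection) -> Dict[str, Any]:
    """Aggregate figures for the stats bar; averages are None on an empty table."""
    total, avg_time, avg_retrieved, avg_filtered = conn.execute(
        "SELECT COUNT(*), AVG(total_time_ms), AVG(chunks_retrieved_count), "
        "AVG(chunks_filtered_count) FROM query_history"
    ).fetchone()
    top = conn.execute(
        "SELECT profile_used, COUNT(*) AS n FROM query_history "
        "WHERE profile_used IS NOT NULL GROUP BY profile_used ORDER BY n DESC LIMIT 1"
    ).fetchone()
    return {
        "total_queries": total,
        "avg_time_ms": avg_time,
        "avg_chunks_retrieved": avg_retrieved,
        "avg_chunks_filtered": avg_filtered,
        "most_used_profile": top[0] if top else None,
    }
```

The secondary `id DESC` sort matters because `datetime('now')` has one-second resolution, so several queries can share the same `created_at` value.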

Response Schemas

class QueryHistorySummary(BaseModel):
    id: int
    input_text: str                        # truncated to 100 chars
    total_time_ms: int
    chunks_retrieved_count: int
    chunks_filtered_count: int
    profile_used: str | None               # "A", "B", or "C"
    created_at: str

class QueryHistoryDetail(BaseModel):
    id: int
    input_text: str                        # full text
    extracted_questions: list[str]
    decompose_prompt: str                  # full prompt sent to LLM Call 1
    decomposer_time_ms: int
    retriever_time_ms: int
    chunks_retrieved: str                  # XML-tagged full chunk data
    chunks_retrieved_count: int
    filter_prompt: str                     # full prompt sent to LLM Call 2
    filter_time_ms: int
    chunks_filtered: str                   # XML-tagged filtered chunks with scores
    chunks_filtered_count: int
    generate_prompt: str                   # full prompt sent to LLM Call 3
    generator_time_ms: int
    total_time_ms: int
    final_answer: str
    sources: list[SourceMetadata]
    profile_used: str | None
    created_at: str

class QueryHistoryList(BaseModel):
    queries: list[QueryHistorySummary]
    total: int
    limit: int
    offset: int
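
Since `extracted_questions` and `sources` are stored as JSON text and may be NULL, the detail endpoint has to decode them before building a `QueryHistoryDetail`. A sketch with a plain dict standing in for the Pydantic model; `get_history_detail` is a hypothetical name.

```python
import json
import sqlite3
from typing import Any, Dict, Optional


def get_history_detail(conn: sqlite3.Connection, query_id: int) -> Optional[Dict[str, Any]]:
    """Load one history row and decode its JSON columns; None if the id is unknown."""
    cursor = conn.execute("SELECT * FROM query_history WHERE id = ?", (query_id,))
    row = cursor.fetchone()
    if row is None:
        return None  # the router would translate this into a 404
    # cursor.description holds one tuple per column; item 0 is the column name.
    record = dict(zip((column[0] for column in cursor.description), row))
    # NULL columns decode to empty lists so list-typed schema fields still validate.
    record["extracted_questions"] = json.loads(record["extracted_questions"] or "[]")
    record["sources"] = json.loads(record["sources"] or "[]")
    return record
```

The other nullable text columns (prompts, chunk XML, answer) would need either a similar default or optional fields in the schema, since the table allows NULL while `QueryHistoryDetail` declares them as plain `str`.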

2.4 Frontend Design

New page: /history
New NavBar link: "History"

┌──────────────────────────────────────────────────────────┐
│  Query History                    Total: 42 queries       │
├──────────────────────────────────────────────────────────┤
│                                                           │
│  ┌────────────────────────────────────────────────────┐  │
│  │ 📊 Stats                                           │  │
│  │ Avg time: 3.2s · Avg chunks: 8.5 → 4.2 filtered    │  │
│  │ Most used: Profile A (35 queries)                  │  │
│  └────────────────────────────────────────────────────┘  │
│                                                           │
│  ┌────────────────────────────────────────────────────┐  │
│  │ #42 · 2026-04-25 14:32 · 3.8s · Profile A          │  │
│  │ "What is the NEC4 clause about time extensions?"   │  │
│  │ 8 chunks → 4 filtered · [Expand ▼]                  │  │
│  ├────────────────────────────────────────────────────┤  │
│  │ #41 · 2026-04-25 14:15 · 2.1s · Profile B          │  │
│  │ "How does arbitration work under the contract?"    │  │
│  │ 10 chunks → 3 filtered · [Expand ▼]                 │  │
│  ├────────────────────────────────────────────────────┤  │
│  │ #40 · 2026-04-25 13:50 · 4.5s · Profile A          │  │
│  │ "Explain the payment mechanism and valuation..."   │  │
│  │ 12 chunks → 6 filtered · [Expand ▼]                 │  │
│  └────────────────────────────────────────────────────┘  │
│                                                           │
│  [Load More]                              [Clear All]    │
│                                                           │
│  ── Expanded: #42 ─────────────────────────────────────  │
│                                                           │
│  ⏱ Pipeline Timing:                                      │
│  ┌────────────────────────────────────────────────────┐  │
│  │ Decompose  ████████░░░░░░░░░  0.8s                 │  │
│  │ Retrieve   ██░░░░░░░░░░░░░░░  0.2s  (8 chunks)    │  │
│  │ Filter     ██████████░░░░░░░░  1.1s  (4 kept)      │  │
│  │ Generate   ██████████████████  1.7s                 │  │
│  │ ──────────────────────────────────                  │  │
│  │ Total      ██████████████████  3.8s                 │  │
│  └────────────────────────────────────────────────────┘  │
│                                                           │
│  📝 Extracted Questions:                                  │
│  1. What are the time extension provisions?              │
│  2. What notice is required for time extensions?        │
│  3. How is extended time calculated under NEC4?         │
│                                                           │
│  📤 Decompose Prompt:                                    │
│  ┌────────────────────────────────────────────────────┐  │
│  │ Given this question: 'What is the NEC4 clause...'  │  │
│  │ Break it down into 2-5 simplified sub-questions... │  │
│  └────────────────────────────────────────────────────┘  │
│                                                           │
│  📥 Retrieved Chunks (8):                                │
│  ┌────────────────────────────────────────────────────┐  │
│  │ <chunk_1>                                          │  │
│  │   Filename: NEC4 ACC.pdf                           │  │
│  │   Page: 3                                          │  │
│  │   Content: Clause 61.3 states that time extensions...│ │
│  │ </chunk_1>                                         │  │
│  └────────────────────────────────────────────────────┘  │
│  (raw XML in collapsible monospace code block)           │
│                                                           │
│  🔍 Filter Prompt:                                       │
│  ┌────────────────────────────────────────────────────┐  │
│  │ Given question 'What is the NEC4...' and these     │  │
│  │ document chunks, rate each 0-10 for relevance...   │  │
│  └────────────────────────────────────────────────────┘  │
│                                                           │
│  ✅ Filtered Chunks (4):                                 │
│  ┌────────────────────────────────────────────────────┐  │
│  │ <chunk_1>                                          │  │
│  │   Filename: NEC4 ACC.pdf                           │  │
│  │   Page: 3                                          │  │
│  │   Relevance: 8.5                                   │  │
│  │   Content: Clause 61.3 states that time extensions...│ │
│  │ </chunk_1>                                         │  │
│  └────────────────────────────────────────────────────┘  │
│  (raw XML in collapsible monospace code block)           │
│                                                           │
│  🤖 Generate Prompt:                                     │
│  ┌────────────────────────────────────────────────────┐  │
│  │ Question: What is the NEC4 clause...               │  │
│  │ Answer the question using ONLY these document...   │  │
│  └────────────────────────────────────────────────────┘  │
│                                                           │
│  💬 Answer:                                              │
│  ┌────────────────────────────────────────────────────┐  │
│  │ • The time extension provisions are outlined in    │  │
│  │   clause 61.3 [NEC4 ACC.pdf, page 3]               │  │
│  │ • Notice must be given within 8 weeks [NEC4 ACC... │  │
│  │ ...                                                 │  │
│  └────────────────────────────────────────────────────┘  │
│                                                           │
│  📎 Sources (4):  · NEC4 ACC.pdf, page 3  · ...          │
│  📋 Profile used: A                                      │
│                                                           │
└──────────────────────────────────────────────────────────┘

Component tree:

HistoryPage
├── HistoryStats (summary bar: total queries, avg time, avg chunks, most used profile)
├── HistoryList (scrollable list)
│   └── HistoryCard × N (collapsed: date, time, question preview, profile badge)
│       └── HistoryDetail (expanded: timing bars, prompts, chunks, questions, answer, sources)
│           ├── TimingBars (color-coded proportional bars per stage)
│           ├── ExtractedQuestions (numbered list)
│           ├── PromptSection × 3 (decompose_prompt, filter_prompt, generate_prompt — collapsible code blocks)
│           ├── ChunkSection (chunks_retrieved XML — collapsible raw XML in monospace code block)
│           ├── FilteredChunkSection (chunks_filtered XML with scores — collapsible raw XML in monospace code block)
│           ├── AnswerSection (final_answer — rendered markdown)
│           └── SourcesSection (clickable source links)
├── LoadMoreButton
└── ClearAllButton (with confirmation dialog)

Timing bars: Pure CSS. Each bar's width is its stage's share of total time: <div className="h-4 rounded bg-blue-400" style={{ width: `${(time / total) * 100}%` }} />. Color-coded: Decompose (blue-400), Retrieve (green-400), Filter (amber-400), Generate (purple-400).

API hooks:

useQueryHistory(limit, offset)     // useQuery: GET /history
useQueryHistoryDetail(id)          // useQuery: GET /history/{id}
useDeleteHistoryRecord(id)         // useMutation: DELETE /history/{id}
useClearHistory()                  // useMutation: DELETE /history
useHistoryStats()                  // useQuery: GET /history/stats

2.5 Acceptance Criteria

Every query creates a history record with all timing and data fields
GET /api/v1/history?limit=20&offset=0 returns paginated results (newest first)
GET /api/v1/history/{id} returns full detail with parsed JSON fields
DELETE /api/v1/history/{id} removes one record
DELETE /api/v1/history clears all records
GET /api/v1/history/stats returns aggregate statistics
History recording is fire-and-forget — never blocks query response
History page accessible via NavBar link
Timing bars accurately represent stage proportions
Expanded detail shows answer rendered as markdown with citation links
Sources show clickable links to PDF viewer
All states: loading, empty, error, success
Profile used is shown for each query
All backend + frontend tests pass
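
The pagination criterion above (limit + offset, newest first) could be served by a query along these lines. This is a minimal sketch, not the plan's implementation: the `query_history` columns used here (`id`, `input_text`), the clamp bounds, and the `has_more` field are assumptions for illustration.

```python
import sqlite3

def list_history(conn: sqlite3.Connection, limit: int = 20, offset: int = 0) -> dict:
    """Return one page of history rows, newest first."""
    limit = max(1, min(limit, 100))  # clamp bounds are an assumption
    offset = max(0, offset)
    rows = conn.execute(
        "SELECT id, input_text FROM query_history "
        "ORDER BY id DESC LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()
    total = conn.execute("SELECT COUNT(*) FROM query_history").fetchone()[0]
    return {
        "items": [dict(row) for row in rows],
        "total": total,
        "has_more": offset + len(rows) < total,  # drives the "Load More" button
    }

# Demo against an in-memory DB with a hypothetical two-column table.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE query_history (id INTEGER PRIMARY KEY, input_text TEXT)")
conn.executemany(
    "INSERT INTO query_history (input_text) VALUES (?)",
    [(f"question {n}",) for n in range(1, 6)],
)
conn.commit()

FIRST_PAGE = list_history(conn, limit=2, offset=0)
LAST_PAGE = list_history(conn, limit=2, offset=4)
```

Ordering by an auto-incrementing `id` is one way to get "newest first" without relying on timestamp precision; ordering by a timestamp column would work as well if the schema has one.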

Sub-Phase Breakdown

Sub-Phase	Feature	Difficulty	Backend	Frontend	Depends On
3.1	SQLite Infrastructure	⭐⭐ Medium	sqlite_db.py (dual-DB factories), config, table creation, seed data	None	—
3.2	Prompt Backend	⭐⭐⭐ Hard	prompt_service.py, prompts router, models, template formatting	None	3.1
3.3	Prompt Frontend Page	⭐⭐ Medium	None	SystemPromptsPage, ProfileList, PromptEditor, placeholder docs	3.2
3.4	Service Refactoring (Template Injection)	⭐⭐⭐ Hard	query_decomposer, relevance_filter, rag.py, query.py	None	3.2
3.5	History Backend	⭐⭐⭐ Hard	history_service.py, history router, models, query.py timing capture	None	3.1, 3.4
3.6	History Frontend Page	⭐⭐ Medium	None	HistoryPage, HistoryList, HistoryDetail, timing bars	3.5

Dependency Graph

3.1 (SQLite Infra)
 │
 ├──► 3.2 (Prompt Backend)
 │       │
 │       ├──► 3.3 (Prompt Frontend)     ← parallel with 3.4
 │       │
 │       └──► 3.4 (Service Refactoring)
 │               │
 │               └──► 3.5 (History Backend)
 │                       │
 │                       └──► 3.6 (History Frontend)

3.1 is the foundation
3.2 blocks 3.3 and 3.4 (both need the prompt service)
3.3 and 3.4 run in PARALLEL after 3.2
3.5 needs 3.1 (history DB) AND 3.4 (refactored pipeline for timing capture)
3.6 needs 3.5 (history API)
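
The ordering rules above can be checked mechanically. The sketch below encodes the "Depends On" column as a graph and groups sub-phases into waves using the standard-library `graphlib` module; the `DEPENDS_ON` and `build_waves` names are illustrative only.

```python
from graphlib import TopologicalSorter

# Each sub-phase maps to the sub-phases it depends on.
DEPENDS_ON = {
    "3.1": set(),
    "3.2": {"3.1"},
    "3.3": {"3.2"},
    "3.4": {"3.2"},
    "3.5": {"3.1", "3.4"},
    "3.6": {"3.5"},
}

def build_waves(depends_on: dict) -> list:
    """Group sub-phases into waves; items in the same wave can run in parallel."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves

WAVES = build_waves(DEPENDS_ON)
print(WAVES)
```

The only wave with more than one entry is 3.3 and 3.4, which matches the parallel step called out in the graph.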

Sub-Phase 3.1: SQLite Infrastructure ⭐⭐ Medium

Objective

Introduce SQLite with two separate databases: prompts.db for prompt templates and history.db for query history. Create connection factories, table schemas, and default seed data.

Database Technology

Decision: sqlite3 stdlib — zero new dependencies. Lightweight operations, adequate for single-user desktop app.

Changes Required

File	Change
`backend/app/core/sqlite_db.py`	NEW — `get_prompts_db()` and `get_history_db()` connection factories; `init_prompts_db()`, `init_history_db()` table creation; `seed_default_profiles()`
`backend/app/core/config.py`	Add `prompts_db_path: str = "./data/prompts.db"` and `history_db_path: str = "./data/history.db"`
`backend/app/main.py`	Startup event: create `data/` dir, init both DBs, seed default profiles
`backend/.env.example`	Add `PROMPTS_DB_PATH` and `HISTORY_DB_PATH`
`backend/.gitignore`	Add `data/` directory

sqlite_db.py design:

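import os
import sqlite3

from app.core.config import get_settings

def _get_db(db_path: str) -> sqlite3.Connection:
    """Shared connection factory (caller must close)."""
    # dirname() is "" for a bare filename, so fall back to the current directory
    os.makedirs(os.path.dirname(db_path) or ".", exist_ok=True)
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row  # rows support access by column name
    conn.execute("PRAGMA journal_mode=WAL")  # readers do not block the writer
    conn.execute("PRAGMA foreign_keys=ON")  # off by default; must be set per connection
    return conn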

def get_prompts_db() -> sqlite3.Connection:
    return _get_db(get_settings().prompts_db_path)

def get_history_db() -> sqlite3.Connection:
    return _get_db(get_settings().history_db_path)
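
Because the factory leaves closing to the caller, call sites need a pattern that always closes the connection. Note that `with conn:` on a `sqlite3` connection only commits or rolls back; it does not close. A minimal sketch using `contextlib.closing` follows, where `open_db` is a stand-in that mirrors `_get_db` so the example runs on its own, and `journal_mode` is a hypothetical helper.

```python
import os
import sqlite3
import tempfile
from contextlib import closing

def open_db(db_path: str) -> sqlite3.Connection:
    """Stand-in for _get_db(): same PRAGMAs, caller must close."""
    os.makedirs(os.path.dirname(db_path) or ".", exist_ok=True)
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA foreign_keys=ON")
    return conn

def journal_mode(db_path: str) -> str:
    # closing() calls conn.close() even if the query raises
    with closing(open_db(db_path)) as conn:
        return conn.execute("PRAGMA journal_mode").fetchone()[0]

# The nested "data" directory does not exist yet; open_db creates it.
with tempfile.TemporaryDirectory() as tmp:
    MODE = journal_mode(os.path.join(tmp, "data", "prompts.db"))
print(MODE)
```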

Acceptance Criteria

backend/data/prompts.db created on first startup with profile + prompt tables
backend/data/history.db created on first startup with query_history table + index
3 profiles (A/B/C) seeded with current hardcoded prompts as default templates
Profile A active by default
data/ directory gitignored
Both DB paths configurable via .env
Existing pytest tests still pass

Sub-Phase 3.2: Prompt Backend ⭐⭐⭐ Hard

Objective

Create the prompt service layer: Pydantic models, CRUD service, template formatting, REST API endpoints.

Changes Required

File	Change
`backend/app/models/prompts.py`	NEW — `PromptProfile`, `PromptSetResponse` (3 prompts), `PromptUpdateRequest`
`backend/app/services/prompt_service.py`	NEW — `PromptService`: get_profile, list_profiles, activate, get_template, update_prompt, update_all, format_prompt
`backend/app/routers/prompts.py`	NEW — 5 endpoints on `/api/v1/prompts`
`backend/app/core/dependencies.py`	Add `get_prompt_service()`
`backend/app/main.py`	Register `prompts` router

PromptService key methods:

class PromptService:
    def get_active_profile_name(self) -> str:
        """Return "A", "B", or "C" — which profile is active."""
    
    def get_prompt_template(self, step: str) -> str:
        """Get the template for the active profile + given step ("decompose"/"filter"/"generate")."""
    
    def list_profiles(self) -> list[dict]:
        """Return [{"name": "A", "is_active": True}, ...]."""
    
    def activate_profile(self, name: str) -> None:
        """Set is_active=1 for name, is_active=0 for others. Validates name in {A, B, C}."""
    
    def get_profile_prompts(self, name: str) -> dict:
        """Return {"decompose": "...", "filter": "...", "generate": "..."}."""
    
    def update_prompt(self, name: str, step: str, template: str) -> None:
        """Update single template. Validates step in {decompose, filter, generate}."""
    
    def update_all_prompts(self, name: str, prompts: dict) -> None:
        """Batch update all 3 templates."""
    
    def reset_to_defaults(self, name: str, step: str | None = None) -> None:
        """Restore seed template. If step is None, reset all 3 steps. Otherwise reset only that step."""

Acceptance Criteria

GET /api/v1/prompts/profiles returns A/B/C with active status
PUT /api/v1/prompts/profiles/B/activate switches active profile (only one at a time)
PUT /api/v1/prompts/profiles/A/decompose updates template and persists across restarts
PUT /api/v1/prompts/profiles/A/all batch-updates all 3 templates
Invalid profile name (e.g., "D") returns 400
Invalid step name (e.g., "summarize") returns 400
Active profile is fetched fresh per query (no caching)
All tests pass: test_phase3_prompt_service.py, test_phase3_prompts_router.py
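
The activation criteria above (exactly one active profile, invalid names rejected) could be met with a single UPDATE inside a transaction. This is a sketch under assumptions: the `profiles` table with `name` and `is_active` columns is hypothetical, and mapping the `ValueError` to an HTTP 400 is left to the router.

```python
import sqlite3

VALID_PROFILES = {"A", "B", "C"}

def activate_profile(conn: sqlite3.Connection, name: str) -> None:
    """Make `name` the only active profile."""
    if name not in VALID_PROFILES:
        raise ValueError(f"unknown profile: {name!r}")
    # `with conn` commits on success and rolls back on error. One UPDATE
    # touches every row, so there is never a moment with zero or two
    # active profiles.
    with conn:
        conn.execute(
            "UPDATE profiles SET is_active = CASE WHEN name = ? THEN 1 ELSE 0 END",
            (name,),
        )

# Demo against an in-memory DB seeded like the plan describes (A active).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profiles (name TEXT PRIMARY KEY, is_active INTEGER NOT NULL)")
conn.executemany("INSERT INTO profiles VALUES (?, ?)", [("A", 1), ("B", 0), ("C", 0)])
conn.commit()

activate_profile(conn, "B")
ACTIVE = [row[0] for row in conn.execute("SELECT name FROM profiles WHERE is_active = 1")]

try:
    activate_profile(conn, "D")
    REJECTED = False
except ValueError:
    REJECTED = True
```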

Sub-Phase 3.3: Prompt Frontend Page ⭐⭐ Medium

Objective

Build the System Prompts page at /system-prompts with profile switching and full template editing.

Changes Required

File	Change
`frontend/src/pages/SystemPromptsPage.tsx`	NEW
`frontend/src/components/ProfileList.tsx`	NEW — 3 cards (A/B/C)
`frontend/src/components/PromptEditor.tsx`	NEW — 3 textareas + placeholder docs + save/reset/cancel
`frontend/src/components/PlaceholderDocs.tsx`	NEW — info box listing available placeholders
`frontend/src/lib/api.ts`	Add 5 prompt API functions
`frontend/src/lib/queries.tsx`	Add TanStack Query hooks
`frontend/src/types/index.ts`	Add prompt-related types
`frontend/src/App.tsx`	Add `/system-prompts` route
`frontend/src/components/NavBar.tsx`	Add "System Prompts" nav link

Acceptance Criteria

Page accessible via NavBar
3 profiles shown: A (active ●), B (○), C (○)
"Set Active" switches active profile
Editing a profile shows 3 labeled textareas with current templates
Each textarea labeled with available placeholders
Placeholder docs info box visible
"Save Changes" persists; "Reset to Defaults" restores seed template; "Cancel" reverts
Soft warning if template references unknown placeholder
All states: loading, error, success
Frontend tests pass
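
The soft-warning rule above amounts to comparing the placeholders found in a template with the set valid for that step. The sketch below shows that logic in Python for consistency with the backend examples; the page itself would need an equivalent check in the frontend. The per-step sets follow the placeholder column of the LLM pipeline table, while `check_template` and its return shape are illustrative.

```python
import re

# Valid placeholders per step, per the pipeline table in "Current State".
VALID_PLACEHOLDERS = {
    "decompose": {"question"},
    "filter": {"question", "chunks"},
    "generate": {"question", "context"},
}

# Matches {identifier} only, so literal JSON such as {"scores": [1]} is ignored.
PLACEHOLDER_RE = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")

def check_template(step: str, template: str) -> dict:
    """Report placeholders that are unknown for the step, and valid ones not used."""
    found = set(PLACEHOLDER_RE.findall(template))
    valid = VALID_PLACEHOLDERS[step]
    return {
        "unknown": sorted(found - valid),  # would be left unreplaced in the prompt
        "missing": sorted(valid - found),  # LLM would not see this input
    }

WRONG_STEP = check_template("generate", "Question: {question}\n{chunks}")
WRONG_CASE = check_template("decompose", "Break down: {Question}")
```

Here `WRONG_STEP` flags `chunks` as unknown for the generate step, and `WRONG_CASE` flags `Question` because matching is case-sensitive.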

Sub-Phase 3.4: Service Refactoring (Template Injection) ⭐⭐⭐ Hard

Objective

Refactor all 3 LLM-calling services to fetch prompt templates from the DB instead of using hardcoded strings. Wire the query router to pass PromptService through the pipeline.

Changes Required

File	Change
`backend/app/services/query_decomposer.py`	Accept `PromptService`; `decompose()` fetches template, replaces `{question}`, calls LLM
`backend/app/services/relevance_filter.py`	Accept `PromptService`; `filter()` fetches template, replaces `{question}` and `{chunks}`, calls LLM
`backend/app/services/rag.py`	Accept `PromptService`; `generate_response()` fetches template, replaces `{question}` and `{context}`, calls LLM
`backend/app/routers/query.py`	Instantiate `PromptService` at pipeline start; pass to all services; capture `active_profile_name`
`backend/app/test/conftest.py`	Add `mock_prompt_service` fixture
`backend/app/test/test_phase1_query_decomposer.py`	Update tests for PromptService dependency
`backend/app/test/test_phase1_relevance_filter.py`	Update tests
`backend/app/test/test_phase1_rag_service.py`	Update tests

Before/After per service:

Service	Before (hardcoded)	After (template from DB)
`QueryDecomposer.decompose()`	`f"Given this question: '{question}'\n\nBreak it down..."`	`template.replace("{question}", question)`
`RelevanceFilter._build_prompt()`	`f"Given question '{question}'...{chunks_formatted}"`	`template.replace("{question}", question).replace("{chunks}", chunks_formatted)`
`RAGService.generate_response()`	`f"Question: {question}\n\nAnswer...{context}\n\nAnswer:"`	`template.replace("{question}", question).replace("{context}", context)`

LLMClient.complete() — NO CHANGES. Templates remain single user-role messages.
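
To illustrate the replacement step in the table above, here is a minimal sketch of a shared helper (the `render_template` name is hypothetical; the plan inlines the `.replace()` calls in each service). It also shows why `str.replace()` is a safer fit than `str.format()` for user-edited templates: literal braces, such as a JSON example inside the prompt, pass through untouched.

```python
def render_template(template: str, values: dict) -> str:
    """Substitute {name} placeholders using plain str.replace()."""
    prompt = template
    for name, value in values.items():
        prompt = prompt.replace("{" + name + "}", value)
    return prompt

# A template that contains literal JSON braces next to real placeholders.
TEMPLATE = (
    "Rate relevance for '{question}'. "
    'Return JSON like {"scores": [7, 2]}.\n{chunks}'
)
PROMPT = render_template(
    TEMPLATE,
    {"question": "What is clause 61.3?", "chunks": "1. Clause 61.3 states..."},
)

# str.format() treats the JSON braces as a replacement field and fails.
try:
    TEMPLATE.format(question="x", chunks="y")
    FORMAT_OK = True
except (KeyError, ValueError, IndexError):
    FORMAT_OK = False
```

One caveat of sequential replacement: if a substituted value itself contains a later placeholder (for example a user question containing the text `{chunks}`), that text is replaced on the next pass. Whether this matters depends on how much the inputs are trusted.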

Acceptance Criteria

All 3 LLM calls use templates from the active profile in the DB
Placeholders correctly replaced: {question} → user input, {chunks} → numbered list, {context} → formatted chunks with citations
Switching active profile changes prompts for NEXT query
If template is empty string, LLM call proceeds with empty prompt (LLM error is acceptable)
All existing tests pass (updated for PromptService dependency)
New tests: test_phase3_prompt_injection.py

Sub-Phase 3.5: History Backend ⭐⭐⭐ Hard

Objective

Capture timing and data from every pipeline stage and persist to history.db. Expose REST API for browsing.

Changes Required

File	Change
`backend/app/models/history.py`	NEW — `QueryHistoryRecord`, `QueryHistorySummary`, `QueryHistoryDetail`, `QueryHistoryList`
`backend/app/services/history_service.py`	NEW — `HistoryService`: record, list (paginated), get, delete, clear_all, get_stats
`backend/app/routers/history.py`	NEW — 5 endpoints on `/api/v1/history`
`backend/app/routers/query.py`	Add `time.perf_counter()` around each stage; capture prompts from service return values; format chunks as XML; `asyncio.create_task(history_service.record(...))` at end
`backend/app/services/relevance_filter.py`	MODIFY — `filter()` must embed `meta["relevance_score"]` in each surviving chunk and return `(filtered, prompt_used)`
`backend/app/services/query_decomposer.py`	MODIFY — `decompose()` must return `(questions, prompt_used)`
`backend/app/services/rag.py`	MODIFY — `generate_response()` must return `(answer, prompt_used)`
`backend/app/core/dependencies.py`	Add `get_history_service()`
`backend/app/main.py`	Register `history` router

Service return type changes (all 3 services return prompt alongside result):

Method	Before	After
`QueryDecomposer.decompose(question)`	`→ List[str]`	`→ Tuple[List[str], str]` — `(questions, prompt_used)`
`RelevanceFilter.filter(question, chunks, threshold)`	`→ List[Tuple[str, Dict]]`	`→ Tuple[List[Tuple[str, Dict]], str]` — `(filtered, prompt_used)`
`RAGService.generate_response(question, chunks, metadata)`	`→ str`	`→ Tuple[str, str]` — `(answer, prompt_used)`

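Apart from `RelevanceFilter.filter()` also writing `meta["relevance_score"]`, the service internals stay as refactored in 3.4: each service still builds its prompt and calls the LLM itself. Only the return signature changes, adding the prompt string.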

Timing stages captured: decompose, retrieve, filter, generate, total.

Data captured per stage:

Stage 1 (Decompose): prompt sent, response time, extracted questions
Stage 2 (Retrieve): response time, all chunks as XML (filename, page, content), chunk count
Stage 3 (Filter): prompt sent, response time, filtered chunks as XML (filename, page, relevance score, content), chunk count
Stage 4 (Generate): prompt sent, response time, final answer
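
The per-stage capture above can be sketched as a small wrapper around each call. This is illustrative only: `timed` is a hypothetical helper, `decompose` is a synchronous stub with the new `(questions, prompt_used)` return shape, and the real services may well be async, in which case the wrapper would need to await the call.

```python
import time

def decompose(question: str):
    """Stub with the new return shape: (questions, prompt_used)."""
    prompt = f"Given this question: '{question}'"
    return [question], prompt

def timed(stage: str, timings: dict, fn, *args):
    """Run fn(*args) and store its wall-clock duration under timings[stage]."""
    start = time.perf_counter()
    result = fn(*args)
    timings[stage] = time.perf_counter() - start
    return result

timings = {}
total_start = time.perf_counter()

# Unpack the tuple so the prompt can be stored in the history record.
questions, decompose_prompt = timed(
    "decompose", timings, decompose, "What is clause 61.3?"
)
# ... the retrieve, filter, and generate stages would follow the same pattern ...

timings["total"] = time.perf_counter() - total_start
```

`time.perf_counter()` is monotonic, so the measured durations are not affected by system clock adjustments.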

XML formatting helpers — Two utility functions in query.py or a shared utils/ module:

format_chunks_retrieved_xml(chunks) — converts [(text, meta, distance), ...] to XML
format_chunks_filtered_xml(filtered) — converts [(text, meta), ...] to XML, reading each score from meta["relevance_score"]
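
A possible shape for these helpers is sketched below as one function with a flag. It assumes the metadata keys `filename` and `page` (not confirmed by this plan) and that the score travels in `meta["relevance_score"]`, as stated in the Decisions table. The output layout follows the mockup in section 2.4.

```python
def format_chunks_xml(chunks, include_score: bool = False) -> str:
    """Render chunks as XML-tagged text, one <chunk_N> block per chunk.

    Accepts (text, meta) or (text, meta, distance) tuples; extra items are ignored.
    """
    blocks = []
    for i, (text, meta, *_rest) in enumerate(chunks, start=1):
        lines = [
            f"<chunk_{i}>",
            f"  Filename: {meta.get('filename', 'unknown')}",  # assumed key
            f"  Page: {meta.get('page', '?')}",                # assumed key
        ]
        if include_score:
            lines.append(f"  Relevance: {meta.get('relevance_score')}")
        lines.append(f"  Content: {text}")
        lines.append(f"</chunk_{i}>")
        blocks.append("\n".join(lines))
    return "\n".join(blocks)

retrieved = [
    ("Clause 61.3 states that...", {"filename": "NEC4 ACC.pdf", "page": 3}, 0.21),
    ("Notice must be given...", {"filename": "NEC4 ACC.pdf", "page": 4}, 0.35),
]
filtered = [
    ("Clause 61.3 states that...",
     {"filename": "NEC4 ACC.pdf", "page": 3, "relevance_score": 8.5}),
]

RETRIEVED_XML = format_chunks_xml(retrieved)
FILTERED_XML = format_chunks_xml(filtered, include_score=True)
```

The chunk content is not escaped, so the result is XML-tagged text rather than guaranteed well-formed XML. That is consistent with displaying the stored string as-is, but it would matter if anything later tried to parse it.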

Acceptance Criteria

Every query creates a history record with all fields (including 3 LLM prompts and 2 chunk XML strings)
All 5 history API endpoints work correctly
Pagination: limit + offset, newest first
Stats endpoint: total queries, avg times, avg chunks, most used profile
History recording is fire-and-forget (never blocks query)
History persists across restarts
decompose_prompt, filter_prompt, generate_prompt record the exact prompt sent to each LLM call
chunks_retrieved contains full XML with filename, page, content per chunk
chunks_filtered contains full XML with filename, page, relevance score, content per chunk
RelevanceFilter.filter() returns each surviving chunk's score in meta["relevance_score"]
chunks_retrieved_count and chunks_filtered_count are accurate integer counts
All tests pass: test_phase3_history_service.py, test_phase3_history_router.py, test_phase3_query_history_integration.py
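
The fire-and-forget criterion above deserves care in two places: the event loop keeps only weak references to tasks, so the task should be held somewhere until it finishes, and the synchronous `sqlite3` write should not run on the event loop thread. A minimal sketch follows; the function names and the two-column `query_history` table are assumptions for illustration.

```python
import asyncio
import os
import sqlite3
import tempfile
from contextlib import closing

_background_tasks = set()  # strong references so pending tasks are not garbage collected

def _insert_record(db_path: str, input_text: str) -> None:
    # Open the connection inside the worker thread: by default a sqlite3
    # connection may only be used from the thread that created it.
    with closing(sqlite3.connect(db_path)) as conn, conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS query_history "
            "(id INTEGER PRIMARY KEY, input_text TEXT)"
        )
        conn.execute("INSERT INTO query_history (input_text) VALUES (?)", (input_text,))

async def record(db_path: str, input_text: str) -> None:
    try:
        await asyncio.to_thread(_insert_record, db_path, input_text)
    except Exception as exc:
        # A failed history write must never surface in the query response.
        print(f"history recording failed: {exc}")

def record_in_background(db_path: str, input_text: str) -> asyncio.Task:
    task = asyncio.create_task(record(db_path, input_text))
    _background_tasks.add(task)
    task.add_done_callback(_background_tasks.discard)
    return task

async def demo(db_path: str) -> int:
    task = record_in_background(db_path, "What is clause 61.3?")
    # A real handler would return without awaiting; the demo waits so it can count rows.
    await task
    with closing(sqlite3.connect(db_path)) as conn:
        return conn.execute("SELECT COUNT(*) FROM query_history").fetchone()[0]

with tempfile.TemporaryDirectory() as tmp:
    ROWS = asyncio.run(demo(os.path.join(tmp, "history.db")))
```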

Sub-Phase 3.6: History Frontend Page ⭐⭐ Medium

Objective

Build the History page at /history with scrollable list, expandable detail, timing bars, and stats.

Changes Required

File	Change
`frontend/src/pages/HistoryPage.tsx`	NEW
`frontend/src/components/HistoryList.tsx`	NEW
`frontend/src/components/HistoryCard.tsx`	NEW — collapsed card + expandable detail
`frontend/src/components/TimingBar.tsx`	NEW — CSS-width proportional bars
`frontend/src/lib/api.ts`	Add 5 history API functions
`frontend/src/lib/queries.tsx`	Add TanStack Query hooks
`frontend/src/types/index.ts`	Add history types
`frontend/src/App.tsx`	Add `/history` route
`frontend/src/components/NavBar.tsx`	Add "History" nav link

Acceptance Criteria

Page accessible via NavBar
Stats bar: total, avg time, avg chunks, most used profile
History list: paginated, newest first, shows date/time/duration/input preview/profile badge
Expand card: timing bars, extracted questions, full answer (markdown), sources (clickable)
Expanded detail shows all 3 LLM prompts (collapsible sections)
Expanded detail shows retrieved chunks XML (collapsible, formatted)
Expanded detail shows filtered chunks XML with relevance scores (collapsible, formatted)
"Load More" pagination
"Clear All" with confirmation
Individual delete with confirmation
All states: loading skeleton, empty "No queries yet", error with retry
Frontend tests pass

New Dependencies

Zero. sqlite3 is Python stdlib. All UI is custom Tailwind. No new npm or pip packages.

Directory Structure After Package 3

legco_reranker/
├── backend/
│   ├── app/
│   │   ├── core/
│   │   │   ├── config.py              # + prompts_db_path, history_db_path
│   │   │   ├── database.py            # (unchanged - ChromaDB)
│   │   │   ├── dependencies.py        # + get_prompt_service, get_history_service
│   │   │   └── sqlite_db.py           # NEW - dual-DB connection factories
│   │   ├── models/
│   │   │   ├── history.py             # NEW
│   │   │   └── prompts.py             # NEW
│   │   ├── routers/
│   │   │   ├── history.py             # NEW
│   │   │   ├── prompts.py             # NEW
│   │   │   └── query.py               # MODIFIED - timing capture + template injection
│   │   ├── services/
│   │   │   ├── history_service.py     # NEW
│   │   │   ├── prompt_service.py      # NEW - template storage + formatting
│   │   │   ├── query_decomposer.py    # MODIFIED - use PromptService for templates
│   │   │   ├── rag.py                 # MODIFIED - use PromptService for templates
│   │   │   └── relevance_filter.py    # MODIFIED - use PromptService for templates
│   │   ├── test/
│   │   │   ├── test_phase3_prompt_service.py       # NEW
│   │   │   ├── test_phase3_prompts_router.py       # NEW
│   │   │   ├── test_phase3_prompt_injection.py     # NEW
│   │   │   ├── test_phase3_history_service.py      # NEW
│   │   │   ├── test_phase3_history_router.py       # NEW
│   │   │   ├── test_phase3_query_history_integration.py  # NEW
│   │   │   ├── test_phase1_query_decomposer.py      # MODIFIED
│   │   │   ├── test_phase1_relevance_filter.py      # MODIFIED
│   │   │   └── test_phase1_rag_service.py           # MODIFIED
│   │   └── main.py                    # MODIFIED - startup init + new routers
│   ├── data/                          # NEW (gitignored)
│   │   ├── prompts.db
│   │   └── history.db
│   └── .env.example                   # + PROMPTS_DB_PATH, HISTORY_DB_PATH
├── frontend/src/
│   ├── components/
│   │   ├── HistoryCard.tsx            # NEW
│   │   ├── HistoryList.tsx            # NEW
│   │   ├── NavBar.tsx                 # MODIFIED - +2 nav links
│   │   ├── PlaceholderDocs.tsx        # NEW
│   │   ├── ProfileList.tsx            # NEW
│   │   ├── PromptEditor.tsx           # NEW
│   │   └── TimingBar.tsx              # NEW
│   ├── pages/
│   │   ├── HistoryPage.tsx            # NEW
│   │   └── SystemPromptsPage.tsx      # NEW
│   ├── lib/
│   │   ├── api.ts                     # MODIFIED - +history +prompts endpoints
│   │   └── queries.tsx               # MODIFIED - +history +prompts hooks
│   ├── types/index.ts                 # MODIFIED - +history +prompts types
│   └── App.tsx                        # MODIFIED - +2 routes
└── .gitignore                         # + data/

Test Plan

Backend Tests (New)

File	Coverage	Sub-Phase
`test_phase3_prompt_service.py`	Prompt CRUD, activation, template formatting, edge cases	3.2
`test_phase3_prompts_router.py`	All 5 HTTP endpoints, error codes, validation	3.2
`test_phase3_prompt_injection.py`	Templates fetched from DB, placeholders replaced, end-to-end query uses templates	3.4
`test_phase3_history_service.py`	History CRUD, pagination, stats, edge cases	3.5
`test_phase3_history_router.py`	All 5 HTTP endpoints, pagination bounds, empty DB	3.5
`test_phase3_query_history_integration.py`	Full SSE query → history record created with correct data	3.5

Backend Tests (Modified)

File	Change	Sub-Phase
`test_phase1_query_decomposer.py`	Add PromptService dependency to test setup	3.4
`test_phase1_relevance_filter.py`	Add PromptService dependency	3.4
`test_phase1_rag_service.py`	Add PromptService dependency	3.4
`conftest.py`	Add `mock_prompt_service` fixture	3.4

Frontend Tests (New)

File	Coverage	Sub-Phase
`SystemPromptsPage.test.tsx`	Page render, profile list, activation, edit flows	3.3
`ProfileList.test.tsx`	A/B/C cards, active indicator, edit button	3.3
`PromptEditor.test.tsx`	3 textareas, placeholder docs, save/reset/cancel	3.3
`HistoryPage.test.tsx`	Page render, stats, pagination, clear all	3.6
`HistoryCard.test.tsx`	Collapsed/expanded states, timing bars, answer, sources	3.6
`TimingBar.test.tsx`	Proportional widths, zero-time stages, color mapping	3.6

Acceptance Tests

File	Coverage	Sub-Phase
`test_acceptance_package3_prompts.py`	Select profile → edit templates → activate → query uses new templates	3.2-3.4
`test_acceptance_package3_history.py`	Multiple queries → history shows correct records with timing + profile	3.5

Risks & Mitigations

Risk	Impact	Mitigation
User removes `{question}` placeholder → LLM doesn't see the question	LLM returns irrelevant or empty response	UI shows soft warning; user's choice — they can always reset to defaults
`str.replace()` is case-sensitive → `{Question}` not recognized	Placeholder left as-is in prompt	UI documents exact placeholder names; preview mode could highlight unresolved placeholders
`sqlite3` sync calls block async event loop	Slow responses under load	Operations are trivial (single-row lookups). History recording is fire-and-forget. WAL mode for concurrent reads.
History DB grows unbounded	Disk usage (exacerbated by XML chunk data and full LLM prompts per query)	Manual cleanup via "Clear All" button. Future: auto-prune config. XML chunks are 5-50KB per query — acceptable for SQLite desktop app.
`data/` directory not created on startup	SQLite connection fails	`os.makedirs(dirname, exist_ok=True)` in connection factory
User puts a placeholder in a step that does not support it (e.g. `{chunks}` in the generate template)	Placeholder is left unreplaced in the prompt	Placeholder docs on page show exactly which placeholders are valid per step
Two separate DB files complicate backups	User might backup one but not the other	Use same `data/` directory — easy to back up as one folder

Decisions

#	Question	Decision
1	Template editing scope	Full prompt template with `{placeholder}` variables — users edit the entire message sent to LLM
2	System role vs user role	User role only — no system prompt concept. Templates are the full user message (same as current).
3	Number of profiles	Fixed 3 (A, B, C) — no create/delete. Simplest mental model.
4	Database separation	Two files: `prompts.db` and `history.db` — independent concerns
5	Database technology	sqlite3 stdlib — zero new dependencies
6	Placeholder syntax	`{variable_name}` with `str.replace()` — simple, predictable. No `str.format()` edge cases.
7	History recording reliability	Fire-and-forget (`asyncio.create_task`) — never blocks query response
8	History data retention	Manual cleanup only in Package 3
9	Timing capture location	Inline in query.py — centralized, one file changes
10	Frontend timing visualization	CSS width bars — no charting library
11	History pagination	Offset-based (`limit` + `offset`)
12	NavBar order	LTT · RAG Database · System Prompts · History
13	Default seed templates	All 3 profiles start identical (current hardcoded prompts) — users customize from a common baseline
14	Reset button granularity	Both — per-step reset icon (↺) on each textarea label, plus "Reset All to Defaults" button in the action bar
15	Chunk data in history	XML-tagged TEXT — full chunk data as `<chunk_N>Filename: ...\nPage: ...\nContent: ...\n</chunk_N>`. Separate count columns for fast list queries.
16	LLM prompts in history	3 separate TEXT columns (`decompose_prompt`, `filter_prompt`, `generate_prompt`) — the exact prompt sent to each LLM call
17	Filtered chunk scores	`RelevanceFilter.filter()` embeds score in `meta["relevance_score"]` — no tuple format change, zero impact on existing callers
18	Prompt capture approach	Services return prompt alongside result — `decompose()` returns `(questions, prompt)`, `filter()` returns `(filtered, prompt)`, `generate_response()` returns `(answer, prompt)`. No separate `build_prompt()` methods.
19	Chunk XML display on frontend	Raw XML in monospace code blocks — collapsible `<pre>` showing the exact stored XML string. Copy-paste friendly, no frontend parsing.

Pre-Implementation Checklist

Before starting implementation, verify:

All existing backend tests pass (cd backend && pytest app/test/ -v)
All existing frontend tests pass (cd frontend && npm test)
AGENTS.md updated to reflect current project state (no longer "Greenfield")
Plan reviewed and approved by user
