legco_ai_assistant/.plans/package3_enhancement_plan.md

Package 3 Enhancement Plan

Source: User request (2026-04-25)
Scope: System Prompt Configuration Page + Query History Page
Status: Package 3 Complete (3.1, 3.2, 3.3, 3.4, 3.5, 3.6)


Objective

Add two new features that give users visibility and control over the RAG pipeline:

  1. System Prompt Configuration Page — Users can view/edit the full prompt templates for all 3 LLM calls (Decomposer, Relevance Filter, Response Generator). Templates support placeholders ({question}, {chunks}, {context}) that are replaced at query time. Supports 3 profiles (A, B, C) that users switch between with a single click.

  2. Query History Page — Records every query with full detail: input text, extracted questions, timing per pipeline stage (decompose, retrieve, filter, generate), actual LLM prompts sent for all 3 calls, chunks retrieved/filtered as full XML-tagged data (filename, page, content, relevance scores), final answer, sources, total time, and which profile was used.


Current State

What Exists

LLM Pipeline (3 calls, prompt templates hardcoded in service files):

| Call | Service | File:Line | Current Prompt Template | Temp | Placeholders |
|---|---|---|---|---|---|
| 1 | QueryDecomposer | services/query_decomposer.py:54-59 | "Given this question: '{question}'\n\nBreak it down into 2-5 simplified sub-questions..." | default (0.7) | {question} |
| 2 | RelevanceFilter | services/relevance_filter.py:36-39 | "Given question '{question}' and these document chunks, rate each 0-10 for relevance. Return JSON array of scores.\n{chunks_string}" | 0.0 | {question}, {chunks} |
| 3 | RAGService | services/rag.py:108-117 | "Question: {question}\n\nAnswer the question using ONLY these document chunks...bullet points...cite sources...\n\nDocument chunks:\n{context}\n\nAnswer:" | 0.3 | {question}, {context} |

  • LLMClient.complete(prompt, temperature, step_name) — single method, sends prompt as [{"role": "user", "content": prompt}]
  • All 3 prompts are f-strings built inline in the service methods — no template abstraction exists
  • The step_name parameter is only used for log labels

Data Storage:

  • No SQL database exists. ChromaDB is the only persistent store (vector database).
  • Config is .env-driven via pydantic-settings.BaseSettings (flat key-value, not user-editable at runtime).
  • Logging exists (RotatingFileHandler to backend/app/log/backend.log) — timing data is logged but never persisted.

Frontend:

  • 3 pages: LTTPage (/), RAGDatabasePage (/rag-database), PdfViewerPage (/pdf-viewer)
  • NavBar has "LTT" and "RAG Database" links
  • No history page, no settings/configuration page
  • No shadcn/ui — all components are custom Tailwind

Query Pipeline (SSE streaming):

POST /api/v1/query
  → QueryDecomposer.decompose()     [LLM Call 1, timing logged only]
  → RAGService.retrieve()           [ChromaDB, no timing capture]
  → RelevanceFilter.filter()        [LLM Call 2, timing logged only]
  → RAGService.generate_response()  [LLM Call 3, timing logged only]
  → SSE: completed event with answer + sources

What's Missing (Gaps This Plan Fills)

  • No way for users to customize LLM prompts
  • No persistence of query history — all queries are ephemeral
  • No record of how long each pipeline stage takes
  • No record of the actual LLM prompts sent during each query
  • No record of the full chunk data (text, metadata, scores) used at each stage
  • No way to review past queries and answers
  • No user-facing configuration page of any kind
  • Hardcoded prompt templates can't be tuned without changing source code

Feature 1: System Prompt Configuration (Full Template Editing)

1.1 Overview

Users edit the complete prompt template for each of the 3 LLM calls. Templates contain placeholder variables (e.g., {question}, {chunks}, {context}) that are replaced with actual data at query time. Three profiles (A, B, C) let users save and switch between different prompt sets.

Design Decision: Unlike the original plan (system role prefix + hardcoded user template), users edit the ENTIRE prompt. This gives full control over LLM instructions, output format, and behavior. The page documents exactly which placeholders are available for each step so users know what they can use.

1.2 Database Schema

Database: backend/data/prompts.db (SQLite, stdlib sqlite3)

CREATE TABLE IF NOT EXISTS system_prompt_profiles (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL UNIQUE,           -- "A" | "B" | "C"
    is_active INTEGER DEFAULT 0,         -- only ONE row has is_active = 1
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE TABLE IF NOT EXISTS system_prompts (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    profile_id INTEGER NOT NULL,
    step_name TEXT NOT NULL,             -- "decompose" | "filter" | "generate"
    prompt_template TEXT NOT NULL,       -- full prompt with {placeholder} variables
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    updated_at TEXT NOT NULL DEFAULT (datetime('now')),
    FOREIGN KEY (profile_id) REFERENCES system_prompt_profiles(id) ON DELETE CASCADE,
    UNIQUE(profile_id, step_name)
);

Default seed data (3 profiles × 3 prompts = 9 rows, Profile A active by default):

All 3 profiles start with the same defaults (the current hardcoded prompts). Users customize from there.

| Profile | Step | Placeholders | Seed Template |
|---|---|---|---|
| A | decompose | {question} | "Given this question: '{question}'\n\nBreak it down into 2-5 simplified sub-questions that would help search for relevant information. Each sub-question should be short and focused on one aspect. Return as a JSON array of strings." |
| A | filter | {question}, {chunks} | "Given question '{question}' and these document chunks, rate each 0-10 for relevance.\nReturn JSON array of scores.\n{chunks}\n" |
| A | generate | {question}, {context} | "Question: {question}\n\nAnswer the question using ONLY these document chunks. Do not use any external knowledge. Format your answer as bullet points. Cite your sources inline using the exact bracket labels provided, e.g. [filename, page N]. Place the citation at the end of each relevant point.\n\nDocument chunks:\n{context}\n\nAnswer:" |
| B | decompose | {question} | (same as A) |
| B | filter | {question}, {chunks} | (same as A) |
| B | generate | {question}, {context} | (same as A) |
| C | decompose | {question} | (same as A) |
| C | filter | {question}, {chunks} | (same as A) |
| C | generate | {question}, {context} | (same as A) |
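
The schema and seed rules above could be applied at startup roughly as follows. This is a minimal sketch under stated assumptions, not the project's implementation: `init_prompts_db`, `PROFILES`, and `SEED_TEMPLATES` are hypothetical names, and the template strings are abbreviated stand-ins for the full seed text in the table. It uses `INSERT OR IGNORE` so that rerunning startup neither duplicates rows nor overwrites user edits.

```python
import sqlite3

PROFILES = ("A", "B", "C")

# Abbreviated stand-ins for the seed templates listed in the table above.
SEED_TEMPLATES = {
    "decompose": "Given this question: '{question}'\n\nBreak it down into 2-5 simplified sub-questions...",
    "filter": "Given question '{question}' and these document chunks, rate each 0-10 for relevance.\n{chunks}\n",
    "generate": "Question: {question}\n\nDocument chunks:\n{context}\n\nAnswer:",
}

SCHEMA = """
CREATE TABLE IF NOT EXISTS system_prompt_profiles (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL UNIQUE,
    is_active INTEGER DEFAULT 0,
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE TABLE IF NOT EXISTS system_prompts (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    profile_id INTEGER NOT NULL,
    step_name TEXT NOT NULL,
    prompt_template TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    updated_at TEXT NOT NULL DEFAULT (datetime('now')),
    FOREIGN KEY (profile_id) REFERENCES system_prompt_profiles(id) ON DELETE CASCADE,
    UNIQUE(profile_id, step_name)
);
"""


def init_prompts_db(conn: sqlite3.Connection) -> None:
    """Create both tables and seed 3 profiles x 3 prompts; safe to call repeatedly."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
    conn.executescript(SCHEMA)
    for name in PROFILES:
        # The UNIQUE(name) constraint turns a repeated insert into a no-op.
        conn.execute(
            "INSERT OR IGNORE INTO system_prompt_profiles (name, is_active) VALUES (?, ?)",
            (name, 1 if name == "A" else 0),
        )
        (profile_id,) = conn.execute(
            "SELECT id FROM system_prompt_profiles WHERE name = ?", (name,)
        ).fetchone()
        for step, template in SEED_TEMPLATES.items():
            # UNIQUE(profile_id, step_name) protects templates the user has already edited.
            conn.execute(
                "INSERT OR IGNORE INTO system_prompts (profile_id, step_name, prompt_template) "
                "VALUES (?, ?, ?)",
                (profile_id, step, template),
            )
    conn.commit()
```

Calling `init_prompts_db(sqlite3.connect(":memory:"))` twice leaves exactly 3 profile rows and 9 prompt rows, with Profile A active.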

1.3 Available Placeholders (per step)

These are documented on the frontend edit page so users know exactly what they can insert:

| Step | Placeholder | What It Contains | Example Replacement |
|---|---|---|---|
| Decompose | {question} | The user's original input text | "What is the NEC4 clause about time extensions?" |
| Filter | {question} | The user's original input text | (same) |
| Filter | {chunks} | Numbered list of all retrieved chunks: Chunk 1: <text>\nChunk 2: <text>... | "Chunk 1: The NEC4 clause 61.3 states that time extensions...\nChunk 2: Notice must be given..." |
| Generate | {question} | The user's original input text | (same) |
| Generate | {context} | Formatted chunks with citation labels: [filename, page N] Source: ...\nSummary: ...\nContent: ... | "[NEC4 ACC.pdf, page 3] Source: NEC4 ACC.pdf\nSummary: Discussion of time extension provisions...\nContent: Clause 61.3 states..." |

Placeholder syntax: {variable_name} — must match exactly. Unknown placeholders are left as-is (not replaced). If a user removes a required placeholder (e.g., {question}), the LLM won't see the question — the UI warns but doesn't block.
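
The warn-but-don't-block rule can be expressed as a small check that reports unknown and missing placeholders for a step. The sketch below is illustrative only: `check_placeholders`, `ALLOWED`, and the identifier-style regex are assumptions, and since the plan places the warning in the UI, the real check may well live in the frontend. Python is used here only to match the backend snippets.

```python
import re

# Placeholders available per step, as listed in the table above.
ALLOWED = {
    "decompose": {"question"},
    "filter": {"question", "chunks"},
    "generate": {"question", "context"},
}

# Assumption: a placeholder is an identifier wrapped in single curly braces.
PLACEHOLDER_RE = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def check_placeholders(step: str, template: str) -> dict:
    """Return warnings for a template; callers decide how to display them."""
    used = set(PLACEHOLDER_RE.findall(template))
    allowed = ALLOWED[step]
    return {
        "unknown": sorted(used - allowed),   # e.g. {foo}: left as-is at query time
        "missing": sorted(allowed - used),   # e.g. {question} removed by the user
    }


warnings = check_placeholders("filter", "Rate these chunks: {chunks} {foo}")
# warnings == {"unknown": ["foo"], "missing": ["question"]}
```

The function only reports. Saving proceeds regardless, which matches the soft-warning behaviour described above.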

1.4 Backend Architecture

New Files

| File | Purpose |
|---|---|
| backend/app/core/sqlite_db.py | SQLite connection factory (shared by prompts + history) |
| backend/app/services/prompt_service.py | CRUD for prompt profiles and templates; template formatting |
| backend/app/routers/prompts.py | REST API endpoints for prompt management |
| backend/app/models/prompts.py | Pydantic schemas for prompt request/response |

Modified Files

| File | Change |
|---|---|
| backend/app/core/config.py | Add prompts_db_path and history_db_path |
| backend/app/core/dependencies.py | Add DI factories: get_prompt_service() |
| backend/app/main.py | Register prompts router; startup: create tables + seed 3 default profiles |
| backend/app/services/query_decomposer.py | decompose() fetches template from prompt service, formats with {question}, sends to LLM |
| backend/app/services/relevance_filter.py | filter() fetches template from prompt service, formats with {question} and {chunks}, sends to LLM |
| backend/app/services/rag.py | generate_response() fetches template from prompt service, formats with {question} and {context}, sends to LLM |
| backend/app/routers/query.py | Pass PromptService to pipeline; record active profile name for history |

How Template Formatting Works

Each service method changes from building a hardcoded prompt to fetching and formatting a template:

Before (query_decomposer.py):

prompt = (
    f"Given this question: '{question}'\n\n"
    f"Break it down into 2-5 simplified sub-questions..."
)
response = await self.llm_client.complete(prompt, step_name="QueryDecomposer")

After (query_decomposer.py):

template = self.prompt_service.get_prompt_template(step="decompose")
prompt = template.replace("{question}", question)
response = await self.llm_client.complete(prompt, step_name="QueryDecomposer")

PromptService.get_prompt_template() fetches the template for the currently active profile + given step. Uses Python str.replace() for placeholder substitution — simple, predictable, no str.format() edge cases with curly braces in user text.
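
One caveat with chaining `str.replace()` for templates that carry two placeholders: if the user's question itself contains the text `{chunks}`, the second replacement rewrites it too. The sketch below shows a single-pass alternative. It is a variation on the plan's approach, not the approach itself, and `render_template` is a hypothetical helper name. Known placeholders are substituted once and unknown ones are left as-is, as specified in section 1.3.

```python
import re

_PLACEHOLDER_RE = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def render_template(template: str, values: dict[str, str]) -> str:
    """Substitute {name} placeholders in one pass over the template.

    Substituted text is never rescanned, so braces inside the user's
    question or inside chunk text cannot trigger a second replacement.
    """
    def _sub(match: re.Match) -> str:
        name = match.group(1)
        # Unknown placeholders fall through unchanged.
        return values.get(name, match.group(0))

    return _PLACEHOLDER_RE.sub(_sub, template)


template = "Q: {question}\n{chunks}\n{foo}"
question = "What does {chunks} mean?"
chunks = "Chunk 1: example text"

rendered = render_template(template, {"question": question, "chunks": chunks})
# The question keeps its literal "{chunks}"; "{foo}" is left untouched.

chained = template.replace("{question}", question).replace("{chunks}", chunks)
# The chained version also rewrites the "{chunks}" inside the question.
```

For the single-placeholder decompose step the two approaches behave identically. The difference only shows up in the filter and generate steps.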

Note: LLMClient.complete() does NOT change — no system_prompt parameter is added. Templates remain single user-role messages, same as today. The only difference is the prompt text comes from the DB instead of being hardcoded.

API Endpoints (5 total — fixed 3 profiles, no create/delete)

| Method | Path | Description |
|---|---|---|
| GET | /api/v1/prompts/profiles | List all 3 profiles with active status: [{name: "A", is_active: true}, ...] |
| PUT | /api/v1/prompts/profiles/{name}/activate | Activate a profile by name (e.g., PUT /profiles/B/activate). Validates name is A/B/C. |
| GET | /api/v1/prompts/profiles/{name} | Get all 3 prompt templates for a profile |
| PUT | /api/v1/prompts/profiles/{name}/{step} | Update a single prompt template. Validates step is decompose/filter/generate. |
| PUT | /api/v1/prompts/profiles/{name}/all | Batch update all 3 prompt templates for a profile |

Why fixed 3 profiles (no create/delete):

  • Simplest mental model: 3 slots, name them A/B/C
  • No duplicate name conflicts, no "delete last profile" edge case
  • "Reset to Defaults" restores the seed template for a profile
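
With a fixed set of profiles, activation reduces to validating the name and flipping `is_active` so that exactly one row holds 1. A minimal sketch, assuming the `system_prompt_profiles` table from section 1.2 and a hypothetical `activate_profile` helper (the plan does not specify how `PromptService` does this):

```python
import sqlite3

VALID_PROFILES = ("A", "B", "C")


def activate_profile(conn: sqlite3.Connection, name: str) -> None:
    """Make `name` the only active profile."""
    if name not in VALID_PROFILES:
        raise ValueError(f"Unknown profile: {name!r}")
    # A single UPDATE touches every row, so there is no intermediate state
    # with zero or two active profiles. `with conn` commits on success and
    # rolls back if the statement raises.
    with conn:
        conn.execute(
            "UPDATE system_prompt_profiles "
            "SET is_active = CASE WHEN name = ? THEN 1 ELSE 0 END",
            (name,),
        )
```

A router handler could map the `ValueError` to a 4xx response. The exact status code is left open here.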

1.5 Frontend Design

New page: /system-prompts
New NavBar link: "System Prompts"

┌──────────────────────────────────────────────────────────┐
│  System Prompts                                          │
├──────────────────────────────────────────────────────────┤
│                                                           │
│  Active Profile: [A ▼]  [Set Active]                     │
│                                                           │
│  ┌────────────────────────────────────────────────────┐  │
│  │ ● Profile A  (active)     [Edit]                   │  │
│  ├────────────────────────────────────────────────────┤  │
│  │ ○ Profile B               [Edit]                   │  │
│  ├────────────────────────────────────────────────────┤  │
│  │ ○ Profile C               [Edit]                   │  │
│  └────────────────────────────────────────────────────┘  │
│                                                           │
│  ── Editing Profile A ─────────────────────────────────  │
│                                                           │
│  Available placeholders:                                  │
│  ┌────────────────────────────────────────────────────┐  │
│  │  {question}  — The user's input question           │  │
│  │  {chunks}    — Retrieved document chunks (filter)  │  │
│  │  {context}   — Formatted chunks with citations     │  │
│  └────────────────────────────────────────────────────┘  │
│                                                           │
│  Step 1: Query Decomposition                             │
│  Placeholders: {question}                                │
│  ┌────────────────────────────────────────────────────┐  │
│  │ Given this question: '{question}'                  │  │
│  │                                                    │  │
│  │ Break it down into 2-5 simplified sub-questions    │  │
│  │ that would help search for relevant information.   │  │
│  │ Each sub-question should be short and focused on   │  │
│  │ one aspect. Return as a JSON array of strings.     │  │
│  └────────────────────────────────────────────────────┘  │
│  ───────────────────────────────────────────────────────  │
│  Step 2: Relevance Filtering                             │
│  Placeholders: {question}, {chunks}                     │
│  ┌────────────────────────────────────────────────────┐  │
│  │ Given question '{question}' and these document     │  │
│  │ chunks, rate each 0-10 for relevance.              │  │
│  │ Return JSON array of scores.                       │  │
│  │ {chunks}                                           │  │
│  └────────────────────────────────────────────────────┘  │
│  ───────────────────────────────────────────────────────  │
│  Step 3: Response Generation                             │
│  Placeholders: {question}, {context}                    │
│  ┌────────────────────────────────────────────────────┐  │
│  │ Question: {question}                               │  │
│  │                                                    │  │
│  │ Answer the question using ONLY these document      │  │
│  │ chunks. Do not use any external knowledge.         │  │
│  │ Format your answer as bullet points.               │  │
│  │ Cite your sources inline...                        │  │
│  │                                                    │  │
│  │ Document chunks:                                   │  │
│  │ {context}                                          │  │
│  │                                                    │  │
│  │ Answer:                                            │  │
│  └────────────────────────────────────────────────────┘  │
│                                                           │
│  [Save Changes]  [Reset All to Defaults]  [Cancel]      │
│                                                           │
└──────────────────────────────────────────────────────────┘

Component tree:

SystemPromptsPage
├── ProfileSelector (dropdown A/B/C + "Set Active" button)
├── ProfileList (3 cards, active indicator)
│   └── ProfileCard × 3 (name, active indicator, Edit button)
├── PlaceholderDocs (info box showing available placeholders per step)
└── PromptEditor (shown when editing a profile)
    ├── PromptTextArea × 3 (labeled with step name + available placeholders)
    │   └── Per-step reset icon (↺) next to each textarea label
    └── ActionBar (Save, Reset All to Defaults, Cancel)

Placeholder documentation in UI: The page shows an "Available Placeholders" info box listing all placeholder variables and what they expand to. Each textarea has a subtle label showing which placeholders are valid for that step (e.g., "Placeholders: {question}, {chunks}"). Unknown placeholders in the template are left as-is by the backend; the UI shows a soft warning if the template references one, but doesn't block saving.

API hooks (new in lib/queries.tsx):

usePromptProfiles()              // useQuery: GET /prompts/profiles
usePromptProfile(name)           // useQuery: GET /prompts/profiles/{name}
useActivateProfile(name)         // useMutation: PUT /prompts/profiles/{name}/activate
useUpdatePrompt(name, step)      // useMutation: PUT /prompts/profiles/{name}/{step}
useUpdateAllPrompts(name)        // useMutation: PUT /prompts/profiles/{name}/all

Edge cases handled:

  • Empty prompt template: allowed (LLM call proceeds with empty prompt — LLM will likely error or return nothing)
  • Removed {question} placeholder: soft warning shown; LLM won't see the question — user's choice
  • Unknown placeholder in template (e.g., {foo}): left as-is, UI shows warning badge
  • Very long templates: textarea with vertical scroll, character count
  • Unsaved changes: warn before navigating away
  • Loading state: skeleton cards
  • Error state: red error banner with retry

1.6 Acceptance Criteria

  • /system-prompts page accessible via NavBar link
  • 3 profiles (A/B/C) shown with active indicator (● / ○)
  • "Set Active" switches which profile is used for queries
  • Editing a profile shows 3 labeled textareas pre-filled with current templates
  • Each textarea shows its available placeholders
  • "Save Changes" persists templates to DB
  • Per-step reset icon (↺) restores the seed template for that individual step
  • "Reset All to Defaults" restores all 3 templates for the profile at once
  • "Cancel" reverts unsaved edits
  • Changing a template affects the NEXT query (fetched fresh each time)
  • Placeholder docs visible on the page
  • pytest backend tests pass (new + existing)
  • npm test frontend tests pass (new + existing)

Feature 2: Query History

2.1 Overview

Every query submitted through the LTT page is recorded in a history database with detailed timing per pipeline stage. Users can browse past queries, see timing breakdowns, and review answers.

2.2 Database Schema

Database: backend/data/history.db (SQLite, separate from prompts.db)

CREATE TABLE IF NOT EXISTS query_history (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    input_text TEXT NOT NULL,               -- original user input
    extracted_questions TEXT DEFAULT NULL,   -- JSON array of sub-questions
    decompose_prompt TEXT DEFAULT NULL,      -- actual prompt sent to LLM Call 1
    decomposer_time_ms INTEGER DEFAULT 0,   -- LLM Call 1 duration
    retriever_time_ms INTEGER DEFAULT 0,    -- ChromaDB retrieval duration
    chunks_retrieved TEXT DEFAULT NULL,      -- XML-tagged full chunk data (filename, page, content)
    chunks_retrieved_count INTEGER DEFAULT 0, -- count of retrieved chunks (for list view)
    filter_prompt TEXT DEFAULT NULL,         -- actual prompt sent to LLM Call 2
    filter_time_ms INTEGER DEFAULT 0,       -- LLM Call 2 duration
    chunks_filtered TEXT DEFAULT NULL,       -- XML-tagged filtered chunks (filename, page, relevance, content)
    chunks_filtered_count INTEGER DEFAULT 0, -- count of filtered chunks (for list view)
    generate_prompt TEXT DEFAULT NULL,       -- actual prompt sent to LLM Call 3
    generator_time_ms INTEGER DEFAULT 0,    -- LLM Call 3 duration
    total_time_ms INTEGER DEFAULT 0,        -- input received → final response sent
    final_answer TEXT DEFAULT NULL,          -- full RAG answer text
    sources TEXT DEFAULT NULL,               -- JSON array of SourceMetadata
    profile_used TEXT DEFAULT NULL,          -- "A", "B", or "C"
    created_at TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE INDEX IF NOT EXISTS idx_query_history_created_at ON query_history(created_at DESC);

Chunk XML format: chunks_retrieved and chunks_filtered store full chunk data as XML-tagged strings:

chunks_retrieved example:

<chunk_1>
Filename: NEC4 ACC.pdf
Page: 3
Content: Clause 61.3 states that time extensions...
</chunk_1>
<chunk_2>
Filename: NEC4 Contract.pdf
Page: 12
Content: Notice must be given within 8 weeks...
</chunk_2>

chunks_filtered example (includes relevance score):

<chunk_1>
Filename: NEC4 ACC.pdf
Page: 3
Relevance: 8.5
Content: Clause 61.3 states that time extensions...
</chunk_1>
<chunk_2>
Filename: NEC4 Contract.pdf
Page: 12
Relevance: 9.0
Content: Notice must be given within 8 weeks...
</chunk_2>

Note: When page_number is None/missing, the Page: line is omitted from the XML.

Prompt capture approach: Each service returns its built prompt alongside the result (e.g., decompose() returns (questions, prompt_used) instead of just questions). query.py captures from return values — no separate build_prompt() method needed. Services remain black-box (build + call internally).

Relevance score storage: Instead of changing RelevanceFilter.filter()'s return type, the relevance score is embedded into the metadata dict (meta["relevance_score"] = score). This keeps the return type as List[Tuple[str, Dict]] — zero impact on existing callers. The XML formatter reads meta.get("relevance_score").
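
The score-in-metadata idea can be sketched as below. This is an illustration under assumptions, not the actual `RelevanceFilter` code: `attach_scores` is a hypothetical helper, the `>=` threshold comparison is assumed, and the sketch copies each metadata dict instead of assigning into it, so the caller's retrieved chunks stay unmodified.

```python
from typing import Dict, List, Tuple


def attach_scores(
    chunks: List[Tuple[str, Dict]],
    scores: List[float],
    threshold: float,
) -> List[Tuple[str, Dict]]:
    """Keep chunks scoring at or above the threshold, with the score in metadata.

    The return type stays List[Tuple[str, Dict]], so existing callers that
    unpack (text, meta) keep working.
    """
    kept = []
    for (text, meta), score in zip(chunks, scores):
        if score >= threshold:  # assumption: threshold is inclusive
            kept.append((text, {**meta, "relevance_score": score}))
    return kept


retrieved = [
    ("Clause 61.3 states...", {"filename": "NEC4 ACC.pdf", "page_number": 3}),
    ("Unrelated text", {"filename": "NEC4 Contract.pdf", "page_number": 12}),
]
filtered = attach_scores(retrieved, [8.5, 2.0], threshold=5.0)
# filtered holds one chunk whose metadata now includes relevance_score 8.5
```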

2.3 Backend Architecture

New Files

| File | Purpose |
|---|---|
| backend/app/services/history_service.py | CRUD for query history records |
| backend/app/routers/history.py | REST API endpoints for history browsing |
| backend/app/models/history.py | Pydantic schemas for history request/response |

Modified Files

| File | Change |
|---|---|
| backend/app/core/sqlite_db.py | Add get_prompts_db() and get_history_db() connection factories |
| backend/app/core/config.py | Add prompts_db_path and history_db_path |
| backend/app/core/dependencies.py | Add get_history_service() |
| backend/app/main.py | Register history router; startup: create history table |
| backend/app/routers/query.py | Wrap pipeline in time.perf_counter(); record history via asyncio.create_task() |

Timing Capture (in _query_stream())

async def _query_stream(request: QueryRequest):
    overall_start = time.perf_counter()
    
    # Fetch prompt templates for active profile
    active_profile = prompt_service.get_active_profile_name()  # "A", "B", or "C"
    
    # Stage 1: Decompose
    stage_start = time.perf_counter()
    questions, decompose_prompt = await decomposer.decompose(request.question)  # now returns (questions, prompt)
    decomposer_time_ms = int((time.perf_counter() - stage_start) * 1000)
    yield sse_event("decomposed", ...)
    
    # Stage 2: Retrieve
    stage_start = time.perf_counter()
    chunks = rag.retrieve(question_texts=questions, ...)
    retriever_time_ms = int((time.perf_counter() - stage_start) * 1000)
    chunks_retrieved_count = len(chunks)
    chunks_retrieved = format_chunks_retrieved_xml(chunks)  # XML-tagged string
    yield sse_event("retrieving", ...)
    
    # Stage 3: Filter
    stage_start = time.perf_counter()
    chunks_for_filter = [(text, meta) for text, meta, _dist in chunks]
    filtered, filter_prompt = await relevance_filter.filter(  # now returns (filtered, prompt)
        request.question, chunks_for_filter, threshold=settings.relevance_threshold
    )
    filter_time_ms = int((time.perf_counter() - stage_start) * 1000)
    chunks_filtered_count = len(filtered)
    chunks_filtered = format_chunks_filtered_xml(filtered)  # XML-tagged string with scores
    yield sse_event("filtering", ...)
    
    # Stage 4: Generate
    stage_start = time.perf_counter()
    chunk_texts = [chunk for chunk, _meta in filtered]
    chunk_metadata = [meta for _chunk, meta in filtered]
    answer, generate_prompt = await rag.generate_response(  # now returns (answer, prompt)
        request.question, chunk_texts, chunk_metadata
    )
    generator_time_ms = int((time.perf_counter() - stage_start) * 1000)
    
    total_time_ms = int((time.perf_counter() - overall_start) * 1000)
    
    # Record history (fire-and-forget)
    asyncio.create_task(history_service.record(QueryHistoryRecord(
        input_text=request.question,
        extracted_questions=json.dumps(questions),
        decompose_prompt=decompose_prompt,
        decomposer_time_ms=decomposer_time_ms,
        retriever_time_ms=retriever_time_ms,
        chunks_retrieved=chunks_retrieved,
        chunks_retrieved_count=chunks_retrieved_count,
        filter_prompt=filter_prompt,
        filter_time_ms=filter_time_ms,
        chunks_filtered=chunks_filtered,
        chunks_filtered_count=chunks_filtered_count,
        generate_prompt=generate_prompt,
        generator_time_ms=generator_time_ms,
        total_time_ms=total_time_ms,
        final_answer=answer,
        sources=json.dumps([s.dict() for s in sources]),
        profile_used=active_profile,
    )))
    
    yield sse_event("completed", ...)
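
The start/stop pattern repeated for each stage above could be factored into a small context manager. This is an optional sketch, not part of the plan: `stage_timer` and the `timings` dict are hypothetical names. It uses the same `time.perf_counter()` arithmetic as the handler and records the duration even if the stage raises.

```python
import time
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def stage_timer(timings: dict[str, int], key: str) -> Iterator[None]:
    """Store the elapsed wall time of the wrapped block, in whole milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[key] = int((time.perf_counter() - start) * 1000)


timings: dict[str, int] = {}
with stage_timer(timings, "retriever_time_ms"):
    sum(range(1000))  # stand-in for rag.retrieve(...)
# timings now holds an integer millisecond value under "retriever_time_ms"
```

A plain `with` block can wrap an `await` inside an async function, so the same helper would work around the three LLM calls. Keeping each `yield sse_event(...)` outside the `with` block keeps time spent waiting on the client out of the stage measurement.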

Helper functions for XML formatting:

def format_chunks_retrieved_xml(chunks: List[Tuple[str, Dict, float]]) -> str:
    """Format retrieved chunks as XML-tagged string.
    
    chunks = [(text, metadata, distance), ...] from RAGService.retrieve()
    """
    parts = []
    for i, (text, meta, _dist) in enumerate(chunks, start=1):
        lines = [f"<chunk_{i}>"]
        lines.append(f"Filename: {meta.get('filename', 'unknown')}")
        page = meta.get("page_number")
        if page is not None:
            lines.append(f"Page: {page}")
        lines.append(f"Content: {text}")
        lines.append(f"</chunk_{i}>")
        parts.append("\n".join(lines))
    return "\n".join(parts)


def format_chunks_filtered_xml(filtered: List[Tuple[str, Dict]]) -> str:
    """Format filtered chunks as XML-tagged string with relevance scores.
    
    filtered = [(text, meta), ...] — score embedded in meta["relevance_score"]
    """
    parts = []
    for i, (text, meta) in enumerate(filtered, start=1):
        lines = [f"<chunk_{i}>"]
        lines.append(f"Filename: {meta.get('filename', 'unknown')}")
        page = meta.get("page_number")
        if page is not None:
            lines.append(f"Page: {page}")
        score = meta.get("relevance_score")
        if score is not None:
            lines.append(f"Relevance: {score}")
        lines.append(f"Content: {text}")
        lines.append(f"</chunk_{i}>")
        parts.append("\n".join(lines))
    return "\n".join(parts)

Fire-and-forget: asyncio.create_task() ensures history recording never blocks the SSE stream. If recording fails, the query completes normally — history is best-effort.

API Endpoints

| Method | Path | Description |
|---|---|---|
| GET | /api/v1/history | List query history (paginated, newest first). Query params: limit (default 50), offset (default 0) |
| GET | /api/v1/history/{query_id} | Get full detail for a single query |
| DELETE | /api/v1/history/{query_id} | Delete a history record |
| DELETE | /api/v1/history | Clear all history |
| GET | /api/v1/history/stats | Aggregate stats: total queries, avg time, avg chunks, most used profile |
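
The list and stats endpoints map onto two straightforward queries against `query_history`. The sketch below assumes the schema from section 2.2. `list_history`, `history_stats`, the result key names, and the upper bound of 200 on `limit` are all hypothetical. Because `datetime('now')` has one-second resolution, the ordering falls back to `id` so that queries recorded in the same second still come out newest first.

```python
import sqlite3


def list_history(conn: sqlite3.Connection, limit: int = 50, offset: int = 0) -> dict:
    """Return one page of summaries, newest first, plus the total row count."""
    limit = max(1, min(limit, 200))  # assumption: cap page size
    offset = max(0, offset)
    rows = conn.execute(
        "SELECT id, substr(input_text, 1, 100), total_time_ms, "
        "       chunks_retrieved_count, chunks_filtered_count, profile_used, created_at "
        "FROM query_history ORDER BY created_at DESC, id DESC LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()
    total = conn.execute("SELECT COUNT(*) FROM query_history").fetchone()[0]
    keys = ("id", "input_text", "total_time_ms", "chunks_retrieved_count",
            "chunks_filtered_count", "profile_used", "created_at")
    return {
        "queries": [dict(zip(keys, row)) for row in rows],
        "total": total,
        "limit": limit,
        "offset": offset,
    }


def history_stats(conn: sqlite3.Connection) -> dict:
    """Aggregate totals and averages; averages are None when the table is empty."""
    total, avg_time, avg_retrieved, avg_filtered = conn.execute(
        "SELECT COUNT(*), AVG(total_time_ms), AVG(chunks_retrieved_count), "
        "       AVG(chunks_filtered_count) FROM query_history"
    ).fetchone()
    top = conn.execute(
        "SELECT profile_used, COUNT(*) AS n FROM query_history "
        "WHERE profile_used IS NOT NULL "
        "GROUP BY profile_used ORDER BY n DESC LIMIT 1"
    ).fetchone()
    return {
        "total_queries": total,
        "avg_total_time_ms": avg_time,
        "avg_chunks_retrieved": avg_retrieved,
        "avg_chunks_filtered": avg_filtered,
        "most_used_profile": top[0] if top else None,
    }
```

Truncating `input_text` in SQL with `substr` keeps the list response small without loading full inputs into Python.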

Response Schemas

class QueryHistorySummary(BaseModel):
    id: int
    input_text: str                        # truncated to 100 chars
    total_time_ms: int
    chunks_retrieved_count: int
    chunks_filtered_count: int
    profile_used: str | None               # "A", "B", or "C"
    created_at: str

class QueryHistoryDetail(BaseModel):
    id: int
    input_text: str                        # full text
    extracted_questions: list[str]
    decompose_prompt: str                  # full prompt sent to LLM Call 1
    decomposer_time_ms: int
    retriever_time_ms: int
    chunks_retrieved: str                  # XML-tagged full chunk data
    chunks_retrieved_count: int
    filter_prompt: str                     # full prompt sent to LLM Call 2
    filter_time_ms: int
    chunks_filtered: str                   # XML-tagged filtered chunks with scores
    chunks_filtered_count: int
    generate_prompt: str                   # full prompt sent to LLM Call 3
    generator_time_ms: int
    total_time_ms: int
    final_answer: str
    sources: list[SourceMetadata]
    profile_used: str | None
    created_at: str

class QueryHistoryList(BaseModel):
    queries: list[QueryHistorySummary]
    total: int
    limit: int
    offset: int

2.4 Frontend Design

New page: /history
New NavBar link: "History"

┌──────────────────────────────────────────────────────────┐
│  Query History                    Total: 42 queries       │
├──────────────────────────────────────────────────────────┤
│                                                           │
│  ┌────────────────────────────────────────────────────┐  │
│  │ 📊 Stats                                           │  │
│  │ Avg time: 3.2s · Avg chunks: 8.5 → 4.2 filtered    │  │
│  │ Most used: Profile A (35 queries)                  │  │
│  └────────────────────────────────────────────────────┘  │
│                                                           │
│  ┌────────────────────────────────────────────────────┐  │
│  │ #42 · 2026-04-25 14:32 · 3.8s · Profile A          │  │
│  │ "What is the NEC4 clause about time extensions?"   │  │
│  │ 8 chunks → 4 filtered · [Expand ▼]                  │  │
│  ├────────────────────────────────────────────────────┤  │
│  │ #41 · 2026-04-25 14:15 · 2.1s · Profile B          │  │
│  │ "How does arbitration work under the contract?"    │  │
│  │ 10 chunks → 3 filtered · [Expand ▼]                 │  │
│  ├────────────────────────────────────────────────────┤  │
│  │ #40 · 2026-04-25 13:50 · 4.5s · Profile A          │  │
│  │ "Explain the payment mechanism and valuation..."   │  │
│  │ 12 chunks → 6 filtered · [Expand ▼]                 │  │
│  └────────────────────────────────────────────────────┘  │
│                                                           │
│  [Load More]                              [Clear All]    │
│                                                           │
│  ── Expanded: #42 ─────────────────────────────────────  │
│                                                           │
│  ⏱ Pipeline Timing:                                      │
│  ┌────────────────────────────────────────────────────┐  │
│  │ Decompose  ████████░░░░░░░░░  0.8s                 │  │
│  │ Retrieve   ██░░░░░░░░░░░░░░░  0.2s  (8 chunks)    │  │
│  │ Filter     ██████████░░░░░░░░  1.1s  (4 kept)      │  │
│  │ Generate   ██████████████████  1.7s                 │  │
│  │ ──────────────────────────────────                  │  │
│  │ Total      ██████████████████  3.8s                 │  │
│  └────────────────────────────────────────────────────┘  │
│                                                           │
│  📝 Extracted Questions:                                  │
│  1. What are the time extension provisions?              │
│  2. What notice is required for time extensions?        │
│  3. How is extended time calculated under NEC4?         │
│                                                           │
│  📤 Decompose Prompt:                                    │
│  ┌────────────────────────────────────────────────────┐  │
│  │ Given this question: 'What is the NEC4 clause...'  │  │
│  │ Break it down into 2-5 simplified sub-questions... │  │
│  └────────────────────────────────────────────────────┘  │
│                                                           │
│  📥 Retrieved Chunks (8):                                │
│  ┌────────────────────────────────────────────────────┐  │
│  │ <chunk_1>                                          │  │
│  │   Filename: NEC4 ACC.pdf                           │  │
│  │   Page: 3                                          │  │
│  │   Content: Clause 61.3 states that time extensions...│ │
│  │ </chunk_1>                                         │  │
│  └────────────────────────────────────────────────────┘  │
│  (raw XML in collapsible monospace code block)           │
│                                                           │
│  🔍 Filter Prompt:                                       │
│  ┌────────────────────────────────────────────────────┐  │
│  │ Given question 'What is the NEC4...' and these     │  │
│  │ document chunks, rate each 0-10 for relevance...   │  │
│  └────────────────────────────────────────────────────┘  │
│                                                           │
│  ✅ Filtered Chunks (4):                                 │
│  ┌────────────────────────────────────────────────────┐  │
│  │ <chunk_1>                                          │  │
│  │   Filename: NEC4 ACC.pdf                           │  │
│  │   Page: 3                                          │  │
│  │   Relevance: 8.5                                   │  │
│  │   Content: Clause 61.3 states that time extensions...│ │
│  │ </chunk_1>                                         │  │
│  └────────────────────────────────────────────────────┘  │
│  (raw XML in collapsible monospace code block)           │
│                                                           │
│  🤖 Generate Prompt:                                     │
│  ┌────────────────────────────────────────────────────┐  │
│  │ Question: What is the NEC4 clause...               │  │
│  │ Answer the question using ONLY these document...   │  │
│  └────────────────────────────────────────────────────┘  │
│                                                           │
│  💬 Answer:                                              │
│  ┌────────────────────────────────────────────────────┐  │
│  │ • The time extension provisions are outlined in    │  │
│  │   clause 61.3 [NEC4 ACC.pdf, page 3]               │  │
│  │ • Notice must be given within 8 weeks [NEC4 ACC... │  │
│  │ ...                                                 │  │
│  └────────────────────────────────────────────────────┘  │
│                                                           │
│  📎 Sources (4):  · NEC4 ACC.pdf, page 3  · ...          │
│  📋 Profile used: A                                      │
│                                                           │
└──────────────────────────────────────────────────────────┘

Component tree:

HistoryPage
├── HistoryStats (summary bar: total queries, avg time, avg chunks, most used profile)
├── HistoryList (scrollable list)
│   └── HistoryCard × N (collapsed: date, time, question preview, profile badge)
│       └── HistoryDetail (expanded: timing bars, prompts, chunks, questions, answer, sources)
│           ├── TimingBars (color-coded proportional bars per stage)
│           ├── ExtractedQuestions (numbered list)
│           ├── PromptSection × 3 (decompose_prompt, filter_prompt, generate_prompt — collapsible code blocks)
│           ├── ChunkSection (chunks_retrieved XML — collapsible raw XML in monospace code block)
│           ├── FilteredChunkSection (chunks_filtered XML with scores — collapsible raw XML in monospace code block)
│           ├── AnswerSection (final_answer — rendered markdown)
│           └── SourcesSection (clickable source links)
├── LoadMoreButton
└── ClearAllButton (with confirmation dialog)

Timing bars: Pure CSS, e.g. <div className="h-4 rounded bg-blue-400" style={{ width: `${(time / total) * 100}%` }} /> (the width must be a template literal; render 0% when total is 0). Color-coded: Decompose (blue-400), Retrieve (green-400), Filter (amber-400), Generate (purple-400).

API hooks:

useQueryHistory(limit, offset)     // useQuery: GET /history
useQueryHistoryDetail(id)          // useQuery: GET /history/{id}
useDeleteHistoryRecord(id)         // useMutation: DELETE /history/{id}
useClearHistory()                  // useMutation: DELETE /history
useHistoryStats()                  // useQuery: GET /history/stats

2.5 Acceptance Criteria

  • Every query creates a history record with all timing and data fields
  • GET /api/v1/history?limit=20&offset=0 returns paginated results (newest first)
  • GET /api/v1/history/{id} returns full detail with parsed JSON fields
  • DELETE /api/v1/history/{id} removes one record
  • DELETE /api/v1/history clears all records
  • GET /api/v1/history/stats returns aggregate statistics
  • History recording is fire-and-forget — never blocks query response
  • History page accessible via NavBar link
  • Timing bars accurately represent stage proportions
  • Expanded detail shows answer rendered as markdown with citation links
  • Sources show clickable links to PDF viewer
  • All states: loading, empty, error, success
  • Profile used is shown for each query
  • All backend + frontend tests pass
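
The pagination criterion above (limit + offset, newest first) can be sketched as a single SQL query. This is a minimal illustration, not the plan's actual implementation: the `query_history` columns shown here (`id`, `created_at`, `input_text`), the function name `list_history`, and the clamp to 100 rows are assumptions made for the example.

```python
import sqlite3


def list_history(conn: sqlite3.Connection, limit: int = 20, offset: int = 0) -> list[dict]:
    """Return one page of history summaries, newest first."""
    # Clamp inputs so a bad query string cannot request a negative or huge page.
    limit = max(1, min(limit, 100))
    offset = max(0, offset)
    rows = conn.execute(
        "SELECT id, created_at, input_text FROM query_history "
        "ORDER BY created_at DESC, id DESC LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()
    return [dict(row) for row in rows]


# Demo against an in-memory database with a hypothetical, reduced schema.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    "CREATE TABLE query_history (id INTEGER PRIMARY KEY, created_at TEXT, input_text TEXT)"
)
conn.executemany(
    "INSERT INTO query_history (created_at, input_text) VALUES (?, ?)",
    [(f"2026-04-25T10:00:{i:02d}", f"question {i}") for i in range(5)],
)
page = list_history(conn, limit=2, offset=1)
```

Ordering by `id` as a tie-breaker keeps pages stable when two records share the same timestamp.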

Sub-Phase Breakdown

| Sub-Phase | Feature | Difficulty | Backend | Frontend | Depends On |
|---|---|---|---|---|---|
| 3.1 | SQLite Infrastructure | Medium | sqlite_db.py (dual-DB factories), config, table creation, seed data | None | None |
| 3.2 | Prompt Backend | Hard | prompt_service.py, prompts router, models, template formatting | None | 3.1 |
| 3.3 | Prompt Frontend Page | Medium | None | SystemPromptsPage, ProfileList, PromptEditor, placeholder docs | 3.2 |
| 3.4 | Service Refactoring (Template Injection) | Hard | query_decomposer, relevance_filter, rag.py, query.py | None | 3.2 |
| 3.5 | History Backend | Hard | history_service.py, history router, models, query.py timing capture | None | 3.1, 3.4 |
| 3.6 | History Frontend Page | Medium | None | HistoryPage, HistoryList, HistoryDetail, timing bars | 3.5 |

Dependency Graph

3.1 (SQLite Infra)
 │
 ├──► 3.2 (Prompt Backend)
 │       │
 │       ├──► 3.3 (Prompt Frontend)     ← parallel with 3.4
 │       │
 │       └──► 3.4 (Service Refactoring)
 │               │
 │               └──► 3.5 (History Backend)
 │                       │
 │                       └──► 3.6 (History Frontend)
  • 3.1 is the foundation
  • 3.2 blocks 3.3 and 3.4 (both need the prompt service)
  • 3.3 and 3.4 run in PARALLEL after 3.2
  • 3.5 needs 3.1 (history DB) AND 3.4 (refactored pipeline for timing capture)
  • 3.6 needs 3.5 (history API)
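
The dependency rules above can be checked mechanically. The sketch below feeds the "Depends On" column into the standard-library `graphlib.TopologicalSorter` and groups sub-phases into waves that can run in parallel; the helper name `build_waves` is an assumption made for the example.

```python
from graphlib import TopologicalSorter

# Sub-phase -> set of sub-phases it depends on, copied from the breakdown table.
DEPENDS_ON = {
    "3.1": set(),
    "3.2": {"3.1"},
    "3.3": {"3.2"},
    "3.4": {"3.2"},
    "3.5": {"3.1", "3.4"},
    "3.6": {"3.5"},
}


def build_waves(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group nodes into waves; everything inside one wave can run in parallel."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the plan has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = build_waves(DEPENDS_ON)
```

The third wave contains both 3.3 and 3.4, which matches the note that they run in parallel after 3.2.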

Sub-Phase 3.1: SQLite Infrastructure Medium

Objective

Introduce SQLite with two separate databases: prompts.db for prompt templates and history.db for query history. Create connection factories, table schemas, and default seed data.

Database Technology

Decision: sqlite3 stdlib — zero new dependencies. Lightweight operations, adequate for single-user desktop app.

Changes Required

| File | Change |
|---|---|
| backend/app/core/sqlite_db.py | NEW: get_prompts_db() and get_history_db() connection factories; init_prompts_db(), init_history_db() table creation; seed_default_profiles() |
| backend/app/core/config.py | Add prompts_db_path: str = "./data/prompts.db" and history_db_path: str = "./data/history.db" |
| backend/app/main.py | Startup event: create data/ dir, init both DBs, seed default profiles |
| backend/.env.example | Add PROMPTS_DB_PATH and HISTORY_DB_PATH |
| backend/.gitignore | Add data/ directory |

sqlite_db.py design:

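import os
import sqlite3

from app.core.config import get_settings

def _get_db(db_path: str) -> sqlite3.Connection:
    """Shared connection factory (caller must close)."""
    # dirname() is "" for a bare filename; fall back to the current directory.
    os.makedirs(os.path.dirname(db_path) or ".", exist_ok=True)
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row  # rows support access by column name
    conn.execute("PRAGMA journal_mode=WAL")  # readers do not block the writer
    conn.execute("PRAGMA foreign_keys=ON")  # must be enabled per connection
    return conn

def get_prompts_db() -> sqlite3.Connection:
    """Connection to the prompt-template database (caller must close)."""
    return _get_db(get_settings().prompts_db_path)

def get_history_db() -> sqlite3.Connection:
    """Connection to the query-history database (caller must close)."""
    return _get_db(get_settings().history_db_path)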
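
Because the factories hand ownership of the connection to the caller, one way to honor "caller must close" is `contextlib.closing`. Note that `with conn:` on a `sqlite3.Connection` only commits or rolls back; it does not close the connection. The sketch below uses a stand-in `open_db()` and a throwaway `notes` table, both assumptions made so the example runs on its own.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def open_db(db_path: str) -> sqlite3.Connection:
    """Stand-in for get_prompts_db() / get_history_db() (hypothetical)."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    return conn


db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

# closing() guarantees conn.close(); the inner `with conn` commits on success
# and rolls back if the block raises.
with closing(open_db(db_path)) as conn:
    with conn:
        conn.execute("CREATE TABLE IF NOT EXISTS notes (body TEXT)")
        conn.execute("INSERT INTO notes (body) VALUES (?)", ("hello",))

# A second, independent connection sees the committed row.
with closing(open_db(db_path)) as conn:
    count = conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
```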

Acceptance Criteria

  • backend/data/prompts.db created on first startup with profile + prompt tables
  • backend/data/history.db created on first startup with query_history table + index
  • 3 profiles (A/B/C) seeded with current hardcoded prompts as default templates
  • Profile A active by default
  • data/ directory gitignored
  • Both DB paths configurable via .env
  • Existing pytest tests still pass
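
Seeding runs on every startup, so it should not overwrite templates the user has already edited. A minimal sketch of an idempotent seed using `INSERT OR IGNORE` follows. The table and column names (`profiles`, `prompts`, `profile_name`, `step`, `template`) and the placeholder default strings are assumptions for illustration, not the plan's actual schema or prompts.

```python
import sqlite3

STEPS = ("decompose", "filter", "generate")
PROFILES = ("A", "B", "C")


def seed_default_profiles(conn: sqlite3.Connection, defaults: dict[str, str]) -> None:
    """Insert missing profiles and templates; existing rows are left untouched."""
    with conn:  # one transaction: either the whole seed lands or none of it
        for name in PROFILES:
            conn.execute(
                "INSERT OR IGNORE INTO profiles (name, is_active) VALUES (?, ?)",
                (name, 1 if name == "A" else 0),  # Profile A active by default
            )
            for step in STEPS:
                conn.execute(
                    "INSERT OR IGNORE INTO prompts (profile_name, step, template) "
                    "VALUES (?, ?, ?)",
                    (name, step, defaults[step]),
                )


# Demo with a hypothetical schema in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profiles (name TEXT PRIMARY KEY, is_active INTEGER NOT NULL)")
conn.execute(
    "CREATE TABLE prompts (profile_name TEXT, step TEXT, template TEXT, "
    "PRIMARY KEY (profile_name, step))"
)
defaults = {step: f"default {step} template" for step in STEPS}
seed_default_profiles(conn, defaults)
conn.execute(
    "UPDATE prompts SET template = 'edited' WHERE profile_name = 'B' AND step = 'filter'"
)
seed_default_profiles(conn, defaults)  # second run must not overwrite the edit
```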

Sub-Phase 3.2: Prompt Backend Hard

Objective

Create the prompt service layer: Pydantic models, CRUD service, template formatting, REST API endpoints.

Changes Required

| File | Change |
|---|---|
| backend/app/models/prompts.py | NEW: PromptProfile, PromptSetResponse (3 prompts), PromptUpdateRequest |
| backend/app/services/prompt_service.py | NEW: PromptService: get_profile, list_profiles, activate, get_template, update_prompt, update_all, format_prompt |
| backend/app/routers/prompts.py | NEW: 5 endpoints on /api/v1/prompts |
| backend/app/core/dependencies.py | Add get_prompt_service() |
| backend/app/main.py | Register prompts router |

PromptService key methods:

class PromptService:
    def get_active_profile_name(self) -> str:
        """Return "A", "B", or "C" — which profile is active."""
    
    def get_prompt_template(self, step: str) -> str:
        """Get the template for the active profile + given step ("decompose"/"filter"/"generate")."""
    
    def list_profiles(self) -> list[dict]:
        """Return [{name: "A", is_active: true}, ...]."""
    
    def activate_profile(self, name: str) -> None:
        """Set is_active=1 for name, is_active=0 for others. Validates name in {A, B, C}."""
    
    def get_profile_prompts(self, name: str) -> dict:
        """Return {"decompose": "...", "filter": "...", "generate": "..."}."""
    
    def update_prompt(self, name: str, step: str, template: str) -> None:
        """Update single template. Validates step in {decompose, filter, generate}."""
    
    def update_all_prompts(self, name: str, prompts: dict) -> None:
        """Batch update all 3 templates."""
    
    def reset_to_defaults(self, name: str, step: str | None = None) -> None:
        """Restore seed template. If step is None, reset all 3 steps. Otherwise reset only that step."""
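
As an illustration of the "only one active at a time" rule, activate_profile can be written as a single UPDATE so no intermediate state exists where zero or two profiles are active. This is a sketch under an assumed `profiles(name, is_active)` table; the standalone function form and the `ValueError` (which a router could map to HTTP 400) are also assumptions.

```python
import sqlite3

VALID_PROFILES = {"A", "B", "C"}


def activate_profile(conn: sqlite3.Connection, name: str) -> None:
    """Make `name` the only active profile."""
    if name not in VALID_PROFILES:
        raise ValueError(f"Unknown profile: {name!r}")
    with conn:
        # (name = ?) evaluates to 1 for the chosen row and 0 for all others,
        # so one statement flips every row consistently.
        conn.execute("UPDATE profiles SET is_active = (name = ?)", (name,))


# Demo with a hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profiles (name TEXT PRIMARY KEY, is_active INTEGER NOT NULL)")
conn.executemany("INSERT INTO profiles VALUES (?, ?)", [("A", 1), ("B", 0), ("C", 0)])
activate_profile(conn, "B")
active = [row[0] for row in conn.execute("SELECT name FROM profiles WHERE is_active = 1")]
```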

Acceptance Criteria

  • GET /api/v1/prompts/profiles returns A/B/C with active status
  • PUT /api/v1/prompts/profiles/B/activate switches active profile (only one at a time)
  • PUT /api/v1/prompts/profiles/A/decompose updates template and persists across restarts
  • PUT /api/v1/prompts/profiles/A/all batch-updates all 3 templates
  • Invalid profile name (e.g., "D") returns 400
  • Invalid step name (e.g., "summarize") returns 400
  • Active profile is fetched fresh per query (no caching)
  • All tests pass: test_phase3_prompt_service.py, test_phase3_prompts_router.py

Sub-Phase 3.3: Prompt Frontend Page Medium

Objective

Build the System Prompts page at /system-prompts with profile switching and full template editing.

Changes Required

| File | Change |
|---|---|
| frontend/src/pages/SystemPromptsPage.tsx | NEW |
| frontend/src/components/ProfileList.tsx | NEW: 3 cards (A/B/C) |
| frontend/src/components/PromptEditor.tsx | NEW: 3 textareas + placeholder docs + save/reset/cancel |
| frontend/src/components/PlaceholderDocs.tsx | NEW: info box listing available placeholders |
| frontend/src/lib/api.ts | Add 5 prompt API functions |
| frontend/src/lib/queries.tsx | Add TanStack Query hooks |
| frontend/src/types/index.ts | Add prompt-related types |
| frontend/src/App.tsx | Add /system-prompts route |
| frontend/src/components/NavBar.tsx | Add "System Prompts" nav link |

Acceptance Criteria

  • Page accessible via NavBar
  • 3 profiles shown: A (active ●), B (○), C (○)
  • "Set Active" switches active profile
  • Editing a profile shows 3 labeled textareas with current templates
  • Each textarea labeled with available placeholders
  • Placeholder docs info box visible
  • "Save Changes" persists; "Reset to Defaults" restores seed template; "Cancel" reverts
  • Soft warning if template references unknown placeholder
  • All states: loading, error, success
  • Frontend tests pass

Sub-Phase 3.4: Service Refactoring (Template Injection) Hard

Objective

Refactor all 3 LLM-calling services to fetch prompt templates from the DB instead of using hardcoded strings. Wire the query router to pass PromptService through the pipeline.

Changes Required

| File | Change |
|---|---|
| backend/app/services/query_decomposer.py | Accept PromptService; decompose() fetches template, replaces {question}, calls LLM |
| backend/app/services/relevance_filter.py | Accept PromptService; filter() fetches template, replaces {question} and {chunks}, calls LLM |
| backend/app/services/rag.py | Accept PromptService; generate_response() fetches template, replaces {question} and {context}, calls LLM |
| backend/app/routers/query.py | Instantiate PromptService at pipeline start; pass to all services; capture active_profile_name |
| backend/app/test/conftest.py | Add mock_prompt_service fixture |
| backend/app/test/test_phase1_query_decomposer.py | Update tests for PromptService dependency |
| backend/app/test/test_phase1_relevance_filter.py | Update tests |
| backend/app/test/test_phase1_rag_service.py | Update tests |

Before/After per service:

| Service | Before (hardcoded) | After (template from DB) |
|---|---|---|
| QueryDecomposer.decompose() | `f"Given this question: '{question}'\n\nBreak it down..."` | `template.replace("{question}", question)` |
| RelevanceFilter._build_prompt() | `f"Given question '{question}'...{chunks_formatted}"` | `template.replace("{question}", question).replace("{chunks}", chunks_formatted)` |
| RAGService.generate_response() | `f"Question: {question}\n\nAnswer...{context}\n\nAnswer:"` | `template.replace("{question}", question).replace("{context}", context)` |

LLMClient.complete() — NO CHANGES. Templates remain single user-role messages.
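
One subtlety of chaining str.replace() calls: the second call also scans text inserted by the first, so a user question that happens to contain the literal text {chunks} would have it substituted. The plan's decision is chained str.replace(); the sketch below shows a single-pass alternative using `re.sub`, offered only as an option to consider. The function name `render_template` is hypothetical.

```python
import re

_PLACEHOLDER = re.compile(r"\{(\w+)\}")


def render_template(template: str, values: dict[str, str]) -> str:
    """Replace {name} placeholders in one pass; unknown names are left as-is."""

    def substitute(match: re.Match) -> str:
        # Inserted values are never rescanned, unlike chained str.replace().
        return values.get(match.group(1), match.group(0))

    return _PLACEHOLDER.sub(substitute, template)


template = "Q: {question}\nChunks:\n{chunks}\nUnknown: {Question}"
question = "What does {chunks} mean?"
chunks = "1. text"

rendered = render_template(template, {"question": question, "chunks": chunks})
chained = template.replace("{question}", question).replace("{chunks}", chunks)
```

Here `rendered` keeps the user's literal {chunks} intact, while `chained` rewrites it inside the question. Placeholder names remain case-sensitive in both approaches.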

Acceptance Criteria

  • All 3 LLM calls use templates from the active profile in the DB
  • Placeholders correctly replaced: {question} → user input, {chunks} → numbered list, {context} → formatted chunks with citations
  • Switching active profile changes prompts for NEXT query
  • If template is empty string, LLM call proceeds with empty prompt (LLM error is acceptable)
  • All existing tests pass (updated for PromptService dependency)
  • New tests: test_phase3_prompt_injection.py

Sub-Phase 3.5: History Backend Hard

Objective

Capture timing and data from every pipeline stage and persist to history.db. Expose REST API for browsing.

Changes Required

| File | Change |
|---|---|
| backend/app/models/history.py | NEW: QueryHistoryRecord, QueryHistorySummary, QueryHistoryDetail, QueryHistoryList |
| backend/app/services/history_service.py | NEW: HistoryService: record, list (paginated), get, delete, clear_all, get_stats |
| backend/app/routers/history.py | NEW: 5 endpoints on /api/v1/history |
| backend/app/routers/query.py | Add time.perf_counter() around each stage; capture prompts from service return values; format chunks as XML; asyncio.create_task(history_service.record(...)) at end |
| backend/app/services/relevance_filter.py | MODIFY: filter() must embed meta["relevance_score"] for each surviving chunk and return (filtered, prompt_used) |
| backend/app/services/query_decomposer.py | MODIFY: decompose() must return (questions, prompt_used) |
| backend/app/services/rag.py | MODIFY: generate_response() must return (answer, prompt_used) |
| backend/app/core/dependencies.py | Add get_history_service() |
| backend/app/main.py | Register history router |

Service return type changes (all 3 services return prompt alongside result):

| Method | Before | After |
|---|---|---|
| QueryDecomposer.decompose(question) | → List[str] | → Tuple[List[str], str], i.e. (questions, prompt_used) |
| RelevanceFilter.filter(question, chunks, threshold) | → List[Tuple[str, Dict]] | → Tuple[List[Tuple[str, Dict]], str], i.e. (filtered, prompt_used) |
| RAGService.generate_response(question, chunks, metadata) | → str | → Tuple[str, str], i.e. (answer, prompt_used) |

All service internals remain unchanged — they still build the prompt and call the LLM themselves. Only the return signature adds the prompt string.

Timing stages captured: decompose, retrieve, filter, generate, total.

Data captured per stage:

  • Stage 1 (Decompose): prompt sent, response time, extracted questions
  • Stage 2 (Retrieve): response time, all chunks as XML (filename, page, content), chunk count
  • Stage 3 (Filter): prompt sent, response time, filtered chunks as XML (filename, page, relevance score, content), chunk count
  • Stage 4 (Generate): prompt sent, response time, final answer

XML formatting helpers — Two utility functions in query.py or a shared utils/ module:

  • format_chunks_retrieved_xml(chunks): converts [(text, meta, distance), ...] to XML
  • format_chunks_filtered_xml(filtered): converts [(text, meta), ...] to XML, reading each relevance score from meta["relevance_score"]
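
A minimal sketch of the retrieved-chunks helper, producing the `<chunk_N>` layout shown in the History page mockup. The metadata keys `filename` and `page` and the fallback values are assumptions about the chunk metadata. Content is written verbatim with no XML escaping, which fits the plan's "raw string in a monospace block" display but means the output is XML-tagged text rather than strictly well-formed XML.

```python
def format_chunks_retrieved_xml(chunks: list[tuple[str, dict, float]]) -> str:
    """Render (text, meta, distance) tuples as numbered <chunk_N> blocks."""
    blocks = []
    for index, (text, meta, _distance) in enumerate(chunks, start=1):
        blocks.append(
            f"<chunk_{index}>\n"
            f"  Filename: {meta.get('filename', 'unknown')}\n"  # assumed key
            f"  Page: {meta.get('page', '?')}\n"  # assumed key
            f"  Content: {text}\n"
            f"</chunk_{index}>"
        )
    return "\n".join(blocks)


xml = format_chunks_retrieved_xml(
    [
        ("Clause 61.3 states...", {"filename": "NEC4 ACC.pdf", "page": 3}, 0.12),
        ("Another passage", {"filename": "NEC4 ACC.pdf"}, 0.34),
    ]
)
```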

Acceptance Criteria

  • Every query creates a history record with all fields (including 3 LLM prompts and 2 chunk XML strings)
  • All 5 history API endpoints work correctly
  • Pagination: limit + offset, newest first
  • Stats endpoint: total queries, avg times, avg chunks, most used profile
  • History recording is fire-and-forget (never blocks query)
  • History persists across restarts
  • decompose_prompt, filter_prompt, generate_prompt record the exact prompt sent to each LLM call
  • chunks_retrieved contains full XML with filename, page, content per chunk
  • chunks_filtered contains full XML with filename, page, relevance score, content per chunk
  • RelevanceFilter.filter() returns scores alongside filtered chunks
  • chunks_retrieved_count and chunks_filtered_count are accurate integer counts
  • All tests pass: test_phase3_history_service.py, test_phase3_history_router.py, test_phase3_query_history_integration.py
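
Two details matter for the fire-and-forget criterion. The event loop keeps only weak references to tasks, so the result of asyncio.create_task() should be stored somewhere until it finishes. And a synchronous sqlite3 write inside the coroutine would still block the loop unless it is moved to a thread. The sketch below shows both. All function names and the in-memory `saved` list (standing in for the real database insert) are assumptions made for the example.

```python
import asyncio

# Strong references keep pending tasks from being garbage-collected mid-run.
_background_tasks: set[asyncio.Task] = set()

saved: list[dict] = []


def write_record(record: dict) -> None:
    """Blocking stand-in for the synchronous sqlite3 INSERT."""
    saved.append(record)


async def record_history(record: dict) -> None:
    try:
        # Run the blocking write in a worker thread, off the event loop.
        await asyncio.to_thread(write_record, record)
    except Exception as exc:
        # A history failure must never surface to the user's query.
        print(f"history recording failed: {exc}")


def fire_and_forget(coro) -> asyncio.Task:
    task = asyncio.create_task(coro)
    _background_tasks.add(task)
    task.add_done_callback(_background_tasks.discard)
    return task


async def handle_query(question: str) -> str:
    answer = f"answer to {question}"
    fire_and_forget(record_history({"question": question, "answer": answer}))
    return answer  # returned without waiting for the history write


async def main() -> str:
    answer = await handle_query("demo")
    await asyncio.gather(*_background_tasks)  # demo only: wait before the loop closes
    return answer


result = asyncio.run(main())
```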

Sub-Phase 3.6: History Frontend Page Medium

Objective

Build the History page at /history with scrollable list, expandable detail, timing bars, and stats.

Changes Required

| File | Change |
|---|---|
| frontend/src/pages/HistoryPage.tsx | NEW |
| frontend/src/components/HistoryList.tsx | NEW |
| frontend/src/components/HistoryCard.tsx | NEW: collapsed card + expandable detail |
| frontend/src/components/TimingBar.tsx | NEW: CSS-width proportional bars |
| frontend/src/lib/api.ts | Add 5 history API functions |
| frontend/src/lib/queries.tsx | Add TanStack Query hooks |
| frontend/src/types/index.ts | Add history types |
| frontend/src/App.tsx | Add /history route |
| frontend/src/components/NavBar.tsx | Add "History" nav link |

Acceptance Criteria

  • Page accessible via NavBar
  • Stats bar: total, avg time, avg chunks, most used profile
  • History list: paginated, newest first, shows date/time/duration/input preview/profile badge
  • Expand card: timing bars, extracted questions, full answer (markdown), sources (clickable)
  • Expanded detail shows all 3 LLM prompts (collapsible sections)
  • Expanded detail shows retrieved chunks XML (collapsible, formatted)
  • Expanded detail shows filtered chunks XML with relevance scores (collapsible, formatted)
  • "Load More" pagination
  • "Clear All" with confirmation
  • Individual delete with confirmation
  • All states: loading skeleton, empty "No queries yet", error with retry
  • Frontend tests pass

New Dependencies

Zero. sqlite3 is Python stdlib. All UI is custom Tailwind. No new npm or pip packages.


Directory Structure After Package 3

legco_reranker/
├── backend/
│   ├── app/
│   │   ├── core/
│   │   │   ├── config.py              # + prompts_db_path, history_db_path
│   │   │   ├── database.py            # (unchanged - ChromaDB)
│   │   │   ├── dependencies.py        # + get_prompt_service, get_history_service
│   │   │   └── sqlite_db.py           # NEW - dual-DB connection factories
│   │   ├── models/
│   │   │   ├── history.py             # NEW
│   │   │   └── prompts.py             # NEW
│   │   ├── routers/
│   │   │   ├── history.py             # NEW
│   │   │   ├── prompts.py             # NEW
│   │   │   └── query.py               # MODIFIED - timing capture + template injection
│   │   ├── services/
│   │   │   ├── history_service.py     # NEW
│   │   │   ├── prompt_service.py      # NEW - template storage + formatting
│   │   │   ├── query_decomposer.py    # MODIFIED - use PromptService for templates
│   │   │   ├── rag.py                 # MODIFIED - use PromptService for templates
│   │   │   └── relevance_filter.py    # MODIFIED - use PromptService for templates
│   │   ├── test/
│   │   │   ├── test_phase3_prompt_service.py       # NEW
│   │   │   ├── test_phase3_prompts_router.py       # NEW
│   │   │   ├── test_phase3_prompt_injection.py     # NEW
│   │   │   ├── test_phase3_history_service.py      # NEW
│   │   │   ├── test_phase3_history_router.py       # NEW
│   │   │   ├── test_phase3_query_history_integration.py  # NEW
│   │   │   ├── test_phase1_query_decomposer.py      # MODIFIED
│   │   │   ├── test_phase1_relevance_filter.py      # MODIFIED
│   │   │   └── test_phase1_rag_service.py           # MODIFIED
│   │   └── main.py                    # MODIFIED - startup init + new routers
│   ├── data/                          # NEW (gitignored)
│   │   ├── prompts.db
│   │   └── history.db
│   └── .env.example                   # + PROMPTS_DB_PATH, HISTORY_DB_PATH
├── frontend/src/
│   ├── components/
│   │   ├── HistoryCard.tsx            # NEW
│   │   ├── HistoryList.tsx            # NEW
│   │   ├── NavBar.tsx                 # MODIFIED - +2 nav links
│   │   ├── PlaceholderDocs.tsx        # NEW
│   │   ├── ProfileList.tsx            # NEW
│   │   ├── PromptEditor.tsx           # NEW
│   │   └── TimingBar.tsx              # NEW
│   ├── pages/
│   │   ├── HistoryPage.tsx            # NEW
│   │   └── SystemPromptsPage.tsx      # NEW
│   ├── lib/
│   │   ├── api.ts                     # MODIFIED - +history +prompts endpoints
│   │   └── queries.tsx               # MODIFIED - +history +prompts hooks
│   ├── types/index.ts                 # MODIFIED - +history +prompts types
│   └── App.tsx                        # MODIFIED - +2 routes
└── .gitignore                         # + data/

Test Plan

Backend Tests (New)

| File | Coverage | Sub-Phase |
|---|---|---|
| test_phase3_prompt_service.py | Prompt CRUD, activation, template formatting, edge cases | 3.2 |
| test_phase3_prompts_router.py | All 5 HTTP endpoints, error codes, validation | 3.2 |
| test_phase3_prompt_injection.py | Templates fetched from DB, placeholders replaced, end-to-end query uses templates | 3.4 |
| test_phase3_history_service.py | History CRUD, pagination, stats, edge cases | 3.5 |
| test_phase3_history_router.py | All 5 HTTP endpoints, pagination bounds, empty DB | 3.5 |
| test_phase3_query_history_integration.py | Full SSE query → history record created with correct data | 3.5 |

Backend Tests (Modified)

| File | Change | Sub-Phase |
|---|---|---|
| test_phase1_query_decomposer.py | Add PromptService dependency to test setup | 3.4 |
| test_phase1_relevance_filter.py | Add PromptService dependency | 3.4 |
| test_phase1_rag_service.py | Add PromptService dependency | 3.4 |
| conftest.py | Add mock_prompt_service fixture | 3.4 |

Frontend Tests (New)

| File | Coverage | Sub-Phase |
|---|---|---|
| SystemPromptsPage.test.tsx | Page render, profile list, activation, edit flows | 3.3 |
| ProfileList.test.tsx | A/B/C cards, active indicator, edit button | 3.3 |
| PromptEditor.test.tsx | 3 textareas, placeholder docs, save/reset/cancel | 3.3 |
| HistoryPage.test.tsx | Page render, stats, pagination, clear all | 3.6 |
| HistoryCard.test.tsx | Collapsed/expanded states, timing bars, answer, sources | 3.6 |
| TimingBar.test.tsx | Proportional widths, zero-time stages, color mapping | 3.6 |

Acceptance Tests

| File | Coverage | Sub-Phase |
|---|---|---|
| test_acceptance_package3_prompts.py | Select profile → edit templates → activate → query uses new templates | 3.2-3.4 |
| test_acceptance_package3_history.py | Multiple queries → history shows correct records with timing + profile | 3.5 |

Risks & Mitigations

| Risk | Impact | Mitigation |
|---|---|---|
| User removes {question} placeholder → LLM doesn't see the question | LLM returns irrelevant or empty response | UI shows soft warning; user's choice, and they can always reset to defaults |
| str.replace() is case-sensitive → {Question} not recognized | Placeholder left as-is in prompt | UI documents exact placeholder names; preview mode could highlight unresolved placeholders |
| sqlite3 sync calls block async event loop | Slow responses under load | Operations are trivial (single-row lookups). History recording is fire-and-forget. WAL mode for concurrent reads. |
| History DB grows unbounded | Disk usage (exacerbated by XML chunk data and full LLM prompts per query) | Manual cleanup via "Clear All" button. Future: auto-prune config. XML chunks are 5-50KB per query, acceptable for a SQLite desktop app. |
| data/ directory not created on startup | SQLite connection fails | os.makedirs(dirname, exist_ok=True) in connection factory |
| User expects {chunks} or {context} to work in a step that does not support it | Placeholder is left unreplaced in the prompt | Placeholder docs on page show exactly which placeholders are valid per step |
| Two separate DB files complicate backups | User might back up one but not the other | Use same data/ directory so both are backed up as one folder |
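
The placeholder risks above all reduce to comparing the placeholders found in a template against the set valid for its step. A minimal sketch of that check follows. The per-step sets come from the plan's placeholder list; the function name `check_template` and the result shape are assumptions. The same logic could back the UI's soft warning.

```python
import re

# Valid placeholders per step, as listed in the plan.
ALLOWED = {
    "decompose": {"question"},
    "filter": {"question", "chunks"},
    "generate": {"question", "context"},
}


def check_template(step: str, template: str) -> dict[str, list[str]]:
    """Report placeholders that will not be replaced and expected ones that are absent."""
    found = set(re.findall(r"\{(\w+)\}", template))
    allowed = ALLOWED[step]
    return {
        "unknown": sorted(found - allowed),  # e.g. wrong case or wrong step
        "missing": sorted(allowed - found),  # e.g. user deleted {question}
    }


report = check_template("generate", "Q: {Question}\n{chunks}")
clean = check_template("filter", "Rate {chunks} for '{question}'")
```

Both lists are advisory only, which is in line with the plan's choice of a soft warning rather than a hard validation error.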

Decisions

| # | Question | Decision |
|---|---|---|
| 1 | Template editing scope | Full prompt template with {placeholder} variables: users edit the entire message sent to LLM |
| 2 | System role vs user role | User role only, no system prompt concept. Templates are the full user message (same as current). |
| 3 | Number of profiles | Fixed 3 (A, B, C), no create/delete. Simplest mental model. |
| 4 | Database separation | Two files: prompts.db and history.db (independent concerns) |
| 5 | Database technology | sqlite3 stdlib, zero new dependencies |
| 6 | Placeholder syntax | {variable_name} with str.replace(): simple, predictable. No str.format() edge cases. |
| 7 | History recording reliability | Fire-and-forget (asyncio.create_task), never blocks query response |
| 8 | History data retention | Manual cleanup only in Package 3 |
| 9 | Timing capture location | Inline in query.py: centralized, one file changes |
| 10 | Frontend timing visualization | CSS width bars, no charting library |
| 11 | History pagination | Offset-based (limit + offset) |
| 12 | NavBar order | LTT · RAG Database · System Prompts · History |
| 13 | Default seed templates | All 3 profiles start identical (current hardcoded prompts); users customize from a common baseline |
| 14 | Reset button granularity | Both: per-step reset icon (↺) on each textarea label, plus "Reset All to Defaults" button in the action bar |
| 15 | Chunk data in history | XML-tagged TEXT: full chunk data as `<chunk_N>Filename: ...\nPage: ...\nContent: ...\n</chunk_N>`. Separate count columns for fast list queries. |
| 16 | LLM prompts in history | 3 separate TEXT columns (decompose_prompt, filter_prompt, generate_prompt): the exact prompt sent to each LLM call |
| 17 | Filtered chunk scores | RelevanceFilter.filter() embeds score in meta["relevance_score"]: no change to the per-chunk tuple format |
| 18 | Prompt capture approach | Services return prompt alongside result: decompose() returns (questions, prompt), filter() returns (filtered, prompt), generate_response() returns (answer, prompt). No separate build_prompt() methods. |
| 19 | Chunk XML display on frontend | Raw XML in monospace code blocks: collapsible `<pre>` showing the exact stored XML string. Copy-paste friendly, no frontend parsing. |

Pre-Implementation Checklist

Before starting implementation, verify:

  • All existing backend tests pass (cd backend && pytest app/test/ -v)
  • All existing frontend tests pass (cd frontend && npm test)
  • AGENTS.md updated to reflect current project state (no longer "Greenfield")
  • Plan reviewed and approved by user