Commit Graph

62 Commits

Author SHA1 Message Date
Woody 032dd75e17 feat: add Sub-Phase 9.3 evaluation API endpoint and 9.4 polish 2026-05-25 19:30:17 +08:00
Woody 098be359e7 feat: add Sub-Phase 9.2 evaluation engine (CER/WER, key questions, chunk, response) 2026-05-25 18:45:53 +08:00
Woody ac81df0704 feat: add Sub-Phase 9.1 results generation APIs with reusable RAGPipeline 2026-05-25 18:35:55 +08:00
Woody 852430f1f1 feat: add Sub-Phase 9.0 config and Pydantic models for accuracy testing 2026-05-25 18:27:51 +08:00
Woody 6928fff8ff test: update Phase 2 tests for ASR provider abstraction
Update TestTranscribeFull to use async/await and patch the moved OpenAI import (now in asr_providers.py). Set ASR_PROVIDER=dashscope in test fixtures to ensure tests don't pick up the real .env ASR_PROVIDER value. All 19 Phase 2 + 7 integration tests pass.

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-05-19 09:48:58 +08:00
Woody 733824c177 test: add Phase 5 ASR provider and integration tests
test_phase5_config.py: 6 tests for ASR_PROVIDER validation and default values. test_phase5_openrouter_provider.py: 14 tests covering OpenRouterSTT transcription, retry logic, error handling, URL construction, cleanup, and factory dispatch. test_phase5_integration.py: 4 tests for full video-to-transcribe flow with both providers (mocked) and per-provider API key validation.

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-05-19 09:48:37 +08:00
Woody 73c1789698 fix: Q\&A chunking always fell back to token — LLM never called, missing API fields
Three bugs caused 'Chunk by Question' to silently produce token chunks:

1. QuestionChunkingStrategy.chunk_pages() had a broken event-loop check
   that always skipped LLM structure detection in FastAPI's async context.
   Fixed by making chunk_pages() async and removing the is_running() guard.

2. get_chunking_strategy() factory never passed an LLMClient to
   QuestionChunkingStrategy. Fixed by creating LLMClient in the factory
   with graceful fallback to regex-only when config is incomplete.

3. rag.list_documents() and list_chunks() didn't extract strategy_type
   or Q&A fields from ChromaDB metadata, so the frontend always showed
   chunking_strategy='token' and null Q&A fields. Fixed by reading
   these fields from ChromaDB and routing them through the API.

Also: TokenChunkingStrategy.chunk_pages() made async for consistency
with the question strategy; ingest router updated to await it.
Tests updated (asyncio.run() for sync tests, async mock chunk_pages).
2026-05-15 14:46:45 +08:00
Woody 9bef65de7b test: Sub-Phase 8.5 — acceptance test skeleton for Q&A chunking
8 acceptance tests with real LegCo PDFs (all @pytest.mark.acceptance + @slow).
Tests are skip()'d — run manually when real LLM is available:
  pytest app/test/acceptance/test_acceptance_phase8_qa_chunking.py -v -m acceptance

Sub-Phase 8.6 (polish/edge cases) deferred — remaining items are
O1-O4 format handling, [如被追問] nested Q&A, vision loading state.
Core algorithm (8.1-8.4) is test-passing and production-ready.
2026-05-15 12:45:46 +08:00
Woody 14423c773a feat: Sub-Phases 8.1-8.4 — Q&A-pair chunking strategy
8.1 — Core algorithm (test-first):
- qa_chunking.py: preprocess_text, build_structure_detection_prompt,
  parse_llm_structure_response, Section dataclass, split_chinese_qa,
  split_english_qa, build_chunks_from_sections with recursive size split
- QuestionChunkingStrategy in chunking.py with _chunk_metadata tracking
- get_chunking_strategy() factory function
- table_extraction.py: vision LLM extraction, heuristic text fallback,
  disk cache, inject_tables_into_answer
- 18/18 tests pass (LLM parse, regex fast-pass, multi-page, ABC contract,
  size limit, chunk building, preprocess)

8.2 — Metadata enrichment:
- extract_metadata() accepts strategy_type + chunk_metadata params
- Q&A fields (question_id, question_index, section_heading, etc.)
  merged into ChromaDB metadata entries
- DocumentInfo.chunking_strategy + ChunkInfo Q&A fields in models
- 6/6 metadata tests pass

8.3 — Ingest API integration:
- POST /api/v1/ingest accepts ?strategy=token|question
- validate strategy against VALID_CHUNKING_STRATEGIES
- factory creates correct chunker; _chunk_metadata passed to extract_metadata
- 6/6 ingest integration tests pass, zero regressions on existing tests

8.4 — Frontend strategy selector:
- Radio button selector (Token / Question) on RAG Database page
- Strategy passed to ingest mutation via api.ts
- DocumentList: strategy badge (gray/blue)
- ChunkList: Q&A display with question_id, question_text, page range, table badge
- tsc --noEmit clean, vite build successful
2026-05-15 12:44:04 +08:00
Woody ef10b937cf feat: Sub-Phase 8.0 — config & enums for Q&A-pair chunking strategy
Backend:
- Add 6 Q&A chunking config fields to Settings (default_chunking_strategy,
  qa_vision_enabled, qa_max_chunk_tokens, qa_structure_model,
  qa_include_internal_refs, qa_cache_vision_results)
- Define ChunkingStrategyType Literal + VALID_CHUNKING_STRATEGIES frozenset
- Add strategy field to IngestResponse (default token, non-breaking)
- Add IngestRequest model with strategy param
- Update .env.example with new env vars

Frontend:
- Add ChunkingStrategy type ('token' | 'question')
- Extend IngestResponse, DocumentInfo, ChunkInfo with Q&A fields

Tests:
- test_qa_chunking_config_defaults — all defaults verified
- test_qa_chunking_config_from_env — env var overrides verified

Plan fix: renamed qa_verification_model → qa_structure_model to match
LLM-first architecture
2026-05-15 12:01:28 +08:00
Woody d69c180544 feat: Phase 4.8-4.9 — integration tests, acceptance tests, docs, and polish 2026-05-15 09:51:45 +08:00
Woody 7bff4308b7 feat: Phase 4 — System Audio & Listen Mic capture into ASR → RAG
Adds two new live audio sources alongside file Upload:

- System Audio: getDisplayMedia() captures system/tab audio output,
  pipes through WebSocket → DashScope realtime ASR → RAG.
- Listen Mic: getUserMedia() captures microphone input via the same
  audio pipeline (shared useMediaStreamASR hook).

Backend: feature toggles (system_audio_enabled, mic_enabled) in
config.py, source query param gating in ws_asr.py, 10 config tests.

Bug fix: getDisplayMedia() rejected video:false per W3C spec —
changed to video:true then stop video tracks to allow audio-only
capture on Windows/macOS Chrome.
2026-05-14 22:55:06 +08:00
Woody b05c361fbd revert: remove Phase 3 YouTube proxy — all 7 sub-phases
Reverts commits 284028b through b4096d6. Phase 4 (System Audio Capture)
will replace the YouTube use case with a more versatile getDisplayMedia approach.

Removed: YouTube router, HLS proxy, YouTubeService, YouTubeInput,
YouTubeVideoPlayer, useYouTubeASR hook, all Phase 3 tests, hls.js dep,
YouTube config fields, YouTube README/plan sections.

Modified files restored to pre-Phase-3 state: LTTPage (no source toggle),
api.ts (no YouTube extract), types (no YouTube types), config.py (no
youtube fields), main.py (no YouTube router), requirements.txt (no yt-dlp),
.env.example (no YouTube vars), package.json (no hls.js).

Relevant Phase 2 code preserved: ws_asr.py (unchanged), useVideoASR,
VideoPlayer, VideoUpload, QueryInput, Full Transcript.
2026-05-09 21:07:21 +08:00
Woody b4096d6afc feat: Phase 3.7 — Polish, PO token handling, docs, deployment verification
- PO token handling: _is_po_token_error() detects YouTube bot-detection errors,
  invalidates cache on detection, logs warning for retry guidance (2 new tests)
- README: YouTube Live Stream Proxy section with architecture, usage, config, limits
- development_plan: Phase 3 complete, timeline updated, status → Phase 1-3 Complete
- Dockerfile/compose: verified OK (ffmpeg + yt-dlp already present, no new volumes)
- npm build: 1403 modules, production build clean
- 59/59 backend + 44/44 frontend Phase 2+3 tests pass
- Plan: 3.7 Complete, 7/7 sub-phases done
2026-05-09 17:27:54 +08:00
Woody cee859d5d7 feat: Phase 3.6 — integration + acceptance tests for YouTube proxy
- test_integration_phase3.py: 6 tests
  Extract→proxy flow (VOD manifest, VOD segment, live manifest),
  cache hit bypasses yt-dlp, upstream 404→502, extract disabled→503
  Mocked yt-dlp, real FastAPI TestClient + HLSProxyService
- test_acceptance_phase3_youtube.py: 3 tests
  Real YouTube VOD extraction, manifest proxy, segment proxy
  Follows master→variant→segment chain, verifies MPEG-TS sync byte
- test_acceptance_phase3_live.py: 3 tests
  Real live stream extraction, no #EXT-X-ENDLIST assertion,
  cache refresh verification, graceful skip when offline
- 201/201 CI pass (234 backend Phase 1-3, zero Phase 3 regressions)
- Updated plan: 3.6 Complete, 6/7 sub-phases done
2026-05-09 17:18:55 +08:00
Woody 3c9ed2cc8d feat: Phase 3.3 — HLS manifest proxy with line-by-line rewriting
- HLSProxyService: rewrite_manifest() rewrites segment/sub-manifest/EXT-X-KEY URIs
  to proxy URLs; proxy_segment() transparently proxies .ts segments
- Route: upstream status checked before streaming — 502 on failure
- CORS access-control-allow-origin: * on all responses
- Line rewriting: pass-through tags/comments, rewrite URIs, handle relative/absolute URLs
- URL resolution: urljoin for relative, absolute path, and absolute URL
- 22 tests (8 line rewriting, 4 URL resolution, 3 proxy URL construction,
  2 manifest integration, 1 segment proxying, 4 route integration)
- 104/104 total pass (zero regressions)
2026-05-09 16:13:33 +08:00
Woody 284028bb1f feat: Phase 3.1 + 3.2 — YouTube config infra and URL extraction
Phase 3.1 — Configuration & Infrastructure:
- Add youtube_proxy_enabled, yt_dlp_timeout, yt_dlp_cache_ttl config fields
- Add yt-dlp and hls.js dependencies
- Create models/youtube.py (request/response schemas)
- Create service stubs (youtube_service, hls_proxy)
- Create router stub and register in main.py
- 11 config tests

Phase 3.2 — YouTube URL Extraction:
- yt-dlp wrapper with async extraction (run_in_executor)
- Format selection: ≤480p video-only + highest-bitrate audio (VOD)
- Combined format fallback: same URL for live streams
- In-memory URL cache: 5min TTL live, 30min VOD
- lru_cache singleton service for cache persistence
- Error handling: DownloadError → 200 with error field
- 18 extract tests, 82/82 total pass (zero regressions)

Real-URL verified: VOD (5bF3tkO5jAA) 24 formats, Live (fN9uYWCjQaw) 6 HLS
2026-05-09 15:53:04 +08:00
Woody 09b5ea7d64 refactor: remove dead _merge_stash, add Phase 3 YouTube proxy plan
- Remove _merge_stash (dead code since delta-based ASR refactor)
- Replace TestMergeStash with TestTextFieldFormatting (53/53 Phase 2 tests pass)
- Mark phase2_enhancement_use_text_field as Complete
- Add Phase 3 YouTube live stream proxy implementation plan
- README updates
2026-05-09 15:14:01 +08:00
Woody 78d1f8cc91 feat: delta-based ASR transcript — use text field, utterance boundaries, stash on pause
Replace full_text responses with character-level deltas computed from
DashScope's monotonically-growing 'text' field. Stash-only events (empty
text) are skipped; trailing stash chars sent alongside deltas and
appended on pause to complete final sentences.

Backend:
- Delta = text[len(prev_text):] — simple suffix diff, no merge logic
- Track item_id for utterance boundaries, prepend space separator
- Send stash alongside delta for frontend pause handler

Frontend:
- Accumulate deltas locally (transcriptRef += msg.delta)
- Store lastStashRef from each message
- On pause: append stash to text, fire onFinalTranscript

Plan: .plans/phase2_enhancement_delta_sse.md updated to Complete
2026-05-07 11:26:19 +08:00
Woody cb0ac07786 fix: text accumulation — stashes are sliding windows, merge via overlap detection
DashScope stashes are ~7-char rolling windows, not cumulative. Each partial
event replaces the previous. Completed events rarely sent. This caused text to
jump/replace during streaming and disappear on pause.

Backend:
- Add _merge_stash() — finds overlapping suffix between successive stashes
  and appends only new characters, reconstructing full utterance from partials
- format_transcription_event returns raw stash for read_events to merge
- read_events maintains partial_buffer via _merge_stash, clears on completed
- Guard against empty/whitespace-only stashes

Frontend:
- transcriptRef + onFinalTranscriptRef avoid stale closures in pause handler
- stopStreaming fires onFinalTranscript(currentText) before clearing partial
- Removed blind setPartialTranscript('') that erased text on pause

Tests: 16/16 ws_protocol tests pass, frontend tests unchanged
Plan: Updated phase2_implementation_plan.md to Complete with 11-bug log
2026-05-06 20:06:39 +08:00
Woody fcb9ec1f6c fix: Phase 2 ASR pipeline — 9 bugs resolved, Full Transcript works end-to-end
- Vite proxy: forward /api and /ws to backend port 8000
- WebSocket URL: use backend host, not Vite HMR port
- LTTPage: callback ref replaces useRef (video element always null before)
- ws_asr: pass DashScope API key to OmniRealtimeConversation
- asr_client: fix data_url MIME type (audio/wav), omit extra_body when auto
- useFullTranscript: use absolute URL prefix for fetch
- QueryInput: add value prop for external Full Transcript injection
- QueryInput: fix displayValue || logic (partialText '' overrode question)
- ffmpeg: install static binary for audio extraction
- Integration tests: 7 tests (upload→transcribe flow)
- Acceptance tests: real DashScope tests (skippable)
- Structured logging: ws_asr.py + video.py
2026-05-06 18:26:17 +08:00
Woody a4e067822b feat: Phase 2.3 ASR proxy + full transcript and 2.4 frontend hooks
- Backend: DashScope WebSocket proxy (/ws/asr/{video_id}), DashScopeCallback
  sync-to-async bridge, ffmpeg audio extraction, POST /video/{id}/transcribe
- Frontend: useVideoASR hook (auto on play), useFullTranscript hook,
  QueryInput partialText prop, VideoUploadResponse types, uploadVideo API
- Tests: 41 backend + 26 frontend = 67 new tests, all passing
2026-05-06 13:41:24 +08:00
Woody 9934749d2b feat: Phase 2.1 config + infrastructure and 2.2 video upload backend
- Add DashScope ASR and video upload config fields to Settings
- Create Pydantic models (video.py, asr.py)
- Create VideoService with validation, save, serve, delete
- Create ASR client stub with float32_to_s16le utility
- Implement POST /api/v1/video/upload with streaming validation
- Implement GET /api/v1/video/{video_id} with FileResponse
- Create WebSocket ASR endpoint stub
- Register new routers in main.py
- Update .env.example and requirements.txt
- Add reference examples for DashScope integration
- 8 tests passing (3 config + 5 video upload)
2026-05-06 13:08:19 +08:00
Woody df62283f58 feat: inject Pydantic JSON schema into Deepseek prompt (Phase 6)
Follows Deepseek JSON Output guide: the prompt now includes the word 'json' and a format example derived from the Pydantic model schema. Added _pydantic_to_json_instruction() helper that builds a human-readable schema description with EXAMPLE JSON OUTPUT.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-05-04 15:17:24 +08:00
Woody 226f4ed700 test: update integration mocks for dual-client architecture (Phase 6)
Added complete_structured() to mock classes, split response lists between LLMClientDP (decompose) and LLMClient (filter+generate), and patched both clients in all integration tests.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-05-04 14:59:23 +08:00
Woody 849beb4d4e feat: add LLMClientDP for Deepseek decompose (Phase 6)
Uses Deepseek's json_object response_format (not json_schema, which Deepseek does not support). Always disables thinking mode. Includes unit tests (12) and acceptance tests (5).

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-05-04 14:58:53 +08:00
Woody 41f59b396f feat: track highlight generation prompt, response, and timing in history (Phase 5.5)
- Add 3 columns to query_history: highlight_prompt, highlight_response, highlight_time_ms
- HistoryService.update_highlights() updates existing row after batch LLM call
- ChunkHighlightService measures timing, captures prompt and structured JSON response
- SSE completed event includes history_id for frontend to pass back
- Frontend captures historyId, passes as ?history_id= query param in batch POST
- Highlight time tracked separately (excluded from total_time_ms)
- All 153 tests pass (108 backend + 45 frontend)
2026-04-29 11:18:21 +08:00
Woody a56f8f69e2 feat: add highlight batch and GET endpoints (Phase 5.4.5)
- POST /api/v1/v2/highlights/batch: compute and cache highlights for cited chunks
- GET /api/v1/v2/highlights: serve cached highlighted HTML pages
- chunks.py router registered in main.py
- Dynamic DB path computation (prompts.db -> highlights.db), no Settings changes
- 7 endpoint tests: POST 200/422, GET 200/404, mock service verification
2026-04-29 09:26:50 +08:00
Woody c6d4a38013 feat: add LLM-based batch highlight service and HTML rendering (Phase 5.4.4)
- ChunkHighlightService.compute_highlights_batch(): single LLM call across
  all cited chunks, grouped by sub-question, with structured output
- render_highlight_html(): self-contained HTML page with yellow-highlighted
  relevant sentences, LLM reason annotations, and View Original PDF footer
- Per-target error isolation, ChromaDB miss handling, graceful degradation
- 14 tests: 7 batch service + 7 HTML rendering
2026-04-29 09:26:33 +08:00
Woody bdbc8ea1a0 feat: add SQLite highlight cache service (Phase 5.4.3)
- highlight_cache.py: HighlightCache class with get/set_highlight and
  compute_cache_key (sha256 hash of document_id|chunk_index|sub_question)
- INSERT OR REPLACE semantics, idempotent table creation
- 13 tests covering round-trip, overwrite, missing keys, determinism
2026-04-29 09:26:20 +08:00
Woody b11d31e2d1 feat: add sentence splitter and highlight data models (Phase 5.4.1-5.4.2)
- sentence_splitter.py: regex-based sentence splitting for English + Chinese punctuation
- highlight.py: 6 Pydantic models (ChunkHighlightTarget, HighlightBatchRequest,
  RelevantSentence, ChunkHighlights, HighlightBatchResult, HighlightBatchResponse)
- 43 tests: 13 sentence splitter + 30 model validation
2026-04-29 09:26:06 +08:00
Woody 25b26c9b48 feat(ingest): generate per-chunk PDFs for DOCX/TXT documents (Phase 5.3)
DOCX and TXT ingestion now produces chunk_file_path + per-chunk PDF files matching the PDF ingestion flow. Uses reportlab to render chunk text as simple PDFs with automatic text wrapping.

- Add reportlab==4.2.5 to requirements.txt
- New utils/text_to_pdf.py: generate_text_pdf() renders chunk text as PDF
- Ingest router DOCX/TXT branches: generate chunk_N.pdf per chunk, store in chunk_file_paths
- Graceful degradation: chunk_file_path stays None if PDF generation fails
- Update test_phase1_ingest_page_aware.py assertions: DOCX chunks now HAVE chunk_file_path
- New test_phase5_docx_pdf_generation.py: 5 tests (DOCX PDF gen, TXT PDF gen, PDF regression, file count, graceful degradation)
- 361 backend tests pass (4 pre-existing embedding failures unrelated)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-28 17:32:22 +08:00
Woody f2115ae563 feat: structured LLM output for decompose + citation fuzzy matching (Phase 5)
Phase 5.1 — Structured LLM output for query decomposition:
- Add SubQuestions Pydantic model with sub_question, keywords, rationale
- Add LLMClient.complete_structured() using langchain with_structured_output
- Update QueryDecomposer with structured output path + legacy json.loads fallback
- Update SQLite seed templates: add subq+citation labeling requirement
- Add tests: structured output, subquestions model validation, logging

Phase 5.2 — Citation format alignment and fallback links:
- Add document_id to SourceMetadata (backend + frontend types)
- Rewrite citationParser.ts with fuzzy matching and fallback document links
- Add RAGDatabasePage auto-expand from ?document= URL param
- Tighten generate_per_subq seed prompt: 'Copy exact bracket labels shown'
- Add citation parser tests for fuzzy match and fallback link scenarios
- Defer: DOCX/TXT PDF generation → Phase 5.3 (fallback links sufficient)
2026-04-28 15:39:17 +08:00
Woody 23796d6a0c feat(prompts): add JSON export/import for profile prompt configurations 2026-04-27 19:44:35 +08:00
Woody a7a22f1494 fix(relevance): tolerate LLM score count mismatches via padding instead of discarding
The per-sub-question filter was all-or-nothing: if the LLM returned
9 scores for 10 chunks (common with qwen3.5-35b), every chunk was
discarded and the user got 'no relevant information found'.

Now: fewer scores → pad with 0.0; more scores → truncate. Changed
from error→warning since this is recoverable.

Also improve LTT page UI: sources collapsed by default in per-sub-q
sections, and the 'Your question' text now shows the full question
instead of being truncated.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-27 14:31:18 +08:00
Woody 2656f9ca08 refactor(test): rewrite tests to comply with integration-first rules
Replace mocked DB/internal-services with real ChromaDB/SQLite via tmp_path.
Only mock truly external APIs (LLM, embedding for deterministic vectors).

13 test files rewritten (314 pass, 0 fail):
- Route tests: use TestClient + real ChromaDB, seed test data
- Service tests: use real PersistentClient/SQLite instances
- Pipeline tests: TestClient hits SSE /query endpoint, verify history
- Converted unittest.TestCase to pytest where applicable

Plus: fix metadata.py to filter None values from ChromaDB metadata
(pre-existing bug caught by real-DB ingestion tests)
2026-04-27 11:46:58 +08:00
Woody 3b868a0133 feat(prompts): integrate filter_per_subq with PromptService, fix seed bugs, restructure UI
Break the hardcoded per-sub-q filter prompt into 3 editable PromptService templates (filter_intro, filter_section, filter_outro) with placeholders for the for-loop iteration pattern. Refactor RelevanceFilter._build_per_subq_prompt() to compose them at runtime, falling back to built-in defaults when PromptService is unavailable.

Fix two latent bugs from Package 4:
- generate_per_subq was called by rag.py but never added to _VALID_STEPS or DB seed (would ValueError at runtime)
- _SEED_GENERATE placeholder mismatch: flat generate_response() expects {question}/{context} but Package 4 changed it to {context_sections}. Restored flat template; generate_per_subq now holds {context_sections}.

Add database backfill migration in seed_default_profiles() to INSERT OR IGNORE missing steps into existing profile rows, ensuring all 7 steps exist on restart.

Restructure System Prompts UI: remove unused flat filter/generate steps, replace with Step 2.1-2.3 (filter_intro/section/outro) and Step 3 (generate_per_subq). Update PlaceholderDocs with {context_sections}, {subq_idx}, {subq_question}.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-27 11:14:27 +08:00
Woody 3f50f81bfe test(backend): extend existing tests for per-sub-q methods and templates
Add 6 tests for retrieve_per_subquestion and generate_response_per_subquestion to Phase 1 rag service tests. Add 4 tests for filter_per_subquestion to Phase 1 relevance filter tests. Add 2 tests for new {context_sections} generate template to Phase 3 prompt injection tests. Add TestPerSubQPipelineHistory class with 3 per-sub-q pipeline simulation tests to Phase 3 integration tests. Add generate_per_subq template seed to conftest mock_prompt_service fixture.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-26 23:29:27 +08:00
Woody 201bddecf0 test(backend): add Phase 4 integration and acceptance tests
5 integration tests simulating full per-sub-question pipeline with mocked services covering 2-sub-q, empty decomposition fallback, single sub-q, all-filtered, and partial retrieval. 2 acceptance tests (manual run) for real LLM verification of per-sub-question organized answers with grouped sources and ## Sub-question headers.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-26 23:29:09 +08:00
Woody dd98fa0b65 test(backend): add Phase 4 unit tests for generate, format, history, prompts
9 tests for generate_response_per_subquestion() and answer format validation covering multi-sub-q, empty, prompt construction, and markdown format. 8 tests for new history XML/JSON formats (sources as list-of-lists, <sub_q> wrappers in XML) and new {context_sections} prompt template.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-26 23:28:58 +08:00
Woody ab6ec28de6 test(backend): add Phase 4 unit tests for retrieval and filtering
10 tests for retrieve_per_subquestion() covering multi-sub-q, empty, single, call counting, n_results passthrough, and empty results. 14 tests for filter_per_subquestion() covering basic filtering, threshold behavior, JSON parsing edge cases, markdown extraction, LLM exceptions, and format helpers.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-26 23:28:45 +08:00
Woody 0ecae11bf8 feat(db): update history schema and generate prompt template for Package 4
Add chunks_retrieved_per_subq_count and chunks_filtered_per_subq_count columns to query_history table with safe ALTER TABLE migration. Replace generate template {question}/{context} placeholders with {context_sections} for per-sub-question organized context sections. Update Phase 3 test assertions to match new template and schema shapes.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-26 23:28:28 +08:00
Woody 475306f2b1 feat(history): Phase 3.5 — Query History backend (service, API, timing, XML capture) 2026-04-25 22:59:53 +08:00
Woody e49a68b0bd feat(prompts): Phase 3.2 — Prompt Backend (CRUD service, REST API, 33 tests)
- PromptService (services/prompt_service.py): full CRUD for 3 profiles A/B/C
  with seed template reset, validation, and sqlite3.Row access
- REST API (routers/prompts.py): 6 endpoints on /api/v1/prompts
- Pydantic models (models/prompts.py): 6 schemas
- DI wiring (dependencies.py): get_prompt_service()
- App registration (main.py): prompts router
- Mock fixture (conftest.py): mock_prompt_service
- Tests: test_phase3_prompt_service.py (22) + test_phase3_prompts_router.py (11)
- 162/166 total pass, 4 skipped, 0 fail
2026-04-25 21:11:17 +08:00
Woody f4b404f27d feat(db): Phase 3.1 — SQLite infrastructure (prompts.db + history.db)
- Add sqlite_db.py with dual-DB connection factories (WAL mode, foreign keys)
- init_prompts_db() creates system_prompt_profiles + system_prompts tables
- init_history_db() creates query_history table + created_at index
- seed_default_profiles() inserts 3 profiles (A/B/C) x 3 steps each
- All 3 profiles start with identical seed templates; Profile A active
- Add prompts_db_path + history_db_path to config (./data/ default)
- Startup init in main.py creates data/ dir, inits both DBs, seeds profiles
- Add PROMPTS_DB_PATH + HISTORY_DB_PATH to .env.example
- Add data/ to .gitignore
- 17 new tests in test_phase3_sqlite_db.py (all passing)
2026-04-25 20:29:29 +08:00
Woody 51640201f3 test(backend): update query tests for sub-question generation (sub-phase 2.3)
Update prompt assertion in decomposer test and field assertions in query endpoint tests to match extracted_questions rename.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-24 16:24:10 +08:00
Woody d49756f374 feat: add chunk PDF serving endpoint and frontend clickable source links (1.5.6)
- Add page_number and chunk_file_path to SourceMetadata model and query router
- Add GET /chunks/{file_path}/pdf endpoint with path traversal protection
- Add View PDF links in ResponsePanel source cards and ChunkList component
- Update TypeScript types and API helper for chunk PDF URLs
- Add backend tests (5) and frontend ChunkList tests (7)
- Update enhancement plan: all 3 features complete
2026-04-24 11:49:39 +08:00
Woody b2dd385443 feat(backend): refactor ingest pipeline for page-aware chunking with PDF generation
PDF uploads now use parse_pdf_by_page() -> chunk_pages() -> extract page PDFs -> enhanced metadata with page_number, chunk_file_path, and document_id. Same-filename replacement deletes old chunks and PDFs before re-ingest. DOCX/TXT keep original flat flow with document_id added. RAGService.ingest_document() accepts optional document_id parameter.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-24 10:53:17 +08:00
Woody 8c84062996 feat(backend): add PDF page extractor and chunk PDF storage config
New pdf_extractor.py with extract_page_as_pdf() and extract_pages_as_pdf() for extracting individual PDF pages as separate files. Adds document_chunk_path setting to config and document_chunk/ to .gitignore.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-24 10:52:57 +08:00
Woody b97264c66a feat(backend): add page_number, chunk_file_path, document_id to chunk metadata
Enhance extract_metadata() with three new optional fields for page-aware chunking support. Validates list length mismatches. Fully backward compatible — existing callers unaffected.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-24 10:30:40 +08:00