Commit Graph

32 Commits

Author SHA1 Message Date
Woody fc6b5463b5 fix: vLLM structured output missing thinking-control extra_body 2026-04-29 21:01:10 +08:00
Woody 16de8394aa fix: add full input/output logging to vLLM structured output path
Log the complete prompt, schema, extra_body content, full API response,
token counts, and full parsed JSON output. Add exc_info=True tracebacks
on all failure paths.
2026-04-29 16:52:26 +08:00
Woody 3ab6fd102a fix: use vLLM-native guided_json for structured output
vLLM servers support JSON schema enforcement via extra_body (guided_json
or structured_outputs), not OpenAI's response_format protocol. LangChain's
with_structured_output(method='json_schema') sends response_format which
vLLM ignores, causing NoneType not iterable parsing errors.

- vLLM path: direct OpenAI SDK call with extra_body={guided_json|structured_outputs}
- OpenRouter path: unchanged with_structured_output(method='json_schema')
- Try new 'structured_outputs' format first, fall back to legacy 'guided_json'
- Update _SEED_DECOMPOSE with explicit JSON array instruction
- Add diagnostic logging: exc_info=True, schema preview, prompt template preview
- Add logging in _parse_legacy_json for fallback failure debugging
2026-04-29 16:49:14 +08:00
Woody 2aca18d30e docs: add vLLM structured output fix plan
- Diagnose: vLLM ignores OpenAI-native response_format, causing NoneType error
- Diagnose: legacy fallback prompt lacks JSON instruction → empty questions
- Plan: use vLLM-native guided_json via extra_body instead of with_structured_output
- Plan: update _SEED_DECOMPOSE with JSON format instruction
- Plan: add diagnostic logging (exc_info, method, schema preview)

wip: temporary function_calling switch for vLLM (to be replaced by guided_json)
2026-04-29 16:42:23 +08:00
Woody cbb958d75d fix: vLLM chat_template_kwargs breaks LangChain structured output
vLLM's chat_template_kwargs leaked into LangChain's AsyncCompletions.parse()
via _get_langchain_model's model_kwargs, causing structured decomposition
to fail on vLLM backends. Skip vLLM-specific params when building the
LangChain model — only provider-agnostic params (OpenAI reasoning) pass through.
2026-04-29 16:07:44 +08:00
Woody 41f59b396f feat: track highlight generation prompt, response, and timing in history (Phase 5.5)
- Add 3 columns to query_history: highlight_prompt, highlight_response, highlight_time_ms
- HistoryService.update_highlights() updates existing row after batch LLM call
- ChunkHighlightService measures timing, captures prompt and structured JSON response
- SSE completed event includes history_id for frontend to pass back
- Frontend captures historyId, passes as ?history_id= query param in batch POST
- Highlight time tracked separately (excluded from total_time_ms)
- All 153 tests pass (108 backend + 45 frontend)
2026-04-29 11:18:21 +08:00
Woody c6d4a38013 feat: add LLM-based batch highlight service and HTML rendering (Phase 5.4.4)
- ChunkHighlightService.compute_highlights_batch(): single LLM call across
  all cited chunks, grouped by sub-question, with structured output
- render_highlight_html(): self-contained HTML page with yellow-highlighted
  relevant sentences, LLM reason annotations, and View Original PDF footer
- Per-target error isolation, ChromaDB miss handling, graceful degradation
- 14 tests: 7 batch service + 7 HTML rendering
2026-04-29 09:26:33 +08:00
Woody bdbc8ea1a0 feat: add SQLite highlight cache service (Phase 5.4.3)
- highlight_cache.py: HighlightCache class with get/set_highlight and
  compute_cache_key (sha256 hash of document_id|chunk_index|sub_question)
- INSERT OR REPLACE semantics, idempotent table creation
- 13 tests covering round-trip, overwrite, missing keys, determinism
2026-04-29 09:26:20 +08:00
Woody 48e15f8232 feat(llm): log structured LLM response and extra_body
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-28 16:50:26 +08:00
Woody 095f013739 feat(llm): pass extra_body via model_kwargs in LangChain
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-28 16:42:49 +08:00
Woody f2115ae563 feat: structured LLM output for decompose + citation fuzzy matching (Phase 5)
Phase 5.1 — Structured LLM output for query decomposition:
- Add SubQuestions Pydantic model with sub_question, keywords, rationale
- Add LLMClient.complete_structured() using langchain with_structured_output
- Update QueryDecomposer with structured output path + legacy json.loads fallback
- Update SQLite seed templates: add subq+citation labeling requirement
- Add tests: structured output, subquestions model validation, logging

Phase 5.2 — Citation format alignment and fallback links:
- Add document_id to SourceMetadata (backend + frontend types)
- Rewrite citationParser.ts with fuzzy matching and fallback document links
- Add RAGDatabasePage auto-expand from ?document= URL param
- Tighten generate_per_subq seed prompt: 'Copy exact bracket labels shown'
- Add citation parser tests for fuzzy match and fallback link scenarios
- Defer: DOCX/TXT PDF generation → Phase 5.3 (fallback links sufficient)
2026-04-28 15:39:17 +08:00
Woody 711be3dfde feat(llm): add VLLM_ENGINE env flag for provider-specific extra_body format 2026-04-28 13:30:27 +08:00
Woody a7a22f1494 fix(relevance): tolerate LLM score count mismatches via padding instead of discarding
The per-sub-question filter was all-or-nothing: if the LLM returned
9 scores for 10 chunks (common with qwen3.5-35b), every chunk was
discarded and the user got 'no relevant information found'.

Now: fewer scores → pad with 0.0; more scores → truncate. Changed
from error→warning since this is recoverable.

Also improve LTT page UI: sources collapsed by default in per-sub-q
sections, and the 'Your question' text now shows the full question
instead of being truncated.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-27 14:31:18 +08:00
Woody 3b868a0133 feat(prompts): integrate filter_per_subq with PromptService, fix seed bugs, restructure UI
Break the hardcoded per-sub-q filter prompt into 3 editable PromptService templates (filter_intro, filter_section, filter_outro) with placeholders for the for-loop iteration pattern. Refactor RelevanceFilter._build_per_subq_prompt() to compose them at runtime, falling back to built-in defaults when PromptService is unavailable.

Fix two latent bugs from Package 4:
- generate_per_subq was called by rag.py but never added to _VALID_STEPS or DB seed (would ValueError at runtime)
- _SEED_GENERATE placeholder mismatch: flat generate_response() expects {question}/{context} but Package 4 changed it to {context_sections}. Restored flat template; generate_per_subq now holds {context_sections}.

Add database backfill migration in seed_default_profiles() to INSERT OR IGNORE missing steps into existing profile rows, ensuring all 7 steps exist on restart.

Restructure System Prompts UI: remove unused flat filter/generate steps, replace with Step 2.1-2.3 (filter_intro/section/outro) and Step 3 (generate_per_subq). Update PlaceholderDocs with {context_sections}, {subq_idx}, {subq_question}.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-27 11:14:27 +08:00
Woody 0ecae11bf8 feat(db): update history schema and generate prompt template for Package 4
Add chunks_retrieved_per_subq_count and chunks_filtered_per_subq_count columns to query_history table with safe ALTER TABLE migration. Replace generate template {question}/{context} placeholders with {context_sections} for per-sub-question organized context sections. Update Phase 3 test assertions to match new template and schema shapes.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-26 23:28:28 +08:00
Woody 57a130dc96 feat(services): add per-sub-question retrieval, filtering, and response generation
Add retrieve_per_subquestion() that queries ChromaDB independently per sub-question instead of joining all sub-qs into one query string. Add filter_per_subquestion() that evaluates each chunk against its own originating sub-question in a single LLM call with a redesigned grouped prompt. Add generate_response_per_subquestion() that produces markdown sections per sub-question with grouped sources and {context_sections} template support. All existing methods preserved for backward compatibility.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-26 23:27:50 +08:00
Woody 475306f2b1 feat(history): Phase 3.5 — Query History backend (service, API, timing, XML capture) 2026-04-25 22:59:53 +08:00
Woody e49a68b0bd feat(prompts): Phase 3.2 — Prompt Backend (CRUD service, REST API, 33 tests)
- PromptService (services/prompt_service.py): full CRUD for 3 profiles A/B/C
  with seed template reset, validation, and sqlite3.Row access
- REST API (routers/prompts.py): 6 endpoints on /api/v1/prompts
- Pydantic models (models/prompts.py): 6 schemas
- DI wiring (dependencies.py): get_prompt_service()
- App registration (main.py): prompts router
- Mock fixture (conftest.py): mock_prompt_service
- Tests: test_phase3_prompt_service.py (22) + test_phase3_prompts_router.py (11)
- 162/166 total pass, 4 skipped, 0 fail
2026-04-25 21:11:17 +08:00
Woody e78b670baa feat(backend): use [filename, page N] citation labels in RAG context (sub-phase 2.6)
Replace numeric [1] labels with [filename, page N] format in context chunks.
Update LLM prompt to instruct inline citation using bracket labels.
Enables traceable source references in generated answers.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-24 17:52:54 +08:00
Woody f9dda7bd18 feat(backend): rename keywords to extracted_questions in query pipeline (sub-phase 2.3)
Change QueryDecomposer prompt to generate 2-5 sub-questions instead of keywords. Rename API field from keywords to extracted_questions across models, service, and router.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-24 16:23:53 +08:00
Woody b2dd385443 feat(backend): refactor ingest pipeline for page-aware chunking with PDF generation
PDF uploads now use parse_pdf_by_page() -> chunk_pages() -> extract page PDFs -> enhanced metadata with page_number, chunk_file_path, and document_id. Same-filename replacement deletes old chunks and PDFs before re-ingest. DOCX/TXT keep original flat flow with document_id added. RAGService.ingest_document() accepts optional document_id parameter.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-24 10:53:17 +08:00
Woody 178461915a feat(backend): add documents CRUD service methods and Pydantic schemas
Add list_documents(), list_chunks(), delete_document(), delete_chunk() to RAGService for ChromaDB document management. New schemas: DocumentInfo, ChunkInfo, DocumentListResponse, DeleteResponse.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-23 19:02:07 +08:00
Woody 52c09b86cb feat(frontend): add nav bar with routing, markdown rendering, and enhancement plan
- Add react-router-dom with NavBar component (LTT + RAG Database tabs)
- Extract AppContent into LTTPage, add RAGDatabasePage placeholder
- Refactor App.tsx to BrowserRouter + Routes layout
- Switch ResponsePanel to react-markdown for rich formatting
- Fix ResponsePanel test for markdown rendering
- Update RAG prompt to cite source name instead of number
- Save Phase 1 enhancement plan (.plans/phase1_enhancement_plan.md)
2026-04-23 18:37:30 +08:00
Woody 029a0e490f debug(backend): add LLM request/response logging for OpenRouter debugging
- Log extra_body contents before sending to LLM

- Log full LLM response object for debugging

- Changed extra_body format to OpenRouter reasoning format

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus \u003cclio-agent@sisyphuslabs.ai\u003e
2026-04-23 16:28:43 +08:00
Woody 33b960f786 fix(backend): extract JSON from markdown code blocks in LLM responses
The LLM (Qwen3.5 via OpenRouter) returns JSON wrapped in markdown code blocks:

```json

["project manager", "limits", ...]

```

But the code was trying to parse this directly with json.loads(), causing:

- QueryDecomposer to return empty keywords

- RelevanceFilter to fail with "Expecting value: line 1 column 1"

Changes:

- Added _extract_json_from_markdown() helper function to both modules

- Strips markdown code block markers (```json and ```) before JSON parsing

- Added unit tests for markdown code block handling

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus \u003cclio-agent@sisyphuslabs.ai\u003e
2026-04-23 16:28:07 +08:00
Woody f5cfe44183 feat(backend): add LLM monitoring with step names, timing, and prompt logging
- LLMClient.complete() now accepts step_name parameter to identify processing step

- Logs prompt preview (first 100 + last 100 chars) at INFO level

- Logs processing time in milliseconds with token usage stats

- Updated QueryDecomposer, RelevanceFilter, and RAGService to pass step names

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-23 14:51:57 +08:00
Woody 74cb8b83d5 feat(backend): migrate LLM client to OpenAI SDK with thinking control
- Replace httpx with openai.AsyncOpenAI

- Add llm_enable_thinking config (default False)

- Add _build_extra_body() for Qwen3.5 thinking mode control

- Use chat_template_kwargs for vLLM/SGLang compatibility

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-23 14:10:26 +08:00
Woody f4d78b0b77 refactor(backend): update query decomposer, relevance filter, and RAG service
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-23 13:26:56 +08:00
Woody 38f4c70762 feat(backend): add embedding client and update LLM client
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-23 13:26:43 +08:00
Woody 181f4eca5b feat: Phase 1.3 query pipeline with decomposition, relevance filter, and response
- Add QueryDecomposer: extracts keywords from question via LLM JSON response
- Add RelevanceFilter: batch scores chunks 0-10, filters by threshold
- Add POST /api/v1/query endpoint with full 3-step pipeline:
  1. QueryDecomposer.decompose() → keywords
  2. RAGService.retrieve() → chunks from ChromaDB
  3. RelevanceFilter.filter() → score and filter chunks
  4. RAGService.generate_response() → bullet-point answer
- Fix SourceMetadata.upload_date type from datetime to str for flexibility
- Test-first: 13 new tests pass (5 decomposer, 5 relevance filter, 3 query endpoint)
- All Phase 1 tests: 41 passed, 2 skipped
2026-04-22 17:19:21 +08:00
Woody d94abaac77 feat: Phase 1.2 ingestion pipeline with chunking and metadata
- Add document parsers (DOCX, PDF) with lazy imports
- Add TokenChunkingStrategy with ABC for future replacement
- Add metadata extraction (filename, upload_date, content_summary)
- Add RAGService for ChromaDB ingestion/retrieval/response generation
- Add POST /api/v1/ingest endpoint with file validation
- Test-first: 20 passed, 2 skipped (python-docx not installed)
2026-04-22 16:49:52 +08:00
Woody 3712397d64 feat: Phase 1.1 project setup with config, database, and models
- Add requirements.txt with all dependencies
- Add .env.example with required environment variables
- Add Pydantic Settings (config.py) with .env loading
- Add ChromaDB persistent client (database.py)
- Add Pydantic schemas (ingest.py) for request/response
- Add FastAPI main.py with CORS middleware
- Add package __init__.py files
- Add tests: test_phase1_config.py, test_phase1_database.py
- All 5 tests pass
2026-04-22 16:13:52 +08:00