legco_ai_assistant

Commit Graph

Author	SHA1	Message	Date
Woody	41f59b396f	feat: track highlight generation prompt, response, and timing in history (Phase 5.5) - Add 3 columns to query_history: highlight_prompt, highlight_response, highlight_time_ms - HistoryService.update_highlights() updates existing row after batch LLM call - ChunkHighlightService measures timing, captures prompt and structured JSON response - SSE completed event includes history_id for frontend to pass back - Frontend captures historyId, passes as ?history_id= query param in batch POST - Highlight time tracked separately (excluded from total_time_ms) - All 153 tests pass (108 backend + 45 frontend)	2026-04-29 11:18:21 +08:00
Woody	f2115ae563	feat: structured LLM output for decompose + citation fuzzy matching (Phase 5) Phase 5.1 — Structured LLM output for query decomposition: - Add SubQuestions Pydantic model with sub_question, keywords, rationale - Add LLMClient.complete_structured() using langchain with_structured_output - Update QueryDecomposer with structured output path + legacy json.loads fallback - Update SQLite seed templates: add subq+citation labeling requirement - Add tests: structured output, subquestions model validation, logging Phase 5.2 — Citation format alignment and fallback links: - Add document_id to SourceMetadata (backend + frontend types) - Rewrite citationParser.ts with fuzzy matching and fallback document links - Add RAGDatabasePage auto-expand from ?document= URL param - Tighten generate_per_subq seed prompt: 'Copy exact bracket labels shown' - Add citation parser tests for fuzzy match and fallback link scenarios - Defer: DOCX/TXT PDF generation → Phase 5.3 (fallback links sufficient)	2026-04-28 15:39:17 +08:00
Woody	666b603639	feat(query): refactor pipeline for per-sub-question flow with progressive SSE Restructure _query_stream() to use per-sub-question retrieval, filtering, and generation. Add generative_subquestion SSE events for progressive frontend rendering. Add format_chunks_retrieved_per_subq() and format_chunks_filtered_per_subq() with <sub_q> XML wrappers. Add empty decomposition fallback using original question as single sub-q. Update history recording for grouped sources JSON (list-of-lists format). Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-04-26 23:28:06 +08:00
Woody	475306f2b1	feat(history): Phase 3.5 — Query History backend (service, API, timing, XML capture)	2026-04-25 22:59:53 +08:00
Woody	3b741c1844	feat(query): stream extracted questions immediately via SSE Convert /query endpoint from synchronous JSON to Server-Sent Events (SSE) streaming. The frontend now receives extracted_questions as soon as the first LLM call completes, without waiting for retrieval, filtering, and answer generation. Backend: - Add StreamingQueryEvent union type (Decomposed, Retrieving, Filtering, Generating, Completed, Error) - Convert /query to return StreamingResponse with SSE format - Yield events after each pipeline phase Frontend: - Add queryDocumentStream() using fetch + ReadableStream - Add useQueryDocumentStream() hook with phase-aware state - Update LTTPage to use streaming instead of mutation - Update ResponsePanel to show phase messages (Searching documents..., Filtering passages..., Generating answer...) - Update ExtractedQuestionsDisplay to accept null Tests: - Update query_flow e2e test to mock queryDocumentStream - 84/85 tests pass (1 pre-existing failure from removed file-input)	2026-04-25 18:29:22 +08:00
Woody	f9dda7bd18	feat(backend): rename keywords to extracted_questions in query pipeline (sub-phase 2.3) Change QueryDecomposer prompt to generate 2-5 sub-questions instead of keywords. Rename API field from keywords to extracted_questions across models, service, and router. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-04-24 16:23:53 +08:00
Woody	d49756f374	feat: add chunk PDF serving endpoint and frontend clickable source links (1.5.6) - Add page_number and chunk_file_path to SourceMetadata model and query router - Add GET /chunks/{file_path}/pdf endpoint with path traversal protection - Add View PDF links in ResponsePanel source cards and ChunkList component - Update TypeScript types and API helper for chunk PDF URLs - Add backend tests (5) and frontend ChunkList tests (7) - Update enhancement plan: all 3 features complete	2026-04-24 11:49:39 +08:00
Woody	4a22b906e4	refactor(backend): update ingest and query routers Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-04-23 13:26:32 +08:00
Woody	7493b3aaf6	feat: Phase 1.4 acceptance tests, error handling, and polish - Implement acceptance tests for ingest (real ChromaDB) and query (real LLM) - Full 4-stage RAG pipeline verified: decompose → retrieve → filter → generate - Add logging to ingest and query routers - Improve error handling: empty doc detection, proper HTTPException re-raising - Add .txt file support to ingest endpoint - Fix query router: strip distance from retrieve tuples before relevance filter - Update plan: Phase 1 backend complete (all acceptance criteria met) - Tests: 41 unit passed, 5 acceptance passed (real OpenRouter calls)	2026-04-22 17:45:50 +08:00
Woody	181f4eca5b	feat: Phase 1.3 query pipeline with decomposition, relevance filter, and response - Add QueryDecomposer: extracts keywords from question via LLM JSON response - Add RelevanceFilter: batch scores chunks 0-10, filters by threshold - Add POST /api/v1/query endpoint with full 4-step pipeline: 1. QueryDecomposer.decompose() → keywords 2. RAGService.retrieve() → chunks from ChromaDB 3. RelevanceFilter.filter() → score and filter chunks 4. RAGService.generate_response() → bullet-point answer - Fix SourceMetadata.upload_date type from datetime to str for flexibility - Test-first: 13 new tests pass (5 decomposer, 5 relevance filter, 3 query endpoint) - All Phase 1 tests: 41 passed, 2 skipped	2026-04-22 17:19:21 +08:00

10 Commits
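
The earliest commit (181f4eca5b) describes the query pipeline as decompose, retrieve, filter, then generate. The control flow can be sketched roughly as below. This is a minimal illustration, not the repository's code: `run_query_pipeline`, `Chunk`, the callable parameters, and the default threshold of 6 are all hypothetical stand-ins. The real QueryDecomposer, RAGService, and RelevanceFilter call an LLM and ChromaDB, which are replaced here by plain functions so the sketch runs on its own.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Chunk:
    """Hypothetical stand-in for a retrieved passage and its source label."""
    text: str
    source: str


def run_query_pipeline(
    question: str,
    decompose: Callable[[str], list[str]],
    retrieve: Callable[[list[str]], list[Chunk]],
    score: Callable[[str, list[Chunk]], list[int]],
    generate: Callable[[str, list[Chunk]], str],
    threshold: int = 6,  # assumed cut-off; the commit only says "filters by threshold"
) -> dict:
    """Run the four stages in order and return the answer with its sources."""
    keywords = decompose(question)        # 1. question -> keywords
    chunks = retrieve(keywords)           # 2. keywords -> candidate chunks
    scores = score(question, chunks)      # 3. batch-score each chunk 0-10 ...
    kept = [c for c, s in zip(chunks, scores) if s >= threshold]  # ... and filter
    answer = generate(question, kept)     # 4. kept chunks -> bullet-point answer
    return {
        "keywords": keywords,
        "sources": [c.source for c in kept],
        "answer": answer,
    }


def demo() -> dict:
    """Exercise the pipeline with stub stages instead of an LLM and vector store."""
    corpus = [
        Chunk("Budget debate on transport fares", "doc-a"),
        Chunk("Unrelated weather note", "doc-b"),
    ]
    return run_query_pipeline(
        "What was said about transport fares?",
        decompose=lambda q: ["transport", "fares"],
        retrieve=lambda kws: corpus,
        score=lambda q, chunks: [9 if "fares" in c.text else 2 for c in chunks],
        generate=lambda q, chunks: "\n".join(f"- {c.text}" for c in chunks),
    )
```

Passing the stages in as callables keeps the orchestration testable without network calls, which fits the split between unit and acceptance tests mentioned in the Phase 1.4 commit.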