legco_ai_assistant

Commit Graph

Author	SHA1	Message	Date
Woody	3ab6fd102a	fix: use vLLM-native guided_json for structured output vLLM servers support JSON schema enforcement via extra_body (guided_json or structured_outputs), not OpenAI's response_format protocol. LangChain's with_structured_output(method='json_schema') sends response_format which vLLM ignores, causing NoneType not iterable parsing errors. - vLLM path: direct OpenAI SDK call with extra_body={guided_json|structured_outputs} - OpenRouter path: unchanged with_structured_output(method='json_schema') - Try new 'structured_outputs' format first, fall back to legacy 'guided_json' - Update _SEED_DECOMPOSE with explicit JSON array instruction - Add diagnostic logging: exc_info=True, schema preview, prompt template preview - Add logging in _parse_legacy_json for fallback failure debugging	2026-04-29 16:49:14 +08:00
Woody	2aca18d30e	docs: add vLLM structured output fix plan - Diagnose: vLLM ignores OpenAI-native response_format, causing NoneType error - Diagnose: legacy fallback prompt lacks JSON instruction → empty questions - Plan: use vLLM-native guided_json via extra_body instead of with_structured_output - Plan: update _SEED_DECOMPOSE with JSON format instruction - Plan: add diagnostic logging (exc_info, method, schema preview) wip: temporary function_calling switch for vLLM (to be replaced by guided_json)	2026-04-29 16:42:23 +08:00
Woody	cbb958d75d	fix: vLLM chat_template_kwargs breaks LangChain structured output vLLM's chat_template_kwargs leaked into LangChain's AsyncCompletions.parse() via _get_langchain_model's model_kwargs, causing structured decomposition to fail on vLLM backends. Skip vLLM-specific params when building the LangChain model — only provider-agnostic params (OpenAI reasoning) pass through.	2026-04-29 16:07:44 +08:00
Woody	41f59b396f	feat: track highlight generation prompt, response, and timing in history (Phase 5.5) - Add 3 columns to query_history: highlight_prompt, highlight_response, highlight_time_ms - HistoryService.update_highlights() updates existing row after batch LLM call - ChunkHighlightService measures timing, captures prompt and structured JSON response - SSE completed event includes history_id for frontend to pass back - Frontend captures historyId, passes as ?history_id= query param in batch POST - Highlight time tracked separately (excluded from total_time_ms) - All 153 tests pass (108 backend + 45 frontend)	2026-04-29 11:18:21 +08:00
Woody	a56f8f69e2	feat: add highlight batch and GET endpoints (Phase 5.4.5) - POST /api/v1/v2/highlights/batch: compute and cache highlights for cited chunks - GET /api/v1/v2/highlights: serve cached highlighted HTML pages - chunks.py router registered in main.py - Dynamic DB path computation (prompts.db -> highlights.db), no Settings changes - 7 endpoint tests: POST 200/422, GET 200/404, mock service verification	2026-04-29 09:26:50 +08:00
Woody	c6d4a38013	feat: add LLM-based batch highlight service and HTML rendering (Phase 5.4.4) - ChunkHighlightService.compute_highlights_batch(): single LLM call across all cited chunks, grouped by sub-question, with structured output - render_highlight_html(): self-contained HTML page with yellow-highlighted relevant sentences, LLM reason annotations, and View Original PDF footer - Per-target error isolation, ChromaDB miss handling, graceful degradation - 14 tests: 7 batch service + 7 HTML rendering	2026-04-29 09:26:33 +08:00
Woody	bdbc8ea1a0	feat: add SQLite highlight cache service (Phase 5.4.3) - highlight_cache.py: HighlightCache class with get/set_highlight and compute_cache_key (sha256 hash of document_id|chunk_index|sub_question) - INSERT OR REPLACE semantics, idempotent table creation - 13 tests covering round-trip, overwrite, missing keys, determinism	2026-04-29 09:26:20 +08:00
Woody	b11d31e2d1	feat: add sentence splitter and highlight data models (Phase 5.4.1-5.4.2) - sentence_splitter.py: regex-based sentence splitting for English + Chinese punctuation - highlight.py: 6 Pydantic models (ChunkHighlightTarget, HighlightBatchRequest, RelevantSentence, ChunkHighlights, HighlightBatchResult, HighlightBatchResponse) - 43 tests: 13 sentence splitter + 30 model validation	2026-04-29 09:26:06 +08:00
Woody	25b26c9b48	feat(ingest): generate per-chunk PDFs for DOCX/TXT documents (Phase 5.3) DOCX and TXT ingestion now produces chunk_file_path + per-chunk PDF files matching the PDF ingestion flow. Uses reportlab to render chunk text as simple PDFs with automatic text wrapping. - Add reportlab==4.2.5 to requirements.txt - New utils/text_to_pdf.py: generate_text_pdf() renders chunk text as PDF - Ingest router DOCX/TXT branches: generate chunk_N.pdf per chunk, store in chunk_file_paths - Graceful degradation: chunk_file_path stays None if PDF generation fails - Update test_phase1_ingest_page_aware.py assertions: DOCX chunks now HAVE chunk_file_path - New test_phase5_docx_pdf_generation.py: 5 tests (DOCX PDF gen, TXT PDF gen, PDF regression, file count, graceful degradation) - 361 backend tests pass (4 pre-existing embedding failures unrelated) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-04-28 17:32:22 +08:00
Woody	48e15f8232	feat(llm): log structured LLM response and extra_body Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-04-28 16:50:26 +08:00
Woody	4c56e81872	feat(prompts): enforce bullet-point output in generate template Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-04-28 16:42:55 +08:00
Woody	095f013739	feat(llm): pass extra_body via model_kwargs in LangChain Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-04-28 16:42:49 +08:00
Woody	136c25ae38	feat: rewrite DOCX parser with table extraction Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-04-28 16:42:41 +08:00
Woody	f2115ae563	feat: structured LLM output for decompose + citation fuzzy matching (Phase 5) Phase 5.1 — Structured LLM output for query decomposition: - Add SubQuestions Pydantic model with sub_question, keywords, rationale - Add LLMClient.complete_structured() using langchain with_structured_output - Update QueryDecomposer with structured output path + legacy json.loads fallback - Update SQLite seed templates: add subq+citation labeling requirement - Add tests: structured output, subquestions model validation, logging Phase 5.2 — Citation format alignment and fallback links: - Add document_id to SourceMetadata (backend + frontend types) - Rewrite citationParser.ts with fuzzy matching and fallback document links - Add RAGDatabasePage auto-expand from ?document= URL param - Tighten generate_per_subq seed prompt: 'Copy exact bracket labels shown' - Add citation parser tests for fuzzy match and fallback link scenarios - Defer: DOCX/TXT PDF generation → Phase 5.3 (fallback links sufficient)	2026-04-28 15:39:17 +08:00
Woody	711be3dfde	feat(llm): add VLLM_ENGINE env flag for provider-specific extra_body format	2026-04-28 13:30:27 +08:00
Woody	23796d6a0c	feat(prompts): add JSON export/import for profile prompt configurations	2026-04-27 19:44:35 +08:00
Woody	4ad9deeccb	feat(deploy): add Dockerfile, compose, nginx config, and README Multi-stage Dockerfile: Node builds frontend, Python serves both API and static files. docker-compose.yml with named volumes for ChromaDB, chunks, and SQLite data. nginx.conf as reverse proxy with 350M upload limit and 300s LLM proxy timeout. README with dev setup, deploy steps, env vars table, and architecture diagram. Backend main.py: add catch-all route to serve frontend/dist/static files in production. Only activates when dist/ exists. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-04-27 17:17:53 +08:00
Woody	d444c99c23	feat(config): log resolved llm and embedding model names on startup Add INFO log in get_settings() to print the actual model names after merging .env and class defaults. Confirms pydantic-settings priority: env values override class defaults as expected. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-04-27 15:11:36 +08:00
Woody	a7a22f1494	fix(relevance): tolerate LLM score count mismatches via padding instead of discarding The per-sub-question filter was all-or-nothing: if the LLM returned 9 scores for 10 chunks (common with qwen3.5-35b), every chunk was discarded and the user got 'no relevant information found'. Now: fewer scores → pad with 0.0; more scores → truncate. Changed from error→warning since this is recoverable. Also improve LTT page UI: sources collapsed by default in per-sub-q sections, and the 'Your question' text now shows the full question instead of being truncated. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-04-27 14:31:18 +08:00
Woody	2656f9ca08	refactor(test): rewrite tests to comply with integration-first rules Replace mocked DB/internal-services with real ChromaDB/SQLite via tmp_path. Only mock truly external APIs (LLM, embedding for deterministic vectors). 13 test files rewritten (314 pass, 0 fail): - Route tests: use TestClient + real ChromaDB, seed test data - Service tests: use real PersistentClient/SQLite instances - Pipeline tests: TestClient hits SSE /query endpoint, verify history - Converted unittest.TestCase to pytest where applicable Plus: fix metadata.py to filter None values from ChromaDB metadata (pre-existing bug caught by real-DB ingestion tests)	2026-04-27 11:46:58 +08:00
Woody	3b868a0133	feat(prompts): integrate filter_per_subq with PromptService, fix seed bugs, restructure UI Break the hardcoded per-sub-q filter prompt into 3 editable PromptService templates (filter_intro, filter_section, filter_outro) with placeholders for the for-loop iteration pattern. Refactor RelevanceFilter._build_per_subq_prompt() to compose them at runtime, falling back to built-in defaults when PromptService is unavailable. Fix two latent bugs from Package 4: - generate_per_subq was called by rag.py but never added to _VALID_STEPS or DB seed (would ValueError at runtime) - _SEED_GENERATE placeholder mismatch: flat generate_response() expects {question}/{context} but Package 4 changed it to {context_sections}. Restored flat template; generate_per_subq now holds {context_sections}. Add database backfill migration in seed_default_profiles() to INSERT OR IGNORE missing steps into existing profile rows, ensuring all 7 steps exist on restart. Restructure System Prompts UI: remove unused flat filter/generate steps, replace with Step 2.1-2.3 (filter_intro/section/outro) and Step 3 (generate_per_subq). Update PlaceholderDocs with {context_sections}, {subq_idx}, {subq_question}. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-04-27 11:14:27 +08:00
Woody	3f50f81bfe	test(backend): extend existing tests for per-sub-q methods and templates Add 6 tests for retrieve_per_subquestion and generate_response_per_subquestion to Phase 1 rag service tests. Add 4 tests for filter_per_subquestion to Phase 1 relevance filter tests. Add 2 tests for new {context_sections} generate template to Phase 3 prompt injection tests. Add TestPerSubQPipelineHistory class with 3 per-sub-q pipeline simulation tests to Phase 3 integration tests. Add generate_per_subq template seed to conftest mock_prompt_service fixture. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-04-26 23:29:27 +08:00
Woody	201bddecf0	test(backend): add Phase 4 integration and acceptance tests 5 integration tests simulating full per-sub-question pipeline with mocked services covering 2-sub-q, empty decomposition fallback, single sub-q, all-filtered, and partial retrieval. 2 acceptance tests (manual run) for real LLM verification of per-sub-question organized answers with grouped sources and ## Sub-question headers. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-04-26 23:29:09 +08:00
Woody	dd98fa0b65	test(backend): add Phase 4 unit tests for generate, format, history, prompts 9 tests for generate_response_per_subquestion() and answer format validation covering multi-sub-q, empty, prompt construction, and markdown format. 8 tests for new history XML/JSON formats (sources as list-of-lists, <sub_q> wrappers in XML) and new {context_sections} prompt template. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-04-26 23:28:58 +08:00
Woody	ab6ec28de6	test(backend): add Phase 4 unit tests for retrieval and filtering 10 tests for retrieve_per_subquestion() covering multi-sub-q, empty, single, call counting, n_results passthrough, and empty results. 14 tests for filter_per_subquestion() covering basic filtering, threshold behavior, JSON parsing edge cases, markdown extraction, LLM exceptions, and format helpers. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-04-26 23:28:45 +08:00
Woody	0ecae11bf8	feat(db): update history schema and generate prompt template for Package 4 Add chunks_retrieved_per_subq_count and chunks_filtered_per_subq_count columns to query_history table with safe ALTER TABLE migration. Replace generate template {question}/{context} placeholders with {context_sections} for per-sub-question organized context sections. Update Phase 3 test assertions to match new template and schema shapes. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-04-26 23:28:28 +08:00
Woody	40393d81f8	feat(models): add SubQuestionSources model and per-sub-q history fields Add SubQuestionSources, SubQuestionResult, GeneratingSubquestionEvent Pydantic models for the new per-sub-question response format. Add chunks_retrieved_per_subq_count and chunks_filtered_per_subq_count optional fields to QueryHistoryRecord and QueryHistoryDetail for per-sub-question chunk count tracking. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-04-26 23:28:19 +08:00
Woody	666b603639	feat(query): refactor pipeline for per-sub-question flow with progressive SSE Restructure _query_stream() to use per-sub-question retrieval, filtering, and generation. Add generative_subquestion SSE events for progressive frontend rendering. Add format_chunks_retrieved_per_subq() and format_chunks_filtered_per_subq() with <sub_q> XML wrappers. Add empty decomposition fallback using original question as single sub-q. Update history recording for grouped sources JSON (list-of-lists format). Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-04-26 23:28:06 +08:00
Woody	57a130dc96	feat(services): add per-sub-question retrieval, filtering, and response generation Add retrieve_per_subquestion() that queries ChromaDB independently per sub-question instead of joining all sub-qs into one query string. Add filter_per_subquestion() that evaluates each chunk against its own originating sub-question in a single LLM call with a redesigned grouped prompt. Add generate_response_per_subquestion() that produces markdown sections per sub-question with grouped sources and {context_sections} template support. All existing methods preserved for backward compatibility. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-04-26 23:27:50 +08:00
Woody	475306f2b1	feat(history): Phase 3.5 — Query History backend (service, API, timing, XML capture)	2026-04-25 22:59:53 +08:00
Woody	e49a68b0bd	feat(prompts): Phase 3.2 — Prompt Backend (CRUD service, REST API, 33 tests) - PromptService (services/prompt_service.py): full CRUD for 3 profiles A/B/C with seed template reset, validation, and sqlite3.Row access - REST API (routers/prompts.py): 6 endpoints on /api/v1/prompts - Pydantic models (models/prompts.py): 6 schemas - DI wiring (dependencies.py): get_prompt_service() - App registration (main.py): prompts router - Mock fixture (conftest.py): mock_prompt_service - Tests: test_phase3_prompt_service.py (22) + test_phase3_prompts_router.py (11) - 162/166 total pass, 4 skipped, 0 fail	2026-04-25 21:11:17 +08:00
Woody	f4b404f27d	feat(db): Phase 3.1 — SQLite infrastructure (prompts.db + history.db) - Add sqlite_db.py with dual-DB connection factories (WAL mode, foreign keys) - init_prompts_db() creates system_prompt_profiles + system_prompts tables - init_history_db() creates query_history table + created_at index - seed_default_profiles() inserts 3 profiles (A/B/C) x 3 steps each - All 3 profiles start with identical seed templates; Profile A active - Add prompts_db_path + history_db_path to config (./data/ default) - Startup init in main.py creates data/ dir, inits both DBs, seeds profiles - Add PROMPTS_DB_PATH + HISTORY_DB_PATH to .env.example - Add data/ to .gitignore - 17 new tests in test_phase3_sqlite_db.py (all passing)	2026-04-25 20:29:29 +08:00
Woody	3b741c1844	feat(query): stream extracted questions immediately via SSE Convert /query endpoint from synchronous JSON to Server-Sent Events (SSE) streaming. The frontend now receives extracted_questions as soon as the first LLM call completes, without waiting for retrieval, filtering, and answer generation. Backend: - Add StreamingQueryEvent union type (Decomposed, Retrieving, Filtering, Generating, Completed, Error) - Convert /query to return StreamingResponse with SSE format - Yield events after each pipeline phase Frontend: - Add queryDocumentStream() using fetch + ReadableStream - Add useQueryDocumentStream() hook with phase-aware state - Update LTTPage to use streaming instead of mutation - Update ResponsePanel to show phase messages (Searching documents..., Filtering passages..., Generating answer...) - Update ExtractedQuestionsDisplay to accept null Tests: - Update query_flow e2e test to mock queryDocumentStream - 84/85 tests pass (1 pre-existing failure from removed file-input)	2026-04-25 18:29:22 +08:00
Woody	e78b670baa	feat(backend): use [filename, page N] citation labels in RAG context (sub-phase 2.6) Replace numeric [1] labels with [filename, page N] format in context chunks. Update LLM prompt to instruct inline citation using bracket labels. Enables traceable source references in generated answers. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-04-24 17:52:54 +08:00
Woody	51640201f3	test(backend): update query tests for sub-question generation (sub-phase 2.3) Update prompt assertion in decomposer test and field assertions in query endpoint tests to match extracted_questions rename. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-04-24 16:24:10 +08:00
Woody	f9dda7bd18	feat(backend): rename keywords to extracted_questions in query pipeline (sub-phase 2.3) Change QueryDecomposer prompt to generate 2-5 sub-questions instead of keywords. Rename API field from keywords to extracted_questions across models, service, and router. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-04-24 16:23:53 +08:00
Woody	d49756f374	feat: add chunk PDF serving endpoint and frontend clickable source links (1.5.6) - Add page_number and chunk_file_path to SourceMetadata model and query router - Add GET /chunks/{file_path}/pdf endpoint with path traversal protection - Add View PDF links in ResponsePanel source cards and ChunkList component - Update TypeScript types and API helper for chunk PDF URLs - Add backend tests (5) and frontend ChunkList tests (7) - Update enhancement plan: all 3 features complete	2026-04-24 11:49:39 +08:00
Woody	4732b4949c	feat(backend): clean up chunk PDFs on document and chunk deletion Delete document endpoint now removes associated chunk PDF files from document_chunk/ before ChromaDB deletion. Delete chunk endpoint removes individual chunk PDF. Missing files logged as warnings, not errors. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-04-24 10:53:34 +08:00
Woody	b2dd385443	feat(backend): refactor ingest pipeline for page-aware chunking with PDF generation PDF uploads now use parse_pdf_by_page() -> chunk_pages() -> extract page PDFs -> enhanced metadata with page_number, chunk_file_path, and document_id. Same-filename replacement deletes old chunks and PDFs before re-ingest. DOCX/TXT keep original flat flow with document_id added. RAGService.ingest_document() accepts optional document_id parameter. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-04-24 10:53:17 +08:00
Woody	8c84062996	feat(backend): add PDF page extractor and chunk PDF storage config New pdf_extractor.py with extract_page_as_pdf() and extract_pages_as_pdf() for extracting individual PDF pages as separate files. Adds document_chunk_path setting to config and document_chunk/ to .gitignore. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-04-24 10:52:57 +08:00
Woody	b97264c66a	feat(backend): add page_number, chunk_file_path, document_id to chunk metadata Enhance extract_metadata() with three new optional fields for page-aware chunking support. Validates list length mismatches. Fully backward compatible — existing callers unaffected. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-04-24 10:30:40 +08:00
Woody	0995c685fa	feat(backend): add page-aware chunking with adjacent-page overlap Add chunk_pages() to TokenChunkingStrategy: one chunk per page with 200-token overlap from adjacent pages. Uses original page text for main content, decoded tokens for overlap. Never splits a page regardless of size. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-04-24 10:30:18 +08:00
Woody	f4fa577fb0	feat(backend): add page-aware PDF parsing with per-page text extraction Add parse_pdf_by_page() that returns List[Tuple[int, str]] with 1-indexed page numbers. Pages with no extractable text are skipped. Follows same error handling as existing parse_pdf(). Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-04-24 10:30:04 +08:00
Woody	5dcb71369c	fix(backend): add embed_query method to EmbeddingFunctionWrapper for ChromaDB query ChromaDB 1.5.8 calls embed_query() during collection.query(), but the wrapper only implemented __call__ (used by collection.add()). Added embed_query() as alias and refactored to shared _embed() method. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-04-24 10:15:08 +08:00
Woody	b48c23001e	fix(backend): preserve original filename in chunk metadata instead of temp file name When uploading files, the backend passes them through NamedTemporaryFile, causing os.path.basename to return temp names like 'tmp90i7xqa8.pdf'. Added original_filename parameter to extract_metadata() so the actual upload filename is stored. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-04-24 10:14:58 +08:00
Woody	c6abe5c335	fix(backend): add name() method to EmbeddingFunctionWrapper for ChromaDB 1.5.8 ChromaDB 1.5.8 requires embedding functions to implement the name() method from the EmbeddingFunction protocol. Without this, collection.get() fails with AttributeError. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-04-23 19:02:41 +08:00
Woody	f21085b3df	feat(backend): add documents CRUD endpoints and tests Add 4 REST endpoints for RAG database management: GET /documents, GET /documents/{id}/chunks, DELETE /documents/{id}, DELETE /chunks/{id}. Register documents router in main.py. 8 unit tests covering all CRUD operations. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-04-23 19:02:28 +08:00
Woody	178461915a	feat(backend): add documents CRUD service methods and Pydantic schemas Add list_documents(), list_chunks(), delete_document(), delete_chunk() to RAGService for ChromaDB document management. New schemas: DocumentInfo, ChunkInfo, DocumentListResponse, DeleteResponse. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-04-23 19:02:07 +08:00
Woody	52c09b86cb	feat(frontend): add nav bar with routing, markdown rendering, and enhancement plan - Add react-router-dom with NavBar component (LTT + RAG Database tabs) - Extract AppContent into LTTPage, add RAGDatabasePage placeholder - Refactor App.tsx to BrowserRouter + Routes layout - Switch ResponsePanel to react-markdown for rich formatting - Fix ResponsePanel test for markdown rendering - Update RAG prompt to cite source name instead of number - Save Phase 1 enhancement plan (.plans/phase1_enhancement_plan.md)	2026-04-23 18:37:30 +08:00
Woody	029a0e490f	debug(backend): add LLM request/response logging for OpenRouter debugging - Log extra_body contents before sending to LLM - Log full LLM response object for debugging - Changed extra_body format to OpenRouter reasoning format Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-04-23 16:28:43 +08:00
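
Note on bdbc8ea1a0: the message describes compute_cache_key as a sha256 hash over document_id, chunk_index, and sub_question joined with a pipe. The log does not show the function itself, so the following is only a minimal sketch of that idea. The signature, the UTF-8 encoding, and the hex digest are assumptions, not the contents of highlight_cache.py.

```python
import hashlib


def compute_cache_key(document_id: str, chunk_index: int, sub_question: str) -> str:
    """Build a deterministic cache key from the three identifying fields.

    Hypothetical signature: the real function in highlight_cache.py may differ.
    """
    # Join the fields with '|' as described in the commit message.
    raw = f"{document_id}|{chunk_index}|{sub_question}"
    # Assumption: UTF-8 bytes in, hex string out (64 characters for sha256).
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```

The same three inputs always produce the same key, which is what makes INSERT OR REPLACE on that key idempotent. Sub-questions containing Chinese text hash without special handling because the string is encoded before hashing.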


70 Commits
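
Note on a7a22f1494: the message says a score-count mismatch from the LLM is now tolerated by padding missing scores with 0.0 and truncating extras, rather than discarding every chunk. The sketch below illustrates that rule only. The helper name align_scores and its signature are hypothetical, and the warning-level logging mentioned in the commit is left out.

```python
from typing import List


def align_scores(scores: List[float], n_chunks: int) -> List[float]:
    """Return exactly n_chunks scores (hypothetical helper, not the repo's code)."""
    if len(scores) < n_chunks:
        # Fewer scores than chunks: treat the unscored tail as irrelevant (0.0).
        return list(scores) + [0.0] * (n_chunks - len(scores))
    # Equal or more scores than chunks: keep only the first n_chunks.
    return list(scores[:n_chunks])
```

With 9 scores for 10 chunks, the case named in the commit, the first 9 chunks keep their scores and only the last one falls below any positive threshold.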