Commit Graph

13 Commits

Author SHA1 Message Date
Woody 16de8394aa fix: add full input/output logging to vLLM structured output path
Log the complete prompt, schema, extra_body content, full API response,
token counts, and full parsed JSON output. Add exc_info=True tracebacks
on all failure paths.
2026-04-29 16:52:26 +08:00
Woody 3ab6fd102a fix: use vLLM-native guided_json for structured output
vLLM servers support JSON schema enforcement via extra_body (guided_json
or structured_outputs), not OpenAI's response_format protocol. LangChain's
with_structured_output(method='json_schema') sends response_format which
vLLM ignores, causing NoneType not iterable parsing errors.

- vLLM path: direct OpenAI SDK call with extra_body={guided_json|structured_outputs}
- OpenRouter path: unchanged with_structured_output(method='json_schema')
- Try new 'structured_outputs' format first, fall back to legacy 'guided_json'
- Update _SEED_DECOMPOSE with explicit JSON array instruction
- Add diagnostic logging: exc_info=True, schema preview, prompt template preview
- Add logging in _parse_legacy_json for fallback failure debugging
2026-04-29 16:49:14 +08:00
Woody 2aca18d30e docs: add vLLM structured output fix plan
- Diagnose: vLLM ignores OpenAI-native response_format, causing NoneType error
- Diagnose: legacy fallback prompt lacks JSON instruction → empty questions
- Plan: use vLLM-native guided_json via extra_body instead of with_structured_output
- Plan: update _SEED_DECOMPOSE with JSON format instruction
- Plan: add diagnostic logging (exc_info, method, schema preview)

wip: temporary function_calling switch for vLLM (to be replaced by guided_json)
2026-04-29 16:42:23 +08:00
Woody cbb958d75d fix: vLLM chat_template_kwargs breaks LangChain structured output
vLLM's chat_template_kwargs leaked into LangChain's AsyncCompletions.parse()
via _get_langchain_model's model_kwargs, causing structured decomposition
to fail on vLLM backends. Skip vLLM-specific params when building the
LangChain model — only provider-agnostic params (OpenAI reasoning) pass through.
2026-04-29 16:07:44 +08:00
Woody 48e15f8232 feat(llm): log structured LLM response and extra_body
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-28 16:50:26 +08:00
Woody 095f013739 feat(llm): pass extra_body via model_kwargs in LangChain
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-28 16:42:49 +08:00
Woody f2115ae563 feat: structured LLM output for decompose + citation fuzzy matching (Phase 5)
Phase 5.1 — Structured LLM output for query decomposition:
- Add SubQuestions Pydantic model with sub_question, keywords, rationale
- Add LLMClient.complete_structured() using langchain with_structured_output
- Update QueryDecomposer with structured output path + legacy json.loads fallback
- Update SQLite seed templates: add subq+citation labeling requirement
- Add tests: structured output, subquestions model validation, logging

Phase 5.2 — Citation format alignment and fallback links:
- Add document_id to SourceMetadata (backend + frontend types)
- Rewrite citationParser.ts with fuzzy matching and fallback document links
- Add RAGDatabasePage auto-expand from ?document= URL param
- Tighten generate_per_subq seed prompt: 'Copy exact bracket labels shown'
- Add citation parser tests for fuzzy match and fallback link scenarios
- Defer: DOCX/TXT PDF generation → Phase 5.3 (fallback links sufficient)
2026-04-28 15:39:17 +08:00
Woody 711be3dfde feat(llm): add VLLM_ENGINE env flag for provider-specific extra_body format 2026-04-28 13:30:27 +08:00
Woody 029a0e490f debug(backend): add LLM request/response logging for OpenRouter debugging
- Log extra_body contents before sending to LLM

- Log full LLM response object for debugging

- Changed extra_body format to OpenRouter reasoning format

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus \u003cclio-agent@sisyphuslabs.ai\u003e
2026-04-23 16:28:43 +08:00
Woody f5cfe44183 feat(backend): add LLM monitoring with step names, timing, and prompt logging
- LLMClient.complete() now accepts step_name parameter to identify processing step

- Logs prompt preview (first 100 + last 100 chars) at INFO level

- Logs processing time in milliseconds with token usage stats

- Updated QueryDecomposer, RelevanceFilter, and RAGService to pass step names

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-23 14:51:57 +08:00
Woody 74cb8b83d5 feat(backend): migrate LLM client to OpenAI SDK with thinking control
- Replace httpx with openai.AsyncOpenAI

- Add llm_enable_thinking config (default False)

- Add _build_extra_body() for Qwen3.5 thinking mode control

- Use chat_template_kwargs for vLLM/SGLang compatibility

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-23 14:10:26 +08:00
Woody 38f4c70762 feat(backend): add embedding client and update LLM client
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-23 13:26:43 +08:00
Woody d94abaac77 feat: Phase 1.2 ingestion pipeline with chunking and metadata
- Add document parsers (DOCX, PDF) with lazy imports
- Add TokenChunkingStrategy with ABC for future replacement
- Add metadata extraction (filename, upload_date, content_summary)
- Add RAGService for ChromaDB ingestion/retrieval/response generation
- Add POST /api/v1/ingest endpoint with file validation
- Test-first: 20 passed, 2 skipped (python-docx not installed)
2026-04-22 16:49:52 +08:00