vLLM's chat_template_kwargs leaked through the model_kwargs built in _get_langchain_model into LangChain's call to AsyncCompletions.parse(), causing structured decomposition to fail on vLLM backends. Skip vLLM-specific params when building the LangChain model, so that only provider-agnostic params (OpenAI reasoning) pass through.

| Name | | |
|---|---|---|
| __init__.py | ||
| chunk_highlight_service.py | ||
| embedding_client.py | ||
| highlight_cache.py | ||
| history_service.py | ||
| llm_client.py | ||
| prompt_service.py | ||
| query_decomposer.py | ||
| rag.py | ||
| relevance_filter.py | ||
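
The fix described in the commit message above can be sketched as a small filter applied before the LangChain model is constructed. This is a minimal illustration, not the repository's actual code: the helper name `build_langchain_model_kwargs`, the `VLLM_ONLY_KEYS` set, and the `reasoning_effort` key in the usage example are assumptions. Only `chat_template_kwargs` is named in the commit message.

```python
from typing import Any

# Assumption: keys that only a vLLM server understands. The commit message
# names chat_template_kwargs; any other entries would be project-specific.
VLLM_ONLY_KEYS = frozenset({"chat_template_kwargs"})


def build_langchain_model_kwargs(params: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of params with vLLM-specific entries removed.

    The input dict is left untouched so the direct vLLM client can still
    use the full parameter set.
    """
    return {key: value for key, value in params.items() if key not in VLLM_ONLY_KEYS}


if __name__ == "__main__":
    request_params = {
        "chat_template_kwargs": {"enable_thinking": False},  # vLLM-specific
        "reasoning_effort": "low",  # illustrative provider-agnostic param
    }
    print(build_langchain_model_kwargs(request_params))
```

A denylist keeps the change small; an allowlist of known provider-agnostic keys would be the stricter alternative if more backend-specific parameters are expected later.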