docs: Package 8 — add vLLM vision compatibility risk and smoke test to plan
- New risk: vLLM may not support the Qwen3.5-35B-A3B vision API, depending on version
- Dependencies: added a vLLM compatibility note with a smoke test snippet
- Heuristic fallback (Option B) works regardless of provider (OpenRouter or vLLM)
- `qa_vision_enabled` toggle provides an escape hatch
parent 16fbb107f4
commit 322caf1cc0
@@ -655,12 +655,31 @@ class Settings(BaseSettings):
| **LegCo format drift**: Future documents may use different Q&A markers | Low | Detection is regex-based — easy to add new patterns. LLM verification catches novel formats. Log format detection results for monitoring. |
| **Chunk size**: Some Q&A pairs are very long (7+ pages) | Medium | Apply max chunk token limit (configurable, default 3000). Recursive split on `\n\n` → `\n` with question text prepended to each sub-chunk for context. |
| **DOCX/TXT Q&A**: Non-PDF formats may have different Q&A markers | Low | Use same regex detection on concatenated text. Skip vision table extraction (text-based only). |
| **vLLM vision compatibility**: vLLM may not support vision API for Qwen3.5-35B-A3B depending on version and how the model is served | Medium | Test with a single vision call against your vLLM instance before implementation. Set `QA_VISION_ENABLED=false` and use heuristic fallback (Option B) if unsupported. See vLLM compatibility note in Dependencies below. |
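
The chunk-size mitigation in the table above (recursive split on `\n\n`, then `\n`, with the question text prepended to each sub-chunk) could look roughly like the sketch below. It is an illustration only, not the plan's implementation: `split_qa_chunk`, `_split_recursive`, and `count_tokens` are hypothetical names, and the whitespace word count stands in for whatever tokenizer the project actually uses.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    """Hypothetical stand-in tokenizer: counts whitespace-separated words."""
    return len(text.split())


def _split_recursive(
    text: str,
    max_tokens: int,
    separators: List[str],
    count: Callable[[str], int],
) -> List[str]:
    """Split text on the first separator, recursing to the next one for oversized pieces."""
    if count(text) <= max_tokens or not separators:
        # Fits, or no separator left to try: emit as-is (may still be oversized).
        return [text]
    sep, rest = separators[0], separators[1:]
    pieces = [p for p in text.split(sep) if p.strip()]
    chunks: List[str] = []
    buffer = ""
    for piece in pieces:
        candidate = f"{buffer}{sep}{piece}" if buffer else piece
        if count(candidate) <= max_tokens:
            buffer = candidate  # keep packing pieces into the current chunk
            continue
        if buffer:
            chunks.append(buffer)
            buffer = ""
        if count(piece) <= max_tokens:
            buffer = piece
        else:
            chunks.extend(_split_recursive(piece, max_tokens, rest, count))
    if buffer:
        chunks.append(buffer)
    return chunks


def split_qa_chunk(
    question: str,
    answer: str,
    max_tokens: int = 3000,  # default taken from the risk table
    count: Callable[[str], int] = count_tokens,
) -> List[str]:
    """Return sub-chunks that each start with the question text for context."""
    budget = max_tokens - count(question)
    if budget <= 0:
        raise ValueError("question alone exceeds the chunk token limit")
    if count(answer) <= budget:
        return [f"{question}\n\n{answer}"]
    parts = _split_recursive(answer, budget, ["\n\n", "\n"], count)
    return [f"{question}\n\n{part}" for part in parts]
```

Reserving the question's tokens from the budget up front keeps every emitted sub-chunk within the limit whenever the answer can be split finely enough; a single line longer than the budget is passed through unsplit in this sketch.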
---
## Dependencies
- **Vision LLM API**: The existing `LLM_MODEL_NAME` (`qwen/qwen3.5-35b-a3b`) is a native vision-language model that accepts base64 images via OpenRouter's standard OpenAI Chat Completions API. No separate vision model, API key, or endpoint needed. If vision is unavailable or disabled (`QA_VISION_ENABLED=false`), fall back to heuristic table detection (text-only).
- **Vision LLM API**: The existing `LLM_MODEL_NAME` (`qwen/qwen3.5-35b-a3b`) is a native vision-language model that accepts base64 images via the standard OpenAI Chat Completions API. No separate vision model, API key, or endpoint needed. If vision is unavailable or disabled (`QA_VISION_ENABLED=false`), fall back to heuristic table detection (text-only).
- **vLLM compatibility** (when `VLLM_ENGINE=true`): Vision table extraction requires vLLM v0.6.0+ with the model served as multimodal (vision encoder loaded). Verify with a quick smoke test before implementation:
```python
from openai import AsyncOpenAI

# `settings` is assumed to be the application's existing Settings instance.
client = AsyncOpenAI(base_url=settings.llm_base_url, api_key=settings.llm_api_key)

# Run inside an async function or an asyncio-aware REPL; a plain script cannot use top-level `await`.
resp = await client.chat.completions.create(
    model=settings.llm_model_name,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe what you see."},
            # Placeholder: this base64 string holds only the PNG signature.
            # Substitute a complete base64-encoded PNG for a meaningful test.
            {"type": "image_url", "image_url": {"url": "data:image/png;base64,iVBORw0KGgo="}},
        ],
    }],
)
```
If this returns a valid response → vision works. If it errors (400/422) → set `QA_VISION_ENABLED=false` and use Option B (heuristic text-based table extraction). The heuristic fallback works identically regardless of provider (OpenRouter or vLLM).
- **New Python packages**: `Pillow` (likely already installed for image rendering). `pypdf` already installed. No `pymupdf` needed — vision extraction sends raw page images (PNG) directly to the LLM; the LLM itself identifies table regions.
- **Existing codebase**: No breaking changes. Strategy is additive — existing TokenChunkingStrategy unchanged.
- **ChromaDB**: No schema change. Metadata fields are flexible (ChromaDB accepts arbitrary dict keys).
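
One caveat on the smoke test in the vLLM compatibility note: the placeholder `iVBORw0KGgo=` decodes to only the 8-byte PNG signature, so a server may reject it as an undecodable image even when vision is supported. The sketch below builds a small but complete PNG using only the standard library and turns the 400/422 rule into a boolean. It is a hedged illustration, not part of the plan: `make_test_png`, `build_vision_messages`, and `vision_supported` are hypothetical names, `client` is assumed to be an `AsyncOpenAI`-style object, and reading `status_code` off the raised exception is an assumption about how the client surfaces HTTP errors.

```python
import base64
import struct
import zlib
from typing import Any, Dict, List


def _png_chunk(tag: bytes, data: bytes) -> bytes:
    """Encode one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def make_test_png(width: int = 8, height: int = 8) -> bytes:
    """Build a complete all-white 8-bit RGB PNG."""
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    row = b"\x00" + b"\xff\xff\xff" * width  # filter byte 0, then white pixels
    idat = zlib.compress(row * height)
    return (
        b"\x89PNG\r\n\x1a\n"
        + _png_chunk(b"IHDR", ihdr)
        + _png_chunk(b"IDAT", idat)
        + _png_chunk(b"IEND", b"")
    )


def build_vision_messages(png_bytes: bytes) -> List[Dict[str, Any]]:
    """Wrap the image in the Chat Completions message shape used by the smoke test."""
    data_url = "data:image/png;base64," + base64.b64encode(png_bytes).decode("ascii")
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe what you see."},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }]


async def vision_supported(client: Any, model: str) -> bool:
    """Return False on a 400/422 rejection, True on a response with choices."""
    try:
        resp = await client.chat.completions.create(
            model=model,
            messages=build_vision_messages(make_test_png()),
            max_tokens=16,
        )
    except Exception as exc:
        # Assumption: the client exposes the HTTP status as `status_code`.
        if getattr(exc, "status_code", None) in (400, 422):
            return False
        raise  # network errors, auth failures, etc. are not a vision verdict
    return bool(resp.choices)
```

If `vision_supported(client, settings.llm_model_name)` comes back `False`, setting `QA_VISION_ENABLED=false` selects the heuristic fallback (Option B) described above. Other failures are re-raised on purpose so that a misconfigured endpoint is not mistaken for missing vision support.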