docs: Package 8 — add vLLM vision compatibility risk and smoke test to plan

- New risk: vLLM may not support Qwen3.5-35B-A3B vision API depending on version
- Dependencies: added vLLM compatibility note with smoke test snippet
- Heuristic fallback (Option B) works regardless of OpenRouter or vLLM
- qa_vision_enabled toggle provides escape hatch
2026-05-15 11:20:20 +08:00 · 2026-05-15 11:20:20 +08:00 · 322caf1cc0
parent 16fbb107f4
commit 322caf1cc0
1 changed files with 20 additions and 1 deletions
--- a/.plans/package8_enhancement_plan.md
+++ b/.plans/package8_enhancement_plan.md
@@ -655,12 +655,31 @@ class Settings(BaseSettings):
 | **LegCo format drift**: Future documents may use different Q&A markers | Low | Detection is regex-based — easy to add new patterns. LLM verification catches novel formats. Log format detection results for monitoring. |
 | **Chunk size**: Some Q&A pairs are very long (7+ pages) | Medium | Apply max chunk token limit (configurable, default 3000). Recursive split on `\n\n` → `\n` with question text prepended to each sub-chunk for context. |
 | **DOCX/TXT Q&A**: Non-PDF formats may have different Q&A markers | Low | Use same regex detection on concatenated text. Skip vision table extraction (text-based only). |
+| **vLLM vision compatibility**: vLLM may not support the vision API for Qwen3.5-35B-A3B, depending on the vLLM version and how the model is served | Medium | Test with a single vision call against your vLLM instance before implementation. If vision is unsupported, set `QA_VISION_ENABLED=false` and use the heuristic fallback (Option B). See the vLLM compatibility note in Dependencies below. |

 ---

 ## Dependencies

- **Vision LLM API**: The existing `LLM_MODEL_NAME` (`qwen/qwen3.5-35b-a3b`) is a native vision-language model that accepts base64 images via OpenRouter's standard OpenAI Chat Completions API. No separate vision model, API key, or endpoint needed. If vision is unavailable or disabled (`QA_VISION_ENABLED=false`), fall back to heuristic table detection (text-only).
+- **Vision LLM API**: The existing `LLM_MODEL_NAME` (`qwen/qwen3.5-35b-a3b`) is a native vision-language model that accepts base64 images via the standard OpenAI Chat Completions API. No separate vision model, API key, or endpoint needed. If vision is unavailable or disabled (`QA_VISION_ENABLED=false`), fall back to heuristic table detection (text-only).
+
+- **vLLM compatibility** (when `VLLM_ENGINE=true`): Vision table extraction requires vLLM v0.6.0+ with the model served as multimodal (vision encoder loaded). Verify with a quick smoke test before implementation:
+  ```python
+  from openai import AsyncOpenAI
+  client = AsyncOpenAI(base_url=settings.llm_base_url, api_key=settings.llm_api_key)
+  resp = await client.chat.completions.create(
+      model=settings.llm_model_name,
+      messages=[{
+          "role": "user",
+          "content": [
+              {"type": "text", "text": "Describe what you see."},
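+              {"type": "image_url", "image_url": {"url": "data:image/png;base64,iVBORw0KGgo="}}  # placeholder: this payload is only the 8-byte PNG signature; substitute a complete base64-encoded PNG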
+          ]
+      }]
+  )
+  ```
+  If this returns a valid response → vision works. If it errors (400/422) → set `QA_VISION_ENABLED=false` and use Option B (heuristic text-based table extraction). The heuristic fallback works identically regardless of provider (OpenRouter or vLLM).
+
 - **New Python packages**: `Pillow` (likely already installed for image rendering). `pypdf` already installed. No `pymupdf` needed — vision extraction sends raw page images (PNG) directly to the LLM; the LLM itself identifies table regions.
 - **Existing codebase**: No breaking changes. Strategy is additive — existing TokenChunkingStrategy unchanged.
 - **ChromaDB**: No schema change. Metadata fields are flexible (ChromaDB accepts arbitrary dict keys).
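
Note on the smoke test added in this commit: the inline base64 payload decodes to just the 8-byte PNG signature, so a server that actually decodes the image may reject it even when vision is supported. The sketch below is one way to make the check less ambiguous, under stated assumptions. It builds a complete solid-white PNG with the standard library, wraps it in the same message shape as the snippet, and treats only 400/422 as "vision unsupported". The helper names (`make_png`, `png_data_url`, `build_messages`, `vision_smoke_test`), the 64x64 image size, and the `max_tokens=16` cap are illustrative choices, not part of the plan. The client is assumed to be an `AsyncOpenAI` instance, whose status errors carry a `status_code` attribute.

```python
import base64
import struct
import zlib


def _chunk(tag: bytes, data: bytes) -> bytes:
    """Encode one PNG chunk: length, tag, data, CRC32 over tag + data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def make_png(width: int = 64, height: int = 64) -> bytes:
    """Build a complete solid-white 8-bit RGB PNG (size is an assumption)."""
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    row = b"\x00" + b"\xff" * (3 * width)  # filter type 0, then white pixels
    idat = zlib.compress(row * height)
    return (
        b"\x89PNG\r\n\x1a\n"
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", idat)
        + _chunk(b"IEND", b"")
    )


def png_data_url(png: bytes) -> str:
    """Wrap PNG bytes in a base64 data URL for an image_url content part."""
    return "data:image/png;base64," + base64.b64encode(png).decode("ascii")


def build_messages(data_url: str) -> list:
    """Same message shape as the smoke test snippet in the plan."""
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe what you see."},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }]


async def vision_smoke_test(client, model: str) -> bool:
    """Return True if the vision call succeeds, False on HTTP 400/422.

    Any other failure (auth, network, 5xx) is re-raised, since it says
    nothing about vision support.
    """
    try:
        resp = await client.chat.completions.create(
            model=model,
            messages=build_messages(png_data_url(make_png())),
            max_tokens=16,  # keep the probe cheap
        )
    except Exception as exc:
        if getattr(exc, "status_code", None) in (400, 422):
            return False
        raise
    return bool(resp.choices)
```

A possible use is to run `asyncio.run(vision_smoke_test(client, settings.llm_model_name))` once against the vLLM instance and set `QA_VISION_ENABLED=false` when it returns `False`. Re-raising other errors keeps a bad API key or an unreachable endpoint from being mistaken for missing vision support.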