diff --git a/.plans/package8_enhancement_plan.md b/.plans/package8_enhancement_plan.md
index 7c05dfd..3cfed38 100644
--- a/.plans/package8_enhancement_plan.md
+++ b/.plans/package8_enhancement_plan.md
@@ -655,12 +655,31 @@ class Settings(BaseSettings):
 | **LegCo format drift**: Future documents may use different Q&A markers | Low | Detection is regex-based — easy to add new patterns. LLM verification catches novel formats. Log format detection results for monitoring. |
 | **Chunk size**: Some Q&A pairs are very long (7+ pages) | Medium | Apply max chunk token limit (configurable, default 3000). Recursive split on `\n\n` → `\n` with question text prepended to each sub-chunk for context. |
 | **DOCX/TXT Q&A**: Non-PDF formats may have different Q&A markers | Low | Use same regex detection on concatenated text. Skip vision table extraction (text-based only). |
+| **vLLM vision compatibility**: Depending on the vLLM version and how the model is served, the endpoint may reject image inputs for Qwen3.5-35B-A3B | Medium | Send a single vision request to your vLLM instance before implementation. If it is rejected, set `QA_VISION_ENABLED=false` and use the heuristic fallback (Option B). See the vLLM compatibility note under Dependencies below. |
 
 ---
 
 ## Dependencies
 
-- **Vision LLM API**: The existing `LLM_MODEL_NAME` (`qwen/qwen3.5-35b-a3b`) is a native vision-language model that accepts base64 images via OpenRouter's standard OpenAI Chat Completions API. No separate vision model, API key, or endpoint needed. If vision is unavailable or disabled (`QA_VISION_ENABLED=false`), fall back to heuristic table detection (text-only).
+- **Vision LLM API**: The existing `LLM_MODEL_NAME` (`qwen/qwen3.5-35b-a3b`) is a native vision-language model that accepts base64 images via the standard OpenAI Chat Completions API. No separate vision model, API key, or endpoint needed. If vision is unavailable or disabled (`QA_VISION_ENABLED=false`), fall back to heuristic table detection (text-only).
+
+- **vLLM compatibility** (when `VLLM_ENGINE=true`): Vision table extraction requires vLLM v0.6.0+ with the model served as multimodal (vision encoder loaded). Verify with a quick smoke test before implementation:
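+  ```python
+  import asyncio, base64, io
+  from openai import AsyncOpenAI
+  from PIL import Image
+
+  async def main() -> None:
+      # Send a real, decodable PNG: a header-only payload can be rejected even when vision is supported.
+      buf = io.BytesIO()
+      Image.new("RGB", (64, 64), "white").save(buf, format="PNG")
+      image_b64 = base64.b64encode(buf.getvalue()).decode("ascii")
+      # `settings` is the project's Settings instance (llm_base_url, llm_api_key, llm_model_name).
+      client = AsyncOpenAI(base_url=settings.llm_base_url, api_key=settings.llm_api_key)
+      resp = await client.chat.completions.create(
+          model=settings.llm_model_name,
+          messages=[{
+              "role": "user",
+              "content": [
+                  {"type": "text", "text": "Describe what you see."},
+                  {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
+              ],
+          }],
+      )
+      print(resp.choices[0].message.content)
+
+  asyncio.run(main())
+  ```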
+  If this returns a valid response → vision works. If it errors (400/422) → set `QA_VISION_ENABLED=false` and use Option B (heuristic text-based table extraction). The heuristic fallback works identically regardless of provider (OpenRouter or vLLM).
+
 - **New Python packages**: `Pillow` (likely already installed for image rendering). `pypdf` already installed. No `pymupdf` needed — vision extraction sends raw page images (PNG) directly to the LLM; the LLM itself identifies table regions.
 - **Existing codebase**: No breaking changes. Strategy is additive — existing TokenChunkingStrategy unchanged.
 - **ChromaDB**: No schema change. Metadata fields are flexible (ChromaDB accepts arbitrary dict keys).