From a72d5b077366aedc16a748e7d82e5ccb4e2dc561 Mon Sep 17 00:00:00 2001
From: Woody
Date: Sun, 26 Apr 2026 23:30:19 +0800
Subject: [PATCH] docs: update AGENTS.md with per-sub-question pipeline architecture

Replace flat 3-step LLM workflow with per-sub-question architecture
diagram. Document per-sub-question retrieval, filtering (single LLM call
with sub-q grouping), and response generation with ## headers. Update
CODE MAP to reflect completed implementation status. Add SSE event
sequence with generating_subquestion events, history XML format with
wrappers, and sources as list-of-lists JSON.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)
Co-authored-by: Sisyphus
---
 AGENTS.md | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/AGENTS.md b/AGENTS.md
index 138ae26..95acee1 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -51,7 +51,9 @@ app/
 | UI components | `frontend/src/components/` | shadcn/ui + Tailwind |
 
 ## CODE MAP
-*Greenfield — no code yet. See development_plan.md for full specification.*
+- **Backend**: FastAPI app with routers (query, ingest, video, ws_asr, prompts, history), services (rag, llm_client, asr_client, video_service, query_decomposer, relevance_filter, prompt_service, history_service), Pydantic models
+- **Frontend**: React 18 + TypeScript + Vite with react-resizable-panels layout, TanStack Query, SSE streaming via `queryDocumentStream()`, shadcn/ui + Tailwind components
+- **Pipeline**: 3-step LLM workflow (decompose → retrieve → filter → generate) with per-sub-question organization
 
 ## CONVENTIONS
 - **Backend**: `snake_case` files; routers thin, services thick; `.env` for all LLM/ASR config
@@ -60,6 +62,28 @@ app/
 - **RAG**: Strict prompt — answer ONLY from retrieved context; bullet-point format
 - **Metadata**: Every doc chunk must have `filename`, `upload_date`, `content_summary`
 
+### RAG Pipeline (3-Step LLM Workflow — Per-Sub-Question)
+
+```
+User Question
+    ↓
+[LLM Call 1] QueryDecomposer — extract 2-5 sub-questions
+    ↓
+[ChromaDB] Retrieve per sub-question — each sub-q independently queries ChromaDB
+    ↓
+[LLM Call 2] RelevanceFilter (single call) — chunks grouped by sub-q, each scored against its own sub-q
+    ↓
+[LLM Call 3] ResponseGeneration — markdown sections per sub-question with ## headers
+```
+
+**Per-Sub-Question Organization**:
+- Retrieval: `RAGService.retrieve_per_subquestion()` queries ChromaDB once per sub-question
+- Filtering: `RelevanceFilter.filter_per_subquestion()` single LLM call with sub-q grouping
+- Response: `RAGService.generate_response_per_subquestion()` produces markdown sections with grouped sources
+- SSE Events: `decomposed → retrieving → filtering → generating → generating_subquestion (per sub-q) → completed`
+- History: XML chunks wrapped in `` elements; sources stored as list-of-lists JSON
+- Empty decomposition fallback (Decision #13): if decomposer returns `[]`, uses `[original_question]`
+
 ## ANTI-PATTERNS (THIS PROJECT)
 - Hardcode LLM URLs/keys — always `.env`
 - Business logic in routers — belongs in `services/`
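
The per-sub-question flow documented in this patch can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `run_pipeline` and its three callable parameters are hypothetical stand-ins for `QueryDecomposer`, `RAGService.retrieve_per_subquestion()`, and `RelevanceFilter.filter_per_subquestion()`, whose real signatures are not shown here. The response step is a plain string formatter standing in for LLM Call 3, and the exact moment each SSE event is emitted is assumed; only the event order comes from the documentation.

```python
import json
from typing import Callable, List, Tuple

# Hypothetical callable shapes; the real service signatures are not in the patch.
Decomposer = Callable[[str], List[str]]
Retriever = Callable[[str], List[str]]
GroupFilter = Callable[[List[str], List[List[str]]], List[List[str]]]


def run_pipeline(
    question: str,
    decompose: Decomposer,
    retrieve: Retriever,
    filter_chunks: GroupFilter,
) -> Tuple[str, str, List[str]]:
    """Return (markdown answer, sources as list-of-lists JSON, event names)."""
    events: List[str] = []

    # LLM Call 1. Empty decomposition fallback (Decision #13):
    # an empty list is replaced by the original question.
    sub_questions = decompose(question) or [question]
    events.append("decomposed")

    # One retrieval per sub-question; results stay grouped by position.
    events.append("retrieving")
    grouped = [retrieve(sq) for sq in sub_questions]

    # LLM Call 2: a single filter call that receives every group at once.
    events.append("filtering")
    kept = filter_chunks(sub_questions, grouped)

    # LLM Call 3 placeholder: one "## " section per sub-question.
    events.append("generating")
    sections: List[str] = []
    for sq, chunks in zip(sub_questions, kept):
        events.append("generating_subquestion")
        bullets = "\n".join(f"- {chunk}" for chunk in chunks)
        sections.append(f"## {sq}\n{bullets}")
    events.append("completed")

    # Sources keep the per-sub-question grouping: a list of lists.
    return "\n\n".join(sections), json.dumps(kept), events
```

Keeping the groups positional (a list of lists aligned with the sub-question list) rather than keyed by sub-question text avoids collisions if the decomposer ever returns two identical sub-questions, and it serializes directly to the list-of-lists JSON shape described for sources.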