docs: update AGENTS.md with per-sub-question pipeline architecture

Replace flat 3-step LLM workflow with per-sub-question architecture diagram. Document per-sub-question retrieval, filtering (single LLM call with sub-q grouping), and response generation with ## headers. Update CODE MAP to reflect completed implementation status. Add SSE event sequence with generating_subquestion events, history XML format with <sub_q> wrappers, and sources as list-of-lists JSON.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Woody 2026-04-26 23:30:19 +08:00
parent 3f292abe1b
commit a72d5b0773
1 changed file with 25 additions and 1 deletion


@@ -51,7 +51,9 @@ app/
| UI components | `frontend/src/components/` | shadcn/ui + Tailwind |
## CODE MAP
*Greenfield — no code yet. See development_plan.md for full specification.*
- **Backend**: FastAPI app with routers (query, ingest, video, ws_asr, prompts, history), services (rag, llm_client, asr_client, video_service, query_decomposer, relevance_filter, prompt_service, history_service), Pydantic models
- **Frontend**: React 18 + TypeScript + Vite with react-resizable-panels layout, TanStack Query, SSE streaming via `queryDocumentStream()`, shadcn/ui + Tailwind components
- **Pipeline**: workflow with 3 LLM calls (decompose → retrieve → filter → generate; the retrieve stage is a ChromaDB query, not an LLM call) with per-sub-question organization
## CONVENTIONS
- **Backend**: `snake_case` files; routers thin, services thick; `.env` for all LLM/ASR config
@@ -60,6 +62,28 @@ app/
- **RAG**: Strict prompt — answer ONLY from retrieved context; bullet-point format
- **Metadata**: Every doc chunk must have `filename`, `upload_date`, `content_summary`
### RAG Pipeline (3-Step LLM Workflow — Per-Sub-Question)
```
User Question
[LLM Call 1] QueryDecomposer — extract 2-5 sub-questions
[ChromaDB] Retrieve per sub-question — each sub-q independently queries ChromaDB
[LLM Call 2] RelevanceFilter (single call) — chunks grouped by sub-q, each scored against its own sub-q
[LLM Call 3] ResponseGeneration — markdown sections per sub-question with ## headers
```
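The flow in the diagram above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `run_pipeline`, the `fake_*` helpers, the `Chunk` shape, and the sample corpus are all hypothetical, and the LLM calls and the ChromaDB query are replaced by deterministic stand-ins so the sketch runs on its own. It also assumes that response generation is a single call over all sub-questions, which the diagram suggests but does not state outright.

```python
from typing import Callable

# Hypothetical shapes, not taken from the repository.
Chunk = dict[str, str]        # e.g. {"text": "...", "filename": "..."}
Grouped = list[list[Chunk]]   # one inner list per sub-question


def run_pipeline(
    question: str,
    decompose: Callable[[str], list[str]],
    retrieve: Callable[[str], list[Chunk]],
    filter_grouped: Callable[[list[str], Grouped], Grouped],
    generate: Callable[[list[str], Grouped], str],
) -> str:
    sub_questions = decompose(question)                # LLM call 1
    grouped = [retrieve(sq) for sq in sub_questions]   # one vector query per sub-question
    kept = filter_grouped(sub_questions, grouped)      # LLM call 2: one call, all groups
    return generate(sub_questions, kept)               # LLM call 3


# Deterministic stand-ins so the sketch runs without an LLM or ChromaDB.
CORPUS: list[Chunk] = [
    {"text": "Invoices are due in 30 days.", "filename": "billing.md"},
    {"text": "Refunds take 5 business days.", "filename": "refunds.md"},
]


def fake_decompose(question: str) -> list[str]:
    return [part.strip() for part in question.split(" and ")]


def fake_retrieve(sub_question: str) -> list[Chunk]:
    # Return every chunk; relevance is decided by the filter step.
    return list(CORPUS)


def fake_filter(sub_questions: list[str], grouped: Grouped) -> Grouped:
    # Score each chunk only against the sub-question of its own group.
    kept = []
    for sq, chunks in zip(sub_questions, grouped):
        key = sq.lower().split()[-1]  # last word acts as the topic keyword
        kept.append([c for c in chunks if key in c["text"].lower()])
    return kept


def fake_generate(sub_questions: list[str], kept: Grouped) -> str:
    sections = []
    for sq, chunks in zip(sub_questions, kept):
        bullets = "\n".join(f"- {c['text']} ({c['filename']})" for c in chunks)
        sections.append(f"## {sq}\n{bullets}")
    return "\n\n".join(sections)


answer = run_pipeline(
    "when are invoices due and how long for refunds",
    fake_decompose, fake_retrieve, fake_filter, fake_generate,
)
print(answer)
```

Passing the stages in as callables keeps the orchestration separate from the LLM and vector-store clients, which is one way to keep routers thin and services testable; the real services may be wired differently.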
**Per-Sub-Question Organization**:
- Retrieval: `RAGService.retrieve_per_subquestion()` queries ChromaDB once per sub-question
- Filtering: `RelevanceFilter.filter_per_subquestion()` makes a single LLM call; chunks are grouped by sub-question and each is scored against its own sub-question
- Response: `RAGService.generate_response_per_subquestion()` produces markdown sections with grouped sources
- SSE Events: `decomposed → retrieving → filtering → generating → generating_subquestion (per sub-q) → completed`
- History: XML chunks wrapped in `<sub_q>` elements; sources stored as list-of-lists JSON
- Empty decomposition fallback (Decision #13): if the decomposer returns `[]`, the pipeline uses `[original_question]`
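The event order, fallback, and storage shapes listed above might look like this in code. It is a sketch under stated assumptions: every function name here is hypothetical, the event payload fields (`index`, `sub_question`) are invented for illustration, the use of the SSE `event:` field is assumed rather than confirmed by the source, and `<chunk>` stands in for whatever chunk XML the history service actually stores.

```python
import json


def sub_questions_or_fallback(decomposed: list[str], original_question: str) -> list[str]:
    # Decision #13: an empty decomposition falls back to the original question.
    return decomposed if decomposed else [original_question]


def event_sequence(sub_questions: list[str]) -> list[str]:
    # One generating_subquestion event per sub-question, between generating and completed.
    events = ["decomposed", "retrieving", "filtering", "generating"]
    events += ["generating_subquestion"] * len(sub_questions)
    events.append("completed")
    return events


def sse_frame(event: str, data: dict) -> str:
    # Server-Sent Events wire format: "event:" line, "data:" line, blank line.
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def wrap_history(grouped_chunk_xml: list[list[str]]) -> str:
    # Each inner list holds already-serialized chunk XML for one sub-question.
    return "".join(f"<sub_q>{''.join(chunks)}</sub_q>" for chunks in grouped_chunk_xml)


sub_qs = sub_questions_or_fallback([], "What is the refund policy?")
events = event_sequence(sub_qs)
frame = sse_frame("generating_subquestion", {"index": 0, "sub_question": sub_qs[0]})
sources_json = json.dumps([["refunds.md"]])  # list-of-lists: one inner list per sub-question
history = wrap_history([["<chunk>Refunds take 5 business days.</chunk>"]])
```

Keeping sources as one inner list per sub-question means the index of a `<sub_q>` element in the history lines up with the index of its source list, so the two can be rejoined without extra keys.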
## ANTI-PATTERNS (THIS PROJECT)
- Hardcode LLM URLs/keys — always `.env`
- Business logic in routers — belongs in `services/`