docs: update AGENTS.md with per-sub-question pipeline architecture
Replace flat 3-step LLM workflow with per-sub-question architecture diagram. Document per-sub-question retrieval, filtering (single LLM call with sub-q grouping), and response generation with ## headers.

Update CODE MAP to reflect completed implementation status. Add SSE event sequence with generating_subquestion events, history XML format with <sub_q> wrappers, and sources as list-of-lists JSON.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
parent 3f292abe1b
commit a72d5b0773
26 AGENTS.md
@@ -51,7 +51,9 @@ app/
| UI components | `frontend/src/components/` | shadcn/ui + Tailwind |

## CODE MAP
*Greenfield — no code yet. See development_plan.md for full specification.*
- **Backend**: FastAPI app with routers (query, ingest, video, ws_asr, prompts, history), services (rag, llm_client, asr_client, video_service, query_decomposer, relevance_filter, prompt_service, history_service), Pydantic models
- **Frontend**: React 18 + TypeScript + Vite with react-resizable-panels layout, TanStack Query, SSE streaming via `queryDocumentStream()`, shadcn/ui + Tailwind components
- **Pipeline**: 3-step LLM workflow (decompose → retrieve → filter → generate; retrieval queries ChromaDB and is not an LLM call) with per-sub-question organization

## CONVENTIONS
- **Backend**: `snake_case` files; routers thin, services thick; `.env` for all LLM/ASR config
@@ -60,6 +62,28 @@ app/
- **RAG**: Strict prompt — answer ONLY from retrieved context; bullet-point format
- **Metadata**: Every doc chunk must have `filename`, `upload_date`, `content_summary`
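
The metadata rule above could be checked at ingest time. The following is a minimal standard-library sketch; `REQUIRED_METADATA_KEYS` and `missing_metadata()` are hypothetical names introduced for illustration and are not taken from the project, whose Pydantic models may already enforce this.

```python
# Hypothetical helper: the three keys come from the convention above,
# the names REQUIRED_METADATA_KEYS and missing_metadata are assumptions.
REQUIRED_METADATA_KEYS = ("filename", "upload_date", "content_summary")


def missing_metadata(metadata: dict) -> list[str]:
    """Return the required keys that are absent or empty in a chunk's metadata."""
    return [key for key in REQUIRED_METADATA_KEYS if not metadata.get(key)]


if __name__ == "__main__":
    chunk_meta = {"filename": "report.pdf", "upload_date": "2024-01-01"}
    # Lists the keys that still need to be filled in before the chunk is stored.
    print(missing_metadata(chunk_meta))
```

Returning the list of missing keys, rather than a boolean, lets the caller report exactly which field a chunk lacks.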
### RAG Pipeline (3-Step LLM Workflow — Per-Sub-Question)
```
User Question
↓
[LLM Call 1] QueryDecomposer — extract 2-5 sub-questions
↓
[ChromaDB] Retrieve per sub-question — each sub-q independently queries ChromaDB
↓
[LLM Call 2] RelevanceFilter (single call) — chunks grouped by sub-q, each scored against its own sub-q
↓
[LLM Call 3] ResponseGeneration — markdown sections per sub-question with ## headers
```
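
The flow in the diagram can be sketched as a single function that threads sub-question groups through the four stages. This is an illustrative outline only: `run_pipeline` and the `fake_*` stand-ins are hypothetical, the real services call an LLM and ChromaDB, and their actual signatures are not shown in this diff.

```python
from typing import Callable

ChunksBySubQ = dict[str, list[str]]


def run_pipeline(
    question: str,
    decompose: Callable[[str], list[str]],
    retrieve: Callable[[str], list[str]],
    filter_grouped: Callable[[ChunksBySubQ], ChunksBySubQ],
    generate: Callable[[ChunksBySubQ], str],
) -> str:
    """Hypothetical orchestration of the stages shown in the diagram."""
    sub_questions = decompose(question)                  # LLM call 1
    # One independent vector-store query per sub-question.
    chunks_by_sub_q = {sq: retrieve(sq) for sq in sub_questions}
    relevant = filter_grouped(chunks_by_sub_q)           # LLM call 2, one call for all groups
    return generate(relevant)                            # LLM call 3


# Stand-ins so the sketch runs without an LLM or a vector store.
def fake_decompose(question: str) -> list[str]:
    return ["What is A?", "What is B?"]


def fake_retrieve(sub_q: str) -> list[str]:
    return [f"chunk about {sub_q}", "unrelated chunk"]


def fake_filter(groups: ChunksBySubQ) -> ChunksBySubQ:
    # Each chunk is judged only against the sub-question it was retrieved for.
    return {sq: [c for c in chunks if sq in c] for sq, chunks in groups.items()}


def fake_generate(groups: ChunksBySubQ) -> str:
    sections = []
    for sub_q, chunks in groups.items():
        bullets = "\n".join(f"- {c}" for c in chunks)
        sections.append(f"## {sub_q}\n{bullets}")
    return "\n\n".join(sections)


if __name__ == "__main__":
    print(run_pipeline("Compare A and B", fake_decompose, fake_retrieve,
                       fake_filter, fake_generate))
```

Keeping the chunks keyed by sub-question from retrieval onward is what lets the filter score each chunk against its own sub-question and lets the generator emit one `##` section per sub-question.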
**Per-Sub-Question Organization**:
- Retrieval: `RAGService.retrieve_per_subquestion()` queries ChromaDB once per sub-question
- Filtering: `RelevanceFilter.filter_per_subquestion()` makes a single LLM call, with chunks grouped by sub-question
- Response: `RAGService.generate_response_per_subquestion()` produces markdown sections with grouped sources
- SSE Events: `decomposed → retrieving → filtering → generating → generating_subquestion (per sub-q) → completed`
- History: XML chunks wrapped in `<sub_q>` elements; sources stored as list-of-lists JSON
- Empty decomposition fallback (Decision #13): if decomposer returns `[]`, uses `[original_question]`
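
The event order, the history and source shapes, and the empty-decomposition fallback listed above can be illustrated with a few small functions. This is a sketch under stated assumptions: every function name, the `<history>` root element, and the `<chunk>` element name are hypothetical; only the event names, the `<sub_q>` wrapper, the list-of-lists shape, and the fallback rule come from the list.

```python
import json
import xml.etree.ElementTree as ET


def sub_questions_or_fallback(decomposed: list[str], original_question: str) -> list[str]:
    """Fallback rule: an empty decomposition is replaced by the original question."""
    return decomposed if decomposed else [original_question]


def sse_event_names(sub_questions: list[str]) -> list[str]:
    """Event order from the list above; generating_subquestion repeats per sub-question."""
    return (
        ["decomposed", "retrieving", "filtering", "generating"]
        + ["generating_subquestion"] * len(sub_questions)
        + ["completed"]
    )


def sources_json(sources_by_sub_q: list[list[str]]) -> str:
    """Serialize sources as list-of-lists JSON: one inner list per sub-question."""
    return json.dumps(sources_by_sub_q)


def history_xml(chunks_by_sub_q: list[list[str]]) -> str:
    """Wrap each sub-question's chunks in a <sub_q> element.

    The <history> root and <chunk> element names are assumptions.
    """
    root = ET.Element("history")
    for chunks in chunks_by_sub_q:
        sub_q = ET.SubElement(root, "sub_q")
        for text in chunks:
            ET.SubElement(sub_q, "chunk").text = text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    subs = sub_questions_or_fallback([], "What changed in the pipeline?")
    print(sse_event_names(subs))
    print(sources_json([["a.pdf"], ["b.pdf", "c.pdf"]]))
    print(history_xml([["first chunk"], ["second chunk", "third chunk"]]))
```

Because the fallback always yields at least one sub-question, a stream built this way carries at least one `generating_subquestion` event before `completed`.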
## ANTI-PATTERNS (THIS PROJECT)
- Hardcode LLM URLs/keys — always `.env`
- Business logic in routers — belongs in `services/`
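
As an illustration of the first rule, LLM settings can be read from the environment instead of being written into code. The sketch below uses only the standard library and assumes the `.env` file has already been loaded into the process environment (for example by a loader such as python-dotenv); the variable names `LLM_BASE_URL` and `LLM_API_KEY` and the `LLMSettings` class are hypothetical, not the project's actual names.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMSettings:
    base_url: str
    api_key: str


def load_llm_settings(env: Mapping[str, str] = os.environ) -> LLMSettings:
    """Read LLM settings from the environment and fail loudly if any are missing.

    LLM_BASE_URL and LLM_API_KEY are assumed variable names.
    """
    required = ("LLM_BASE_URL", "LLM_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return LLMSettings(base_url=env["LLM_BASE_URL"], api_key=env["LLM_API_KEY"])


if __name__ == "__main__":
    demo_env = {"LLM_BASE_URL": "http://localhost:8000/v1", "LLM_API_KEY": "example-key"}
    print(load_llm_settings(demo_env).base_url)
```

Passing the mapping as a parameter keeps the function testable without touching the real environment.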