docs: mark Phase 5.3 complete in enhancement plan

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Woody 2026-04-28 17:33:00 +08:00
parent 25b26c9b48
commit ec3b5a4ae1
1 changed file with 7 additions and 2 deletions

@@ -247,11 +247,16 @@ If `with_structured_output()` causes issues in production:
 ---
-## Phase 5.3 — DOCX/TXT PDF Generation (DEFERRED)
+## Phase 5.3 — DOCX/TXT PDF Generation
 Generate per-chunk PDF files for DOCX/TXT documents at ingestion time so they have the same `chunk_file_path` → PDF viewer flow as PDF documents.
-**Status**: Deferred. Phase 5.2 fallback links (`/rag-database?document=xxx`) are sufficient. Revisit after Phase 5.4 if plain-text chunk views are still needed alongside highlighted views.
+**Status**: Complete (2026-04-28). Implemented in commit `25b26c9`.
+- `reportlab==4.2.5` added to `requirements.txt`
+- New `backend/app/utils/text_to_pdf.py`: renders chunk text as simple PDFs with word wrapping
+- `ingest.py` DOCX/TXT branches: generates `{stem}_chunk_{idx}.pdf` per chunk, passes `chunk_file_paths` to `extract_metadata()`
+- Graceful degradation: `chunk_file_path` stays `None` on generation failure (logged as warning)
+- Tests: `test_phase5_docx_pdf_generation.py` (5 tests), updated `test_phase1_ingest_page_aware.py` (2 assertions)
 ---
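
The per-chunk naming and graceful-degradation behaviour listed under Phase 5.3 could look roughly like the sketch below. It is not the code from `ingest.py` or `text_to_pdf.py`. The names `wrap_chunk_text`, `generate_chunk_pdfs`, and the `render` callback are hypothetical, the zero-based `idx` is an assumption, and the PDF renderer is injected as a callable so the sketch runs without `reportlab` installed.

```python
import logging
import textwrap
from pathlib import Path
from typing import Callable, List, Optional

logger = logging.getLogger(__name__)

# Hypothetical renderer signature: (wrapped_lines, output_path) -> None.
# In the real project this role is presumably played by text_to_pdf.py.
Renderer = Callable[[List[str], Path], None]


def wrap_chunk_text(text: str, width: int = 90) -> List[str]:
    """Word-wrap chunk text, keeping blank lines between paragraphs."""
    lines: List[str] = []
    for paragraph in text.splitlines():
        # textwrap.wrap returns [] for an empty string, so keep the blank line.
        lines.extend(textwrap.wrap(paragraph, width=width) or [""])
    return lines


def generate_chunk_pdfs(
    source_path: str,
    chunks: List[str],
    out_dir: str,
    render: Renderer,
) -> List[Optional[str]]:
    """Return one PDF path per chunk, or None where generation failed."""
    stem = Path(source_path).stem
    target_dir = Path(out_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    paths: List[Optional[str]] = []
    for idx, chunk in enumerate(chunks):  # assumption: idx starts at 0
        target = target_dir / f"{stem}_chunk_{idx}.pdf"
        try:
            render(wrap_chunk_text(chunk), target)
            paths.append(str(target))
        except Exception as exc:
            # Graceful degradation: log a warning and leave the path as None.
            logger.warning("PDF generation failed for %s: %s", target.name, exc)
            paths.append(None)
    return paths
```

Catching the failure per chunk means one bad chunk does not abort ingestion of the rest of the document; the resulting list can then be passed on as `chunk_file_paths`, with `None` entries falling back to whatever link the viewer uses when no chunk PDF exists.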