diff --git a/.plans/package5_enhancement_plan.md b/.plans/package5_enhancement_plan.md
index b82133e..a1b3ee3 100644
--- a/.plans/package5_enhancement_plan.md
+++ b/.plans/package5_enhancement_plan.md
@@ -247,11 +247,16 @@ If `with_structured_output()` causes issues in production:
 
 ---
 
-## Phase 5.3 — DOCX/TXT PDF Generation (DEFERRED)
+## Phase 5.3 — DOCX/TXT PDF Generation ✅
 
 Generate per-chunk PDF files for DOCX/TXT documents at ingestion time so they have the same `chunk_file_path` → PDF viewer flow as PDF documents.
 
-**Status**: Deferred. Phase 5.2 fallback links (`/rag-database?document=xxx`) are sufficient. Revisit after Phase 5.4 if plain-text chunk views are still needed alongside highlighted views.
+**Status**: Complete (2026-04-28). Implemented in commit `25b26c9`.
+- `reportlab==4.2.5` added to `requirements.txt`
+- New `backend/app/utils/text_to_pdf.py`: renders chunk text as simple PDFs with word wrapping
+- `ingest.py` DOCX/TXT branches: generates `{stem}_chunk_{idx}.pdf` per chunk, passes `chunk_file_paths` to `extract_metadata()`
+- Graceful degradation: `chunk_file_path` stays `None` on generation failure (logged as warning)
+- Tests: `test_phase5_docx_pdf_generation.py` (5 tests), updated `test_phase1_ingest_page_aware.py` (2 assertions)
 
 ---
 
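
The per-chunk naming and graceful-degradation behavior described in the status notes could look roughly like the sketch below. It is not the code from commit `25b26c9`: the function name `generate_chunk_pdfs`, the `render` callable (a stand-in for whatever `text_to_pdf.py` actually exposes), and the zero-based chunk index are all assumptions made for illustration.

```python
import logging
from pathlib import Path
from typing import Callable, List, Optional, Sequence

logger = logging.getLogger(__name__)


def generate_chunk_pdfs(
    source_path: str,
    chunks: Sequence[str],
    out_dir: str,
    render: Callable[[str, Path], None],
) -> List[Optional[str]]:
    """Render one PDF per chunk and return a path (or None) for each chunk.

    `render` is a hypothetical stand-in for the real text-to-PDF helper;
    it receives the chunk text and the target path and writes the file.
    """
    stem = Path(source_path).stem
    target_dir = Path(out_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    chunk_file_paths: List[Optional[str]] = []
    for idx, text in enumerate(chunks):  # zero-based index is an assumption
        target = target_dir / f"{stem}_chunk_{idx}.pdf"
        try:
            render(text, target)
            chunk_file_paths.append(str(target))
        except Exception as exc:
            # Graceful degradation: keep ingesting, leave this path as None.
            logger.warning(
                "PDF generation failed for chunk %d of %s: %s",
                idx, source_path, exc,
            )
            chunk_file_paths.append(None)
    return chunk_file_paths
```

The returned list has one entry per chunk, so it can be passed along as `chunk_file_paths` and a failed chunk simply falls back to the Phase 5.2 link instead of aborting ingestion.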