docs: mark Phase 5.3 complete in enhancement plan
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
parent 25b26c9b48
commit ec3b5a4ae1
@@ -247,11 +247,16 @@ If `with_structured_output()` causes issues in production:
 ---
 
-## Phase 5.3 — DOCX/TXT PDF Generation (DEFERRED)
+## Phase 5.3 — DOCX/TXT PDF Generation ✅
 
 Generate per-chunk PDF files for DOCX/TXT documents at ingestion time so they have the same `chunk_file_path` → PDF viewer flow as PDF documents.
 
-**Status**: Deferred. Phase 5.2 fallback links (`/rag-database?document=xxx`) are sufficient. Revisit after Phase 5.4 if plain-text chunk views are still needed alongside highlighted views.
+**Status**: Complete (2026-04-28). Implemented in commit `25b26c9`.
+- `reportlab==4.2.5` added to `requirements.txt`
+- New `backend/app/utils/text_to_pdf.py`: renders chunk text as simple PDFs with word wrapping
+- `ingest.py` DOCX/TXT branches: generates `{stem}_chunk_{idx}.pdf` per chunk, passes `chunk_file_paths` to `extract_metadata()`
+- Graceful degradation: `chunk_file_path` stays `None` on generation failure (logged as warning)
+- Tests: `test_phase5_docx_pdf_generation.py` (5 tests), updated `test_phase1_ingest_page_aware.py` (2 assertions)
 
 ---
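
For readers skimming the hunk, the per-chunk naming and the fallback to `None` described in the added bullets might look roughly like the sketch below. It is an illustration only, not the code from `ingest.py`: `build_chunk_pdfs` and the `render` callable are hypothetical names, zero-based chunk indices are an assumption, and the actual renderer in `text_to_pdf.py` (built on reportlab) is replaced by an injected function so the example stays stdlib-only.

```python
import logging
from pathlib import Path
from typing import Callable, List, Optional

logger = logging.getLogger(__name__)


def build_chunk_pdfs(
    source_path: str,
    chunks: List[str],
    out_dir: str,
    render: Callable[[str, Path], None],
) -> List[Optional[str]]:
    """Return one PDF path per chunk, or None where rendering failed.

    `render` is a hypothetical stand-in for the real text-to-PDF helper.
    """
    stem = Path(source_path).stem
    target_dir = Path(out_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    chunk_file_paths: List[Optional[str]] = []
    for idx, text in enumerate(chunks):  # assumption: indices start at 0
        pdf_path = target_dir / f"{stem}_chunk_{idx}.pdf"
        try:
            render(text, pdf_path)
            chunk_file_paths.append(str(pdf_path))
        except Exception as exc:
            # Graceful degradation: log a warning and keep ingesting.
            logger.warning("PDF generation failed for %s: %s", pdf_path.name, exc)
            chunk_file_paths.append(None)
    return chunk_file_paths
```

Returning a list aligned with the chunk order keeps a failed chunk from shifting the paths of the chunks after it, which is one plausible way to let a downstream step pair each chunk with its `chunk_file_path`.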