legco_ai_assistant/backend/app
Woody 25b26c9b48 feat(ingest): generate per-chunk PDFs for DOCX/TXT documents (Phase 5.3)
DOCX and TXT ingestion now produces a chunk_file_path and a per-chunk PDF file for every chunk, matching the PDF ingestion flow. reportlab renders each chunk's text as a simple PDF with automatic text wrapping.

- Add reportlab==4.2.5 to requirements.txt
- New utils/text_to_pdf.py: generate_text_pdf() renders chunk text as PDF
- Ingest router DOCX/TXT branches: generate chunk_N.pdf per chunk, store in chunk_file_paths
- Graceful degradation: chunk_file_path stays None if PDF generation fails
- Update test_phase1_ingest_page_aware.py assertions: DOCX chunks now HAVE chunk_file_path
- New test_phase5_docx_pdf_generation.py: 5 tests (DOCX PDF gen, TXT PDF gen, PDF regression, file count, graceful degradation)
- 361 backend tests pass (4 pre-existing embedding failures unrelated)
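
A minimal sketch of how the per-chunk rendering and the graceful-degradation rule in the list above might fit together. It is not the code from this commit: the commit names generate_text_pdf() but does not show its signature, so the parameters here are assumed, and render_chunk_pdfs(), the `render` argument, and the zero-based chunk_N.pdf numbering are hypothetical names and choices introduced for illustration only.

```python
from pathlib import Path
from typing import Callable, List, Optional
from xml.sax.saxutils import escape


def generate_text_pdf(text: str, output_path: Path) -> None:
    """Render plain text as a simple PDF (assumed signature, see lead-in)."""
    # Imported lazily so a missing reportlab install raises at call time,
    # which the caller below treats as a soft failure.
    from reportlab.lib.pagesizes import A4
    from reportlab.lib.styles import getSampleStyleSheet
    from reportlab.platypus import Paragraph, SimpleDocTemplate

    style = getSampleStyleSheet()["Normal"]
    # Paragraph wraps text to the page width on its own. It also parses
    # inline markup, so &, < and > are escaped first.
    story = [
        Paragraph(escape(block), style)
        for block in text.split("\n\n")
        if block.strip()
    ]
    SimpleDocTemplate(str(output_path), pagesize=A4).build(story)


def render_chunk_pdfs(
    chunks: List[str],
    out_dir: Path,
    render: Callable[[str, Path], None] = generate_text_pdf,
) -> List[Optional[str]]:
    """Hypothetical helper: return one chunk_file_path (or None) per chunk."""
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_file_paths: List[Optional[str]] = []
    for index, chunk_text in enumerate(chunks):
        # Zero-based numbering is an assumption; the commit only says chunk_N.pdf.
        pdf_path = out_dir / f"chunk_{index}.pdf"
        try:
            render(chunk_text, pdf_path)
            chunk_file_paths.append(str(pdf_path))
        except Exception:
            # Graceful degradation: ingestion continues and this chunk's
            # path stays None.
            chunk_file_paths.append(None)
    return chunk_file_paths
```

Passing the renderer as an argument keeps the failure path easy to exercise in a test: a stub that raises should yield None for that chunk without aborting the rest of the document.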

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-28 17:32:22 +08:00
core feat(prompts): enforce bullet-point output in generate template 2026-04-28 16:42:55 +08:00
models feat: structured LLM output for decompose + citation fuzzy matching (Phase 5) 2026-04-28 15:39:17 +08:00
routers feat(ingest): generate per-chunk PDFs for DOCX/TXT documents (Phase 5.3) 2026-04-28 17:32:22 +08:00
services feat(llm): log structured LLM response and extra_body 2026-04-28 16:50:26 +08:00
test feat(ingest): generate per-chunk PDFs for DOCX/TXT documents (Phase 5.3) 2026-04-28 17:32:22 +08:00
utils feat(ingest): generate per-chunk PDFs for DOCX/TXT documents (Phase 5.3) 2026-04-28 17:32:22 +08:00
__init__.py feat: Phase 1.1 project setup with config, database, and models 2026-04-22 16:13:52 +08:00
main.py feat(deploy): add Dockerfile, compose, nginx config, and README 2026-04-27 17:17:53 +08:00