legco_ai_assistant/backend/app/test
Woody 25b26c9b48 feat(ingest): generate per-chunk PDFs for DOCX/TXT documents (Phase 5.3)
DOCX and TXT ingestion now produces chunk_file_path + per-chunk PDF files matching the PDF ingestion flow. Uses reportlab to render chunk text as simple PDFs with automatic text wrapping.

- Add reportlab==4.2.5 to requirements.txt
- New utils/text_to_pdf.py: generate_text_pdf() renders chunk text as PDF
- Ingest router DOCX/TXT branches: generate chunk_N.pdf per chunk, store in chunk_file_paths
- Graceful degradation: chunk_file_path stays None if PDF generation fails
- Update test_phase1_ingest_page_aware.py assertions: DOCX chunks now HAVE chunk_file_path
- New test_phase5_docx_pdf_generation.py: 5 tests (DOCX PDF gen, TXT PDF gen, PDF regression, file count, graceful degradation)
- 361 backend tests pass (4 pre-existing embedding failures are unrelated)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-28 17:32:22 +08:00
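The commit above describes a per-chunk loop: render each DOCX/TXT chunk to chunk_N.pdf, record the path in chunk_file_paths, and leave the entry as None if rendering fails. The following is a minimal sketch of that control flow, not the repository's code. build_chunk_pdfs, the Renderer type, and fake_render are hypothetical names. The real utils/text_to_pdf.generate_text_pdf() (reportlab-based) has a signature not shown here, so the sketch accepts any renderer callable. Zero-based chunk numbering is also an assumption.

```python
import tempfile
from pathlib import Path
from typing import Callable, List, Optional

# Hypothetical renderer signature: write `text` as a PDF file at `out_path`.
# In the commit this role is played by a reportlab-based generate_text_pdf();
# its real signature is NOT shown in the source, so this is an assumption.
Renderer = Callable[[str, Path], None]


def build_chunk_pdfs(
    chunks: List[str], out_dir: Path, render: Renderer
) -> List[Optional[str]]:
    """Render each chunk to chunk_N.pdf; return one path (or None) per chunk."""
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_file_paths: List[Optional[str]] = []
    # Assumption: chunks are numbered from 0.
    for n, text in enumerate(chunks):
        pdf_path = out_dir / f"chunk_{n}.pdf"
        try:
            render(text, pdf_path)
            chunk_file_paths.append(str(pdf_path))
        except Exception:
            # Graceful degradation: a failed render leaves this chunk's
            # path as None instead of aborting the whole ingest.
            chunk_file_paths.append(None)
    return chunk_file_paths


def fake_render(text: str, path: Path) -> None:
    """Hypothetical stand-in renderer that fails on empty text."""
    if not text:
        raise ValueError("empty chunk")
    path.write_bytes(b"%PDF-1.4\n" + text.encode("utf-8"))


# Usage: the middle chunk fails to render and degrades to None.
with tempfile.TemporaryDirectory() as d:
    paths = build_chunk_pdfs(["alpha", "", "gamma"], Path(d), fake_render)
    written = [p is not None and Path(p).exists() for p in paths]
```

Injecting the renderer keeps the degradation logic testable without reportlab installed, which mirrors the kind of failure case the listed test_phase5_docx_pdf_generation.py "graceful degradation" test exercises.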
acceptance test(backend): add Phase 4 integration and acceptance tests 2026-04-26 23:29:09 +08:00
conftest.py feat(prompts): integrate filter_per_subq with PromptService, fix seed bugs, restructure UI 2026-04-27 11:14:27 +08:00
test_phase1_chunk_serving.py refactor(test): rewrite tests to comply with integration-first rules 2026-04-27 11:46:58 +08:00
test_phase1_chunking.py feat: Phase 1.2 ingestion pipeline with chunking and metadata 2026-04-22 16:49:52 +08:00
test_phase1_config.py feat: Phase 1.1 project setup with config, database, and models 2026-04-22 16:13:52 +08:00
test_phase1_database.py feat(history): Phase 3.5 — Query History backend (service, API, timing, XML capture) 2026-04-25 22:59:53 +08:00
test_phase1_documents_router.py refactor(test): rewrite tests to comply with integration-first rules 2026-04-27 11:46:58 +08:00
test_phase1_enhanced_metadata.py refactor(test): rewrite tests to comply with integration-first rules 2026-04-27 11:46:58 +08:00
test_phase1_ingest.py refactor(test): rewrite tests to comply with integration-first rules 2026-04-27 11:46:58 +08:00
test_phase1_ingest_page_aware.py feat(ingest): generate per-chunk PDFs for DOCX/TXT documents (Phase 5.3) 2026-04-28 17:32:22 +08:00
test_phase1_llm_client.py test(backend): update unit tests for LLM monitoring changes 2026-04-23 14:52:41 +08:00
test_phase1_metadata.py feat: Phase 1.2 ingestion pipeline with chunking and metadata 2026-04-22 16:49:52 +08:00
test_phase1_page_aware_chunking.py feat(backend): add page-aware chunking with adjacent-page overlap 2026-04-24 10:30:18 +08:00
test_phase1_parsers.py feat: Phase 1.2 ingestion pipeline with chunking and metadata 2026-04-22 16:49:52 +08:00
test_phase1_pdf_extractor.py feat(backend): add PDF page extractor and chunk PDF storage config 2026-04-24 10:52:57 +08:00
test_phase1_pdf_parser_pages.py feat(backend): add page-aware PDF parsing with per-page text extraction 2026-04-24 10:30:04 +08:00
test_phase1_query.py refactor(test): rewrite tests to comply with integration-first rules 2026-04-27 11:46:58 +08:00
test_phase1_query_decomposer.py feat(history): Phase 3.5 — Query History backend (service, API, timing, XML capture) 2026-04-25 22:59:53 +08:00
test_phase1_rag_service.py refactor(test): rewrite tests to comply with integration-first rules 2026-04-27 11:46:58 +08:00
test_phase1_relevance_filter.py test(backend): extend existing tests for per-sub-q methods and templates 2026-04-26 23:29:27 +08:00
test_phase2_asr_client.py init: project setup with AGENTS.md, test structure, and plan directory 2026-04-22 15:22:29 +08:00
test_phase2_video_upload.py init: project setup with AGENTS.md, test structure, and plan directory 2026-04-22 15:22:29 +08:00
test_phase2_ws_asr.py init: project setup with AGENTS.md, test structure, and plan directory 2026-04-22 15:22:29 +08:00
test_phase3_history_router.py refactor(test): rewrite tests to comply with integration-first rules 2026-04-27 11:46:58 +08:00
test_phase3_history_service.py feat(db): update history schema and generate prompt template for Package 4 2026-04-26 23:28:28 +08:00
test_phase3_prompt_injection.py refactor(test): rewrite tests to comply with integration-first rules 2026-04-27 11:46:58 +08:00
test_phase3_prompt_service.py feat(prompts): integrate filter_per_subq with PromptService, fix seed bugs, restructure UI 2026-04-27 11:14:27 +08:00
test_phase3_prompts_router.py feat(prompts): integrate filter_per_subq with PromptService, fix seed bugs, restructure UI 2026-04-27 11:14:27 +08:00
test_phase3_query_history_integration.py refactor(test): rewrite tests to comply with integration-first rules 2026-04-27 11:46:58 +08:00
test_phase3_sqlite_db.py feat(prompts): integrate filter_per_subq with PromptService, fix seed bugs, restructure UI 2026-04-27 11:14:27 +08:00
test_phase4_generate_per_subq.py refactor(test): rewrite tests to comply with integration-first rules 2026-04-27 11:46:58 +08:00
test_phase4_history_format.py test(backend): add Phase 4 unit tests for generate, format, history, prompts 2026-04-26 23:28:58 +08:00
test_phase4_integration_query_pipeline.py refactor(test): rewrite tests to comply with integration-first rules 2026-04-27 11:46:58 +08:00
test_phase4_prompt_templates.py feat(prompts): integrate filter_per_subq with PromptService, fix seed bugs, restructure UI 2026-04-27 11:14:27 +08:00
test_phase4_query_router_filter.py test(backend): add Phase 4 unit tests for retrieval and filtering 2026-04-26 23:28:45 +08:00
test_phase4_query_router_retrieval.py test(backend): add Phase 4 unit tests for retrieval and filtering 2026-04-26 23:28:45 +08:00
test_phase4_relevance_filter_per_subq.py fix(relevance): tolerate LLM score count mismatches via padding instead of discarding 2026-04-27 14:31:18 +08:00
test_phase4_response_format.py refactor(test): rewrite tests to comply with integration-first rules 2026-04-27 11:46:58 +08:00
test_phase4_retrieve_per_subquestion.py refactor(test): rewrite tests to comply with integration-first rules 2026-04-27 11:46:58 +08:00
test_phase5_decompose_logging.py feat: structured LLM output for decompose + citation fuzzy matching (Phase 5) 2026-04-28 15:39:17 +08:00
test_phase5_docx_pdf_generation.py feat(ingest): generate per-chunk PDFs for DOCX/TXT documents (Phase 5.3) 2026-04-28 17:32:22 +08:00
test_phase5_llm_client_structured.py feat: structured LLM output for decompose + citation fuzzy matching (Phase 5) 2026-04-28 15:39:17 +08:00
test_phase5_query_decomposer_structured.py feat: structured LLM output for decompose + citation fuzzy matching (Phase 5) 2026-04-28 15:39:17 +08:00
test_phase5_subquestions_model.py feat: structured LLM output for decompose + citation fuzzy matching (Phase 5) 2026-04-28 15:39:17 +08:00
test_phaseX_export.py feat(prompts): add JSON export/import for profile prompt configurations 2026-04-27 19:44:35 +08:00
test_phaseX_import.py feat(prompts): add JSON export/import for profile prompt configurations 2026-04-27 19:44:35 +08:00