New pdf_extractor.py with extract_page_as_pdf() and extract_pages_as_pdf() for extracting individual PDF pages as separate files. Adds document_chunk_path setting to config and document_chunk/ to .gitignore.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

| Name |
|---|
| __init__.py |
| chunking.py |
| docx_parser.py |
| metadata.py |
| pdf_extractor.py |
| pdf_parser.py |