Commit Graph

16 Commits

Author SHA1 Message Date
Woody 032dd75e17 feat: add Sub-Phase 9.3 evaluation API endpoint and 9.4 polish 2026-05-25 19:30:17 +08:00
Woody ac81df0704 feat: add Sub-Phase 9.1 results generation APIs with reusable RAGPipeline 2026-05-25 18:35:55 +08:00
Woody b05c361fbd revert: remove Phase 3 YouTube proxy — all 7 sub-phases
Reverts commits 284028b through b4096d6. Phase 4 (System Audio Capture)
will replace the YouTube use case with a more versatile getDisplayMedia approach.

Removed: YouTube router, HLS proxy, YouTubeService, YouTubeInput,
YouTubeVideoPlayer, useYouTubeASR hook, all Phase 3 tests, hls.js dep,
YouTube config fields, YouTube README/plan sections.

Modified files restored to pre-Phase-3 state: LTTPage (no source toggle),
api.ts (no YouTube extract), types (no YouTube types), config.py (no
youtube fields), main.py (no YouTube router), requirements.txt (no yt-dlp),
.env.example (no YouTube vars), package.json (no hls.js).

Relevant Phase 2 code preserved: ws_asr.py (unchanged), useVideoASR,
VideoPlayer, VideoUpload, QueryInput, Full Transcript.
2026-05-09 21:07:21 +08:00
Woody 284028bb1f feat: Phase 3.1 + 3.2 — YouTube config infra and URL extraction
Phase 3.1 — Configuration & Infrastructure:
- Add youtube_proxy_enabled, yt_dlp_timeout, yt_dlp_cache_ttl config fields
- Add yt-dlp and hls.js dependencies
- Create models/youtube.py (request/response schemas)
- Create service stubs (youtube_service, hls_proxy)
- Create router stub and register in main.py
- 11 config tests

Phase 3.2 — YouTube URL Extraction:
- yt-dlp wrapper with async extraction (run_in_executor)
- Format selection: ≤480p video-only + highest-bitrate audio (VOD)
- Combined format fallback: same URL for live streams
- In-memory URL cache: 5min TTL live, 30min VOD
- lru_cache singleton service for cache persistence
- Error handling: DownloadError → 200 with error field
- 18 extract tests, 82/82 total pass (zero regressions)

Real-URL verified: VOD (5bF3tkO5jAA) 24 formats, Live (fN9uYWCjQaw) 6 HLS
2026-05-09 15:53:04 +08:00
Woody 9934749d2b feat: Phase 2.1 config + infrastructure and 2.2 video upload backend
- Add DashScope ASR and video upload config fields to Settings
- Create Pydantic models (video.py, asr.py)
- Create VideoService with validation, save, serve, delete
- Create ASR client stub with float32_to_s16le utility
- Implement POST /api/v1/video/upload with streaming validation
- Implement GET /api/v1/video/{video_id} with FileResponse
- Create WebSocket ASR endpoint stub
- Register new routers in main.py
- Update .env.example and requirements.txt
- Add reference examples for DashScope integration
- 8 tests passing (3 config + 5 video upload)
2026-05-06 13:08:19 +08:00
Woody a56f8f69e2 feat: add highlight batch and GET endpoints (Phase 5.4.5)
- POST /api/v1/v2/highlights/batch: compute and cache highlights for cited chunks
- GET /api/v1/v2/highlights: serve cached highlighted HTML pages
- chunks.py router registered in main.py
- Dynamic DB path computation (prompts.db -> highlights.db), no Settings changes
- 7 endpoint tests: POST 200/422, GET 200/404, mock service verification
2026-04-29 09:26:50 +08:00
Woody 4ad9deeccb feat(deploy): add Dockerfile, compose, nginx config, and README
Multi-stage Dockerfile: Node builds frontend, Python serves both API
and static files. docker-compose.yml with named volumes for ChromaDB,
chunks, and SQLite data. nginx.conf as reverse proxy with 350M upload
limit and 300s LLM proxy timeout. README with dev setup, deploy steps,
env vars table, and architecture diagram.

Backend main.py: add catch-all route to serve frontend/dist/static
files in production. Only activates when dist/ exists.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-27 17:17:53 +08:00
Woody 475306f2b1 feat(history): Phase 3.5 — Query History backend (service, API, timing, XML capture) 2026-04-25 22:59:53 +08:00
Woody e49a68b0bd feat(prompts): Phase 3.2 — Prompt Backend (CRUD service, REST API, 33 tests)
- PromptService (services/prompt_service.py): full CRUD for 3 profiles A/B/C
  with seed template reset, validation, and sqlite3.Row access
- REST API (routers/prompts.py): 6 endpoints on /api/v1/prompts
- Pydantic models (models/prompts.py): 6 schemas
- DI wiring (dependencies.py): get_prompt_service()
- App registration (main.py): prompts router
- Mock fixture (conftest.py): mock_prompt_service
- Tests: test_phase3_prompt_service.py (22) + test_phase3_prompts_router.py (11)
- 162/166 total pass, 4 skipped, 0 fail
2026-04-25 21:11:17 +08:00
Woody f4b404f27d feat(db): Phase 3.1 — SQLite infrastructure (prompts.db + history.db)
- Add sqlite_db.py with dual-DB connection factories (WAL mode, foreign keys)
- init_prompts_db() creates system_prompt_profiles + system_prompts tables
- init_history_db() creates query_history table + created_at index
- seed_default_profiles() inserts 3 profiles (A/B/C) x 3 steps each
- All 3 profiles start with identical seed templates; Profile A active
- Add prompts_db_path + history_db_path to config (./data/ default)
- Startup init in main.py creates data/ dir, inits both DBs, seeds profiles
- Add PROMPTS_DB_PATH + HISTORY_DB_PATH to .env.example
- Add data/ to .gitignore
- 17 new tests in test_phase3_sqlite_db.py (all passing)
2026-04-25 20:29:29 +08:00
Woody f21085b3df feat(backend): add documents CRUD endpoints and tests
Add 4 REST endpoints for RAG database management: GET /documents, GET /documents/{id}/chunks, DELETE /documents/{id}, DELETE /chunks/{id}. Register documents router in main.py. 8 unit tests covering all CRUD operations.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-23 19:02:28 +08:00
Woody e83a4708b5 feat(backend): add rotating file logging to backend/app/log/
- Configure RotatingFileHandler in main.py (10MB per file, 5 backups)

- Log directory auto-created on startup

- Add backend/app/log/ to .gitignore

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-23 14:09:48 +08:00
Woody 4cf930dc59 feat(backend): add dependency injection and update main entry point
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-23 13:27:30 +08:00
Woody 181f4eca5b feat: Phase 1.3 query pipeline with decomposition, relevance filter, and response
- Add QueryDecomposer: extracts keywords from question via LLM JSON response
- Add RelevanceFilter: batch scores chunks 0-10, filters by threshold
- Add POST /api/v1/query endpoint with full 3-step pipeline:
  1. QueryDecomposer.decompose() → keywords
  2. RAGService.retrieve() → chunks from ChromaDB
  3. RelevanceFilter.filter() → score and filter chunks
  4. RAGService.generate_response() → bullet-point answer
- Fix SourceMetadata.upload_date type from datetime to str for flexibility
- Test-first: 13 new tests pass (5 decomposer, 5 relevance filter, 3 query endpoint)
- All Phase 1 tests: 41 passed, 2 skipped
2026-04-22 17:19:21 +08:00
Woody d94abaac77 feat: Phase 1.2 ingestion pipeline with chunking and metadata
- Add document parsers (DOCX, PDF) with lazy imports
- Add TokenChunkingStrategy with ABC for future replacement
- Add metadata extraction (filename, upload_date, content_summary)
- Add RAGService for ChromaDB ingestion/retrieval/response generation
- Add POST /api/v1/ingest endpoint with file validation
- Test-first: 20 passed, 2 skipped (python-docx not installed)
2026-04-22 16:49:52 +08:00
Woody 3712397d64 feat: Phase 1.1 project setup with config, database, and models
- Add requirements.txt with all dependencies
- Add .env.example with required environment variables
- Add Pydantic Settings (config.py) with .env loading
- Add ChromaDB persistent client (database.py)
- Add Pydantic schemas (ingest.py) for request/response
- Add FastAPI main.py with CORS middleware
- Add package __init__.py files
- Add tests: test_phase1_config.py, test_phase1_database.py
- All 5 tests pass
2026-04-22 16:13:52 +08:00