legco_ai_assistant

Commit Graph

Author	SHA1	Message	Date
Woody	1e8773469e	Merge branch 'Phase4-dev'	2026-05-14 23:29:42 +08:00
Woody	624df8cf9a	fix: no text displayed during mic capture DashScope realtime ASR sends utterance-completed (final) events without incremental deltas. The onmessage handler cleared partialTranscript on every final, so text never appeared. Set partialTranscript to full_text on final messages instead of clearing it, keeping the transcript visible in QueryInput.	2026-05-14 23:25:39 +08:00
Woody	7c03137577	fix: mic transcript disappearing after stop useMediaStreamASR cleanup() cleared partialTranscript on stop, causing live ASR text to vanish from QueryInput. Unlike video ASR (which has onFinalTranscript to persist via queryText), mic and system-audio hooks rely on partialTranscript for display. Keep partialTranscript populated with the final transcript instead of clearing it.	2026-05-14 23:19:11 +08:00
Woody	7bff4308b7	feat: Phase 4 — System Audio & Listen Mic capture into ASR → RAG Adds two new live audio sources alongside file Upload: - System Audio: getDisplayMedia() captures system/tab audio output, pipes through WebSocket → DashScope realtime ASR → RAG. - Listen Mic: getUserMedia() captures microphone input via the same audio pipeline (shared useMediaStreamASR hook). Backend: feature toggles (system_audio_enabled, mic_enabled) in config.py, source query param gating in ws_asr.py, 10 config tests. Bug fix: getDisplayMedia() rejected video:false per W3C spec — changed to video:true then stop video tracks to allow audio-only capture on Windows/macOS Chrome.	2026-05-14 22:55:06 +08:00
Woody	a8a2cc0940	fix: enable Half Question/Final Submit during interim ASR text isDisabled, handleSubmit, and Half Question onClick all checked question.trim() instead of displayValue.trim(). Since question state is only updated on onFinalTranscript (complete sentences), interim ASR delta text shown in the textarea via partialText was invisible to the disabled check — buttons stayed disabled until sentence end. Fix: use displayValue which includes partialText when user hasn't typed.	2026-05-14 21:55:07 +08:00
Woody	17db487dbb	feat: Phase 3 — Half Question button, Final Submit rename, ASR text always black - Backend: add stop_after_decompose flag to QueryRequest, early-return after decomposition in SSE stream with half_question:true event - Frontend: add decomposeOnly method to useQueryDocumentStream hook - QueryInput: remove grey italic from ASR partial text, rename Submit to Final Submit, add gray Half Question button that decomposes without clearing querybox text - LTTPage: wire handleHalfQuestion to decomposeOnly	2026-05-14 21:27:21 +08:00
Woody	64a7a8a46b	chore: add pnpm lockfiles, Phase 4 plan, and dev plan status update	2026-05-14 20:26:17 +08:00
Woody	2501a2c3c0	docs: use pnpm instead of npm in dev commands	2026-05-14 20:22:33 +08:00
Woody	5832a854c5	chore: remove Phase 3 plan file after revert	2026-05-09 21:14:20 +08:00
Woody	b05c361fbd	revert: remove Phase 3 YouTube proxy — all 7 sub-phases Reverts commits `284028b` through `b4096d6`. Phase 4 (System Audio Capture) will replace the YouTube use case with a more versatile getDisplayMedia approach. Removed: YouTube router, HLS proxy, YouTubeService, YouTubeInput, YouTubeVideoPlayer, useYouTubeASR hook, all Phase 3 tests, hls.js dep, YouTube config fields, YouTube README/plan sections. Modified files restored to pre-Phase-3 state: LTTPage (no source toggle), api.ts (no YouTube extract), types (no YouTube types), config.py (no youtube fields), main.py (no YouTube router), requirements.txt (no yt-dlp), .env.example (no YouTube vars), package.json (no hls.js). Relevant Phase 2 code preserved: ws_asr.py (unchanged), useVideoASR, VideoPlayer, VideoUpload, QueryInput, Full Transcript.	2026-05-09 21:07:21 +08:00
Woody	b4096d6afc	feat: Phase 3.7 — Polish, PO token handling, docs, deployment verification - PO token handling: _is_po_token_error() detects YouTube bot-detection errors, invalidates cache on detection, logs warning for retry guidance (2 new tests) - README: YouTube Live Stream Proxy section with architecture, usage, config, limits - development_plan: Phase 3 complete, timeline updated, status → Phase 1-3 Complete - Dockerfile/compose: verified OK (ffmpeg + yt-dlp already present, no new volumes) - npm build: 1403 modules, production build clean - 59/59 backend + 44/44 frontend Phase 2+3 tests pass - Plan: 3.7 Complete, 7/7 sub-phases done	2026-05-09 17:27:54 +08:00
Woody	cee859d5d7	feat: Phase 3.6 — integration + acceptance tests for YouTube proxy - test_integration_phase3.py: 6 tests Extract→proxy flow (VOD manifest, VOD segment, live manifest), cache hit bypasses yt-dlp, upstream 404→502, extract disabled→503 Mocked yt-dlp, real FastAPI TestClient + HLSProxyService - test_acceptance_phase3_youtube.py: 3 tests Real YouTube VOD extraction, manifest proxy, segment proxy Follows master→variant→segment chain, verifies MPEG-TS sync byte - test_acceptance_phase3_live.py: 3 tests Real live stream extraction, no #EXT-X-ENDLIST assertion, cache refresh verification, graceful skip when offline - 201/201 CI pass (234 backend Phase 1-3, zero Phase 3 regressions) - Updated plan: 3.6 Complete, 6/7 sub-phases done	2026-05-09 17:18:55 +08:00
Woody	1699a249b0	feat: Phase 3.5 — YouTube → ASR integration with source toggle - useYouTubeASR.ts: adapted from useVideoASR, captures audio from HTMLAudioElement (hls.js → <audio> → AudioContext.createMediaElementSource → ScriptProcessorNode → WebSocket) Play/pause events on videoElement; same return shape as useVideoASR - LTTPage.tsx: Source toggle (Upload/YouTube tabs), YouTubeInput + YouTubeVideoPlayer wired with handleExtractSuccess → handleAudioReady → useYouTubeASR Full Transcript button hidden for YouTube source; unified asr variable - QueryInput.tsx: no changes needed (already supports partialText/value from any source) - Tests: 18 new (11 useYouTubeASR, 7 LTTPage integration) - 189/189 total pass (zero regressions) - Updated plan: 3.5 marked Complete, 5/7 sub-phases done	2026-05-09 17:00:32 +08:00
Woody	a8eea54c0f	feat: Phase 3.4 — YouTube Input + Video Player frontend components - YouTubeInput.tsx: URL input with validation (youtube.com/watch, youtu.be, /live/, /shorts/), loading/error states, Load Stream button, uses useYouTubeExtract mutation - YouTubeVideoPlayer.tsx: dual hls.js (video + hidden audio), forwardRef, thumbnail placeholder until play, LIVE badge, quality capped ≤480p, onAudioReady callback for ASR hook exposure, dynamic import('hls.js') - Types: YouTubeFormat, YouTubeStreamResponse interfaces - API: extractYouTubeStream() — POST /youtube/extract - Query: useYouTubeExtract() TanStack Query mutation hook - Tests: 16 new (7 YouTubeInput, 9 YouTubeVideoPlayer) - 171/171 total pass (zero regressions) - Updated plan: 3.4 marked Complete, 4/7 sub-phases done	2026-05-09 16:43:42 +08:00
Woody	3c9ed2cc8d	feat: Phase 3.3 — HLS manifest proxy with line-by-line rewriting - HLSProxyService: rewrite_manifest() rewrites segment/sub-manifest/EXT-X-KEY URIs to proxy URLs; proxy_segment() transparently proxies .ts segments - Route: upstream status checked before streaming — 502 on failure - CORS access-control-allow-origin: * on all responses - Line rewriting: pass-through tags/comments, rewrite URIs, handle relative/absolute URLs - URL resolution: urljoin for relative, absolute path, and absolute URL - 22 tests (8 line rewriting, 4 URL resolution, 3 proxy URL construction, 2 manifest integration, 1 segment proxying, 4 route integration) - 104/104 total pass (zero regressions)	2026-05-09 16:13:33 +08:00
Woody	284028bb1f	feat: Phase 3.1 + 3.2 — YouTube config infra and URL extraction Phase 3.1 — Configuration & Infrastructure: - Add youtube_proxy_enabled, yt_dlp_timeout, yt_dlp_cache_ttl config fields - Add yt-dlp and hls.js dependencies - Create models/youtube.py (request/response schemas) - Create service stubs (youtube_service, hls_proxy) - Create router stub and register in main.py - 11 config tests Phase 3.2 — YouTube URL Extraction: - yt-dlp wrapper with async extraction (run_in_executor) - Format selection: ≤480p video-only + highest-bitrate audio (VOD) - Combined format fallback: same URL for live streams - In-memory URL cache: 5min TTL live, 30min VOD - lru_cache singleton service for cache persistence - Error handling: DownloadError → 200 with error field - 18 extract tests, 82/82 total pass (zero regressions) Real-URL verified: VOD (5bF3tkO5jAA) 24 formats, Live (fN9uYWCjQaw) 6 HLS	2026-05-09 15:53:04 +08:00
Woody	09b5ea7d64	refactor: remove dead _merge_stash, add Phase 3 YouTube proxy plan - Remove _merge_stash (dead code since delta-based ASR refactor) - Replace TestMergeStash with TestTextFieldFormatting (53/53 Phase 2 tests pass) - Mark phase2_enhancement_use_text_field as Complete - Add Phase 3 YouTube live stream proxy implementation plan - README updates	2026-05-09 15:14:01 +08:00
Woody	c8d955c45c	fix: add ffmpeg, uploads volume to Docker deployment for Phase 2 - Dockerfile: install ffmpeg for video audio extraction, create /app/uploads - docker-compose.yml: add uploads_data volume mount - README: add uploads_data to volumes table	2026-05-07 11:32:09 +08:00
Woody	563ef263ed	docs: add DashScope API key to Docker prereqs, ffmpeg install guide, Phase 2 env vars	2026-05-07 11:30:30 +08:00
Woody	78d1f8cc91	feat: delta-based ASR transcript — use text field, utterance boundaries, stash on pause Replace full_text responses with character-level deltas computed from DashScope's monotonically-growing 'text' field. Stash-only events (empty text) are skipped; trailing stash chars sent alongside deltas and appended on pause to complete final sentences. Backend: - Delta = text[len(prev_text):] — simple suffix diff, no merge logic - Track item_id for utterance boundaries, prepend space separator - Send stash alongside delta for frontend pause handler Frontend: - Accumulate deltas locally (transcriptRef += msg.delta) - Store lastStashRef from each message - On pause: append stash to text, fire onFinalTranscript Plan: .plans/phase2_enhancement_delta_sse.md updated to Complete	2026-05-07 11:26:19 +08:00
Woody	cb0ac07786	fix: text accumulation — stashes are sliding windows, merge via overlap detection DashScope stashes are ~7-char rolling windows, not cumulative. Each partial event replaces the previous. Completed events rarely sent. This caused text to jump/replace during streaming and disappear on pause. Backend: - Add _merge_stash() — finds overlapping suffix between successive stashes and appends only new characters, reconstructing full utterance from partials - format_transcription_event returns raw stash for read_events to merge - read_events maintains partial_buffer via _merge_stash, clears on completed - Guard against empty/whitespace-only stashes Frontend: - transcriptRef + onFinalTranscriptRef avoid stale closures in pause handler - stopStreaming fires onFinalTranscript(currentText) before clearing partial - Removed blind setPartialTranscript('') that erased text on pause Tests: 16/16 ws_protocol tests pass, frontend tests unchanged Plan: Updated phase2_implementation_plan.md to Complete with 11-bug log	2026-05-06 20:06:39 +08:00
Woody	fcb9ec1f6c	fix: Phase 2 ASR pipeline — 9 bugs resolved, Full Transcript works end-to-end - Vite proxy: forward /api and /ws to backend port 8000 - WebSocket URL: use backend host, not Vite HMR port - LTTPage: callback ref replaces useRef (video element always null before) - ws_asr: pass DashScope API key to OmniRealtimeConversation - asr_client: fix data_url MIME type (audio/wav), omit extra_body when auto - useFullTranscript: use absolute URL prefix for fetch - QueryInput: add value prop for external Full Transcript injection - QueryInput: fix displayValue || logic (partialText '' overrode question) - ffmpeg: install static binary for audio extraction - Integration tests: 7 tests (upload→transcribe flow) - Acceptance tests: real DashScope tests (skippable) - Structured logging: ws_asr.py + video.py	2026-05-06 18:26:17 +08:00
Woody	f3b94381ae	feat: Phase 2.5 video player, upload UI, and LTTPage layout refactor - VideoUpload: native drag-and-drop with axios progress bar, file validation - VideoPlayer: forwardRef wrapper for <video> element (used by useVideoASR) - LTTPage: replaced VideoPlaceholder, wired useVideoASR/useFullTranscript, Full Transcript button, resizable left/right panels (min 30%) - Tests: 25 new (VideoUpload 8, VideoPlayer 7, LTTPage integration 10)	2026-05-06 14:31:27 +08:00
Woody	a4e067822b	feat: Phase 2.3 ASR proxy + full transcript and 2.4 frontend hooks - Backend: DashScope WebSocket proxy (/ws/asr/{video_id}), DashScopeCallback sync-to-async bridge, ffmpeg audio extraction, POST /video/{id}/transcribe - Frontend: useVideoASR hook (auto on play), useFullTranscript hook, QueryInput partialText prop, VideoUploadResponse types, uploadVideo API - Tests: 41 backend + 26 frontend = 67 new tests, all passing	2026-05-06 13:41:24 +08:00
Woody	9934749d2b	feat: Phase 2.1 config + infrastructure and 2.2 video upload backend - Add DashScope ASR and video upload config fields to Settings - Create Pydantic models (video.py, asr.py) - Create VideoService with validation, save, serve, delete - Create ASR client stub with float32_to_s16le utility - Implement POST /api/v1/video/upload with streaming validation - Implement GET /api/v1/video/{video_id} with FileResponse - Create WebSocket ASR endpoint stub - Register new routers in main.py - Update .env.example and requirements.txt - Add reference examples for DashScope integration - 8 tests passing (3 config + 5 video upload)	2026-05-06 13:08:19 +08:00
Woody	63e4c1a385	docs: add plan for configurable SubQuestions format	2026-05-04 17:22:38 +08:00
Woody	76c3bec2ab	feat: configurable SubQuestions via Step 1.2 system prompt page - Split 'Step 1: Query Decomposition' into Step 1.1 (prompt template) and Step 1.2 (format config with description + max_length) - Add create_subquestions_model() and parse_decompose_format() to decompose.py - QueryDecomposer reads decompose_format from DB, creates dynamic Pydantic model at runtime - PromptEditor renders Step 1.2 as textarea (description) + number input (max_length 1-5) - Graceful fallback to static SubQuestions when decompose_format unavailable	2026-05-04 17:22:14 +08:00
Woody	40b338d3ca	chore: gitignore .research, switch to flash, tighten sub-questions Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-05-04 16:38:58 +08:00
Woody	5535b42ae2	refactor: tighten SubQuestions to 1-3 with Cantonese format hint Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-05-04 15:18:14 +08:00
Woody	df62283f58	feat: inject Pydantic JSON schema into Deepseek prompt (Phase 6) Follows Deepseek JSON Output guide: the prompt now includes the word 'json' and a format example derived from the Pydantic model schema. Added _pydantic_to_json_instruction() helper that builds a human-readable schema description with EXAMPLE JSON OUTPUT. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-05-04 15:17:24 +08:00
Woody	226f4ed700	test: update integration mocks for dual-client architecture (Phase 6) Added complete_structured() to mock classes, split response lists between LLMClientDP (decompose) and LLMClient (filter+generate), and patched both clients in all integration tests. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-05-04 14:59:23 +08:00
Woody	3b5bd79839	feat: wire LLMClientDP into query decompose pipeline (Phase 6) QueryDecomposer now uses LLMClientDP (Deepseek) while RelevanceFilter and RAGService continue using LLMClient (OpenRouter/vLLM). Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-05-04 14:59:08 +08:00
Woody	849beb4d4e	feat: add LLMClientDP for Deepseek decompose (Phase 6) Uses Deepseek's json_object response_format (not json_schema, which Deepseek does not support). Always disables thinking mode. Includes unit tests (12) and acceptance tests (5). Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-05-04 14:58:53 +08:00
Woody	73ae621f3b	feat: add Deepseek config fields and DI wiring (Phase 6) Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-05-04 14:58:39 +08:00
Woody	b6562f3d76	docs: add Package 6 enhancement plan Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-05-04 14:58:24 +08:00
Woody	23c665515d	fix: wrap filter chunks in XML tags for clearer LLM input	2026-04-30 13:59:03 +08:00
Woody	fc6b5463b5	fix: vLLM structured output missing thinking-control extra_body	2026-04-29 21:01:10 +08:00
Woody	16de8394aa	fix: add full input/output logging to vLLM structured output path Log the complete prompt, schema, extra_body content, full API response, token counts, and full parsed JSON output. Add exc_info=True tracebacks on all failure paths.	2026-04-29 16:52:26 +08:00
Woody	3ab6fd102a	fix: use vLLM-native guided_json for structured output vLLM servers support JSON schema enforcement via extra_body (guided_json or structured_outputs), not OpenAI's response_format protocol. LangChain's with_structured_output(method='json_schema') sends response_format which vLLM ignores, causing NoneType not iterable parsing errors. - vLLM path: direct OpenAI SDK call with extra_body={guided_json|structured_outputs} - OpenRouter path: unchanged with_structured_output(method='json_schema') - Try new 'structured_outputs' format first, fall back to legacy 'guided_json' - Update _SEED_DECOMPOSE with explicit JSON array instruction - Add diagnostic logging: exc_info=True, schema preview, prompt template preview - Add logging in _parse_legacy_json for fallback failure debugging	2026-04-29 16:49:14 +08:00
Woody	2aca18d30e	docs: add vLLM structured output fix plan - Diagnose: vLLM ignores OpenAI-native response_format, causing NoneType error - Diagnose: legacy fallback prompt lacks JSON instruction → empty questions - Plan: use vLLM-native guided_json via extra_body instead of with_structured_output - Plan: update _SEED_DECOMPOSE with JSON format instruction - Plan: add diagnostic logging (exc_info, method, schema preview) wip: temporary function_calling switch for vLLM (to be replaced by guided_json)	2026-04-29 16:42:23 +08:00
Woody	cbb958d75d	fix: vLLM chat_template_kwargs breaks LangChain structured output vLLM's chat_template_kwargs leaked into LangChain's AsyncCompletions.parse() via _get_langchain_model's model_kwargs, causing structured decomposition to fail on vLLM backends. Skip vLLM-specific params when building the LangChain model — only provider-agnostic params (OpenAI reasoning) pass through.	2026-04-29 16:07:44 +08:00
Woody	90269608bc	fix: display highlight tracking data in history page UI - Add highlight_prompt, highlight_response, highlight_time_ms to QueryHistoryDetail type - Add 'Highlights' bar segment with pink color in TimingBar component - Pass highlightTimeMs to TimingBar in HistoryCard expanded view - Add collapsible sections for highlight prompt and response in HistoryCard detail	2026-04-29 13:42:08 +08:00
Woody	41f59b396f	feat: track highlight generation prompt, response, and timing in history (Phase 5.5) - Add 3 columns to query_history: highlight_prompt, highlight_response, highlight_time_ms - HistoryService.update_highlights() updates existing row after batch LLM call - ChunkHighlightService measures timing, captures prompt and structured JSON response - SSE completed event includes history_id for frontend to pass back - Frontend captures historyId, passes as ?history_id= query param in batch POST - Highlight time tracked separately (excluded from total_time_ms) - All 153 tests pass (108 backend + 45 frontend)	2026-04-29 11:18:21 +08:00
Woody	36dedab485	docs: finalize Phase 5 enhancement plan with completion status - Mark Phase 5.4 complete with actual commit log - Add Phase 5.4 completion checklist (15 items all checked) - Add production notes (Vite proxy, port conflicts, cache location) - Update test counts to current (108 backend, 45 frontend, 153 total) - Update Decision #12 to reflect inline citation link upgrade	2026-04-29 10:54:18 +08:00
Woody	523b27bb58	test: update batch URL assertion to match absolute backend URL	2026-04-29 10:42:18 +08:00
Woody	b47e37f39b	fix: use absolute backend URL for highlight API calls - Vite dev server doesn't proxy /api/v1/v2/ paths to backend - Changed fetch URL and getHighlightUrl to use http://localhost:8000 - Fixed inline citation highlight URLs in buildCitationUrl - Cleaned up debug code	2026-04-29 10:39:01 +08:00
Woody	bcf4a853bf	feat: add highlight status toast notification (Phase 5.4) - Shows 'Preparing highlights...' (amber spinner) while LLM batch runs - Shows 'Highlights ready' (green) for 4 seconds when batch completes - Fixed position top-left corner, auto-dismisses	2026-04-29 10:00:54 +08:00
Woody	1c490ce2fa	fix: inline citations now upgrade to highlighted view (Phase 5.4) - Added sub_question_text to frontend SourceMetadata type - SubQuestionSection enriches sources with parent sub-question text - buildCitationUrl routes to highlight page when sub_question_text present - processCitations threads highlightReadyKeys through inline citations	2026-04-29 09:54:40 +08:00
Woody	c632b9ea3b	feat: cited source extraction, background batch trigger, and View PDF link upgrade (Phase 5.4.6-5.4.8) - citationParser.ts: extractCitedSources() parses answer text for [citations], resolves against SourceMetadata, returns deduplicated cited sources - ResponsePanel.tsx: useEffect fires POST /api/v1/v2/highlights/batch after answer renders; View PDF link upgrades in-place to highlighted HTML when batch completes; stays as raw PDF on failure - Updated plan: LLM-based relevance detection, eager background computation, single batched LLM call, sqlite cache, regex sentence splitter - 45 frontend tests: 28 citationParser + 17 ResponsePanel (including 4 new sub-question highlight tests)	2026-04-29 09:27:04 +08:00
Woody	a56f8f69e2	feat: add highlight batch and GET endpoints (Phase 5.4.5) - POST /api/v1/v2/highlights/batch: compute and cache highlights for cited chunks - GET /api/v1/v2/highlights: serve cached highlighted HTML pages - chunks.py router registered in main.py - Dynamic DB path computation (prompts.db -> highlights.db), no Settings changes - 7 endpoint tests: POST 200/422, GET 200/404, mock service verification	2026-04-29 09:26:50 +08:00
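
Note on commit 78d1f8cc91 (delta-based ASR transcript): the suffix-diff rule it describes, `Delta = text[len(prev_text):]`, can be illustrated with a minimal sketch. This is not the repository's code. The class name `DeltaTracker`, the method `feed`, and the returned dict shape are hypothetical and chosen for illustration only. It assumes, as the commit message states, that the `text` field grows monotonically within one utterance, that a change of `item_id` marks an utterance boundary, and that stash-only events arrive with empty text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeltaTracker:
    """Hypothetical helper illustrating the suffix diff; not taken from the repo."""

    prev_text: str = ""            # last 'text' value seen for the current utterance
    item_id: Optional[str] = None  # id of the utterance currently being tracked
    sent_any: bool = False         # whether any delta has been emitted so far

    def feed(self, item_id: str, text: str, stash: str = "") -> Optional[dict]:
        # Stash-only events (empty text) are skipped.
        if not text:
            return None

        separator = ""
        if item_id != self.item_id:
            # Utterance boundary: reset the baseline and, if something was
            # already emitted, prepend a space separator.
            separator = " " if self.sent_any else ""
            self.item_id = item_id
            self.prev_text = ""

        # Simple suffix diff, valid because 'text' only grows within an utterance.
        delta = text[len(self.prev_text):]
        self.prev_text = text
        if not delta:
            return None

        self.sent_any = True
        # The stash travels alongside the delta so a client can append it on pause.
        return {"delta": separator + delta, "stash": stash}


tracker = DeltaTracker()
transcript = ""
events = [("a", "he", ""), ("a", "hello", "wo"), ("a", "", "wor"), ("b", "world", "")]
for event_item_id, event_text, event_stash in events:
    message = tracker.feed(event_item_id, event_text, event_stash)
    if message is not None:
        transcript += message["delta"]  # client-side accumulation of deltas
print(transcript)
```

A client that concatenates each `delta` reconstructs the transcript without any overlap merging, which is the simplification the commit message contrasts with the earlier `_merge_stash` approach.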


183 Commits