Commit Graph

20 Commits

Author SHA1 Message Date
Claude
0771508203 Merge perf/chunked-transcription: chunk-based processing for large files 2026-03-20 13:54:14 -07:00
Claude
c23b9a90dd Merge perf/diarize-threading: diarization progress via background thread 2026-03-20 13:52:59 -07:00
Claude
35af6e9e0c Merge perf/progress-every-segment: emit progress for every segment 2026-03-20 13:52:18 -07:00
Claude
c3b6ad38fd Merge perf/stream-segments: streaming partial transcript segments and speaker updates 2026-03-20 13:51:51 -07:00
Claude
03af5a189c Run pyannote diarization in background thread with progress reporting
Move the blocking pipeline() call to a daemon thread and emit estimated
progress messages every 2 seconds from the main thread. The progress
estimate uses audio duration to calibrate the expected total time.
Also pass audio_duration_sec from PipelineService to DiarizeService.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 13:50:57 -07:00
Claude
16f4b57771 Add chunked transcription for large audio files (>1 hour)
Split files >1 hour into 5-minute chunks via ffmpeg, transcribe each
chunk independently, then merge results with corrected timestamps.
Also add chunk-level progress markers every 10 segments for all files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 13:49:20 -07:00
Claude
6eb13bce63 Remove progress throttle so every segment emits a progress update
Previously, progress messages were only sent every 5th segment due to
a `segment_count % 5` guard. This made the UI feel unresponsive for
short recordings with few segments. Now every segment emits a progress
update with a more descriptive message including the segment number
and audio percentage.

Adds a test verifying that all 8 mock segments produce progress
messages, not just every 5th.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 13:49:14 -07:00
67ed69df00 Stream transcript segments to frontend as they are transcribed
Send each segment to the frontend immediately after transcription via
a new pipeline.segment IPC message, then send speaker assignments as a
batch pipeline.speaker_update message after diarization completes. This
lets the UI display segments progressively instead of waiting for the
entire pipeline to finish.

Changes:
- Add partial_segment_message and speaker_update_message IPC factories
- Add on_segment callback parameter to TranscribeService.transcribe()
- Emit partial segments and speaker updates from PipelineService.run()
- Add send_and_receive_with_progress to SidecarManager (Rust)
- Route pipeline.segment/speaker_update events in run_pipeline command
- Listen for streaming events in Svelte frontend (+page.svelte)
- Add tests for new message types, callback signature, and update logic

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 13:47:57 -07:00
585411f402 Fix speaker diarization: WAV conversion, pyannote 4.0 compat, telemetry bug
- Convert non-WAV audio to 16kHz mono WAV before diarization (pyannote
  v4.0.4 AudioDecoder returns None duration for FLAC, causing crash)
- Handle pyannote 4.0 DiarizeOutput return type (unwrap .speaker_diarization)
- Disable pyannote telemetry (np.isfinite(None) bug with max_speakers)
- Use huggingface_hub.login() to persist token for all sub-downloads
- Pre-download sub-models (segmentation-3.0, speaker-diarization-community-1)
- Add third required model license link in settings UI
- Improve SpeakerManager hints based on settings state
- Add word-wrap to transcript text

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 19:46:07 -08:00
a3612c986d Add Test & Download button for diarization model, clickable links
- Add diarize.download IPC handler that downloads the pyannote model
  and returns user-friendly error messages (missing license, bad token)
- Add download_diarize_model Tauri command
- Add "Test & Download Model" button in Speakers settings tab
- Update instructions to list both required model licenses
  (speaker-diarization-3.1 AND segmentation-3.0)
- Make all HuggingFace URLs clickable (opens in system browser)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 18:21:42 -08:00
baf820286f Add HuggingFace token setting for speaker detection
- Add "Speakers" tab in Settings with HF token input field
- Include step-by-step instructions for obtaining the token
- Pass hf_token from settings through Rust → Python pipeline → diarize
- Token can also be set via HF_TOKEN environment variable as fallback
- Move skip_diarization checkbox to Speakers tab

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 18:08:51 -08:00
ed626b8ba0 Fix progress overlay, play-from-position, layout cutoff, speaker info
- Replace progress bar with task checklist showing pipeline steps
  (load model, transcribe, load diarization, identify speakers, merge)
- Fix WaveformPlayer: track ready state, disable controls until loaded,
  play from current position instead of resetting to start
- Fix workspace height calc to prevent bottom content cutoff
- Show HF_TOKEN setup hint in SpeakerManager when no speakers detected
- Add console logging for progress events to aid debugging

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 18:02:48 -08:00
4d7b9d524f Fix IPC stdout corruption, dark window background, overlay timing
- Redirect sys.stdout to stderr in Python sidecar so library print()
  calls don't corrupt the JSON-line IPC stream
- Save real stdout fd for exclusive IPC use via init_ipc()
- Skip non-JSON lines in Rust reader instead of failing with parse error
- Set Tauri window background color to match dark theme (#0a0a23)
- Add inline dark background on html/body to prevent white flash
- Use Svelte tick() to ensure progress overlay renders before invoke
- Improve ProgressOverlay with spinner, better styling, z-index 9999

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 17:50:55 -08:00
87b3ad94f9 Improve import UX: progress overlay, pyannote fix, debug logging
- Enhanced ProgressOverlay with spinner, better styling, and z-index 9999
- Import button shows "Processing..." with pulse animation while transcribing
- Fix pyannote API: use token= instead of deprecated use_auth_token=
- Read HF_TOKEN from environment for pyannote model download
- Add console logging for click-to-seek debugging
- Add color-scheme: dark for native form controls

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 17:43:49 -08:00
669d88f143 Fix progress feedback, diarization fallback, and dropdown readability
- Stream pipeline progress to frontend via Tauri events so the progress
  overlay updates in real time during transcription/diarization
- Gracefully fall back to transcription-only when diarization fails
  (e.g. pyannote not installed) instead of erroring the whole pipeline
- Add color-scheme: dark to fix native select/option elements rendering
  with unreadable white backgrounds

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 17:14:25 -08:00
d67625cd5a Phase 5: AI provider system with local and cloud support
- Implement AIProvider base interface with chat() and is_available()
- Add LocalProvider connecting to bundled llama-server via OpenAI SDK
- Add OpenAIProvider for direct OpenAI API access
- Add AnthropicProvider for Anthropic Claude API
- Add LiteLLMProvider for multi-provider gateway
- Build AIProviderService with provider routing, auto-selection,
  and transcript context injection
- Add ai.chat IPC handler supporting chat, list_providers, set_provider,
  and configure actions
- Add ai_chat, ai_list_providers, ai_configure Tauri commands
- Build interactive AIChatPanel with message history, quick actions
  (Summarize, Action Items), and transcript context awareness
- Tests: 30 Python, 6 Rust, 0 Svelte errors

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 16:25:10 -08:00
415a648a2b Phase 4: Export to SRT, WebVTT, ASS, plain text, and Markdown
- Implement ExportService using pysubs2 for caption formats (SRT, VTT, ASS)
  and custom formatters for plain text and Markdown
- SRT exports with [Speaker]: prefix, WebVTT with <v Speaker> voice tags,
  ASS with color-coded speaker styles
- Plain text groups by speaker with labels, Markdown adds timestamps
- Add export.start IPC handler and export_transcript Tauri command
- Add export dropdown menu in header (appears after transcription)
- Uses native save dialog for output file selection
- Add pysubs2 dependency
- Tests: 30 Python (6 export tests), 6 Rust, 0 Svelte errors

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 16:18:54 -08:00
44480906a4 Phase 3: Speaker diarization and full transcription pipeline
- Implement DiarizeService with pyannote.audio speaker detection
- Build PipelineService combining transcribe → diarize → merge with
  overlap-based speaker assignment per segment
- Add pipeline.start and diarize.start IPC handlers
- Add run_pipeline Tauri command for full pipeline execution
- Wire frontend to use pipeline: speakers auto-created with colors,
  segments assigned to detected speakers
- Build SpeakerManager with rename support (double-click or edit button)
- Add speaker color coding throughout transcript display
- Add pyannote.audio dependency
- Tests: 24 Python (including merge logic), 6 Rust, 0 Svelte errors

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 16:09:48 -08:00
48fe41b064 Phase 2: Core transcription pipeline and audio playback
- Implement faster-whisper TranscribeService with word-level timestamps,
  progress reporting, and hardware auto-detection
- Wire up Rust SidecarManager for Python process lifecycle (spawn, IPC, shutdown)
- Add transcribe_file Tauri command bridging frontend to Python sidecar
- Integrate wavesurfer.js WaveformPlayer with play/pause, skip, seek controls
- Build TranscriptEditor with word-level click-to-seek and active highlighting
- Connect file import flow: prompt → asset load → transcribe → display
- Add typed tauri-bridge service with TranscriptionResult interface
- Add Python tests for hardware detection and transcription result formatting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 15:53:09 -08:00
503cc6c0cf Phase 1 foundation: Tauri shell, Python sidecar, SQLite database
Tauri v2 + Svelte + TypeScript frontend:
- App shell with workspace layout (waveform, transcript, speakers, AI chat)
- Placeholder components for all major UI areas
- Typed stores (project, transcript, playback, AI)
- TypeScript interfaces matching the database schema
- Tauri bridge service with typed invoke wrappers
- svelte-check passes with 0 errors

Rust backend:
- Tauri v2 app entry point with command registration
- SQLite database layer (rusqlite with bundled SQLite)
  - Full schema: projects, media_files, speakers, segments, words,
    ai_outputs, annotations (with indexes)
  - Model structs with serde serialization
  - CRUD queries for projects, speakers, segments, words
  - Segment text editing preserves original text
  - Schema versioning for future migrations
  - 6 tests passing
- Command stubs for project, transcribe, export, AI, settings, system
- App state management

Python sidecar:
- JSON-line IPC protocol (stdin/stdout)
- Message types: IPCMessage, progress, error, ready
- Handler registry with routing and error handling
- Ping/pong handler for connectivity testing
- Service stubs: transcribe, diarize, pipeline, AI, export
- Provider stubs: local (llama-server), OpenAI, Anthropic, LiteLLM
- Hardware detection stubs
- 14 tests passing, ruff clean

Also adds:
- Testing strategy document (docs/TESTING.md)
- Validation script (scripts/validate.sh)
- Updated .gitignore for Svelte, Rust, Python artifacts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 15:16:06 -08:00