voice-to-notes

Author	SHA1	Message	Date
Claude	0771508203	Merge perf/chunked-transcription: chunk-based processing for large files	2026-03-20 13:54:14 -07:00
Claude	c23b9a90dd	Merge perf/diarize-threading: diarization progress via background thread	2026-03-20 13:52:59 -07:00
Claude	35af6e9e0c	Merge perf/progress-every-segment: emit progress for every segment	2026-03-20 13:52:18 -07:00
Claude	c3b6ad38fd	Merge perf/stream-segments: streaming partial transcript segments and speaker updates	2026-03-20 13:51:51 -07:00
Claude	03af5a189c	Run pyannote diarization in background thread with progress reporting. Move the blocking pipeline() call to a daemon thread and emit estimated progress messages every 2 seconds from the main thread. The progress estimate uses audio duration to calibrate the expected total time. Also pass audio_duration_sec from PipelineService to DiarizeService. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 13:50:57 -07:00
Claude	16f4b57771	Add chunked transcription for large audio files (>1 hour). Split files >1 hour into 5-minute chunks via ffmpeg, transcribe each chunk independently, then merge results with corrected timestamps. Also add chunk-level progress markers every 10 segments for all files. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 13:49:20 -07:00
Claude	6eb13bce63	Remove progress throttle so every segment emits a progress update. Previously, progress messages were only sent every 5th segment due to a `segment_count % 5` guard. This made the UI feel unresponsive for short recordings with few segments. Now every segment emits a progress update with a more descriptive message including the segment number and audio percentage. Adds a test verifying that all 8 mock segments produce progress messages, not just every 5th. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 13:49:14 -07:00
Josh Knapp	67ed69df00	Stream transcript segments to frontend as they are transcribed. Send each segment to the frontend immediately after transcription via a new pipeline.segment IPC message, then send speaker assignments as a batch pipeline.speaker_update message after diarization completes. This lets the UI display segments progressively instead of waiting for the entire pipeline to finish. Changes: - Add partial_segment_message and speaker_update_message IPC factories - Add on_segment callback parameter to TranscribeService.transcribe() - Emit partial segments and speaker updates from PipelineService.run() - Add send_and_receive_with_progress to SidecarManager (Rust) - Route pipeline.segment/speaker_update events in run_pipeline command - Listen for streaming events in Svelte frontend (+page.svelte) - Add tests for new message types, callback signature, and update logic Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 13:47:57 -07:00
Josh Knapp	585411f402	Fix speaker diarization: WAV conversion, pyannote 4.0 compat, telemetry bug - Convert non-WAV audio to 16kHz mono WAV before diarization (pyannote v4.0.4 AudioDecoder returns None duration for FLAC, causing crash) - Handle pyannote 4.0 DiarizeOutput return type (unwrap .speaker_diarization) - Disable pyannote telemetry (np.isfinite(None) bug with max_speakers) - Use huggingface_hub.login() to persist token for all sub-downloads - Pre-download sub-models (segmentation-3.0, speaker-diarization-community-1) - Add third required model license link in settings UI - Improve SpeakerManager hints based on settings state - Add word-wrap to transcript text Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 19:46:07 -08:00
Josh Knapp	a3612c986d	Add Test & Download button for diarization model, clickable links - Add diarize.download IPC handler that downloads the pyannote model and returns user-friendly error messages (missing license, bad token) - Add download_diarize_model Tauri command - Add "Test & Download Model" button in Speakers settings tab - Update instructions to list both required model licenses (speaker-diarization-3.1 AND segmentation-3.0) - Make all HuggingFace URLs clickable (opens in system browser) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 18:21:42 -08:00
Josh Knapp	baf820286f	Add HuggingFace token setting for speaker detection - Add "Speakers" tab in Settings with HF token input field - Include step-by-step instructions for obtaining the token - Pass hf_token from settings through Rust → Python pipeline → diarize - Token can also be set via HF_TOKEN environment variable as fallback - Move skip_diarization checkbox to Speakers tab Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 18:08:51 -08:00
Josh Knapp	ed626b8ba0	Fix progress overlay, play-from-position, layout cutoff, speaker info - Replace progress bar with task checklist showing pipeline steps (load model, transcribe, load diarization, identify speakers, merge) - Fix WaveformPlayer: track ready state, disable controls until loaded, play from current position instead of resetting to start - Fix workspace height calc to prevent bottom content cutoff - Show HF_TOKEN setup hint in SpeakerManager when no speakers detected - Add console logging for progress events to aid debugging Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 18:02:48 -08:00
Josh Knapp	4d7b9d524f	Fix IPC stdout corruption, dark window background, overlay timing - Redirect sys.stdout to stderr in Python sidecar so library print() calls don't corrupt the JSON-line IPC stream - Save real stdout fd for exclusive IPC use via init_ipc() - Skip non-JSON lines in Rust reader instead of failing with parse error - Set Tauri window background color to match dark theme (#0a0a23) - Add inline dark background on html/body to prevent white flash - Use Svelte tick() to ensure progress overlay renders before invoke - Improve ProgressOverlay with spinner, better styling, z-index 9999 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 17:50:55 -08:00
Josh Knapp	87b3ad94f9	Improve import UX: progress overlay, pyannote fix, debug logging - Enhanced ProgressOverlay with spinner, better styling, and z-index 9999 - Import button shows "Processing..." with pulse animation while transcribing - Fix pyannote API: use token= instead of deprecated use_auth_token= - Read HF_TOKEN from environment for pyannote model download - Add console logging for click-to-seek debugging - Add color-scheme: dark for native form controls Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 17:43:49 -08:00
Josh Knapp	669d88f143	Fix progress feedback, diarization fallback, and dropdown readability - Stream pipeline progress to frontend via Tauri events so the progress overlay updates in real time during transcription/diarization - Gracefully fall back to transcription-only when diarization fails (e.g. pyannote not installed) instead of erroring the whole pipeline - Add color-scheme: dark to fix native select/option elements rendering with unreadable white backgrounds Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 17:14:25 -08:00
Josh Knapp	d67625cd5a	Phase 5: AI provider system with local and cloud support - Implement AIProvider base interface with chat() and is_available() - Add LocalProvider connecting to bundled llama-server via OpenAI SDK - Add OpenAIProvider for direct OpenAI API access - Add AnthropicProvider for Anthropic Claude API - Add LiteLLMProvider for multi-provider gateway - Build AIProviderService with provider routing, auto-selection, and transcript context injection - Add ai.chat IPC handler supporting chat, list_providers, set_provider, and configure actions - Add ai_chat, ai_list_providers, ai_configure Tauri commands - Build interactive AIChatPanel with message history, quick actions (Summarize, Action Items), and transcript context awareness - Tests: 30 Python, 6 Rust, 0 Svelte errors Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 16:25:10 -08:00
Josh Knapp	415a648a2b	Phase 4: Export to SRT, WebVTT, ASS, plain text, and Markdown - Implement ExportService using pysubs2 for caption formats (SRT, VTT, ASS) and custom formatters for plain text and Markdown - SRT exports with [Speaker]: prefix, WebVTT with <v Speaker> voice tags, ASS with color-coded speaker styles - Plain text groups by speaker with labels, Markdown adds timestamps - Add export.start IPC handler and export_transcript Tauri command - Add export dropdown menu in header (appears after transcription) - Uses native save dialog for output file selection - Add pysubs2 dependency - Tests: 30 Python (6 export tests), 6 Rust, 0 Svelte errors Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 16:18:54 -08:00
Josh Knapp	44480906a4	Phase 3: Speaker diarization and full transcription pipeline - Implement DiarizeService with pyannote.audio speaker detection - Build PipelineService combining transcribe → diarize → merge with overlap-based speaker assignment per segment - Add pipeline.start and diarize.start IPC handlers - Add run_pipeline Tauri command for full pipeline execution - Wire frontend to use pipeline: speakers auto-created with colors, segments assigned to detected speakers - Build SpeakerManager with rename support (double-click or edit button) - Add speaker color coding throughout transcript display - Add pyannote.audio dependency - Tests: 24 Python (including merge logic), 6 Rust, 0 Svelte errors Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 16:09:48 -08:00
Josh Knapp	48fe41b064	Phase 2: Core transcription pipeline and audio playback - Implement faster-whisper TranscribeService with word-level timestamps, progress reporting, and hardware auto-detection - Wire up Rust SidecarManager for Python process lifecycle (spawn, IPC, shutdown) - Add transcribe_file Tauri command bridging frontend to Python sidecar - Integrate wavesurfer.js WaveformPlayer with play/pause, skip, seek controls - Build TranscriptEditor with word-level click-to-seek and active highlighting - Connect file import flow: prompt → asset load → transcribe → display - Add typed tauri-bridge service with TranscriptionResult interface - Add Python tests for hardware detection and transcription result formatting Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 15:53:09 -08:00
Josh Knapp	503cc6c0cf	Phase 1 foundation: Tauri shell, Python sidecar, SQLite database. Tauri v2 + Svelte + TypeScript frontend: - App shell with workspace layout (waveform, transcript, speakers, AI chat) - Placeholder components for all major UI areas - Typed stores (project, transcript, playback, AI) - TypeScript interfaces matching the database schema - Tauri bridge service with typed invoke wrappers - svelte-check passes with 0 errors. Rust backend: - Tauri v2 app entry point with command registration - SQLite database layer (rusqlite with bundled SQLite) - Full schema: projects, media_files, speakers, segments, words, ai_outputs, annotations (with indexes) - Model structs with serde serialization - CRUD queries for projects, speakers, segments, words - Segment text editing preserves original text - Schema versioning for future migrations - 6 tests passing - Command stubs for project, transcribe, export, AI, settings, system - App state management. Python sidecar: - JSON-line IPC protocol (stdin/stdout) - Message types: IPCMessage, progress, error, ready - Handler registry with routing and error handling - Ping/pong handler for connectivity testing - Service stubs: transcribe, diarize, pipeline, AI, export - Provider stubs: local (llama-server), OpenAI, Anthropic, LiteLLM - Hardware detection stubs - 14 tests passing, ruff clean. Also adds: - Testing strategy document (docs/TESTING.md) - Validation script (scripts/validate.sh) - Updated .gitignore for Svelte, Rust, Python artifacts Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 15:16:06 -08:00

20 Commits
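
Commit 16f4b57771 describes splitting long files into 5-minute chunks, transcribing each chunk independently, and merging the results "with corrected timestamps". The log does not show how the merge is done, so the following is only a minimal sketch of one way it could work, assuming fixed-length chunks cut back to back. The names Segment, merge_chunks, and CHUNK_SEC are hypothetical and do not come from the repository.

```python
from dataclasses import dataclass, replace

CHUNK_SEC = 300.0  # 5-minute chunks, as stated in the commit message


@dataclass(frozen=True)
class Segment:
    """Hypothetical segment shape; the project's real fields may differ."""
    start: float  # seconds, relative to the start of its own chunk
    end: float
    text: str


def merge_chunks(chunks: list[list[Segment]], chunk_sec: float = CHUNK_SEC) -> list[Segment]:
    """Shift each chunk's segments by that chunk's offset, then concatenate."""
    merged: list[Segment] = []
    for index, segments in enumerate(chunks):
        # Assumption: chunk N starts exactly N * chunk_sec into the original file.
        offset = index * chunk_sec
        for seg in segments:
            merged.append(replace(seg, start=seg.start + offset, end=seg.end + offset))
    return merged


# Two chunks, each with timestamps relative to its own start.
chunks = [
    [Segment(0.0, 4.0, "hello"), Segment(4.0, 9.5, "world")],
    [Segment(1.0, 3.0, "second chunk")],
]
result = merge_chunks(chunks)
```

Here the segment from the second chunk ends up at 301.0 to 303.0 seconds in the merged transcript. If the chunks were not all exactly the same length (for example, if they were cut at silence boundaries), each offset would have to come from the measured chunk durations rather than from the index.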