voice-to-notes

Author	SHA1	Message	Date
Claude	806586ae3d	Fix diarization performance for long files + better progress - Cache loaded audio in _sf_load(): previously the entire WAV file was re-read from disk for every 10s crop call. For a 3-hour file with 1000+ chunks, this meant ~345GB of disk reads. Now read once, cached. - Better progress messages for long files: show elapsed time in m:ss format, warn "(180min audio, this may take a while)" for files >10min - Increased progress poll interval from 2s to 5s (less noise) - Better time estimate: use 0.8x audio duration (was 0.5x) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-23 06:24:21 -07:00
Claude	879a1f3fd6	Fix diarization tensor mismatch + fix sidecar build triggers Diarization: Audio.crop patch now pads short segments with zeros to match the expected duration. pyannote batches embeddings with vstack, which requires uniform tensor sizes; the last segment of a file can be shorter than the 10s window. CI: Reordered sidecar workflow to check for python/ changes FIRST, before bumping version or configuring git. All subsequent steps are gated on has_changes. This prevents unnecessary version bumps and build runs when only app code changes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-22 18:30:43 -07:00
Claude	68524cbbd6	Also patch Audio.crop to fix diarization embedding extraction The previous patch only replaced Audio.__call__ (segmentation), but pyannote also calls Audio.crop during speaker embedding extraction. crop loads a time segment of audio; it is now patched to load the full file via soundfile, then slice the tensor to the requested time range. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-22 17:38:00 -07:00
Claude	f9226ee4d0	Fix diarization: use soundfile instead of torchaudio for audio loading torchaudio 2.10 unconditionally delegates load() to torchcodec, ignoring the backend parameter. Since torchcodec is excluded from PyInstaller, this broke our pyannote Audio monkey-patch. Fix: replace torchaudio.load() with soundfile.read() + torch.from_numpy(). soundfile handles WAV natively (audio is pre-converted to WAV), has no torchcodec dependency, and is already a transitive dependency. Also added soundfile to PyInstaller hiddenimports. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-22 11:49:39 -07:00
Claude	2e7a5819bc	Fix CSP for blob URLs + fix pyannote AudioDecoder with torchaudio patch CSP: Add blob: to connect-src/img-src/media-src for wavesurfer.js audio playback. Add http://tauri.localhost to default-src for devtools. pyannote: sys.modules block didn't work; pyannote still uses AudioDecoder unconditionally. New approach: monkey-patch Audio.__call__ in diarize.py to use torchaudio.load() directly, bypassing the broken torchcodec path. Patch runs once before pipeline loading. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-22 10:59:54 -07:00
Claude	7efa3bb116	Fix CUDA fallback: gracefully fall back to CPU when CUDA libs missing - transcribe: catch model load failures on CUDA and retry with CPU - hardware detect: test CUDA runtime actually works (torch.zeros on cuda) before recommending GPU, since CPU-only builds detect CUDA via driver but lack cublas/cuDNN libraries Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-22 05:36:40 -07:00
Claude	58faa83cb3	Cross-platform distribution, UI improvements, and performance optimizations - PyInstaller frozen sidecar: spec file, build script, and ffmpeg path resolver for self-contained distribution without Python prerequisites - Dual-mode sidecar launcher: frozen binary (production) with dev mode fallback - Parallel transcription + diarization pipeline (~30-40% faster) - GPU auto-detection for diarization (CUDA when available) - Async run_pipeline command for real-time progress event delivery - Web Audio API backend for instant playback and seeking - OpenAI-compatible provider replacing LiteLLM client-side routing - Cross-platform RAM detection (Linux/macOS/Windows) - Settings: speaker count hint, token reveal toggles, dark dropdown styling - Loading splash screen, flexbox layout fix for viewport overflow - Gitea Actions CI/CD pipeline (Linux, Windows, macOS ARM) - Updated README and CLAUDE.md documentation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 21:33:43 -07:00
Claude	0771508203	Merge perf/chunked-transcription: chunk-based processing for large files	2026-03-20 13:54:14 -07:00
Claude	c23b9a90dd	Merge perf/diarize-threading: diarization progress via background thread	2026-03-20 13:52:59 -07:00
Claude	35af6e9e0c	Merge perf/progress-every-segment: emit progress for every segment	2026-03-20 13:52:18 -07:00
Claude	c3b6ad38fd	Merge perf/stream-segments: streaming partial transcript segments and speaker updates	2026-03-20 13:51:51 -07:00
Claude	03af5a189c	Run pyannote diarization in background thread with progress reporting Move the blocking pipeline() call to a daemon thread and emit estimated progress messages every 2 seconds from the main thread. The progress estimate uses audio duration to calibrate the expected total time. Also pass audio_duration_sec from PipelineService to DiarizeService. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 13:50:57 -07:00
Claude	16f4b57771	Add chunked transcription for large audio files (>1 hour) Split files >1 hour into 5-minute chunks via ffmpeg, transcribe each chunk independently, then merge results with corrected timestamps. Also add chunk-level progress markers every 10 segments for all files. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 13:49:20 -07:00
Claude	6eb13bce63	Remove progress throttle so every segment emits a progress update Previously, progress messages were only sent every 5th segment due to a `segment_count % 5` guard. This made the UI feel unresponsive for short recordings with few segments. Now every segment emits a progress update with a more descriptive message including the segment number and audio percentage. Adds a test verifying that all 8 mock segments produce progress messages, not just every 5th. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 13:49:14 -07:00
Josh Knapp	67ed69df00	Stream transcript segments to frontend as they are transcribed Send each segment to the frontend immediately after transcription via a new pipeline.segment IPC message, then send speaker assignments as a batch pipeline.speaker_update message after diarization completes. This lets the UI display segments progressively instead of waiting for the entire pipeline to finish. Changes: - Add partial_segment_message and speaker_update_message IPC factories - Add on_segment callback parameter to TranscribeService.transcribe() - Emit partial segments and speaker updates from PipelineService.run() - Add send_and_receive_with_progress to SidecarManager (Rust) - Route pipeline.segment/speaker_update events in run_pipeline command - Listen for streaming events in Svelte frontend (+page.svelte) - Add tests for new message types, callback signature, and update logic Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 13:47:57 -07:00
Josh Knapp	585411f402	Fix speaker diarization: WAV conversion, pyannote 4.0 compat, telemetry bug - Convert non-WAV audio to 16kHz mono WAV before diarization (pyannote v4.0.4 AudioDecoder returns None duration for FLAC, causing crash) - Handle pyannote 4.0 DiarizeOutput return type (unwrap .speaker_diarization) - Disable pyannote telemetry (np.isfinite(None) bug with max_speakers) - Use huggingface_hub.login() to persist token for all sub-downloads - Pre-download sub-models (segmentation-3.0, speaker-diarization-community-1) - Add third required model license link in settings UI - Improve SpeakerManager hints based on settings state - Add word-wrap to transcript text Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 19:46:07 -08:00
Josh Knapp	baf820286f	Add HuggingFace token setting for speaker detection - Add "Speakers" tab in Settings with HF token input field - Include step-by-step instructions for obtaining the token - Pass hf_token from settings through Rust → Python pipeline → diarize - Token can also be set via HF_TOKEN environment variable as fallback - Move skip_diarization checkbox to Speakers tab Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 18:08:51 -08:00
Josh Knapp	ed626b8ba0	Fix progress overlay, play-from-position, layout cutoff, speaker info - Replace progress bar with task checklist showing pipeline steps (load model, transcribe, load diarization, identify speakers, merge) - Fix WaveformPlayer: track ready state, disable controls until loaded, play from current position instead of resetting to start - Fix workspace height calc to prevent bottom content cutoff - Show HF_TOKEN setup hint in SpeakerManager when no speakers detected - Add console logging for progress events to aid debugging Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 18:02:48 -08:00
Josh Knapp	87b3ad94f9	Improve import UX: progress overlay, pyannote fix, debug logging - Enhanced ProgressOverlay with spinner, better styling, and z-index 9999 - Import button shows "Processing..." with pulse animation while transcribing - Fix pyannote API: use token= instead of deprecated use_auth_token= - Read HF_TOKEN from environment for pyannote model download - Add console logging for click-to-seek debugging - Add color-scheme: dark for native form controls Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 17:43:49 -08:00
Josh Knapp	669d88f143	Fix progress feedback, diarization fallback, and dropdown readability - Stream pipeline progress to frontend via Tauri events so the progress overlay updates in real time during transcription/diarization - Gracefully fall back to transcription-only when diarization fails (e.g. pyannote not installed) instead of erroring the whole pipeline - Add color-scheme: dark to fix native select/option elements rendering with unreadable white backgrounds Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 17:14:25 -08:00
Josh Knapp	d67625cd5a	Phase 5: AI provider system with local and cloud support - Implement AIProvider base interface with chat() and is_available() - Add LocalProvider connecting to bundled llama-server via OpenAI SDK - Add OpenAIProvider for direct OpenAI API access - Add AnthropicProvider for Anthropic Claude API - Add LiteLLMProvider for multi-provider gateway - Build AIProviderService with provider routing, auto-selection, and transcript context injection - Add ai.chat IPC handler supporting chat, list_providers, set_provider, and configure actions - Add ai_chat, ai_list_providers, ai_configure Tauri commands - Build interactive AIChatPanel with message history, quick actions (Summarize, Action Items), and transcript context awareness - Tests: 30 Python, 6 Rust, 0 Svelte errors Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 16:25:10 -08:00
Josh Knapp	415a648a2b	Phase 4: Export to SRT, WebVTT, ASS, plain text, and Markdown - Implement ExportService using pysubs2 for caption formats (SRT, VTT, ASS) and custom formatters for plain text and Markdown - SRT exports with [Speaker]: prefix, WebVTT with <v Speaker> voice tags, ASS with color-coded speaker styles - Plain text groups by speaker with labels, Markdown adds timestamps - Add export.start IPC handler and export_transcript Tauri command - Add export dropdown menu in header (appears after transcription) - Uses native save dialog for output file selection - Add pysubs2 dependency - Tests: 30 Python (6 export tests), 6 Rust, 0 Svelte errors Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 16:18:54 -08:00
Josh Knapp	44480906a4	Phase 3: Speaker diarization and full transcription pipeline - Implement DiarizeService with pyannote.audio speaker detection - Build PipelineService combining transcribe → diarize → merge with overlap-based speaker assignment per segment - Add pipeline.start and diarize.start IPC handlers - Add run_pipeline Tauri command for full pipeline execution - Wire frontend to use pipeline: speakers auto-created with colors, segments assigned to detected speakers - Build SpeakerManager with rename support (double-click or edit button) - Add speaker color coding throughout transcript display - Add pyannote.audio dependency - Tests: 24 Python (including merge logic), 6 Rust, 0 Svelte errors Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 16:09:48 -08:00
Josh Knapp	48fe41b064	Phase 2: Core transcription pipeline and audio playback - Implement faster-whisper TranscribeService with word-level timestamps, progress reporting, and hardware auto-detection - Wire up Rust SidecarManager for Python process lifecycle (spawn, IPC, shutdown) - Add transcribe_file Tauri command bridging frontend to Python sidecar - Integrate wavesurfer.js WaveformPlayer with play/pause, skip, seek controls - Build TranscriptEditor with word-level click-to-seek and active highlighting - Connect file import flow: prompt → asset load → transcribe → display - Add typed tauri-bridge service with TranscriptionResult interface - Add Python tests for hardware detection and transcription result formatting Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 15:53:09 -08:00
Josh Knapp	503cc6c0cf	Phase 1 foundation: Tauri shell, Python sidecar, SQLite database Tauri v2 + Svelte + TypeScript frontend: - App shell with workspace layout (waveform, transcript, speakers, AI chat) - Placeholder components for all major UI areas - Typed stores (project, transcript, playback, AI) - TypeScript interfaces matching the database schema - Tauri bridge service with typed invoke wrappers - svelte-check passes with 0 errors Rust backend: - Tauri v2 app entry point with command registration - SQLite database layer (rusqlite with bundled SQLite) - Full schema: projects, media_files, speakers, segments, words, ai_outputs, annotations (with indexes) - Model structs with serde serialization - CRUD queries for projects, speakers, segments, words - Segment text editing preserves original text - Schema versioning for future migrations - 6 tests passing - Command stubs for project, transcribe, export, AI, settings, system - App state management Python sidecar: - JSON-line IPC protocol (stdin/stdout) - Message types: IPCMessage, progress, error, ready - Handler registry with routing and error handling - Ping/pong handler for connectivity testing - Service stubs: transcribe, diarize, pipeline, AI, export - Provider stubs: local (llama-server), OpenAI, Anthropic, LiteLLM - Hardware detection stubs - 14 tests passing, ruff clean Also adds: - Testing strategy document (docs/TESTING.md) - Validation script (scripts/validate.sh) - Updated .gitignore for Svelte, Rust, Python artifacts Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 15:16:06 -08:00

25 Commits
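
Commit 44480906a4 describes merging transcription and diarization results "with overlap-based speaker assignment per segment". The log does not show the actual merge code, so the following is only a minimal sketch of one way such a merge could work. The names `Segment`, `SpeakerTurn`, `overlap_seconds`, and `assign_speakers` are hypothetical and are not taken from the repository.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Segment:
    """A transcribed span of audio (hypothetical shape, times in seconds)."""
    start: float
    end: float
    text: str
    speaker: Optional[str] = None


@dataclass
class SpeakerTurn:
    """A diarization result: one speaker talking over one time range."""
    start: float
    end: float
    speaker: str


def overlap_seconds(a_start: float, a_end: float,
                    b_start: float, b_end: float) -> float:
    """Length of the intersection of two time ranges, or 0.0 if disjoint."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: List[Segment],
                    turns: List[SpeakerTurn]) -> List[Segment]:
    """Label each segment with the speaker whose turns overlap it the most.

    Segments that overlap no turn keep speaker=None.
    """
    for seg in segments:
        totals: Dict[str, float] = {}
        for turn in turns:
            shared = overlap_seconds(seg.start, seg.end, turn.start, turn.end)
            if shared > 0:
                # A speaker may have several turns inside one segment,
                # so accumulate the overlap per speaker.
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + shared
        seg.speaker = max(totals, key=totals.get) if totals else None
    return segments


if __name__ == "__main__":
    segs = [Segment(0.0, 4.0, "hello"), Segment(4.0, 9.0, "world")]
    turns = [SpeakerTurn(0.0, 5.0, "SPEAKER_00"),
             SpeakerTurn(5.0, 10.0, "SPEAKER_01")]
    for s in assign_speakers(segs, turns):
        print(s.speaker, s.text)
```

Summing overlap per speaker, rather than taking the single longest overlapping turn, is a design choice made for this sketch; it keeps a segment from being mislabeled when one speaker's time inside it is split across several short turns.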