voice-to-notes

Author	SHA1	Message	Date
Claude	33ca3e4a28	Show chunk context in transcription progress for large files Files >1 hour are split into 5-minute chunks. Previously each chunk showed "Starting transcription..." making it look like a restart. Now shows "Chunk 3/12: Starting transcription..." and "Chunk 3/12: Transcribing segment 5 (42% of audio)..." Also skips the "Loading model..." message for chunks after the first since the model is already loaded. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-23 07:57:59 -07:00
Claude	806586ae3d	Fix diarization performance for long files + better progress - Cache loaded audio in _sf_load() — previously the entire WAV file was re-read from disk for every 10s crop call. For a 3-hour file with 1000+ chunks, this meant ~345GB of disk reads. Now read once, cached. - Better progress messages for long files: show elapsed time in m:ss format, warn "(180min audio, this may take a while)" for files >10min - Increased progress poll interval from 2s to 5s (less noise) - Better time estimate: use 0.8x audio duration (was 0.5x) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-23 06:24:21 -07:00
Claude	ca5dc98d24	Fix Ollama: set_active after configure + fix default URL The configure action registered the provider but never called set_active(), so the sidecar kept using the old/default provider. Also updated the local provider default from localhost:8080 to localhost:11434/v1 (Ollama). Added debug logging for configure. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-22 19:55:09 -07:00
Claude	879a1f3fd6	Fix diarization tensor mismatch + fix sidecar build triggers Diarization: Audio.crop patch now pads short segments with zeros to match the expected duration. pyannote batches embeddings with vstack which requires uniform tensor sizes — the last segment of a file can be shorter than the 10s window. CI: Reordered sidecar workflow to check for python/ changes FIRST, before bumping version or configuring git. All subsequent steps are gated on has_changes. This prevents unnecessary version bumps and build runs when only app code changes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-22 18:30:43 -07:00
Claude	425e3c2b7c	Fix Ollama connection: remove double /v1 in URL base_url was being set to 'http://localhost:11434/v1' by the frontend, then LocalProvider appended another '/v1', resulting in '/v1/v1'. Now the provider uses base_url directly (frontend already appends /v1). Also fixed health check to hit Ollama root instead of /health. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-22 17:41:46 -07:00
Claude	68524cbbd6	Also patch Audio.crop to fix diarization embedding extraction The previous patch only replaced Audio.__call__ (segmentation), but pyannote also calls Audio.crop during speaker embedding extraction. crop loads a time segment of audio — patched to load full file via soundfile then slice the tensor to the requested time range. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-22 17:38:00 -07:00
Claude	f9226ee4d0	Fix diarization: use soundfile instead of torchaudio for audio loading torchaudio 2.10 unconditionally delegates load() to torchcodec, ignoring the backend parameter. Since torchcodec is excluded from PyInstaller, this broke our pyannote Audio monkey-patch. Fix: replace torchaudio.load() with soundfile.read() + torch.from_numpy(). soundfile handles WAV natively (audio is pre-converted to WAV), has no torchcodec dependency, and is already a transitive dependency. Also added soundfile to PyInstaller hiddenimports. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-22 11:49:39 -07:00
Claude	2e7a5819bc	Fix CSP for blob URLs + fix pyannote AudioDecoder with torchaudio patch CSP: Add blob: to connect-src/img-src/media-src for wavesurfer.js audio playback. Add http://tauri.localhost to default-src for devtools. pyannote: sys.modules block didn't work — pyannote still uses AudioDecoder unconditionally. New approach: monkey-patch Audio.__call__ in diarize.py to use torchaudio.load() directly, bypassing the broken torchcodec path. Patch runs once before pipeline loading. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-22 10:59:54 -07:00
Claude	db770c341d	Fix CSP blocking IPC/assets + fix pyannote AudioDecoder crash CSP: Add connect-src for ipc.localhost and asset.localhost so Tauri IPC commands and local file loading (waveform, audio playback) work. pyannote: Block torchcodec in sys.modules at startup so pyannote.audio falls back to torchaudio for audio decoding. pyannote has a bug where it uses AudioDecoder unconditionally even when torchcodec import fails. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-22 09:54:21 -07:00
Claude	7efa3bb116	Fix CUDA fallback: gracefully fall back to CPU when CUDA libs missing - transcribe: catch model load failures on CUDA and retry with CPU - hardware detect: test CUDA runtime actually works (torch.zeros on cuda) before recommending GPU, since CPU-only builds detect CUDA via driver but lack cublas/cuDNN libraries Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-22 05:36:40 -07:00
Claude	4fed9bccb8	Fix sidecar crash: torch circular import under PyInstaller - Exclude ctranslate2.converters from PyInstaller bundle — these modules import torch at module level causing circular import crashes, and are only needed for model conversion (never used at runtime) - Defer all heavy ML imports to first handler call instead of startup, so the sidecar can send its ready message without loading torch/whisper Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-21 20:22:56 -07:00
Claude	58faa83cb3	Cross-platform distribution, UI improvements, and performance optimizations - PyInstaller frozen sidecar: spec file, build script, and ffmpeg path resolver for self-contained distribution without Python prerequisites - Dual-mode sidecar launcher: frozen binary (production) with dev mode fallback - Parallel transcription + diarization pipeline (~30-40% faster) - GPU auto-detection for diarization (CUDA when available) - Async run_pipeline command for real-time progress event delivery - Web Audio API backend for instant playback and seeking - OpenAI-compatible provider replacing LiteLLM client-side routing - Cross-platform RAM detection (Linux/macOS/Windows) - Settings: speaker count hint, token reveal toggles, dark dropdown styling - Loading splash screen, flexbox layout fix for viewport overflow - Gitea Actions CI/CD pipeline (Linux, Windows, macOS ARM) - Updated README and CLAUDE.md documentation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 21:33:43 -07:00
Claude	0771508203	Merge perf/chunked-transcription: chunk-based processing for large files	2026-03-20 13:54:14 -07:00
Claude	c23b9a90dd	Merge perf/diarize-threading: diarization progress via background thread	2026-03-20 13:52:59 -07:00
Claude	35af6e9e0c	Merge perf/progress-every-segment: emit progress for every segment	2026-03-20 13:52:18 -07:00
Claude	c3b6ad38fd	Merge perf/stream-segments: streaming partial transcript segments and speaker updates	2026-03-20 13:51:51 -07:00
Claude	03af5a189c	Run pyannote diarization in background thread with progress reporting Move the blocking pipeline() call to a daemon thread and emit estimated progress messages every 2 seconds from the main thread. The progress estimate uses audio duration to calibrate the expected total time. Also pass audio_duration_sec from PipelineService to DiarizeService. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 13:50:57 -07:00
Claude	16f4b57771	Add chunked transcription for large audio files (>1 hour) Split files >1 hour into 5-minute chunks via ffmpeg, transcribe each chunk independently, then merge results with corrected timestamps. Also add chunk-level progress markers every 10 segments for all files. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 13:49:20 -07:00
Claude	6eb13bce63	Remove progress throttle so every segment emits a progress update Previously, progress messages were only sent every 5th segment due to a `segment_count % 5` guard. This made the UI feel unresponsive for short recordings with few segments. Now every segment emits a progress update with a more descriptive message including the segment number and audio percentage. Adds a test verifying that all 8 mock segments produce progress messages, not just every 5th. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 13:49:14 -07:00
shadowdao	67ed69df00	Stream transcript segments to frontend as they are transcribed Send each segment to the frontend immediately after transcription via a new pipeline.segment IPC message, then send speaker assignments as a batch pipeline.speaker_update message after diarization completes. This lets the UI display segments progressively instead of waiting for the entire pipeline to finish. Changes: - Add partial_segment_message and speaker_update_message IPC factories - Add on_segment callback parameter to TranscribeService.transcribe() - Emit partial segments and speaker updates from PipelineService.run() - Add send_and_receive_with_progress to SidecarManager (Rust) - Route pipeline.segment/speaker_update events in run_pipeline command - Listen for streaming events in Svelte frontend (+page.svelte) - Add tests for new message types, callback signature, and update logic Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 13:47:57 -07:00
shadowdao	585411f402	Fix speaker diarization: WAV conversion, pyannote 4.0 compat, telemetry bug - Convert non-WAV audio to 16kHz mono WAV before diarization (pyannote v4.0.4 AudioDecoder returns None duration for FLAC, causing crash) - Handle pyannote 4.0 DiarizeOutput return type (unwrap .speaker_diarization) - Disable pyannote telemetry (np.isfinite(None) bug with max_speakers) - Use huggingface_hub.login() to persist token for all sub-downloads - Pre-download sub-models (segmentation-3.0, speaker-diarization-community-1) - Add third required model license link in settings UI - Improve SpeakerManager hints based on settings state - Add word-wrap to transcript text Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 19:46:07 -08:00
shadowdao	a3612c986d	Add Test & Download button for diarization model, clickable links - Add diarize.download IPC handler that downloads the pyannote model and returns user-friendly error messages (missing license, bad token) - Add download_diarize_model Tauri command - Add "Test & Download Model" button in Speakers settings tab - Update instructions to list both required model licenses (speaker-diarization-3.1 AND segmentation-3.0) - Make all HuggingFace URLs clickable (opens in system browser) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 18:21:42 -08:00
shadowdao	baf820286f	Add HuggingFace token setting for speaker detection - Add "Speakers" tab in Settings with HF token input field - Include step-by-step instructions for obtaining the token - Pass hf_token from settings through Rust → Python pipeline → diarize - Token can also be set via HF_TOKEN environment variable as fallback - Move skip_diarization checkbox to Speakers tab Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 18:08:51 -08:00
shadowdao	ed626b8ba0	Fix progress overlay, play-from-position, layout cutoff, speaker info - Replace progress bar with task checklist showing pipeline steps (load model, transcribe, load diarization, identify speakers, merge) - Fix WaveformPlayer: track ready state, disable controls until loaded, play from current position instead of resetting to start - Fix workspace height calc to prevent bottom content cutoff - Show HF_TOKEN setup hint in SpeakerManager when no speakers detected - Add console logging for progress events to aid debugging Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 18:02:48 -08:00
shadowdao	4d7b9d524f	Fix IPC stdout corruption, dark window background, overlay timing - Redirect sys.stdout to stderr in Python sidecar so library print() calls don't corrupt the JSON-line IPC stream - Save real stdout fd for exclusive IPC use via init_ipc() - Skip non-JSON lines in Rust reader instead of failing with parse error - Set Tauri window background color to match dark theme (#0a0a23) - Add inline dark background on html/body to prevent white flash - Use Svelte tick() to ensure progress overlay renders before invoke - Improve ProgressOverlay with spinner, better styling, z-index 9999 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 17:50:55 -08:00
shadowdao	87b3ad94f9	Improve import UX: progress overlay, pyannote fix, debug logging - Enhanced ProgressOverlay with spinner, better styling, and z-index 9999 - Import button shows "Processing..." with pulse animation while transcribing - Fix pyannote API: use token= instead of deprecated use_auth_token= - Read HF_TOKEN from environment for pyannote model download - Add console logging for click-to-seek debugging - Add color-scheme: dark for native form controls Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 17:43:49 -08:00
shadowdao	669d88f143	Fix progress feedback, diarization fallback, and dropdown readability - Stream pipeline progress to frontend via Tauri events so the progress overlay updates in real time during transcription/diarization - Gracefully fall back to transcription-only when diarization fails (e.g. pyannote not installed) instead of erroring the whole pipeline - Add color-scheme: dark to fix native select/option elements rendering with unreadable white backgrounds Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 17:14:25 -08:00
shadowdao	d67625cd5a	Phase 5: AI provider system with local and cloud support - Implement AIProvider base interface with chat() and is_available() - Add LocalProvider connecting to bundled llama-server via OpenAI SDK - Add OpenAIProvider for direct OpenAI API access - Add AnthropicProvider for Anthropic Claude API - Add LiteLLMProvider for multi-provider gateway - Build AIProviderService with provider routing, auto-selection, and transcript context injection - Add ai.chat IPC handler supporting chat, list_providers, set_provider, and configure actions - Add ai_chat, ai_list_providers, ai_configure Tauri commands - Build interactive AIChatPanel with message history, quick actions (Summarize, Action Items), and transcript context awareness - Tests: 30 Python, 6 Rust, 0 Svelte errors Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 16:25:10 -08:00
shadowdao	415a648a2b	Phase 4: Export to SRT, WebVTT, ASS, plain text, and Markdown - Implement ExportService using pysubs2 for caption formats (SRT, VTT, ASS) and custom formatters for plain text and Markdown - SRT exports with [Speaker]: prefix, WebVTT with <v Speaker> voice tags, ASS with color-coded speaker styles - Plain text groups by speaker with labels, Markdown adds timestamps - Add export.start IPC handler and export_transcript Tauri command - Add export dropdown menu in header (appears after transcription) - Uses native save dialog for output file selection - Add pysubs2 dependency - Tests: 30 Python (6 export tests), 6 Rust, 0 Svelte errors Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 16:18:54 -08:00
shadowdao	44480906a4	Phase 3: Speaker diarization and full transcription pipeline - Implement DiarizeService with pyannote.audio speaker detection - Build PipelineService combining transcribe → diarize → merge with overlap-based speaker assignment per segment - Add pipeline.start and diarize.start IPC handlers - Add run_pipeline Tauri command for full pipeline execution - Wire frontend to use pipeline: speakers auto-created with colors, segments assigned to detected speakers - Build SpeakerManager with rename support (double-click or edit button) - Add speaker color coding throughout transcript display - Add pyannote.audio dependency - Tests: 24 Python (including merge logic), 6 Rust, 0 Svelte errors Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 16:09:48 -08:00
shadowdao	48fe41b064	Phase 2: Core transcription pipeline and audio playback - Implement faster-whisper TranscribeService with word-level timestamps, progress reporting, and hardware auto-detection - Wire up Rust SidecarManager for Python process lifecycle (spawn, IPC, shutdown) - Add transcribe_file Tauri command bridging frontend to Python sidecar - Integrate wavesurfer.js WaveformPlayer with play/pause, skip, seek controls - Build TranscriptEditor with word-level click-to-seek and active highlighting - Connect file import flow: prompt → asset load → transcribe → display - Add typed tauri-bridge service with TranscriptionResult interface - Add Python tests for hardware detection and transcription result formatting Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 15:53:09 -08:00
shadowdao	503cc6c0cf	Phase 1 foundation: Tauri shell, Python sidecar, SQLite database Tauri v2 + Svelte + TypeScript frontend: - App shell with workspace layout (waveform, transcript, speakers, AI chat) - Placeholder components for all major UI areas - Typed stores (project, transcript, playback, AI) - TypeScript interfaces matching the database schema - Tauri bridge service with typed invoke wrappers - svelte-check passes with 0 errors Rust backend: - Tauri v2 app entry point with command registration - SQLite database layer (rusqlite with bundled SQLite) - Full schema: projects, media_files, speakers, segments, words, ai_outputs, annotations (with indexes) - Model structs with serde serialization - CRUD queries for projects, speakers, segments, words - Segment text editing preserves original text - Schema versioning for future migrations - 6 tests passing - Command stubs for project, transcribe, export, AI, settings, system - App state management Python sidecar: - JSON-line IPC protocol (stdin/stdout) - Message types: IPCMessage, progress, error, ready - Handler registry with routing and error handling - Ping/pong handler for connectivity testing - Service stubs: transcribe, diarize, pipeline, AI, export - Provider stubs: local (llama-server), OpenAI, Anthropic, LiteLLM - Hardware detection stubs - 14 tests passing, ruff clean Also adds: - Testing strategy document (docs/TESTING.md) - Validation script (scripts/validate.sh) - Updated .gitignore for Svelte, Rust, Python artifacts Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 15:16:06 -08:00
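
Commit 503cc6c0cf describes the sidecar's JSON-line IPC protocol (stdin/stdout), a handler registry with routing and error handling, and a ping/pong handler. The log does not show the actual message schema, so the following is only a minimal sketch of that idea. The field names `id`, `type`, and `payload`, and the function names `register`, `dispatch`, and `serve`, are assumptions made for illustration, not the project's real API.

```python
import io
import json
from typing import Callable, Dict

# Hypothetical registry: maps a message "type" string to its handler.
HANDLERS: Dict[str, Callable[[dict], dict]] = {}


def register(msg_type: str):
    """Decorator that stores a handler in the registry under msg_type."""
    def wrap(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
        HANDLERS[msg_type] = fn
        return fn
    return wrap


@register("ping")
def handle_ping(payload: dict) -> dict:
    # Connectivity test: answer every ping with a pong.
    return {"type": "pong", "payload": {}}


def dispatch(line: str) -> dict:
    """Parse one JSON line, route it, and convert failures to error replies."""
    msg_id = None
    try:
        msg = json.loads(line)
        if not isinstance(msg, dict):
            raise ValueError("message must be a JSON object")
        msg_id = msg.get("id")
        msg_type = msg.get("type")
        handler = HANDLERS.get(msg_type) if isinstance(msg_type, str) else None
        if handler is None:
            raise ValueError(f"no handler for type {msg_type!r}")
        reply = handler(msg.get("payload", {}))
    except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
        reply = {"type": "error", "payload": {"message": str(exc)}}
    reply["id"] = msg_id  # echo the request id so the caller can match replies
    return reply


def serve(stdin, stdout) -> None:
    """Read one JSON message per line and write one JSON reply per line."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue
        stdout.write(json.dumps(dispatch(line)) + "\n")
        stdout.flush()
```

In a real sidecar `serve` would be called with `sys.stdin` and the stream reserved for IPC. Passing the streams as arguments keeps the loop testable with `io.StringIO`.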

32 Commits
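
Commit 44480906a4 mentions "overlap-based speaker assignment per segment" when merging transcription with diarization output. The log does not include the merge code, so this is a sketch of one plausible reading: each transcript segment takes the label of the diarization turn it shares the most time with. The `Turn` tuple layout and the names `overlap` and `assign_speaker` are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical diarization turn: (start_sec, end_sec, speaker_label).
Turn = Tuple[float, float, str]


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speaker(seg_start: float, seg_end: float, turns: List[Turn]) -> Optional[str]:
    """Return the label of the turn with the largest overlap, or None if none overlap."""
    best_label: Optional[str] = None
    best_overlap = 0.0
    for t_start, t_end, label in turns:
        shared = overlap(seg_start, seg_end, t_start, t_end)
        if shared > best_overlap:
            best_label, best_overlap = label, shared
    return best_label


turns = [(0.0, 4.0, "SPEAKER_00"), (4.0, 9.0, "SPEAKER_01")]
# The segment 3.0-6.0 shares 1.0 s with the first turn and 2.0 s with the second.
print(assign_speaker(3.0, 6.0, turns))
```

A segment that overlaps no turn gets `None`, which leaves the caller free to keep it unassigned rather than guess a speaker.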