voice-to-notes

Author	SHA1	Message	Date
Claude	96e9a6d38b	Fix Ollama: remove duplicate stale configMap in AIChatPanel AIChatPanel had its own hardcoded configMap with the old llama-server URL (localhost:8080) and field names (local_model_path). Every chat message reconfigured the provider with these wrong values, overriding the correct settings applied at startup. Fix: replace the duplicate with a call to the shared configureAIProvider(). Also strip trailing slashes from ollama_url before appending /v1 to prevent double-slash URLs (http://localhost:11434//v1). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-23 06:33:03 -07:00
Claude	b1d46fd42e	Add cancel button to processing overlay with confirmation - Cancel button on the progress overlay during transcription - Clicking Cancel shows confirmation: "Processing is incomplete. If you cancel now, the transcription will need to be started over." - "Continue Processing" dismisses the dialog, "Cancel Processing" stops - Cancel clears partial results (segments, speakers) and resets UI - Pipeline results are discarded if cancelled during processing Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-23 05:38:40 -07:00
Claude	aa319eb823	Fix Ollama settings on startup + video extraction UX AI provider: - Extract configureAIProvider() from saveSettings for reuse - Call it on app startup after sidecar is ready (was only called on Save) - Call it after first-time sidecar download completes - Sidecar now receives correct Ollama URL/model immediately Video extraction: - Hide ffmpeg console window on Windows (CREATE_NO_WINDOW flag) - Show "Extracting audio from video..." overlay with spinner during extraction - UI stays responsive while ffmpeg runs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-23 05:30:14 -07:00
Claude	02c70f90c8	Extract audio from video files before loading Video files (MP4, MKV, etc.) are now processed with ffmpeg to extract audio to a temp WAV file before loading into wavesurfer. This prevents the WebView crash caused by trying to fetch multi-GB files into memory. - New extract_audio Tauri command uses ffmpeg (sidecar-bundled or system) - Frontend detects video extensions and extracts audio automatically - User-friendly error if ffmpeg is not installed with install instructions - Reverted wavesurfer MediaElement approach in favor of clean extraction - Added FFmpeg install guide to USER_GUIDE.md Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-22 20:04:10 -07:00
Claude	b1ae49066c	Fix word wrap in transcript editor - Add min-width: 0 to flex container (allows shrinking for wrap) - Add overflow-x: hidden to prevent horizontal scroll - Add white-space: pre-wrap to segment text Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-22 11:59:15 -07:00
Claude	4a9b00111d	Settings: replace llama-server with Ollama, remove Local AI tab, rename Developer to Debug - AI Provider: "Local (llama-server)" changed to "Ollama" with URL and model fields (defaults to localhost:11434, llama3.2) - Ollama connects via its OpenAI-compatible API (/v1 endpoint) - Removed empty "Local AI" tab - Renamed "Developer" tab to "Debug" Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-22 11:55:09 -07:00
Claude	7f1fa1904c	Make DevTools a toggle in Settings > Developer tab - DevTools off by default (no more auto-open on launch) - New "Developer" tab in Settings with a checkbox to toggle devtools - Toggle takes effect immediately (opens/closes inspector) - Setting persists: devtools restored on next launch if enabled - toggle_devtools Tauri command wraps window.open/close_devtools Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-22 10:55:50 -07:00
Claude	7fa903ad01	Download sidecar on first launch instead of bundling Major refactor: sidecar is no longer bundled in the installer. Instead, it's downloaded on first launch with a setup screen offering CPU vs CUDA choice. This solves the 2GB+ installer size limit and decouples app/sidecar. Backend: - New commands: check_sidecar, download_sidecar, check_sidecar_update - Streaming download with progress events via reqwest - Added reqwest + futures-util dependencies - Removed sidecar.zip from bundle resources - Restored NSIS target (no longer size-constrained) CI: - Each platform builds both CPU and CUDA sidecar variants (except macOS: CPU only) - Sidecar zips uploaded as separate release assets - Asset naming: sidecar-{os}-{arch}-{variant}.zip Frontend: - SidecarSetup.svelte: first-launch setup with CPU/CUDA radio choice, progress bar, error/retry handling - Update banner on launch if newer sidecar version available - Conditional rendering: setup screen → main app flow Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-22 07:09:10 -07:00
Claude	882aa147c7	Smart word timing redistribution on transcript edits When editing a segment, word timing is now intelligently redistributed: - Spelling fixes (same word count): each word keeps its original timing - Word splits (e.g. "gonna" → "going to"): original word's time range is divided proportionally across the new words - Inserted words: timing interpolated from neighboring words - Deleted words: remaining words keep their timing, gaps collapse This preserves click-to-seek accuracy for common edits like fixing misheard words or splitting concatenated words. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 22:23:39 -07:00
Claude	67fc23e8aa	Preserve word-level timing on spelling edits When the edited text has the same word count as the original (e.g. fixing "wisper" to "Whisper"), each word keeps its original start/end timestamps. Only falls back to segment-level timing when words are added or removed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 22:21:52 -07:00
Claude	727107323c	Fix transcript text edit not showing after Enter The display renders segment.words (not segment.text), so editing the text field alone had no visible effect. Now finishEditing() rebuilds the words array from the edited text so the change is immediately visible. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 22:20:20 -07:00
Claude	61caa07e4c	Add project save/load and improve AI chat formatting Project persistence: - save_project_transcript command: persists segments, speakers, words to SQLite - load_project_transcript command: loads full transcript with nested words - delete_project command: soft-delete projects - Auto-save after pipeline completes (named from filename) - Project dropdown in header to switch between saved transcripts - Projects load audio, segments, and speakers from database AI chat improvements: - Markdown rendering in assistant messages (headers, lists, bold, italic, code) - Better message spacing and visual distinction (border-left accents) - Styled markdown elements matching dark theme - Improved empty state and quick action button sizing Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 22:06:29 -07:00
Claude	58faa83cb3	Cross-platform distribution, UI improvements, and performance optimizations - PyInstaller frozen sidecar: spec file, build script, and ffmpeg path resolver for self-contained distribution without Python prerequisites - Dual-mode sidecar launcher: frozen binary (production) with dev mode fallback - Parallel transcription + diarization pipeline (~30-40% faster) - GPU auto-detection for diarization (CUDA when available) - Async run_pipeline command for real-time progress event delivery - Web Audio API backend for instant playback and seeking - OpenAI-compatible provider replacing LiteLLM client-side routing - Cross-platform RAM detection (Linux/macOS/Windows) - Settings: speaker count hint, token reveal toggles, dark dropdown styling - Loading splash screen, flexbox layout fix for viewport overflow - Gitea Actions CI/CD pipeline (Linux, Windows, macOS ARM) - Updated README and CLAUDE.md documentation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 21:33:43 -07:00
Josh Knapp	585411f402	Fix speaker diarization: WAV conversion, pyannote 4.0 compat, telemetry bug - Convert non-WAV audio to 16kHz mono WAV before diarization (pyannote v4.0.4 AudioDecoder returns None duration for FLAC, causing crash) - Handle pyannote 4.0 DiarizeOutput return type (unwrap .speaker_diarization) - Disable pyannote telemetry (np.isfinite(None) bug with max_speakers) - Use huggingface_hub.login() to persist token for all sub-downloads - Pre-download sub-models (segmentation-3.0, speaker-diarization-community-1) - Add third required model license link in settings UI - Improve SpeakerManager hints based on settings state - Add word-wrap to transcript text Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 19:46:07 -08:00
Josh Knapp	a3612c986d	Add Test & Download button for diarization model, clickable links - Add diarize.download IPC handler that downloads the pyannote model and returns user-friendly error messages (missing license, bad token) - Add download_diarize_model Tauri command - Add "Test & Download Model" button in Speakers settings tab - Update instructions to list both required model licenses (speaker-diarization-3.1 AND segmentation-3.0) - Make all HuggingFace URLs clickable (opens in system browser) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 18:21:42 -08:00
Josh Knapp	baf820286f	Add HuggingFace token setting for speaker detection - Add "Speakers" tab in Settings with HF token input field - Include step-by-step instructions for obtaining the token - Pass hf_token from settings through Rust → Python pipeline → diarize - Token can also be set via HF_TOKEN environment variable as fallback - Move skip_diarization checkbox to Speakers tab Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 18:08:51 -08:00
Josh Knapp	ed626b8ba0	Fix progress overlay, play-from-position, layout cutoff, speaker info - Replace progress bar with task checklist showing pipeline steps (load model, transcribe, load diarization, identify speakers, merge) - Fix WaveformPlayer: track ready state, disable controls until loaded, play from current position instead of resetting to start - Fix workspace height calc to prevent bottom content cutoff - Show HF_TOKEN setup hint in SpeakerManager when no speakers detected - Add console logging for progress events to aid debugging Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 18:02:48 -08:00
Josh Knapp	87b3ad94f9	Improve import UX: progress overlay, pyannote fix, debug logging - Enhanced ProgressOverlay with spinner, better styling, and z-index 9999 - Import button shows "Processing..." with pulse animation while transcribing - Fix pyannote API: use token= instead of deprecated use_auth_token= - Read HF_TOKEN from environment for pyannote model download - Add console logging for click-to-seek debugging - Add color-scheme: dark for native form controls Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 17:43:49 -08:00
Josh Knapp	97a1a15755	Phase 6: Llama-server manager, settings UI, packaging, and polish - Implement LlamaManager in Rust for llama-server lifecycle: spawn with port allocation, health check, clean shutdown on Drop, model listing - Add llama_start/stop/status/list_models Tauri commands - Add load_settings/save_settings commands with JSON persistence - Build SettingsModal with tabs for Transcription, AI Provider, Local AI settings (model size, device, language, API keys, provider selection) - Wire settings into pipeline calls (model, device, language, skip diarization) - Configure Tauri packaging: asset protocol for local audio files, CSP policy, bundle metadata, Linux .deb/.AppImage and Windows .msi config - Add keyboard shortcuts: Space (play/pause), Ctrl+O (import), Ctrl+, (settings), Escape (close menus/modals) - Close export dropdown on outside click - Tests: 30 Python, 6 Rust, 0 Svelte errors Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 16:38:23 -08:00
Josh Knapp	d67625cd5a	Phase 5: AI provider system with local and cloud support - Implement AIProvider base interface with chat() and is_available() - Add LocalProvider connecting to bundled llama-server via OpenAI SDK - Add OpenAIProvider for direct OpenAI API access - Add AnthropicProvider for Anthropic Claude API - Add LiteLLMProvider for multi-provider gateway - Build AIProviderService with provider routing, auto-selection, and transcript context injection - Add ai.chat IPC handler supporting chat, list_providers, set_provider, and configure actions - Add ai_chat, ai_list_providers, ai_configure Tauri commands - Build interactive AIChatPanel with message history, quick actions (Summarize, Action Items), and transcript context awareness - Tests: 30 Python, 6 Rust, 0 Svelte errors Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 16:25:10 -08:00
Josh Knapp	415a648a2b	Phase 4: Export to SRT, WebVTT, ASS, plain text, and Markdown - Implement ExportService using pysubs2 for caption formats (SRT, VTT, ASS) and custom formatters for plain text and Markdown - SRT exports with [Speaker]: prefix, WebVTT with <v Speaker> voice tags, ASS with color-coded speaker styles - Plain text groups by speaker with labels, Markdown adds timestamps - Add export.start IPC handler and export_transcript Tauri command - Add export dropdown menu in header (appears after transcription) - Uses native save dialog for output file selection - Add pysubs2 dependency - Tests: 30 Python (6 export tests), 6 Rust, 0 Svelte errors Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 16:18:54 -08:00
Josh Knapp	44480906a4	Phase 3: Speaker diarization and full transcription pipeline - Implement DiarizeService with pyannote.audio speaker detection - Build PipelineService combining transcribe → diarize → merge with overlap-based speaker assignment per segment - Add pipeline.start and diarize.start IPC handlers - Add run_pipeline Tauri command for full pipeline execution - Wire frontend to use pipeline: speakers auto-created with colors, segments assigned to detected speakers - Build SpeakerManager with rename support (double-click or edit button) - Add speaker color coding throughout transcript display - Add pyannote.audio dependency - Tests: 24 Python (including merge logic), 6 Rust, 0 Svelte errors Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 16:09:48 -08:00
Josh Knapp	842f8d5f90	Add auto-scroll, file dialog, and transcript editing - Auto-scroll transcript to active segment during playback with smart pause when user manually scrolls (resumes after 3s) - Replace prompt() with native Tauri file dialog for audio/video import with file type filters - Add inline transcript editing via double-click with Enter to save, Esc to cancel, preserving original text for change tracking - Show "edited" badge on modified segments Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 16:02:27 -08:00
Josh Knapp	48fe41b064	Phase 2: Core transcription pipeline and audio playback - Implement faster-whisper TranscribeService with word-level timestamps, progress reporting, and hardware auto-detection - Wire up Rust SidecarManager for Python process lifecycle (spawn, IPC, shutdown) - Add transcribe_file Tauri command bridging frontend to Python sidecar - Integrate wavesurfer.js WaveformPlayer with play/pause, skip, seek controls - Build TranscriptEditor with word-level click-to-seek and active highlighting - Connect file import flow: prompt → asset load → transcribe → display - Add typed tauri-bridge service with TranscriptionResult interface - Add Python tests for hardware detection and transcription result formatting Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 15:53:09 -08:00
Josh Knapp	503cc6c0cf	Phase 1 foundation: Tauri shell, Python sidecar, SQLite database Tauri v2 + Svelte + TypeScript frontend: - App shell with workspace layout (waveform, transcript, speakers, AI chat) - Placeholder components for all major UI areas - Typed stores (project, transcript, playback, AI) - TypeScript interfaces matching the database schema - Tauri bridge service with typed invoke wrappers - svelte-check passes with 0 errors Rust backend: - Tauri v2 app entry point with command registration - SQLite database layer (rusqlite with bundled SQLite) - Full schema: projects, media_files, speakers, segments, words, ai_outputs, annotations (with indexes) - Model structs with serde serialization - CRUD queries for projects, speakers, segments, words - Segment text editing preserves original text - Schema versioning for future migrations - 6 tests passing - Command stubs for project, transcribe, export, AI, settings, system - App state management Python sidecar: - JSON-line IPC protocol (stdin/stdout) - Message types: IPCMessage, progress, error, ready - Handler registry with routing and error handling - Ping/pong handler for connectivity testing - Service stubs: transcribe, diarize, pipeline, AI, export - Provider stubs: local (llama-server), OpenAI, Anthropic, LiteLLM - Hardware detection stubs - 14 tests passing, ruff clean Also adds: - Testing strategy document (docs/TESTING.md) - Validation script (scripts/validate.sh) - Updated .gitignore for Svelte, Rust, Python artifacts Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 15:16:06 -08:00

25 Commits
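
Commit 96e9a6d38b routes every caller through one shared configureAIProvider() and trims trailing slashes from ollama_url before appending /v1. The Python sketch below shows that normalization under stated assumptions: the helper name `ollama_provider_config` and the `ollama_model`, `base_url` and `model` keys are hypothetical; only `ollama_url`, the /v1 suffix and the defaults (localhost:11434, llama3.2, from 4a9b00111d) come from the log.

```python
DEFAULT_OLLAMA_URL = "http://localhost:11434"
DEFAULT_OLLAMA_MODEL = "llama3.2"


def ollama_provider_config(settings: dict) -> dict:
    """Build an OpenAI-compatible provider config from saved settings.

    Hypothetical helper: key names other than "ollama_url" are illustrative.
    """
    url = settings.get("ollama_url") or DEFAULT_OLLAMA_URL
    model = settings.get("ollama_model") or DEFAULT_OLLAMA_MODEL
    # Strip every trailing slash so "http://localhost:11434/" cannot
    # become the double-slash URL "http://localhost:11434//v1".
    return {"base_url": url.rstrip("/") + "/v1", "model": model}
```

For example, `ollama_provider_config({"ollama_url": "http://localhost:11434/"})` yields a `base_url` of `http://localhost:11434/v1`. Keeping this in a single function is the substance of the fix: startup, Save and the chat panel cannot drift apart.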
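
Commit 58faa83cb3 runs transcription and diarization in parallel, and b1d46fd42e discards pipeline results when the user cancels. A minimal sketch of both ideas, assuming the stages are plain callables (`transcribe`, `diarize` and `merge` are hypothetical parameters) and that cancellation is signalled through a `threading.Event`. Threads only overlap real work if the native inference code releases the GIL, which is an assumption here. This version discards results rather than interrupting work already in progress.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Optional


def run_pipeline(
    audio_path: str,
    transcribe: Callable[[str], Any],
    diarize: Callable[[str], Any],
    merge: Callable[[Any, Any], Any],
    cancelled: threading.Event,
) -> Optional[Any]:
    """Run transcription and diarization concurrently, then merge.

    Returns None when `cancelled` is set before the merge, so partial
    results never reach the UI.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        transcript_future = pool.submit(transcribe, audio_path)
        diarization_future = pool.submit(diarize, audio_path)
        transcript = transcript_future.result()
        diarization = diarization_future.result()
    if cancelled.is_set():
        # The user confirmed "Cancel Processing": drop everything.
        return None
    return merge(transcript, diarization)
```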
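
Commits 02c70f90c8 and aa319eb823 extract audio from video with ffmpeg into a temp WAV and hide the console window on Windows via CREATE_NO_WINDOW. The app's extract_audio is a Rust Tauri command; this is a Python sketch of the same flow. The extension set and the 16 kHz mono PCM output are assumptions (16 kHz mono matches what 585411f402 uses for diarization). `-vn`, `-acodec`, `-ar` and `-ac` are standard ffmpeg options.

```python
import os
import subprocess
import sys
import tempfile
from typing import List

# Assumed set of video extensions; the commit only says "MP4, MKV, etc.".
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".mov", ".avi", ".webm"}


def is_video(path: str) -> bool:
    return os.path.splitext(path)[1].lower() in VIDEO_EXTENSIONS


def ffmpeg_extract_args(ffmpeg: str, src: str, dst: str) -> List[str]:
    # -vn drops the video stream; write 16-bit PCM, 16 kHz, mono.
    return [ffmpeg, "-y", "-i", src, "-vn", "-acodec", "pcm_s16le", "-ar", "16000", "-ac", "1", dst]


def extract_audio(src: str, ffmpeg: str = "ffmpeg") -> str:
    """Extract src's audio to a temporary WAV file and return its path."""
    fd, dst = tempfile.mkstemp(suffix=".wav")
    os.close(fd)  # ffmpeg overwrites the empty file because of -y
    kwargs = {}
    if sys.platform == "win32":
        # Keep ffmpeg from flashing a console window on Windows.
        kwargs["creationflags"] = subprocess.CREATE_NO_WINDOW
    try:
        subprocess.run(ffmpeg_extract_args(ffmpeg, src, dst), check=True, capture_output=True, **kwargs)
    except FileNotFoundError:
        os.remove(dst)
        raise RuntimeError("ffmpeg was not found. Install FFmpeg and make sure it is on PATH.") from None
    except subprocess.CalledProcessError as exc:
        os.remove(dst)
        message = exc.stderr.decode(errors="replace").strip()
        raise RuntimeError(f"ffmpeg failed: {message[-500:]}") from exc
    return dst
```

Running the conversion in a subprocess is also what keeps the UI responsive: the caller can wait on it off the UI thread while the overlay spinner is shown.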
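
Commit 7fa903ad01 downloads the sidecar on first launch, names assets `sidecar-{os}-{arch}-{variant}.zip`, skips CUDA on macOS and streams the download with progress events. The backend does this in Rust with reqwest; the sketch below uses Python's urllib instead. The OS and architecture tokens (`linux`, `macos`, `windows`, `x86_64`, `aarch64`) and all function names are hypothetical.

```python
import platform
import urllib.request
from typing import BinaryIO, Callable, Optional

# Hypothetical mappings from Python's platform names to asset tokens.
_OS_NAMES = {"Linux": "linux", "Darwin": "macos", "Windows": "windows"}
_ARCH_NAMES = {"x86_64": "x86_64", "AMD64": "x86_64", "arm64": "aarch64", "aarch64": "aarch64"}

ProgressFn = Callable[[int, Optional[int]], None]


def sidecar_asset_name(os_name: str, arch: str, variant: str) -> str:
    """Return the release asset name for one platform/variant."""
    if variant not in ("cpu", "cuda"):
        raise ValueError(f"unknown variant: {variant!r}")
    if os_name == "macos" and variant == "cuda":
        raise ValueError("macOS only ships a CPU sidecar")
    return f"sidecar-{os_name}-{arch}-{variant}.zip"


def current_asset_name(variant: str) -> str:
    return sidecar_asset_name(_OS_NAMES[platform.system()], _ARCH_NAMES[platform.machine()], variant)


def copy_with_progress(src: BinaryIO, dst: BinaryIO, total: Optional[int],
                       on_progress: ProgressFn, chunk_size: int = 1 << 16) -> int:
    """Stream src into dst chunk by chunk, reporting bytes copied so far."""
    done = 0
    while chunk := src.read(chunk_size):
        dst.write(chunk)
        done += len(chunk)
        on_progress(done, total)
    return done


def download_sidecar(url: str, dest_path: str, on_progress: ProgressFn) -> int:
    """Download url to dest_path; total is None if the server omits Content-Length."""
    with urllib.request.urlopen(url) as resp, open(dest_path, "wb") as out:
        length = resp.headers.get("Content-Length")
        total = int(length) if length else None
        return copy_with_progress(resp, out, total, on_progress)
```

Separating the copy loop from the network call keeps the progress logic testable with in-memory streams.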
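
Commits 882aa147c7 and 67fc23e8aa describe the word-timing rules for transcript edits: same word count keeps per-word timing, splits divide the original range, inserts interpolate from neighbours, deletes leave survivors untouched. The app implements this in its TypeScript frontend; the Python sketch below is one way to get the same behaviour. It uses difflib to align old and new words, which is a swapped-in technique, and "proportionally" is interpreted as proportional to character length, which is an assumption. `Word`, `_spread` and `redistribute` are hypothetical names.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float


def _spread(texts: List[str], start: float, end: float) -> List[Word]:
    """Divide [start, end] across texts in proportion to their character length."""
    total_chars = sum(len(t) for t in texts) or 1
    words: List[Word] = []
    cursor = start
    for i, text in enumerate(texts):
        # Pin the last word to `end` so float rounding cannot leave a gap.
        word_end = end if i == len(texts) - 1 else cursor + (end - start) * len(text) / total_chars
        words.append(Word(text, cursor, word_end))
        cursor = word_end
    return words


def redistribute(old: List[Word], new_text: str, seg_start: float, seg_end: float) -> List[Word]:
    new_tokens = new_text.split()
    if len(new_tokens) == len(old):
        # Spelling fixes: same word count, so each word keeps its timing.
        return [Word(tok, w.start, w.end) for tok, w in zip(new_tokens, old)]
    matcher = SequenceMatcher(a=[w.text.lower() for w in old], b=[t.lower() for t in new_tokens], autojunk=False)
    result: List[Word] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged words keep their timestamps (but take the new text).
            result.extend(Word(new_tokens[j1 + k], old[i1 + k].start, old[i1 + k].end) for k in range(i2 - i1))
        elif tag == "replace":
            # Splits such as "gonna" -> "going to": share the replaced range.
            result.extend(_spread(new_tokens[j1:j2], old[i1].start, old[i2 - 1].end))
        elif tag == "insert":
            # Fill the gap between the neighbouring words (or segment bounds).
            lo = old[i1 - 1].end if i1 > 0 else seg_start
            hi = old[i1].start if i1 < len(old) else seg_end
            result.extend(_spread(new_tokens[j1:j2], lo, max(lo, hi)))
        # "delete": removed words are dropped; survivors keep their timing.
    return result
```

When neighbouring words touch, an inserted word gets a zero-length range; click-to-seek still lands at the right spot, which is the property these commits aim to preserve.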
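
Commit 585411f402 converts non-WAV input before diarization and unwraps pyannote 4.0's DiarizeOutput via `.speaker_diarization`. A small sketch of both checks follows. The helper names are hypothetical, and the fallback for results without that attribute (older pyannote returning the annotation directly) is an assumption.

```python
import os
from typing import Any


def needs_wav_conversion(path: str) -> bool:
    """Non-WAV inputs (e.g. FLAC) get converted to 16 kHz mono WAV first."""
    return os.path.splitext(path)[1].lower() != ".wav"


def unwrap_diarization(output: Any) -> Any:
    """Return the speaker annotation from a pyannote pipeline call.

    pyannote 4.0 wraps the result in a DiarizeOutput whose
    .speaker_diarization attribute holds the annotation; if the attribute
    is absent, assume the object already is the annotation.
    """
    return getattr(output, "speaker_diarization", output)
```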
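
Commit 97a1a15755 adds a LlamaManager that spawns llama-server with port allocation and a health check (written in Rust). The Python sketch below shows the two building blocks under stated assumptions: binding to port 0 to get an unused port, and polling an HTTP health URL until it answers 200. The `/health` path is an assumption about the server, and all names are hypothetical.

```python
import http.client
import socket
import time
import urllib.request
from typing import Callable


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def http_ok(url: str, timeout: float = 1.0) -> bool:
    """True if url answers with HTTP 200; connection errors count as not ready."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        return False


def wait_until(check: Callable[[], bool], timeout: float, interval: float = 0.25) -> bool:
    """Poll check() until it returns True or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    return False
```

Typical use: pick `port = find_free_port()`, start the server on that port, then `wait_until(lambda: http_ok(f"http://127.0.0.1:{port}/health"), timeout=60)`. The port is released before the child binds it, so another process could grab it in between; a retry on bind failure covers that race.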
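
Commit 44480906a4 merges transcription and diarization with "overlap-based speaker assignment per segment". A minimal sketch of that rule, assuming segments are `(start, end)` pairs and diarizer turns are `(start, end, speaker)` tuples with pyannote-style labels: each segment takes the speaker whose turns overlap it for the most total time, or None if nothing overlaps. The function name is hypothetical.

```python
from collections import defaultdict
from typing import List, Optional, Sequence, Tuple

Turn = Tuple[float, float, str]  # (start, end, speaker) from the diarizer


def assign_speakers(segments: Sequence[Tuple[float, float]], turns: Sequence[Turn]) -> List[Optional[str]]:
    """For each transcript segment, pick the speaker whose turns overlap it most."""
    labels: List[Optional[str]] = []
    for seg_start, seg_end in segments:
        overlap = defaultdict(float)
        for turn_start, turn_end, speaker in turns:
            shared = min(seg_end, turn_end) - max(seg_start, turn_start)
            if shared > 0:
                overlap[speaker] += shared
        labels.append(max(overlap, key=overlap.get) if overlap else None)
    return labels
```

The nested loop is O(segments × turns), which is fine for a sketch; sorting both lists and sweeping with two pointers scales better for long recordings.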
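
Commit 842f8d5f90 auto-scrolls the transcript during playback but pauses when the user scrolls manually, resuming after 3 s. The real logic lives in the Svelte frontend; the Python sketch below is a hypothetical `AutoScrollGate` that captures the rule with an injectable clock. It assumes the app's own programmatic scrolls do not call `on_user_scroll`.

```python
import time
from typing import Callable, Optional


class AutoScrollGate:
    """Decide whether the transcript view should follow playback.

    A manual scroll pauses auto-scroll; following resumes once
    `resume_after` seconds pass without another manual scroll.
    """

    def __init__(self, resume_after: float = 3.0, clock: Callable[[], float] = time.monotonic):
        self._resume_after = resume_after
        self._clock = clock
        self._last_manual: Optional[float] = None

    def on_user_scroll(self) -> None:
        self._last_manual = self._clock()

    def should_follow(self) -> bool:
        if self._last_manual is None:
            return True
        if self._clock() - self._last_manual >= self._resume_after:
            self._last_manual = None
            return True
        return False
```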
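
Commit 503cc6c0cf sets up the sidecar's JSON-line IPC over stdin/stdout with ready and error messages, a handler registry with routing, and a ping/pong handler. The sketch below shows that pattern. The `id`, `type` and `payload` field names and the reply shapes are assumptions, not the project's actual schema. A long-running handler could print `{"type": "progress", ...}` lines before its reply in the same way.

```python
import json
import sys
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any]], Dict[str, Any]]
HANDLERS: Dict[str, Handler] = {}


def handler(msg_type: str) -> Callable[[Handler], Handler]:
    """Register a function as the handler for one message type."""
    def register(fn: Handler) -> Handler:
        HANDLERS[msg_type] = fn
        return fn
    return register


@handler("ping")
def ping(payload: Dict[str, Any]) -> Dict[str, Any]:
    return {"type": "pong", "payload": {}}


def error(msg_id: Any, message: str) -> Dict[str, Any]:
    return {"id": msg_id, "type": "error", "payload": {"message": message}}


def handle_line(line: str) -> Dict[str, Any]:
    """Decode one JSON line, route it to its handler, and build the reply."""
    try:
        msg = json.loads(line)
    except json.JSONDecodeError as exc:
        return error(None, f"invalid JSON: {exc}")
    if not isinstance(msg, dict):
        return error(None, "message must be a JSON object")
    msg_id = msg.get("id")
    msg_type = msg.get("type")
    fn = HANDLERS.get(msg_type) if isinstance(msg_type, str) else None
    if fn is None:
        return error(msg_id, f"unknown message type: {msg_type!r}")
    try:
        return {"id": msg_id, **fn(msg.get("payload") or {})}
    except Exception as exc:  # report handler failures instead of crashing
        return error(msg_id, str(exc))


def serve() -> None:
    """Announce readiness, then answer one JSON line per request line."""
    print(json.dumps({"type": "ready"}), flush=True)
    for line in sys.stdin:
        if line.strip():
            print(json.dumps(handle_line(line)), flush=True)
```

Flushing after every line matters: the Rust side reads replies line by line, and buffered output would stall it.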