voice-to-notes

Author	SHA1	Message	Date
Claude	aa319eb823	Fix Ollama settings on startup + video extraction UX AI provider: - Extract configureAIProvider() from saveSettings for reuse - Call it on app startup after sidecar is ready (was only called on Save) - Call it after first-time sidecar download completes - Sidecar now receives correct Ollama URL/model immediately Video extraction: - Hide ffmpeg console window on Windows (CREATE_NO_WINDOW flag) - Show "Extracting audio from video..." overlay with spinner during extraction - UI stays responsive while ffmpeg runs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-23 05:30:14 -07:00
Claude	02c70f90c8	Extract audio from video files before loading Video files (MP4, MKV, etc.) are now processed with ffmpeg to extract audio to a temp WAV file before loading into wavesurfer. This prevents the WebView crash caused by trying to fetch multi-GB files into memory. - New extract_audio Tauri command uses ffmpeg (sidecar-bundled or system) - Frontend detects video extensions and extracts audio automatically - User-friendly error if ffmpeg is not installed with install instructions - Reverted wavesurfer MediaElement approach in favor of clean extraction - Added FFmpeg install guide to USER_GUIDE.md Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-22 20:04:10 -07:00
Claude	7f1fa1904c	Make DevTools a toggle in Settings > Developer tab - DevTools off by default (no more auto-open on launch) - New "Developer" tab in Settings with a checkbox to toggle devtools - Toggle takes effect immediately (opens/closes inspector) - Setting persists: devtools restored on next launch if enabled - toggle_devtools Tauri command wraps window.open/close_devtools Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-22 10:55:50 -07:00
Claude	e2c5db89b6	Enable devtools in release builds + add frontend logging - Enable Tauri devtools feature so right-click Inspect works in release - Open devtools automatically on launch for debugging - Add log_frontend command: frontend can write to ~/.voicetonotes/frontend.log - Sidecar logs go to %LOCALAPPDATA%/com.voicetonotes.app/sidecar.log - Frontend logs go to %USERPROFILE%/.voicetonotes/frontend.log Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-22 09:16:29 -07:00
Claude	45247ae66e	Decouple sidecar versioning from app versioning Sidecar now has its own version (1.0.0) and release lifecycle: - Sidecar tags: sidecar-v1.0.0, sidecar-v1.0.1, etc. - App tags: v0.2.x (unchanged) - Sidecar workflow triggers only on python/** changes or manual dispatch - App release no longer bumps python/pyproject.toml Sidecar version tracked via sidecar-version.txt in app data dir: - resolve_sidecar_path() reads version from file instead of CARGO_PKG_VERSION - download_sidecar() fetches latest sidecar-v* release from Gitea API - check_sidecar_update() compares local vs remote sidecar versions - Version file written after successful download Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-22 07:57:51 -07:00
Claude	9652290a06	Split CI into app + sidecar workflows, fix reqwest compilation CI split: - release.yml: version bump + lightweight app builds (no Python/sidecar) - build-sidecar.yml: builds CPU + CUDA sidecar variants per platform, uploads as separate release assets, runs in parallel with app builds - Sidecar workflow uses retry loop to find release (race with version bump) Fixes: - Add reqwest "json" feature for .json() method - Add explicit type annotations for reqwest Response and bytes::Bytes - Reuse client instance for download (was using reqwest::get directly) Bundle targets: deb, rpm, nsis, msi, dmg (all formats, app is small now) Windows upload finds both .msi and -setup.exe Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-22 07:50:55 -07:00
Claude	7fa903ad01	Download sidecar on first launch instead of bundling Major refactor: sidecar is no longer bundled in the installer. Instead, it's downloaded on first launch with a setup screen offering CPU vs CUDA choice. This solves the 2GB+ installer size limit and decouples app/sidecar. Backend: - New commands: check_sidecar, download_sidecar, check_sidecar_update - Streaming download with progress events via reqwest - Added reqwest + futures-util dependencies - Removed sidecar.zip from bundle resources - Restored NSIS target (no longer size-constrained) CI: - Each platform builds both CPU and CUDA sidecar variants (except macOS: CPU only) - Sidecar zips uploaded as separate release assets - Asset naming: sidecar-{os}-{arch}-{variant}.zip Frontend: - SidecarSetup.svelte: first-launch setup with CPU/CUDA radio choice, progress bar, error/retry handling - Update banner on launch if newer sidecar version available - Conditional rendering: setup screen → main app flow Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-22 07:09:10 -07:00
Claude	27b705b5b6	File-based project save/load, AI chat formatting, text edit fix Project files (.vtn): - Save Project: serializes transcript, speakers, audio path to JSON file - Open Project: loads .vtn file, restores audio/transcript/speakers - User chooses filename and location via save dialog - Replaces SQLite-based project persistence (DB commands remain for future use) - Text edits update in-memory store immediately, persist on explicit save - Fix Windows path separator in project name extraction AI chat: - Markdown rendering in assistant messages (headers, lists, bold, code) - Better visual distinction with border-left accents - Styled markdown elements for dark theme Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 22:17:35 -07:00
Claude	8e7d21d22b	Persist segment text edits to database on Enter - Add update_segment Tauri command (calls existing update_segment_text query) - Wire onTextEdit handler from TranscriptEditor to invoke update_segment - Edits are saved to SQLite immediately when user presses Enter Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 22:08:52 -07:00
Claude	61caa07e4c	Add project save/load and improve AI chat formatting Project persistence: - save_project_transcript command: persists segments, speakers, words to SQLite - load_project_transcript command: loads full transcript with nested words - delete_project command: soft-delete projects - Auto-save after pipeline completes (named from filename) - Project dropdown in header to switch between saved transcripts - Projects load audio, segments, and speakers from database AI chat improvements: - Markdown rendering in assistant messages (headers, lists, bold, italic, code) - Better message spacing and visual distinction (border-left accents) - Styled markdown elements matching dark theme - Improved empty state and quick action button sizing Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 22:06:29 -07:00
Claude	58faa83cb3	Cross-platform distribution, UI improvements, and performance optimizations - PyInstaller frozen sidecar: spec file, build script, and ffmpeg path resolver for self-contained distribution without Python prerequisites - Dual-mode sidecar launcher: frozen binary (production) with dev mode fallback - Parallel transcription + diarization pipeline (~30-40% faster) - GPU auto-detection for diarization (CUDA when available) - Async run_pipeline command for real-time progress event delivery - Web Audio API backend for instant playback and seeking - OpenAI-compatible provider replacing LiteLLM client-side routing - Cross-platform RAM detection (Linux/macOS/Windows) - Settings: speaker count hint, token reveal toggles, dark dropdown styling - Loading splash screen, flexbox layout fix for viewport overflow - Gitea Actions CI/CD pipeline (Linux, Windows, macOS ARM) - Updated README and CLAUDE.md documentation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 21:33:43 -07:00
Claude	42ccd3e21d	Fix cargo fmt formatting and diarize threading test mock	2026-03-20 13:56:32 -07:00
Claude	c3b6ad38fd	Merge perf/stream-segments: streaming partial transcript segments and speaker updates	2026-03-20 13:51:51 -07:00
Josh Knapp	67ed69df00	Stream transcript segments to frontend as they are transcribed Send each segment to the frontend immediately after transcription via a new pipeline.segment IPC message, then send speaker assignments as a batch pipeline.speaker_update message after diarization completes. This lets the UI display segments progressively instead of waiting for the entire pipeline to finish. Changes: - Add partial_segment_message and speaker_update_message IPC factories - Add on_segment callback parameter to TranscribeService.transcribe() - Emit partial segments and speaker updates from PipelineService.run() - Add send_and_receive_with_progress to SidecarManager (Rust) - Route pipeline.segment/speaker_update events in run_pipeline command - Listen for streaming events in Svelte frontend (+page.svelte) - Add tests for new message types, callback signature, and update logic Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 13:47:57 -07:00
Josh Knapp	a3612c986d	Add Test & Download button for diarization model, clickable links - Add diarize.download IPC handler that downloads the pyannote model and returns user-friendly error messages (missing license, bad token) - Add download_diarize_model Tauri command - Add "Test & Download Model" button in Speakers settings tab - Update instructions to list both required model licenses (speaker-diarization-3.1 AND segmentation-3.0) - Make all HuggingFace URLs clickable (opens in system browser) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 18:21:42 -08:00
Josh Knapp	baf820286f	Add HuggingFace token setting for speaker detection - Add "Speakers" tab in Settings with HF token input field - Include step-by-step instructions for obtaining the token - Pass hf_token from settings through Rust → Python pipeline → diarize - Token can also be set via HF_TOKEN environment variable as fallback - Move skip_diarization checkbox to Speakers tab Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 18:08:51 -08:00
Josh Knapp	669d88f143	Fix progress feedback, diarization fallback, and dropdown readability - Stream pipeline progress to frontend via Tauri events so the progress overlay updates in real time during transcription/diarization - Gracefully fall back to transcription-only when diarization fails (e.g. pyannote not installed) instead of erroring the whole pipeline - Add color-scheme: dark to fix native select/option elements rendering with unreadable white backgrounds Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 17:14:25 -08:00
Josh Knapp	d00281f0c7	Fix critical integration issues for end-to-end functionality - Rewrite SidecarManager as singleton with OnceLock, reusing one Python process across all commands instead of spawning per call - Separate stdin/stdout ownership with dedicated BufReader to prevent data corruption between wait_for_ready and send_and_receive - Add ensure_running() for auto-start on first command - Fix asset protocol URL: use convertFileSrc() instead of manual encodeURIComponent which broke file paths with slashes - Add +layout.svelte with global dark theme, CSS reset, and custom scrollbar styling to prevent white flash on startup - Register AppState with Tauri .manage(), initialize SQLite database on app startup at ~/.voicetonotes/voice_to_notes.db - Wire project commands (create/get/list) to real database queries instead of placeholder stubs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 16:50:14 -08:00
Josh Knapp	97a1a15755	Phase 6: Llama-server manager, settings UI, packaging, and polish - Implement LlamaManager in Rust for llama-server lifecycle: spawn with port allocation, health check, clean shutdown on Drop, model listing - Add llama_start/stop/status/list_models Tauri commands - Add load_settings/save_settings commands with JSON persistence - Build SettingsModal with tabs for Transcription, AI Provider, Local AI settings (model size, device, language, API keys, provider selection) - Wire settings into pipeline calls (model, device, language, skip diarization) - Configure Tauri packaging: asset protocol for local audio files, CSP policy, bundle metadata, Linux .deb/.AppImage and Windows .msi config - Add keyboard shortcuts: Space (play/pause), Ctrl+O (import), Ctrl+, (settings), Escape (close menus/modals) - Close export dropdown on outside click - Tests: 30 Python, 6 Rust, 0 Svelte errors Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 16:38:23 -08:00
Josh Knapp	d67625cd5a	Phase 5: AI provider system with local and cloud support - Implement AIProvider base interface with chat() and is_available() - Add LocalProvider connecting to bundled llama-server via OpenAI SDK - Add OpenAIProvider for direct OpenAI API access - Add AnthropicProvider for Anthropic Claude API - Add LiteLLMProvider for multi-provider gateway - Build AIProviderService with provider routing, auto-selection, and transcript context injection - Add ai.chat IPC handler supporting chat, list_providers, set_provider, and configure actions - Add ai_chat, ai_list_providers, ai_configure Tauri commands - Build interactive AIChatPanel with message history, quick actions (Summarize, Action Items), and transcript context awareness - Tests: 30 Python, 6 Rust, 0 Svelte errors Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 16:25:10 -08:00
Josh Knapp	415a648a2b	Phase 4: Export to SRT, WebVTT, ASS, plain text, and Markdown - Implement ExportService using pysubs2 for caption formats (SRT, VTT, ASS) and custom formatters for plain text and Markdown - SRT exports with [Speaker]: prefix, WebVTT with <v Speaker> voice tags, ASS with color-coded speaker styles - Plain text groups by speaker with labels, Markdown adds timestamps - Add export.start IPC handler and export_transcript Tauri command - Add export dropdown menu in header (appears after transcription) - Uses native save dialog for output file selection - Add pysubs2 dependency - Tests: 30 Python (6 export tests), 6 Rust, 0 Svelte errors Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 16:18:54 -08:00
Josh Knapp	44480906a4	Phase 3: Speaker diarization and full transcription pipeline - Implement DiarizeService with pyannote.audio speaker detection - Build PipelineService combining transcribe → diarize → merge with overlap-based speaker assignment per segment - Add pipeline.start and diarize.start IPC handlers - Add run_pipeline Tauri command for full pipeline execution - Wire frontend to use pipeline: speakers auto-created with colors, segments assigned to detected speakers - Build SpeakerManager with rename support (double-click or edit button) - Add speaker color coding throughout transcript display - Add pyannote.audio dependency - Tests: 24 Python (including merge logic), 6 Rust, 0 Svelte errors Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 16:09:48 -08:00
Josh Knapp	48fe41b064	Phase 2: Core transcription pipeline and audio playback - Implement faster-whisper TranscribeService with word-level timestamps, progress reporting, and hardware auto-detection - Wire up Rust SidecarManager for Python process lifecycle (spawn, IPC, shutdown) - Add transcribe_file Tauri command bridging frontend to Python sidecar - Integrate wavesurfer.js WaveformPlayer with play/pause, skip, seek controls - Build TranscriptEditor with word-level click-to-seek and active highlighting - Connect file import flow: prompt → asset load → transcribe → display - Add typed tauri-bridge service with TranscriptionResult interface - Add Python tests for hardware detection and transcription result formatting Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 15:53:09 -08:00
Josh Knapp	503cc6c0cf	Phase 1 foundation: Tauri shell, Python sidecar, SQLite database Tauri v2 + Svelte + TypeScript frontend: - App shell with workspace layout (waveform, transcript, speakers, AI chat) - Placeholder components for all major UI areas - Typed stores (project, transcript, playback, AI) - TypeScript interfaces matching the database schema - Tauri bridge service with typed invoke wrappers - svelte-check passes with 0 errors Rust backend: - Tauri v2 app entry point with command registration - SQLite database layer (rusqlite with bundled SQLite) - Full schema: projects, media_files, speakers, segments, words, ai_outputs, annotations (with indexes) - Model structs with serde serialization - CRUD queries for projects, speakers, segments, words - Segment text editing preserves original text - Schema versioning for future migrations - 6 tests passing - Command stubs for project, transcribe, export, AI, settings, system - App state management Python sidecar: - JSON-line IPC protocol (stdin/stdout) - Message types: IPCMessage, progress, error, ready - Handler registry with routing and error handling - Ping/pong handler for connectivity testing - Service stubs: transcribe, diarize, pipeline, AI, export - Provider stubs: local (llama-server), OpenAI, Anthropic, LiteLLM - Hardware detection stubs - 14 tests passing, ruff clean Also adds: - Testing strategy document (docs/TESTING.md) - Validation script (scripts/validate.sh) - Updated .gitignore for Svelte, Rust, Python artifacts Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 15:16:06 -08:00

24 Commits