Send each segment to the frontend immediately after transcription via
a new pipeline.segment IPC message, then send speaker assignments as a
batch pipeline.speaker_update message after diarization completes. This
lets the UI display segments progressively instead of waiting for the
entire pipeline to finish.
Changes:
- Add partial_segment_message and speaker_update_message IPC factories
- Add on_segment callback parameter to TranscribeService.transcribe()
- Emit partial segments and speaker updates from PipelineService.run()
- Add send_and_receive_with_progress to SidecarManager (Rust)
- Route pipeline.segment/speaker_update events in run_pipeline command
- Listen for streaming events in Svelte frontend (+page.svelte)
- Add tests for new message types, callback signature, and update logic
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Convert non-WAV audio to 16kHz mono WAV before diarization (pyannote
v4.0.4 AudioDecoder returns None duration for FLAC, causing crash)
- Handle pyannote 4.0 DiarizeOutput return type (unwrap .speaker_diarization)
- Disable pyannote telemetry (np.isfinite(None) bug with max_speakers)
- Use huggingface_hub.login() to persist token for all sub-downloads
- Pre-download sub-models (segmentation-3.0, speaker-diarization-community-1)
- Add third required model license link in settings UI
- Improve SpeakerManager hints based on settings state
- Add word-wrap to transcript text
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add diarize.download IPC handler that downloads the pyannote model
and returns user-friendly error messages (missing license, bad token)
- Add download_diarize_model Tauri command
- Add "Test & Download Model" button in Speakers settings tab
- Update instructions to list both required model licenses
(speaker-diarization-3.1 AND segmentation-3.0)
- Make all HuggingFace URLs clickable (opens in system browser)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add "Speakers" tab in Settings with HF token input field
- Include step-by-step instructions for obtaining the token
- Pass hf_token from settings through Rust → Python pipeline → diarize
- Token can also be set via HF_TOKEN environment variable as fallback
- Move skip_diarization checkbox to Speakers tab
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace progress bar with task checklist showing pipeline steps
(load model, transcribe, load diarization, identify speakers, merge)
- Fix WaveformPlayer: track ready state, disable controls until loaded,
play from current position instead of resetting to start
- Fix workspace height calc to prevent bottom content cutoff
- Show HF_TOKEN setup hint in SpeakerManager when no speakers detected
- Add console logging for progress events to aid debugging
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Redirect sys.stdout to stderr in Python sidecar so library print()
calls don't corrupt the JSON-line IPC stream
- Save real stdout fd for exclusive IPC use via init_ipc()
- Skip non-JSON lines in Rust reader instead of failing with parse error
- Set Tauri window background color to match dark theme (#0a0a23)
- Add inline dark background on html/body to prevent white flash
- Use Svelte tick() to ensure progress overlay renders before invoke
- Improve ProgressOverlay with spinner, better styling, z-index 9999
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Enhanced ProgressOverlay with spinner, better styling, and z-index 9999
- Import button shows "Processing..." with pulse animation while transcribing
- Fix pyannote API: use token= instead of deprecated use_auth_token=
- Read HF_TOKEN from environment for pyannote model download
- Add console logging for click-to-seek debugging
- Add color-scheme: dark for native form controls
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Stream pipeline progress to frontend via Tauri events so the progress
overlay updates in real time during transcription/diarization
- Gracefully fall back to transcription-only when diarization fails
(e.g. pyannote not installed) instead of erroring the whole pipeline
- Add color-scheme: dark to fix native select/option elements rendering
with unreadable white backgrounds
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Rewrite SidecarManager as singleton with OnceLock, reusing one Python
process across all commands instead of spawning per call
- Separate stdin/stdout ownership with dedicated BufReader to prevent
data corruption between wait_for_ready and send_and_receive
- Add ensure_running() for auto-start on first command
- Fix asset protocol URL: use convertFileSrc() instead of manual
encodeURIComponent which broke file paths with slashes
- Add +layout.svelte with global dark theme, CSS reset, and custom
scrollbar styling to prevent white flash on startup
- Register AppState with Tauri .manage(), initialize SQLite database
on app startup at ~/.voicetonotes/voice_to_notes.db
- Wire project commands (create/get/list) to real database queries
instead of placeholder stubs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Implement LlamaManager in Rust for llama-server lifecycle: spawn with
port allocation, health check, clean shutdown on Drop, model listing
- Add llama_start/stop/status/list_models Tauri commands
- Add load_settings/save_settings commands with JSON persistence
- Build SettingsModal with tabs for Transcription, AI Provider, Local AI
settings (model size, device, language, API keys, provider selection)
- Wire settings into pipeline calls (model, device, language, skip diarization)
- Configure Tauri packaging: asset protocol for local audio files,
CSP policy, bundle metadata, Linux .deb/.AppImage and Windows .msi config
- Add keyboard shortcuts: Space (play/pause), Ctrl+O (import),
Ctrl+, (settings), Escape (close menus/modals)
- Close export dropdown on outside click
- Tests: 30 Python, 6 Rust, 0 Svelte errors
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Implement ExportService using pysubs2 for caption formats (SRT, VTT, ASS)
and custom formatters for plain text and Markdown
- SRT exports with [Speaker]: prefix, WebVTT with <v Speaker> voice tags,
ASS with color-coded speaker styles
- Plain text groups by speaker with labels, Markdown adds timestamps
- Add export.start IPC handler and export_transcript Tauri command
- Add export dropdown menu in header (appears after transcription)
- Uses native save dialog for output file selection
- Add pysubs2 dependency
- Tests: 30 Python (6 export tests), 6 Rust, 0 Svelte errors
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Auto-scroll transcript to active segment during playback with smart
pause when user manually scrolls (resumes after 3s)
- Replace prompt() with native Tauri file dialog for audio/video import
with file type filters
- Add inline transcript editing via double-click with Enter to save,
Esc to cancel, preserving original text for change tracking
- Show "edited" badge on modified segments
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace Ollama dependency with bundled llama-server (llama.cpp)
so users need no separate install for local AI inference
- Rust backend manages llama-server lifecycle (spawn, port, shutdown)
- Add MIT license for open source release
- Update architecture doc, CLAUDE.md, and README accordingly
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Detailed architecture covering Tauri + Svelte frontend, Rust backend,
Python sidecar for ML (faster-whisper, pyannote.audio), IPC protocol,
SQLite schema, AI provider system, export formats, and phased
implementation plan with agent work breakdown.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Establish the voice-to-notes project with documentation covering
goals, platform targets, and planned feature set.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>