- DevTools off by default (no more auto-open on launch)
- New "Developer" tab in Settings with a checkbox to toggle devtools
- Toggle takes effect immediately (opens/closes inspector)
- Setting persists: devtools restored on next launch if enabled
- toggle_devtools Tauri command wraps window.open/close_devtools
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Major refactor: sidecar is no longer bundled in the installer. Instead,
it's downloaded on first launch with a setup screen offering CPU vs CUDA
choice. This solves the 2GB+ installer size limit and decouples app/sidecar.
Backend:
- New commands: check_sidecar, download_sidecar, check_sidecar_update
- Streaming download with progress events via reqwest
- Added reqwest + futures-util dependencies
- Removed sidecar.zip from bundle resources
- Restored NSIS target (no longer size-constrained)
CI:
- Each platform builds both CPU and CUDA sidecar variants (except macOS: CPU only)
- Sidecar zips uploaded as separate release assets
- Asset naming: sidecar-{os}-{arch}-{variant}.zip
Frontend:
- SidecarSetup.svelte: first-launch setup with CPU/CUDA radio choice,
progress bar, error/retry handling
- Update banner on launch if newer sidecar version available
- Conditional rendering: setup screen → main app flow
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When editing a segment, word timing is now intelligently redistributed:
- Spelling fixes (same word count): each word keeps its original timing
- Word splits (e.g. "gonna" → "going to"): original word's time range
is divided proportionally across the new words
- Inserted words: timing interpolated from neighboring words
- Deleted words: remaining words keep their timing, gaps collapse
This preserves click-to-seek accuracy for common edits like fixing
misheard words or splitting concatenated words.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When the edited text has the same word count as the original (e.g. fixing
"Whisper" to "wisper"), each word keeps its original start/end timestamps.
Only falls back to segment-level timing when words are added or removed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The display renders segment.words (not segment.text), so editing the text
field alone had no visible effect. Now finishEditing() rebuilds the words
array from the edited text so the change is immediately visible.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Project persistence:
- save_project_transcript command: persists segments, speakers, words to SQLite
- load_project_transcript command: loads full transcript with nested words
- delete_project command: soft-delete projects
- Auto-save after pipeline completes (named from filename)
- Project dropdown in header to switch between saved transcripts
- Projects load audio, segments, and speakers from database
AI chat improvements:
- Markdown rendering in assistant messages (headers, lists, bold, italic, code)
- Better message spacing and visual distinction (border-left accents)
- Styled markdown elements matching dark theme
- Improved empty state and quick action button sizing
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Convert non-WAV audio to 16kHz mono WAV before diarization (pyannote
v4.0.4 AudioDecoder returns None duration for FLAC, causing crash)
- Handle pyannote 4.0 DiarizeOutput return type (unwrap .speaker_diarization)
- Disable pyannote telemetry (np.isfinite(None) bug with max_speakers)
- Use huggingface_hub.login() to persist token for all sub-downloads
- Pre-download sub-models (segmentation-3.0, speaker-diarization-community-1)
- Add third required model license link in settings UI
- Improve SpeakerManager hints based on settings state
- Add word-wrap to transcript text
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add diarize.download IPC handler that downloads the pyannote model
and returns user-friendly error messages (missing license, bad token)
- Add download_diarize_model Tauri command
- Add "Test & Download Model" button in Speakers settings tab
- Update instructions to list both required model licenses
(speaker-diarization-3.1 AND segmentation-3.0)
- Make all HuggingFace URLs clickable (opens in system browser)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add "Speakers" tab in Settings with HF token input field
- Include step-by-step instructions for obtaining the token
- Pass hf_token from settings through Rust → Python pipeline → diarize
- Token can also be set via HF_TOKEN environment variable as fallback
- Move skip_diarization checkbox to Speakers tab
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace progress bar with task checklist showing pipeline steps
(load model, transcribe, load diarization, identify speakers, merge)
- Fix WaveformPlayer: track ready state, disable controls until loaded,
play from current position instead of resetting to start
- Fix workspace height calc to prevent bottom content cutoff
- Show HF_TOKEN setup hint in SpeakerManager when no speakers detected
- Add console logging for progress events to aid debugging
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Enhanced ProgressOverlay with spinner, better styling, and z-index 9999
- Import button shows "Processing..." with pulse animation while transcribing
- Fix pyannote API: use token= instead of deprecated use_auth_token=
- Read HF_TOKEN from environment for pyannote model download
- Add console logging for click-to-seek debugging
- Add color-scheme: dark for native form controls
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Implement LlamaManager in Rust for llama-server lifecycle: spawn with
port allocation, health check, clean shutdown on Drop, model listing
- Add llama_start/stop/status/list_models Tauri commands
- Add load_settings/save_settings commands with JSON persistence
- Build SettingsModal with tabs for Transcription, AI Provider, Local AI
settings (model size, device, language, API keys, provider selection)
- Wire settings into pipeline calls (model, device, language, skip diarization)
- Configure Tauri packaging: asset protocol for local audio files,
CSP policy, bundle metadata, Linux .deb/.AppImage and Windows .msi config
- Add keyboard shortcuts: Space (play/pause), Ctrl+O (import),
Ctrl+, (settings), Escape (close menus/modals)
- Close export dropdown on outside click
- Tests: 30 Python, 6 Rust, 0 Svelte errors
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Auto-scroll transcript to active segment during playback with smart
pause when user manually scrolls (resumes after 3s)
- Replace prompt() with native Tauri file dialog for audio/video import
with file type filters
- Add inline transcript editing via double-click with Enter to save,
Esc to cancel, preserving original text for change tracking
- Show "edited" badge on modified segments
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>