AI provider:
- Extract configureAIProvider() from saveSettings for reuse
- Call it on app startup after sidecar is ready (was only called on Save)
- Call it after first-time sidecar download completes
- Sidecar now receives correct Ollama URL/model immediately
Video extraction:
- Hide ffmpeg console window on Windows (CREATE_NO_WINDOW flag)
- Show "Extracting audio from video..." overlay with spinner during extraction
- UI stays responsive while ffmpeg runs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Video files (MP4, MKV, etc.) are now processed with ffmpeg to extract
audio to a temp WAV file before loading into wavesurfer. This prevents
the WebView crash caused by trying to fetch multi-GB files into memory.
- New extract_audio Tauri command uses ffmpeg (sidecar-bundled or system)
- Frontend detects video extensions and extracts audio automatically
- User-friendly error with install instructions when ffmpeg is not installed
- Reverted wavesurfer MediaElement approach in favor of clean extraction
- Added FFmpeg install guide to USER_GUIDE.md
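
A minimal Python sketch of the detect-then-extract flow described above. The real command is a Tauri command, so this is only an illustration: the extension list, the function names, and the exact ffmpeg arguments are assumptions, not the project's code.

```python
import os
import subprocess

# Hypothetical extension list; the commit only names MP4 and MKV explicitly.
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".mov", ".avi", ".webm"}


def is_video(path: str) -> bool:
    """Return True when the file extension looks like a video container."""
    return os.path.splitext(path)[1].lower() in VIDEO_EXTENSIONS


def build_extract_command(src: str, dst_wav: str, ffmpeg: str = "ffmpeg") -> list:
    """Build an ffmpeg argv that drops the video stream and writes PCM WAV."""
    return [
        ffmpeg,
        "-y",                    # overwrite the temp file if it already exists
        "-i", src,
        "-vn",                   # ignore video streams
        "-acodec", "pcm_s16le",  # 16-bit PCM audio
        dst_wav,
    ]


def extract_audio(src: str, dst_wav: str) -> None:
    """Run ffmpeg and raise CalledProcessError if it fails."""
    # subprocess.CREATE_NO_WINDOW only exists on Windows, so fall back to 0.
    flags = getattr(subprocess, "CREATE_NO_WINDOW", 0)
    subprocess.run(build_extract_command(src, dst_wav), check=True, creationflags=flags)
```

Only the resulting WAV path would then be handed to the waveform loader, so the multi-GB source file never has to be fetched into memory.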
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The configure action registered the provider but never called
set_active(), so the sidecar kept using the old/default provider.
Also updated the local provider default from localhost:8080 to
localhost:11434/v1 (Ollama). Added debug logging for configure.
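
The bug pattern, sketched with a hypothetical registry (ProviderRegistry, register, and configure are illustrative names; only set_active() comes from the commit text). Registering alone leaves the previously active provider in place:

```python
class ProviderRegistry:
    """Toy registry: stores provider configs and tracks which one is active."""

    def __init__(self, default: str = "default"):
        self._providers = {default: {}}
        self._active = default

    def register(self, name: str, config: dict) -> None:
        # Registering makes a provider available but does not select it.
        self._providers[name] = dict(config)

    def set_active(self, name: str) -> None:
        if name not in self._providers:
            raise KeyError(f"unknown provider: {name}")
        self._active = name

    @property
    def active(self) -> str:
        return self._active


def configure(registry: ProviderRegistry, name: str, config: dict) -> None:
    """Register the provider and also make it the active one."""
    registry.register(name, config)
    registry.set_active(name)  # the step that was missing
```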
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Diarization: the Audio.crop patch now pads short segments with zeros to
match the expected duration. pyannote batches embeddings with vstack,
which requires uniform tensor sizes, but the last segment of a file can
be shorter than the 10s window.
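
A sketch of the padding idea, using NumPy as a stand-in for torch tensors (an assumption made to keep the example dependency-light; pad_to_length is a hypothetical helper, not the patch itself):

```python
import numpy as np


def pad_to_length(chunk: np.ndarray, expected_samples: int) -> np.ndarray:
    """Right-pad a (channels, samples) array with zeros up to expected_samples."""
    missing = expected_samples - chunk.shape[-1]
    if missing <= 0:
        return chunk  # already long enough, nothing to pad
    # Pad only the sample axis, and only at the end; the default fill is zero.
    return np.pad(chunk, ((0, 0), (0, missing)))


full = np.ones((1, 8))    # a full-length window
short = np.ones((1, 5))   # the truncated last segment of a file

# Without padding, vstack would reject the mismatched widths.
batch = np.vstack([full, pad_to_length(short, 8)])
```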
CI: Reordered sidecar workflow to check for python/ changes FIRST,
before bumping version or configuring git. All subsequent steps are
gated on has_changes. This prevents unnecessary version bumps and
build runs when only app code changes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Runs daily at 6am UTC and on manual dispatch. Separately tracks app
releases (v*) and sidecar releases (sidecar-v*), keeping the latest
5 of each and deleting older ones along with their tags.
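
The retention rule, sketched in Python. It assumes "latest" means highest version number and that tags are purely numeric after the prefix; the actual workflow may order by release date instead, and tags_to_delete is a hypothetical name.

```python
import re


def _version_key(tag: str, prefix: str) -> tuple:
    """Turn 'v0.2.10' or 'sidecar-v1.0.3' into a comparable tuple of ints."""
    return tuple(int(part) for part in tag[len(prefix):].split("."))


def tags_to_delete(tags: list, keep: int = 5) -> list:
    """Return the tags that fall outside the newest `keep` of each release line."""
    app = [t for t in tags if re.fullmatch(r"v\d+(\.\d+)*", t)]
    sidecar = [t for t in tags if re.fullmatch(r"sidecar-v\d+(\.\d+)*", t)]
    doomed = []
    # App and sidecar releases are tracked separately, each with its own quota.
    for group, prefix in ((app, "v"), (sidecar, "sidecar-v")):
        ordered = sorted(group, key=lambda t: _version_key(t, prefix), reverse=True)
        doomed.extend(ordered[keep:])
    return doomed
```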
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
base_url was being set to 'http://localhost:11434/v1' by the frontend,
then LocalProvider appended another '/v1', resulting in '/v1/v1'.
Now the provider uses base_url directly (frontend already appends /v1).
Also fixed health check to hit Ollama root instead of /health.
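
For illustration, the URL handling after this fix might look like the following. The helper names are hypothetical; the point is that base_url is used as given and the health check targets the server root.

```python
from urllib.parse import urlsplit, urlunsplit


def chat_completions_url(base_url: str) -> str:
    """Use base_url as given (it already ends in /v1); do not append /v1 again."""
    return base_url.rstrip("/") + "/chat/completions"


def health_url(base_url: str) -> str:
    """Strip the path so the health check hits the server root, not /health."""
    parts = urlsplit(base_url)
    return urlunsplit((parts.scheme, parts.netloc, "/", "", ""))
```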
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The previous patch only replaced Audio.__call__ (segmentation), but
pyannote also calls Audio.crop during speaker embedding extraction.
crop loads a time segment of audio. It is now patched to load the full
file via soundfile, then slice the tensor to the requested time range.
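
The slicing step, sketched with NumPy in place of torch and with the file load left out (in the sidecar the waveform would come from soundfile; the function name and the clamping behaviour here are assumptions):

```python
import numpy as np


def crop(waveform: np.ndarray, sample_rate: int, start_s: float, end_s: float) -> np.ndarray:
    """Slice a (channels, samples) waveform to the [start_s, end_s) time range."""
    total = waveform.shape[-1]
    # Convert seconds to sample indices and clamp to the file's bounds.
    start = max(0, int(round(start_s * sample_rate)))
    end = min(total, int(round(end_s * sample_rate)))
    return waveform[:, start:end]


wave = np.arange(32000).reshape(1, -1)  # 2 seconds of fake mono audio at 16 kHz
middle = crop(wave, 16000, 0.5, 1.0)
tail = crop(wave, 16000, 1.5, 3.0)      # request runs past the end of the file
```

Note that the tail request returns fewer samples than asked for, which is the short-segment case a later fix pads with zeros.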
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
soundfile was only a transitive dep of torchaudio but collect_all()
in PyInstaller can't bundle it if it's not installed. Adding it as
an explicit dependency ensures it's in the venv and bundled correctly.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
soundfile needs collect_all() to include the libsndfile native library.
hiddenimports alone wasn't enough, causing 'No module named soundfile'
in the frozen sidecar. This is needed for the pyannote Audio patch
that bypasses torchaudio/torchcodec.
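
A spec-file fragment showing the general shape of this change. The entry script name is hypothetical and the real spec has more arguments; collect_all is PyInstaller's hook utility and returns datas, binaries, and hidden imports for a package.

```python
# Fragment of a PyInstaller .spec file (entry script name is hypothetical)
from PyInstaller.utils.hooks import collect_all

# collect_all returns (datas, binaries, hiddenimports) for the package,
# which picks up the bundled libsndfile library alongside the Python module.
sf_datas, sf_binaries, sf_hiddenimports = collect_all("soundfile")

a = Analysis(
    ["main.py"],
    datas=sf_datas,
    binaries=sf_binaries,
    hiddenimports=sf_hiddenimports,
)
```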
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- README: Updated to reflect current architecture (decoupled app/sidecar),
Ollama as local AI, CUDA support, split CI workflows
- USER_GUIDE.md: Complete how-to including first-time setup, transcription
workflow, speaker detection setup, Ollama configuration, export formats,
keyboard shortcuts, and troubleshooting
- CONTRIBUTING.md: Dev setup, project structure, conventions, CI/CD overview
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add min-width: 0 to flex container (allows shrinking for wrap)
- Add overflow-x: hidden to prevent horizontal scroll
- Add white-space: pre-wrap to segment text
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- AI Provider: "Local (llama-server)" changed to "Ollama" with URL and
model fields (defaults to localhost:11434, llama3.2)
- Ollama connects via its OpenAI-compatible API (/v1 endpoint)
- Removed empty "Local AI" tab
- Renamed "Developer" tab to "Debug"
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
torchaudio 2.10 unconditionally delegates load() to torchcodec, ignoring
the backend parameter. Since torchcodec is excluded from PyInstaller,
this broke our pyannote Audio monkey-patch.
Fix: replace torchaudio.load() with soundfile.read() + torch.from_numpy().
soundfile handles WAV natively (audio is pre-converted to WAV), has no
torchcodec dependency, and is already a transitive dependency.
Also added soundfile to PyInstaller hiddenimports.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CSP: Add blob: to connect-src/img-src/media-src for wavesurfer.js audio
playback. Add http://tauri.localhost to default-src for devtools.
pyannote: sys.modules block didn't work — pyannote still uses AudioDecoder
unconditionally. New approach: monkey-patch Audio.__call__ in diarize.py
to use torchaudio.load() directly, bypassing the broken torchcodec path.
Patch runs once before pipeline loading.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- DevTools off by default (no more auto-open on launch)
- New "Developer" tab in Settings with a checkbox to toggle devtools
- Toggle takes effect immediately (opens/closes inspector)
- Setting persists: devtools restored on next launch if enabled
- toggle_devtools Tauri command wraps window.open/close_devtools
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CSP: Add connect-src for ipc.localhost and asset.localhost so Tauri IPC
commands and local file loading (waveform, audio playback) work.
pyannote: Block torchcodec in sys.modules at startup so pyannote.audio
falls back to torchaudio for audio decoding. pyannote has a bug where
it uses AudioDecoder unconditionally even when torchcodec import fails.
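
How the sys.modules block works in general: setting an entry to None makes any later import of that name raise ImportError, whether or not the package is installed. The helper name is hypothetical, and this only shows the mechanism, not how pyannote reacts to it.

```python
import sys


def block_module(name: str) -> None:
    """Make subsequent `import name` statements fail with ImportError."""
    sys.modules[name] = None


# Must run before anything tries to import the blocked package.
block_module("torchcodec")

try:
    import torchcodec  # noqa: F401
    TORCHCODEC_AVAILABLE = True
except ImportError:
    TORCHCODEC_AVAILABLE = False
```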
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Enable Tauri devtools feature so right-click Inspect works in release
- Open devtools automatically on launch for debugging
- Add log_frontend command: frontend can write to ~/.voicetonotes/frontend.log
- Sidecar logs go to %LOCALAPPDATA%/com.voicetonotes.app/sidecar.log
- Frontend logs go to %USERPROFILE%/.voicetonotes/frontend.log
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
torchcodec is partially bundled but non-functional (missing FFmpeg DLLs),
causing pyannote.audio to try AudioDecoder which fails with NameError.
Excluding it forces pyannote to fall back to torchaudio for audio loading.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add git pull --rebase before push in both version bump workflows to
handle concurrent pushes from parallel workflows
- Add explicit python/ change detection in sidecar workflow (Gitea may
not support the paths filter); skip all jobs if there are no python/ changes
- Gate all sidecar build jobs on has_changes output
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sidecar now has its own version (1.0.0) and release lifecycle:
- Sidecar tags: sidecar-v1.0.0, sidecar-v1.0.1, etc.
- App tags: v0.2.x (unchanged)
- Sidecar workflow triggers only on python/** changes or manual dispatch
- App release no longer bumps python/pyproject.toml
Sidecar version tracked via sidecar-version.txt in app data dir:
- resolve_sidecar_path() reads version from file instead of CARGO_PKG_VERSION
- download_sidecar() fetches latest sidecar-v* release from Gitea API
- check_sidecar_update() compares local vs remote sidecar versions
- Version file written after successful download
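
The version comparison, sketched in Python rather than the app's Rust. It assumes sidecar-version.txt holds a bare dotted version such as 1.0.0 and that a missing file means no sidecar is installed; both function names are hypothetical.

```python
from typing import Optional

TAG_PREFIX = "sidecar-v"


def parse_sidecar_tag(tag: str) -> tuple:
    """Turn 'sidecar-v1.0.3' into (1, 0, 3)."""
    if not tag.startswith(TAG_PREFIX):
        raise ValueError(f"not a sidecar tag: {tag}")
    return tuple(int(part) for part in tag[len(TAG_PREFIX):].split("."))


def update_available(local_version: Optional[str], remote_tag: str) -> bool:
    """Compare the version file's content against the latest remote tag."""
    if local_version is None:
        return True  # nothing installed yet, so any release is newer
    local = tuple(int(part) for part in local_version.strip().split("."))
    # Tuple comparison is numeric per component, so 1.0.10 > 1.0.9.
    return parse_sidecar_tag(remote_tag) > local
```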
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>