Files >1 hour are split into 5-minute chunks. Previously each chunk
showed "Starting transcription..." making it look like a restart.
Now shows "Chunk 3/12: Starting transcription..." and
"Chunk 3/12: Transcribing segment 5 (42% of audio)..."
Also skips the "Loading model..." message for chunks after the first
since the model is already loaded.
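The chunked-progress labeling can be sketched as below, assuming the 5-minute chunk size and 1-hour threshold stated above (function names are illustrative, not the app's real ones):

```python
import math

CHUNK_SECONDS = 5 * 60          # 5-minute chunks, per the note above
SPLIT_THRESHOLD_SECONDS = 3600  # only files longer than 1 hour are split

def total_chunks(duration_s: float) -> int:
    """Number of chunks a file of the given duration is split into."""
    if duration_s <= SPLIT_THRESHOLD_SECONDS:
        return 1
    return math.ceil(duration_s / CHUNK_SECONDS)

def chunk_message(index: int, total: int, message: str) -> str:
    """Prefix progress text so chunk N doesn't look like a restart."""
    return f"Chunk {index}/{total}: {message}"
```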
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
AIChatPanel had its own hardcoded configMap with the old llama-server
URL (localhost:8080) and field names (local_model_path). Every chat
message reconfigured the provider with these wrong values, overriding
the correct settings applied at startup.
Fix: replace the duplicate with a call to the shared configureAIProvider().
Also strip trailing slashes from ollama_url before appending /v1 to
prevent double-slash URLs (http://localhost:11434//v1).
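The slash fix amounts to normalizing before joining; a minimal Python sketch of the idea (the shipped code lives in the frontend, so the name here is illustrative):

```python
def ollama_v1_url(ollama_url: str) -> str:
    """Append the OpenAI-compatible /v1 path without doubling slashes."""
    return ollama_url.rstrip("/") + "/v1"
```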
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Cache loaded audio in _sf_load() — previously the entire WAV file was
re-read from disk for every 10s crop call. For a 3-hour file with
1000+ chunks, this meant ~345GB of disk reads. Now read once, cached.
- Better progress messages for long files: show elapsed time in m:ss
format, warn "(180min audio, this may take a while)" for files >10min
- Increased progress poll interval from 2s to 5s (less noise)
- Better time estimate: use 0.8x audio duration (was 0.5x)
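The caching and the m:ss formatting can be sketched as follows; `reader` stands in for `soundfile.read` so the sketch stays dependency-free, and both names are illustrative:

```python
_audio_cache: dict = {}

def load_audio_cached(path: str, reader):
    """Read an audio file at most once; later crops reuse the cached data.

    `reader` stands in for soundfile.read, returning (data, samplerate).
    """
    if path not in _audio_cache:
        _audio_cache[path] = reader(path)
    return _audio_cache[path]

def format_elapsed(seconds: int) -> str:
    """Render elapsed time as m:ss, e.g. 125 -> '2:05'."""
    return f"{seconds // 60}:{seconds % 60:02d}"
```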
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Cancel button on the progress overlay during transcription
- Clicking Cancel shows confirmation: "Processing is incomplete. If you
cancel now, the transcription will need to be started over."
- "Continue Processing" dismisses the dialog, "Cancel Processing" stops
- Cancel clears partial results (segments, speakers) and resets UI
- Pipeline results are discarded if cancelled during processing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
AI provider:
- Extract configureAIProvider() from saveSettings for reuse
- Call it on app startup after sidecar is ready (was only called on Save)
- Call it after first-time sidecar download completes
- Sidecar now receives correct Ollama URL/model immediately
Video extraction:
- Hide ffmpeg console window on Windows (CREATE_NO_WINDOW flag)
- Show "Extracting audio from video..." overlay with spinner during extraction
- UI stays responsive while ffmpeg runs
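The Windows flag in question is `CREATE_NO_WINDOW`; a Python sketch of the equivalent (the shipped code sets the flag from the Rust/Tauri side, so this is only illustrative):

```python
import subprocess
import sys

def hidden_window_kwargs() -> dict:
    """Extra run/Popen kwargs that suppress the console window on Windows."""
    if sys.platform == "win32":
        return {"creationflags": subprocess.CREATE_NO_WINDOW}
    return {}  # no console window to hide on other platforms

def run_ffmpeg(args: list) -> subprocess.CompletedProcess:
    """Run ffmpeg without flashing a console window."""
    return subprocess.run(["ffmpeg", *args], capture_output=True,
                          **hidden_window_kwargs())
```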
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Video files (MP4, MKV, etc.) are now processed with ffmpeg to extract
audio to a temp WAV file before loading into wavesurfer. This prevents
the WebView crash caused by trying to fetch multi-GB files into memory.
- New extract_audio Tauri command uses ffmpeg (sidecar-bundled or system)
- Frontend detects video extensions and extracts audio automatically
- User-friendly error if ffmpeg is not installed with install instructions
- Reverted the wavesurfer MediaElement approach in favor of clean extraction
- Added FFmpeg install guide to USER_GUIDE.md
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The configure action registered the provider but never called
set_active(), so the sidecar kept using the old/default provider.
Also updated the local provider default from localhost:8080 to
localhost:11434/v1 (Ollama). Added debug logging for configure.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Diarization: Audio.crop patch now pads short segments with zeros to
match the expected duration. pyannote batches embeddings with vstack
which requires uniform tensor sizes — the last segment of a file can
be shorter than the 10s window.
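A sketch of the padding, assuming (channels, samples) arrays; numpy stands in for the torch tensor here, and the function name is illustrative:

```python
import numpy as np

def pad_to_samples(chunk: np.ndarray, num_samples: int) -> np.ndarray:
    """Zero-pad a (channels, samples) crop up to a fixed sample count.

    Embedding windows are later stacked with vstack, which needs uniform
    shapes; only the file's final window can come up short.
    """
    missing = num_samples - chunk.shape[-1]
    if missing > 0:
        chunk = np.pad(chunk, ((0, 0), (0, missing)))  # zeros on the right
    return chunk
```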
CI: Reordered sidecar workflow to check for python/ changes FIRST,
before bumping version or configuring git. All subsequent steps are
gated on has_changes. This prevents unnecessary version bumps and
build runs when only app code changes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Runs daily at 6am UTC and on manual dispatch. Separately tracks app
releases (v*) and sidecar releases (sidecar-v*), keeping the latest
5 of each and deleting older ones along with their tags.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
base_url was being set to 'http://localhost:11434/v1' by the frontend,
then LocalProvider appended another '/v1', resulting in '/v1/v1'.
Now the provider uses base_url directly (frontend already appends /v1).
Also fixed health check to hit Ollama root instead of /health.
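A sketch of the corrected health check, hedged: Ollama answers with a plain 200 at its server root and has no /health route, and the helper names below are illustrative:

```python
import urllib.request

def ollama_root(base_url: str) -> str:
    """Server root for health checks: strip any trailing slash and /v1."""
    return base_url.rstrip("/").removesuffix("/v1")

def ollama_is_up(base_url: str, timeout: float = 2.0) -> bool:
    """True if the Ollama server root answers with HTTP 200."""
    try:
        with urllib.request.urlopen(ollama_root(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # connection refused, timeout, DNS failure, ...
        return False
```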
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The previous patch only replaced Audio.__call__ (segmentation), but
pyannote also calls Audio.crop during speaker embedding extraction.
crop loads a time segment of audio — patched to load full file via
soundfile then slice the tensor to the requested time range.
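Conceptually the patched crop is a sample-index slice of the already-loaded waveform; a minimal sketch with numpy standing in for the torch tensor:

```python
import numpy as np

def crop_waveform(data: np.ndarray, sample_rate: int,
                  start_s: float, end_s: float) -> np.ndarray:
    """Slice a (channels, samples) waveform to [start_s, end_s).

    The full file is loaded once (via soundfile in the real patch);
    each crop is then a cheap in-memory slice rather than a disk read.
    """
    lo = int(round(start_s * sample_rate))
    hi = int(round(end_s * sample_rate))
    return data[..., lo:hi]
```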
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
soundfile was only a transitive dep of torchaudio, so its presence in
the build venv wasn't guaranteed, and PyInstaller's collect_all() can't
bundle a package that isn't installed. Declaring it as an explicit
dependency ensures it's in the venv and bundled correctly.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
soundfile needs collect_all() to include libsndfile native library —
hiddenimports alone wasn't enough, causing 'No module named soundfile'
in the frozen sidecar. This is needed for the pyannote Audio patch
that bypasses torchaudio/torchcodec.
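In spec terms the fix looks roughly like this; the spec filename, entry point, and Analysis arguments shown are placeholders for the project's real ones:

```python
# build.spec fragment (illustrative): collect_all pulls in soundfile's
# Python modules, data files, AND the libsndfile native binary, which
# hiddenimports alone would miss.
from PyInstaller.utils.hooks import collect_all

sf_datas, sf_binaries, sf_hidden = collect_all("soundfile")

a = Analysis(
    ["main.py"],              # placeholder entry point
    datas=sf_datas,
    binaries=sf_binaries,
    hiddenimports=sf_hidden,
)
```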
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- README: Updated to reflect current architecture (decoupled app/sidecar),
Ollama as local AI, CUDA support, split CI workflows
- USER_GUIDE.md: Complete how-to including first-time setup, transcription
workflow, speaker detection setup, Ollama configuration, export formats,
keyboard shortcuts, and troubleshooting
- CONTRIBUTING.md: Dev setup, project structure, conventions, CI/CD overview
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add min-width: 0 to flex container (allows shrinking for wrap)
- Add overflow-x: hidden to prevent horizontal scroll
- Add white-space: pre-wrap to segment text
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- AI Provider: "Local (llama-server)" changed to "Ollama" with URL and
model fields (defaults to localhost:11434, llama3.2)
- Ollama connects via its OpenAI-compatible API (/v1 endpoint)
- Removed empty "Local AI" tab
- Renamed "Developer" tab to "Debug"
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
torchaudio 2.10 unconditionally delegates load() to torchcodec, ignoring
the backend parameter. Since torchcodec is excluded from PyInstaller,
this broke our pyannote Audio monkey-patch.
Fix: replace torchaudio.load() with soundfile.read() + torch.from_numpy().
soundfile handles WAV natively (audio is pre-converted to WAV), has no
torchcodec dependency, and is already a transitive dependency.
Also added soundfile to PyInstaller hiddenimports.
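The replacement load differs from torchaudio.load mainly in array layout; a dependency-free sketch of the conversion (numpy stands in for torch.from_numpy, and the function name is illustrative):

```python
import numpy as np

def sf_to_torch_layout(data: np.ndarray, samplerate: int):
    """Reshape soundfile.read output to torchaudio.load's convention.

    soundfile.read returns (frames,) or (frames, channels) arrays;
    torchaudio.load returns a (channels, frames) float32 tensor. The
    real patch wraps the result with torch.from_numpy.
    """
    if data.ndim == 1:
        data = data[:, np.newaxis]  # mono -> (frames, 1)
    waveform = np.ascontiguousarray(data.T).astype(np.float32)
    return waveform, samplerate
```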
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CSP: Add blob: to connect-src/img-src/media-src for wavesurfer.js audio
playback. Add http://tauri.localhost to default-src for devtools.
pyannote: sys.modules block didn't work — pyannote still uses AudioDecoder
unconditionally. New approach: monkey-patch Audio.__call__ in diarize.py
to use torchaudio.load() directly, bypassing the broken torchcodec path.
Patch runs once before pipeline loading.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- DevTools off by default (no more auto-open on launch)
- New "Developer" tab in Settings with a checkbox to toggle devtools
- Toggle takes effect immediately (opens/closes inspector)
- Setting persists: devtools restored on next launch if enabled
- toggle_devtools Tauri command wraps window.open/close_devtools
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CSP: Add connect-src for ipc.localhost and asset.localhost so Tauri IPC
commands and local file loading (waveform, audio playback) work.
pyannote: Block torchcodec in sys.modules at startup so pyannote.audio
falls back to torchaudio for audio decoding. pyannote has a bug where
it uses AudioDecoder unconditionally even when torchcodec import fails.
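The blocking trick relies on Python's import machinery: a None entry in sys.modules makes any later import of that name raise ModuleNotFoundError (a subclass of ImportError), which is what should trigger the torchaudio fallback:

```python
import sys

# A None entry in sys.modules makes `import torchcodec` fail
# immediately with ModuleNotFoundError.
sys.modules["torchcodec"] = None

try:
    import torchcodec  # noqa: F401
    torchcodec_blocked = False
except ImportError:  # ModuleNotFoundError subclasses ImportError
    torchcodec_blocked = True
```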
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>