98 Commits
v0.2.5 ... main

Author SHA1 Message Date
Gitea Actions
81407b51ee chore: bump version to 0.2.46 [skip ci] 2026-03-24 02:04:26 +00:00
Claude
a3a45cb308 Gate set_executable_permissions call with #[cfg(unix)]
All checks were successful
Release / Bump version and tag (push) Successful in 3s
Release / Build App (macOS) (push) Successful in 1m18s
Release / Build App (Windows) (push) Successful in 3m11s
Release / Build App (Linux) (push) Successful in 3m31s
The method is gated with #[cfg(unix)], but the call site wasn't, causing a
compile error on Windows.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 19:04:19 -07:00
Gitea Actions
e0e1638327 chore: bump version to 0.2.45 [skip ci] 2026-03-23 20:53:55 +00:00
Claude
c4fffad027 Fix permissions on demand instead of every launch
Some checks failed
Release / Bump version and tag (push) Successful in 3s
Release / Build App (macOS) (push) Successful in 1m17s
Release / Build App (Windows) (push) Failing after 1m56s
Release / Build App (Linux) (push) Successful in 3m39s
Instead of chmod on every app start, catch EACCES (error 13) when
spawning sidecar or ffmpeg, fix permissions, then retry once:
- sidecar spawn: catches permission denied, runs set_executable_permissions
  on the sidecar dir, retries spawn
- ffmpeg: catches permission denied, chmod +x ffmpeg and ffprobe, retries

Zero overhead on normal launches. Only fixes permissions when actually needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 13:53:47 -07:00
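The catch-EACCES-then-retry idea above can be sketched in Python (the app's real implementation is in Rust; the helper name and single-retry policy here are illustrative):

```python
import os
import stat
import subprocess

def spawn_with_permission_retry(binary_path, args=()):
    """Spawn a bundled binary; on 'permission denied', chmod +x once and retry."""
    try:
        return subprocess.Popen([binary_path, *args])
    except PermissionError:
        # EACCES (errno 13): the extracted binary lacks the execute bit.
        mode = os.stat(binary_path).st_mode
        os.chmod(binary_path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
        # Retry exactly once; a second failure propagates to the caller.
        return subprocess.Popen([binary_path, *args])
```

On the happy path nothing extra runs, which is the "zero overhead on normal launches" property the commit describes.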
Gitea Actions
618edf65ab chore: bump version to 0.2.44 [skip ci] 2026-03-23 20:45:32 +00:00
Claude
c5b8eb06c6 Fix permissions on already-extracted sidecar dirs
All checks were successful
Release / Bump version and tag (push) Successful in 3s
Release / Build App (macOS) (push) Successful in 1m20s
Release / Build App (Windows) (push) Successful in 2m59s
Release / Build App (Linux) (push) Successful in 3m35s
The chmod fix only ran after fresh extraction, but existing sidecar
dirs extracted by older versions still lacked execute permissions.
Now set_executable_permissions() runs on EVERY app launch (both the
early-return path for existing dirs and after fresh extraction).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 13:45:26 -07:00
Gitea Actions
4f44bdd037 chore: bump version to 0.2.43 [skip ci] 2026-03-23 20:30:33 +00:00
Claude
32bfbd3791 Set execute permissions on ALL files in sidecar dir on Unix
All checks were successful
Release / Bump version and tag (push) Successful in 3s
Release / Build App (macOS) (push) Successful in 1m43s
Release / Build App (Windows) (push) Successful in 3m20s
Release / Build App (Linux) (push) Successful in 3m36s
Previously only the main sidecar binary got chmod 755. Now all files
in the extraction directory get execute permissions — covers ffmpeg,
ffprobe, and any other bundled binaries. Applied in three places:
- sidecar/mod.rs: after local extraction
- commands/sidecar.rs: after download extraction
- commands/media.rs: removed single-file fix (now handled globally)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 13:30:26 -07:00
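A minimal Python sketch of "chmod 755 everything in the extraction dir" (the actual fix lives in the Rust files listed above):

```python
import os
import stat

# 0o755: owner rwx, group/other r-x
EXEC_MODE = (stat.S_IRWXU | stat.S_IRGRP | stat.S_IXGRP
             | stat.S_IROTH | stat.S_IXOTH)

def set_executable_permissions(dir_path):
    """Give every regular file under the sidecar dir execute permissions."""
    for root, _dirs, files in os.walk(dir_path):
        for name in files:
            os.chmod(os.path.join(root, name), EXEC_MODE)
```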
Gitea Actions
2bfb1b276e chore: bump version to 0.2.42 [skip ci] 2026-03-23 20:18:57 +00:00
Claude
908762073f Fix ffmpeg permission denied on Linux
All checks were successful
Release / Bump version and tag (push) Successful in 3s
Release / Build App (macOS) (push) Successful in 1m31s
Release / Build App (Windows) (push) Successful in 3m25s
Release / Build App (Linux) (push) Successful in 3m28s
The bundled ffmpeg in the sidecar extract dir lacked execute permissions.
Now sets chmod 755 on Unix when find_ffmpeg locates the bundled binary.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 13:18:51 -07:00
Gitea Actions
2011015c9a chore: bump version to 0.2.41 [skip ci] 2026-03-23 17:25:07 +00:00
Claude
fc5cfc4374 Save As: use save dialog so user can type a new project name
All checks were successful
Release / Bump version and tag (push) Successful in 4s
Release / Build App (macOS) (push) Successful in 1m20s
Release / Build App (Windows) (push) Successful in 3m5s
Release / Build App (Linux) (push) Successful in 3m44s
Changed from folder picker (can only select existing folders) to save
dialog where the user can type a new name. The typed name becomes the
project folder, created automatically if it doesn't exist. Any file
extension the user types is stripped (e.g. "My Project.vtn" becomes
the folder "My Project/").

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 10:25:00 -07:00
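The extension-stripping rule above ("My Project.vtn" becomes the folder "My Project/") amounts to taking the stem of whatever the user typed; a one-line sketch (the frontend's actual code is TypeScript):

```python
from pathlib import Path

def project_folder_name(typed_name):
    """Strip any trailing file extension; the stem becomes the folder name."""
    return Path(typed_name).stem
```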
Gitea Actions
ac0fe3b4c7 chore: bump version to 0.2.40 [skip ci] 2026-03-23 16:56:19 +00:00
Claude
e05f9afaff Add Save As, auto-migrate v1 projects to folder structure
All checks were successful
Release / Bump version and tag (push) Successful in 3s
Release / Build App (macOS) (push) Successful in 1m17s
Release / Build App (Windows) (push) Successful in 3m5s
Release / Build App (Linux) (push) Successful in 3m22s
Save behavior:
- Save on v2 project: saves in place (no dialog)
- Save on v1 project: auto-migrates to folder structure next to the
  original .vtn (creates ProjectName/ folder with .vtn + audio.wav)
- Save on unsaved project: opens folder picker (Save As)
- Save As: always opens folder picker for a new location

Added projectIsV2 state to track project format version.
Split "Save Project" button into "Save" + "Save As".
Extracted saveToFolder() helper for shared save logic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 09:56:13 -07:00
Gitea Actions
548d260061 chore: bump version to 0.2.39 [skip ci] 2026-03-23 16:51:24 +00:00
Claude
168a43e0e1 Save project: pick folder instead of file
All checks were successful
Release / Bump version and tag (push) Successful in 3s
Release / Build App (macOS) (push) Successful in 1m16s
Release / Build App (Windows) (push) Successful in 3m7s
Release / Build App (Linux) (push) Successful in 3m22s
Changed save dialog from file picker (.vtn) to folder picker. The
project name is derived from the folder name. Files are created
inside the chosen folder:
  Folder/
    Folder.vtn
    audio.wav

Also: save-in-place for already-saved projects (Ctrl+S just saves,
no dialog). Extracted buildProjectData() helper for reuse.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 09:51:14 -07:00
Gitea Actions
543decd769 chore: bump version to 0.2.38 [skip ci] 2026-03-23 16:48:36 +00:00
Claude
e05f88eecf Make ProjectFile struct support both v1 and v2 formats
Some checks failed
Release / Bump version and tag (push) Successful in 3s
Release / Build App (macOS) (push) Successful in 1m20s
Release / Build App (Linux) (push) Has been cancelled
Release / Build App (Windows) (push) Has been cancelled
audio_file, source_file, and audio_wav are all optional with serde defaults.
v1 projects have audio_file, v2 projects have source_file + audio_wav.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 09:48:29 -07:00
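The field-presence rule above implies a simple version check once a project file is parsed; a Python sketch of that logic (the real struct is Rust/serde, and this helper name is hypothetical):

```python
def project_version(data):
    """Classify a parsed .vtn dict as v1 or v2 by which fields are present."""
    if data.get("source_file") is not None or data.get("audio_wav") is not None:
        return 2          # v2: source_file + audio_wav
    if data.get("audio_file") is not None:
        return 1          # v1: single audio_file path
    raise ValueError("unrecognized project format")
```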
Gitea Actions
fee1255cac chore: bump version to 0.2.37 [skip ci] 2026-03-23 15:47:16 +00:00
Claude
2e9f2519b1 Project folders, always-extract audio, re-link support
All checks were successful
Release / Bump version and tag (push) Successful in 3s
Release / Build App (macOS) (push) Successful in 1m17s
Release / Build App (Windows) (push) Successful in 3m6s
Release / Build App (Linux) (push) Successful in 3m25s
Projects now save as folders containing .vtn + audio.wav:
  My Transcript/
    My Transcript.vtn
    audio.wav

Audio handling:
- Always extract to 22kHz mono WAV on import (all formats, not just video)
- Prevents WebAudio crash from decoding large MP3/FLAC/OGG to PCM in memory
- WAV saved alongside .vtn on project save (moved from temp)
- Sidecar still uses original file (does its own conversion)

Project format v2:
- source_file: original import path (for re-extraction)
- audio_wav: relative path to extracted WAV (portable)

Re-link on open:
- If audio.wav exists → load directly
- If missing but source exists → re-extract automatically
- If both missing → dialog to locate file via file picker
- V1 project migration: extracts WAV on first open

New Rust commands: check_file_exists, copy_file, create_dir
extract_audio: now accepts optional output_path, uses 22kHz sample rate

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 08:47:08 -07:00
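The three-way re-link-on-open decision above can be sketched as a small dispatcher (illustrative names; the real logic spans frontend and the new Rust commands):

```python
import os

def relink_action(audio_wav_path, source_file_path):
    """Decide how to restore audio when opening a project."""
    if os.path.exists(audio_wav_path):
        return "load"        # audio.wav is where the project saved it
    if os.path.exists(source_file_path):
        return "re-extract"  # original import still exists; extract again
    return "ask-user"        # both missing: open a file picker
```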
Gitea Actions
82bfcfb793 chore: bump version to 0.2.36 [skip ci] 2026-03-23 14:58:10 +00:00
Gitea Actions
73eab2e80c chore: bump sidecar version to 1.0.13 [skip ci] 2026-03-23 14:58:07 +00:00
Claude
33ca3e4a28 Show chunk context in transcription progress for large files
All checks were successful
Build Sidecars / Bump sidecar version and tag (push) Successful in 3s
Release / Bump version and tag (push) Successful in 3s
Build Sidecars / Build Sidecar (macOS) (push) Successful in 8m30s
Release / Build App (macOS) (push) Successful in 1m19s
Build Sidecars / Build Sidecar (Linux) (push) Successful in 12m9s
Release / Build App (Linux) (push) Successful in 3m36s
Build Sidecars / Build Sidecar (Windows) (push) Successful in 29m36s
Release / Build App (Windows) (push) Successful in 3m13s
Files >1 hour are split into 5-minute chunks. Previously each chunk
showed "Starting transcription..." making it look like a restart.
Now shows "Chunk 3/12: Starting transcription..." and
"Chunk 3/12: Transcribing segment 5 (42% of audio)..."

Also skips the "Loading model..." message for chunks after the first
since the model is already loaded.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 07:57:59 -07:00
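The chunk-context prefixing described above is essentially a message decorator; a sketch (the sidecar's actual progress plumbing is more involved):

```python
def chunk_progress(chunk_index, total_chunks, message):
    """Prefix a progress message with its chunk context for multi-chunk jobs.

    Single-chunk files keep the bare message, so short files look unchanged.
    """
    if total_chunks <= 1:
        return message
    return f"Chunk {chunk_index}/{total_chunks}: {message}"
```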
Gitea Actions
e65d8b0510 chore: bump version to 0.2.35 [skip ci] 2026-03-23 14:31:13 +00:00
Claude
a7364f2e50 Fix 's is not defined' in AIChatPanel
All checks were successful
Release / Bump version and tag (push) Successful in 4s
Release / Build App (macOS) (push) Successful in 1m18s
Release / Build App (Linux) (push) Successful in 3m37s
Release / Build App (Windows) (push) Successful in 3m53s
Leftover reference to removed 's' variable — changed to $settings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 07:31:07 -07:00
Gitea Actions
809acfc781 chore: bump version to 0.2.34 [skip ci] 2026-03-23 13:42:26 +00:00
Claude
96e9a6d38b Fix Ollama: remove duplicate stale configMap in AIChatPanel
All checks were successful
Release / Bump version and tag (push) Successful in 6s
Release / Build App (macOS) (push) Successful in 1m17s
Release / Build App (Linux) (push) Successful in 4m49s
Release / Build App (Windows) (push) Successful in 3m8s
AIChatPanel had its own hardcoded configMap with the old llama-server
URL (localhost:8080) and field names (local_model_path). Every chat
message reconfigured the provider with these wrong values, overriding
the correct settings applied at startup.

Fix: replace the duplicate with a call to the shared configureAIProvider().
Also strip trailing slashes from ollama_url before appending /v1 to
prevent double-slash URLs (http://localhost:11434//v1).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 06:33:03 -07:00
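The trailing-slash fix mentioned above reduces to normalizing the base URL before appending the API path; a sketch (the frontend does this in TypeScript):

```python
def ollama_api_base(ollama_url):
    """Strip trailing slashes before appending /v1, avoiding '//v1' URLs."""
    return ollama_url.rstrip("/") + "/v1"
```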
Gitea Actions
ddfbd65478 chore: bump version to 0.2.33 [skip ci] 2026-03-23 13:24:46 +00:00
Gitea Actions
e80ee3a18f chore: bump sidecar version to 1.0.12 [skip ci] 2026-03-23 13:24:34 +00:00
Claude
806586ae3d Fix diarization performance for long files + better progress
Some checks failed
Build Sidecars / Bump sidecar version and tag (push) Successful in 11s
Release / Bump version and tag (push) Successful in 10s
Build Sidecars / Build Sidecar (macOS) (push) Successful in 4m0s
Release / Build App (macOS) (push) Successful in 1m16s
Release / Build App (Linux) (push) Has been cancelled
Release / Build App (Windows) (push) Has been cancelled
Build Sidecars / Build Sidecar (Linux) (push) Successful in 17m34s
Build Sidecars / Build Sidecar (Windows) (push) Successful in 28m9s
- Cache loaded audio in _sf_load() — previously the entire WAV file was
  re-read from disk for every 10s crop call. For a 3-hour file with
  1000+ chunks, this meant ~345GB of disk reads. Now read once, cached.
- Better progress messages for long files: show elapsed time in m:ss
  format, warn "(180min audio, this may take a while)" for files >10min
- Increased progress poll interval from 2s to 5s (less noise)
- Better time estimate: use 0.8x audio duration (was 0.5x)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 06:24:21 -07:00
Gitea Actions
999bdaa671 chore: bump version to 0.2.32 [skip ci] 2026-03-23 12:38:47 +00:00
Claude
b1d46fd42e Add cancel button to processing overlay with confirmation
All checks were successful
Release / Bump version and tag (push) Successful in 3s
Release / Build App (macOS) (push) Successful in 1m21s
Release / Build App (Windows) (push) Successful in 3m8s
Release / Build App (Linux) (push) Successful in 3m40s
- Cancel button on the progress overlay during transcription
- Clicking Cancel shows confirmation: "Processing is incomplete. If you
  cancel now, the transcription will need to be started over."
- "Continue Processing" dismisses the dialog, "Cancel Processing" stops
- Cancel clears partial results (segments, speakers) and resets UI
- Pipeline results are discarded if cancelled during processing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 05:38:40 -07:00
Gitea Actions
818cbfa69c chore: bump version to 0.2.31 [skip ci] 2026-03-23 12:30:19 +00:00
Claude
aa319eb823 Fix Ollama settings on startup + video extraction UX
All checks were successful
Release / Bump version and tag (push) Successful in 3s
Release / Build App (macOS) (push) Successful in 1m18s
Release / Build App (Linux) (push) Successful in 3m44s
Release / Build App (Windows) (push) Successful in 3m57s
AI provider:
- Extract configureAIProvider() from saveSettings for reuse
- Call it on app startup after sidecar is ready (was only called on Save)
- Call it after first-time sidecar download completes
- Sidecar now receives correct Ollama URL/model immediately

Video extraction:
- Hide ffmpeg console window on Windows (CREATE_NO_WINDOW flag)
- Show "Extracting audio from video..." overlay with spinner during extraction
- UI stays responsive while ffmpeg runs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 05:30:14 -07:00
Gitea Actions
8faa336cbc chore: bump version to 0.2.30 [skip ci] 2026-03-23 03:12:25 +00:00
Claude
02c70f90c8 Extract audio from video files before loading
All checks were successful
Release / Bump version and tag (push) Successful in 3s
Release / Build App (macOS) (push) Successful in 1m17s
Release / Build App (Linux) (push) Successful in 4m53s
Release / Build App (Windows) (push) Successful in 3m45s
Video files (MP4, MKV, etc.) are now processed with ffmpeg to extract
audio to a temp WAV file before loading into wavesurfer. This prevents
the WebView crash caused by trying to fetch multi-GB files into memory.

- New extract_audio Tauri command uses ffmpeg (sidecar-bundled or system)
- Frontend detects video extensions and extracts audio automatically
- User-friendly error if ffmpeg is not installed with install instructions
- Reverted wavesurfer MediaElement approach in favor of clean extraction
- Added FFmpeg install guide to USER_GUIDE.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 20:04:10 -07:00
Gitea Actions
66db827f17 chore: bump version to 0.2.29 [skip ci] 2026-03-23 02:55:23 +00:00
Gitea Actions
d9fcc9a5bd chore: bump sidecar version to 1.0.11 [skip ci] 2026-03-23 02:55:17 +00:00
Claude
ca5dc98d24 Fix Ollama: set_active after configure + fix default URL
Some checks failed
Build Sidecars / Bump sidecar version and tag (push) Successful in 5s
Release / Bump version and tag (push) Successful in 5s
Build Sidecars / Build Sidecar (macOS) (push) Successful in 4m35s
Release / Build App (macOS) (push) Successful in 1m18s
Release / Build App (Linux) (push) Has been cancelled
Release / Build App (Windows) (push) Has been cancelled
Build Sidecars / Build Sidecar (Linux) (push) Successful in 16m56s
Build Sidecars / Build Sidecar (Windows) (push) Successful in 37m0s
The configure action registered the provider but never called
set_active(), so the sidecar kept using the old/default provider.
Also updated the local provider default from localhost:8080 to
localhost:11434/v1 (Ollama). Added debug logging for configure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 19:55:09 -07:00
Gitea Actions
da49c04119 chore: bump version to 0.2.28 [skip ci] 2026-03-23 01:30:57 +00:00
Gitea Actions
833ddb67de chore: bump sidecar version to 1.0.10 [skip ci] 2026-03-23 01:30:51 +00:00
Claude
879a1f3fd6 Fix diarization tensor mismatch + fix sidecar build triggers
All checks were successful
Build Sidecars / Bump sidecar version and tag (push) Successful in 7s
Release / Bump version and tag (push) Successful in 5s
Build Sidecars / Build Sidecar (macOS) (push) Successful in 4m32s
Release / Build App (macOS) (push) Successful in 1m16s
Build Sidecars / Build Sidecar (Linux) (push) Successful in 16m28s
Release / Build App (Linux) (push) Successful in 4m26s
Build Sidecars / Build Sidecar (Windows) (push) Successful in 33m5s
Release / Build App (Windows) (push) Successful in 3m29s
Diarization: Audio.crop patch now pads short segments with zeros to
match the expected duration. pyannote batches embeddings with vstack
which requires uniform tensor sizes — the last segment of a file can
be shorter than the 10s window.

CI: Reordered sidecar workflow to check for python/ changes FIRST,
before bumping version or configuring git. All subsequent steps are
gated on has_changes. This prevents unnecessary version bumps and
build runs when only app code changes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 18:30:43 -07:00
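The padding fix above guarantees uniform chunk sizes before batching; a stdlib-only sketch using plain lists (the real patch pads torch tensors inside Audio.crop):

```python
def pad_to_length(samples, target_len):
    """Zero-pad a short audio chunk so all chunks stack to a uniform size.

    The last segment of a file can be shorter than the 10 s window, and
    vstack-style batching requires equal lengths.
    """
    if len(samples) >= target_len:
        return samples[:target_len]
    return samples + [0.0] * (target_len - len(samples))
```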
Gitea Actions
6f9dc9a95e chore: bump version to 0.2.27 [skip ci] 2026-03-23 01:05:15 +00:00
Claude
2a9a7e42a3 Add daily workflow to clean up old releases (keep latest 5)
All checks were successful
Release / Bump version and tag (push) Successful in 4s
Release / Build App (macOS) (push) Successful in 1m25s
Release / Build App (Linux) (push) Successful in 4m43s
Release / Build App (Windows) (push) Successful in 3m20s
Runs daily at 6am UTC and on manual dispatch. Separately tracks app
releases (v*) and sidecar releases (sidecar-v*), keeping the latest
5 of each and deleting older ones along with their tags.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 18:05:08 -07:00
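The keep-latest-5-per-track pruning above can be sketched as pure tag bookkeeping (the workflow itself talks to the Gitea API; this assumes tags parse as dotted versions after the `v`):

```python
def releases_to_delete(tags, keep=5):
    """Return tags past the newest `keep` in each track (v* and sidecar-v*)."""
    def version_key(tag):
        # 'v0.2.41' -> (0, 2, 41); 'sidecar-v1.0.1' -> (1, 0, 1)
        return tuple(int(p) for p in tag.rsplit("v", 1)[1].split("."))
    doomed = []
    for prefix in ("v", "sidecar-v"):
        track = sorted((t for t in tags if t.startswith(prefix)),
                       key=version_key, reverse=True)
        doomed += track[keep:]
    return doomed
```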
Gitea Actions
34b060a8a5 chore: bump version to 0.2.26 [skip ci] 2026-03-23 00:42:00 +00:00
Gitea Actions
3dc3172c00 chore: bump sidecar version to 1.0.9 [skip ci] 2026-03-23 00:41:56 +00:00
Claude
425e3c2b7c Fix Ollama connection: remove double /v1 in URL
Some checks failed
Build Sidecars / Bump sidecar version and tag (push) Successful in 3s
Release / Bump version and tag (push) Successful in 3s
Build Sidecars / Build Sidecar (macOS) (push) Successful in 5m16s
Release / Build App (macOS) (push) Successful in 1m19s
Build Sidecars / Build Sidecar (Linux) (push) Successful in 13m55s
Release / Build App (Linux) (push) Successful in 4m1s
Release / Build App (Windows) (push) Has been cancelled
Build Sidecars / Build Sidecar (Windows) (push) Successful in 33m38s
base_url was being set to 'http://localhost:11434/v1' by the frontend,
then LocalProvider appended another '/v1', resulting in '/v1/v1'.
Now the provider uses base_url directly (frontend already appends /v1).
Also fixed health check to hit Ollama root instead of /health.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 17:41:46 -07:00
Gitea Actions
bddce2fbeb chore: bump version to 0.2.25 [skip ci] 2026-03-23 00:38:11 +00:00
Gitea Actions
a764509fc5 chore: bump sidecar version to 1.0.8 [skip ci] 2026-03-23 00:38:07 +00:00
Claude
68524cbbd6 Also patch Audio.crop to fix diarization embedding extraction
Some checks failed
Build Sidecars / Bump sidecar version and tag (push) Successful in 4s
Release / Bump version and tag (push) Successful in 3s
Build Sidecars / Build Sidecar (Windows) (push) Has started running
Build Sidecars / Build Sidecar (Linux) (push) Has been cancelled
Release / Build App (Linux) (push) Has been cancelled
Release / Build App (Windows) (push) Has been cancelled
Release / Build App (macOS) (push) Has been cancelled
Build Sidecars / Build Sidecar (macOS) (push) Has been cancelled
The previous patch only replaced Audio.__call__ (segmentation), but
pyannote also calls Audio.crop during speaker embedding extraction.
crop loads a time segment of audio — patched to load full file via
soundfile then slice the tensor to the requested time range.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 17:38:00 -07:00
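The patched crop described above (load the full file, then slice to the requested time range) boils down to converting times to sample indices; a list-based sketch of that slicing (the real patch works on soundfile output and torch tensors):

```python
def crop_samples(samples, sample_rate, start_sec, end_sec):
    """Slice an in-memory recording to a requested time window."""
    start = int(start_sec * sample_rate)
    end = int(end_sec * sample_rate)
    return samples[start:end]
```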
Gitea Actions
cf4ac014df chore: bump version to 0.2.24 [skip ci] 2026-03-22 23:25:48 +00:00
Gitea Actions
3c270d6201 chore: bump sidecar version to 1.0.7 [skip ci] 2026-03-22 23:25:43 +00:00
Claude
aa49e8b7ed Add soundfile as explicit dependency
All checks were successful
Build Sidecars / Bump sidecar version and tag (push) Successful in 4s
Release / Bump version and tag (push) Successful in 5s
Build Sidecars / Build Sidecar (macOS) (push) Successful in 3m46s
Release / Build App (macOS) (push) Successful in 1m17s
Build Sidecars / Build Sidecar (Linux) (push) Successful in 15m3s
Release / Build App (Linux) (push) Successful in 4m25s
Build Sidecars / Build Sidecar (Windows) (push) Successful in 56m51s
Release / Build App (Windows) (push) Successful in 3m18s
soundfile was only a transitive dep of torchaudio but collect_all()
in PyInstaller can't bundle it if it's not installed. Adding it as
an explicit dependency ensures it's in the venv and bundled correctly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 16:25:35 -07:00
Gitea Actions
a53da4f797 chore: bump version to 0.2.23 [skip ci] 2026-03-22 22:27:26 +00:00
Gitea Actions
212a8c874a chore: bump sidecar version to 1.0.6 [skip ci] 2026-03-22 22:27:21 +00:00
Claude
cd788026df Bundle soundfile with native libs in PyInstaller, link LICENSE in README
All checks were successful
Build Sidecars / Bump sidecar version and tag (push) Successful in 7s
Release / Bump version and tag (push) Successful in 5s
Build Sidecars / Build Sidecar (macOS) (push) Successful in 3m48s
Release / Build App (macOS) (push) Successful in 1m19s
Build Sidecars / Build Sidecar (Linux) (push) Successful in 12m2s
Release / Build App (Linux) (push) Successful in 4m40s
Build Sidecars / Build Sidecar (Windows) (push) Successful in 28m52s
Release / Build App (Windows) (push) Successful in 3m30s
soundfile needs collect_all() to include libsndfile native library —
hiddenimports alone wasn't enough, causing 'No module named soundfile'
in the frozen sidecar. This is needed for the pyannote Audio patch
that bypasses torchaudio/torchcodec.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 15:27:12 -07:00
Gitea Actions
a02a6bb441 chore: bump version to 0.2.22 [skip ci] 2026-03-22 19:06:30 +00:00
Claude
35173c54ce Update README, add User Guide and Contributing docs
All checks were successful
Release / Bump version and tag (push) Successful in 9s
Release / Build App (macOS) (push) Successful in 1m17s
Release / Build App (Linux) (push) Successful in 4m50s
Release / Build App (Windows) (push) Successful in 3m21s
- README: Updated to reflect current architecture (decoupled app/sidecar),
  Ollama as local AI, CUDA support, split CI workflows
- USER_GUIDE.md: Complete how-to including first-time setup, transcription
  workflow, speaker detection setup, Ollama configuration, export formats,
  keyboard shortcuts, and troubleshooting
- CONTRIBUTING.md: Dev setup, project structure, conventions, CI/CD overview

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 12:06:13 -07:00
Gitea Actions
f022c6dfe0 chore: bump version to 0.2.21 [skip ci] 2026-03-22 19:03:34 +00:00
Claude
b1ae49066c Fix word wrap in transcript editor
Some checks failed
Release / Bump version and tag (push) Successful in 3s
Release / Build App (macOS) (push) Successful in 1m20s
Release / Build App (Windows) (push) Has been cancelled
Release / Build App (Linux) (push) Has been cancelled
- Add min-width: 0 to flex container (allows shrinking for wrap)
- Add overflow-x: hidden to prevent horizontal scroll
- Add white-space: pre-wrap to segment text

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 11:59:15 -07:00
Claude
4a9b00111d Settings: replace llama-server with Ollama, remove Local AI tab, rename Developer to Debug
Some checks failed
Release / Bump version and tag (push) Has been cancelled
Release / Build App (Linux) (push) Has been cancelled
Release / Build App (Windows) (push) Has been cancelled
Release / Build App (macOS) (push) Has been cancelled
- AI Provider: "Local (llama-server)" changed to "Ollama" with URL and
  model fields (defaults to localhost:11434, llama3.2)
- Ollama connects via its OpenAI-compatible API (/v1 endpoint)
- Removed empty "Local AI" tab
- Renamed "Developer" tab to "Debug"

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 11:55:09 -07:00
Gitea Actions
5af27f379d chore: bump version to 0.2.20 [skip ci] 2026-03-22 18:49:49 +00:00
Gitea Actions
be8d566604 chore: bump sidecar version to 1.0.5 [skip ci] 2026-03-22 18:49:46 +00:00
Claude
f9226ee4d0 Fix diarization: use soundfile instead of torchaudio for audio loading
Some checks failed
Build Sidecars / Bump sidecar version and tag (push) Successful in 3s
Release / Bump version and tag (push) Successful in 3s
Build Sidecars / Build Sidecar (macOS) (push) Successful in 3m58s
Release / Build App (macOS) (push) Successful in 1m20s
Release / Build App (Linux) (push) Has been cancelled
Release / Build App (Windows) (push) Has been cancelled
Build Sidecars / Build Sidecar (Linux) (push) Successful in 13m41s
Build Sidecars / Build Sidecar (Windows) (push) Successful in 34m33s
torchaudio 2.10 unconditionally delegates load() to torchcodec, ignoring
the backend parameter. Since torchcodec is excluded from PyInstaller,
this broke our pyannote Audio monkey-patch.

Fix: replace torchaudio.load() with soundfile.read() + torch.from_numpy().
soundfile handles WAV natively (audio is pre-converted to WAV), has no
torchcodec dependency, and is already a transitive dependency.

Also added soundfile to PyInstaller hiddenimports.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 11:49:39 -07:00
Gitea Actions
4da40fc5fd chore: bump version to 0.2.19 [skip ci] 2026-03-22 18:00:07 +00:00
Gitea Actions
9989f65531 chore: bump sidecar version to 1.0.4 [skip ci] 2026-03-22 18:00:04 +00:00
Claude
2e7a5819bc Fix CSP for blob URLs + fix pyannote AudioDecoder with torchaudio patch
All checks were successful
Build Sidecars / Bump sidecar version and tag (push) Successful in 4s
Release / Bump version and tag (push) Successful in 3s
Build Sidecars / Build Sidecar (macOS) (push) Successful in 3m25s
Release / Build App (macOS) (push) Successful in 1m26s
Build Sidecars / Build Sidecar (Linux) (push) Successful in 14m31s
Release / Build App (Linux) (push) Successful in 3m50s
Build Sidecars / Build Sidecar (Windows) (push) Successful in 27m7s
Release / Build App (Windows) (push) Successful in 3m26s
CSP: Add blob: to connect-src/img-src/media-src for wavesurfer.js audio
playback. Add http://tauri.localhost to default-src for devtools.

pyannote: sys.modules block didn't work — pyannote still uses AudioDecoder
unconditionally. New approach: monkey-patch Audio.__call__ in diarize.py
to use torchaudio.load() directly, bypassing the broken torchcodec path.
Patch runs once before pipeline loading.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 10:59:54 -07:00
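The monkey-patch approach above replaces a method on the class before any pipeline is constructed; a self-contained illustration of the pattern with a dummy class standing in for pyannote's Audio (the real replacement loads audio with torchaudio, later soundfile):

```python
class Audio:
    """Dummy stand-in for pyannote.audio's Audio class (illustration only)."""
    def __call__(self, path):
        raise RuntimeError("broken torchcodec path")

def patched_call(self, path):
    # Replacement loader, bypassing the broken decode path.
    return f"decoded:{path}"

# Patch once, before pipeline loading; every instance created afterwards
# (and existing ones) picks up the replacement method.
Audio.__call__ = patched_call
```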
Gitea Actions
31044b9ad2 chore: bump version to 0.2.18 [skip ci] 2026-03-22 17:56:00 +00:00
Claude
7f1fa1904c Make DevTools a toggle in Settings > Developer tab
Some checks failed
Release / Bump version and tag (push) Successful in 7s
Release / Build App (macOS) (push) Successful in 1m17s
Release / Build App (Windows) (push) Successful in 3m29s
Release / Build App (Linux) (push) Has been cancelled
- DevTools off by default (no more auto-open on launch)
- New "Developer" tab in Settings with a checkbox to toggle devtools
- Toggle takes effect immediately (opens/closes inspector)
- Setting persists: devtools restored on next launch if enabled
- toggle_devtools Tauri command wraps window.open/close_devtools

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 10:55:50 -07:00
Gitea Actions
33443c8b00 chore: bump version to 0.2.17 [skip ci] 2026-03-22 16:54:37 +00:00
Gitea Actions
1104d956c9 chore: bump sidecar version to 1.0.3 [skip ci] 2026-03-22 16:54:32 +00:00
Claude
db770c341d Fix CSP blocking IPC/assets + fix pyannote AudioDecoder crash
All checks were successful
Build Sidecars / Bump sidecar version and tag (push) Successful in 9s
Release / Bump version and tag (push) Successful in 5s
Build Sidecars / Build Sidecar (macOS) (push) Successful in 3m37s
Release / Build App (macOS) (push) Successful in 1m16s
Build Sidecars / Build Sidecar (Linux) (push) Successful in 14m3s
Release / Build App (Linux) (push) Successful in 4m45s
Build Sidecars / Build Sidecar (Windows) (push) Successful in 24m32s
Release / Build App (Windows) (push) Successful in 3m12s
CSP: Add connect-src for ipc.localhost and asset.localhost so Tauri IPC
commands and local file loading (waveform, audio playback) work.

pyannote: Block torchcodec in sys.modules at startup so pyannote.audio
falls back to torchaudio for audio decoding. pyannote has a bug where
it uses AudioDecoder unconditionally even when torchcodec import fails.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 09:54:21 -07:00
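The sys.modules block above relies on a documented Python import behavior: a `None` entry makes any later import of that name fail immediately, pushing importers to their fallback. (As a later commit in this log records, pyannote ended up using AudioDecoder unconditionally anyway, so this alone wasn't sufficient.)

```python
import sys

# A None entry under a module name makes `import torchcodec` raise
# ModuleNotFoundError (a subclass of ImportError) without touching disk.
sys.modules["torchcodec"] = None
```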
Gitea Actions
4160202ad7 chore: bump version to 0.2.16 [skip ci] 2026-03-22 16:27:25 +00:00
Claude
e2c5db89b6 Enable devtools in release builds + add frontend logging
All checks were successful
Release / Bump version and tag (push) Successful in 3s
Release / Build App (macOS) (push) Successful in 1m16s
Release / Build App (Linux) (push) Successful in 4m30s
Release / Build App (Windows) (push) Successful in 3m20s
- Enable Tauri devtools feature so right-click Inspect works in release
- Open devtools automatically on launch for debugging
- Add log_frontend command: frontend can write to ~/.voicetonotes/frontend.log
- Sidecar logs go to %LOCALAPPDATA%/com.voicetonotes.app/sidecar.log
- Frontend logs go to %USERPROFILE%/.voicetonotes/frontend.log

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 09:16:29 -07:00
Gitea Actions
25e4ceaec9 chore: bump version to 0.2.15 [skip ci] 2026-03-22 16:13:22 +00:00
Gitea Actions
a5e3d47f02 chore: bump sidecar version to 1.0.2 [skip ci] 2026-03-22 16:13:16 +00:00
Claude
62675c77ae Exclude torchcodec from PyInstaller to fix diarization
Some checks failed
Build Sidecars / Bump sidecar version and tag (push) Successful in 5s
Release / Bump version and tag (push) Successful in 5s
Release / Build App (Linux) (push) Has been cancelled
Release / Build App (Windows) (push) Has been cancelled
Release / Build App (macOS) (push) Has been cancelled
Build Sidecars / Build Sidecar (macOS) (push) Successful in 4m8s
Build Sidecars / Build Sidecar (Linux) (push) Successful in 13m59s
Build Sidecars / Build Sidecar (Windows) (push) Successful in 25m43s
torchcodec is partially bundled but non-functional (missing FFmpeg DLLs),
causing pyannote.audio to try AudioDecoder which fails with NameError.
Excluding it forces pyannote to fall back to torchaudio for audio loading.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 09:13:09 -07:00
Gitea Actions
c783cde882 chore: bump version to 0.2.14 [skip ci] 2026-03-22 15:53:42 +00:00
Claude
fffd727d30 Add bytes crate dependency for reqwest stream chunks
All checks were successful
Release / Bump version and tag (push) Successful in 5s
Release / Build App (macOS) (push) Successful in 1m18s
Release / Build App (Windows) (push) Successful in 3m22s
Release / Build App (Linux) (push) Successful in 4m51s
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 08:53:36 -07:00
Gitea Actions
ded8f8a4e0 chore: bump version to 0.2.13 [skip ci] 2026-03-22 15:46:41 +00:00
Claude
0c74572d94 Fix workflow race condition and sidecar path filter
Some checks failed
Release / Bump version and tag (push) Successful in 5s
Release / Build App (macOS) (push) Failing after 55s
Release / Build App (Windows) (push) Failing after 2m14s
Release / Build App (Linux) (push) Failing after 4m12s
- Add git pull --rebase before push in both version bump workflows to
  handle concurrent pushes from parallel workflows
- Add explicit python/ change detection in sidecar workflow (Gitea may
  not support paths filter), skip all jobs if no python changes
- Gate all sidecar build jobs on has_changes output

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 08:46:34 -07:00
Gitea Actions
d33bb609d7 chore: bump sidecar version to 1.0.1 [skip ci] 2026-03-22 14:58:05 +00:00
Claude
45247ae66e Decouple sidecar versioning from app versioning
Some checks failed
Build Sidecars / Bump sidecar version and tag (push) Successful in 3s
Release / Bump version and tag (push) Failing after 3s
Release / Build App (Linux) (push) Has been skipped
Release / Build App (Windows) (push) Has been skipped
Release / Build App (macOS) (push) Has been skipped
Build Sidecars / Build Sidecar (macOS) (push) Successful in 5m28s
Build Sidecars / Build Sidecar (Linux) (push) Successful in 13m54s
Build Sidecars / Build Sidecar (Windows) (push) Successful in 37m38s
Sidecar now has its own version (1.0.0) and release lifecycle:
- Sidecar tags: sidecar-v1.0.0, sidecar-v1.0.1, etc.
- App tags: v0.2.x (unchanged)
- Sidecar workflow triggers only on python/** changes or manual dispatch
- App release no longer bumps python/pyproject.toml

Sidecar version tracked via sidecar-version.txt in app data dir:
- resolve_sidecar_path() reads version from file instead of CARGO_PKG_VERSION
- download_sidecar() fetches latest sidecar-v* release from Gitea API
- check_sidecar_update() compares local vs remote sidecar versions
- Version file written after successful download

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 07:57:51 -07:00
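The tag/version scheme above can be sketched in shell. The `sidecar-v*` tag format and version-file comparison follow the commit message; the function names are illustrative (the app does this in Rust):

```shell
# Sketch of the sidecar version scheme from the commit message.
sidecar_version_from_tag() {
  echo "${1#sidecar-v}"   # sidecar-v1.0.1 -> 1.0.1
}
# true (exit 0) when the remote version differs from the locally recorded one
sidecar_update_available() {
  [ "$1" != "$2" ]
}
```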
Claude
9652290a06 Split CI into app + sidecar workflows, fix reqwest compilation
Some checks failed
Build Sidecars / Build Sidecar (macOS) (push) Successful in 3m39s
Release / Bump version and tag (push) Has been cancelled
Release / Build App (Linux) (push) Has been cancelled
Release / Build App (Windows) (push) Has been cancelled
Release / Build App (macOS) (push) Has been cancelled
Build Sidecars / Build Sidecar (Windows) (push) Has been cancelled
Build Sidecars / Build Sidecar (Linux) (push) Has been cancelled
CI split:
- release.yml: version bump + lightweight app builds (no Python/sidecar)
- build-sidecar.yml: builds CPU + CUDA sidecar variants per platform,
  uploads as separate release assets, runs in parallel with app builds
- Sidecar workflow uses retry loop to find release (race with version bump)

Fixes:
- Add reqwest "json" feature for .json() method
- Add explicit type annotations for reqwest Response and bytes::Bytes
- Reuse client instance for download (was using reqwest::get directly)

Bundle targets: deb, rpm, nsis, msi, dmg (all formats, app is small now)
Windows upload finds both *.msi and *-setup.exe

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 07:50:55 -07:00
Gitea Actions
5947ffef66 chore: bump version to 0.2.12 [skip ci] 2026-03-22 14:09:28 +00:00
Claude
7fa903ad01 Download sidecar on first launch instead of bundling
Some checks failed
Release / Bump version and tag (push) Successful in 13s
Release / Build (macOS) (push) Failing after 4m55s
Release / Build (Windows) (push) Failing after 14m58s
Release / Build (Linux) (push) Failing after 17m18s
Major refactor: sidecar is no longer bundled in the installer. Instead,
it's downloaded on first launch with a setup screen offering CPU vs CUDA
choice. This solves the 2GB+ installer size limit and decouples app/sidecar.

Backend:
- New commands: check_sidecar, download_sidecar, check_sidecar_update
- Streaming download with progress events via reqwest
- Added reqwest + futures-util dependencies
- Removed sidecar.zip from bundle resources
- Restored NSIS target (no longer size-constrained)

CI:
- Each platform builds both CPU and CUDA sidecar variants (except macOS: CPU only)
- Sidecar zips uploaded as separate release assets
- Asset naming: sidecar-{os}-{arch}-{variant}.zip

Frontend:
- SidecarSetup.svelte: first-launch setup with CPU/CUDA radio choice,
  progress bar, error/retry handling
- Update banner on launch if newer sidecar version available
- Conditional rendering: setup screen → main app flow

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 07:09:10 -07:00
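The asset-naming scheme above (`sidecar-{os}-{arch}-{variant}.zip`) can be expressed as a small helper; the function name is illustrative:

```shell
# Builds a sidecar asset filename per the naming scheme in the commit message.
sidecar_asset_name() {
  os="$1"; arch="$2"; variant="$3"   # variant: cpu or cuda
  echo "sidecar-${os}-${arch}-${variant}.zip"
}
```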
Gitea Actions
1b706a855b chore: bump version to 0.2.11 [skip ci] 2026-03-22 13:35:22 +00:00
Claude
70b6b5f1ed Add RPM target for Linux, install 7z fallback on Windows
Some checks failed
Release / Bump version and tag (push) Successful in 3s
Release / Build (macOS) (push) Successful in 5m2s
Release / Build (Windows) (push) Failing after 8m12s
Release / Build (Linux) (push) Has been cancelled
- Add rpm to bundle targets and install rpm on Linux CI
- Upload both .deb and .rpm from Linux build
- Install 7-Zip via choco if not already available on Windows runner

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 06:35:12 -07:00
Gitea Actions
67d3325532 chore: bump version to 0.2.10 [skip ci] 2026-03-22 13:33:25 +00:00
Claude
024efccd42 Fix Windows CUDA build: replace Compress-Archive and drop NSIS
Some checks failed
Release / Bump version and tag (push) Successful in 4s
Release / Build (Windows) (push) Has been cancelled
Release / Build (macOS) (push) Has been cancelled
Release / Build (Linux) (push) Has been cancelled
- Replace Compress-Archive (2GB limit) with 7z for sidecar packaging
- Remove NSIS from bundle targets — NSIS has a 2GB per-file limit that
  breaks with CUDA-sized sidecar.zip; MSI (WiX) handles large files
  by splitting into multiple CABs
- Update Windows upload to look for .msi only

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 06:33:08 -07:00
Gitea Actions
7c814805f3 chore: bump version to 0.2.9 [skip ci] 2026-03-22 13:14:43 +00:00
Claude
5f9f1426e6 Fix CUDA build: use PyTorch CUDA index URL explicitly
Some checks failed
Release / Bump version and tag (push) Successful in 15s
Release / Build (macOS) (push) Successful in 5m11s
Release / Build (Windows) (push) Failing after 14m6s
Release / Build (Linux) (push) Has been cancelled
Default torch on PyPI is CPU-only on Windows. Must use PyTorch's own
package index (cu126) to get CUDA-enabled wheels. This also pins the
CUDA version on Linux for deterministic builds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 06:14:26 -07:00
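The index-selection logic above can be sketched as follows. The cu126 index URL is named in the commit; `https://download.pytorch.org/whl/cpu` is PyTorch's CPU-only index; the helper name is illustrative:

```shell
# Pick the pip index for a given build variant (sketch; variant names
# follow this repo's CPU/CUDA convention).
choose_torch_index() {
  case "$1" in
    cuda) echo "https://download.pytorch.org/whl/cu126" ;;
    cpu)  echo "https://download.pytorch.org/whl/cpu" ;;
    *)    echo "https://pypi.org/simple" ;;
  esac
}
# usage: uv pip install torch --index-url "$(choose_torch_index cuda)"
```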
Gitea Actions
5311b19fdc chore: bump version to 0.2.8 [skip ci] 2026-03-22 12:56:13 +00:00
Claude
8f6e1108cc Enable CUDA for Windows/Linux builds + clean up old sidecars
Some checks failed
Release / Bump version and tag (push) Successful in 16s
Release / Build (macOS) (push) Successful in 4m51s
Release / Build (Linux) (push) Failing after 15m36s
Release / Build (Windows) (push) Has been cancelled
- Windows and Linux sidecar builds now use --with-cuda for GPU acceleration
  (macOS stays CPU-only — Apple Silicon uses Metal, not CUDA)
- Windows upload switched from --data-binary to -T streaming for 2GB+ files
- Add cleanup_old_sidecars() that removes stale sidecar-* directories on
  startup, keeping only the current version
- Add NSIS uninstall hook to remove sidecar data dir on Windows uninstall
  (user data in ~/.voicetonotes is preserved)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 05:55:55 -07:00
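A shell sketch of the `cleanup_old_sidecars()` behavior described above (the app implements this in Rust; the `sidecar-<version>` directory layout follows the commit message):

```shell
# Remove stale sidecar-* directories, keeping only the current version.
cleanup_old_sidecars() {
  data_dir="$1"; current="$2"
  for d in "$data_dir"/sidecar-*; do
    [ -d "$d" ] || continue                                 # skip non-dirs (e.g. version file)
    [ "$(basename "$d")" = "sidecar-${current}" ] && continue  # keep current
    rm -rf "$d"
  done
}
```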
Gitea Actions
fcbe1afd0c chore: bump version to 0.2.7 [skip ci] 2026-03-22 12:37:00 +00:00
Claude
7efa3bb116 Fix CUDA fallback: gracefully fall back to CPU when CUDA libs missing
Some checks failed
Release / Bump version and tag (push) Successful in 18s
Release / Build (macOS) (push) Successful in 5m27s
Release / Build (Linux) (push) Successful in 11m38s
Release / Build (Windows) (push) Has been cancelled
- transcribe: catch model load failures on CUDA and retry with CPU
- hardware detect: test CUDA runtime actually works (torch.zeros on cuda)
  before recommending GPU, since CPU-only builds detect CUDA via driver
  but lack cublas/cuDNN libraries

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 05:36:40 -07:00
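The probe-then-fall-back idea above can be sketched with an injectable probe command. The real check runs `torch.zeros` on the CUDA device inside the Python sidecar; here the probe is passed in so the selection logic stands alone:

```shell
# Choose a device based on whether a probe command succeeds (sketch).
pick_device() {
  if "$@" >/dev/null 2>&1; then echo cuda; else echo cpu; fi
}
# the real probe would be along the lines of:
#   pick_device python -c "import torch; torch.zeros(1, device='cuda')"
```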
Gitea Actions
2be5024de7 chore: bump version to 0.2.6 [skip ci] 2026-03-22 03:48:06 +00:00
Claude
32e3c6d42e Remove unittest from PyInstaller excludes (needed by huggingface_hub)
All checks were successful
Release / Bump version and tag (push) Successful in 3s
Release / Build (macOS) (push) Successful in 5m46s
Release / Build (Linux) (push) Successful in 7m46s
Release / Build (Windows) (push) Successful in 16m29s
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-21 20:48:01 -07:00
37 changed files with 2641 additions and 281 deletions


@@ -0,0 +1,402 @@
name: Build Sidecars
on:
  push:
    branches: [main]
    paths: ['python/**']
  workflow_dispatch:
jobs:
  bump-sidecar-version:
    name: Bump sidecar version and tag
    if: "!contains(github.event.head_commit.message, '[skip ci]')"
    runs-on: ubuntu-latest
    outputs:
      version: ${{ steps.bump.outputs.version }}
      tag: ${{ steps.bump.outputs.tag }}
      has_changes: ${{ steps.check_changes.outputs.has_changes }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 2
      - name: Check for python changes
        id: check_changes
        run: |
          # If triggered by workflow_dispatch, always build
          if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
            echo "has_changes=true" >> $GITHUB_OUTPUT
            exit 0
          fi
          # Check if any python/ files changed in this commit
          CHANGED=$(git diff --name-only HEAD~1 HEAD -- python/ 2>/dev/null || echo "")
          if [ -n "$CHANGED" ]; then
            echo "has_changes=true" >> $GITHUB_OUTPUT
            echo "Python changes detected: $CHANGED"
          else
            echo "has_changes=false" >> $GITHUB_OUTPUT
            echo "No python/ changes detected, skipping sidecar build"
          fi
      - name: Configure git
        if: steps.check_changes.outputs.has_changes == 'true'
        run: |
          git config user.name "Gitea Actions"
          git config user.email "actions@gitea.local"
      - name: Bump sidecar patch version
        if: steps.check_changes.outputs.has_changes == 'true'
        id: bump
        run: |
          # Read current version from python/pyproject.toml
          CURRENT=$(grep '^version = ' python/pyproject.toml | head -1 | sed 's/version = "\(.*\)"/\1/')
          echo "Current sidecar version: ${CURRENT}"
          # Increment patch number
          MAJOR=$(echo "${CURRENT}" | cut -d. -f1)
          MINOR=$(echo "${CURRENT}" | cut -d. -f2)
          PATCH=$(echo "${CURRENT}" | cut -d. -f3)
          NEW_PATCH=$((PATCH + 1))
          NEW_VERSION="${MAJOR}.${MINOR}.${NEW_PATCH}"
          echo "New sidecar version: ${NEW_VERSION}"
          # Update python/pyproject.toml
          sed -i "s/^version = \"${CURRENT}\"/version = \"${NEW_VERSION}\"/" python/pyproject.toml
          echo "version=${NEW_VERSION}" >> $GITHUB_OUTPUT
          echo "tag=sidecar-v${NEW_VERSION}" >> $GITHUB_OUTPUT
      - name: Commit and tag
        if: steps.check_changes.outputs.has_changes == 'true'
        env:
          BUILD_TOKEN: ${{ secrets.BUILD_TOKEN }}
        run: |
          NEW_VERSION="${{ steps.bump.outputs.version }}"
          TAG="${{ steps.bump.outputs.tag }}"
          git add python/pyproject.toml
          git commit -m "chore: bump sidecar version to ${NEW_VERSION} [skip ci]"
          git tag "${TAG}"
          # Push using token for authentication (rebase in case another workflow pushed first)
          REMOTE_URL=$(git remote get-url origin | sed "s|://|://gitea-actions:${BUILD_TOKEN}@|")
          git pull --rebase "${REMOTE_URL}" main || true
          git push "${REMOTE_URL}" HEAD:main
          git push "${REMOTE_URL}" "${TAG}"
      - name: Create Gitea release
        if: steps.check_changes.outputs.has_changes == 'true'
        env:
          BUILD_TOKEN: ${{ secrets.BUILD_TOKEN }}
        run: |
          REPO_API="${GITHUB_SERVER_URL}/api/v1/repos/${GITHUB_REPOSITORY}"
          TAG="${{ steps.bump.outputs.tag }}"
          VERSION="${{ steps.bump.outputs.version }}"
          RELEASE_NAME="Sidecar v${VERSION}"
          curl -s -X POST \
            -H "Authorization: token ${BUILD_TOKEN}" \
            -H "Content-Type: application/json" \
            -d "{\"tag_name\": \"${TAG}\", \"name\": \"${RELEASE_NAME}\", \"body\": \"Automated sidecar build.\", \"draft\": false, \"prerelease\": false}" \
            "${REPO_API}/releases"
          echo "Created release: ${RELEASE_NAME}"
  build-sidecar-linux:
    name: Build Sidecar (Linux)
    needs: bump-sidecar-version
    if: needs.bump-sidecar-version.outputs.has_changes == 'true'
    runs-on: ubuntu-latest
    env:
      PYTHON_VERSION: "3.11"
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ needs.bump-sidecar-version.outputs.tag }}
      - name: Install uv
        run: |
          if command -v uv &> /dev/null; then
            echo "uv already installed: $(uv --version)"
          else
            curl -LsSf https://astral.sh/uv/install.sh | sh
            echo "$HOME/.local/bin" >> $GITHUB_PATH
          fi
      - name: Install ffmpeg
        run: sudo apt-get update && sudo apt-get install -y ffmpeg
      - name: Set up Python
        run: uv python install ${{ env.PYTHON_VERSION }}
      - name: Build sidecar (CUDA)
        working-directory: python
        run: uv run --python ${{ env.PYTHON_VERSION }} python build_sidecar.py --with-cuda
      - name: Package sidecar (CUDA)
        run: |
          cd python/dist/voice-to-notes-sidecar && zip -r ../../../sidecar-linux-x86_64-cuda.zip .
      - name: Build sidecar (CPU)
        working-directory: python
        run: |
          rm -rf dist/voice-to-notes-sidecar
          uv run --python ${{ env.PYTHON_VERSION }} python build_sidecar.py --cpu-only
      - name: Package sidecar (CPU)
        run: |
          cd python/dist/voice-to-notes-sidecar && zip -r ../../../sidecar-linux-x86_64-cpu.zip .
      - name: Upload to sidecar release
        env:
          BUILD_TOKEN: ${{ secrets.BUILD_TOKEN }}
        run: |
          sudo apt-get install -y jq
          REPO_API="${GITHUB_SERVER_URL}/api/v1/repos/${GITHUB_REPOSITORY}"
          TAG="${{ needs.bump-sidecar-version.outputs.tag }}"
          # Find the sidecar release by tag (retry up to 30 times with 10s delay)
          echo "Waiting for sidecar release ${TAG} to be available..."
          for i in $(seq 1 30); do
            RELEASE_JSON=$(curl -s -H "Authorization: token ${BUILD_TOKEN}" \
              "${REPO_API}/releases/tags/${TAG}")
            RELEASE_ID=$(echo "$RELEASE_JSON" | jq -r '.id // empty')
            if [ -n "${RELEASE_ID}" ] && [ "${RELEASE_ID}" != "null" ]; then
              echo "Found sidecar release: ${TAG} (ID: ${RELEASE_ID})"
              break
            fi
            echo "Attempt ${i}/30: Release not ready yet, retrying in 10s..."
            sleep 10
          done
          if [ -z "${RELEASE_ID}" ] || [ "${RELEASE_ID}" = "null" ]; then
            echo "ERROR: Failed to find sidecar release for tag ${TAG} after 30 attempts."
            exit 1
          fi
          for file in sidecar-*.zip; do
            filename=$(basename "$file")
            encoded_name=$(echo "$filename" | sed 's/ /%20/g')
            echo "Uploading ${filename} ($(du -h "$file" | cut -f1))..."
            ASSET_ID=$(curl -s -H "Authorization: token ${BUILD_TOKEN}" \
              "${REPO_API}/releases/${RELEASE_ID}/assets" | jq -r ".[] | select(.name == \"${filename}\") | .id // empty")
            if [ -n "${ASSET_ID}" ]; then
              curl -s -X DELETE -H "Authorization: token ${BUILD_TOKEN}" \
                "${REPO_API}/releases/${RELEASE_ID}/assets/${ASSET_ID}"
            fi
            HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" -X POST \
              -H "Authorization: token ${BUILD_TOKEN}" \
              -H "Content-Type: application/octet-stream" \
              -T "$file" \
              "${REPO_API}/releases/${RELEASE_ID}/assets?name=${encoded_name}")
            echo "Upload response: HTTP ${HTTP_CODE}"
          done
  build-sidecar-windows:
    name: Build Sidecar (Windows)
    needs: bump-sidecar-version
    if: needs.bump-sidecar-version.outputs.has_changes == 'true'
    runs-on: windows-latest
    env:
      PYTHON_VERSION: "3.11"
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ needs.bump-sidecar-version.outputs.tag }}
      - name: Install uv
        shell: powershell
        run: |
          if (Get-Command uv -ErrorAction SilentlyContinue) {
            Write-Host "uv already installed: $(uv --version)"
          } else {
            irm https://astral.sh/uv/install.ps1 | iex
            echo "$env:USERPROFILE\.local\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
          }
      - name: Install ffmpeg
        shell: powershell
        run: choco install ffmpeg -y
      - name: Set up Python
        shell: powershell
        run: uv python install ${{ env.PYTHON_VERSION }}
      - name: Install 7-Zip
        shell: powershell
        run: |
          if (-not (Get-Command 7z -ErrorAction SilentlyContinue)) {
            choco install 7zip -y
          }
      - name: Build sidecar (CUDA)
        shell: powershell
        working-directory: python
        run: uv run --python ${{ env.PYTHON_VERSION }} python build_sidecar.py --with-cuda
      - name: Package sidecar (CUDA)
        shell: powershell
        run: |
          7z a -tzip -mx=5 sidecar-windows-x86_64-cuda.zip .\python\dist\voice-to-notes-sidecar\*
      - name: Build sidecar (CPU)
        shell: powershell
        working-directory: python
        run: |
          Remove-Item -Recurse -Force dist\voice-to-notes-sidecar
          uv run --python ${{ env.PYTHON_VERSION }} python build_sidecar.py --cpu-only
      - name: Package sidecar (CPU)
        shell: powershell
        run: |
          7z a -tzip -mx=5 sidecar-windows-x86_64-cpu.zip .\python\dist\voice-to-notes-sidecar\*
      - name: Upload to sidecar release
        shell: powershell
        env:
          BUILD_TOKEN: ${{ secrets.BUILD_TOKEN }}
        run: |
          $REPO_API = "${{ github.server_url }}/api/v1/repos/${{ github.repository }}"
          $Headers = @{ "Authorization" = "token $env:BUILD_TOKEN" }
          $TAG = "${{ needs.bump-sidecar-version.outputs.tag }}"
          # Find the sidecar release by tag (retry up to 30 times with 10s delay)
          Write-Host "Waiting for sidecar release ${TAG} to be available..."
          $RELEASE_ID = $null
          for ($i = 1; $i -le 30; $i++) {
            try {
              $release = Invoke-RestMethod -Uri "${REPO_API}/releases/tags/${TAG}" -Headers $Headers -ErrorAction Stop
              $RELEASE_ID = $release.id
              if ($RELEASE_ID) {
                Write-Host "Found sidecar release: ${TAG} (ID: ${RELEASE_ID})"
                break
              }
            } catch {
              # Release not ready yet
            }
            Write-Host "Attempt ${i}/30: Release not ready yet, retrying in 10s..."
            Start-Sleep -Seconds 10
          }
          if (-not $RELEASE_ID) {
            Write-Host "ERROR: Failed to find sidecar release for tag ${TAG} after 30 attempts."
            exit 1
          }
          Get-ChildItem -Path . -Filter "sidecar-*.zip" | ForEach-Object {
            $filename = $_.Name
            $encodedName = [System.Uri]::EscapeDataString($filename)
            $size = [math]::Round($_.Length / 1MB, 1)
            Write-Host "Uploading ${filename} (${size} MB)..."
            try {
              $assets = Invoke-RestMethod -Uri "${REPO_API}/releases/${RELEASE_ID}/assets" -Headers $Headers
              $existing = $assets | Where-Object { $_.name -eq $filename }
              if ($existing) {
                Invoke-RestMethod -Uri "${REPO_API}/releases/${RELEASE_ID}/assets/$($existing.id)" -Method Delete -Headers $Headers
              }
            } catch {}
            $uploadUrl = "${REPO_API}/releases/${RELEASE_ID}/assets?name=${encodedName}"
            $result = curl.exe --fail --silent --show-error `
              -X POST `
              -H "Authorization: token $env:BUILD_TOKEN" `
              -H "Content-Type: application/octet-stream" `
              -T "$($_.FullName)" `
              "$uploadUrl" 2>&1
            if ($LASTEXITCODE -eq 0) {
              Write-Host "Upload successful: ${filename}"
            } else {
              Write-Host "WARNING: Upload failed for ${filename}: ${result}"
            }
          }
  build-sidecar-macos:
    name: Build Sidecar (macOS)
    needs: bump-sidecar-version
    if: needs.bump-sidecar-version.outputs.has_changes == 'true'
    runs-on: macos-latest
    env:
      PYTHON_VERSION: "3.11"
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ needs.bump-sidecar-version.outputs.tag }}
      - name: Install uv
        run: |
          if command -v uv &> /dev/null; then
            echo "uv already installed: $(uv --version)"
          else
            curl -LsSf https://astral.sh/uv/install.sh | sh
            echo "$HOME/.local/bin" >> $GITHUB_PATH
          fi
      - name: Install ffmpeg
        run: brew install ffmpeg
      - name: Set up Python
        run: uv python install ${{ env.PYTHON_VERSION }}
      - name: Build sidecar (CPU)
        working-directory: python
        run: uv run --python ${{ env.PYTHON_VERSION }} python build_sidecar.py --cpu-only
      - name: Package sidecar (CPU)
        run: |
          cd python/dist/voice-to-notes-sidecar && zip -r ../../../sidecar-macos-aarch64-cpu.zip .
      - name: Upload to sidecar release
        env:
          BUILD_TOKEN: ${{ secrets.BUILD_TOKEN }}
        run: |
          which jq || brew install jq
          REPO_API="${GITHUB_SERVER_URL}/api/v1/repos/${GITHUB_REPOSITORY}"
          TAG="${{ needs.bump-sidecar-version.outputs.tag }}"
          # Find the sidecar release by tag (retry up to 30 times with 10s delay)
          echo "Waiting for sidecar release ${TAG} to be available..."
          for i in $(seq 1 30); do
            RELEASE_JSON=$(curl -s -H "Authorization: token ${BUILD_TOKEN}" \
              "${REPO_API}/releases/tags/${TAG}")
            RELEASE_ID=$(echo "$RELEASE_JSON" | jq -r '.id // empty')
            if [ -n "${RELEASE_ID}" ] && [ "${RELEASE_ID}" != "null" ]; then
              echo "Found sidecar release: ${TAG} (ID: ${RELEASE_ID})"
              break
            fi
            echo "Attempt ${i}/30: Release not ready yet, retrying in 10s..."
            sleep 10
          done
          if [ -z "${RELEASE_ID}" ] || [ "${RELEASE_ID}" = "null" ]; then
            echo "ERROR: Failed to find sidecar release for tag ${TAG} after 30 attempts."
            exit 1
          fi
          for file in sidecar-*.zip; do
            filename=$(basename "$file")
            encoded_name=$(echo "$filename" | sed 's/ /%20/g')
            echo "Uploading ${filename} ($(du -h "$file" | cut -f1))..."
            ASSET_ID=$(curl -s -H "Authorization: token ${BUILD_TOKEN}" \
              "${REPO_API}/releases/${RELEASE_ID}/assets" | jq -r ".[] | select(.name == \"${filename}\") | .id // empty")
            if [ -n "${ASSET_ID}" ]; then
              curl -s -X DELETE -H "Authorization: token ${BUILD_TOKEN}" \
                "${REPO_API}/releases/${RELEASE_ID}/assets/${ASSET_ID}"
            fi
            HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" -X POST \
              -H "Authorization: token ${BUILD_TOKEN}" \
              -H "Content-Type: application/octet-stream" \
              -T "$file" \
              "${REPO_API}/releases/${RELEASE_ID}/assets?name=${encoded_name}")
            echo "Upload response: HTTP ${HTTP_CODE}"
          done


@@ -0,0 +1,65 @@
name: Cleanup Old Releases
on:
  # Run after release and sidecar workflows complete
  schedule:
    - cron: '0 6 * * *' # Daily at 6am UTC
  workflow_dispatch:
jobs:
  cleanup:
    name: Remove old releases
    runs-on: ubuntu-latest
    env:
      KEEP_COUNT: 5
    steps:
      - name: Cleanup old app releases
        env:
          BUILD_TOKEN: ${{ secrets.BUILD_TOKEN }}
        run: |
          REPO_API="${GITHUB_SERVER_URL}/api/v1/repos/${GITHUB_REPOSITORY}"
          # Get all releases, sorted newest first (API default)
          RELEASES=$(curl -s -H "Authorization: token ${BUILD_TOKEN}" \
            "${REPO_API}/releases?limit=50")
          # Separate app releases (v*) and sidecar releases (sidecar-v*)
          APP_IDS=$(echo "$RELEASES" | jq -r '[.[] | select(.tag_name | startswith("v") and (startswith("sidecar") | not)) | .id] | .[]')
          SIDECAR_IDS=$(echo "$RELEASES" | jq -r '[.[] | select(.tag_name | startswith("sidecar-v")) | .id] | .[]')
          # Delete app releases beyond KEEP_COUNT
          COUNT=0
          for ID in $APP_IDS; do
            COUNT=$((COUNT + 1))
            if [ $COUNT -le ${{ env.KEEP_COUNT }} ]; then
              continue
            fi
            TAG=$(echo "$RELEASES" | jq -r ".[] | select(.id == $ID) | .tag_name")
            echo "Deleting app release $ID ($TAG)..."
            curl -s -o /dev/null -w "HTTP %{http_code}\n" -X DELETE \
              -H "Authorization: token ${BUILD_TOKEN}" \
              "${REPO_API}/releases/$ID"
            # Also delete the tag
            curl -s -o /dev/null -X DELETE \
              -H "Authorization: token ${BUILD_TOKEN}" \
              "${REPO_API}/tags/$TAG"
          done
          # Delete sidecar releases beyond KEEP_COUNT
          COUNT=0
          for ID in $SIDECAR_IDS; do
            COUNT=$((COUNT + 1))
            if [ $COUNT -le ${{ env.KEEP_COUNT }} ]; then
              continue
            fi
            TAG=$(echo "$RELEASES" | jq -r ".[] | select(.id == $ID) | .tag_name")
            echo "Deleting sidecar release $ID ($TAG)..."
            curl -s -o /dev/null -w "HTTP %{http_code}\n" -X DELETE \
              -H "Authorization: token ${BUILD_TOKEN}" \
              "${REPO_API}/releases/$ID"
            curl -s -o /dev/null -X DELETE \
              -H "Authorization: token ${BUILD_TOKEN}" \
              "${REPO_API}/tags/$TAG"
          done
          echo "Cleanup complete. Kept latest ${{ env.KEEP_COUNT }} of each type."


@@ -47,9 +47,6 @@
           # Update src-tauri/Cargo.toml (match version = "x.y.z" in [package] section)
           sed -i "s/^version = \"${CURRENT}\"/version = \"${NEW_VERSION}\"/" src-tauri/Cargo.toml
-          # Update python/pyproject.toml
-          sed -i "s/^version = \".*\"/version = \"${NEW_VERSION}\"/" python/pyproject.toml
           echo "new_version=${NEW_VERSION}" >> $GITHUB_OUTPUT
           echo "tag=v${NEW_VERSION}" >> $GITHUB_OUTPUT
@@ -58,12 +55,13 @@
           BUILD_TOKEN: ${{ secrets.BUILD_TOKEN }}
         run: |
           NEW_VERSION="${{ steps.bump.outputs.new_version }}"
-          git add package.json src-tauri/tauri.conf.json src-tauri/Cargo.toml python/pyproject.toml
+          git add package.json src-tauri/tauri.conf.json src-tauri/Cargo.toml
           git commit -m "chore: bump version to ${NEW_VERSION} [skip ci]"
           git tag "v${NEW_VERSION}"
-          # Push using token for authentication
+          # Push using token for authentication (rebase in case another workflow pushed first)
           REMOTE_URL=$(git remote get-url origin | sed "s|://|://gitea-actions:${BUILD_TOKEN}@|")
+          git pull --rebase "${REMOTE_URL}" main || true
           git push "${REMOTE_URL}" HEAD:main
           git push "${REMOTE_URL}" "v${NEW_VERSION}"
@@ -85,42 +83,16 @@
   # ── Platform builds (run after version bump) ──
   build-linux:
-    name: Build (Linux)
+    name: Build App (Linux)
     needs: bump-version
     runs-on: ubuntu-latest
     env:
-      PYTHON_VERSION: "3.11"
       NODE_VERSION: "20"
     steps:
       - uses: actions/checkout@v4
         with:
           ref: ${{ needs.bump-version.outputs.tag }}
-      # ── Python sidecar ──
-      - name: Install uv
-        run: |
-          if command -v uv &> /dev/null; then
-            echo "uv already installed: $(uv --version)"
-          else
-            curl -LsSf https://astral.sh/uv/install.sh | sh
-            echo "$HOME/.local/bin" >> $GITHUB_PATH
-          fi
-      - name: Install ffmpeg
-        run: sudo apt-get update && sudo apt-get install -y ffmpeg
-      - name: Set up Python
-        run: uv python install ${{ env.PYTHON_VERSION }}
-      - name: Build sidecar
-        working-directory: python
-        run: uv run --python ${{ env.PYTHON_VERSION }} python build_sidecar.py --cpu-only
-      - name: Package sidecar for Tauri
-        run: |
-          cd python/dist/voice-to-notes-sidecar && zip -r ../../../src-tauri/sidecar.zip .
-      # ── Tauri app ──
       - name: Set up Node.js
         uses: actions/setup-node@v4
         with:
@@ -133,7 +105,8 @@
       - name: Install system dependencies
         run: |
-          sudo apt-get install -y libgtk-3-dev libwebkit2gtk-4.1-dev libappindicator3-dev librsvg2-dev patchelf xdg-utils
+          sudo apt-get update
+          sudo apt-get install -y libgtk-3-dev libwebkit2gtk-4.1-dev libappindicator3-dev librsvg2-dev patchelf xdg-utils rpm
       - name: Install npm dependencies
         run: npm ci
@@ -141,7 +114,6 @@
       - name: Build Tauri app
         run: npm run tauri build
-      # ── Release ──
       - name: Upload to release
         env:
           BUILD_TOKEN: ${{ secrets.BUILD_TOKEN }}
@@ -150,7 +122,6 @@
           REPO_API="${GITHUB_SERVER_URL}/api/v1/repos/${GITHUB_REPOSITORY}"
           TAG="${{ needs.bump-version.outputs.tag }}"
-          RELEASE_NAME="Voice to Notes ${TAG}"
           echo "Release tag: ${TAG}"
           RELEASE_ID=$(curl -s -H "Authorization: token ${BUILD_TOKEN}" \
@@ -163,7 +134,7 @@
           echo "Release ID: ${RELEASE_ID}"
-          find src-tauri/target/release/bundle -type f -name "*.deb" | while IFS= read -r file; do
+          find src-tauri/target/release/bundle -type f \( -name "*.deb" -o -name "*.rpm" \) | while IFS= read -r file; do
             filename=$(basename "$file")
             encoded_name=$(echo "$filename" | sed 's/ /%20/g')
             echo "Uploading ${filename} ($(du -h "$file" | cut -f1))..."
@@ -184,47 +155,16 @@
           done
   build-windows:
-    name: Build (Windows)
+    name: Build App (Windows)
     needs: bump-version
     runs-on: windows-latest
     env:
-      PYTHON_VERSION: "3.11"
       NODE_VERSION: "20"
     steps:
       - uses: actions/checkout@v4
         with:
           ref: ${{ needs.bump-version.outputs.tag }}
-      # ── Python sidecar ──
-      - name: Install uv
-        shell: powershell
-        run: |
-          if (Get-Command uv -ErrorAction SilentlyContinue) {
-            Write-Host "uv already installed: $(uv --version)"
-          } else {
-            irm https://astral.sh/uv/install.ps1 | iex
-            echo "$env:USERPROFILE\.local\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
-          }
-      - name: Install ffmpeg
-        shell: powershell
-        run: choco install ffmpeg -y
-      - name: Set up Python
-        shell: powershell
-        run: uv python install ${{ env.PYTHON_VERSION }}
-      - name: Build sidecar
-        shell: powershell
-        working-directory: python
-        run: uv run --python ${{ env.PYTHON_VERSION }} python build_sidecar.py --cpu-only
-      - name: Package sidecar for Tauri
-        shell: powershell
-        run: |
-          Compress-Archive -Path python\dist\voice-to-notes-sidecar\* -DestinationPath src-tauri\sidecar.zip
-      # ── Tauri app ──
       - name: Set up Node.js
         uses: actions/setup-node@v4
         with:
@@ -249,7 +189,6 @@
         shell: powershell
         run: npm run tauri build
-      # ── Release ──
       - name: Upload to release
         shell: powershell
         env:
@@ -259,7 +198,6 @@
           $Headers = @{ "Authorization" = "token $env:BUILD_TOKEN" }
           $TAG = "${{ needs.bump-version.outputs.tag }}"
-          $RELEASE_NAME = "Voice to Notes ${TAG}"
           Write-Host "Release tag: ${TAG}"
           $release = Invoke-RestMethod -Uri "${REPO_API}/releases/tags/${TAG}" -Headers $Headers -ErrorAction Stop
@@ -286,7 +224,7 @@
             -X POST `
             -H "Authorization: token $env:BUILD_TOKEN" `
             -H "Content-Type: application/octet-stream" `
-            --data-binary "@$($_.FullName)" `
+            -T "$($_.FullName)" `
             "$uploadUrl" 2>&1
           if ($LASTEXITCODE -eq 0) {
             Write-Host "Upload successful: ${filename}"
@@ -296,42 +234,16 @@
           }
   build-macos:
-    name: Build (macOS)
+    name: Build App (macOS)
     needs: bump-version
     runs-on: macos-latest
     env:
-      PYTHON_VERSION: "3.11"
       NODE_VERSION: "20"
     steps:
       - uses: actions/checkout@v4
         with:
           ref: ${{ needs.bump-version.outputs.tag }}
-      # ── Python sidecar ──
-      - name: Install uv
-        run: |
-          if command -v uv &> /dev/null; then
-            echo "uv already installed: $(uv --version)"
-          else
-            curl -LsSf https://astral.sh/uv/install.sh | sh
-            echo "$HOME/.local/bin" >> $GITHUB_PATH
-          fi
-      - name: Install ffmpeg
-        run: brew install ffmpeg
-      - name: Set up Python
-        run: uv python install ${{ env.PYTHON_VERSION }}
-      - name: Build sidecar
-        working-directory: python
-        run: uv run --python ${{ env.PYTHON_VERSION }} python build_sidecar.py --cpu-only
-      - name: Package sidecar for Tauri
-        run: |
-          cd python/dist/voice-to-notes-sidecar && zip -r ../../../src-tauri/sidecar.zip .
-      # ── Tauri app ──
       - name: Set up Node.js
         uses: actions/setup-node@v4
         with:
@@ -351,7 +263,6 @@
       - name: Build Tauri app
         run: npm run tauri build
-      # ── Release ──
       - name: Upload to release
         env:
           BUILD_TOKEN: ${{ secrets.BUILD_TOKEN }}
@@ -361,7 +272,6 @@ jobs:
REPO_API="${GITHUB_SERVER_URL}/api/v1/repos/${GITHUB_REPOSITORY}" REPO_API="${GITHUB_SERVER_URL}/api/v1/repos/${GITHUB_REPOSITORY}"
TAG="${{ needs.bump-version.outputs.tag }}" TAG="${{ needs.bump-version.outputs.tag }}"
RELEASE_NAME="Voice to Notes ${TAG}"
echo "Release tag: ${TAG}" echo "Release tag: ${TAG}"
RELEASE_ID=$(curl -s -H "Authorization: token ${BUILD_TOKEN}" \ RELEASE_ID=$(curl -s -H "Authorization: token ${BUILD_TOKEN}" \

CONTRIBUTING.md Normal file
@@ -0,0 +1,140 @@
# Contributing to Voice to Notes
Thank you for your interest in contributing! This guide covers how to set up the project for development and submit changes.
## Development Setup
### Prerequisites
- **Node.js 20+** and npm
- **Rust** (stable toolchain)
- **Python 3.11+** with [uv](https://docs.astral.sh/uv/) (recommended) or pip
- **System libraries (Linux only):**
```bash
sudo apt install libgtk-3-dev libwebkit2gtk-4.1-dev libappindicator3-dev librsvg2-dev patchelf xdg-utils
```
### Clone and Install
```bash
git clone https://repo.anhonesthost.net/MacroPad/voice-to-notes.git
cd voice-to-notes
# Frontend
npm install
# Python sidecar
cd python && pip install -e ".[dev]" && cd ..
```
### Running in Dev Mode
```bash
npm run tauri:dev
```
This runs the Svelte dev server + Tauri with hot-reload. The Python sidecar runs from your system Python (no PyInstaller needed in dev mode).
### Building
```bash
# Build the Python sidecar (frozen binary)
cd python && python build_sidecar.py --cpu-only && cd ..
# Build the full app
npm run tauri build
```
## Project Structure
```
src/                 # Svelte 5 frontend
  lib/components/    # Reusable UI components
  lib/stores/        # Svelte stores (app state)
  routes/            # SvelteKit pages
src-tauri/           # Rust backend (Tauri v2)
  src/sidecar/       # Python sidecar lifecycle (download, extract, IPC)
  src/commands/      # Tauri command handlers
  src/db/            # SQLite database layer
python/              # Python ML sidecar
  voice_to_notes/    # Main package
    services/        # Transcription, diarization, AI, export
    ipc/             # JSON-line IPC protocol
    hardware/        # GPU/CPU detection
.gitea/workflows/    # CI/CD pipelines
docs/                # Documentation
```
## How It Works
The app has three layers:
1. **Frontend (Svelte)** — UI, audio playback (wavesurfer.js), transcript editing (TipTap)
2. **Backend (Rust/Tauri)** — Desktop integration, file access, SQLite, sidecar process management
3. **Sidecar (Python)** — ML inference (faster-whisper, pyannote.audio), AI chat, export
Rust and Python communicate via **JSON-line IPC** over stdin/stdout pipes. Each request has an `id`, `type`, and `payload`. The Python sidecar runs as a child process managed by `SidecarManager` in Rust.
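The wire format can be sketched in a few lines (the helper names here are illustrative, not the app's actual API):

```python
import json

def encode_request(req_id: str, msg_type: str, payload: dict) -> str:
    """Serialize one IPC request as a single newline-terminated JSON line."""
    return json.dumps({"id": req_id, "type": msg_type, "payload": payload}) + "\n"

def decode_message(line: str) -> dict:
    """Parse one newline-delimited JSON message from the peer."""
    return json.loads(line)

line = encode_request("42", "transcribe.start", {"file": "meeting.wav"})
print(decode_message(line)["type"])  # transcribe.start
```

Because messages are newline-delimited JSON, neither side may print anything else to stdout; that is why the sidecar captures the real stdout for IPC before importing any ML libraries.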
## Conventions
### Rust
- Follow standard Rust conventions
- Run `cargo fmt` and `cargo clippy` before committing
- Tauri commands go in `src-tauri/src/commands/`
### Python
- Python 3.11+, type hints everywhere
- Use `ruff` for linting: `ruff check python/`
- Tests with pytest: `cd python && pytest`
- IPC messages: JSON-line format with `id`, `type`, `payload` fields
### TypeScript / Svelte
- Svelte 5 runes (`$state`, `$derived`, `$effect`)
- Strict TypeScript
- Components in `src/lib/components/`
- State in `src/lib/stores/`
### General
- All timestamps in milliseconds (integer)
- UUIDs as primary keys in the database
- Don't bundle API keys or secrets — those are user-configured
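For illustration, a record following these conventions might be built like this (the helper and field names are hypothetical, not the app's actual schema):

```python
import uuid

def make_segment_row(text: str, start_ms: int, end_ms: int) -> dict:
    """Build a row with a UUID primary key and integer millisecond timestamps."""
    assert isinstance(start_ms, int) and isinstance(end_ms, int)  # ms, never float seconds
    return {
        "id": str(uuid.uuid4()),  # UUID primary key, stored as text
        "text": text,
        "start_ms": start_ms,
        "end_ms": end_ms,
    }

row = make_segment_row("hello world", 1500, 2250)
```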
## Submitting Changes
1. Fork the repository
2. Create a feature branch: `git checkout -b my-feature`
3. Make your changes
4. Test locally with `npm run tauri:dev`
5. Run linters: `cargo fmt && cargo clippy`, `ruff check python/`
6. Commit with a clear message describing the change
7. Open a Pull Request against `main`
## CI/CD
Pushes to `main` automatically:
- Bump the app version and create a release (`release.yml`)
- Build app installers for all platforms
Changes to `python/` also trigger sidecar builds (`build-sidecar.yml`).
## Areas for Contribution
- UI/UX improvements
- New export formats
- Additional AI provider integrations
- Performance optimizations
- Accessibility improvements
- Documentation and translations
- Bug reports and testing on different platforms
## Reporting Issues
Open an issue on the [repository](https://repo.anhonesthost.net/MacroPad/voice-to-notes/issues) with:
- Steps to reproduce
- Expected vs actual behavior
- Platform and version info
- Sidecar logs (`%LOCALAPPDATA%\com.voicetonotes.app\sidecar.log` on Windows)
## License
By contributing, you agree that your contributions will be licensed under the [MIT License](LICENSE).

README.md
@@ -1,32 +1,55 @@
 # Voice to Notes
-A desktop application that transcribes audio/video recordings with speaker identification, producing editable transcriptions with synchronized audio playback.
+A desktop application that transcribes audio and video recordings with speaker identification, synchronized playback, and AI-powered analysis. Export to SRT, WebVTT, ASS captions, plain text, or Markdown.
 ## Features
-- **Speech-to-Text Transcription** — Accurate transcription via faster-whisper (Whisper models) with word-level timestamps
-- **Speaker Identification (Diarization)** — Detect and distinguish between speakers using pyannote.audio
-- **Synchronized Playback** — Click any word to seek to that point in the audio (Web Audio API for instant playback)
-- **AI Integration** — Ask questions about your transcript via OpenAI, Anthropic, or any OpenAI-compatible API (LiteLLM proxies, Ollama, vLLM)
-- **Export Formats** — SRT, WebVTT, ASS captions, plain text, and Markdown with speaker labels
-- **Cross-Platform** — Builds for Linux, Windows, and macOS (Apple Silicon)
+- **Speech-to-Text** — Accurate transcription via faster-whisper with word-level timestamps. Supports 99 languages.
+- **Speaker Identification** — Detect and label speakers using pyannote.audio. Rename speakers for clean exports.
+- **GPU Acceleration** — CUDA support for NVIDIA GPUs (Windows/Linux). Falls back to CPU automatically.
+- **Synchronized Playback** — Click any word to seek. Waveform visualization via wavesurfer.js.
+- **AI Chat** — Ask questions about your transcript. Works with Ollama (local), OpenAI, Anthropic, or any OpenAI-compatible API.
+- **Export** — SRT, WebVTT, ASS, plain text, Markdown — all with speaker labels.
+- **Cross-Platform** — Linux, Windows, macOS (Apple Silicon).
+## Quick Start
+1. Download the installer from [Releases](https://repo.anhonesthost.net/MacroPad/voice-to-notes/releases)
+2. On first launch, choose **CPU** or **CUDA** sidecar (the AI engine downloads separately, ~500 MB to 2 GB)
+3. Import an audio/video file and click **Transcribe**
+See the full [User Guide](docs/USER_GUIDE.md) for detailed setup and usage instructions.
 ## Platform Support
-| Platform | Architecture | Status |
-|----------|-------------|--------|
-| Linux | x86_64 | Supported |
-| Windows | x86_64 | Supported |
-| macOS | ARM (Apple Silicon) | Supported |
+| Platform | Architecture | Installers |
+|----------|-------------|------------|
+| Linux | x86_64 | .deb, .rpm |
+| Windows | x86_64 | .msi, .exe (NSIS) |
+| macOS | ARM (Apple Silicon) | .dmg |
+## Architecture
+The app is split into two independently versioned components:
+- **App** (v0.2.x) — Tauri desktop shell with Svelte frontend. Small installer (~50MB).
+- **Sidecar** (v1.x) — Python ML engine (faster-whisper, pyannote.audio). Downloaded on first launch. CPU (~500MB) or CUDA (~2GB) variants.
+This separation means app UI updates don't require re-downloading the sidecar, and sidecar updates don't require reinstalling the app.
 ## Tech Stack
-- **Desktop shell:** Tauri v2 (Rust backend + Svelte 5 / TypeScript frontend)
-- **ML pipeline:** Python sidecar (faster-whisper, pyannote.audio) — frozen via PyInstaller for distribution
-- **Audio playback:** wavesurfer.js with Web Audio API backend
-- **AI providers:** OpenAI, Anthropic, OpenAI-compatible endpoints (local or remote)
-- **Local AI:** Bundled llama-server (llama.cpp)
-- **Caption export:** pysubs2
+| Component | Technology |
+|-----------|-----------|
+| Desktop shell | Tauri v2 (Rust + Svelte 5 / TypeScript) |
+| Transcription | faster-whisper (CTranslate2) |
+| Speaker ID | pyannote.audio 3.1 |
+| Audio UI | wavesurfer.js |
+| Transcript editor | TipTap (ProseMirror) |
+| AI (local) | Ollama (any model) |
+| AI (cloud) | OpenAI, Anthropic, OpenAI-compatible |
+| Caption export | pysubs2 |
+| Database | SQLite (rusqlite) |
 ## Development
@@ -34,8 +57,8 @@ A desktop application that transcribes audio/video recordings with speaker ident
 - Node.js 20+
 - Rust (stable)
-- Python 3.11+ with ML dependencies
-- System: `libgtk-3-dev`, `libwebkit2gtk-4.1-dev` (Linux)
+- Python 3.11+ with uv or pip
+- Linux: `libgtk-3-dev`, `libwebkit2gtk-4.1-dev`, `libappindicator3-dev`, `librsvg2-dev`
 ### Getting Started
@@ -44,49 +67,63 @@ A desktop application that transcribes audio/video recordings with speaker ident
 npm install
 # Install Python sidecar dependencies
-cd python && pip install -e . && cd ..
+cd python && pip install -e ".[dev]" && cd ..
 # Run in dev mode (uses system Python for the sidecar)
 npm run tauri:dev
 ```
-### Building for Distribution
+### Building
 ```bash
-# Build the frozen Python sidecar
-npm run sidecar:build
-# Build the Tauri app (requires sidecar in src-tauri/binaries/)
+# Build the frozen Python sidecar (CPU-only)
+cd python && python build_sidecar.py --cpu-only && cd ..
+# Build with CUDA support
+cd python && python build_sidecar.py --with-cuda && cd ..
+# Build the Tauri app
 npm run tauri build
 ```
 ### CI/CD
-Gitea Actions workflows are in `.gitea/workflows/`. The build pipeline:
-1. **Build sidecar** — PyInstaller-frozen Python binary per platform (CPU-only PyTorch)
-2. **Build Tauri app** — Bundles the sidecar via `externalBin`, produces .deb/.AppImage (Linux), .msi (Windows), .dmg (macOS)
+Two Gitea Actions workflows in `.gitea/workflows/`:
+**`release.yml`** — Triggers on push to main:
+1. Bumps app version (patch), creates git tag and Gitea release
+2. Builds lightweight app installers for all platforms (no sidecar bundled)
+**`build-sidecar.yml`** — Triggers on changes to `python/` or manual dispatch:
+1. Bumps sidecar version, creates `sidecar-v*` tag and release
+2. Builds CPU + CUDA variants for Linux/Windows, CPU for macOS
+3. Uploads as separate release assets
 #### Required Secrets
-| Secret | Purpose | Required? |
-|--------|---------|-----------|
-| `TAURI_SIGNING_PRIVATE_KEY` | Signs Tauri update bundles | Optional (for auto-updates) |
+| Secret | Purpose |
+|--------|---------|
+| `BUILD_TOKEN` | Gitea API token for creating releases and pushing tags |
+No other secrets are needed for building. AI provider API keys and HuggingFace tokens are configured by end users in the app's Settings.
 ### Project Structure
 ```
 src/ # Svelte 5 frontend
-src-tauri/ # Rust backend (Tauri commands, sidecar manager, SQLite)
-python/ # Python sidecar (transcription, diarization, AI)
-  voice_to_notes/ # Python package
-  build_sidecar.py # PyInstaller build script
-  voice_to_notes.spec # PyInstaller spec
-.gitea/workflows/ # Gitea Actions CI/CD
+  lib/components/ # UI components (waveform, transcript editor, settings, etc.)
+  lib/stores/ # Svelte stores (settings, transcript state)
+  routes/ # SvelteKit pages
+src-tauri/ # Rust backend
+  src/sidecar/ # Sidecar process manager (download, extract, IPC)
+  src/commands/ # Tauri command handlers
+  nsis-hooks.nsh # Windows uninstall cleanup
+python/ # Python sidecar
+  voice_to_notes/ # Python package (transcription, diarization, AI, export)
+  build_sidecar.py # PyInstaller build script
+  voice_to_notes.spec # PyInstaller spec
+.gitea/workflows/ # CI/CD (release.yml, build-sidecar.yml)
+docs/ # Documentation
 ```
 ## License
-MIT
+[MIT](LICENSE)

docs/USER_GUIDE.md Normal file
@@ -0,0 +1,240 @@
# Voice to Notes — User Guide
## Getting Started
### Installation
Download the installer for your platform from the [Releases](https://repo.anhonesthost.net/MacroPad/voice-to-notes/releases) page:
- **Windows:** `.msi` or `-setup.exe`
- **Linux:** `.deb` or `.rpm`
- **macOS:** `.dmg`
### First-Time Setup
On first launch, Voice to Notes will prompt you to download its AI engine (the "sidecar"):
1. Choose **Standard (CPU)** (~500 MB) or **GPU Accelerated (CUDA)** (~2 GB)
- Choose CUDA if you have an NVIDIA GPU for significantly faster transcription
- CPU works on all computers
2. Click **Download & Install** and wait for the download to complete
3. The app will proceed to the main interface once the sidecar is ready
The sidecar only needs to be downloaded once. Updates are detected automatically on launch.
---
## Basic Workflow
### 1. Import Audio or Video
- Click **Import Audio** or press **Ctrl+O** (Cmd+O on Mac)
- **Audio formats:** MP3, WAV, FLAC, OGG, M4A, AAC, WMA
- **Video formats:** MP4, MKV, AVI, MOV, WebM — audio is automatically extracted
> **Note:** Video file import requires [FFmpeg](#installing-ffmpeg) to be installed on your system.
### 2. Transcribe
After importing, click **Transcribe** to start the transcription pipeline:
- **Transcription:** Converts speech to text with word-level timestamps
- **Speaker Detection:** Identifies different speakers (if configured — see [Speaker Detection](#speaker-detection))
- A progress bar shows the current stage and percentage
### 3. Review and Edit
- The **waveform** displays at the top — click anywhere to seek
- The **transcript** shows below with speaker labels and timestamps
- **Click any word** in the transcript to jump to that point in the audio
- The current word highlights during playback
- **Edit text** directly in the transcript — word timings are preserved
### 4. Export
Click **Export** and choose a format:
| Format | Extension | Best For |
|--------|-----------|----------|
| SRT | `.srt` | Video subtitles (most compatible) |
| WebVTT | `.vtt` | Web video players, HTML5 |
| ASS/SSA | `.ass` | Styled subtitles with speaker colors |
| Plain Text | `.txt` | Reading, sharing, pasting |
| Markdown | `.md` | Documentation, notes |
All formats include speaker labels when speaker detection is enabled.
### 5. Save Project
- **Ctrl+S** (Cmd+S) saves the current project as a `.vtn` file
- This preserves the full transcript, speaker assignments, and edits
- Reopen later to continue editing or re-export
---
## Playback Controls
| Action | Shortcut |
|--------|----------|
| Play / Pause | **Space** |
| Skip back 5s | **Left Arrow** |
| Skip forward 5s | **Right Arrow** |
| Seek to word | Click any word in the transcript |
| Import audio | **Ctrl+O** / **Cmd+O** |
| Open settings | **Ctrl+,** / **Cmd+,** |
---
## Speaker Detection
Speaker detection (diarization) identifies who is speaking at each point in the audio. It requires a one-time setup:
### Setup
1. Go to **Settings > Speakers**
2. Create a free account at [huggingface.co](https://huggingface.co/join)
3. Accept the license on **all three** model pages:
- [pyannote/speaker-diarization-3.1](https://huggingface.co/pyannote/speaker-diarization-3.1)
- [pyannote/segmentation-3.0](https://huggingface.co/pyannote/segmentation-3.0)
- [pyannote/speaker-diarization-community-1](https://huggingface.co/pyannote/speaker-diarization-community-1)
4. Create a token at [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens) (read access is sufficient)
5. Paste the token in Settings and click **Test & Download Model**
### Speaker Options
- **Number of speakers:** Set to auto-detect or specify a fixed number for faster results
- **Skip speaker detection:** Check this to only transcribe without identifying speakers
### Managing Speakers
After transcription, speakers appear as "Speaker 1", "Speaker 2", etc. in the left sidebar. Double-click a speaker name to rename it — the new name appears throughout the transcript and in exports.
---
## AI Chat
The AI chat panel lets you ask questions about your transcript. The AI sees the full transcript with speaker labels as context.
Example prompts:
- "Summarize this conversation"
- "What were the key action items?"
- "What did Speaker 1 say about the budget?"
### Setting Up Ollama (Local AI)
[Ollama](https://ollama.com) runs AI models locally on your computer — no API keys or internet required.
1. **Install Ollama:**
- Download from [ollama.com](https://ollama.com)
- Or on Linux: `curl -fsSL https://ollama.com/install.sh | sh`
2. **Pull a model:**
```bash
ollama pull llama3.2
```
Other good options: `mistral`, `gemma2`, `phi3`
3. **Configure in Voice to Notes:**
- Go to **Settings > AI Provider**
- Select **Ollama**
- URL: `http://localhost:11434` (default, usually no change needed)
- Model: `llama3.2` (or whichever model you pulled)
4. **Use:** Open the AI chat panel (right sidebar) and start asking questions
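Under the hood, Ollama exposes an OpenAI-compatible API. A rough sketch of the kind of chat request the app sends (the helper and exact fields are illustrative, not the app's actual code):

```python
import json

OLLAMA_CHAT_URL = "http://localhost:11434/v1/chat/completions"

def build_chat_request(model: str, transcript: str, question: str) -> dict:
    """Assemble an OpenAI-style chat body: transcript as context, question as the user turn."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": f"Transcript:\n{transcript}"},
            {"role": "user", "content": question},
        ],
    }

body = build_chat_request("llama3.2", "[Speaker 1] Hello everyone.", "Summarize this.")
print(json.dumps(body, indent=2))
```

POSTing this body to the URL above returns a standard chat completion; Ollama does not check the API key, so any placeholder value works.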
### Cloud AI Providers
If you prefer cloud-based AI:
**OpenAI:**
- Select **OpenAI** in Settings > AI Provider
- Enter your API key from [platform.openai.com/api-keys](https://platform.openai.com/api-keys)
- Default model: `gpt-4o-mini`
**Anthropic:**
- Select **Anthropic** in Settings > AI Provider
- Enter your API key from [console.anthropic.com](https://console.anthropic.com)
- Default model: `claude-sonnet-4-6`
**OpenAI Compatible:**
- For any provider with an OpenAI-compatible API (vLLM, LiteLLM, etc.)
- Enter the API base URL, key, and model name
---
## Settings Reference
### Transcription
| Setting | Options | Default |
|---------|---------|---------|
| Whisper Model | tiny, base, small, medium, large-v3 | base |
| Device | CPU, CUDA | CPU |
| Language | Auto-detect, or specify (en, es, fr, etc.) | Auto-detect |
**Model recommendations:**
- **tiny/base:** Fast, good for clear audio with one speaker
- **small:** Best balance of speed and accuracy
- **medium:** Better accuracy, noticeably slower
- **large-v3:** Best accuracy, requires 8GB+ VRAM (GPU) or 16GB+ RAM (CPU)
### Debug
- **Enable Developer Tools:** Opens the browser inspector for debugging
---
## Installing FFmpeg
FFmpeg is required for importing video files (MP4, MKV, AVI, etc.). It's used to extract the audio track before transcription.
**Windows:**
```
winget install ffmpeg
```
Or download from [ffmpeg.org/download.html](https://ffmpeg.org/download.html) and add to your PATH.
**macOS:**
```
brew install ffmpeg
```
**Linux (Debian/Ubuntu):**
```
sudo apt install ffmpeg
```
**Linux (Fedora/RHEL):**
```
sudo dnf install ffmpeg
```
After installing, restart Voice to Notes. FFmpeg is not needed for audio-only files (MP3, WAV, FLAC, etc.).
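The extraction step amounts to an ffmpeg invocation along these lines (a sketch of a typical command, not necessarily the app's exact flags):

```python
def ffmpeg_extract_cmd(src: str, dst: str) -> list[str]:
    """Build a command that pulls a mono 16 kHz WAV out of a video file."""
    return [
        "ffmpeg", "-y",
        "-i", src,       # input video (MP4, MKV, ...)
        "-vn",           # drop the video stream
        "-ac", "1",      # downmix to mono
        "-ar", "16000",  # 16 kHz sample rate
        dst,
    ]

cmd = ffmpeg_extract_cmd("talk.mp4", "talk.wav")
print(" ".join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```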
---
## Troubleshooting
### Video import fails / "FFmpeg not found"
- Install FFmpeg using the instructions above
- Make sure `ffmpeg` is in your system PATH
- Restart Voice to Notes after installing
### Transcription is slow
- Use a smaller model (tiny or base)
- If you have an NVIDIA GPU, select CUDA in Settings > Transcription > Device
- Ensure you downloaded the CUDA sidecar during setup
### Speaker detection not working
- Verify your HuggingFace token in Settings > Speakers
- Click "Test & Download Model" to re-download
- Make sure you accepted the license on all three model pages
### Audio won't play / No waveform
- Check that the audio file still exists at its original location
- Try re-importing the file
- Supported formats: MP3, WAV, FLAC, OGG, M4A, AAC, WMA
### App shows "Setting up Voice to Notes"
- This is the first-launch sidecar download — it only happens once
- If it fails, check your internet connection and click Retry

package-lock.json generated
@@ -1,12 +1,12 @@
{ {
"name": "voice-to-notes", "name": "voice-to-notes",
"version": "0.1.0", "version": "0.2.10",
"lockfileVersion": 3, "lockfileVersion": 3,
"requires": true, "requires": true,
"packages": { "packages": {
"": { "": {
"name": "voice-to-notes", "name": "voice-to-notes",
"version": "0.1.0", "version": "0.2.10",
"license": "MIT", "license": "MIT",
"dependencies": { "dependencies": {
"@tauri-apps/api": "^2", "@tauri-apps/api": "^2",

@@ -1,6 +1,6 @@
{ {
"name": "voice-to-notes", "name": "voice-to-notes",
"version": "0.2.5", "version": "0.2.46",
"description": "Desktop app for transcribing audio/video with speaker identification", "description": "Desktop app for transcribing audio/video with speaker identification",
"type": "module", "type": "module",
"scripts": { "scripts": {

@@ -119,8 +119,11 @@ def create_venv_and_install(cpu_only: bool) -> Path:
             "--index-url", "https://download.pytorch.org/whl/cpu",
         )
     else:
-        print("[build] Installing PyTorch (default, may include CUDA)")
-        pip_install("torch", "torchaudio")
+        print("[build] Installing PyTorch (CUDA 12.6)")
+        pip_install(
+            "torch", "torchaudio",
+            "--index-url", "https://download.pytorch.org/whl/cu126",
+        )
     # Install project and dev deps (includes pyinstaller)
     print("[build] Installing project dependencies")

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "voice-to-notes"
-version = "0.2.5"
+version = "1.0.13"
 description = "Python sidecar for Voice to Notes — transcription, diarization, and AI services"
 requires-python = ">=3.11"
 license = "MIT"
@@ -15,6 +15,7 @@ dependencies = [
     "pysubs2>=1.7.0",
     "openai>=1.0.0",
     "anthropic>=0.20.0",
+    "soundfile>=0.12.0",
 ]
 [project.optional-dependencies]

@@ -12,15 +12,17 @@ faster_whisper_datas, faster_whisper_binaries, faster_whisper_hiddenimports = co
     "faster_whisper"
 )
 pyannote_datas, pyannote_binaries, pyannote_hiddenimports = collect_all("pyannote")
+soundfile_datas, soundfile_binaries, soundfile_hiddenimports = collect_all("soundfile")
 a = Analysis(
     ["voice_to_notes/main.py"],
     pathex=[],
-    binaries=ctranslate2_binaries + faster_whisper_binaries + pyannote_binaries,
-    datas=ctranslate2_datas + faster_whisper_datas + pyannote_datas,
+    binaries=ctranslate2_binaries + faster_whisper_binaries + pyannote_binaries + soundfile_binaries,
+    datas=ctranslate2_datas + faster_whisper_datas + pyannote_datas + soundfile_datas,
     hiddenimports=[
         "torch",
         "torchaudio",
+        "soundfile",
         "huggingface_hub",
         "pysubs2",
         "openai",
@@ -29,16 +31,21 @@ a = Analysis(
     ]
     + ctranslate2_hiddenimports
     + faster_whisper_hiddenimports
-    + pyannote_hiddenimports,
+    + pyannote_hiddenimports
+    + soundfile_hiddenimports,
     hookspath=[],
     hooksconfig={},
     runtime_hooks=[],
     excludes=[
-        "tkinter", "test", "unittest", "pip", "setuptools",
+        "tkinter", "test", "pip", "setuptools",
         # ctranslate2.converters imports torch at module level and causes
         # circular import crashes under PyInstaller. These modules are only
         # needed for model format conversion, never for inference.
         "ctranslate2.converters",
+        # torchcodec is partially bundled by PyInstaller but non-functional
+        # (missing FFmpeg shared libs). Excluding it forces pyannote.audio
+        # to fall back to torchaudio for audio decoding.
+        "torchcodec",
     ],
     win_no_prefer_redirects=False,
     win_private_assemblies=False,

@@ -105,14 +105,23 @@ def detect_hardware() -> HardwareInfo:
     # RAM info (cross-platform)
     info.ram_mb = _detect_ram_mb()
-    # CUDA detection
+    # CUDA detection — verify runtime libraries actually work, not just torch detection
     try:
         import torch
         if torch.cuda.is_available():
-            info.has_cuda = True
-            info.cuda_device_name = torch.cuda.get_device_name(0)
-            info.vram_mb = torch.cuda.get_device_properties(0).total_mem // (1024 * 1024)
+            # Test that CUDA runtime libraries are actually loadable
+            try:
+                torch.zeros(1, device="cuda")
+                info.has_cuda = True
+                info.cuda_device_name = torch.cuda.get_device_name(0)
+                info.vram_mb = torch.cuda.get_device_properties(0).total_mem // (1024 * 1024)
+            except Exception as e:
+                print(
+                    f"[sidecar] CUDA detected but runtime unavailable: {e}. Using CPU.",
+                    file=sys.stderr,
+                    flush=True,
+                )
     except ImportError:
         print("[sidecar] torch not available, GPU detection skipped", file=sys.stderr, flush=True)

@@ -254,15 +254,15 @@ def make_ai_chat_handler() -> HandlerFunc:
             )
         if action == "configure":
-            # Re-create a provider with custom settings
+            # Re-create a provider with custom settings and set it active
             provider_name = payload.get("provider", "")
             config = payload.get("config", {})
             if provider_name == "local":
                 from voice_to_notes.providers.local_provider import LocalProvider
                 service.register_provider("local", LocalProvider(
-                    base_url=config.get("base_url", "http://localhost:8080"),
-                    model=config.get("model", "local"),
+                    base_url=config.get("base_url", "http://localhost:11434/v1"),
+                    model=config.get("model", "llama3.2"),
                 ))
             elif provider_name == "openai":
                 from voice_to_notes.providers.openai_provider import OpenAIProvider
@@ -286,6 +286,10 @@ def make_ai_chat_handler() -> HandlerFunc:
                 api_key=config.get("api_key"),
                 api_base=config.get("api_base"),
             ))
+        # Set the configured provider as active
+        print(f"[sidecar] Configured AI provider: {provider_name} with config: {config}", file=sys.stderr, flush=True)
+        if provider_name in ("local", "openai", "anthropic", "litellm"):
+            service.set_active(provider_name)
         return IPCMessage(
             id=msg.id,
             type="ai.configured",

@@ -5,6 +5,7 @@ from __future__ import annotations
 import signal
 import sys
 # CRITICAL: Capture real stdout for IPC *before* importing any ML libraries
 # that might print to stdout and corrupt the JSON-line protocol.
 from voice_to_notes.ipc.protocol import init_ipc

@@ -1,4 +1,4 @@
-"""Local AI provider — bundled llama-server (OpenAI-compatible API)."""
+"""Local AI provider — Ollama or any OpenAI-compatible API."""
 from __future__ import annotations
@@ -9,9 +9,9 @@ from voice_to_notes.providers.base import AIProvider
 class LocalProvider(AIProvider):
-    """Connects to bundled llama-server via its OpenAI-compatible API."""
+    """Connects to Ollama or any OpenAI-compatible API server."""
-    def __init__(self, base_url: str = "http://localhost:8080", model: str = "local") -> None:
+    def __init__(self, base_url: str = "http://localhost:11434/v1", model: str = "llama3.2") -> None:
         self._base_url = base_url.rstrip("/")
         self._model = model
         self._client: Any = None
@@ -24,8 +24,8 @@ class LocalProvider(AIProvider):
             from openai import OpenAI
             self._client = OpenAI(
-                base_url=f"{self._base_url}/v1",
-                api_key="not-needed",  # llama-server doesn't require an API key
+                base_url=self._base_url,
+                api_key="ollama",  # Ollama doesn't require a real key
             )
         except ImportError:
             raise RuntimeError(
@@ -47,7 +47,9 @@ class LocalProvider(AIProvider):
         try:
             import urllib.request
-            req = urllib.request.Request(f"{self._base_url}/health", method="GET")
+            # Check base URL without /v1 suffix for Ollama root endpoint
+            root_url = self._base_url.replace("/v1", "")
+            req = urllib.request.Request(root_url, method="GET")
             with urllib.request.urlopen(req, timeout=2) as resp:
                 return resp.status == 200
         except Exception:
@@ -55,4 +57,4 @@ class LocalProvider(AIProvider):
     @property
     def name(self) -> str:
-        return "Local (llama-server)"
+        return "Ollama"

@@ -20,6 +20,81 @@ from voice_to_notes.utils.ffmpeg import get_ffmpeg_path
from voice_to_notes.ipc.messages import progress_message from voice_to_notes.ipc.messages import progress_message
from voice_to_notes.ipc.protocol import write_message from voice_to_notes.ipc.protocol import write_message
_patched = False
def _patch_pyannote_audio() -> None:
"""Monkey-patch pyannote.audio.core.io.Audio to use torchaudio.
pyannote.audio has a bug where AudioDecoder (from torchcodec) is used
unconditionally even when torchcodec is not installed, causing NameError.
This replaces the Audio.__call__ method with a torchaudio-based version.
"""
global _patched
if _patched:
return
_patched = True
try:
import numpy as np
import soundfile as sf
import torch
from pyannote.audio.core.io import Audio
# Cache loaded audio to avoid re-reading the entire file for every crop call.
# For a 3-hour file, crop is called 1000+ times — without caching, each call
# reads ~345MB from disk.
_audio_cache: dict[str, tuple] = {}
def _sf_load(audio_path: str) -> tuple:
"""Load audio via soundfile with caching."""
key = str(audio_path)
if key in _audio_cache:
return _audio_cache[key]
data, sample_rate = sf.read(key, dtype="float32")
waveform = torch.from_numpy(np.array(data))
if waveform.ndim == 1:
waveform = waveform.unsqueeze(0)
else:
waveform = waveform.T
_audio_cache[key] = (waveform, sample_rate)
return waveform, sample_rate
def _soundfile_call(self, file: dict) -> tuple:
"""Replacement for Audio.__call__."""
return _sf_load(file["audio"])
def _soundfile_crop(self, file: dict, segment, **kwargs) -> tuple:
"""Replacement for Audio.crop — load file once (cached) then slice.
Pads short segments with zeros to match the expected duration,
which pyannote requires for batched embedding extraction.
"""
duration = kwargs.get("duration", None)
waveform, sample_rate = _sf_load(file["audio"])
# Convert segment (seconds) to sample indices
start_sample = int(segment.start * sample_rate)
end_sample = int(segment.end * sample_rate)
# Clamp to bounds
start_sample = max(0, start_sample)
end_sample = min(waveform.shape[-1], end_sample)
cropped = waveform[:, start_sample:end_sample]
# Pad to expected duration if needed (pyannote batches require uniform size)
if duration is not None:
expected_samples = int(duration * sample_rate)
else:
expected_samples = int((segment.end - segment.start) * sample_rate)
if cropped.shape[-1] < expected_samples:
pad = torch.zeros(cropped.shape[0], expected_samples - cropped.shape[-1])
cropped = torch.cat([cropped, pad], dim=-1)
return cropped, sample_rate
Audio.__call__ = _soundfile_call # type: ignore[assignment]
Audio.crop = _soundfile_crop # type: ignore[assignment]
print("[sidecar] Patched pyannote Audio to use soundfile", file=sys.stderr, flush=True)
except Exception as e:
print(f"[sidecar] Warning: Could not patch pyannote Audio: {e}", file=sys.stderr, flush=True)
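The cache-then-slice pattern used by the patch above can be sketched without torch or soundfile. Here `loader` is a hypothetical stand-in for `sf.read`, and the waveform is a plain list instead of a tensor; this is a sketch of the pattern, not the actual pyannote API:

```python
from __future__ import annotations

# Cache: one disk read per file, however many crop calls follow.
_cache: dict[str, tuple[list[float], int]] = {}

def load_cached(path: str, loader) -> tuple[list[float], int]:
    """Load a waveform once per path; later calls hit the cache."""
    if path not in _cache:
        _cache[path] = loader(path)
    return _cache[path]

def crop_padded(waveform: list[float], sample_rate: int,
                start_s: float, end_s: float,
                duration_s: float | None = None) -> list[float]:
    """Slice [start_s, end_s) in samples, zero-padding to the expected length
    so batched segments all have a uniform size."""
    start = max(0, int(start_s * sample_rate))
    end = min(len(waveform), int(end_s * sample_rate))
    cropped = list(waveform[start:end])
    target = duration_s if duration_s is not None else end_s - start_s
    expected = int(target * sample_rate)
    if len(cropped) < expected:
        cropped += [0.0] * (expected - len(cropped))
    return cropped
```

The zero-padding matters for segments that run past the end of the file: the slice comes up short, and the pad restores the batch-uniform length pyannote expects.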
def _ensure_wav(file_path: str) -> tuple[str, str | None]:
    """Convert audio to 16kHz mono WAV if needed.
@@ -113,6 +188,7 @@ class DiarizeService:
        ]
        last_error: Exception | None = None
+       _patch_pyannote_audio()
        for model_name in models:
            try:
                from pyannote.audio import Pipeline
@@ -212,13 +288,20 @@ class DiarizeService:
        thread.start()
        elapsed = 0.0
-       estimated_total = max(audio_duration_sec * 0.5, 30.0) if audio_duration_sec else 120.0
-       while not done_event.wait(timeout=2.0):
-           elapsed += 2.0
+       estimated_total = max(audio_duration_sec * 0.8, 30.0) if audio_duration_sec else 120.0
+       duration_str = ""
+       if audio_duration_sec and audio_duration_sec > 600:
+           mins = int(audio_duration_sec / 60)
+           duration_str = f" ({mins}min audio, this may take a while)"
+       while not done_event.wait(timeout=5.0):
+           elapsed += 5.0
            pct = min(20 + int((elapsed / estimated_total) * 65), 85)
+           elapsed_min = int(elapsed / 60)
+           elapsed_sec = int(elapsed % 60)
+           time_str = f"{elapsed_min}m{elapsed_sec:02d}s" if elapsed_min > 0 else f"{int(elapsed)}s"
            write_message(progress_message(
                request_id, pct, "diarizing",
-               f"Analyzing speakers ({int(elapsed)}s elapsed)..."))
+               f"Analyzing speakers ({time_str} elapsed){duration_str}"))
        thread.join()
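The elapsed-time formatting introduced in this hunk (minutes once past the first minute, plain seconds before) can be factored into a small helper; the name `format_elapsed` is my own, not from the codebase:

```python
def format_elapsed(elapsed: float) -> str:
    """Format elapsed seconds as "3m05s" past the first minute, else "45s"."""
    mins, secs = int(elapsed / 60), int(elapsed % 60)
    return f"{mins}m{secs:02d}s" if mins > 0 else f"{int(elapsed)}s"
```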


@@ -77,11 +77,28 @@ class TranscribeService:
                file=sys.stderr,
                flush=True,
            )
-           self._model = WhisperModel(
-               model_name,
-               device=device,
-               compute_type=compute_type,
-           )
+           try:
+               self._model = WhisperModel(
+                   model_name,
+                   device=device,
+                   compute_type=compute_type,
+               )
+           except Exception as e:
+               if device != "cpu":
+                   print(
+                       f"[sidecar] Failed to load on {device}: {e}. Falling back to CPU.",
+                       file=sys.stderr,
+                       flush=True,
+                   )
+                   device = "cpu"
+                   compute_type = "int8"
+                   self._model = WhisperModel(
+                       model_name,
+                       device=device,
+                       compute_type=compute_type,
+                   )
+               else:
+                   raise
        self._current_model_name = model_name
        self._current_device = device
        self._current_compute_type = compute_type
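The try/except above is a generic "try accelerator, fall back to CPU once" pattern. A minimal sketch, with `load_model` as a hypothetical loader standing in for the `WhisperModel` constructor:

```python
def load_with_cpu_fallback(load_model, model_name: str, device: str, compute_type: str):
    """Try the requested device; on failure, fall back to CPU/int8 exactly once.
    Returns (model, device, compute_type) so callers can record what actually loaded."""
    try:
        return load_model(model_name, device, compute_type), device, compute_type
    except Exception:
        if device == "cpu":
            raise  # already on CPU: nothing left to fall back to
        return load_model(model_name, "cpu", "int8"), "cpu", "int8"
```

Returning the effective device matters: the caller stores it in `_current_device`, so later cache checks compare against what was actually loaded, not what was requested.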
@@ -96,17 +113,22 @@ class TranscribeService:
        compute_type: str = "int8",
        language: str | None = None,
        on_segment: Callable[[SegmentResult, int], None] | None = None,
+       chunk_label: str | None = None,
    ) -> TranscriptionResult:
        """Transcribe an audio file with word-level timestamps.

        Sends progress messages via IPC during processing.
+       If chunk_label is set (e.g. "chunk 3/12"), messages are prefixed with it.
        """
-       # Stage: loading model
-       write_message(progress_message(request_id, 0, "loading_model", f"Loading {model_name}..."))
+       prefix = f"{chunk_label}: " if chunk_label else ""
+       # Stage: loading model (skip for chunks after the first — model already loaded)
+       if not chunk_label:
+           write_message(progress_message(request_id, 0, "loading_model", f"Loading {model_name}..."))
        model = self._ensure_model(model_name, device, compute_type)

        # Stage: transcribing
-       write_message(progress_message(request_id, 10, "transcribing", "Starting transcription..."))
+       write_message(progress_message(request_id, 10, "transcribing", f"{prefix}Starting transcription..."))
        start_time = time.time()
        segments_iter, info = model.transcribe(
@@ -159,7 +181,7 @@ class TranscribeService:
                    request_id,
                    progress_pct,
                    "transcribing",
-                   f"Transcribing segment {segment_count} ({progress_pct}% of audio)...",
+                   f"{prefix}Transcribing segment {segment_count} ({progress_pct}% of audio)...",
                )
            )
@@ -254,6 +276,7 @@ class TranscribeService:
            chunk_result = self.transcribe(
                request_id, tmp.name, model_name, device,
                compute_type, language, on_segment=chunk_on_segment,
+               chunk_label=f"Chunk {chunk_idx + 1}/{num_chunks}",
            )
            # Offset timestamps and merge
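The "# Offset timestamps and merge" step amounts to shifting each chunk's segments by that chunk's start offset and concatenating. The dict shape below is illustrative only; the real code uses a SegmentResult type:

```python
def merge_chunks(chunks: list[tuple[float, list[dict]]]) -> list[dict]:
    """Each entry is (chunk_start_sec, segments); shift timestamps and concatenate."""
    merged = []
    for offset, segments in chunks:
        for seg in segments:
            merged.append({**seg,
                           "start": seg["start"] + offset,
                           "end": seg["end"] + offset})
    return merged
```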

src-tauri/Cargo.lock (generated)

@@ -59,6 +59,15 @@ version = "1.0.102"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7f202df86484c868dbad7eaa557ef785d5c66295e41b460ef922eca0723b842c"

+[[package]]
+name = "arbitrary"
+version = "1.4.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "c3d036a3c4ab069c7b410a2ce876bd74808d2d0888a82667669f8e783a898bf1"
+dependencies = [
+ "derive_arbitrary",
+]
+
[[package]]
name = "async-broadcast"
version = "0.7.2"
@@ -655,6 +664,17 @@ dependencies = [
 "serde_core",
]

+[[package]]
+name = "derive_arbitrary"
+version = "1.4.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "1e567bd82dcff979e4b03460c307b3cdc9e96fde3d73bed1496d2bc75d9dd62a"
+dependencies = [
+ "proc-macro2",
+ "quote",
+ "syn 2.0.117",
+]
+
[[package]]
name = "derive_more"
version = "0.99.20"
@@ -4362,7 +4382,7 @@ checksum = "0b928f33d975fc6ad9f86c8f283853ad26bdd5b10b7f1542aa2fa15e2289105a"
[[package]]
name = "voice-to-notes"
-version = "0.1.0"
+version = "0.2.2"
dependencies = [
 "chrono",
 "rusqlite",
@@ -4374,6 +4394,7 @@ dependencies = [
 "tauri-plugin-opener",
 "thiserror 1.0.69",
 "uuid",
+ "zip",
]

[[package]]
@@ -5412,12 +5433,41 @@ dependencies = [
 "syn 2.0.117",
]

+[[package]]
+name = "zip"
+version = "2.4.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "fabe6324e908f85a1c52063ce7aa26b68dcb7eb6dbc83a2d148403c9bc3eba50"
+dependencies = [
+ "arbitrary",
+ "crc32fast",
+ "crossbeam-utils",
+ "displaydoc",
+ "flate2",
+ "indexmap 2.13.0",
+ "memchr",
+ "thiserror 2.0.18",
+ "zopfli",
+]
+
[[package]]
name = "zmij"
version = "1.0.21"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b8848ee67ecc8aedbaf3e4122217aff892639231befc6a1b58d29fff4c2cabaa"

+[[package]]
+name = "zopfli"
+version = "0.8.3"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "f05cd8797d63865425ff89b5c4a48804f35ba0ce8d125800027ad6017d2b5249"
+dependencies = [
+ "bumpalo",
+ "crc32fast",
+ "log",
+ "simd-adler32",
+]
+
[[package]]
name = "zvariant"
version = "5.10.0"


@@ -1,6 +1,6 @@
[package]
name = "voice-to-notes"
-version = "0.2.5"
+version = "0.2.46"
description = "Voice to Notes — desktop transcription with speaker identification"
authors = ["Voice to Notes Contributors"]
license = "MIT"
@@ -14,7 +14,7 @@ crate-type = ["staticlib", "cdylib", "rlib"]
tauri-build = { version = "2", features = [] }

[dependencies]
-tauri = { version = "2", features = ["protocol-asset"] }
+tauri = { version = "2", features = ["protocol-asset", "devtools"] }
tauri-plugin-opener = "2"
serde = { version = "1", features = ["derive"] }
serde_json = "1"
@@ -24,3 +24,6 @@ zip = { version = "2", default-features = false, features = ["deflate"] }
thiserror = "1"
chrono = { version = "0.4", features = ["serde"] }
tauri-plugin-dialog = "2.6.0"
+reqwest = { version = "0.12", features = ["stream", "json"] }
+futures-util = "0.3"
+bytes = "1"

src-tauri/nsis-hooks.nsh (new file)

@@ -0,0 +1,11 @@
; NSIS uninstall hook for Voice to Notes
; Removes the sidecar data directory (extracted sidecar binaries + logs)
; but preserves user data in $PROFILE\.voicetonotes (database, settings, models)
!macro NSIS_HOOK_POSTUNINSTALL
; Remove the Tauri app_local_data_dir which contains:
; - Extracted sidecar directories (voice-to-notes-sidecar/)
; - sidecar.log
; Path: %LOCALAPPDATA%\com.voicetonotes.app
RMDir /r "$LOCALAPPDATA\com.voicetonotes.app"
!macroend


@@ -0,0 +1,152 @@
use std::path::PathBuf;
use std::process::Command;
#[cfg(target_os = "windows")]
use std::os::windows::process::CommandExt;
/// Extract audio from a video file to a WAV file using ffmpeg.
/// Returns the path to the extracted audio file.
#[tauri::command]
pub fn extract_audio(file_path: String, output_path: Option<String>) -> Result<String, String> {
let input = PathBuf::from(&file_path);
if !input.exists() {
return Err(format!("File not found: {}", file_path));
}
// Use provided output path, or fall back to a temp WAV file
let stem = input.file_stem().unwrap_or_default().to_string_lossy();
let output = match output_path {
Some(ref p) => PathBuf::from(p),
None => std::env::temp_dir().join(format!("{stem}_audio.wav")),
};
eprintln!(
"[media] Extracting audio: {} -> {}",
input.display(),
output.display()
);
// Find ffmpeg — check sidecar extract dir first, then system PATH
let ffmpeg = find_ffmpeg().ok_or("ffmpeg not found. Install ffmpeg or ensure it's in PATH.")?;
let mut cmd = Command::new(&ffmpeg);
cmd.args([
"-y", // Overwrite output
"-i",
&file_path,
"-vn", // No video
"-acodec",
"pcm_s16le", // WAV PCM 16-bit
"-ar",
        "22050", // 22.05 kHz sample rate for better playback quality
"-ac",
"1", // Mono
])
.arg(output.to_str().unwrap())
.stdout(std::process::Stdio::null())
.stderr(std::process::Stdio::piped());
// Hide the console window on Windows (CREATE_NO_WINDOW = 0x08000000)
#[cfg(target_os = "windows")]
cmd.creation_flags(0x08000000);
let status = match cmd.status() {
Ok(s) => s,
Err(e) if e.raw_os_error() == Some(13) => {
// Permission denied — fix permissions and retry
eprintln!("[media] Permission denied on ffmpeg, fixing permissions and retrying...");
#[cfg(unix)]
{
use std::os::unix::fs::PermissionsExt;
if let Ok(meta) = std::fs::metadata(&ffmpeg) {
let mut perms = meta.permissions();
perms.set_mode(0o755);
let _ = std::fs::set_permissions(&ffmpeg, perms);
}
// Also fix ffprobe if it exists
let ffprobe = ffmpeg.replace("ffmpeg", "ffprobe");
if let Ok(meta) = std::fs::metadata(&ffprobe) {
let mut perms = meta.permissions();
perms.set_mode(0o755);
let _ = std::fs::set_permissions(&ffprobe, perms);
}
}
Command::new(&ffmpeg)
.args(["-y", "-i", &file_path, "-vn", "-acodec", "pcm_s16le", "-ar", "22050", "-ac", "1"])
.arg(output.to_str().unwrap())
.stdout(std::process::Stdio::null())
.stderr(std::process::Stdio::piped())
.status()
.map_err(|e| format!("Failed to run ffmpeg after chmod: {e}"))?
}
Err(e) => return Err(format!("Failed to run ffmpeg: {e}")),
};
if !status.success() {
return Err(format!("ffmpeg exited with status {status}"));
}
if !output.exists() {
return Err("ffmpeg completed but output file not found".to_string());
}
eprintln!("[media] Audio extracted successfully");
Ok(output.to_string_lossy().to_string())
}
#[tauri::command]
pub fn check_file_exists(path: String) -> bool {
std::path::Path::new(&path).exists()
}
#[tauri::command]
pub fn copy_file(src: String, dst: String) -> Result<(), String> {
std::fs::copy(&src, &dst).map_err(|e| format!("Failed to copy file: {e}"))?;
Ok(())
}
#[tauri::command]
pub fn create_dir(path: String) -> Result<(), String> {
std::fs::create_dir_all(&path).map_err(|e| format!("Failed to create directory: {e}"))?;
Ok(())
}
/// Find ffmpeg binary — check sidecar directory first, then system PATH.
fn find_ffmpeg() -> Option<String> {
// Check sidecar extract dir (ffmpeg is bundled with the sidecar)
if let Some(data_dir) = crate::sidecar::DATA_DIR.get() {
// Read sidecar version to find the right directory
let version_file = data_dir.join("sidecar-version.txt");
if let Ok(version) = std::fs::read_to_string(&version_file) {
let version = version.trim();
let sidecar_dir = data_dir.join(format!("sidecar-{version}"));
let ffmpeg_name = if cfg!(target_os = "windows") {
"ffmpeg.exe"
} else {
"ffmpeg"
};
let ffmpeg_path = sidecar_dir.join(ffmpeg_name);
if ffmpeg_path.exists() {
return Some(ffmpeg_path.to_string_lossy().to_string());
}
}
}
// Fall back to system PATH
let ffmpeg_name = if cfg!(target_os = "windows") {
"ffmpeg.exe"
} else {
"ffmpeg"
};
if Command::new(ffmpeg_name)
.arg("-version")
.stdout(std::process::Stdio::null())
.stderr(std::process::Stdio::null())
.status()
.is_ok()
{
return Some(ffmpeg_name.to_string());
}
None
}
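The lookup order in `find_ffmpeg` (versioned sidecar directory first, then system PATH) translates directly to Python; the file layout mirrors the Rust code, with `shutil.which` standing in for the `-version` probe:

```python
from __future__ import annotations

import shutil
from pathlib import Path

def find_ffmpeg(data_dir: Path) -> str | None:
    """Prefer the ffmpeg bundled in sidecar-{version}/, else fall back to PATH."""
    version_file = data_dir / "sidecar-version.txt"
    if version_file.exists():
        version = version_file.read_text().strip()
        bundled = data_dir / f"sidecar-{version}" / "ffmpeg"
        if bundled.exists():
            return str(bundled)
    return shutil.which("ffmpeg")  # None when ffmpeg is not installed
```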


@@ -1,6 +1,8 @@
pub mod ai; pub mod ai;
pub mod export; pub mod export;
pub mod media;
pub mod project; pub mod project;
pub mod settings; pub mod settings;
pub mod sidecar;
pub mod system; pub mod system;
pub mod transcribe; pub mod transcribe;


@@ -12,7 +12,12 @@ use crate::state::AppState;
pub struct ProjectFile {
    pub version: u32,
    pub name: String,
-   pub audio_file: String,
+   #[serde(default)]
+   pub audio_file: Option<String>,
+   #[serde(default)]
+   pub source_file: Option<String>,
+   #[serde(default)]
+   pub audio_wav: Option<String>,
    pub created_at: String,
    pub segments: Vec<ProjectFileSegment>,
    pub speakers: Vec<ProjectFileSpeaker>,
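The `#[serde(default)]` + `Option<String>` change keeps older project files, which only had `audio_file`, loadable after the source_file/audio_wav split. The same tolerance in Python is one `dict.get` per optional field; the field names follow the struct above, and the returned dict shape is illustrative:

```python
import json

def load_project(text: str) -> dict:
    """Parse a project file, tolerating missing optional media fields."""
    raw = json.loads(text)
    return {
        "version": raw["version"],
        "name": raw["name"],
        # Optional since the media-field split; old files lack some of these.
        "audio_file": raw.get("audio_file"),
        "source_file": raw.get("source_file"),
        "audio_wav": raw.get("audio_wav"),
    }
```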


@@ -32,3 +32,16 @@ pub fn save_settings(settings: Value) -> Result<(), String> {
    fs::write(&path, json).map_err(|e| format!("Cannot write settings: {e}"))?;
    Ok(())
}
+
+/// Toggle devtools on the main window.
+#[tauri::command]
+pub fn toggle_devtools(app: tauri::AppHandle, open: bool) {
+    use tauri::Manager;
+    if let Some(window) = app.get_webview_window("main") {
+        if open {
+            window.open_devtools();
+        } else {
+            window.close_devtools();
+        }
+    }
+}


@@ -0,0 +1,258 @@
use futures_util::StreamExt;
use serde::Serialize;
use std::io::Write;
use tauri::{AppHandle, Emitter};
use crate::sidecar::{SidecarManager, DATA_DIR};
const REPO_API: &str = "https://repo.anhonesthost.net/api/v1/repos/MacroPad/voice-to-notes";
#[derive(Serialize, Clone)]
struct DownloadProgress {
downloaded: u64,
total: u64,
percent: u8,
}
#[derive(Serialize)]
pub struct UpdateInfo {
pub current_version: String,
pub latest_version: String,
}
/// Read the locally installed sidecar version from `sidecar-version.txt`.
/// Returns `None` if the file doesn't exist or can't be read.
fn read_local_sidecar_version() -> Option<String> {
let data_dir = DATA_DIR.get()?;
let version_file = data_dir.join("sidecar-version.txt");
std::fs::read_to_string(version_file)
.ok()
.map(|v| v.trim().to_string())
.filter(|v| !v.is_empty())
}
/// Write the sidecar version to `sidecar-version.txt` after a successful download.
fn write_local_sidecar_version(version: &str) -> Result<(), String> {
let data_dir = DATA_DIR.get().ok_or("App data directory not initialized")?;
let version_file = data_dir.join("sidecar-version.txt");
std::fs::write(&version_file, version)
.map_err(|e| format!("Failed to write sidecar version file: {}", e))
}
/// Fetch releases from the Gitea API and find the latest sidecar release
/// (one whose tag_name starts with "sidecar-v").
async fn fetch_latest_sidecar_release(
client: &reqwest::Client,
) -> Result<serde_json::Value, String> {
let releases_url = format!("{}/releases?limit=20", REPO_API);
let resp = client
.get(&releases_url)
.header("Accept", "application/json")
.send()
.await
.map_err(|e| format!("Failed to fetch releases: {}", e))?;
if !resp.status().is_success() {
return Err(format!("Failed to fetch releases: HTTP {}", resp.status()));
}
let releases = resp
.json::<Vec<serde_json::Value>>()
.await
.map_err(|e| format!("Failed to parse releases JSON: {}", e))?;
releases
.into_iter()
.find(|r| {
r["tag_name"]
.as_str()
.map_or(false, |t| t.starts_with("sidecar-v"))
})
.ok_or_else(|| "No sidecar release found".to_string())
}
/// Extract the version string from a sidecar tag name (e.g. "sidecar-v1.0.1" -> "1.0.1").
fn version_from_sidecar_tag(tag: &str) -> &str {
tag.strip_prefix("sidecar-v").unwrap_or(tag)
}
/// Check if the sidecar binary exists for the currently installed version.
#[tauri::command]
pub fn check_sidecar() -> bool {
let data_dir = match DATA_DIR.get() {
Some(d) => d,
None => return false,
};
let version = match read_local_sidecar_version() {
Some(v) => v,
None => return false,
};
let binary_name = if cfg!(target_os = "windows") {
"voice-to-notes-sidecar.exe"
} else {
"voice-to-notes-sidecar"
};
let extract_dir = data_dir.join(format!("sidecar-{}", version));
extract_dir.join(binary_name).exists()
}
/// Determine the current platform name for asset downloads.
fn platform_os() -> &'static str {
if cfg!(target_os = "windows") {
"windows"
} else if cfg!(target_os = "macos") {
"macos"
} else {
"linux"
}
}
/// Determine the current architecture name for asset downloads.
fn platform_arch() -> &'static str {
if cfg!(target_arch = "aarch64") {
"aarch64"
} else {
"x86_64"
}
}
/// Download the sidecar binary for the given variant (cpu or cuda).
#[tauri::command]
pub async fn download_sidecar(app: AppHandle, variant: String) -> Result<(), String> {
let data_dir = DATA_DIR.get().ok_or("App data directory not initialized")?;
let os = platform_os();
let arch = platform_arch();
let asset_name = format!("sidecar-{}-{}-{}.zip", os, arch, variant);
// Fetch the latest sidecar release from Gitea API
let client = reqwest::Client::new();
let sidecar_release = fetch_latest_sidecar_release(&client).await?;
let tag = sidecar_release["tag_name"]
.as_str()
.ok_or("No tag_name in sidecar release")?;
let sidecar_version = version_from_sidecar_tag(tag).to_string();
// Find the matching asset
let assets = sidecar_release["assets"]
.as_array()
.ok_or("No assets found in sidecar release")?;
let download_url = assets
.iter()
.find(|a| a["name"].as_str() == Some(&asset_name))
.and_then(|a| a["browser_download_url"].as_str())
.ok_or_else(|| {
format!(
"Asset '{}' not found in sidecar release {}",
asset_name, tag
)
})?
.to_string();
// Stream download with progress events
let response: reqwest::Response = client
.get(&download_url)
.send()
.await
.map_err(|e| format!("Failed to start download: {}", e))?;
if !response.status().is_success() {
return Err(format!("Download failed: HTTP {}", response.status()));
}
let total: u64 = response.content_length().unwrap_or(0);
let mut downloaded: u64 = 0;
let mut stream = response.bytes_stream();
let zip_path = data_dir.join("sidecar.zip");
let mut file = std::fs::File::create(&zip_path)
.map_err(|e| format!("Failed to create zip file: {}", e))?;
while let Some(chunk) = stream.next().await {
let chunk: bytes::Bytes = chunk.map_err(|e| format!("Download stream error: {}", e))?;
file.write_all(&chunk)
.map_err(|e| format!("Failed to write chunk: {}", e))?;
downloaded += chunk.len() as u64;
let percent = if total > 0 {
(downloaded * 100 / total) as u8
} else {
0
};
let _ = app.emit(
"sidecar-download-progress",
DownloadProgress {
downloaded,
total,
percent,
},
);
}
// Extract the downloaded zip
let extract_dir = data_dir.join(format!("sidecar-{}", sidecar_version));
SidecarManager::extract_zip(&zip_path, &extract_dir)?;
// Make all binaries executable on Unix (sidecar, ffmpeg, ffprobe, etc.)
#[cfg(unix)]
{
use std::os::unix::fs::PermissionsExt;
if let Ok(entries) = std::fs::read_dir(&extract_dir) {
for entry in entries.flatten() {
let path = entry.path();
if path.is_file() {
if let Ok(meta) = std::fs::metadata(&path) {
let mut perms = meta.permissions();
perms.set_mode(0o755);
let _ = std::fs::set_permissions(&path, perms);
}
}
}
}
}
// Write the sidecar version file
write_local_sidecar_version(&sidecar_version)?;
// Clean up the zip file and old sidecar versions
let _ = std::fs::remove_file(&zip_path);
SidecarManager::cleanup_old_sidecars(data_dir, &sidecar_version);
Ok(())
}
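The progress math inside the download loop is the standard streamed-download pattern: accumulate bytes per chunk and derive a percentage, guarding against an unknown `Content-Length`. A Python sketch, with a plain iterable of byte chunks standing in for `bytes_stream()` and yielded tuples standing in for the emitted events:

```python
def download_progress(chunks, total: int):
    """Yield (downloaded, percent) after each chunk; percent stays 0 when total is unknown."""
    downloaded = 0
    for chunk in chunks:
        downloaded += len(chunk)
        percent = downloaded * 100 // total if total > 0 else 0
        yield downloaded, percent
```

The `total > 0` guard mirrors `content_length().unwrap_or(0)`: servers that stream without a Content-Length header would otherwise cause a division by zero.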
/// Check if a sidecar update is available.
#[tauri::command]
pub async fn check_sidecar_update() -> Result<Option<UpdateInfo>, String> {
// If sidecar doesn't exist yet, return None (first launch handled separately)
if !check_sidecar() {
return Ok(None);
}
let current_version = match read_local_sidecar_version() {
Some(v) => v,
None => return Ok(None),
};
// Fetch latest sidecar release from Gitea API
let client = reqwest::Client::new();
let sidecar_release = fetch_latest_sidecar_release(&client).await?;
let latest_tag = sidecar_release["tag_name"]
.as_str()
.ok_or("No tag_name in sidecar release")?;
let latest_version = version_from_sidecar_tag(latest_tag);
if latest_version != current_version {
Ok(Some(UpdateInfo {
current_version,
latest_version: latest_version.to_string(),
}))
} else {
Ok(None)
}
}
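The update check boils down to stripping the `sidecar-v` prefix from the release tag and comparing against the locally recorded version. Note it uses plain string inequality, not semver ordering, so any difference counts as an update. A sketch of both helpers:

```python
from __future__ import annotations

def version_from_sidecar_tag(tag: str) -> str:
    """Map "sidecar-v1.0.1" to "1.0.1"; non-matching tags pass through unchanged."""
    return tag.removeprefix("sidecar-v")

def update_available(current: str | None, latest_tag: str) -> bool:
    """Mirror check_sidecar_update: no local install means no update prompt."""
    if current is None:
        return False
    return version_from_sidecar_tag(latest_tag) != current
```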


@@ -60,3 +60,18 @@ pub fn llama_list_models() -> Value {
pub fn get_data_dir() -> String {
    LlamaManager::data_dir().to_string_lossy().to_string()
}
+
+/// Log a message from the frontend to a file for debugging.
+#[tauri::command]
+pub fn log_frontend(level: String, message: String) {
+    use std::io::Write;
+    let log_path = LlamaManager::data_dir().join("frontend.log");
+    if let Ok(mut file) = std::fs::OpenOptions::new()
+        .create(true)
+        .append(true)
+        .open(&log_path)
+    {
+        let timestamp = chrono::Local::now().format("%Y-%m-%d %H:%M:%S");
+        let _ = writeln!(file, "[{timestamp}] [{level}] {message}");
+    }
+}
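`log_frontend` is a plain append-only, timestamped log: open with create+append so the first write creates the file and later writes never truncate. The Python equivalent (the path argument here is illustrative):

```python
import datetime
from pathlib import Path

def log_frontend(log_path: Path, level: str, message: str) -> None:
    """Append one timestamped line; opening in "a" mode creates the file on first write."""
    timestamp = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(f"[{timestamp}] [{level}] {message}\n")
```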


@@ -9,12 +9,16 @@ use tauri::Manager;
use commands::ai::{ai_chat, ai_configure, ai_list_providers};
use commands::export::export_transcript;
+use commands::media::{check_file_exists, copy_file, create_dir, extract_audio};
use commands::project::{
    create_project, delete_project, get_project, list_projects, load_project_file,
    load_project_transcript, save_project_file, save_project_transcript, update_segment,
};
-use commands::settings::{load_settings, save_settings};
-use commands::system::{get_data_dir, llama_list_models, llama_start, llama_status, llama_stop};
+use commands::settings::{load_settings, save_settings, toggle_devtools};
+use commands::sidecar::{check_sidecar, check_sidecar_update, download_sidecar};
+use commands::system::{
+    get_data_dir, llama_list_models, llama_start, llama_status, llama_stop, log_frontend,
+};
use commands::transcribe::{download_diarize_model, run_pipeline, transcribe_file};
use state::AppState;
@@ -65,6 +69,15 @@ pub fn run() {
            get_data_dir,
            load_settings,
            save_settings,
+           check_sidecar,
+           download_sidecar,
+           check_sidecar_update,
+           log_frontend,
+           toggle_devtools,
+           extract_audio,
+           check_file_exists,
+           copy_file,
+           create_dir,
        ])
        .run(tauri::generate_context!())
        .expect("error while running tauri application");


@@ -14,7 +14,7 @@ use crate::sidecar::messages::IPCMessage;
/// Resource directory set by the Tauri app during setup.
static RESOURCE_DIR: OnceLock<PathBuf> = OnceLock::new();

/// App data directory for extracting the sidecar archive.
-static DATA_DIR: OnceLock<PathBuf> = OnceLock::new();
+pub(crate) static DATA_DIR: OnceLock<PathBuf> = OnceLock::new();

/// Initialize directories for sidecar resolution.
/// Must be called from the Tauri setup before any sidecar operations.
@@ -56,10 +56,33 @@ impl SidecarManager {
        cfg!(debug_assertions) || std::env::var("VOICE_TO_NOTES_DEV").is_ok()
    }

+    /// Read the locally installed sidecar version from `sidecar-version.txt`.
+    fn read_sidecar_version() -> Result<String, String> {
+        let data_dir = DATA_DIR.get().ok_or("App data directory not initialized")?;
+        let version_file = data_dir.join("sidecar-version.txt");
+        std::fs::read_to_string(&version_file)
+            .map_err(|_| {
+                "Sidecar not installed: sidecar-version.txt not found. Please download the sidecar."
+                    .to_string()
+            })
+            .map(|v| v.trim().to_string())
+            .and_then(|v| {
+                if v.is_empty() {
+                    Err("Sidecar version file is empty. Please re-download the sidecar."
+                        .to_string())
+                } else {
+                    Ok(v)
+                }
+            })
+    }
    /// Resolve the frozen sidecar binary path (production mode).
    ///
-    /// First checks if the sidecar is already extracted to the app data directory.
-    /// If not, looks for `sidecar.zip` in the Tauri resource directory and extracts it.
+    /// Reads the installed sidecar version from `sidecar-version.txt` and
+    /// looks for the binary in the corresponding `sidecar-{version}` directory.
+    /// If the version file doesn't exist, the sidecar hasn't been downloaded yet.
    fn resolve_sidecar_path() -> Result<PathBuf, String> {
        let binary_name = if cfg!(target_os = "windows") {
            "voice-to-notes-sidecar.exe"
@@ -67,16 +90,15 @@ impl SidecarManager {
        } else {
            "voice-to-notes-sidecar"
        };

-        // Versioned extraction directory prevents stale sidecar after app updates
-        let extract_dir = DATA_DIR
-            .get()
-            .ok_or("App data directory not initialized")?
-            .join(format!("sidecar-{}", env!("CARGO_PKG_VERSION")));
+        let data_dir = DATA_DIR.get().ok_or("App data directory not initialized")?;
+        let current_version = Self::read_sidecar_version()?;
+        let extract_dir = data_dir.join(format!("sidecar-{}", current_version));
        let binary_path = extract_dir.join(binary_name);

        // Already extracted — use it directly
        if binary_path.exists() {
+            Self::cleanup_old_sidecars(data_dir, &current_version);
            return Ok(binary_path);
        }
@@ -91,17 +113,10 @@ impl SidecarManager {
        ));
    }

-        // Make executable on Unix
        #[cfg(unix)]
-        {
-            use std::os::unix::fs::PermissionsExt;
-            if let Ok(meta) = std::fs::metadata(&binary_path) {
-                let mut perms = meta.permissions();
-                perms.set_mode(0o755);
-                let _ = std::fs::set_permissions(&binary_path, perms);
-            }
-        }
+        Self::set_executable_permissions(&extract_dir);
+
+        Self::cleanup_old_sidecars(data_dir, &current_version);

        Ok(binary_path)
    }
@@ -135,7 +150,7 @@ impl SidecarManager {
    }

    /// Extract a zip archive to the given directory.
-    fn extract_zip(zip_path: &Path, dest: &Path) -> Result<(), String> {
+    pub(crate) fn extract_zip(zip_path: &Path, dest: &Path) -> Result<(), String> {
        eprintln!(
            "[sidecar-rs] Extracting sidecar from {} to {}",
            zip_path.display(),
@@ -182,6 +197,62 @@ impl SidecarManager {
        Ok(())
    }

+    /// Set execute permissions on all files in a directory (Unix only).
+    #[cfg(unix)]
+    fn set_executable_permissions(dir: &Path) {
+        use std::os::unix::fs::PermissionsExt;
+        if let Ok(entries) = std::fs::read_dir(dir) {
+            for entry in entries.flatten() {
+                let path = entry.path();
+                if path.is_file() {
+                    if let Ok(meta) = std::fs::metadata(&path) {
+                        let mut perms = meta.permissions();
+                        perms.set_mode(0o755);
+                        let _ = std::fs::set_permissions(&path, perms);
+                    }
+                }
+            }
+        }
+    }
+
+    /// Remove old sidecar-* directories that don't match the current version.
+    /// Called after the current version's sidecar is confirmed ready.
+    pub(crate) fn cleanup_old_sidecars(data_dir: &Path, current_version: &str) {
+        let current_dir_name = format!("sidecar-{}", current_version);
+        let entries = match std::fs::read_dir(data_dir) {
+            Ok(entries) => entries,
+            Err(e) => {
+                eprintln!("[sidecar-rs] Cannot read data dir for cleanup: {e}");
+                return;
+            }
+        };
+        for entry in entries.flatten() {
+            let name = entry.file_name();
+            let name_str = name.to_string_lossy();
+            if !name_str.starts_with("sidecar-") {
+                continue;
+            }
+            if *name_str == current_dir_name {
+                continue;
+            }
+            if entry.path().is_dir() {
+                eprintln!(
+                    "[sidecar-rs] Removing old sidecar: {}",
+                    entry.path().display()
+                );
+                if let Err(e) = std::fs::remove_dir_all(entry.path()) {
+                    eprintln!(
+                        "[sidecar-rs] Failed to remove {}: {e}",
+                        entry.path().display()
+                    );
+                }
+            }
+        }
+    }
+
    /// Find a working Python command for the current platform.
    fn find_python_command() -> &'static str {
        if cfg!(target_os = "windows") {
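`cleanup_old_sidecars` is a directory sweep that keeps exactly one `sidecar-{version}` directory. The same sweep in Python, returning what was removed so it can be checked:

```python
import shutil
from pathlib import Path

def cleanup_old_sidecars(data_dir: Path, current_version: str) -> list[str]:
    """Remove sidecar-* directories other than the current one; return removed names."""
    keep = f"sidecar-{current_version}"
    removed = []
    for entry in data_dir.iterdir():
        if entry.is_dir() and entry.name.startswith("sidecar-") and entry.name != keep:
            shutil.rmtree(entry, ignore_errors=True)
            removed.append(entry.name)
    return removed
```

Prefix-matching on the directory name (rather than tracking versions elsewhere) is what makes the sweep safe to call repeatedly: unrelated files and the current version are always skipped.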
@@ -260,12 +331,40 @@ impl SidecarManager {
        #[cfg(target_os = "windows")]
        cmd.creation_flags(0x08000000);

-        let child = cmd
-            .spawn()
-            .map_err(|e| format!("Failed to start sidecar binary: {e}"))?;
-
-        self.attach(child)?;
-        self.wait_for_ready()
+        match cmd.spawn() {
+            Ok(child) => {
+                self.attach(child)?;
+                self.wait_for_ready()
+            }
+            Err(e) if e.raw_os_error() == Some(13) => {
+                // Permission denied — fix permissions and retry once
+                eprintln!("[sidecar-rs] Permission denied, fixing permissions and retrying...");
+                #[cfg(unix)]
+                if let Some(dir) = path.parent() {
+                    Self::set_executable_permissions(dir);
+                }
+                let mut retry_cmd = Command::new(path);
+                retry_cmd
+                    .stdin(Stdio::piped())
+                    .stdout(Stdio::piped())
+                    .stderr(if let Some(data_dir) = DATA_DIR.get() {
+                        let log_path = data_dir.join("sidecar.log");
+                        std::fs::File::create(&log_path)
+                            .map(Stdio::from)
+                            .unwrap_or_else(|_| Stdio::inherit())
+                    } else {
+                        Stdio::inherit()
+                    });
+                #[cfg(target_os = "windows")]
+                retry_cmd.creation_flags(0x08000000);
+                let child = retry_cmd
+                    .spawn()
+                    .map_err(|e| format!("Failed to start sidecar binary after chmod: {e}"))?;
+                self.attach(child)?;
+                self.wait_for_ready()
+            }
+            Err(e) => Err(format!("Failed to start sidecar binary: {e}")),
+        }
    }

    /// Spawn the Python sidecar in dev mode (system Python).
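The spawn-retry logic (catch EACCES, mark the whole directory executable, retry exactly once) maps onto Python's `PermissionError`, which is the errno-13 subclass of `OSError`. A minimal sketch, assuming a Unix host:

```python
import os
import subprocess
from pathlib import Path

def spawn_with_chmod_retry(binary: Path) -> subprocess.CompletedProcess:
    """Run a binary; on EACCES, chmod every file in its directory and retry once."""
    try:
        return subprocess.run([str(binary)], capture_output=True, text=True)
    except PermissionError:
        # Mirror set_executable_permissions: fix the whole sidecar dir, not one file.
        for f in binary.parent.iterdir():
            if f.is_file():
                os.chmod(f, 0o755)
        return subprocess.run([str(binary)], capture_output=True, text=True)
```

Retrying only once keeps the failure mode clean: if the second spawn also fails, the error propagates instead of looping, matching the "zero overhead on normal launches" goal from the commit message.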


@@ -1,7 +1,7 @@
{
  "$schema": "https://schema.tauri.app/config/2",
  "productName": "Voice to Notes",
-  "version": "0.2.5",
+  "version": "0.2.46",
  "identifier": "com.voicetonotes.app",
  "build": {
    "beforeDevCommand": "npm run dev",
@@ -22,7 +22,7 @@
    ],
    "security": {
-      "csp": "default-src 'self'; img-src 'self' asset: https://asset.localhost; media-src 'self' asset: https://asset.localhost; style-src 'self' 'unsafe-inline'",
+      "csp": "default-src 'self' http://tauri.localhost; connect-src ipc: http://ipc.localhost http://asset.localhost https://asset.localhost blob:; img-src 'self' asset: http://asset.localhost https://asset.localhost blob:; media-src 'self' asset: http://asset.localhost https://asset.localhost blob:; style-src 'self' 'unsafe-inline'",
      "assetProtocol": {
        "enable": true,
        "scope": ["**"]
@@ -31,7 +31,7 @@
  },
  "bundle": {
    "active": true,
-    "targets": ["deb", "nsis", "msi", "dmg"],
+    "targets": ["deb", "rpm", "nsis", "msi", "dmg"],
    "icon": [
      "icons/32x32.png",
      "icons/128x128.png",
@@ -42,7 +42,7 @@
    "category": "Utility",
    "shortDescription": "Transcribe audio/video with speaker identification",
    "longDescription": "Voice to Notes is a desktop application that transcribes audio and video recordings with speaker identification, synchronized playback, and AI-powered analysis. Export to SRT, WebVTT, ASS captions, or plain text.",
-    "resources": ["sidecar.zip"],
+    "resources": [],
    "copyright": "Voice to Notes Contributors",
    "license": "MIT",
    "linux": {
@@ -51,6 +51,9 @@
    },
    "windows": {
+      "nsis": {
+        "installerHooks": "nsis-hooks.nsh"
+      },
      "wix": {
        "language": "en-US"
      }


@@ -1,7 +1,7 @@
 <script lang="ts">
   import { invoke } from '@tauri-apps/api/core';
   import { segments, speakers } from '$lib/stores/transcript';
-  import { settings } from '$lib/stores/settings';
+  import { settings, configureAIProvider } from '$lib/stores/settings';

   interface ChatMessage {
     role: 'user' | 'assistant';
@@ -45,22 +45,12 @@
     }));

     // Ensure the provider is configured with current credentials before chatting
-    const s = $settings;
-    const configMap: Record<string, Record<string, string>> = {
-      openai: { api_key: s.openai_api_key, model: s.openai_model },
-      anthropic: { api_key: s.anthropic_api_key, model: s.anthropic_model },
-      litellm: { api_key: s.litellm_api_key, api_base: s.litellm_api_base, model: s.litellm_model },
-      local: { model: s.local_model_path, base_url: 'http://localhost:8080' },
-    };
-    const config = configMap[s.ai_provider];
-    if (config) {
-      await invoke('ai_configure', { provider: s.ai_provider, config });
-    }
+    await configureAIProvider($settings);

     const result = await invoke<{ response: string }>('ai_chat', {
       messages: chatMessages,
       transcriptContext: getTranscriptContext(),
-      provider: s.ai_provider,
+      provider: $settings.ai_provider,
     });

     messages = [...messages, { role: 'assistant', content: result.response }];


@@ -4,9 +4,25 @@
     percent?: number;
     stage?: string;
     message?: string;
+    onCancel?: () => void;
   }

-  let { visible = false, percent = 0, stage = '', message = '' }: Props = $props();
+  let { visible = false, percent = 0, stage = '', message = '', onCancel }: Props = $props();
+
+  let showConfirm = $state(false);
+
+  function handleCancelClick() {
+    showConfirm = true;
+  }
+
+  function confirmCancel() {
+    showConfirm = false;
+    onCancel?.();
+  }
+
+  function dismissCancel() {
+    showConfirm = false;
+  }

   // Pipeline steps in order
   const pipelineSteps = [
@@ -89,6 +105,20 @@
         <p class="status-text">{message || 'Please wait...'}</p>
         <p class="hint-text">This may take several minutes for large files</p>
+        {#if onCancel && !showConfirm}
+          <button class="cancel-btn" onclick={handleCancelClick}>Cancel</button>
+        {/if}
+        {#if showConfirm}
+          <div class="confirm-box">
+            <p class="confirm-text">Processing is incomplete. If you cancel now, the transcription will need to be started over.</p>
+            <div class="confirm-actions">
+              <button class="confirm-keep" onclick={dismissCancel}>Continue Processing</button>
+              <button class="confirm-cancel" onclick={confirmCancel}>Cancel Processing</button>
+            </div>
+          </div>
+        {/if}
       </div>
     </div>
   {/if}
@@ -174,4 +204,62 @@
     font-size: 0.75rem;
     color: #555;
   }
.cancel-btn {
margin-top: 1.25rem;
width: 100%;
padding: 0.5rem;
background: none;
border: 1px solid #4a5568;
color: #999;
border-radius: 6px;
cursor: pointer;
font-size: 0.85rem;
}
.cancel-btn:hover {
color: #e0e0e0;
border-color: #e94560;
}
.confirm-box {
margin-top: 1.25rem;
padding: 0.75rem;
background: rgba(233, 69, 96, 0.08);
border: 1px solid #e94560;
border-radius: 6px;
}
.confirm-text {
margin: 0 0 0.75rem;
font-size: 0.8rem;
color: #e0e0e0;
line-height: 1.4;
}
.confirm-actions {
display: flex;
gap: 0.5rem;
}
.confirm-keep {
flex: 1;
padding: 0.4rem;
background: #0f3460;
border: 1px solid #4a5568;
color: #e0e0e0;
border-radius: 4px;
cursor: pointer;
font-size: 0.8rem;
}
.confirm-keep:hover {
background: #1a4a7a;
}
.confirm-cancel {
flex: 1;
padding: 0.4rem;
background: #e94560;
border: none;
color: white;
border-radius: 4px;
cursor: pointer;
font-size: 0.8rem;
}
.confirm-cancel:hover {
background: #d63851;
}
</style>


@@ -11,7 +11,7 @@
   let { visible, onClose }: Props = $props();

   let localSettings = $state<AppSettings>({ ...$settings });
-  let activeTab = $state<'transcription' | 'speakers' | 'ai' | 'local'>('transcription');
+  let activeTab = $state<'transcription' | 'speakers' | 'ai' | 'debug'>('transcription');
   let modelStatus = $state<'idle' | 'downloading' | 'success' | 'error'>('idle');
   let modelError = $state('');
   let revealedFields = $state<Set<string>>(new Set());
@@ -81,8 +81,8 @@
       <button class="tab" class:active={activeTab === 'ai'} onclick={() => activeTab = 'ai'}>
         AI Provider
       </button>
-      <button class="tab" class:active={activeTab === 'local'} onclick={() => activeTab = 'local'}>
-        Local AI
+      <button class="tab" class:active={activeTab === 'debug'} onclick={() => activeTab = 'debug'}>
+        Debug
       </button>
     </div>
@@ -181,14 +181,27 @@
       <div class="field">
         <label for="ai-provider">AI Provider</label>
         <select id="ai-provider" bind:value={localSettings.ai_provider}>
-          <option value="local">Local (llama-server)</option>
+          <option value="local">Ollama</option>
           <option value="openai">OpenAI</option>
           <option value="anthropic">Anthropic</option>
           <option value="litellm">OpenAI Compatible</option>
         </select>
       </div>
-      {#if localSettings.ai_provider === 'openai'}
+      {#if localSettings.ai_provider === 'local'}
+        <div class="field">
+          <label for="ollama-url">Ollama URL</label>
+          <input id="ollama-url" type="text" bind:value={localSettings.ollama_url} placeholder="http://localhost:11434" />
+        </div>
+        <div class="field">
+          <label for="ollama-model">Model</label>
+          <input id="ollama-model" type="text" bind:value={localSettings.ollama_model} placeholder="llama3.2" />
+        </div>
+        <p class="hint">
+          Install Ollama from ollama.com, then pull a model with <code>ollama pull llama3.2</code>.
+          The app connects via Ollama's OpenAI-compatible API.
+        </p>
+      {:else if localSettings.ai_provider === 'openai'}
        <div class="field">
          <label for="openai-key">OpenAI API Key</label>
          <div class="input-reveal">
@@ -229,19 +242,21 @@
        <input id="litellm-model" type="text" bind:value={localSettings.litellm_model} placeholder="provider/model-name" />
      </div>
    {/if}
-    {:else}
-      <div class="field">
-        <label for="llama-binary">llama-server Binary Path</label>
-        <input id="llama-binary" type="text" bind:value={localSettings.local_binary_path} placeholder="llama-server" />
-      </div>
-      <div class="field">
-        <label for="llama-model">GGUF Model Path</label>
-        <input id="llama-model" type="text" bind:value={localSettings.local_model_path} placeholder="~/.voicetonotes/models/model.gguf" />
-      </div>
-      <p class="hint">
-        Place GGUF model files in ~/.voicetonotes/models/ for auto-detection.
-        The local AI server uses the OpenAI-compatible API from llama.cpp.
-      </p>
+    {:else if activeTab === 'debug'}
+      <div class="field checkbox">
+        <label>
+          <input
+            type="checkbox"
+            checked={localSettings.devtools_enabled}
+            onchange={async (e) => {
+              localSettings.devtools_enabled = (e.target as HTMLInputElement).checked;
+              await invoke('toggle_devtools', { open: localSettings.devtools_enabled });
+            }}
+          />
+          Enable Developer Tools
+        </label>
+        <p class="hint">Opens the browser inspector for debugging. Changes take effect immediately.</p>
+      </div>
    {/if}
  </div>


@@ -0,0 +1,320 @@
<script lang="ts">
import { invoke } from '@tauri-apps/api/core';
import { listen } from '@tauri-apps/api/event';
import type { UnlistenFn } from '@tauri-apps/api/event';
import { onMount } from 'svelte';
interface Props {
onComplete: () => void;
}
let { onComplete }: Props = $props();
let variant = $state<'cpu' | 'cuda'>('cpu');
let downloading = $state(false);
let downloadProgress = $state({ downloaded: 0, total: 0, percent: 0 });
let error = $state('');
let success = $state(false);
let unlisten: UnlistenFn | null = null;
onMount(() => {
return () => {
unlisten?.();
};
});
function formatBytes(bytes: number): string {
if (bytes < 1024 * 1024) return `${(bytes / 1024).toFixed(0)} KB`;
if (bytes < 1024 * 1024 * 1024) return `${(bytes / (1024 * 1024)).toFixed(0)} MB`;
return `${(bytes / (1024 * 1024 * 1024)).toFixed(1)} GB`;
}
async function startDownload() {
downloading = true;
error = '';
success = false;
unlisten = await listen<{ downloaded: number; total: number; percent: number }>(
'sidecar-download-progress',
(event) => {
downloadProgress = event.payload;
}
);
try {
await invoke('download_sidecar', { variant });
success = true;
// Brief pause so the user sees "Complete" before the screen goes away
setTimeout(() => {
onComplete();
}, 800);
} catch (err) {
error = String(err);
} finally {
downloading = false;
unlisten?.();
unlisten = null;
}
}
</script>
<div class="setup-overlay">
<div class="setup-card">
<h1 class="app-title">Voice to Notes</h1>
<h2 class="setup-heading">First-Time Setup</h2>
<p class="setup-description">
Voice to Notes needs to download its AI engine to transcribe audio.
</p>
{#if !downloading && !success}
<div class="variant-options">
<label class="variant-option" class:selected={variant === 'cpu'}>
<input type="radio" name="variant" value="cpu" bind:group={variant} />
<div class="variant-info">
<span class="variant-label">Standard (CPU)</span>
<span class="variant-desc">Works on all computers (~500 MB download)</span>
</div>
</label>
<label class="variant-option" class:selected={variant === 'cuda'}>
<input type="radio" name="variant" value="cuda" bind:group={variant} />
<div class="variant-info">
<span class="variant-label">GPU Accelerated (CUDA)</span>
<span class="variant-desc">Faster transcription with NVIDIA GPU (~2 GB download)</span>
</div>
</label>
</div>
{#if error}
<div class="error-box">
<p class="error-text">{error}</p>
<button class="btn-retry" onclick={startDownload}>Retry</button>
</div>
{:else}
<button class="btn-download" onclick={startDownload}>
Download &amp; Install
</button>
{/if}
{:else if downloading}
<div class="progress-section">
<div class="progress-bar-track">
<div class="progress-bar-fill" style="width: {downloadProgress.percent}%"></div>
</div>
<p class="progress-text">
{downloadProgress.percent}% — {formatBytes(downloadProgress.downloaded)} / {formatBytes(downloadProgress.total)}
</p>
<p class="progress-hint">Downloading {variant === 'cuda' ? 'GPU' : 'CPU'} engine...</p>
</div>
{:else if success}
<div class="success-section">
<div class="success-icon">&#10003;</div>
<p class="success-text">Setup complete!</p>
</div>
{/if}
</div>
</div>
<style>
.setup-overlay {
position: fixed;
inset: 0;
background: #0a0a23;
display: flex;
align-items: center;
justify-content: center;
z-index: 10000;
}
.setup-card {
background: #16213e;
border: 1px solid #2a3a5e;
border-radius: 12px;
padding: 2.5rem 3rem;
max-width: 480px;
width: 90vw;
color: #e0e0e0;
box-shadow: 0 8px 32px rgba(0, 0, 0, 0.5);
text-align: center;
}
.app-title {
font-size: 1.8rem;
margin: 0 0 0.25rem;
color: #e94560;
font-weight: 700;
}
.setup-heading {
font-size: 1.1rem;
margin: 0 0 0.75rem;
color: #e0e0e0;
font-weight: 500;
}
.setup-description {
font-size: 0.9rem;
color: #b0b0b0;
margin: 0 0 1.5rem;
line-height: 1.5;
}
.variant-options {
display: flex;
flex-direction: column;
gap: 0.75rem;
margin-bottom: 1.5rem;
text-align: left;
}
.variant-option {
display: flex;
align-items: flex-start;
gap: 0.75rem;
padding: 0.85rem 1rem;
border: 1px solid #2a3a5e;
border-radius: 8px;
cursor: pointer;
transition: border-color 0.15s, background 0.15s;
}
.variant-option:hover {
border-color: #4a5568;
background: rgba(255, 255, 255, 0.02);
}
.variant-option.selected {
border-color: #e94560;
background: rgba(233, 69, 96, 0.08);
}
.variant-option input[type='radio'] {
margin-top: 0.2rem;
accent-color: #e94560;
flex-shrink: 0;
}
.variant-info {
display: flex;
flex-direction: column;
gap: 0.2rem;
}
.variant-label {
font-size: 0.9rem;
font-weight: 500;
color: #e0e0e0;
}
.variant-desc {
font-size: 0.78rem;
color: #888;
}
.btn-download {
background: #e94560;
border: none;
color: white;
padding: 0.7rem 1.5rem;
border-radius: 6px;
cursor: pointer;
font-size: 0.9rem;
font-weight: 500;
width: 100%;
transition: background 0.15s;
}
.btn-download:hover {
background: #d63851;
}
.progress-section {
margin-top: 0.5rem;
}
.progress-bar-track {
width: 100%;
height: 8px;
background: #1a1a2e;
border-radius: 4px;
overflow: hidden;
border: 1px solid #2a3a5e;
}
.progress-bar-fill {
height: 100%;
background: #e94560;
border-radius: 4px;
transition: width 0.3s ease;
}
.progress-text {
margin: 0.75rem 0 0;
font-size: 0.85rem;
color: #e0e0e0;
font-variant-numeric: tabular-nums;
}
.progress-hint {
margin: 0.35rem 0 0;
font-size: 0.78rem;
color: #888;
}
.error-box {
background: rgba(233, 69, 96, 0.1);
border: 1px solid rgba(233, 69, 96, 0.3);
border-radius: 8px;
padding: 1rem;
}
.error-text {
color: #e94560;
font-size: 0.85rem;
margin: 0 0 0.75rem;
word-break: break-word;
line-height: 1.4;
}
.btn-retry {
background: #e94560;
border: none;
color: white;
padding: 0.5rem 1.25rem;
border-radius: 6px;
cursor: pointer;
font-size: 0.85rem;
font-weight: 500;
}
.btn-retry:hover {
background: #d63851;
}
.success-section {
display: flex;
flex-direction: column;
align-items: center;
gap: 0.5rem;
padding: 1rem 0;
}
.success-icon {
width: 48px;
height: 48px;
border-radius: 50%;
background: rgba(78, 205, 196, 0.15);
color: #4ecdc4;
display: flex;
align-items: center;
justify-content: center;
font-size: 1.5rem;
font-weight: 700;
}
.success-text {
color: #4ecdc4;
font-size: 1rem;
margin: 0;
font-weight: 500;
}
</style>


@@ -272,7 +272,9 @@
 <style>
   .transcript-editor {
     flex: 1;
+    min-width: 0;
     overflow-y: auto;
+    overflow-x: hidden;
     padding: 1rem;
     background: #16213e;
     border-radius: 8px;
@@ -319,6 +321,7 @@
   .segment-text {
     line-height: 1.6;
     padding-left: 0.75rem;
+    white-space: pre-wrap;
     word-wrap: break-word;
     overflow-wrap: break-word;
   }


@@ -57,6 +57,12 @@
       isReady = false;
     });

+    wavesurfer.on('error', (err: Error) => {
+      console.error('[voice-to-notes] WaveSurfer error:', err);
+      isLoading = false;
+      loadError = 'Failed to load audio';
+    });
+
     if (audioUrl) {
       loadAudio(audioUrl);
     }


@@ -10,14 +10,15 @@ export interface AppSettings {
   litellm_model: string;
   litellm_api_key: string;
   litellm_api_base: string;
-  local_model_path: string;
-  local_binary_path: string;
+  ollama_url: string;
+  ollama_model: string;
   transcription_model: string;
   transcription_device: string;
   transcription_language: string;
   skip_diarization: boolean;
   hf_token: string;
   num_speakers: number | null;
+  devtools_enabled: boolean;
 }

 const defaults: AppSettings = {
@@ -29,14 +30,15 @@ const defaults: AppSettings = {
   litellm_model: 'gpt-4o-mini',
   litellm_api_key: '',
   litellm_api_base: '',
-  local_model_path: '',
-  local_binary_path: 'llama-server',
+  ollama_url: 'http://localhost:11434',
+  ollama_model: 'llama3.2',
   transcription_model: 'base',
   transcription_device: 'cpu',
   transcription_language: '',
   skip_diarization: false,
   hf_token: '',
   num_speakers: null,
+  devtools_enabled: false,
 };

 export const settings = writable<AppSettings>({ ...defaults });
@@ -50,23 +52,27 @@ export async function loadSettings(): Promise<void> {
   }
 }

-export async function saveSettings(s: AppSettings): Promise<void> {
-  settings.set(s);
-  await invoke('save_settings', { settings: s });
-
-  // Configure the AI provider in the Python sidecar
+export async function configureAIProvider(s: AppSettings): Promise<void> {
   const configMap: Record<string, Record<string, string>> = {
     openai: { api_key: s.openai_api_key, model: s.openai_model },
     anthropic: { api_key: s.anthropic_api_key, model: s.anthropic_model },
     litellm: { api_key: s.litellm_api_key, api_base: s.litellm_api_base, model: s.litellm_model },
-    local: { model: s.local_model_path, base_url: 'http://localhost:8080' },
+    local: { model: s.ollama_model, base_url: s.ollama_url.replace(/\/+$/, '') + '/v1' },
   };
   const config = configMap[s.ai_provider];
   if (config) {
     try {
       await invoke('ai_configure', { provider: s.ai_provider, config });
     } catch {
-      // Sidecar may not be running yet — provider will be configured on first use
+      // Sidecar may not be running yet
     }
   }
 }
+
+export async function saveSettings(s: AppSettings): Promise<void> {
+  settings.set(s);
+  await invoke('save_settings', { settings: s });
+
+  // Configure the AI provider in the Python sidecar
+  await configureAIProvider(s);
+}
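A note on the `local` entry above: Ollama exposes an OpenAI-compatible API under the `/v1` path, so the user-supplied URL is normalized before the suffix is appended. A minimal standalone sketch of that derivation (the function name here is hypothetical; the real code inlines the expression):

```typescript
// Derive the OpenAI-compatible base URL from a user-entered Ollama URL.
// Trailing slashes are stripped so the appended '/v1' never doubles up.
function ollamaBaseUrl(ollamaUrl: string): string {
  return ollamaUrl.replace(/\/+$/, '') + '/v1';
}

console.log(ollamaBaseUrl('http://localhost:11434'));   // http://localhost:11434/v1
console.log(ollamaBaseUrl('http://localhost:11434//')); // http://localhost:11434/v1
```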


@@ -8,8 +8,9 @@
 import AIChatPanel from '$lib/components/AIChatPanel.svelte';
 import ProgressOverlay from '$lib/components/ProgressOverlay.svelte';
 import SettingsModal from '$lib/components/SettingsModal.svelte';
+import SidecarSetup from '$lib/components/SidecarSetup.svelte';
 import { segments, speakers } from '$lib/stores/transcript';
-import { settings, loadSettings } from '$lib/stores/settings';
+import { settings, loadSettings, configureAIProvider } from '$lib/stores/settings';
 import type { Segment, Speaker } from '$lib/types/transcript';
 import { onMount, tick } from 'svelte';
@@ -18,13 +19,65 @@
 let audioUrl = $state('');
 let showSettings = $state(false);

+// Sidecar state
+let sidecarReady = $state(false);
+let sidecarChecked = $state(false);
+
+// Sidecar update state
+let sidecarUpdate = $state<{ current_version: string; latest_version: string } | null>(null);
+let showUpdateDownload = $state(false);
+let updateDismissed = $state(false);
+
 // Project management state
 let currentProjectPath = $state<string | null>(null);
 let currentProjectName = $state('');
+let projectIsV2 = $state(false);
 let audioFilePath = $state('');
+let audioWavPath = $state('');
+
+async function checkSidecar() {
+  try {
+    const ready = await invoke<boolean>('check_sidecar');
+    sidecarReady = ready;
+  } catch {
+    sidecarReady = false;
+  }
+  sidecarChecked = true;
+}
+
+async function checkSidecarUpdate() {
+  try {
+    const update = await invoke<{ current_version: string; latest_version: string } | null>('check_sidecar_update');
+    sidecarUpdate = update;
+  } catch {
+    // Silently ignore update check failures
+  }
+}
+
+function handleSidecarSetupComplete() {
+  sidecarReady = true;
+  configureAIProvider($settings);
+  checkSidecarUpdate();
+}
+
+function handleUpdateComplete() {
+  showUpdateDownload = false;
+  sidecarUpdate = null;
+}

 onMount(() => {
-  loadSettings();
+  loadSettings().then(() => {
+    // Restore devtools state from settings
+    if ($settings.devtools_enabled) {
+      invoke('toggle_devtools', { open: true });
+    }
+  });
+
+  checkSidecar().then(() => {
+    if (sidecarReady) {
+      configureAIProvider($settings);
+      checkSidecarUpdate();
+    }
+  });

   // Global keyboard shortcuts
   function handleKeyDown(e: KeyboardEvent) {
@@ -68,25 +121,32 @@
   };
 });

 let isTranscribing = $state(false);
+let transcriptionCancelled = $state(false);
 let transcriptionProgress = $state(0);
 let transcriptionStage = $state('');
 let transcriptionMessage = $state('');
+let extractingAudio = $state(false);
+
+function handleCancelProcessing() {
+  transcriptionCancelled = true;
+  isTranscribing = false;
+  transcriptionProgress = 0;
+  transcriptionStage = '';
+  transcriptionMessage = '';
+  // Clear any partial results
+  segments.set([]);
+  speakers.set([]);
+}

 // Speaker color palette for auto-assignment
 const speakerColors = ['#e94560', '#4ecdc4', '#ffe66d', '#a8e6cf', '#ff8b94', '#c7ceea', '#ffd93d', '#6bcb77'];

-async function saveProject() {
-  const defaultName = currentProjectName || 'Untitled';
-  const outputPath = await save({
-    defaultPath: `${defaultName}.vtn`,
-    filters: [{ name: 'Voice to Notes Project', extensions: ['vtn'] }],
-  });
-  if (!outputPath) return;
-
-  const projectData = {
-    version: 1,
-    name: outputPath.split(/[\\/]/).pop()?.replace('.vtn', '') || defaultName,
-    audio_file: audioFilePath,
+function buildProjectData(projectName: string) {
+  return {
+    version: 2,
+    name: projectName,
+    source_file: audioFilePath,
+    audio_wav: 'audio.wav',
     created_at: new Date().toISOString(),
     segments: $segments.map(seg => {
       const speaker = $speakers.find(s => s.id === seg.speaker_id);
@@ -110,17 +170,75 @@
       color: s.color || '#e94560',
     })),
   };
+}
+
+/** Save to a specific folder — creates .vtn + audio.wav inside it. */
+async function saveToFolder(folderPath: string): Promise<boolean> {
+  const projectName = folderPath.split(/[\\/]/).pop() || currentProjectName || 'Untitled';
+  const vtnPath = `${folderPath}/${projectName}.vtn`;
+  const wavPath = `${folderPath}/audio.wav`;
+  const projectData = buildProjectData(projectName);

   try {
-    await invoke('save_project_file', { path: outputPath, project: projectData });
-    currentProjectPath = outputPath;
-    currentProjectName = projectData.name;
+    await invoke('create_dir', { path: folderPath });
+    if (audioWavPath && audioWavPath !== wavPath) {
+      await invoke('copy_file', { src: audioWavPath, dst: wavPath });
+      audioWavPath = wavPath;
+    }
+    await invoke('save_project_file', { path: vtnPath, project: projectData });
+    currentProjectPath = vtnPath;
+    currentProjectName = projectName;
+    projectIsV2 = true;
+    return true;
   } catch (err) {
     console.error('Failed to save project:', err);
     alert(`Failed to save: ${err}`);
+    return false;
   }
 }

+async function saveProject() {
+  // Already saved as v2 folder — save in place
+  if (currentProjectPath && projectIsV2) {
+    const folderPath = currentProjectPath.replace(/[\\/][^\\/]+$/, '');
+    await saveToFolder(folderPath);
+    return;
+  }
+
+  // V1 project opened — migrate to folder structure
+  if (currentProjectPath && !projectIsV2) {
+    const oldVtnDir = currentProjectPath.replace(/[\\/][^\\/]+$/, '');
+    const projectName = currentProjectPath.split(/[\\/]/).pop()?.replace(/\.vtn$/i, '') || 'Untitled';
+    const folderPath = `${oldVtnDir}/${projectName}`;
+    const success = await saveToFolder(folderPath);
+    if (success) {
+      // Optionally remove the old .vtn file
+      try {
+        // Leave old file — user can delete manually
+      } catch {}
+    }
+    return;
+  }
+
+  // Never saved — pick a folder
+  await saveProjectAs();
+}
+
+async function saveProjectAs() {
+  // Use save dialog so the user can type a new project name.
+  // The chosen path is treated as the project folder (created if needed).
+  const defaultName = currentProjectName || 'Untitled';
+  const chosenPath = await save({
+    defaultPath: defaultName,
+    title: 'Save Project — enter a project name',
+  });
+  if (!chosenPath) return;
+
+  // Strip any file extension the user may have typed (e.g. ".vtn")
+  const folderPath = chosenPath.replace(/\.[^.\\/]+$/, '');
+  await saveToFolder(folderPath);
+}
+
 async function openProject() {
   const filePath = await open({
     filters: [{ name: 'Voice to Notes Project', extensions: ['vtn'] }],
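The save flow above leans on three inline path expressions: strip the last path segment to get a parent directory, take the last segment as a name, and drop a typed extension. A standalone sketch of those regexes (the helper names are hypothetical; the real code inlines each expression on dialog results):

```typescript
// Path helpers mirroring the inline regexes in the save flow.
const parentDir = (p: string) => p.replace(/[\\/][^\\/]+$/, ''); // strip last segment
const baseName = (p: string) => p.split(/[\\/]/).pop() ?? '';    // keep last segment
const stripExt = (p: string) => p.replace(/\.[^.\\/]+$/, '');    // drop trailing extension

console.log(parentDir('/home/me/Projects/Interview/Interview.vtn')); // /home/me/Projects/Interview
console.log(baseName('C:\\Users\\me\\Talk'));                        // Talk
console.log(stripExt('/home/me/Projects/Talk.vtn'));                 // /home/me/Projects/Talk
```

Both separators are matched so the same expressions behave on Windows and POSIX paths.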
@@ -130,9 +248,11 @@
   try {
     const project = await invoke<{
-      version: number;
+      version?: number;
       name: string;
-      audio_file: string;
+      audio_file?: string;
+      source_file?: string;
+      audio_wav?: string;
       segments: Array<{
         text: string;
         start_ms: number;
@@ -182,10 +302,135 @@
     }));
     segments.set(newSegments);

-    // Load audio
-    audioFilePath = project.audio_file;
-    audioUrl = convertFileSrc(project.audio_file);
-    waveformPlayer?.loadAudio(audioUrl);
+    // Determine the directory the .vtn file is in
+    const vtnDir = (filePath as string).replace(/[\\/][^\\/]+$/, '');
+    const version = project.version ?? 1;
+    projectIsV2 = version >= 2;
// Resolve audio for wavesurfer playback
if (version >= 2) {
// Version 2: audio_wav is relative to the .vtn directory, source_file is the original import path
audioFilePath = project.source_file || '';
const wavRelative = project.audio_wav || 'audio.wav';
const resolvedWav = `${vtnDir}/${wavRelative}`;
const wavExists = await invoke<boolean>('check_file_exists', { path: resolvedWav });
if (wavExists) {
audioWavPath = resolvedWav;
audioUrl = convertFileSrc(resolvedWav);
waveformPlayer?.loadAudio(audioUrl);
} else {
// WAV missing — try re-extracting from the original source file
const sourceExists = audioFilePath ? await invoke<boolean>('check_file_exists', { path: audioFilePath }) : false;
if (sourceExists) {
extractingAudio = true;
await tick();
try {
const outputPath = `${vtnDir}/${wavRelative}`;
const wavPath = await invoke<string>('extract_audio', { filePath: audioFilePath, outputPath });
audioWavPath = wavPath;
audioUrl = convertFileSrc(wavPath);
waveformPlayer?.loadAudio(audioUrl);
} catch (err) {
console.error('Failed to re-extract audio:', err);
alert(`Failed to re-extract audio: ${err}`);
} finally {
extractingAudio = false;
}
} else {
// Both missing — ask user to locate the file
const shouldRelink = confirm(
'The audio file for this project could not be found.\n\n' +
`Original source: ${audioFilePath || '(unknown)'}\n\n` +
'Would you like to locate the file?'
);
if (shouldRelink) {
const newPath = await open({
multiple: false,
filters: [{
name: 'Audio/Video',
extensions: ['mp3', 'wav', 'flac', 'ogg', 'm4a', 'aac', 'wma',
'mp4', 'mkv', 'avi', 'mov', 'webm'],
}],
});
if (newPath) {
audioFilePath = newPath;
extractingAudio = true;
await tick();
try {
const outputPath = `${vtnDir}/${wavRelative}`;
const wavPath = await invoke<string>('extract_audio', { filePath: newPath, outputPath });
audioWavPath = wavPath;
audioUrl = convertFileSrc(wavPath);
waveformPlayer?.loadAudio(audioUrl);
} catch (err) {
console.error('Failed to extract audio from re-linked file:', err);
alert(`Failed to extract audio: ${err}`);
} finally {
extractingAudio = false;
}
}
}
}
}
} else {
// Version 1 (legacy): audio_file is the source path
const sourceFile = project.audio_file || '';
audioFilePath = sourceFile;
const sourceExists = sourceFile ? await invoke<boolean>('check_file_exists', { path: sourceFile }) : false;
if (sourceExists) {
// Extract WAV next to the .vtn file for playback
extractingAudio = true;
await tick();
try {
const outputPath = `${vtnDir}/audio.wav`;
const wavPath = await invoke<string>('extract_audio', { filePath: sourceFile, outputPath });
audioWavPath = wavPath;
audioUrl = convertFileSrc(wavPath);
waveformPlayer?.loadAudio(audioUrl);
} catch (err) {
console.error('Failed to extract audio:', err);
alert(`Failed to extract audio: ${err}`);
} finally {
extractingAudio = false;
}
} else {
// Source missing — ask user to locate the file
const shouldRelink = confirm(
'The audio file for this project could not be found.\n\n' +
`Original path: ${sourceFile || '(unknown)'}\n\n` +
'Would you like to locate the file?'
);
if (shouldRelink) {
const newPath = await open({
multiple: false,
filters: [{
name: 'Audio/Video',
extensions: ['mp3', 'wav', 'flac', 'ogg', 'm4a', 'aac', 'wma',
'mp4', 'mkv', 'avi', 'mov', 'webm'],
}],
});
if (newPath) {
audioFilePath = newPath;
extractingAudio = true;
await tick();
try {
const outputPath = `${vtnDir}/audio.wav`;
const wavPath = await invoke<string>('extract_audio', { filePath: newPath, outputPath });
audioWavPath = wavPath;
audioUrl = convertFileSrc(wavPath);
waveformPlayer?.loadAudio(audioUrl);
} catch (err) {
console.error('Failed to extract audio from re-linked file:', err);
alert(`Failed to extract audio: ${err}`);
} finally {
extractingAudio = false;
}
}
}
}
}
currentProjectPath = filePath as string;
currentProjectName = project.name;
@@ -216,9 +461,35 @@
});
if (!filePath) return;
// Always extract audio to WAV for wavesurfer playback
extractingAudio = true;
await tick();
try {
const wavPath = await invoke<string>('extract_audio', { filePath });
audioWavPath = wavPath;
} catch (err) {
console.error('[voice-to-notes] Failed to extract audio:', err);
const msg = String(err);
if (msg.includes('ffmpeg not found')) {
alert(
'FFmpeg is required to extract audio.\n\n' +
'Install FFmpeg:\n' +
' Windows: winget install ffmpeg\n' +
' macOS: brew install ffmpeg\n' +
' Linux: sudo apt install ffmpeg\n\n' +
'Then restart Voice to Notes and try again.'
);
} else {
alert(`Failed to extract audio: ${msg}`);
}
return;
} finally {
extractingAudio = false;
}
// Track the original file path for the sidecar (it does its own conversion)
audioFilePath = filePath;
audioUrl = convertFileSrc(audioWavPath);
waveformPlayer?.loadAudio(audioUrl);
// Clear previous results
@@ -227,6 +498,7 @@
// Start pipeline (transcription + diarization)
isTranscribing = true;
transcriptionCancelled = false;
transcriptionProgress = 0;
transcriptionStage = 'Starting...';
transcriptionMessage = 'Initializing pipeline...';
@@ -337,6 +609,9 @@
numSpeakers: $settings.num_speakers && $settings.num_speakers > 0 ? $settings.num_speakers : undefined,
});
// If cancelled while processing, discard results
if (transcriptionCancelled) return;
// Create speaker entries from pipeline result
const newSpeakers: Speaker[] = (result.speakers || []).map((label, idx) => ({
id: `speaker-${idx}`,
@@ -443,14 +718,31 @@
}
</script>
{#if !appReady || !sidecarChecked}
<div class="splash-screen">
<h1 class="splash-title">Voice to Notes</h1>
<p class="splash-subtitle">Loading...</p>
<div class="splash-spinner"></div>
</div>
{:else if sidecarChecked && !sidecarReady && !showUpdateDownload}
<SidecarSetup onComplete={handleSidecarSetupComplete} />
{:else if showUpdateDownload}
<SidecarSetup onComplete={handleUpdateComplete} />
{:else}
<div class="app-shell">
{#if sidecarUpdate && !updateDismissed}
<div class="update-banner">
<span class="update-text">
Sidecar update available (v{sidecarUpdate.current_version} &rarr; v{sidecarUpdate.latest_version})
</span>
<button class="update-btn" onclick={() => showUpdateDownload = true}>
Update
</button>
<button class="update-dismiss" onclick={() => updateDismissed = true} title="Dismiss">
&times;
</button>
</div>
{/if}
<div class="app-header">
<div class="header-actions">
<button class="settings-btn" onclick={openProject} disabled={isTranscribing}>
@@ -458,7 +750,10 @@
</button>
{#if $segments.length > 0}
<button class="settings-btn" onclick={saveProject}>
Save
</button>
<button class="settings-btn" onclick={saveProjectAs}>
Save As
</button>
{/if}
<button class="import-btn" onclick={handleFileImport} disabled={isTranscribing}>
@@ -507,8 +802,18 @@
percent={transcriptionProgress}
stage={transcriptionStage}
message={transcriptionMessage}
onCancel={handleCancelProcessing}
/>
{#if extractingAudio}
<div class="extraction-overlay">
<div class="extraction-card">
<div class="extraction-spinner"></div>
<p>Extracting audio...</p>
</div>
</div>
{/if}
<SettingsModal
visible={showSettings}
onClose={() => showSettings = false}
@@ -674,4 +979,80 @@
@keyframes spin {
to { transform: rotate(360deg); }
}
/* Sidecar update banner */
.update-banner {
display: flex;
align-items: center;
gap: 0.75rem;
padding: 0.5rem 1rem;
background: rgba(78, 205, 196, 0.1);
border-bottom: 1px solid rgba(78, 205, 196, 0.25);
color: #e0e0e0;
font-size: 0.85rem;
}
.update-text {
flex: 1;
color: #b0b0b0;
}
.update-btn {
background: #4ecdc4;
border: none;
color: #0a0a23;
padding: 0.3rem 0.85rem;
border-radius: 4px;
cursor: pointer;
font-size: 0.8rem;
font-weight: 600;
}
.update-btn:hover {
background: #3dbdb5;
}
.update-dismiss {
background: none;
border: none;
color: #888;
font-size: 1.1rem;
cursor: pointer;
padding: 0.1rem 0.3rem;
line-height: 1;
}
.update-dismiss:hover {
color: #e0e0e0;
}
/* Audio extraction overlay */
.extraction-overlay {
position: fixed;
inset: 0;
background: rgba(0, 0, 0, 0.8);
display: flex;
align-items: center;
justify-content: center;
z-index: 9999;
}
.extraction-card {
background: #16213e;
padding: 2rem 2.5rem;
border-radius: 12px;
color: #e0e0e0;
border: 1px solid #2a3a5e;
box-shadow: 0 8px 32px rgba(0, 0, 0, 0.5);
display: flex;
flex-direction: column;
align-items: center;
gap: 1rem;
}
.extraction-card p {
margin: 0;
font-size: 1rem;
}
.extraction-spinner {
width: 32px;
height: 32px;
border: 3px solid #2a3a5e;
border-top-color: #e94560;
border-radius: 50%;
animation: spin 0.8s linear infinite;
}
</style>