24 Commits

Author SHA1 Message Date
Gitea Actions
73eab2e80c chore: bump sidecar version to 1.0.13 [skip ci] 2026-03-23 14:58:07 +00:00
Claude
33ca3e4a28 Show chunk context in transcription progress for large files
All checks were successful
Build Sidecars / Bump sidecar version and tag (push) Successful in 3s
Release / Bump version and tag (push) Successful in 3s
Build Sidecars / Build Sidecar (macOS) (push) Successful in 8m30s
Release / Build App (macOS) (push) Successful in 1m19s
Build Sidecars / Build Sidecar (Linux) (push) Successful in 12m9s
Release / Build App (Linux) (push) Successful in 3m36s
Build Sidecars / Build Sidecar (Windows) (push) Successful in 29m36s
Release / Build App (Windows) (push) Successful in 3m13s
Files >1 hour are split into 5-minute chunks. Previously each chunk
showed "Starting transcription..." making it look like a restart.
Now shows "Chunk 3/12: Starting transcription..." and
"Chunk 3/12: Transcribing segment 5 (42% of audio)..."

Also skips the "Loading model..." message for chunks after the first
since the model is already loaded.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 07:57:59 -07:00
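The chunk prefixing described in this commit can be sketched in isolation. This is a minimal illustration, not the sidecar's code: `format_progress` is a hypothetical helper, and only the message shapes are taken from the commit message.

```python
from typing import Optional


def format_progress(message: str, chunk_idx: Optional[int] = None,
                    num_chunks: Optional[int] = None) -> str:
    """Prefix a progress message with 'Chunk i/n: ' when chunk info is given.

    Hypothetical helper for illustration only; chunk_idx is zero-based.
    """
    if chunk_idx is None or num_chunks is None:
        return message
    return f"Chunk {chunk_idx + 1}/{num_chunks}: {message}"


plain = format_progress("Starting transcription...")
chunked = format_progress("Starting transcription...", chunk_idx=2, num_chunks=12)
print(plain)
print(chunked)
```

Short files get the unprefixed message, so their progress text is unchanged.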
Gitea Actions
e65d8b0510 chore: bump version to 0.2.35 [skip ci] 2026-03-23 14:31:13 +00:00
Claude
a7364f2e50 Fix 's is not defined' in AIChatPanel
All checks were successful
Release / Bump version and tag (push) Successful in 4s
Release / Build App (macOS) (push) Successful in 1m18s
Release / Build App (Linux) (push) Successful in 3m37s
Release / Build App (Windows) (push) Successful in 3m53s
Leftover reference to removed 's' variable — changed to $settings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 07:31:07 -07:00
Gitea Actions
809acfc781 chore: bump version to 0.2.34 [skip ci] 2026-03-23 13:42:26 +00:00
Claude
96e9a6d38b Fix Ollama: remove duplicate stale configMap in AIChatPanel
All checks were successful
Release / Bump version and tag (push) Successful in 6s
Release / Build App (macOS) (push) Successful in 1m17s
Release / Build App (Linux) (push) Successful in 4m49s
Release / Build App (Windows) (push) Successful in 3m8s
AIChatPanel had its own hardcoded configMap with the old llama-server
URL (localhost:8080) and field names (local_model_path). Every chat
message reconfigured the provider with these wrong values, overriding
the correct settings applied at startup.

Fix: replace the duplicate with a call to the shared configureAIProvider().
Also strip trailing slashes from ollama_url before appending /v1 to
prevent double-slash URLs (http://localhost:11434//v1).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 06:33:03 -07:00
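The trailing-slash handling mentioned in this commit can be illustrated as follows. The real fix lives in the frontend; this Python sketch uses a hypothetical function name and only shows the normalization step.

```python
def ollama_base_url(ollama_url: str) -> str:
    """Strip trailing slashes, then append the /v1 path (illustrative only)."""
    return ollama_url.rstrip("/") + "/v1"


with_slash = ollama_base_url("http://localhost:11434/")
without_slash = ollama_base_url("http://localhost:11434")
print(with_slash)
print(without_slash)
```

Both inputs yield the same URL, which is the point of stripping before appending. The sketch does not handle a URL that already ends in `/v1`.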
Gitea Actions
ddfbd65478 chore: bump version to 0.2.33 [skip ci] 2026-03-23 13:24:46 +00:00
Gitea Actions
e80ee3a18f chore: bump sidecar version to 1.0.12 [skip ci] 2026-03-23 13:24:34 +00:00
Claude
806586ae3d Fix diarization performance for long files + better progress
Some checks failed
Build Sidecars / Bump sidecar version and tag (push) Successful in 11s
Release / Bump version and tag (push) Successful in 10s
Build Sidecars / Build Sidecar (macOS) (push) Successful in 4m0s
Release / Build App (macOS) (push) Successful in 1m16s
Release / Build App (Linux) (push) Has been cancelled
Release / Build App (Windows) (push) Has been cancelled
Build Sidecars / Build Sidecar (Linux) (push) Successful in 17m34s
Build Sidecars / Build Sidecar (Windows) (push) Successful in 28m9s
- Cache loaded audio in _sf_load() — previously the entire WAV file was
  re-read from disk for every 10s crop call. For a 3-hour file with
  1000+ chunks, this meant ~345GB of disk reads. Now read once, cached.
- Better progress messages for long files: show elapsed time in m:ss
  format, warn "(180min audio, this may take a while)" for files >10min
- Increased progress poll interval from 2s to 5s (less noise)
- Better time estimate: use 0.8x audio duration (was 0.5x)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 06:24:21 -07:00
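The disk-read figure in this commit can be checked with a little arithmetic. The sketch below assumes a 16 kHz, mono, 16-bit WAV (the format the ffmpeg extraction elsewhere in this changeset produces) and exactly 1000 crop calls; the loader is a stand-in that only counts calls.

```python
SAMPLE_RATE = 16_000        # Hz (assumed)
BYTES_PER_SAMPLE = 2        # 16-bit PCM, mono (assumed)
DURATION_SEC = 3 * 60 * 60  # 3-hour file
CROP_CALLS = 1000           # assumed number of crop calls

file_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * DURATION_SEC
uncached_bytes = file_bytes * CROP_CALLS
print(f"one full read: {file_bytes / 1e6:.1f} MB")
print(f"{CROP_CALLS} uncached reads: {uncached_bytes / 1e9:.1f} GB")

disk_reads = 0
_cache = {}


def load_audio(path: str) -> bytes:
    """Stand-in for the real file read; only counts how often it is called."""
    global disk_reads
    disk_reads += 1
    return b""


def load_cached(path: str) -> bytes:
    """Read a path at most once and reuse the result afterwards."""
    if path not in _cache:
        _cache[path] = load_audio(path)
    return _cache[path]


for _ in range(CROP_CALLS):
    load_cached("meeting.wav")
print("disk reads with cache:", disk_reads)
```

The trade-off is memory: the cached waveform stays resident for as long as the cache holds it.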
Gitea Actions
999bdaa671 chore: bump version to 0.2.32 [skip ci] 2026-03-23 12:38:47 +00:00
Claude
b1d46fd42e Add cancel button to processing overlay with confirmation
All checks were successful
Release / Bump version and tag (push) Successful in 3s
Release / Build App (macOS) (push) Successful in 1m21s
Release / Build App (Windows) (push) Successful in 3m8s
Release / Build App (Linux) (push) Successful in 3m40s
- Cancel button on the progress overlay during transcription
- Clicking Cancel shows confirmation: "Processing is incomplete. If you
  cancel now, the transcription will need to be started over."
- "Continue Processing" dismisses the dialog, "Cancel Processing" stops
- Cancel clears partial results (segments, speakers) and resets UI
- Pipeline results are discarded if cancelled during processing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 05:38:40 -07:00
Gitea Actions
818cbfa69c chore: bump version to 0.2.31 [skip ci] 2026-03-23 12:30:19 +00:00
Claude
aa319eb823 Fix Ollama settings on startup + video extraction UX
All checks were successful
Release / Bump version and tag (push) Successful in 3s
Release / Build App (macOS) (push) Successful in 1m18s
Release / Build App (Linux) (push) Successful in 3m44s
Release / Build App (Windows) (push) Successful in 3m57s
AI provider:
- Extract configureAIProvider() from saveSettings for reuse
- Call it on app startup after sidecar is ready (was only called on Save)
- Call it after first-time sidecar download completes
- Sidecar now receives correct Ollama URL/model immediately

Video extraction:
- Hide ffmpeg console window on Windows (CREATE_NO_WINDOW flag)
- Show "Extracting audio from video..." overlay with spinner during extraction
- UI stays responsive while ffmpeg runs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 05:30:14 -07:00
Gitea Actions
8faa336cbc chore: bump version to 0.2.30 [skip ci] 2026-03-23 03:12:25 +00:00
Claude
02c70f90c8 Extract audio from video files before loading
All checks were successful
Release / Bump version and tag (push) Successful in 3s
Release / Build App (macOS) (push) Successful in 1m17s
Release / Build App (Linux) (push) Successful in 4m53s
Release / Build App (Windows) (push) Successful in 3m45s
Video files (MP4, MKV, etc.) are now processed with ffmpeg to extract
audio to a temp WAV file before loading into wavesurfer. This prevents
the WebView crash caused by trying to fetch multi-GB files into memory.

- New extract_audio Tauri command uses ffmpeg (sidecar-bundled or system)
- Frontend detects video extensions and extracts audio automatically
- User-friendly error if ffmpeg is not installed with install instructions
- Reverted wavesurfer MediaElement approach in favor of clean extraction
- Added FFmpeg install guide to USER_GUIDE.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 20:04:10 -07:00
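For readers who want to reproduce the extraction outside the app, the ffmpeg invocation can be sketched in Python. The app's command is written in Rust; `build_extract_command` is a hypothetical helper, and only the ffmpeg flags mirror the ones in this changeset.

```python
import shutil
import tempfile
from pathlib import Path


def build_extract_command(video_path: str, ffmpeg: str = "ffmpeg") -> list:
    """Build an ffmpeg argv that extracts mono 16 kHz 16-bit PCM WAV audio."""
    output = Path(tempfile.gettempdir()) / f"{Path(video_path).stem}_audio.wav"
    return [
        ffmpeg,
        "-y",                    # overwrite the output file if it exists
        "-i", video_path,        # input video
        "-vn",                   # drop the video stream
        "-acodec", "pcm_s16le",  # 16-bit PCM
        "-ar", "16000",          # 16 kHz sample rate
        "-ac", "1",              # mono
        str(output),
    ]


cmd = build_extract_command("lecture.mp4")
print(" ".join(cmd))
print("ffmpeg on PATH:", shutil.which("ffmpeg") is not None)
```

To run it, pass `cmd` to `subprocess.run(cmd, check=True)` on a machine where ffmpeg is installed.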
Gitea Actions
66db827f17 chore: bump version to 0.2.29 [skip ci] 2026-03-23 02:55:23 +00:00
Gitea Actions
d9fcc9a5bd chore: bump sidecar version to 1.0.11 [skip ci] 2026-03-23 02:55:17 +00:00
Claude
ca5dc98d24 Fix Ollama: set_active after configure + fix default URL
Some checks failed
Build Sidecars / Bump sidecar version and tag (push) Successful in 5s
Release / Bump version and tag (push) Successful in 5s
Build Sidecars / Build Sidecar (macOS) (push) Successful in 4m35s
Release / Build App (macOS) (push) Successful in 1m18s
Release / Build App (Linux) (push) Has been cancelled
Release / Build App (Windows) (push) Has been cancelled
Build Sidecars / Build Sidecar (Linux) (push) Successful in 16m56s
Build Sidecars / Build Sidecar (Windows) (push) Successful in 37m0s
The configure action registered the provider but never called
set_active(), so the sidecar kept using the old/default provider.
Also updated the local provider default from localhost:8080 to
localhost:11434/v1 (Ollama). Added debug logging for configure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 19:55:09 -07:00
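The bug fixed here, registering a provider without activating it, can be shown with a toy registry. `ProviderRegistry` is a hypothetical stand-in, not the sidecar's service class; only the method names `register_provider` and `set_active` come from the diff.

```python
class ProviderRegistry:
    """Toy stand-in for the sidecar's AI service (hypothetical)."""

    def __init__(self):
        self._providers = {"default": "default-provider"}
        self._active = "default"

    def register_provider(self, name, provider):
        # Registering stores the provider but does not change the active one.
        self._providers[name] = provider

    def set_active(self, name):
        if name not in self._providers:
            raise KeyError(name)
        self._active = name

    @property
    def active(self):
        return self._providers[self._active]


service = ProviderRegistry()
service.register_provider("local", "ollama @ http://localhost:11434/v1")
before = service.active
service.set_active("local")
after = service.active
print(before, "->", after)
```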
Gitea Actions
da49c04119 chore: bump version to 0.2.28 [skip ci] 2026-03-23 01:30:57 +00:00
Gitea Actions
833ddb67de chore: bump sidecar version to 1.0.10 [skip ci] 2026-03-23 01:30:51 +00:00
Claude
879a1f3fd6 Fix diarization tensor mismatch + fix sidecar build triggers
All checks were successful
Build Sidecars / Bump sidecar version and tag (push) Successful in 7s
Release / Bump version and tag (push) Successful in 5s
Build Sidecars / Build Sidecar (macOS) (push) Successful in 4m32s
Release / Build App (macOS) (push) Successful in 1m16s
Build Sidecars / Build Sidecar (Linux) (push) Successful in 16m28s
Release / Build App (Linux) (push) Successful in 4m26s
Build Sidecars / Build Sidecar (Windows) (push) Successful in 33m5s
Release / Build App (Windows) (push) Successful in 3m29s
Diarization: Audio.crop patch now pads short segments with zeros to
match the expected duration. pyannote batches embeddings with vstack
which requires uniform tensor sizes — the last segment of a file can
be shorter than the 10s window.

CI: Reordered sidecar workflow to check for python/ changes FIRST,
before bumping version or configuring git. All subsequent steps are
gated on has_changes. This prevents unnecessary version bumps and
build runs when only app code changes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 18:30:43 -07:00
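The padding idea from the diarization fix can be sketched without torch. This is an illustration with plain lists and a tiny, made-up sample rate; `pad_to_length` is a hypothetical helper, not the patched `Audio.crop`.

```python
def pad_to_length(samples: list, expected: int) -> list:
    """Right-pad with zeros up to `expected` samples; longer inputs are unchanged."""
    missing = expected - len(samples)
    if missing <= 0:
        return samples
    return samples + [0.0] * missing


SAMPLE_RATE = 4   # samples per second, tiny on purpose (assumed)
WINDOW_SEC = 2    # window length in seconds (assumed)

full_window = [0.1] * 8   # a complete 2 s window
last_window = [0.2] * 5   # the file ended early, so this window is short
batch = [pad_to_length(w, WINDOW_SEC * SAMPLE_RATE) for w in (full_window, last_window)]
print([len(row) for row in batch])
```

Once every row has the same length, the rows can be stacked into one batch.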
Gitea Actions
6f9dc9a95e chore: bump version to 0.2.27 [skip ci] 2026-03-23 01:05:15 +00:00
Claude
2a9a7e42a3 Add daily workflow to clean up old releases (keep latest 5)
All checks were successful
Release / Bump version and tag (push) Successful in 4s
Release / Build App (macOS) (push) Successful in 1m25s
Release / Build App (Linux) (push) Successful in 4m43s
Release / Build App (Windows) (push) Successful in 3m20s
Runs daily at 6am UTC and on manual dispatch. Separately tracks app
releases (v*) and sidecar releases (sidecar-v*), keeping the latest
5 of each and deleting older ones along with their tags.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 18:05:08 -07:00
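The retention rule (keep the latest 5 of each release type) can be expressed as a small pure function. The workflow itself uses curl and jq; this Python sketch assumes the tag list arrives newest first, and the function name is hypothetical.

```python
def releases_to_delete(tags_newest_first: list, keep: int = 5) -> list:
    """Return the tags that fall outside the newest `keep` of each release type."""
    app = [t for t in tags_newest_first if t.startswith("v")]
    sidecar = [t for t in tags_newest_first if t.startswith("sidecar-v")]
    return app[keep:] + sidecar[keep:]


tags = [
    "v0.2.35", "sidecar-v1.0.13", "v0.2.34", "v0.2.33", "sidecar-v1.0.12",
    "v0.2.32", "v0.2.31", "v0.2.30", "sidecar-v1.0.11",
]
doomed = releases_to_delete(tags)
print(doomed)
```

Filtering before slicing is what keeps the two release streams independent: a burst of app releases cannot push sidecar releases out of the kept set.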
Gitea Actions
34b060a8a5 chore: bump version to 0.2.26 [skip ci] 2026-03-23 00:42:00 +00:00
18 changed files with 496 additions and 62 deletions


@@ -18,14 +18,34 @@ jobs:
 steps:
 - uses: actions/checkout@v4
 with:
-fetch-depth: 0
+fetch-depth: 2
+- name: Check for python changes
+id: check_changes
+run: |
+# If triggered by workflow_dispatch, always build
+if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
+echo "has_changes=true" >> $GITHUB_OUTPUT
+exit 0
+fi
+# Check if any python/ files changed in this commit
+CHANGED=$(git diff --name-only HEAD~1 HEAD -- python/ 2>/dev/null || echo "")
+if [ -n "$CHANGED" ]; then
+echo "has_changes=true" >> $GITHUB_OUTPUT
+echo "Python changes detected: $CHANGED"
+else
+echo "has_changes=false" >> $GITHUB_OUTPUT
+echo "No python/ changes detected, skipping sidecar build"
+fi
 - name: Configure git
+if: steps.check_changes.outputs.has_changes == 'true'
 run: |
 git config user.name "Gitea Actions"
 git config user.email "actions@gitea.local"
 - name: Bump sidecar patch version
+if: steps.check_changes.outputs.has_changes == 'true'
 id: bump
 run: |
 # Read current version from python/pyproject.toml
@@ -46,23 +66,6 @@ jobs:
 echo "version=${NEW_VERSION}" >> $GITHUB_OUTPUT
 echo "tag=sidecar-v${NEW_VERSION}" >> $GITHUB_OUTPUT
-- name: Check for python changes
-id: check_changes
-run: |
-# If triggered by workflow_dispatch, always build
-if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
-echo "has_changes=true" >> $GITHUB_OUTPUT
-exit 0
-fi
-# Check if any python/ files changed in this commit
-CHANGED=$(git diff --name-only HEAD~1 HEAD -- python/ || echo "")
-if [ -n "$CHANGED" ]; then
-echo "has_changes=true" >> $GITHUB_OUTPUT
-else
-echo "has_changes=false" >> $GITHUB_OUTPUT
-echo "No python/ changes detected, skipping sidecar build"
-fi
 - name: Commit and tag
 if: steps.check_changes.outputs.has_changes == 'true'
 env:


@@ -0,0 +1,65 @@
name: Cleanup Old Releases
on:
# Run on a daily schedule, or manually via workflow_dispatch
schedule:
- cron: '0 6 * * *' # Daily at 6am UTC
workflow_dispatch:
jobs:
cleanup:
name: Remove old releases
runs-on: ubuntu-latest
env:
KEEP_COUNT: 5
steps:
- name: Cleanup old app releases
env:
BUILD_TOKEN: ${{ secrets.BUILD_TOKEN }}
run: |
REPO_API="${GITHUB_SERVER_URL}/api/v1/repos/${GITHUB_REPOSITORY}"
# Get all releases, sorted newest first (API default)
RELEASES=$(curl -s -H "Authorization: token ${BUILD_TOKEN}" \
"${REPO_API}/releases?limit=50")
# Separate app releases (v*) and sidecar releases (sidecar-v*)
APP_IDS=$(echo "$RELEASES" | jq -r '[.[] | select(.tag_name | startswith("v") and (startswith("sidecar") | not)) | .id] | .[]')
SIDECAR_IDS=$(echo "$RELEASES" | jq -r '[.[] | select(.tag_name | startswith("sidecar-v")) | .id] | .[]')
# Delete app releases beyond KEEP_COUNT
COUNT=0
for ID in $APP_IDS; do
COUNT=$((COUNT + 1))
if [ $COUNT -le ${{ env.KEEP_COUNT }} ]; then
continue
fi
TAG=$(echo "$RELEASES" | jq -r ".[] | select(.id == $ID) | .tag_name")
echo "Deleting app release $ID ($TAG)..."
curl -s -o /dev/null -w "HTTP %{http_code}\n" -X DELETE \
-H "Authorization: token ${BUILD_TOKEN}" \
"${REPO_API}/releases/$ID"
# Also delete the tag
curl -s -o /dev/null -X DELETE \
-H "Authorization: token ${BUILD_TOKEN}" \
"${REPO_API}/tags/$TAG"
done
# Delete sidecar releases beyond KEEP_COUNT
COUNT=0
for ID in $SIDECAR_IDS; do
COUNT=$((COUNT + 1))
if [ $COUNT -le ${{ env.KEEP_COUNT }} ]; then
continue
fi
TAG=$(echo "$RELEASES" | jq -r ".[] | select(.id == $ID) | .tag_name")
echo "Deleting sidecar release $ID ($TAG)..."
curl -s -o /dev/null -w "HTTP %{http_code}\n" -X DELETE \
-H "Authorization: token ${BUILD_TOKEN}" \
"${REPO_API}/releases/$ID"
curl -s -o /dev/null -X DELETE \
-H "Authorization: token ${BUILD_TOKEN}" \
"${REPO_API}/tags/$TAG"
done
echo "Cleanup complete. Kept latest ${{ env.KEEP_COUNT }} of each type."


@@ -26,10 +26,13 @@ The sidecar only needs to be downloaded once. Updates are detected automatically
 ## Basic Workflow
-### 1. Import Audio
+### 1. Import Audio or Video
 - Click **Import Audio** or press **Ctrl+O** (Cmd+O on Mac)
-- Supported formats: MP3, WAV, FLAC, OGG, M4A, AAC, WMA, MP4, MKV, AVI, MOV, WebM
+- **Audio formats:** MP3, WAV, FLAC, OGG, M4A, AAC, WMA
+- **Video formats:** MP4, MKV, AVI, MOV, WebM — audio is automatically extracted
+> **Note:** Video file import requires [FFmpeg](#installing-ffmpeg) to be installed on your system.
 ### 2. Transcribe
@@ -181,8 +184,42 @@ If you prefer cloud-based AI:
 ---
+## Installing FFmpeg
+FFmpeg is required for importing video files (MP4, MKV, AVI, etc.). It's used to extract the audio track before transcription.
+**Windows:**
+```
+winget install ffmpeg
+```
+Or download from [ffmpeg.org/download.html](https://ffmpeg.org/download.html) and add to your PATH.
+**macOS:**
+```
+brew install ffmpeg
+```
+**Linux (Debian/Ubuntu):**
+```
+sudo apt install ffmpeg
+```
+**Linux (Fedora/RHEL):**
+```
+sudo dnf install ffmpeg
+```
+After installing, restart Voice to Notes. FFmpeg is not needed for audio-only files (MP3, WAV, FLAC, etc.).
+---
 ## Troubleshooting
+### Video import fails / "FFmpeg not found"
+- Install FFmpeg using the instructions above
+- Make sure `ffmpeg` is in your system PATH
+- Restart Voice to Notes after installing
 ### Transcription is slow
 - Use a smaller model (tiny or base)
 - If you have an NVIDIA GPU, select CUDA in Settings > Transcription > Device


@@ -1,6 +1,6 @@
 {
 "name": "voice-to-notes",
-"version": "0.2.25",
+"version": "0.2.35",
 "description": "Desktop app for transcribing audio/video with speaker identification",
 "type": "module",
 "scripts": {


@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "voice-to-notes"
-version = "1.0.9"
+version = "1.0.13"
 description = "Python sidecar for Voice to Notes — transcription, diarization, and AI services"
 requires-python = ">=3.11"
 license = "MIT"


@@ -254,15 +254,15 @@ def make_ai_chat_handler() -> HandlerFunc:
 )
 if action == "configure":
-# Re-create a provider with custom settings
+# Re-create a provider with custom settings and set it active
 provider_name = payload.get("provider", "")
 config = payload.get("config", {})
 if provider_name == "local":
 from voice_to_notes.providers.local_provider import LocalProvider
 service.register_provider("local", LocalProvider(
-base_url=config.get("base_url", "http://localhost:8080"),
-model=config.get("model", "local"),
+base_url=config.get("base_url", "http://localhost:11434/v1"),
+model=config.get("model", "llama3.2"),
 ))
 elif provider_name == "openai":
 from voice_to_notes.providers.openai_provider import OpenAIProvider
@@ -286,6 +286,10 @@ def make_ai_chat_handler() -> HandlerFunc:
 api_key=config.get("api_key"),
 api_base=config.get("api_base"),
 ))
+# Set the configured provider as active (log config keys only; values may include api_key)
+print(f"[sidecar] Configured AI provider: {provider_name} with config keys: {sorted(config)}", file=sys.stderr, flush=True)
+if provider_name in ("local", "openai", "anthropic", "litellm"):
+service.set_active(provider_name)
 return IPCMessage(
 id=msg.id,
 type="ai.configured",


@@ -41,14 +41,23 @@ def _patch_pyannote_audio() -> None:
 import torch
 from pyannote.audio.core.io import Audio
+# Cache loaded audio to avoid re-reading the entire file for every crop call.
+# For a 3-hour file, crop is called 1000+ times — without caching, each call
+# reads ~345MB from disk.
+_audio_cache: dict[str, tuple] = {}
 def _sf_load(audio_path: str) -> tuple:
-"""Load audio via soundfile, return (channels, samples) tensor + sample_rate."""
-data, sample_rate = sf.read(str(audio_path), dtype="float32")
+"""Load audio via soundfile with caching."""
+key = str(audio_path)
+if key in _audio_cache:
+return _audio_cache[key]
+data, sample_rate = sf.read(key, dtype="float32")
 waveform = torch.from_numpy(np.array(data))
 if waveform.ndim == 1:
 waveform = waveform.unsqueeze(0)
 else:
 waveform = waveform.T
+_audio_cache[key] = (waveform, sample_rate)
 return waveform, sample_rate
 def _soundfile_call(self, file: dict) -> tuple:
@@ -56,7 +65,12 @@ def _patch_pyannote_audio() -> None:
 return _sf_load(file["audio"])
 def _soundfile_crop(self, file: dict, segment, **kwargs) -> tuple:
-"""Replacement for Audio.crop — load full file then slice."""
+"""Replacement for Audio.crop — load file once (cached) then slice.
+Pads short segments with zeros to match the expected duration,
+which pyannote requires for batched embedding extraction.
+"""
+duration = kwargs.get("duration", None)
 waveform, sample_rate = _sf_load(file["audio"])
 # Convert segment (seconds) to sample indices
 start_sample = int(segment.start * sample_rate)
@@ -65,6 +79,14 @@ def _patch_pyannote_audio() -> None:
 start_sample = max(0, start_sample)
 end_sample = min(waveform.shape[-1], end_sample)
 cropped = waveform[:, start_sample:end_sample]
+# Pad to expected duration if needed (pyannote batches require uniform size)
+if duration is not None:
+expected_samples = int(duration * sample_rate)
+else:
+expected_samples = int((segment.end - segment.start) * sample_rate)
+if cropped.shape[-1] < expected_samples:
+pad = torch.zeros(cropped.shape[0], expected_samples - cropped.shape[-1])
+cropped = torch.cat([cropped, pad], dim=-1)
 return cropped, sample_rate
 Audio.__call__ = _soundfile_call # type: ignore[assignment]
@@ -266,13 +288,20 @@ class DiarizeService:
 thread.start()
 elapsed = 0.0
-estimated_total = max(audio_duration_sec * 0.5, 30.0) if audio_duration_sec else 120.0
-while not done_event.wait(timeout=2.0):
-elapsed += 2.0
+estimated_total = max(audio_duration_sec * 0.8, 30.0) if audio_duration_sec else 120.0
+duration_str = ""
+if audio_duration_sec and audio_duration_sec > 600:
+mins = int(audio_duration_sec / 60)
+duration_str = f" ({mins}min audio, this may take a while)"
+while not done_event.wait(timeout=5.0):
+elapsed += 5.0
 pct = min(20 + int((elapsed / estimated_total) * 65), 85)
+elapsed_min = int(elapsed / 60)
+elapsed_sec = int(elapsed % 60)
+time_str = f"{elapsed_min}m{elapsed_sec:02d}s" if elapsed_min > 0 else f"{int(elapsed)}s"
 write_message(progress_message(
 request_id, pct, "diarizing",
-f"Analyzing speakers ({int(elapsed)}s elapsed)..."))
+f"Analyzing speakers ({time_str} elapsed){duration_str}"))
 thread.join()
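The progress estimate in this hunk can be exercised on its own. The sketch below copies the formulas from the new side of the diff into a hypothetical standalone function, `diarize_progress`, so the percentage cap and the elapsed-time formatting are easy to try out.

```python
from typing import Optional, Tuple


def diarize_progress(elapsed: float, audio_duration_sec: Optional[float]) -> Tuple[int, str]:
    """Return (percent, elapsed-time label) using the formulas from the diff."""
    # Estimated total is 0.8x the audio duration, at least 30 s; 120 s if unknown.
    estimated_total = max(audio_duration_sec * 0.8, 30.0) if audio_duration_sec else 120.0
    # Diarization occupies the 20..85 percent band of the overall progress bar.
    pct = min(20 + int((elapsed / estimated_total) * 65), 85)
    minutes, seconds = divmod(int(elapsed), 60)
    time_str = f"{minutes}m{seconds:02d}s" if minutes > 0 else f"{seconds}s"
    return pct, time_str


print(diarize_progress(185, 600))     # 10-minute file, about 3 minutes in
print(diarize_progress(10, None))     # unknown duration
print(diarize_progress(10_000, 600))  # overrun: percentage stays capped
```

The cap at 85 means an overrunning job shows a stalled bar, not a bar that reaches the end before the work has finished.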


@@ -113,17 +113,22 @@ class TranscribeService:
 compute_type: str = "int8",
 language: str | None = None,
 on_segment: Callable[[SegmentResult, int], None] | None = None,
+chunk_label: str | None = None,
 ) -> TranscriptionResult:
 """Transcribe an audio file with word-level timestamps.
 Sends progress messages via IPC during processing.
+If chunk_label is set (e.g. "Chunk 3/12"), messages are prefixed with it.
 """
-# Stage: loading model
-write_message(progress_message(request_id, 0, "loading_model", f"Loading {model_name}..."))
+prefix = f"{chunk_label}: " if chunk_label else ""
+# Stage: loading model (message skipped whenever chunk_label is set, which includes the first chunk)
+if not chunk_label:
+write_message(progress_message(request_id, 0, "loading_model", f"Loading {model_name}..."))
 model = self._ensure_model(model_name, device, compute_type)
 # Stage: transcribing
-write_message(progress_message(request_id, 10, "transcribing", "Starting transcription..."))
+write_message(progress_message(request_id, 10, "transcribing", f"{prefix}Starting transcription..."))
 start_time = time.time()
 segments_iter, info = model.transcribe(
@@ -176,7 +181,7 @@ class TranscribeService:
 request_id,
 progress_pct,
 "transcribing",
-f"Transcribing segment {segment_count} ({progress_pct}% of audio)...",
+f"{prefix}Transcribing segment {segment_count} ({progress_pct}% of audio)...",
 )
 )
@@ -271,6 +276,7 @@ class TranscribeService:
 chunk_result = self.transcribe(
 request_id, tmp.name, model_name, device,
 compute_type, language, on_segment=chunk_on_segment,
+chunk_label=f"Chunk {chunk_idx + 1}/{num_chunks}",
 )
 # Offset timestamps and merge


@@ -1,6 +1,6 @@
 [package]
 name = "voice-to-notes"
-version = "0.2.25"
+version = "0.2.35"
 description = "Voice to Notes — desktop transcription with speaker identification"
 authors = ["Voice to Notes Contributors"]
 license = "MIT"


@@ -0,0 +1,104 @@
use std::path::PathBuf;
use std::process::Command;
#[cfg(target_os = "windows")]
use std::os::windows::process::CommandExt;
/// Extract audio from a video file to a WAV file using ffmpeg.
/// Returns the path to the extracted audio file.
#[tauri::command]
pub fn extract_audio(file_path: String) -> Result<String, String> {
let input = PathBuf::from(&file_path);
if !input.exists() {
return Err(format!("File not found: {}", file_path));
}
// Output to a WAV file in the system temp dir, named after the input file's stem
let stem = input.file_stem().unwrap_or_default().to_string_lossy();
let output = std::env::temp_dir().join(format!("{stem}_audio.wav"));
eprintln!(
"[media] Extracting audio: {} -> {}",
input.display(),
output.display()
);
// Find ffmpeg — check sidecar extract dir first, then system PATH
let ffmpeg = find_ffmpeg().ok_or("ffmpeg not found. Install ffmpeg or ensure it's in PATH.")?;
let mut cmd = Command::new(&ffmpeg);
cmd.args([
"-y", // Overwrite output
"-i",
&file_path,
"-vn", // No video
"-acodec",
"pcm_s16le", // WAV PCM 16-bit
"-ar",
"16000", // 16kHz (optimal for whisper)
"-ac",
"1", // Mono
])
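.arg(&output) // pass the path as an OsStr; avoids unwrap() panicking on a non-UTF-8 path
.stdout(std::process::Stdio::null())
.stderr(std::process::Stdio::null()); // status() never drains a pipe, so a piped stderr could fill and stall ffmpeg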
// Hide the console window on Windows (CREATE_NO_WINDOW = 0x08000000)
#[cfg(target_os = "windows")]
cmd.creation_flags(0x08000000);
let status = cmd
.status()
.map_err(|e| format!("Failed to run ffmpeg: {e}"))?;
if !status.success() {
return Err(format!("ffmpeg exited with status {status}"));
}
if !output.exists() {
return Err("ffmpeg completed but output file not found".to_string());
}
eprintln!("[media] Audio extracted successfully");
Ok(output.to_string_lossy().to_string())
}
/// Find ffmpeg binary — check sidecar directory first, then system PATH.
fn find_ffmpeg() -> Option<String> {
// Check sidecar extract dir (ffmpeg is bundled with the sidecar)
if let Some(data_dir) = crate::sidecar::DATA_DIR.get() {
// Read sidecar version to find the right directory
let version_file = data_dir.join("sidecar-version.txt");
if let Ok(version) = std::fs::read_to_string(&version_file) {
let version = version.trim();
let sidecar_dir = data_dir.join(format!("sidecar-{version}"));
let ffmpeg_name = if cfg!(target_os = "windows") {
"ffmpeg.exe"
} else {
"ffmpeg"
};
let ffmpeg_path = sidecar_dir.join(ffmpeg_name);
if ffmpeg_path.exists() {
return Some(ffmpeg_path.to_string_lossy().to_string());
}
}
}
// Fall back to system PATH
let ffmpeg_name = if cfg!(target_os = "windows") {
"ffmpeg.exe"
} else {
"ffmpeg"
};
if Command::new(ffmpeg_name)
.arg("-version")
.stdout(std::process::Stdio::null())
.stderr(std::process::Stdio::null())
.status()
.is_ok()
{
return Some(ffmpeg_name.to_string());
}
None
}


@@ -1,5 +1,6 @@
 pub mod ai;
 pub mod export;
+pub mod media;
 pub mod project;
 pub mod settings;
 pub mod sidecar;


@@ -9,6 +9,7 @@ use tauri::Manager;
 use commands::ai::{ai_chat, ai_configure, ai_list_providers};
 use commands::export::export_transcript;
+use commands::media::extract_audio;
 use commands::project::{
 create_project, delete_project, get_project, list_projects, load_project_file,
 load_project_transcript, save_project_file, save_project_transcript, update_segment,
@@ -73,6 +74,7 @@ pub fn run() {
 check_sidecar_update,
 log_frontend,
 toggle_devtools,
+extract_audio,
 ])
 .run(tauri::generate_context!())
 .expect("error while running tauri application");


@@ -1,7 +1,7 @@
 {
 "$schema": "https://schema.tauri.app/config/2",
 "productName": "Voice to Notes",
-"version": "0.2.25",
+"version": "0.2.35",
 "identifier": "com.voicetonotes.app",
 "build": {
 "beforeDevCommand": "npm run dev",


@@ -1,7 +1,7 @@
 <script lang="ts">
 import { invoke } from '@tauri-apps/api/core';
 import { segments, speakers } from '$lib/stores/transcript';
-import { settings } from '$lib/stores/settings';
+import { settings, configureAIProvider } from '$lib/stores/settings';
 interface ChatMessage {
 role: 'user' | 'assistant';
@@ -45,22 +45,12 @@
 }));
 // Ensure the provider is configured with current credentials before chatting
-const s = $settings;
-const configMap: Record<string, Record<string, string>> = {
-openai: { api_key: s.openai_api_key, model: s.openai_model },
-anthropic: { api_key: s.anthropic_api_key, model: s.anthropic_model },
-litellm: { api_key: s.litellm_api_key, api_base: s.litellm_api_base, model: s.litellm_model },
-local: { model: s.local_model_path, base_url: 'http://localhost:8080' },
-};
-const config = configMap[s.ai_provider];
-if (config) {
-await invoke('ai_configure', { provider: s.ai_provider, config });
-}
+await configureAIProvider($settings);
 const result = await invoke<{ response: string }>('ai_chat', {
 messages: chatMessages,
 transcriptContext: getTranscriptContext(),
-provider: s.ai_provider,
+provider: $settings.ai_provider,
 });
 messages = [...messages, { role: 'assistant', content: result.response }];


@@ -4,9 +4,25 @@
 percent?: number;
 stage?: string;
 message?: string;
+onCancel?: () => void;
 }
-let { visible = false, percent = 0, stage = '', message = '' }: Props = $props();
+let { visible = false, percent = 0, stage = '', message = '', onCancel }: Props = $props();
+let showConfirm = $state(false);
+function handleCancelClick() {
+showConfirm = true;
+}
+function confirmCancel() {
+showConfirm = false;
+onCancel?.();
+}
+function dismissCancel() {
+showConfirm = false;
+}
 // Pipeline steps in order
 const pipelineSteps = [
@@ -89,6 +105,20 @@
 <p class="status-text">{message || 'Please wait...'}</p>
 <p class="hint-text">This may take several minutes for large files</p>
+{#if onCancel && !showConfirm}
+<button class="cancel-btn" onclick={handleCancelClick}>Cancel</button>
+{/if}
+{#if showConfirm}
+<div class="confirm-box">
+<p class="confirm-text">Processing is incomplete. If you cancel now, the transcription will need to be started over.</p>
+<div class="confirm-actions">
+<button class="confirm-keep" onclick={dismissCancel}>Continue Processing</button>
+<button class="confirm-cancel" onclick={confirmCancel}>Cancel Processing</button>
+</div>
+</div>
+{/if}
 </div>
 </div>
 {/if}
@@ -174,4 +204,62 @@
 font-size: 0.75rem;
 color: #555;
 }
+.cancel-btn {
+margin-top: 1.25rem;
+width: 100%;
+padding: 0.5rem;
+background: none;
+border: 1px solid #4a5568;
+color: #999;
+border-radius: 6px;
+cursor: pointer;
+font-size: 0.85rem;
+}
+.cancel-btn:hover {
+color: #e0e0e0;
+border-color: #e94560;
+}
+.confirm-box {
+margin-top: 1.25rem;
+padding: 0.75rem;
+background: rgba(233, 69, 96, 0.08);
+border: 1px solid #e94560;
+border-radius: 6px;
+}
+.confirm-text {
+margin: 0 0 0.75rem;
+font-size: 0.8rem;
+color: #e0e0e0;
+line-height: 1.4;
+}
+.confirm-actions {
+display: flex;
+gap: 0.5rem;
+}
+.confirm-keep {
+flex: 1;
+padding: 0.4rem;
+background: #0f3460;
+border: 1px solid #4a5568;
+color: #e0e0e0;
+border-radius: 4px;
+cursor: pointer;
+font-size: 0.8rem;
+}
+.confirm-keep:hover {
+background: #1a4a7a;
+}
+.confirm-cancel {
+flex: 1;
+padding: 0.4rem;
+background: #e94560;
+border: none;
+color: white;
+border-radius: 4px;
+cursor: pointer;
+font-size: 0.8rem;
+}
+.confirm-cancel:hover {
+background: #d63851;
+}
 </style>


@@ -57,6 +57,12 @@
 isReady = false;
 });
+wavesurfer.on('error', (err: Error) => {
+console.error('[voice-to-notes] WaveSurfer error:', err);
+isLoading = false;
+loadError = 'Failed to load audio';
+});
 if (audioUrl) {
 loadAudio(audioUrl);
 }


@@ -52,23 +52,27 @@ export async function loadSettings(): Promise<void> {
 }
 }
-export async function saveSettings(s: AppSettings): Promise<void> {
-settings.set(s);
-await invoke('save_settings', { settings: s });
-// Configure the AI provider in the Python sidecar
+export async function configureAIProvider(s: AppSettings): Promise<void> {
 const configMap: Record<string, Record<string, string>> = {
 openai: { api_key: s.openai_api_key, model: s.openai_model },
 anthropic: { api_key: s.anthropic_api_key, model: s.anthropic_model },
 litellm: { api_key: s.litellm_api_key, api_base: s.litellm_api_base, model: s.litellm_model },
-local: { model: s.ollama_model, base_url: s.ollama_url + '/v1' },
+local: { model: s.ollama_model, base_url: s.ollama_url.replace(/\/+$/, '') + '/v1' },
 };
 const config = configMap[s.ai_provider];
 if (config) {
 try {
 await invoke('ai_configure', { provider: s.ai_provider, config });
 } catch {
-// Sidecar may not be running yet — provider will be configured on first use
+// Sidecar may not be running yet
 }
 }
 }
+export async function saveSettings(s: AppSettings): Promise<void> {
+settings.set(s);
+await invoke('save_settings', { settings: s });
+// Configure the AI provider in the Python sidecar
+await configureAIProvider(s);
+}
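
The `local` entry above now strips trailing slashes from `ollama_url` before appending `/v1`, so a URL saved with a trailing slash no longer produces `//v1`. A minimal sketch of that normalization in isolation follows; the helper name `buildOllamaBaseUrl` is hypothetical and not part of the commit, and the URLs are example values only.

```typescript
// Hypothetical helper (not in the commit): mirrors the expression used for the
// `local` provider's base_url in configureAIProvider.
function buildOllamaBaseUrl(ollamaUrl: string): string {
  // Remove one or more trailing slashes so the result never ends in "//v1".
  return ollamaUrl.replace(/\/+$/, '') + '/v1';
}

// Example inputs (assumed values): with and without trailing slashes.
console.log(buildOllamaBaseUrl('http://localhost:11434'));    // http://localhost:11434/v1
console.log(buildOllamaBaseUrl('http://localhost:11434///')); // http://localhost:11434/v1
```

Only trailing slashes are touched; slashes elsewhere in the URL, including the `//` after the scheme, are left alone because the pattern is anchored to the end of the string.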


@@ -10,7 +10,7 @@
 import SettingsModal from '$lib/components/SettingsModal.svelte';
 import SidecarSetup from '$lib/components/SidecarSetup.svelte';
 import { segments, speakers } from '$lib/stores/transcript';
-import { settings, loadSettings } from '$lib/stores/settings';
+import { settings, loadSettings, configureAIProvider } from '$lib/stores/settings';
 import type { Segment, Speaker } from '$lib/types/transcript';
 import { onMount, tick } from 'svelte';
@@ -54,6 +54,7 @@
 function handleSidecarSetupComplete() {
 sidecarReady = true;
+configureAIProvider($settings);
 checkSidecarUpdate();
 }
@@ -71,6 +72,7 @@
 });
 checkSidecar().then(() => {
 if (sidecarReady) {
+configureAIProvider($settings);
 checkSidecarUpdate();
 }
 });
@@ -117,9 +119,22 @@
 };
 });
 let isTranscribing = $state(false);
+let transcriptionCancelled = $state(false);
 let transcriptionProgress = $state(0);
 let transcriptionStage = $state('');
 let transcriptionMessage = $state('');
+let extractingAudio = $state(false);
+function handleCancelProcessing() {
+transcriptionCancelled = true;
+isTranscribing = false;
+transcriptionProgress = 0;
+transcriptionStage = '';
+transcriptionMessage = '';
+// Clear any partial results
+segments.set([]);
+speakers.set([]);
+}
 // Speaker color palette for auto-assignment
 const speakerColors = ['#e94560', '#4ecdc4', '#ffe66d', '#a8e6cf', '#ff8b94', '#c7ceea', '#ffd93d', '#6bcb77'];
@@ -254,6 +269,8 @@
 // Changes persist when user saves the project file.
 }
+const VIDEO_EXTENSIONS = ['mp4', 'mkv', 'avi', 'mov', 'webm'];
 async function handleFileImport() {
 const filePath = await open({
 multiple: false,
@@ -265,9 +282,38 @@
 });
 if (!filePath) return;
-// Track the original file path and convert to asset URL for wavesurfer
+// For video files, extract audio first using ffmpeg
+const ext = filePath.split('.').pop()?.toLowerCase() ?? '';
+let audioPath = filePath;
+if (VIDEO_EXTENSIONS.includes(ext)) {
+extractingAudio = true;
+await tick();
+try {
+audioPath = await invoke<string>('extract_audio', { filePath });
+} catch (err) {
+console.error('[voice-to-notes] Failed to extract audio:', err);
+const msg = String(err);
+if (msg.includes('ffmpeg not found')) {
+alert(
+'FFmpeg is required to import video files.\n\n' +
+'Install FFmpeg:\n' +
+' Windows: winget install ffmpeg\n' +
+' macOS: brew install ffmpeg\n' +
+' Linux: sudo apt install ffmpeg\n\n' +
+'Then restart Voice to Notes and try again.'
+);
+} else {
+alert(`Failed to extract audio from video: ${msg}`);
+}
+return;
+} finally {
+extractingAudio = false;
+}
+}
+// Track the original file path (video or audio) for the sidecar
 audioFilePath = filePath;
-audioUrl = convertFileSrc(filePath);
+audioUrl = convertFileSrc(audioPath);
 waveformPlayer?.loadAudio(audioUrl);
 // Clear previous results
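
The hunk above decides whether to run audio extraction by taking the text after the last `.` in the path, lower-casing it, and checking it against `VIDEO_EXTENSIONS`. A minimal standalone sketch of that check follows; the extension list is copied from the diff, while the helper name `isVideoFile` and the sample paths are hypothetical.

```typescript
// Extension list copied from the diff; everything else here is illustrative.
const VIDEO_EXTENSIONS = ['mp4', 'mkv', 'avi', 'mov', 'webm'];

// Hypothetical helper: same expression as in handleFileImport.
function isVideoFile(filePath: string): boolean {
  // Text after the last '.', lower-cased so "MP4" and "mp4" compare equal.
  const ext = filePath.split('.').pop()?.toLowerCase() ?? '';
  return VIDEO_EXTENSIONS.includes(ext);
}

console.log(isVideoFile('/tmp/interview.MP4')); // true
console.log(isVideoFile('/tmp/interview.wav')); // false
```

One property of this approach worth knowing: a path that contains no dot at all yields the whole path as `ext`, so a bare file name such as `mp4` would be treated as a video. Absolute paths returned by a file dialog contain separators, so this is unlikely to matter in practice.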
@@ -276,6 +322,7 @@
 // Start pipeline (transcription + diarization)
 isTranscribing = true;
+transcriptionCancelled = false;
 transcriptionProgress = 0;
 transcriptionStage = 'Starting...';
 transcriptionMessage = 'Initializing pipeline...';
@@ -386,6 +433,9 @@
 numSpeakers: $settings.num_speakers && $settings.num_speakers > 0 ? $settings.num_speakers : undefined,
 });
+// If cancelled while processing, discard results
+if (transcriptionCancelled) return;
 // Create speaker entries from pipeline result
 const newSpeakers: Speaker[] = (result.speakers || []).map((label, idx) => ({
 id: `speaker-${idx}`,
@@ -573,8 +623,18 @@
 percent={transcriptionProgress}
 stage={transcriptionStage}
 message={transcriptionMessage}
+onCancel={handleCancelProcessing}
 />
+{#if extractingAudio}
+<div class="extraction-overlay">
+<div class="extraction-card">
+<div class="extraction-spinner"></div>
+<p>Extracting audio from video...</p>
+</div>
+</div>
+{/if}
 <SettingsModal
 visible={showSettings}
 onClose={() => showSettings = false}
@@ -781,4 +841,39 @@
 .update-dismiss:hover {
 color: #e0e0e0;
 }
+/* Audio extraction overlay */
+.extraction-overlay {
+position: fixed;
+inset: 0;
+background: rgba(0, 0, 0, 0.8);
+display: flex;
+align-items: center;
+justify-content: center;
+z-index: 9999;
+}
+.extraction-card {
+background: #16213e;
+padding: 2rem 2.5rem;
+border-radius: 12px;
+color: #e0e0e0;
+border: 1px solid #2a3a5e;
+box-shadow: 0 8px 32px rgba(0, 0, 0, 0.5);
+display: flex;
+flex-direction: column;
+align-items: center;
+gap: 1rem;
+}
+.extraction-card p {
+margin: 0;
+font-size: 1rem;
+}
+.extraction-spinner {
+width: 32px;
+height: 32px;
+border: 3px solid #2a3a5e;
+border-top-color: #e94560;
+border-radius: 50%;
+animation: spin 0.8s linear infinite;
+}
 </style>
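
The cancellation added in this diff works by setting a `transcriptionCancelled` flag and checking it once the awaited pipeline call returns. As far as the diff shows, the in-flight call itself is not aborted and its late result is simply discarded. A framework-free sketch of that pattern follows; the type and function names are hypothetical, and plain objects stand in for the Svelte `$state` variables and stores.

```typescript
// Hypothetical model of the cancel flag; not code from the commit.
interface PipelineState {
  cancelled: boolean;
  segments: string[];
}

// Mirrors the start of handleFileImport: reset the flag and clear old results.
function startRun(state: PipelineState): void {
  state.cancelled = false;
  state.segments = [];
}

// Mirrors handleCancelProcessing: set the flag and clear partial results.
function cancelRun(state: PipelineState): void {
  state.cancelled = true;
  state.segments = [];
}

// Called when the pipeline call finally resolves; returns whether the result was kept.
function applyResult(state: PipelineState, result: string[]): boolean {
  if (state.cancelled) return false; // discard results that arrive after a cancel
  state.segments = result;
  return true;
}

const state: PipelineState = { cancelled: false, segments: [] };
startRun(state);
cancelRun(state);
console.log(applyResult(state, ['late segment'])); // false
console.log(state.segments.length); // 0
```

Because the flag is a single boolean that each new run resets, a result from a cancelled run that resolves after a second import has started would not be filtered out by this check. A per-run identifier is one way to close that gap, though nothing in the diff indicates it is needed here.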