chore: bump sidecar version to 1.0.13 [skip ci]

Show chunk context in transcription progress for large files
Files >1 hour are split into 5-minute chunks. Previously, each chunk showed "Starting transcription...", which made it look like the transcription had restarted. Progress now shows "Chunk 3/12: Starting transcription..." and "Chunk 3/12: Transcribing segment 5 (42% of audio)...".

Also skips the "Loading model..." message for chunks after the first, since the model is already loaded.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 14:58:07 +00:00 · 2026-03-23 07:57:59 -07:00 · 2026-03-23 14:31:13 +00:00 · 2026-03-23 07:31:07 -07:00 · 2026-03-23 13:42:26 +00:00 · 2026-03-23 06:33:03 -07:00
18 changed files with 496 additions and 62 deletions
--- a/.gitea/workflows/build-sidecar.yml
+++ b/.gitea/workflows/build-sidecar.yml
@@ -18,14 +18,34 @@ jobs:
    steps:
      - uses: actions/checkout@v4
        with:
-          fetch-depth: 0
+          fetch-depth: 2
      - name: Check for python changes
        id: check_changes
        run: |
          # If triggered by workflow_dispatch, always build
          if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
            echo "has_changes=true" >> $GITHUB_OUTPUT
            exit 0
          fi
          # Check if any python/ files changed in this commit
          CHANGED=$(git diff --name-only HEAD~1 HEAD -- python/ 2>/dev/null || echo "")
          if [ -n "$CHANGED" ]; then
            echo "has_changes=true" >> $GITHUB_OUTPUT
            echo "Python changes detected: $CHANGED"
          else
            echo "has_changes=false" >> $GITHUB_OUTPUT
            echo "No python/ changes detected, skipping sidecar build"
          fi
      - name: Configure git
        if: steps.check_changes.outputs.has_changes == 'true'
        run: |
          git config user.name "Gitea Actions"
          git config user.email "actions@gitea.local"
      - name: Bump sidecar patch version
        if: steps.check_changes.outputs.has_changes == 'true'
        id: bump
        run: |
          # Read current version from python/pyproject.toml
@@ -46,23 +66,6 @@ jobs:
          echo "version=${NEW_VERSION}" >> $GITHUB_OUTPUT
          echo "tag=sidecar-v${NEW_VERSION}" >> $GITHUB_OUTPUT
-      - name: Check for python changes
-        id: check_changes
-        run: |
-          # If triggered by workflow_dispatch, always build
-          if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
-            echo "has_changes=true" >> $GITHUB_OUTPUT
-            exit 0
-          fi
-          # Check if any python/ files changed in this commit
-          CHANGED=$(git diff --name-only HEAD~1 HEAD -- python/ || echo "")
-          if [ -n "$CHANGED" ]; then
-            echo "has_changes=true" >> $GITHUB_OUTPUT
-          else
-            echo "has_changes=false" >> $GITHUB_OUTPUT
-            echo "No python/ changes detected, skipping sidecar build"
-          fi
      - name: Commit and tag
        if: steps.check_changes.outputs.has_changes == 'true'
        env:
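The bump step's body falls outside the hunks shown here (only its first comment and its final `echo` lines are visible). As a minimal sketch, assuming the version sits in a `version = "X.Y.Z"` line of `python/pyproject.toml` and only the patch number is incremented, the same logic in Python might look like this; `bump_patch` and `bump_pyproject` are hypothetical names, not part of the repository.

```python
import re

# Assumed format: the first line of the form `version = "X.Y.Z"` in pyproject.toml.
VERSION_RE = re.compile(r'^version\s*=\s*"(\d+)\.(\d+)\.(\d+)"', re.MULTILINE)


def bump_patch(version: str) -> str:
    """Increment the patch component of an X.Y.Z version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


def bump_pyproject(text: str) -> tuple[str, str, str]:
    """Return (new_text, new_version, tag) for the contents of a pyproject.toml."""
    match = VERSION_RE.search(text)
    if match is None:
        raise ValueError("no version line found")
    old = ".".join(match.groups())
    new = bump_patch(old)
    # Replace only the matched `version = "..."` span, leaving the rest untouched.
    new_text = text[: match.start()] + f'version = "{new}"' + text[match.end():]
    return new_text, new, f"sidecar-v{new}"
```

On the checkout change: the change check diffs `HEAD~1` against `HEAD`, so the clone must contain the parent commit. With a depth-1 clone, `git diff` fails, the `|| echo ""` fallback yields an empty string, and a push would skip the build; `fetch-depth: 2` is the smallest depth that keeps the check working. It still inspects only the tip commit of a push.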
--- a/.gitea/workflows/cleanup-releases.yml
+++ b/.gitea/workflows/cleanup-releases.yml
@@ -0,0 +1,65 @@
 name: Cleanup Old Releases
 on:
  # Runs daily on a schedule (not triggered by the release/sidecar workflows); can also be run manually
  schedule:
    - cron: '0 6 * * *'  # Daily at 6am UTC
  workflow_dispatch:
 jobs:
  cleanup:
    name: Remove old releases
    runs-on: ubuntu-latest
    env:
      KEEP_COUNT: 5
    steps:
      - name: Cleanup old app releases
        env:
          BUILD_TOKEN: ${{ secrets.BUILD_TOKEN }}
        run: |
          REPO_API="${GITHUB_SERVER_URL}/api/v1/repos/${GITHUB_REPOSITORY}"
          # Get up to 50 releases, newest first (API default order); older ones are not seen in this run
          RELEASES=$(curl -s -H "Authorization: token ${BUILD_TOKEN}" \
            "${REPO_API}/releases?limit=50")
          # Separate app releases (v*) and sidecar releases (sidecar-v*)
          APP_IDS=$(echo "$RELEASES" | jq -r '[.[] | select(.tag_name | startswith("v") and (startswith("sidecar") | not)) | .id] | .[]')
          SIDECAR_IDS=$(echo "$RELEASES" | jq -r '[.[] | select(.tag_name | startswith("sidecar-v")) | .id] | .[]')
          # Delete app releases beyond KEEP_COUNT
          COUNT=0
          for ID in $APP_IDS; do
            COUNT=$((COUNT + 1))
            if [ $COUNT -le ${{ env.KEEP_COUNT }} ]; then
              continue
            fi
            TAG=$(echo "$RELEASES" | jq -r ".[] | select(.id == $ID) | .tag_name")
            echo "Deleting app release $ID ($TAG)..."
            curl -s -o /dev/null -w "HTTP %{http_code}\n" -X DELETE \
              -H "Authorization: token ${BUILD_TOKEN}" \
              "${REPO_API}/releases/$ID"
            # Also delete the tag
            curl -s -o /dev/null -X DELETE \
              -H "Authorization: token ${BUILD_TOKEN}" \
              "${REPO_API}/tags/$TAG"
          done
          # Delete sidecar releases beyond KEEP_COUNT
          COUNT=0
          for ID in $SIDECAR_IDS; do
            COUNT=$((COUNT + 1))
            if [ $COUNT -le ${{ env.KEEP_COUNT }} ]; then
              continue
            fi
            TAG=$(echo "$RELEASES" | jq -r ".[] | select(.id == $ID) | .tag_name")
            echo "Deleting sidecar release $ID ($TAG)..."
            curl -s -o /dev/null -w "HTTP %{http_code}\n" -X DELETE \
              -H "Authorization: token ${BUILD_TOKEN}" \
              "${REPO_API}/releases/$ID"
            curl -s -o /dev/null -X DELETE \
              -H "Authorization: token ${BUILD_TOKEN}" \
              "${REPO_API}/tags/$TAG"
          done
          echo "Cleanup complete. Kept latest ${{ env.KEEP_COUNT }} of each type."
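The two delete loops apply one retention rule to two tag prefixes. A minimal Python sketch of that selection rule, assuming `releases` is the newest-first list of objects with `id` and `tag_name` that the workflow reads from the releases endpoint (`ids_to_delete` is a hypothetical name):

```python
from collections.abc import Iterable

KEEP_COUNT = 5


def ids_to_delete(releases: Iterable[dict], keep: int = KEEP_COUNT) -> dict[str, list[int]]:
    """Group releases by tag prefix and return the IDs beyond the newest `keep`.

    `releases` must already be ordered newest first, as the workflow assumes.
    """
    groups: dict[str, list[int]] = {"app": [], "sidecar": []}
    for release in releases:
        tag = release["tag_name"]
        if tag.startswith("sidecar-v"):
            groups["sidecar"].append(release["id"])
        elif tag.startswith("v"):
            # Same as the jq filter: starts with "v" and is not a sidecar tag.
            groups["app"].append(release["id"])
        # Releases with any other tag prefix are left alone.
    return {kind: ids[keep:] for kind, ids in groups.items()}
```

Keeping the selection separate from the HTTP calls makes the retention rule testable without touching the API.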
--- a/docs/USER_GUIDE.md
+++ b/docs/USER_GUIDE.md
@@ -26,10 +26,13 @@ The sidecar only needs to be downloaded once. Updates are detected automatically
 ## Basic Workflow
-### 1. Import Audio
+### 1. Import Audio or Video
 - Click **Import Audio** or press **Ctrl+O** (Cmd+O on Mac)
- Supported formats: MP3, WAV, FLAC, OGG, M4A, AAC, WMA, MP4, MKV, AVI, MOV, WebM
+- **Audio formats:** MP3, WAV, FLAC, OGG, M4A, AAC, WMA
 - **Video formats:** MP4, MKV, AVI, MOV, WebM — audio is automatically extracted
 > **Note:** Video file import requires [FFmpeg](#installing-ffmpeg) to be installed on your system.
 ### 2. Transcribe
@@ -181,8 +184,42 @@ If you prefer cloud-based AI:
 ---
 ## Installing FFmpeg
 FFmpeg is required for importing video files (MP4, MKV, AVI, etc.). It's used to extract the audio track before transcription.
 **Windows:**
 ```
 winget install ffmpeg
 ```
 Or download from [ffmpeg.org/download.html](https://ffmpeg.org/download.html) and add to your PATH.
 **macOS:**
 ```
 brew install ffmpeg
 ```
 **Linux (Debian/Ubuntu):**
 ```
 sudo apt install ffmpeg
 ```
 **Linux (Fedora/RHEL):**
 ```
 sudo dnf install ffmpeg
 ```
 After installing, restart Voice to Notes. FFmpeg is not needed for audio-only files (MP3, WAV, FLAC, etc.).
 ---
 ## Troubleshooting
 ### Video import fails / "FFmpeg not found"
 - Install FFmpeg using the instructions above
 - Make sure `ffmpeg` is in your system PATH
 - Restart Voice to Notes after installing
 ### Transcription is slow
 - Use a smaller model (tiny or base)
 - If you have an NVIDIA GPU, select CUDA in Settings > Transcription > Device
--- a/package.json
+++ b/package.json
@@ -1,6 +1,6 @@
 {
  "name": "voice-to-notes",
-  "version": "0.2.25",
+  "version": "0.2.35",
  "description": "Desktop app for transcribing audio/video with speaker identification",
  "type": "module",
  "scripts": {
--- a/python/pyproject.toml
+++ b/python/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "voice-to-notes"
-version = "1.0.9"
+version = "1.0.13"
 description = "Python sidecar for Voice to Notes — transcription, diarization, and AI services"
 requires-python = ">=3.11"
 license = "MIT"
--- a/python/voice_to_notes/ipc/handlers.py
+++ b/python/voice_to_notes/ipc/handlers.py
@@ -254,15 +254,15 @@ def make_ai_chat_handler() -> HandlerFunc:
            )
        if action == "configure":
-            # Re-create a provider with custom settings
+            # Re-create a provider with custom settings and set it active
            provider_name = payload.get("provider", "")
            config = payload.get("config", {})
            if provider_name == "local":
                from voice_to_notes.providers.local_provider import LocalProvider
                service.register_provider("local", LocalProvider(
-                    base_url=config.get("base_url", "http://localhost:8080"),
+                    base_url=config.get("base_url", "http://localhost:11434/v1"),
-                    model=config.get("model", "local"),
+                    model=config.get("model", "llama3.2"),
                ))
            elif provider_name == "openai":
                from voice_to_notes.providers.openai_provider import OpenAIProvider
@@ -286,6 +286,10 @@ def make_ai_chat_handler() -> HandlerFunc:
                    api_key=config.get("api_key"),
                    api_base=config.get("api_base"),
                ))
            # Set the configured provider as active
            safe_config = {k: ("***" if "key" in k else v) for k, v in config.items()}
            print(f"[sidecar] Configured AI provider: {provider_name} with config: {safe_config}", file=sys.stderr, flush=True)
            if provider_name in ("local", "openai", "anthropic", "litellm"):
                service.set_active(provider_name)
            return IPCMessage(
                id=msg.id,
                type="ai.configured",
--- a/python/voice_to_notes/services/diarize.py
+++ b/python/voice_to_notes/services/diarize.py
@@ -41,14 +41,23 @@ def _patch_pyannote_audio() -> None:
        import torch
        from pyannote.audio.core.io import Audio
        # Cache loaded audio to avoid re-reading the entire file for every crop call.
        # For a 3-hour file, crop is called 1000+ times — without caching, each call
        # reads ~345MB from disk.
        _audio_cache: dict[str, tuple] = {}
        def _sf_load(audio_path: str) -> tuple:
-            """Load audio via soundfile, return (channels, samples) tensor + sample_rate."""
+            """Load audio via soundfile with caching."""
-            data, sample_rate = sf.read(str(audio_path), dtype="float32")
+            key = str(audio_path)
            if key in _audio_cache:
                return _audio_cache[key]
            data, sample_rate = sf.read(key, dtype="float32")
            waveform = torch.from_numpy(np.array(data))
            if waveform.ndim == 1:
                waveform = waveform.unsqueeze(0)
            else:
                waveform = waveform.T
            _audio_cache[key] = (waveform, sample_rate)
            return waveform, sample_rate
        def _soundfile_call(self, file: dict) -> tuple:
@@ -56,7 +65,12 @@ def _patch_pyannote_audio() -> None:
            return _sf_load(file["audio"])
        def _soundfile_crop(self, file: dict, segment, **kwargs) -> tuple:
-            """Replacement for Audio.crop — load full file then slice."""
+            """Replacement for Audio.crop — load file once (cached) then slice.
            Pads short segments with zeros to match the expected duration,
            which pyannote requires for batched embedding extraction.
            """
            duration = kwargs.get("duration", None)
            waveform, sample_rate = _sf_load(file["audio"])
            # Convert segment (seconds) to sample indices
            start_sample = int(segment.start * sample_rate)
@@ -65,6 +79,14 @@ def _patch_pyannote_audio() -> None:
            start_sample = max(0, start_sample)
            end_sample = min(waveform.shape[-1], end_sample)
            cropped = waveform[:, start_sample:end_sample]
            # Pad to expected duration if needed (pyannote batches require uniform size)
            if duration is not None:
                expected_samples = int(duration * sample_rate)
            else:
                expected_samples = int((segment.end - segment.start) * sample_rate)
            if cropped.shape[-1] < expected_samples:
                pad = torch.zeros(cropped.shape[0], expected_samples - cropped.shape[-1])
                cropped = torch.cat([cropped, pad], dim=-1)
            return cropped, sample_rate
        Audio.__call__ = _soundfile_call  # type: ignore[assignment]
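The crop replacement reduces to sample-index arithmetic: convert the segment to sample indices, clamp them to the loaded waveform, then zero-pad up to the expected length so pyannote can batch the results. A framework-free sketch of just that arithmetic (no torch); `crop_bounds` is a hypothetical helper, and the `end_sample` line hidden between hunks is assumed to be `int(segment.end * sample_rate)`:

```python
from __future__ import annotations


def crop_bounds(
    start_sec: float,
    end_sec: float,
    sample_rate: int,
    total_samples: int,
    duration: float | None = None,
) -> tuple[int, int, int]:
    """Return (start_sample, end_sample, pad_samples) for one crop request."""
    # Clamp the requested segment to the waveform that was actually loaded.
    start_sample = max(0, int(start_sec * sample_rate))
    end_sample = min(total_samples, int(end_sec * sample_rate))
    # Expected length comes from `duration` when given, else from the segment itself.
    if duration is not None:
        expected = int(duration * sample_rate)
    else:
        expected = int((end_sec - start_sec) * sample_rate)
    cropped_len = max(0, end_sample - start_sample)
    # Zero-padding needed so every crop in a batch has the same length.
    pad = max(0, expected - cropped_len)
    return start_sample, end_sample, pad
```

For example, a 10 s to 20 s window on a 15 s, 16 kHz file yields 80,000 real samples followed by 80,000 samples of padding.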
@@ -266,13 +288,20 @@ class DiarizeService:
        thread.start()
        elapsed = 0.0
-        estimated_total = max(audio_duration_sec * 0.5, 30.0) if audio_duration_sec else 120.0
+        estimated_total = max(audio_duration_sec * 0.8, 30.0) if audio_duration_sec else 120.0
-        while not done_event.wait(timeout=2.0):
+        duration_str = ""
-            elapsed += 2.0
+        if audio_duration_sec and audio_duration_sec > 600:
            mins = int(audio_duration_sec / 60)
            duration_str = f" ({mins}min audio, this may take a while)"
        while not done_event.wait(timeout=5.0):
            elapsed += 5.0
            pct = min(20 + int((elapsed / estimated_total) * 65), 85)
            elapsed_min = int(elapsed / 60)
            elapsed_sec = int(elapsed % 60)
            time_str = f"{elapsed_min}m{elapsed_sec:02d}s" if elapsed_min > 0 else f"{int(elapsed)}s"
            write_message(progress_message(
                request_id, pct, "diarizing",
-                f"Analyzing speakers ({int(elapsed)}s elapsed)..."))
+                f"Analyzing speakers ({time_str} elapsed){duration_str}"))
        thread.join()
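The polling loop derives both the percentage and the message from elapsed time alone. The same computation as a pure function (hypothetical name `diarize_progress`, copied from the loop body for illustration):

```python
from __future__ import annotations


def diarize_progress(elapsed: float, audio_duration_sec: float | None) -> tuple[int, str]:
    """Return (percent, message) as the diarization polling loop builds them."""
    # Estimated run time: 0.8x the audio duration (at least 30 s), or 120 s if unknown.
    estimated_total = max(audio_duration_sec * 0.8, 30.0) if audio_duration_sec else 120.0
    # Map elapsed/estimate onto the 20..85 band and cap at 85%.
    pct = min(20 + int((elapsed / estimated_total) * 65), 85)

    duration_str = ""
    if audio_duration_sec and audio_duration_sec > 600:
        duration_str = f" ({int(audio_duration_sec / 60)}min audio, this may take a while)"

    elapsed_min = int(elapsed / 60)
    elapsed_sec = int(elapsed % 60)
    time_str = f"{elapsed_min}m{elapsed_sec:02d}s" if elapsed_min > 0 else f"{int(elapsed)}s"
    return pct, f"Analyzing speakers ({time_str} elapsed){duration_str}"
```

With the 0.8x estimate, the bar moves slowly on long files: a 3-hour file is estimated at 8,640 s, so after two minutes it still reads 20%.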
--- a/python/voice_to_notes/services/transcribe.py
+++ b/python/voice_to_notes/services/transcribe.py
@@ -113,17 +113,22 @@ class TranscribeService:
        compute_type: str = "int8",
        language: str | None = None,
        on_segment: Callable[[SegmentResult, int], None] | None = None,
        chunk_label: str | None = None,
    ) -> TranscriptionResult:
        """Transcribe an audio file with word-level timestamps.
        Sends progress messages via IPC during processing.
        If chunk_label is set (e.g. "Chunk 3/12"), messages are prefixed with it.
        """
-        # Stage: loading model
+        prefix = f"{chunk_label}: " if chunk_label else ""
-        write_message(progress_message(request_id, 0, "loading_model", f"Loading {model_name}..."))
+
        # Stage: loading model (announced for unchunked runs and the first chunk only;
        # later chunks reuse the already-loaded model)
        if not chunk_label or chunk_label.startswith("Chunk 1/"):
            write_message(progress_message(request_id, 0, "loading_model", f"Loading {model_name}..."))
        model = self._ensure_model(model_name, device, compute_type)
        # Stage: transcribing
-        write_message(progress_message(request_id, 10, "transcribing", "Starting transcription..."))
+        write_message(progress_message(request_id, 10, "transcribing", f"{prefix}Starting transcription..."))
        start_time = time.time()
        segments_iter, info = model.transcribe(
@@ -176,7 +181,7 @@ class TranscribeService:
                    request_id,
                    progress_pct,
                    "transcribing",
-                    f"Transcribing segment {segment_count} ({progress_pct}% of audio)...",
+                    f"{prefix}Transcribing segment {segment_count} ({progress_pct}% of audio)...",
                )
            )
@@ -271,6 +276,7 @@ class TranscribeService:
                chunk_result = self.transcribe(
                    request_id, tmp.name, model_name, device,
                    compute_type, language, on_segment=chunk_on_segment,
                    chunk_label=f"Chunk {chunk_idx + 1}/{num_chunks}",
                )
                # Offset timestamps and merge
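Most of the chunking loop lies outside the hunks shown; only the `chunk_label` argument and the "Offset timestamps and merge" step are visible. A hedged Python sketch of the flow the commit message describes, assuming files longer than one hour are cut into back-to-back 5-minute chunks with no overlap. `plan_chunks`, `progress_text`, and `offset_segment` are hypothetical helpers, and the real service may handle chunk boundaries differently:

```python
from __future__ import annotations

CHUNK_SEC = 300.0             # 5-minute chunks (per the commit message)
CHUNK_THRESHOLD_SEC = 3600.0  # only files longer than 1 hour are chunked


def plan_chunks(duration_sec: float) -> list[tuple[float, float]]:
    """Return (start, end) offsets in seconds for each chunk."""
    if duration_sec <= CHUNK_THRESHOLD_SEC:
        return [(0.0, duration_sec)]
    chunks: list[tuple[float, float]] = []
    start = 0.0
    while start < duration_sec:
        end = min(start + CHUNK_SEC, duration_sec)
        chunks.append((start, end))
        start = end
    return chunks


def chunk_label(chunk_idx: int, num_chunks: int) -> str:
    # Same format string as the call site above.
    return f"Chunk {chunk_idx + 1}/{num_chunks}"


def progress_text(message: str, label: str | None = None) -> str:
    # Mirrors `prefix = f"{chunk_label}: " if chunk_label else ""`.
    prefix = f"{label}: " if label else ""
    return f"{prefix}{message}"


def offset_segment(start: float, end: float, chunk_start: float) -> tuple[float, float]:
    """Shift chunk-relative timestamps back onto the original file's timeline."""
    return start + chunk_start, end + chunk_start
```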
--- a/src-tauri/Cargo.toml
+++ b/src-tauri/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "voice-to-notes"
-version = "0.2.25"
+version = "0.2.35"
 description = "Voice to Notes — desktop transcription with speaker identification"
 authors = ["Voice to Notes Contributors"]
 license = "MIT"
--- a/src-tauri/src/commands/media.rs
+++ b/src-tauri/src/commands/media.rs
@@ -0,0 +1,104 @@
 use std::path::PathBuf;
 use std::process::Command;
 #[cfg(target_os = "windows")]
 use std::os::windows::process::CommandExt;
 /// Extract audio from a video file to a WAV file using ffmpeg.
 /// Returns the path to the extracted audio file.
 #[tauri::command]
 pub fn extract_audio(file_path: String) -> Result<String, String> {
    let input = PathBuf::from(&file_path);
    if !input.exists() {
        return Err(format!("File not found: {}", file_path));
    }
    // Write the extracted audio to a WAV file in the system temp directory
    let stem = input.file_stem().unwrap_or_default().to_string_lossy();
    let output = std::env::temp_dir().join(format!("{stem}_audio.wav"));
    eprintln!(
        "[media] Extracting audio: {} -> {}",
        input.display(),
        output.display()
    );
    // Find ffmpeg — check sidecar extract dir first, then system PATH
    let ffmpeg = find_ffmpeg().ok_or("ffmpeg not found. Install ffmpeg or ensure it's in PATH.")?;
    let mut cmd = Command::new(&ffmpeg);
    cmd.args([
            "-y",             // Overwrite output
            "-i",
            &file_path,
            "-vn",            // No video
            "-acodec",
            "pcm_s16le",      // WAV PCM 16-bit
            "-ar",
            "16000",          // 16kHz (optimal for whisper)
            "-ac",
            "1",              // Mono
        ])
        .arg(&output) // PathBuf implements AsRef<OsStr>, so non-UTF-8 paths don't panic
        .stdout(std::process::Stdio::null())
        .stderr(std::process::Stdio::piped());
    // Hide the console window on Windows (CREATE_NO_WINDOW = 0x08000000)
    #[cfg(target_os = "windows")]
    cmd.creation_flags(0x08000000);
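    // Use output() rather than status(): stderr is piped, so it must be drained,
    // otherwise a verbose ffmpeg can block once the pipe buffer fills.
    let result = cmd
        .output()
        .map_err(|e| format!("Failed to run ffmpeg: {e}"))?;
    if !result.status.success() {
        let stderr = String::from_utf8_lossy(&result.stderr);
        return Err(format!("ffmpeg exited with status {}: {}", result.status, stderr.trim()));
    }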
    if !output.exists() {
        return Err("ffmpeg completed but output file not found".to_string());
    }
    eprintln!("[media] Audio extracted successfully");
    Ok(output.to_string_lossy().to_string())
 }
 /// Find ffmpeg binary — check sidecar directory first, then system PATH.
 fn find_ffmpeg() -> Option<String> {
    // Check sidecar extract dir (ffmpeg is bundled with the sidecar)
    if let Some(data_dir) = crate::sidecar::DATA_DIR.get() {
        // Read sidecar version to find the right directory
        let version_file = data_dir.join("sidecar-version.txt");
        if let Ok(version) = std::fs::read_to_string(&version_file) {
            let version = version.trim();
            let sidecar_dir = data_dir.join(format!("sidecar-{version}"));
            let ffmpeg_name = if cfg!(target_os = "windows") {
                "ffmpeg.exe"
            } else {
                "ffmpeg"
            };
            let ffmpeg_path = sidecar_dir.join(ffmpeg_name);
            if ffmpeg_path.exists() {
                return Some(ffmpeg_path.to_string_lossy().to_string());
            }
        }
    }
    // Fall back to system PATH
    let ffmpeg_name = if cfg!(target_os = "windows") {
        "ffmpeg.exe"
    } else {
        "ffmpeg"
    };
    if Command::new(ffmpeg_name)
        .arg("-version")
        .stdout(std::process::Stdio::null())
        .stderr(std::process::Stdio::null())
        .status()
        .is_ok()
    {
        return Some(ffmpeg_name.to_string());
    }
    None
 }
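`find_ffmpeg` looks for a copy bundled with the extracted sidecar (`<data_dir>/sidecar-<version>/ffmpeg[.exe]`, with the version read from `sidecar-version.txt`) before falling back to the system. A Python sketch of the same lookup order, for illustration only: it swaps the Rust fallback's `ffmpeg -version` probe for `shutil.which`, which searches `PATH` without starting a process and returns a full path rather than the bare command name.

```python
from __future__ import annotations

import os
import shutil
from pathlib import Path


def ffmpeg_binary_name() -> str:
    return "ffmpeg.exe" if os.name == "nt" else "ffmpeg"


def find_ffmpeg(data_dir: Path | None) -> str | None:
    """Prefer the sidecar-bundled ffmpeg, then whatever is on PATH."""
    if data_dir is not None:
        try:
            version = (data_dir / "sidecar-version.txt").read_text().strip()
        except OSError:
            version = ""  # no version file: no bundled copy to look for
        if version:
            bundled = data_dir / f"sidecar-{version}" / ffmpeg_binary_name()
            if bundled.exists():
                return str(bundled)
    # shutil.which honours PATH (and PATHEXT on Windows).
    return shutil.which("ffmpeg")
```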
--- a/src-tauri/src/commands/mod.rs
+++ b/src-tauri/src/commands/mod.rs
@@ -1,5 +1,6 @@
 pub mod ai;
 pub mod export;
 pub mod media;
 pub mod project;
 pub mod settings;
 pub mod sidecar;
--- a/src-tauri/src/lib.rs
+++ b/src-tauri/src/lib.rs
@@ -9,6 +9,7 @@ use tauri::Manager;
 use commands::ai::{ai_chat, ai_configure, ai_list_providers};
 use commands::export::export_transcript;
 use commands::media::extract_audio;
 use commands::project::{
    create_project, delete_project, get_project, list_projects, load_project_file,
    load_project_transcript, save_project_file, save_project_transcript, update_segment,
@@ -73,6 +74,7 @@ pub fn run() {
            check_sidecar_update,
            log_frontend,
            toggle_devtools,
            extract_audio,
        ])
        .run(tauri::generate_context!())
        .expect("error while running tauri application");
--- a/src-tauri/tauri.conf.json
+++ b/src-tauri/tauri.conf.json
@@ -1,7 +1,7 @@
 {
  "$schema": "https://schema.tauri.app/config/2",
  "productName": "Voice to Notes",
-  "version": "0.2.25",
+  "version": "0.2.35",
  "identifier": "com.voicetonotes.app",
  "build": {
    "beforeDevCommand": "npm run dev",
--- a/src/lib/components/AIChatPanel.svelte
+++ b/src/lib/components/AIChatPanel.svelte
@@ -1,7 +1,7 @@
 <script lang="ts">
  import { invoke } from '@tauri-apps/api/core';
  import { segments, speakers } from '$lib/stores/transcript';
-  import { settings } from '$lib/stores/settings';
+  import { settings, configureAIProvider } from '$lib/stores/settings';
  interface ChatMessage {
    role: 'user' | 'assistant';
@@ -45,22 +45,12 @@
      }));
      // Ensure the provider is configured with current credentials before chatting
-      const s = $settings;
+      await configureAIProvider($settings);
-      const configMap: Record<string, Record<string, string>> = {
-        openai: { api_key: s.openai_api_key, model: s.openai_model },
-        anthropic: { api_key: s.anthropic_api_key, model: s.anthropic_model },
-        litellm: { api_key: s.litellm_api_key, api_base: s.litellm_api_base, model: s.litellm_model },
-        local: { model: s.local_model_path, base_url: 'http://localhost:8080' },
-      };
-      const config = configMap[s.ai_provider];
-      if (config) {
-        await invoke('ai_configure', { provider: s.ai_provider, config });
-      }
      const result = await invoke<{ response: string }>('ai_chat', {
        messages: chatMessages,
        transcriptContext: getTranscriptContext(),
-        provider: s.ai_provider,
+        provider: $settings.ai_provider,
      });
      messages = [...messages, { role: 'assistant', content: result.response }];
--- a/src/lib/components/ProgressOverlay.svelte
+++ b/src/lib/components/ProgressOverlay.svelte
@@ -4,9 +4,25 @@
    percent?: number;
    stage?: string;
    message?: string;
    onCancel?: () => void;
  }
-  let { visible = false, percent = 0, stage = '', message = '' }: Props = $props();
+  let { visible = false, percent = 0, stage = '', message = '', onCancel }: Props = $props();
  let showConfirm = $state(false);
  function handleCancelClick() {
    showConfirm = true;
  }
  function confirmCancel() {
    showConfirm = false;
    onCancel?.();
  }
  function dismissCancel() {
    showConfirm = false;
  }
  // Pipeline steps in order
  const pipelineSteps = [
@@ -89,6 +105,20 @@
      <p class="status-text">{message || 'Please wait...'}</p>
      <p class="hint-text">This may take several minutes for large files</p>
      {#if onCancel && !showConfirm}
        <button class="cancel-btn" onclick={handleCancelClick}>Cancel</button>
      {/if}
      {#if showConfirm}
        <div class="confirm-box">
          <p class="confirm-text">Processing is incomplete. If you cancel now, the transcription will need to be started over.</p>
          <div class="confirm-actions">
            <button class="confirm-keep" onclick={dismissCancel}>Continue Processing</button>
            <button class="confirm-cancel" onclick={confirmCancel}>Cancel Processing</button>
          </div>
        </div>
      {/if}
    </div>
  </div>
 {/if}
@@ -174,4 +204,62 @@
    font-size: 0.75rem;
    color: #555;
  }
  .cancel-btn {
    margin-top: 1.25rem;
    width: 100%;
    padding: 0.5rem;
    background: none;
    border: 1px solid #4a5568;
    color: #999;
    border-radius: 6px;
    cursor: pointer;
    font-size: 0.85rem;
  }
  .cancel-btn:hover {
    color: #e0e0e0;
    border-color: #e94560;
  }
  .confirm-box {
    margin-top: 1.25rem;
    padding: 0.75rem;
    background: rgba(233, 69, 96, 0.08);
    border: 1px solid #e94560;
    border-radius: 6px;
  }
  .confirm-text {
    margin: 0 0 0.75rem;
    font-size: 0.8rem;
    color: #e0e0e0;
    line-height: 1.4;
  }
  .confirm-actions {
    display: flex;
    gap: 0.5rem;
  }
  .confirm-keep {
    flex: 1;
    padding: 0.4rem;
    background: #0f3460;
    border: 1px solid #4a5568;
    color: #e0e0e0;
    border-radius: 4px;
    cursor: pointer;
    font-size: 0.8rem;
  }
  .confirm-keep:hover {
    background: #1a4a7a;
  }
  .confirm-cancel {
    flex: 1;
    padding: 0.4rem;
    background: #e94560;
    border: none;
    color: white;
    border-radius: 4px;
    cursor: pointer;
    font-size: 0.8rem;
  }
  .confirm-cancel:hover {
    background: #d63851;
  }
 </style>
--- a/src/lib/components/WaveformPlayer.svelte
+++ b/src/lib/components/WaveformPlayer.svelte
@@ -57,6 +57,12 @@
      isReady = false;
    });
    wavesurfer.on('error', (err: Error) => {
      console.error('[voice-to-notes] WaveSurfer error:', err);
      isLoading = false;
      loadError = 'Failed to load audio';
    });
    if (audioUrl) {
      loadAudio(audioUrl);
    }
--- a/src/lib/stores/settings.ts
+++ b/src/lib/stores/settings.ts
@@ -52,23 +52,27 @@ export async function loadSettings(): Promise<void> {
  }
 }
-export async function saveSettings(s: AppSettings): Promise<void> {
+export async function configureAIProvider(s: AppSettings): Promise<void> {
  settings.set(s);
  await invoke('save_settings', { settings: s });
  // Configure the AI provider in the Python sidecar
  const configMap: Record<string, Record<string, string>> = {
    openai: { api_key: s.openai_api_key, model: s.openai_model },
    anthropic: { api_key: s.anthropic_api_key, model: s.anthropic_model },
    litellm: { api_key: s.litellm_api_key, api_base: s.litellm_api_base, model: s.litellm_model },
-    local: { model: s.ollama_model, base_url: s.ollama_url + '/v1' },
+    local: { model: s.ollama_model, base_url: s.ollama_url.replace(/\/+$/, '') + '/v1' },
  };
  const config = configMap[s.ai_provider];
  if (config) {
    try {
      await invoke('ai_configure', { provider: s.ai_provider, config });
    } catch {
-      // Sidecar may not be running yet — provider will be configured on first use
+      // Sidecar may not be running yet
    }
  }
 }
 export async function saveSettings(s: AppSettings): Promise<void> {
  settings.set(s);
  await invoke('save_settings', { settings: s });
  // Configure the AI provider in the Python sidecar
  await configureAIProvider(s);
 }
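The settings change normalizes the Ollama URL before appending the OpenAI-compatible `/v1` path, which is what prevents `http://localhost:11434//v1`. A Python equivalent of that one-liner (hypothetical function name):

```python
def ollama_openai_base_url(ollama_url: str) -> str:
    """Strip all trailing slashes, then append the OpenAI-compatible /v1 path."""
    # Same effect as `.replace(/\/+$/, '') + '/v1'` in settings.ts.
    return ollama_url.rstrip("/") + "/v1"
```

Like the TypeScript version, this assumes the stored URL does not already end in `/v1`; `http://localhost:11434/v1` would become `.../v1/v1`. When `config` omits `base_url` entirely, the sidecar falls back to its own default, `http://localhost:11434/v1`.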
--- a/src/routes/+page.svelte
+++ b/src/routes/+page.svelte
@@ -10,7 +10,7 @@
  import SettingsModal from '$lib/components/SettingsModal.svelte';
  import SidecarSetup from '$lib/components/SidecarSetup.svelte';
  import { segments, speakers } from '$lib/stores/transcript';
-  import { settings, loadSettings } from '$lib/stores/settings';
+  import { settings, loadSettings, configureAIProvider } from '$lib/stores/settings';
  import type { Segment, Speaker } from '$lib/types/transcript';
  import { onMount, tick } from 'svelte';
@@ -54,6 +54,7 @@
  function handleSidecarSetupComplete() {
    sidecarReady = true;
    configureAIProvider($settings);
    checkSidecarUpdate();
  }
@@ -71,6 +72,7 @@
    });
    checkSidecar().then(() => {
      if (sidecarReady) {
        configureAIProvider($settings);
        checkSidecarUpdate();
      }
    });
@@ -117,9 +119,22 @@
    };
  });
  let isTranscribing = $state(false);
  let transcriptionCancelled = $state(false);
  let transcriptionProgress = $state(0);
  let transcriptionStage = $state('');
  let transcriptionMessage = $state('');
  let extractingAudio = $state(false);
  function handleCancelProcessing() {
    transcriptionCancelled = true;
    isTranscribing = false;
    transcriptionProgress = 0;
    transcriptionStage = '';
    transcriptionMessage = '';
    // Clear any partial results
    segments.set([]);
    speakers.set([]);
  }
  // Speaker color palette for auto-assignment
  const speakerColors = ['#e94560', '#4ecdc4', '#ffe66d', '#a8e6cf', '#ff8b94', '#c7ceea', '#ffd93d', '#6bcb77'];
@@ -254,6 +269,8 @@
    // Changes persist when user saves the project file.
  }
  const VIDEO_EXTENSIONS = ['mp4', 'mkv', 'avi', 'mov', 'webm'];
  async function handleFileImport() {
    const filePath = await open({
      multiple: false,
@@ -265,9 +282,38 @@
    });
    if (!filePath) return;
-    // Track the original file path and convert to asset URL for wavesurfer
+    // For video files, extract audio first using ffmpeg
    const ext = filePath.split('.').pop()?.toLowerCase() ?? '';
    let audioPath = filePath;
    if (VIDEO_EXTENSIONS.includes(ext)) {
      extractingAudio = true;
      await tick();
      try {
        audioPath = await invoke<string>('extract_audio', { filePath });
      } catch (err) {
        console.error('[voice-to-notes] Failed to extract audio:', err);
        const msg = String(err);
        if (msg.includes('ffmpeg not found')) {
          alert(
            'FFmpeg is required to import video files.\n\n' +
            'Install FFmpeg:\n' +
            '  Windows: winget install ffmpeg\n' +
            '  macOS: brew install ffmpeg\n' +
            '  Linux: sudo apt install ffmpeg\n\n' +
            'Then restart Voice to Notes and try again.'
          );
        } else {
          alert(`Failed to extract audio from video: ${msg}`);
        }
        return;
      } finally {
        extractingAudio = false;
      }
    }
    // Track the original file path (video or audio) for the sidecar
    audioFilePath = filePath;
-    audioUrl = convertFileSrc(filePath);
+    audioUrl = convertFileSrc(audioPath);
    waveformPlayer?.loadAudio(audioUrl);
    // Clear previous results
@@ -276,6 +322,7 @@
    // Start pipeline (transcription + diarization)
    isTranscribing = true;
    transcriptionCancelled = false;
    transcriptionProgress = 0;
    transcriptionStage = 'Starting...';
    transcriptionMessage = 'Initializing pipeline...';
@@ -386,6 +433,9 @@
        numSpeakers: $settings.num_speakers && $settings.num_speakers > 0 ? $settings.num_speakers : undefined,
      });
      // If cancelled while processing, discard results
      if (transcriptionCancelled) return;
      // Create speaker entries from pipeline result
      const newSpeakers: Speaker[] = (result.speakers || []).map((label, idx) => ({
        id: `speaker-${idx}`,
@@ -573,8 +623,18 @@
    percent={transcriptionProgress}
    stage={transcriptionStage}
    message={transcriptionMessage}
    onCancel={handleCancelProcessing}
  />
  {#if extractingAudio}
    <div class="extraction-overlay">
      <div class="extraction-card">
        <div class="extraction-spinner"></div>
        <p>Extracting audio from video...</p>
      </div>
    </div>
  {/if}
  <SettingsModal
    visible={showSettings}
    onClose={() => showSettings = false}
@@ -781,4 +841,39 @@
  .update-dismiss:hover {
    color: #e0e0e0;
  }
  /* Audio extraction overlay */
  .extraction-overlay {
    position: fixed;
    inset: 0;
    background: rgba(0, 0, 0, 0.8);
    display: flex;
    align-items: center;
    justify-content: center;
    z-index: 9999;
  }
  .extraction-card {
    background: #16213e;
    padding: 2rem 2.5rem;
    border-radius: 12px;
    color: #e0e0e0;
    border: 1px solid #2a3a5e;
    box-shadow: 0 8px 32px rgba(0, 0, 0, 0.5);
    display: flex;
    flex-direction: column;
    align-items: center;
    gap: 1rem;
  }
  .extraction-card p {
    margin: 0;
    font-size: 1rem;
  }
  .extraction-spinner {
    width: 32px;
    height: 32px;
    border: 3px solid #2a3a5e;
    border-top-color: #e94560;
    border-radius: 50%;
    animation: spin 0.8s linear infinite;
  }
 </style>
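The import handler routes a file through `extract_audio` purely by its extension. The same check in Python, as a hypothetical helper for illustration; it uses `PurePath.suffix`, which yields an empty string for a name without a dot, whereas `split('.').pop()` returns the whole name in that case (harmless here, since a full path never equals a bare extension):

```python
from pathlib import PurePath

VIDEO_EXTENSIONS = {"mp4", "mkv", "avi", "mov", "webm"}


def needs_audio_extraction(file_path: str) -> bool:
    """True when the file's extension marks it as a video container."""
    # ".MP4" -> "mp4"; case-insensitive like the toLowerCase() call.
    suffix = PurePath(file_path).suffix.lower().lstrip(".")
    return suffix in VIDEO_EXTENSIONS
```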
Author	SHA1	Message	Date
Gitea Actions	73eab2e80c	chore: bump sidecar version to 1.0.13 [skip ci]	2026-03-23 14:58:07 +00:00
Claude	33ca3e4a28	Show chunk context in transcription progress for large files. Files >1 hour are split into 5-minute chunks. Previously each chunk showed "Starting transcription..." making it look like a restart. Now shows "Chunk 3/12: Starting transcription..." and "Chunk 3/12: Transcribing segment 5 (42% of audio)..." Also skips the "Loading model..." message for chunks after the first since the model is already loaded. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-23 07:57:59 -07:00
Gitea Actions	e65d8b0510	chore: bump version to 0.2.35 [skip ci]	2026-03-23 14:31:13 +00:00
Claude	a7364f2e50	Fix 's is not defined' in AIChatPanel Leftover reference to removed 's' variable — changed to $settings. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-23 07:31:07 -07:00
Gitea Actions	809acfc781	chore: bump version to 0.2.34 [skip ci]	2026-03-23 13:42:26 +00:00
Claude	96e9a6d38b	Fix Ollama: remove duplicate stale configMap in AIChatPanel AIChatPanel had its own hardcoded configMap with the old llama-server URL (localhost:8080) and field names (local_model_path). Every chat message reconfigured the provider with these wrong values, overriding the correct settings applied at startup. Fix: replace the duplicate with a call to the shared configureAIProvider(). Also strip trailing slashes from ollama_url before appending /v1 to prevent double-slash URLs (http://localhost:11434//v1). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-23 06:33:03 -07:00
Gitea Actions	ddfbd65478	chore: bump version to 0.2.33 [skip ci]	2026-03-23 13:24:46 +00:00
Gitea Actions	e80ee3a18f	chore: bump sidecar version to 1.0.12 [skip ci]	2026-03-23 13:24:34 +00:00
Claude	806586ae3d	Fix diarization performance for long files + better progress - Cache loaded audio in _sf_load() — previously the entire WAV file was re-read from disk for every 10s crop call. For a 3-hour file with 1000+ chunks, this meant ~345GB of disk reads. Now read once, cached. - Better progress messages for long files: show elapsed time in m:ss format, warn "(180min audio, this may take a while)" for files >10min - Increased progress poll interval from 2s to 5s (less noise) - Better time estimate: use 0.8x audio duration (was 0.5x) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-23 06:24:21 -07:00
Gitea Actions	999bdaa671	chore: bump version to 0.2.32 [skip ci]	2026-03-23 12:38:47 +00:00
Claude	b1d46fd42e	Add cancel button to processing overlay with confirmation - Cancel button on the progress overlay during transcription - Clicking Cancel shows confirmation: "Processing is incomplete. If you cancel now, the transcription will need to be started over." - "Continue Processing" dismisses the dialog, "Cancel Processing" stops - Cancel clears partial results (segments, speakers) and resets UI - Pipeline results are discarded if cancelled during processing Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-23 05:38:40 -07:00
Gitea Actions	818cbfa69c	chore: bump version to 0.2.31 [skip ci]	2026-03-23 12:30:19 +00:00
Claude	aa319eb823	Fix Ollama settings on startup + video extraction UX AI provider: - Extract configureAIProvider() from saveSettings for reuse - Call it on app startup after sidecar is ready (was only called on Save) - Call it after first-time sidecar download completes - Sidecar now receives correct Ollama URL/model immediately Video extraction: - Hide ffmpeg console window on Windows (CREATE_NO_WINDOW flag) - Show "Extracting audio from video..." overlay with spinner during extraction - UI stays responsive while ffmpeg runs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-23 05:30:14 -07:00
Gitea Actions	8faa336cbc	chore: bump version to 0.2.30 [skip ci]	2026-03-23 03:12:25 +00:00
Claude	02c70f90c8	Extract audio from video files before loading Video files (MP4, MKV, etc.) are now processed with ffmpeg to extract audio to a temp WAV file before loading into wavesurfer. This prevents the WebView crash caused by trying to fetch multi-GB files into memory. - New extract_audio Tauri command uses ffmpeg (sidecar-bundled or system) - Frontend detects video extensions and extracts audio automatically - User-friendly error if ffmpeg is not installed with install instructions - Reverted wavesurfer MediaElement approach in favor of clean extraction - Added FFmpeg install guide to USER_GUIDE.md Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-22 20:04:10 -07:00
Gitea Actions	66db827f17	chore: bump version to 0.2.29 [skip ci]	2026-03-23 02:55:23 +00:00
Gitea Actions	d9fcc9a5bd	chore: bump sidecar version to 1.0.11 [skip ci]	2026-03-23 02:55:17 +00:00
Claude	ca5dc98d24	Fix Ollama: set_active after configure + fix default URL The configure action registered the provider but never called set_active(), so the sidecar kept using the old/default provider. Also updated the local provider default from localhost:8080 to localhost:11434/v1 (Ollama). Added debug logging for configure. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-22 19:55:09 -07:00
Gitea Actions	da49c04119	chore: bump version to 0.2.28 [skip ci]	2026-03-23 01:30:57 +00:00
Gitea Actions	833ddb67de	chore: bump sidecar version to 1.0.10 [skip ci]	2026-03-23 01:30:51 +00:00
Claude	879a1f3fd6	Fix diarization tensor mismatch + fix sidecar build triggers Diarization: Audio.crop patch now pads short segments with zeros to match the expected duration. pyannote batches embeddings with vstack which requires uniform tensor sizes — the last segment of a file can be shorter than the 10s window. CI: Reordered sidecar workflow to check for python/ changes FIRST, before bumping version or configuring git. All subsequent steps are gated on has_changes. This prevents unnecessary version bumps and build runs when only app code changes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-22 18:30:43 -07:00
Gitea Actions	6f9dc9a95e	chore: bump version to 0.2.27 [skip ci]	2026-03-23 01:05:15 +00:00
Claude	2a9a7e42a3	Add daily workflow to clean up old releases (keep latest 5) Runs daily at 6am UTC and on manual dispatch. Separately tracks app releases (v) and sidecar releases (sidecar-v), keeping the latest 5 of each and deleting older ones along with their tags. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-22 18:05:08 -07:00
Gitea Actions	34b060a8a5	chore: bump version to 0.2.26 [skip ci]	2026-03-23 00:42:00 +00:00