Commit Graph

24 Commits

Author SHA1 Message Date
Claude
aa319eb823 Fix Ollama settings on startup + video extraction UX
All checks were successful
Release / Bump version and tag (push) Successful in 3s
Release / Build App (macOS) (push) Successful in 1m18s
Release / Build App (Linux) (push) Successful in 3m44s
Release / Build App (Windows) (push) Successful in 3m57s
AI provider:
- Extract configureAIProvider() from saveSettings for reuse
- Call it on app startup after sidecar is ready (was only called on Save)
- Call it after first-time sidecar download completes
- Sidecar now receives correct Ollama URL/model immediately

Video extraction:
- Hide ffmpeg console window on Windows (CREATE_NO_WINDOW flag)
- Show "Extracting audio from video..." overlay with spinner during extraction
- UI stays responsive while ffmpeg runs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 05:30:14 -07:00
Claude
02c70f90c8 Extract audio from video files before loading
All checks were successful
Release / Bump version and tag (push) Successful in 3s
Release / Build App (macOS) (push) Successful in 1m17s
Release / Build App (Linux) (push) Successful in 4m53s
Release / Build App (Windows) (push) Successful in 3m45s
Video files (MP4, MKV, etc.) are now processed with ffmpeg to extract
audio to a temp WAV file before loading into wavesurfer. This prevents
the WebView crash caused by trying to fetch multi-GB files into memory.

- New extract_audio Tauri command uses ffmpeg (sidecar-bundled or system)
- Frontend detects video extensions and extracts audio automatically
- User-friendly error if ffmpeg is not installed with install instructions
- Reverted wavesurfer MediaElement approach in favor of clean extraction
- Added FFmpeg install guide to USER_GUIDE.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 20:04:10 -07:00
Claude
7f1fa1904c Make DevTools a toggle in Settings > Developer tab
Some checks failed
Release / Bump version and tag (push) Successful in 7s
Release / Build App (macOS) (push) Successful in 1m17s
Release / Build App (Windows) (push) Successful in 3m29s
Release / Build App (Linux) (push) Has been cancelled
- DevTools off by default (no more auto-open on launch)
- New "Developer" tab in Settings with a checkbox to toggle devtools
- Toggle takes effect immediately (opens/closes inspector)
- Setting persists: devtools restored on next launch if enabled
- toggle_devtools Tauri command wraps window.open/close_devtools

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 10:55:50 -07:00
Claude
e2c5db89b6 Enable devtools in release builds + add frontend logging
All checks were successful
Release / Bump version and tag (push) Successful in 3s
Release / Build App (macOS) (push) Successful in 1m16s
Release / Build App (Linux) (push) Successful in 4m30s
Release / Build App (Windows) (push) Successful in 3m20s
- Enable Tauri devtools feature so right-click Inspect works in release
- Open devtools automatically on launch for debugging
- Add log_frontend command: frontend can write to ~/.voicetonotes/frontend.log
- Sidecar logs go to %LOCALAPPDATA%/com.voicetonotes.app/sidecar.log
- Frontend logs go to %USERPROFILE%/.voicetonotes/frontend.log

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 09:16:29 -07:00
Claude
45247ae66e Decouple sidecar versioning from app versioning
Some checks failed
Build Sidecars / Bump sidecar version and tag (push) Successful in 3s
Release / Bump version and tag (push) Failing after 3s
Release / Build App (Linux) (push) Has been skipped
Release / Build App (Windows) (push) Has been skipped
Release / Build App (macOS) (push) Has been skipped
Build Sidecars / Build Sidecar (macOS) (push) Successful in 5m28s
Build Sidecars / Build Sidecar (Linux) (push) Successful in 13m54s
Build Sidecars / Build Sidecar (Windows) (push) Successful in 37m38s
Sidecar now has its own version (1.0.0) and release lifecycle:
- Sidecar tags: sidecar-v1.0.0, sidecar-v1.0.1, etc.
- App tags: v0.2.x (unchanged)
- Sidecar workflow triggers only on python/** changes or manual dispatch
- App release no longer bumps python/pyproject.toml

Sidecar version tracked via sidecar-version.txt in app data dir:
- resolve_sidecar_path() reads version from file instead of CARGO_PKG_VERSION
- download_sidecar() fetches latest sidecar-v* release from Gitea API
- check_sidecar_update() compares local vs remote sidecar versions
- Version file written after successful download

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 07:57:51 -07:00
Claude
9652290a06 Split CI into app + sidecar workflows, fix reqwest compilation
Some checks failed
Build Sidecars / Build Sidecar (macOS) (push) Successful in 3m39s
Release / Bump version and tag (push) Has been cancelled
Release / Build App (Linux) (push) Has been cancelled
Release / Build App (Windows) (push) Has been cancelled
Release / Build App (macOS) (push) Has been cancelled
Build Sidecars / Build Sidecar (Windows) (push) Has been cancelled
Build Sidecars / Build Sidecar (Linux) (push) Has been cancelled
CI split:
- release.yml: version bump + lightweight app builds (no Python/sidecar)
- build-sidecar.yml: builds CPU + CUDA sidecar variants per platform,
  uploads as separate release assets, runs in parallel with app builds
- Sidecar workflow uses retry loop to find release (race with version bump)

Fixes:
- Add reqwest "json" feature for .json() method
- Add explicit type annotations for reqwest Response and bytes::Bytes
- Reuse client instance for download (was using reqwest::get directly)

Bundle targets: deb, rpm, nsis, msi, dmg (all formats, app is small now)
Windows upload finds both *.msi and *-setup.exe

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 07:50:55 -07:00
Claude
7fa903ad01 Download sidecar on first launch instead of bundling
Some checks failed
Release / Bump version and tag (push) Successful in 13s
Release / Build (macOS) (push) Failing after 4m55s
Release / Build (Windows) (push) Failing after 14m58s
Release / Build (Linux) (push) Failing after 17m18s
Major refactor: sidecar is no longer bundled in the installer. Instead,
it's downloaded on first launch with a setup screen offering CPU vs CUDA
choice. This solves the 2GB+ installer size limit and decouples app/sidecar.

Backend:
- New commands: check_sidecar, download_sidecar, check_sidecar_update
- Streaming download with progress events via reqwest
- Added reqwest + futures-util dependencies
- Removed sidecar.zip from bundle resources
- Restored NSIS target (no longer size-constrained)

CI:
- Each platform builds both CPU and CUDA sidecar variants (except macOS: CPU only)
- Sidecar zips uploaded as separate release assets
- Asset naming: sidecar-{os}-{arch}-{variant}.zip

Frontend:
- SidecarSetup.svelte: first-launch setup with CPU/CUDA radio choice,
  progress bar, error/retry handling
- Update banner on launch if newer sidecar version available
- Conditional rendering: setup screen → main app flow

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 07:09:10 -07:00
Claude
27b705b5b6 File-based project save/load, AI chat formatting, text edit fix
Project files (.vtn):
- Save Project: serializes transcript, speakers, audio path to JSON file
- Open Project: loads .vtn file, restores audio/transcript/speakers
- User chooses filename and location via save dialog
- Replaces SQLite-based project persistence (DB commands remain for future use)
- Text edits update in-memory store immediately, persist on explicit save
- Fix Windows path separator in project name extraction

AI chat:
- Markdown rendering in assistant messages (headers, lists, bold, code)
- Better visual distinction with border-left accents
- Styled markdown elements for dark theme

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 22:17:35 -07:00
Claude
8e7d21d22b Persist segment text edits to database on Enter
- Add update_segment Tauri command (calls existing update_segment_text query)
- Wire onTextEdit handler from TranscriptEditor to invoke update_segment
- Edits are saved to SQLite immediately when user presses Enter

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 22:08:52 -07:00
Claude
61caa07e4c Add project save/load and improve AI chat formatting
Project persistence:
- save_project_transcript command: persists segments, speakers, words to SQLite
- load_project_transcript command: loads full transcript with nested words
- delete_project command: soft-delete projects
- Auto-save after pipeline completes (named from filename)
- Project dropdown in header to switch between saved transcripts
- Projects load audio, segments, and speakers from database

AI chat improvements:
- Markdown rendering in assistant messages (headers, lists, bold, italic, code)
- Better message spacing and visual distinction (border-left accents)
- Styled markdown elements matching dark theme
- Improved empty state and quick action button sizing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 22:06:29 -07:00
Claude
58faa83cb3 Cross-platform distribution, UI improvements, and performance optimizations
- PyInstaller frozen sidecar: spec file, build script, and ffmpeg path resolver
  for self-contained distribution without Python prerequisites
- Dual-mode sidecar launcher: frozen binary (production) with dev mode fallback
- Parallel transcription + diarization pipeline (~30-40% faster)
- GPU auto-detection for diarization (CUDA when available)
- Async run_pipeline command for real-time progress event delivery
- Web Audio API backend for instant playback and seeking
- OpenAI-compatible provider replacing LiteLLM client-side routing
- Cross-platform RAM detection (Linux/macOS/Windows)
- Settings: speaker count hint, token reveal toggles, dark dropdown styling
- Loading splash screen, flexbox layout fix for viewport overflow
- Gitea Actions CI/CD pipeline (Linux, Windows, macOS ARM)
- Updated README and CLAUDE.md documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 21:33:43 -07:00
Claude
42ccd3e21d Fix cargo fmt formatting and diarize threading test mock 2026-03-20 13:56:32 -07:00
Claude
c3b6ad38fd Merge perf/stream-segments: streaming partial transcript segments and speaker updates 2026-03-20 13:51:51 -07:00
67ed69df00 Stream transcript segments to frontend as they are transcribed
Send each segment to the frontend immediately after transcription via
a new pipeline.segment IPC message, then send speaker assignments as a
batch pipeline.speaker_update message after diarization completes. This
lets the UI display segments progressively instead of waiting for the
entire pipeline to finish.

Changes:
- Add partial_segment_message and speaker_update_message IPC factories
- Add on_segment callback parameter to TranscribeService.transcribe()
- Emit partial segments and speaker updates from PipelineService.run()
- Add send_and_receive_with_progress to SidecarManager (Rust)
- Route pipeline.segment/speaker_update events in run_pipeline command
- Listen for streaming events in Svelte frontend (+page.svelte)
- Add tests for new message types, callback signature, and update logic

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 13:47:57 -07:00
a3612c986d Add Test & Download button for diarization model, clickable links
- Add diarize.download IPC handler that downloads the pyannote model
  and returns user-friendly error messages (missing license, bad token)
- Add download_diarize_model Tauri command
- Add "Test & Download Model" button in Speakers settings tab
- Update instructions to list both required model licenses
  (speaker-diarization-3.1 AND segmentation-3.0)
- Make all HuggingFace URLs clickable (opens in system browser)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 18:21:42 -08:00
baf820286f Add HuggingFace token setting for speaker detection
- Add "Speakers" tab in Settings with HF token input field
- Include step-by-step instructions for obtaining the token
- Pass hf_token from settings through Rust → Python pipeline → diarize
- Token can also be set via HF_TOKEN environment variable as fallback
- Move skip_diarization checkbox to Speakers tab

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 18:08:51 -08:00
669d88f143 Fix progress feedback, diarization fallback, and dropdown readability
- Stream pipeline progress to frontend via Tauri events so the progress
  overlay updates in real time during transcription/diarization
- Gracefully fall back to transcription-only when diarization fails
  (e.g. pyannote not installed) instead of erroring the whole pipeline
- Add color-scheme: dark to fix native select/option elements rendering
  with unreadable white backgrounds

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 17:14:25 -08:00
d00281f0c7 Fix critical integration issues for end-to-end functionality
- Rewrite SidecarManager as singleton with OnceLock, reusing one Python
  process across all commands instead of spawning per call
- Separate stdin/stdout ownership with dedicated BufReader to prevent
  data corruption between wait_for_ready and send_and_receive
- Add ensure_running() for auto-start on first command
- Fix asset protocol URL: use convertFileSrc() instead of manual
  encodeURIComponent which broke file paths with slashes
- Add +layout.svelte with global dark theme, CSS reset, and custom
  scrollbar styling to prevent white flash on startup
- Register AppState with Tauri .manage(), initialize SQLite database
  on app startup at ~/.voicetonotes/voice_to_notes.db
- Wire project commands (create/get/list) to real database queries
  instead of placeholder stubs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 16:50:14 -08:00
97a1a15755 Phase 6: Llama-server manager, settings UI, packaging, and polish
- Implement LlamaManager in Rust for llama-server lifecycle: spawn with
  port allocation, health check, clean shutdown on Drop, model listing
- Add llama_start/stop/status/list_models Tauri commands
- Add load_settings/save_settings commands with JSON persistence
- Build SettingsModal with tabs for Transcription, AI Provider, Local AI
  settings (model size, device, language, API keys, provider selection)
- Wire settings into pipeline calls (model, device, language, skip diarization)
- Configure Tauri packaging: asset protocol for local audio files,
  CSP policy, bundle metadata, Linux .deb/.AppImage and Windows .msi config
- Add keyboard shortcuts: Space (play/pause), Ctrl+O (import),
  Ctrl+, (settings), Escape (close menus/modals)
- Close export dropdown on outside click
- Tests: 30 Python, 6 Rust, 0 Svelte errors

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 16:38:23 -08:00
d67625cd5a Phase 5: AI provider system with local and cloud support
- Implement AIProvider base interface with chat() and is_available()
- Add LocalProvider connecting to bundled llama-server via OpenAI SDK
- Add OpenAIProvider for direct OpenAI API access
- Add AnthropicProvider for Anthropic Claude API
- Add LiteLLMProvider for multi-provider gateway
- Build AIProviderService with provider routing, auto-selection,
  and transcript context injection
- Add ai.chat IPC handler supporting chat, list_providers, set_provider,
  and configure actions
- Add ai_chat, ai_list_providers, ai_configure Tauri commands
- Build interactive AIChatPanel with message history, quick actions
  (Summarize, Action Items), and transcript context awareness
- Tests: 30 Python, 6 Rust, 0 Svelte errors

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 16:25:10 -08:00
415a648a2b Phase 4: Export to SRT, WebVTT, ASS, plain text, and Markdown
- Implement ExportService using pysubs2 for caption formats (SRT, VTT, ASS)
  and custom formatters for plain text and Markdown
- SRT exports with [Speaker]: prefix, WebVTT with <v Speaker> voice tags,
  ASS with color-coded speaker styles
- Plain text groups by speaker with labels, Markdown adds timestamps
- Add export.start IPC handler and export_transcript Tauri command
- Add export dropdown menu in header (appears after transcription)
- Uses native save dialog for output file selection
- Add pysubs2 dependency
- Tests: 30 Python (6 export tests), 6 Rust, 0 Svelte errors

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 16:18:54 -08:00
44480906a4 Phase 3: Speaker diarization and full transcription pipeline
- Implement DiarizeService with pyannote.audio speaker detection
- Build PipelineService combining transcribe → diarize → merge with
  overlap-based speaker assignment per segment
- Add pipeline.start and diarize.start IPC handlers
- Add run_pipeline Tauri command for full pipeline execution
- Wire frontend to use pipeline: speakers auto-created with colors,
  segments assigned to detected speakers
- Build SpeakerManager with rename support (double-click or edit button)
- Add speaker color coding throughout transcript display
- Add pyannote.audio dependency
- Tests: 24 Python (including merge logic), 6 Rust, 0 Svelte errors

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 16:09:48 -08:00
48fe41b064 Phase 2: Core transcription pipeline and audio playback
- Implement faster-whisper TranscribeService with word-level timestamps,
  progress reporting, and hardware auto-detection
- Wire up Rust SidecarManager for Python process lifecycle (spawn, IPC, shutdown)
- Add transcribe_file Tauri command bridging frontend to Python sidecar
- Integrate wavesurfer.js WaveformPlayer with play/pause, skip, seek controls
- Build TranscriptEditor with word-level click-to-seek and active highlighting
- Connect file import flow: prompt → asset load → transcribe → display
- Add typed tauri-bridge service with TranscriptionResult interface
- Add Python tests for hardware detection and transcription result formatting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 15:53:09 -08:00
503cc6c0cf Phase 1 foundation: Tauri shell, Python sidecar, SQLite database
Tauri v2 + Svelte + TypeScript frontend:
- App shell with workspace layout (waveform, transcript, speakers, AI chat)
- Placeholder components for all major UI areas
- Typed stores (project, transcript, playback, AI)
- TypeScript interfaces matching the database schema
- Tauri bridge service with typed invoke wrappers
- svelte-check passes with 0 errors

Rust backend:
- Tauri v2 app entry point with command registration
- SQLite database layer (rusqlite with bundled SQLite)
  - Full schema: projects, media_files, speakers, segments, words,
    ai_outputs, annotations (with indexes)
  - Model structs with serde serialization
  - CRUD queries for projects, speakers, segments, words
  - Segment text editing preserves original text
  - Schema versioning for future migrations
  - 6 tests passing
- Command stubs for project, transcribe, export, AI, settings, system
- App state management

Python sidecar:
- JSON-line IPC protocol (stdin/stdout)
- Message types: IPCMessage, progress, error, ready
- Handler registry with routing and error handling
- Ping/pong handler for connectivity testing
- Service stubs: transcribe, diarize, pipeline, AI, export
- Provider stubs: local (llama-server), OpenAI, Anthropic, LiteLLM
- Hardware detection stubs
- 14 tests passing, ruff clean

Also adds:
- Testing strategy document (docs/TESTING.md)
- Validation script (scripts/validate.sh)
- Updated .gitignore for Svelte, Rust, Python artifacts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 15:16:06 -08:00