Voice to Notes
A desktop application that transcribes audio and video recordings with speaker identification, synchronized playback, and AI-powered analysis. Export to SRT, WebVTT, ASS captions, plain text, or Markdown.
Features
- Speech-to-Text — Accurate transcription via faster-whisper with word-level timestamps. Supports 99 languages.
- Speaker Identification — Detect and label speakers using pyannote.audio. Rename speakers for clean exports.
- GPU Acceleration — CUDA support for NVIDIA GPUs (Windows/Linux). Falls back to CPU automatically.
- Synchronized Playback — Click any word to seek. Waveform visualization via wavesurfer.js.
- AI Chat — Ask questions about your transcript. Works with Ollama (local), OpenAI, Anthropic, or any OpenAI-compatible API.
- Export — SRT, WebVTT, ASS, plain text, Markdown — all with speaker labels.
- Cross-Platform — Linux, Windows, macOS (Apple Silicon).
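The automatic CPU fallback listed under GPU Acceleration could look roughly like the sketch below. This README does not show the app's actual detection logic, so this is only an assumption. `pick_device` and `pick_compute_type` are hypothetical helper names; `ctranslate2.get_cuda_device_count()` is the CTranslate2 call that reports visible CUDA devices, and CTranslate2 is the backend faster-whisper runs on.

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA device is visible, otherwise "cpu".

    Hypothetical helper: the real sidecar may detect GPUs differently.
    """
    try:
        import ctranslate2  # backend used by faster-whisper
    except ImportError:
        # Without the backend installed we cannot probe for a GPU.
        return "cpu"
    try:
        return "cuda" if ctranslate2.get_cuda_device_count() > 0 else "cpu"
    except Exception:
        # Treat any driver or runtime error as "no GPU available".
        return "cpu"


def pick_compute_type(device: str) -> str:
    """Pick a common precision for the device (assumed defaults)."""
    return "float16" if device == "cuda" else "int8"


if __name__ == "__main__":
    device = pick_device()
    print(device, pick_compute_type(device))
```

The resulting pair could then be passed to faster-whisper as `WhisperModel(model_size, device=device, compute_type=compute_type)`.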
Quick Start
- Download the installer from Releases
- On first launch, choose the CPU or CUDA sidecar (the AI engine is downloaded separately: ~500MB for CPU, ~2GB for CUDA)
- Import an audio/video file and click Transcribe
See the full User Guide for detailed setup and usage instructions.
Platform Support
| Platform | Architecture | Installers |
|---|---|---|
| Linux | x86_64 | .deb, .rpm |
| Windows | x86_64 | .msi, .exe (NSIS) |
| macOS | ARM (Apple Silicon) | .dmg |
Architecture
The app is split into two independently versioned components:
- App (v0.2.x) — Tauri desktop shell with Svelte frontend. Small installer (~50MB).
- Sidecar (v1.x) — Python ML engine (faster-whisper, pyannote.audio). Downloaded on first launch. CPU (~500MB) or CUDA (~2GB) variants.
This separation means app UI updates don't require re-downloading the sidecar, and sidecar updates don't require reinstalling the app.
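To illustrate how two release streams can share one repository, here is a minimal sketch that sorts release tags by component. It assumes app releases are tagged `v<major>.<minor>.<patch>` and sidecar releases `sidecar-v<major>.<minor>.<patch>`. Only the `sidecar-v*` prefix is stated in this README; the three-part format, the sample tags, and the names `parse_tag` and `latest` are hypothetical.

```python
import re
from typing import Iterable, NamedTuple, Optional


class Release(NamedTuple):
    component: str  # "app" or "sidecar"
    version: tuple[int, int, int]


# Assumed tag shapes: "v0.2.46" for the app, "sidecar-v1.0.3" for the sidecar.
_TAG = re.compile(r"^(sidecar-)?v(\d+)\.(\d+)\.(\d+)$")


def parse_tag(tag: str) -> Optional[Release]:
    """Parse a release tag, or return None if it matches neither stream."""
    match = _TAG.match(tag)
    if match is None:
        return None
    component = "sidecar" if match.group(1) else "app"
    version = (int(match.group(2)), int(match.group(3)), int(match.group(4)))
    return Release(component, version)


def latest(tags: Iterable[str], component: str) -> Optional[tuple[int, int, int]]:
    """Return the highest version among the tags of one component."""
    releases = (parse_tag(tag) for tag in tags)
    versions = [r.version for r in releases if r and r.component == component]
    return max(versions, default=None)


if __name__ == "__main__":
    sample_tags = ["v0.2.45", "v0.2.46", "sidecar-v1.0.2", "sidecar-v1.0.3"]
    print(latest(sample_tags, "app"))      # newest app release
    print(latest(sample_tags, "sidecar"))  # newest sidecar release
```

Because each component is compared only against its own tags, a new app tag never makes an installed sidecar look outdated, and vice versa.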
Tech Stack
| Component | Technology |
|---|---|
| Desktop shell | Tauri v2 (Rust + Svelte 5 / TypeScript) |
| Transcription | faster-whisper (CTranslate2) |
| Speaker ID | pyannote.audio 3.1 |
| Audio UI | wavesurfer.js |
| Transcript editor | TipTap (ProseMirror) |
| AI (local) | Ollama (any model) |
| AI (cloud) | OpenAI, Anthropic, OpenAI-compatible |
| Caption export | pysubs2 |
| Database | SQLite (rusqlite) |
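The table lists pysubs2 for caption export. To show what a speaker-labelled SRT export contains, the sketch below builds cues with the standard library only. It does not use pysubs2 and is not the project's exporter. The `Segment` type, its field names, and the `Speaker: text` cue layout are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Segment:
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording
    speaker: str  # display name, e.g. after renaming in the UI
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.speaker}: {seg.text}\n"
        )
    return "\n".join(cues)


if __name__ == "__main__":
    print(to_srt([
        Segment(0.0, 1.25, "Alice", "Hello."),
        Segment(1.5, 3.0, "Bob", "Hi there."),
    ]))
```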
Development
Prerequisites
- Node.js 20+
- Rust (stable)
- Python 3.11+ with uv or pip
- Linux: libgtk-3-dev, libwebkit2gtk-4.1-dev, libappindicator3-dev, librsvg2-dev
Getting Started
# Install frontend dependencies
npm install
# Install Python sidecar dependencies
cd python && pip install -e ".[dev]" && cd ..
# Run in dev mode (uses system Python for the sidecar)
npm run tauri:dev
Building
# Build the frozen Python sidecar (CPU-only)
cd python && python build_sidecar.py --cpu-only && cd ..
# Build with CUDA support
cd python && python build_sidecar.py --with-cuda && cd ..
# Build the Tauri app
npm run tauri build
CI/CD
Two Gitea Actions workflows in .gitea/workflows/:
release.yml — Triggers on push to main:
- Bumps app version (patch), creates git tag and Gitea release
- Builds lightweight app installers for all platforms (no sidecar bundled)
build-sidecar.yml — Triggers on changes to python/ or manual dispatch:
- Bumps sidecar version, creates sidecar-v* tag and release
- Builds CPU + CUDA variants for Linux/Windows, CPU for macOS
- Uploads as separate release assets
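The version-bump step in these workflows can be pictured with the small sketch below. It is not taken from the workflow files. It assumes three-part `major.minor.patch` versions, and the function names and sample version numbers are hypothetical. The `v` prefix for app tags is also an assumption, and the README does not say which part of the sidecar version gets bumped.

```python
def bump_patch(version: str) -> str:
    """Increment the patch part of a major.minor.patch version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


def app_tag(version: str) -> str:
    """Tag name for an app release (assumed 'v' prefix)."""
    return f"v{version}"


def sidecar_tag(version: str) -> str:
    """Tag name for a sidecar release, following the sidecar-v* pattern."""
    return f"sidecar-v{version}"


if __name__ == "__main__":
    next_app = bump_patch("0.2.46")
    print(app_tag(next_app))
    print(sidecar_tag("1.0.3"))
```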
Required Secrets
| Secret | Purpose |
|---|---|
| BUILD_TOKEN | Gitea API token for creating releases and pushing tags |
Project Structure
src/ # Svelte 5 frontend
lib/components/ # UI components (waveform, transcript editor, settings, etc.)
lib/stores/ # Svelte stores (settings, transcript state)
routes/ # SvelteKit pages
src-tauri/ # Rust backend
src/sidecar/ # Sidecar process manager (download, extract, IPC)
src/commands/ # Tauri command handlers
nsis-hooks.nsh # Windows uninstall cleanup
python/ # Python sidecar
voice_to_notes/ # Python package (transcription, diarization, AI, export)
build_sidecar.py # PyInstaller build script
voice_to_notes.spec # PyInstaller spec
.gitea/workflows/ # CI/CD (release.yml, build-sidecar.yml)
docs/ # Documentation
License