727107323cfc86bafce79b6b8d233334dce10abe
The display renders segment.words (not segment.text), so editing the text field alone had no visible effect. Now finishEditing() rebuilds the words array from the edited text so the change is immediately visible. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Voice to Notes
A desktop application that transcribes audio/video recordings with speaker identification, producing editable transcriptions with synchronized audio playback.
Features
- Speech-to-Text Transcription — Accurate transcription via faster-whisper (Whisper models) with word-level timestamps
- Speaker Identification (Diarization) — Detect and distinguish between speakers using pyannote.audio
- Synchronized Playback — Click any word to seek to that point in the audio (Web Audio API for instant playback)
- AI Integration — Ask questions about your transcript via OpenAI, Anthropic, or any OpenAI-compatible API (LiteLLM proxies, Ollama, vLLM)
- Export Formats — SRT, WebVTT, ASS captions, plain text, and Markdown with speaker labels
- Cross-Platform — Builds for Linux, Windows, and macOS (Apple Silicon)
Platform Support
| Platform | Architecture | Status |
|---|---|---|
| Linux | x86_64 | Supported |
| Windows | x86_64 | Supported |
| macOS | ARM (Apple Silicon) | Supported |
Tech Stack
- Desktop shell: Tauri v2 (Rust backend + Svelte 5 / TypeScript frontend)
- ML pipeline: Python sidecar (faster-whisper, pyannote.audio) — frozen via PyInstaller for distribution
- Audio playback: wavesurfer.js with Web Audio API backend
- AI providers: OpenAI, Anthropic, OpenAI-compatible endpoints (local or remote)
- Local AI: Bundled llama-server (llama.cpp)
- Caption export: pysubs2
Development
Prerequisites
- Node.js 20+
- Rust (stable)
- Python 3.11+ with ML dependencies
- System:
libgtk-3-dev,libwebkit2gtk-4.1-dev(Linux)
Getting Started
# Install frontend dependencies
npm install
# Install Python sidecar dependencies
cd python && pip install -e . && cd ..
# Run in dev mode (uses system Python for the sidecar)
npm run tauri:dev
Building for Distribution
# Build the frozen Python sidecar
npm run sidecar:build
# Build the Tauri app (requires sidecar in src-tauri/binaries/)
npm run tauri build
CI/CD
Gitea Actions workflows are in .gitea/workflows/. The build pipeline:
- Build sidecar — PyInstaller-frozen Python binary per platform (CPU-only PyTorch)
- Build Tauri app — Bundles the sidecar via
externalBin, produces .deb/.AppImage (Linux), .msi (Windows), .dmg (macOS)
Required Secrets
| Secret | Purpose | Required? |
|---|---|---|
TAURI_SIGNING_PRIVATE_KEY |
Signs Tauri update bundles | Optional (for auto-updates) |
No other secrets are needed for building. AI provider API keys and HuggingFace tokens are configured by end users in the app's Settings.
Project Structure
src/ # Svelte 5 frontend
src-tauri/ # Rust backend (Tauri commands, sidecar manager, SQLite)
python/ # Python sidecar (transcription, diarization, AI)
voice_to_notes/ # Python package
build_sidecar.py # PyInstaller build script
voice_to_notes.spec # PyInstaller spec
.gitea/workflows/ # Gitea Actions CI/CD
License
MIT
Releases
10
Voice to Notes v0.2.46
Latest
Languages
Python
36.6%
Svelte
30.3%
Rust
29.6%
TypeScript
2.2%
Shell
0.5%
Other
0.8%