Voice to Notes
A desktop application that transcribes audio/video recordings with speaker identification, producing editable transcriptions with synchronized audio playback.
Features
- Speech-to-Text Transcription — Accurate transcription via faster-whisper (Whisper models) with word-level timestamps
- Speaker Identification (Diarization) — Detect and distinguish between speakers using pyannote.audio
- Synchronized Playback — Click any word to seek to that point in the audio (Web Audio API for instant playback)
- AI Integration — Ask questions about your transcript via OpenAI, Anthropic, or any OpenAI-compatible API (LiteLLM proxies, Ollama, vLLM)
- Export Formats — SRT, WebVTT, ASS captions, plain text, and Markdown with speaker labels
- Cross-Platform — Builds for Linux, Windows, and macOS (Apple Silicon)
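Word-level timestamps and diarization output can be combined by giving each word the speaker whose turn overlaps it the most. The following is a minimal sketch of that idea, not the app's actual implementation: the `Word` and `SpeakerTurn` types, the `assign_speakers` function, and the `UNKNOWN` fallback label are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the recording
    end: float


@dataclass
class SpeakerTurn:
    speaker: str
    start: float
    end: float


def assign_speakers(words, turns, unknown="UNKNOWN"):
    """Label each word with the speaker whose turn overlaps it the most.

    Hypothetical helper: words that overlap no turn get the `unknown` label.
    """
    labeled = []
    for word in words:
        best_speaker, best_overlap = unknown, 0.0
        for turn in turns:
            # Length of the intersection of the two time intervals
            # (negative or zero when they do not overlap).
            overlap = min(word.end, turn.end) - max(word.start, turn.start)
            if overlap > best_overlap:
                best_speaker, best_overlap = turn.speaker, overlap
        labeled.append((best_speaker, word.text))
    return labeled


words = [Word("Hello", 0.0, 0.4), Word("there", 0.5, 0.9), Word("Hi", 1.2, 1.5)]
turns = [SpeakerTurn("SPEAKER_00", 0.0, 1.0), SpeakerTurn("SPEAKER_01", 1.1, 2.0)]
print(assign_speakers(words, turns))
```

The nested loop is quadratic in the worst case, which is acceptable for a sketch; a real pipeline would likely walk both sorted lists in a single pass.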
Platform Support
| Platform | Architecture | Status |
|---|---|---|
| Linux | x86_64 | Supported |
| Windows | x86_64 | Supported |
| macOS | ARM (Apple Silicon) | Supported |
Tech Stack
- Desktop shell: Tauri v2 (Rust backend + Svelte 5 / TypeScript frontend)
- ML pipeline: Python sidecar (faster-whisper, pyannote.audio) — frozen via PyInstaller for distribution
- Audio playback: wavesurfer.js with Web Audio API backend
- AI providers: OpenAI, Anthropic, OpenAI-compatible endpoints (local or remote)
- Local AI: Bundled llama-server (llama.cpp)
- Caption export: pysubs2
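To illustrate what caption export with speaker labels produces, here is a standard-library sketch that renders segments as SRT text. It is a stand-in for illustration only: the app uses pysubs2 for this, and the function names and the `(start, end, speaker, text)` segment layout below are assumptions, not the project's API.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments) -> str:
    """Render (start, end, speaker, text) tuples as SRT cues.

    Cues are numbered from 1 and separated by a blank line.
    """
    blocks = []
    for index, (start, end, speaker, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{speaker}: {text}\n"
        )
    return "\n".join(blocks)


print(to_srt([
    (0.0, 1.25, "Speaker 1", "Hello there."),
    (1.5, 3.0, "Speaker 2", "Hi."),
]))
```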
Development
Prerequisites
- Node.js 20+
- Rust (stable)
- Python 3.11+ with ML dependencies
- System: libgtk-3-dev, libwebkit2gtk-4.1-dev (Linux)
Getting Started
# Install frontend dependencies
npm install
# Install Python sidecar dependencies
cd python && pip install -e . && cd ..
# Run in dev mode (uses system Python for the sidecar)
npm run tauri:dev
Building for Distribution
# Build the frozen Python sidecar
npm run sidecar:build
# Build the Tauri app (requires sidecar in src-tauri/binaries/)
npm run tauri build
CI/CD
Gitea Actions workflows are in .gitea/workflows/. The build pipeline:
- Build sidecar: PyInstaller-frozen Python binary per platform (CPU-only PyTorch)
- Build Tauri app: Bundles the sidecar via externalBin, produces .deb/.AppImage (Linux), .msi (Windows), .dmg (macOS)
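Tauri expects each externalBin binary to carry the Rust target triple as a filename suffix, so the frozen sidecar has to be named per platform before the app build. The sketch below shows one way to derive that name for the three supported platforms. The base name `voice-to-notes-sidecar` and the `sidecar_filename` helper are assumptions for illustration; the project's build_sidecar.py may handle this differently.

```python
import platform

# (platform.system(), platform.machine()) -> Rust target triple
# for the platforms listed under Platform Support.
TARGET_TRIPLES = {
    ("Linux", "x86_64"): "x86_64-unknown-linux-gnu",
    ("Windows", "AMD64"): "x86_64-pc-windows-msvc",
    ("Darwin", "arm64"): "aarch64-apple-darwin",
}


def sidecar_filename(base_name: str, system: str, machine: str) -> str:
    """Return the sidecar filename Tauri looks for on the given platform.

    Raises KeyError for platform/architecture pairs that are not supported.
    """
    triple = TARGET_TRIPLES[(system, machine)]
    suffix = ".exe" if system == "Windows" else ""
    return f"{base_name}-{triple}{suffix}"


if __name__ == "__main__":
    # "voice-to-notes-sidecar" is a hypothetical base name.
    try:
        print(sidecar_filename(
            "voice-to-notes-sidecar", platform.system(), platform.machine()
        ))
    except KeyError:
        print("Unsupported platform for this sketch")
```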
Required Secrets
| Secret | Purpose | Required? |
|---|---|---|
| TAURI_SIGNING_PRIVATE_KEY | Signs Tauri update bundles | Optional (for auto-updates) |
No other secrets are needed for building. AI provider API keys and HuggingFace tokens are configured by end users in the app's Settings.
Project Structure
src/                      # Svelte 5 frontend
src-tauri/                # Rust backend (Tauri commands, sidecar manager, SQLite)
python/                   # Python sidecar (transcription, diarization, AI)
  voice_to_notes/         # Python package
  build_sidecar.py        # PyInstaller build script
  voice_to_notes.spec     # PyInstaller spec
.gitea/workflows/         # Gitea Actions CI/CD
License
MIT