# Voice to Notes
A desktop application that transcribes audio/video recordings with speaker identification, producing editable transcriptions with synchronized audio playback.
## Features
- **Speech-to-Text Transcription** — Transcription via faster-whisper (a CTranslate2 reimplementation of OpenAI's Whisper models) with word-level timestamps
- **Speaker Identification (Diarization)** — Detect and distinguish between speakers using pyannote.audio
- **Synchronized Playback** — Click any word to seek to that point in the audio (Web Audio API for instant playback)
- **AI Integration** — Ask questions about your transcript via OpenAI, Anthropic, or any OpenAI-compatible API (LiteLLM proxies, Ollama, vLLM)
- **Export Formats** — SRT, WebVTT, ASS captions, plain text, and Markdown with speaker labels
- **Cross-Platform** — Builds for Linux, Windows, and macOS (Apple Silicon)
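
Click-to-seek depends on the word-level timestamps. Each word carries a start and end time, so a click seeks to the word's start, and during playback the current position can be mapped back to a word with a binary search. The Python sketch below illustrates that lookup. The `Word` record and both helper names are hypothetical, and the app's Svelte frontend may implement this differently.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Word:
    """Hypothetical word record: text plus start/end times in seconds."""

    text: str
    start: float
    end: float


def seek_time_for(words: list[Word], index: int) -> float:
    """Clicking the word at `index` seeks playback to that word's start time."""
    return words[index].start


def word_at(words: list[Word], position: float) -> Optional[int]:
    """Return the index of the word being spoken at `position` seconds.

    Assumes `words` is sorted by start time and non-overlapping; returns
    None in a pause between words or outside the transcript.
    """
    starts = [w.start for w in words]
    # Index of the last word that starts at or before `position`.
    i = bisect_right(starts, position) - 1
    if i >= 0 and position < words[i].end:
        return i
    return None


words = [Word("Hello", 0.00, 0.42), Word("everyone", 0.50, 1.10), Word("welcome", 1.40, 1.95)]
print(seek_time_for(words, 1))  # 0.5
print(word_at(words, 0.75))     # 1
print(word_at(words, 1.20))     # None (pause between words)
```

Rebuilding `starts` on every call keeps the sketch short. A player that polls the position many times per second would build it once per transcript.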
## Platform Support
| Platform | Architecture | Status |
|----------|-------------|--------|
| Linux | x86_64 | Supported |
| Windows | x86_64 | Supported |
| macOS | ARM (Apple Silicon) | Supported |
## Tech Stack
- **Desktop shell:** Tauri v2 (Rust backend + Svelte 5 / TypeScript frontend)
- **ML pipeline:** Python sidecar (faster-whisper, pyannote.audio) — frozen via PyInstaller for distribution
- **Audio playback:** wavesurfer.js with Web Audio API backend
- **AI providers:** OpenAI, Anthropic, OpenAI-compatible endpoints (local or remote)
- **Local AI:** Bundled llama-server (llama.cpp)
- **Caption export:** pysubs2
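
Caption export goes through pysubs2. To illustrate what the SRT output contains, here is a standard-library sketch that renders speaker-labelled segments as SRT cues. It is not the app's export code, and the `Segment` type and `[SPEAKER_00]` prefix style are assumptions. SRT cues are numbered from 1, use `HH:MM:SS,mmm` timestamps, and are separated by a blank line.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical transcript segment: speaker label, times in seconds, text."""

    speaker: str
    start: float
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered SRT cues, prefixing each with its speaker."""
    cues = []
    for number, seg in enumerate(segments, start=1):
        cues.append(
            f"{number}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"[{seg.speaker}] {seg.text}\n"
        )
    # Each cue ends with a newline, so joining on "\n" leaves one blank line between cues.
    return "\n".join(cues)


segments = [
    Segment("SPEAKER_00", 0.0, 2.5, "Thanks for joining."),
    Segment("SPEAKER_01", 2.8, 61.25, "Happy to be here."),
]
print(to_srt(segments))
```

WebVTT differs mainly in its `WEBVTT` header line and the period before the milliseconds (`00:00:02.500`), the kind of format-specific detail a caption library takes care of.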
## Development
### Prerequisites
- Node.js 20+
- Rust (stable)
- Python 3.11+ with ML dependencies
- System packages (Linux): `libgtk-3-dev`, `libwebkit2gtk-4.1-dev`
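
Confirming the toolchain before installing anything can save a confusing build failure. The check script below is hypothetical (not part of the repository). It checks the running Python version, looks for `node`, `npm`, `cargo`, and `rustc` on `PATH`, and parses `node --version` for the major version. It does not check the Linux system packages.

```python
import shutil
import subprocess
import sys
from collections.abc import Sequence
from typing import Optional

# Hypothetical checker; thresholds mirror the prerequisites listed in this README.
MIN_PYTHON = (3, 11)
MIN_NODE_MAJOR = 20
REQUIRED_TOOLS = ("node", "npm", "cargo", "rustc")


def node_major_version() -> Optional[int]:
    """Return Node.js's major version (`node --version` prints e.g. "v20.11.0"), or None if absent."""
    node = shutil.which("node")
    if node is None:
        return None
    out = subprocess.run([node, "--version"], capture_output=True, text=True, check=True).stdout
    return int(out.strip().lstrip("v").split(".")[0])


def missing_prerequisites(tools: Sequence[str] = REQUIRED_TOOLS) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if sys.version_info < MIN_PYTHON:
        found = sys.version.split()[0]
        problems.append(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, found {found}")
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append(f"'{tool}' not found on PATH")
    if "node" in tools:
        major = node_major_version()
        if major is not None and major < MIN_NODE_MAJOR:
            problems.append(f"Node.js {MIN_NODE_MAJOR}+ required, found major version {major}")
    return problems


for problem in missing_prerequisites():
    print("missing:", problem)
```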
### Getting Started
```bash
# Install frontend dependencies
npm install
# Install Python sidecar dependencies
pip install -e ./python
# Run in dev mode (uses system Python for the sidecar)
npm run tauri:dev
```
### Building for Distribution
```bash
# Build the frozen Python sidecar
npm run sidecar:build
# Build the Tauri app (requires sidecar in src-tauri/binaries/)
npm run tauri build
```
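The sidecar that `npm run tauri build` picks up from `src-tauri/binaries/` is bundled through Tauri's `externalBin`, which expects each bundled binary to carry the Rust target triple as a filename suffix (with `.exe` on Windows targets). The sketch below computes the expected path from `rustc -vV`. The base name `voice-to-notes-sidecar` is a placeholder, so substitute the name listed under `externalBin` in the project's Tauri config.

```python
import subprocess
from pathlib import Path

# Placeholder: use the name listed under `externalBin` in the Tauri config.
SIDECAR_BASE = "voice-to-notes-sidecar"


def host_target_triple() -> str:
    """Read the host triple from `rustc -vV` (a line such as "host: x86_64-unknown-linux-gnu")."""
    out = subprocess.run(["rustc", "-vV"], capture_output=True, text=True, check=True).stdout
    for line in out.splitlines():
        if line.startswith("host:"):
            return line.split(":", 1)[1].strip()
    raise RuntimeError("no 'host:' line in `rustc -vV` output")


def sidecar_path(triple: str, base: str = SIDECAR_BASE,
                 binaries_dir: str = "src-tauri/binaries") -> Path:
    """Return the path Tauri expects: <base>-<triple>, plus .exe for Windows targets."""
    suffix = ".exe" if "windows" in triple else ""
    return Path(binaries_dir) / f"{base}-{triple}{suffix}"


for triple in ("x86_64-unknown-linux-gnu", "x86_64-pc-windows-msvc", "aarch64-apple-darwin"):
    print(sidecar_path(triple))
```

A build script could copy PyInstaller's output to `sidecar_path(host_target_triple())`. `build_sidecar.py` may already do this; the sketch only illustrates the naming rule.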
### CI/CD
Gitea Actions workflows are in `.gitea/workflows/`. The build pipeline:
1. **Build sidecar** — PyInstaller-frozen Python binary per platform (CPU-only PyTorch)
2. **Build Tauri app** — Bundles the sidecar via `externalBin` and produces .deb/.AppImage (Linux), .msi (Windows), and .dmg (macOS) packages
#### Required Secrets
| Secret | Purpose | Required? |
|--------|---------|-----------|
| `TAURI_SIGNING_PRIVATE_KEY` | Signs Tauri update bundles | Optional (for auto-updates) |

No other secrets are needed for building. AI provider API keys and Hugging Face tokens are configured by end users in the app's Settings.
### Project Structure
```
src/                      # Svelte 5 frontend
src-tauri/                # Rust backend (Tauri commands, sidecar manager, SQLite)
python/                   # Python sidecar (transcription, diarization, AI)
  voice_to_notes/         # Python package
  build_sidecar.py        # PyInstaller build script
  voice_to_notes.spec     # PyInstaller spec
.gitea/workflows/         # Gitea Actions CI/CD
```
## License
MIT