# Voice to Notes

A desktop application that transcribes audio/video recordings with speaker identification, producing editable transcriptions with synchronized audio playback.

## Features

- **Speech-to-Text Transcription**: Accurate transcription via faster-whisper (Whisper models) with word-level timestamps
- **Speaker Identification (Diarization)**: Detect and distinguish between speakers using pyannote.audio
- **Synchronized Playback**: Click any word to seek to that point in the audio (Web Audio API for instant playback)
- **AI Integration**: Ask questions about your transcript via OpenAI, Anthropic, or any OpenAI-compatible API (LiteLLM proxies, Ollama, vLLM)
- **Export Formats**: SRT, WebVTT, ASS captions, plain text, and Markdown with speaker labels
- **Cross-Platform**: Builds for Linux, Windows, and macOS (Apple Silicon)

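Word-level timestamps are what make click-to-seek and playback highlighting possible. The following is a minimal sketch of that lookup, not the app's actual code: the `Word` record and `word_index_at` helper are hypothetical names, and the real frontend is written in TypeScript (Python is used here only for illustration).

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    """Hypothetical word record with timestamps in seconds."""
    text: str
    start: float
    end: float


def word_index_at(words: List[Word], t: float) -> Optional[int]:
    """Return the index of the word spoken at time t, or None during silence.

    Assumes words are sorted by start time and do not overlap.
    """
    starts = [w.start for w in words]
    i = bisect_right(starts, t) - 1  # last word starting at or before t
    if i < 0:
        return None
    return i if t < words[i].end else None


words = [Word("hello", 0.00, 0.42), Word("world", 0.50, 0.91)]
current = word_index_at(words, 0.60)  # word to highlight during playback
seek_target = words[1].start          # where to seek when "world" is clicked
```

Clicking a word only needs its `start` value; the binary search is for the reverse direction, mapping the current playback position back to a word.
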
## Platform Support

| Platform | Architecture | Status |
|----------|-------------|--------|
| Linux | x86_64 | Supported |
| Windows | x86_64 | Supported |
| macOS | ARM (Apple Silicon) | Supported |

## Tech Stack

- **Desktop shell:** Tauri v2 (Rust backend + Svelte 5 / TypeScript frontend)
- **ML pipeline:** Python sidecar (faster-whisper, pyannote.audio), frozen via PyInstaller for distribution
- **Audio playback:** wavesurfer.js with Web Audio API backend
- **AI providers:** OpenAI, Anthropic, OpenAI-compatible endpoints (local or remote)
- **Local AI:** Bundled llama-server (llama.cpp)
- **Caption export:** pysubs2

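To illustrate what a caption export with speaker labels looks like, here is a standard-library sketch of the SRT layout. It is an assumption-laden illustration rather than the project's exporter (which uses pysubs2): the `srt_timestamp` and `to_srt` helpers and the `(start, end, speaker, text)` tuple shape are hypothetical.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Render (start, end, speaker, text) tuples as numbered SRT cues."""
    cues = []
    for number, (start, end, speaker, text) in enumerate(segments, start=1):
        cues.append(
            f"{number}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{speaker}: {text}\n"
        )
    return "\n".join(cues)  # a blank line separates consecutive cues


print(to_srt([
    (0.0, 1.25, "Speaker 1", "Hello."),
    (1.5, 3.0, "Speaker 2", "Hi there."),
]))
```

SRT has no dedicated speaker field, so prefixing the cue text is one common convention; the app's own label format may differ.
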
## Development

### Prerequisites

- Node.js 20+
- Rust (stable)
- Python 3.11+ with ML dependencies
- System packages (Linux): `libgtk-3-dev`, `libwebkit2gtk-4.1-dev`

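A quick way to confirm the toolchain is present before a first build is a small check script. This sketch is not part of the repository; `missing_tools` and `python_ok` are hypothetical helpers, and it only checks that the tools are on `PATH` and that the interpreter meets the Python minimum (it does not verify the Node.js version or the Linux system packages).

```python
import shutil
import sys

# Executables the prerequisites imply; adjust if your setup differs.
REQUIRED_TOOLS = ["node", "npm", "cargo"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from the list that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def python_ok(minimum=(3, 11), version=sys.version_info):
    """Check that the (major, minor) version meets the minimum."""
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    absent = missing_tools()
    if absent:
        print("Missing from PATH:", ", ".join(absent))
    if not python_ok():
        print("Python 3.11+ is required, found", sys.version.split()[0])
    if not absent and python_ok():
        print("All checked prerequisites found.")
```
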
### Getting Started

```bash
# Install frontend dependencies
npm install

# Install Python sidecar dependencies
cd python && pip install -e . && cd ..

# Run in dev mode (uses system Python for the sidecar)
npm run tauri:dev
```

### Building for Distribution

```bash
# Build the frozen Python sidecar
npm run sidecar:build

# Build the Tauri app (requires sidecar in src-tauri/binaries/)
npm run tauri build
```

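Tauri looks up external binaries by a file name that carries the Rust target triple as a suffix (plus `.exe` on Windows), so a build can fail if the sidecar in `src-tauri/binaries/` is named differently. The sketch below shows how the expected name can be derived. The base name `voice-to-notes-sidecar` is a placeholder, not the project's actual binary name, and the project's own build script may already take care of this.

```python
import shutil
import subprocess


def host_triple_from_rustc(output: str) -> str:
    """Extract the host target triple from `rustc -vV` output."""
    for line in output.splitlines():
        if line.startswith("host:"):
            return line.split(":", 1)[1].strip()
    raise ValueError("no 'host:' line found in rustc output")


def sidecar_filename(base_name: str, triple: str) -> str:
    """Build the file name Tauri expects for an external binary."""
    suffix = ".exe" if "windows" in triple else ""
    return f"{base_name}-{triple}{suffix}"


if __name__ == "__main__" and shutil.which("rustc"):
    result = subprocess.run(
        ["rustc", "-vV"], capture_output=True, text=True, check=True
    )
    triple = host_triple_from_rustc(result.stdout)
    # "voice-to-notes-sidecar" is a hypothetical base name.
    print("src-tauri/binaries/" + sidecar_filename("voice-to-notes-sidecar", triple))
```
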
### CI/CD

Gitea Actions workflows are in `.gitea/workflows/`. The build pipeline has two stages:

1. **Build sidecar**: PyInstaller-frozen Python binary per platform (CPU-only PyTorch)
2. **Build Tauri app**: Bundles the sidecar via `externalBin` and produces .deb/.AppImage (Linux), .msi (Windows), and .dmg (macOS) packages

#### Required Secrets

| Secret | Purpose | Required? |
|--------|---------|-----------|
| `TAURI_SIGNING_PRIVATE_KEY` | Signs Tauri update bundles | Optional (for auto-updates) |

No other secrets are needed for building. AI provider API keys and HuggingFace tokens are configured by end users in the app's Settings.

### Project Structure

```
src/                      # Svelte 5 frontend
src-tauri/                # Rust backend (Tauri commands, sidecar manager, SQLite)
python/                   # Python sidecar (transcription, diarization, AI)
  voice_to_notes/         # Python package
  build_sidecar.py        # PyInstaller build script
  voice_to_notes.spec     # PyInstaller spec
.gitea/workflows/         # Gitea Actions CI/CD
```

## License

MIT