# Voice to Notes

A desktop application that transcribes audio/video recordings with speaker identification, producing editable transcriptions with synchronized audio playback.

## Features

- **Speech-to-Text Transcription:** Accurate transcription via faster-whisper (Whisper models) with word-level timestamps
- **Speaker Identification (Diarization):** Detect and distinguish between speakers using pyannote.audio
- **Synchronized Playback:** Click any word to seek to that point in the audio (Web Audio API for instant playback)
- **AI Integration:** Ask questions about your transcript via OpenAI, Anthropic, or any OpenAI-compatible API (LiteLLM proxies, Ollama, vLLM)
- **Export Formats:** SRT, WebVTT, ASS captions, plain text, and Markdown with speaker labels
- **Cross-Platform:** Builds for Linux, Windows, and macOS (Apple Silicon)

## Platform Support

| Platform | Architecture | Status |
|----------|--------------|--------|
| Linux | x86_64 | Supported |
| Windows | x86_64 | Supported |
| macOS | ARM (Apple Silicon) | Supported |

## Tech Stack

- **Desktop shell:** Tauri v2 (Rust backend + Svelte 5 / TypeScript frontend)
- **ML pipeline:** Python sidecar (faster-whisper, pyannote.audio), frozen via PyInstaller for distribution
- **Audio playback:** wavesurfer.js with Web Audio API backend
- **AI providers:** OpenAI, Anthropic, OpenAI-compatible endpoints (local or remote)
- **Local AI:** Bundled llama-server (llama.cpp)
- **Caption export:** pysubs2

## Development

### Prerequisites

- Node.js 20+
- Rust (stable)
- Python 3.11+ with ML dependencies
- System: `libgtk-3-dev`, `libwebkit2gtk-4.1-dev` (Linux)

### Getting Started

```bash
# Install frontend dependencies
npm install

# Install Python sidecar dependencies
cd python && pip install -e . && cd ..

# Run in dev mode (uses system Python for the sidecar)
npm run tauri:dev
```

### Building for Distribution

```bash
# Build the frozen Python sidecar
npm run sidecar:build

# Build the Tauri app (requires sidecar in src-tauri/binaries/)
npm run tauri build
```

### CI/CD

Gitea Actions workflows are in `.gitea/workflows/`. The build pipeline:

1. **Build sidecar:** PyInstaller-frozen Python binary per platform (CPU-only PyTorch)
2. **Build Tauri app:** Bundles the sidecar via `externalBin`, produces .deb/.AppImage (Linux), .msi (Windows), .dmg (macOS)

#### Required Secrets

| Secret | Purpose | Required? |
|--------|---------|-----------|
| `TAURI_SIGNING_PRIVATE_KEY` | Signs Tauri update bundles | Optional (for auto-updates) |

No other secrets are needed for building. AI provider API keys and HuggingFace tokens are configured by end users in the app's Settings.

### Project Structure

```
src/                     # Svelte 5 frontend
src-tauri/               # Rust backend (Tauri commands, sidecar manager, SQLite)
python/                  # Python sidecar (transcription, diarization, AI)
  voice_to_notes/        # Python package
  build_sidecar.py       # PyInstaller build script
  voice_to_notes.spec    # PyInstaller spec
.gitea/workflows/        # Gitea Actions CI/CD
```

## License

MIT
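
## Appendix: SRT Export Sketch

The Features list mentions SRT export with speaker labels. To illustrate what that output format looks like, here is a minimal, standard-library-only sketch. It is not the app's implementation (the Tech Stack lists pysubs2 for caption export), and the `Segment` type, its fields, and the `[speaker] text` label style are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical transcript segment; not the app's actual data model."""
    start: float   # segment start, in seconds
    end: float     # segment end, in seconds
    speaker: str   # speaker label, e.g. from diarization
    text: str      # transcribed text


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as SRT cues, prefixing each cue with its speaker."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"[{seg.speaker}] {seg.text}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        Segment(0.0, 2.5, "Speaker 1", "Hello."),
        Segment(2.5, 4.0, "Speaker 2", "Hi there."),
    ]
    print(to_srt(demo))
```

Working in integer milliseconds before splitting into hours, minutes, and seconds avoids floating-point drift in the formatted timestamp.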