# Voice to Notes

A desktop application that transcribes audio and video recordings with speaker identification, synchronized playback, and AI-powered analysis. Export to SRT, WebVTT, ASS captions, plain text, or Markdown.

## Features

- **Speech-to-Text**: Accurate transcription via faster-whisper with word-level timestamps. Supports 99 languages.
- **Speaker Identification**: Detect and label speakers using pyannote.audio. Rename speakers for clean exports.
- **GPU Acceleration**: CUDA support for NVIDIA GPUs (Windows/Linux). Falls back to CPU automatically.
- **Synchronized Playback**: Click any word to seek. Waveform visualization via wavesurfer.js.
- **AI Chat**: Ask questions about your transcript. Works with Ollama (local), OpenAI, Anthropic, or any OpenAI-compatible API.
- **Export**: SRT, WebVTT, ASS, plain text, and Markdown, all with speaker labels.
- **Cross-Platform**: Linux, Windows, macOS (Apple Silicon).

## Quick Start

1. Download the installer from [Releases](https://repo.anhonesthost.net/MacroPad/voice-to-notes/releases)
2. On first launch, choose the **CPU** or **CUDA** sidecar (the AI engine downloads separately, ~500MB–2GB)
3. Import an audio/video file and click **Transcribe**

See the full [User Guide](docs/USER_GUIDE.md) for detailed setup and usage instructions.

## Platform Support

| Platform | Architecture | Installers |
|----------|-------------|------------|
| Linux | x86_64 | .deb, .rpm |
| Windows | x86_64 | .msi, .exe (NSIS) |
| macOS | ARM (Apple Silicon) | .dmg |

## Architecture

The app is split into two independently versioned components:

- **App** (v0.2.x): Tauri desktop shell with Svelte frontend. Small installer (~50MB).
- **Sidecar** (v1.x): Python ML engine (faster-whisper, pyannote.audio). Downloaded on first launch. CPU (~500MB) or CUDA (~2GB) variants.

This separation means app UI updates don't require re-downloading the sidecar, and sidecar updates don't require reinstalling the app.
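As a rough illustration of how independent versioning can work, the sketch below decides whether a sidecar download is needed and which variant to fetch. It is not the app's actual implementation: the function names are hypothetical, and it assumes sidecar releases are tagged like `sidecar-v1.2.0`.

```python
from typing import Optional, Tuple


def parse_version(tag: str) -> Tuple[int, ...]:
    """Turn a tag such as 'sidecar-v1.2.0' into the tuple (1, 2, 0)."""
    # Keep only the text after the last 'v' (assumed tag format).
    number = tag.rsplit("v", 1)[-1]
    return tuple(int(part) for part in number.split("."))


def sidecar_needs_download(installed: Optional[str], latest: str) -> bool:
    """Return True when no sidecar is installed or a newer one exists."""
    if installed is None:
        return True  # first launch: nothing downloaded yet
    # Tuple comparison orders 1.10.0 after 1.9.0, unlike string comparison.
    return parse_version(latest) > parse_version(installed)


def pick_variant(has_nvidia_gpu: bool, platform: str) -> str:
    """Choose 'cuda' only where CUDA is supported (Windows/Linux), else 'cpu'."""
    if has_nvidia_gpu and platform in ("linux", "windows"):
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    print(sidecar_needs_download(None, "sidecar-v1.0.0"))
    print(pick_variant(has_nvidia_gpu=True, platform="macos"))
```

Because the check only looks at the sidecar's own version, an app update that leaves the sidecar tag unchanged triggers no download.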

## Tech Stack

| Component | Technology |
|-----------|-----------|
| Desktop shell | Tauri v2 (Rust + Svelte 5 / TypeScript) |
| Transcription | faster-whisper (CTranslate2) |
| Speaker ID | pyannote.audio 3.1 |
| Audio UI | wavesurfer.js |
| Transcript editor | TipTap (ProseMirror) |
| AI (local) | Ollama (any model) |
| AI (cloud) | OpenAI, Anthropic, OpenAI-compatible |
| Caption export | pysubs2 |
| Database | SQLite (rusqlite) |

## Development

### Prerequisites

- Node.js 20+
- Rust (stable)
- Python 3.11+ with uv or pip
- Linux: `libgtk-3-dev`, `libwebkit2gtk-4.1-dev`, `libappindicator3-dev`, `librsvg2-dev`

### Getting Started

```bash
# Install frontend dependencies
npm install

# Install Python sidecar dependencies
cd python && pip install -e ".[dev]" && cd ..

# Run in dev mode (uses system Python for the sidecar)
npm run tauri:dev
```

### Building

```bash
# Build the frozen Python sidecar (CPU-only)
cd python && python build_sidecar.py --cpu-only && cd ..

# Build with CUDA support
cd python && python build_sidecar.py --with-cuda && cd ..

# Build the Tauri app
npm run tauri build
```

### CI/CD

Two Gitea Actions workflows in `.gitea/workflows/`:

**`release.yml`** triggers on push to main:

1. Bumps app version (patch), creates git tag and Gitea release
2. Builds lightweight app installers for all platforms (no sidecar bundled)

**`build-sidecar.yml`** triggers on changes to `python/` or manual dispatch:

1. Bumps sidecar version, creates `sidecar-v*` tag and release
2. Builds CPU + CUDA variants for Linux/Windows, CPU for macOS
3. Uploads as separate release assets

#### Required Secrets

| Secret | Purpose |
|--------|---------|
| `BUILD_TOKEN` | Gitea API token for creating releases and pushing tags |

### Project Structure

```
src/                      # Svelte 5 frontend
  lib/components/         # UI components (waveform, transcript editor, settings, etc.)
  lib/stores/             # Svelte stores (settings, transcript state)
  routes/                 # SvelteKit pages
src-tauri/                # Rust backend
  src/sidecar/            # Sidecar process manager (download, extract, IPC)
  src/commands/           # Tauri command handlers
  nsis-hooks.nsh          # Windows uninstall cleanup
python/                   # Python sidecar
  voice_to_notes/         # Python package (transcription, diarization, AI, export)
  build_sidecar.py        # PyInstaller build script
  voice_to_notes.spec     # PyInstaller spec
.gitea/workflows/         # CI/CD (release.yml, build-sidecar.yml)
docs/                     # Documentation
```

## License

[MIT](LICENSE)