2026-03-23 20:45:32 +00:00

Voice to Notes

A desktop application that transcribes audio and video recordings with speaker identification, synchronized playback, and AI-powered analysis. Export to SRT, WebVTT, ASS captions, plain text, or Markdown.

Features

  • Speech-to-Text — Accurate transcription via faster-whisper with word-level timestamps. Supports 99 languages.
  • Speaker Identification — Detect and label speakers using pyannote.audio. Rename speakers for clean exports.
  • GPU Acceleration — CUDA support for NVIDIA GPUs (Windows/Linux). Falls back to CPU automatically.
  • Synchronized Playback — Click any word to seek. Waveform visualization via wavesurfer.js.
  • AI Chat — Ask questions about your transcript. Works with Ollama (local), OpenAI, Anthropic, or any OpenAI-compatible API.
  • Export — SRT, WebVTT, ASS, plain text, Markdown — all with speaker labels.
  • Cross-Platform — Linux, Windows, macOS (Apple Silicon).
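
As a rough illustration of what a caption export with speaker labels involves, the sketch below renders timed segments as SRT cues using only the standard library. The `Segment` class and both function names are hypothetical and are not taken from the app's code; the project itself lists pysubs2 for caption export, so treat this as a stand-in for the idea rather than the actual export path.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical transcript segment; not the app's actual data model."""
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording
    speaker: str
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered SRT cues, prefixing each with its speaker."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.speaker}: {seg.text}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(cues)
```

Renaming a speaker before export would then amount to changing the `speaker` field on the affected segments.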

Quick Start

  1. Download the installer from Releases
  2. On first launch, choose the CPU or CUDA sidecar (the AI engine downloads separately, ~500MB to 2GB)
  3. Import an audio/video file and click Transcribe
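
If you are unsure which sidecar variant suits your machine, a check along the following lines can tell you whether a CUDA device is visible. This is a minimal sketch, not the app's own detection logic: `pick_device` is a hypothetical helper, and it assumes the `ctranslate2` package (the runtime behind faster-whisper) may or may not be installed.

```python
def pick_device() -> str:
    """Return "cuda" when CTranslate2 reports a usable GPU, otherwise "cpu"."""
    try:
        import ctranslate2  # runtime used by faster-whisper
    except ImportError:
        # Without the runtime installed there is nothing to query; assume CPU.
        return "cpu"
    try:
        return "cuda" if ctranslate2.get_cuda_device_count() > 0 else "cpu"
    except Exception:
        # Driver or library problems are treated as "no GPU" in this sketch.
        return "cpu"


if __name__ == "__main__":
    print(pick_device())
```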

See the full User Guide for detailed setup and usage instructions.

Platform Support

Platform   Architecture          Installers
Linux      x86_64                .deb, .rpm
Windows    x86_64                .msi, .exe (NSIS)
macOS      ARM (Apple Silicon)   .dmg

Architecture

The app is split into two independently versioned components:

  • App (v0.2.x) — Tauri desktop shell with Svelte frontend. Small installer (~50MB).
  • Sidecar (v1.x) — Python ML engine (faster-whisper, pyannote.audio). Downloaded on first launch. CPU (~500MB) or CUDA (~2GB) variants.

This separation means app UI updates don't require re-downloading the sidecar, and sidecar updates don't require reinstalling the app.
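
Because the sidecar is versioned on its own, deciding whether it needs a fresh download comes down to comparing two sidecar versions. The sketch below shows one way to do that from release tags. The `sidecar-v*` tag prefix comes from the CI/CD section of this README; the three-part version number and the function names are assumptions made for illustration.

```python
import re

# Assumed tag shape: sidecar-vMAJOR.MINOR.PATCH (only the prefix is documented).
TAG_PATTERN = re.compile(r"^sidecar-v(\d+)\.(\d+)\.(\d+)$")


def parse_sidecar_tag(tag: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from a tag such as 'sidecar-v1.4.2'."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"not a sidecar tag: {tag!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def sidecar_needs_update(installed_tag: str, latest_tag: str) -> bool:
    """True when the latest release is newer than the installed sidecar."""
    # Tuples of ints compare numerically, so 1.10.0 sorts after 1.9.0.
    return parse_sidecar_tag(installed_tag) < parse_sidecar_tag(latest_tag)
```

Comparing integer tuples rather than raw strings avoids the usual pitfall where "1.10.0" sorts before "1.9.0".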

Tech Stack

Component           Technology
Desktop shell       Tauri v2 (Rust + Svelte 5 / TypeScript)
Transcription       faster-whisper (CTranslate2)
Speaker ID          pyannote.audio 3.1
Audio UI            wavesurfer.js
Transcript editor   TipTap (ProseMirror)
AI (local)          Ollama (any model)
AI (cloud)          OpenAI, Anthropic, OpenAI-compatible
Caption export      pysubs2
Database            SQLite (rusqlite)

Development

Prerequisites

  • Node.js 20+
  • Rust (stable)
  • Python 3.11+ with uv or pip
  • Linux: libgtk-3-dev, libwebkit2gtk-4.1-dev, libappindicator3-dev, librsvg2-dev

Getting Started

# Install frontend dependencies
npm install

# Install Python sidecar dependencies
cd python && pip install -e ".[dev]" && cd ..

# Run in dev mode (uses system Python for the sidecar)
npm run tauri:dev

Building

# Build the frozen Python sidecar (CPU-only)
cd python && python build_sidecar.py --cpu-only && cd ..

# Build with CUDA support
cd python && python build_sidecar.py --with-cuda && cd ..

# Build the Tauri app
npm run tauri build

CI/CD

Two Gitea Actions workflows in .gitea/workflows/:

release.yml — Triggers on push to main:

  1. Bumps app version (patch), creates git tag and Gitea release
  2. Builds lightweight app installers for all platforms (no sidecar bundled)
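
The patch bump in step 1 can be pictured as the small function below. It is only a sketch of the arithmetic: `bump_patch` is a hypothetical name, and how the workflow actually reads and writes the version is not described here.

```python
def bump_patch(version: str) -> str:
    """Increment the patch component of a MAJOR.MINOR.PATCH version string."""
    parts = version.split(".")
    if len(parts) != 3 or not all(part.isdigit() for part in parts):
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    major, minor, patch = (int(part) for part in parts)
    return f"{major}.{minor}.{patch + 1}"
```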

build-sidecar.yml — Triggers on changes to python/ or manual dispatch:

  1. Bumps sidecar version, creates sidecar-v* tag and release
  2. Builds CPU + CUDA variants for Linux/Windows, CPU for macOS
  3. Uploads as separate release assets

Required Secrets

Secret        Purpose
BUILD_TOKEN   Gitea API token for creating releases and pushing tags

Project Structure

src/                        # Svelte 5 frontend
  lib/components/           # UI components (waveform, transcript editor, settings, etc.)
  lib/stores/               # Svelte stores (settings, transcript state)
  routes/                   # SvelteKit pages
src-tauri/                  # Rust backend
  src/sidecar/              # Sidecar process manager (download, extract, IPC)
  src/commands/             # Tauri command handlers
  nsis-hooks.nsh            # Windows uninstall cleanup
python/                     # Python sidecar
  voice_to_notes/           # Python package (transcription, diarization, AI, export)
  build_sidecar.py          # PyInstaller build script
  voice_to_notes.spec       # PyInstaller spec
.gitea/workflows/           # CI/CD (release.yml, build-sidecar.yml)
docs/                       # Documentation

License

MIT

Description
Convert recorded audio to text with speaker identification and text-to-audio scrubbing
2026-03-24 02:04:26 +00:00