
Voice to Notes

A desktop application that transcribes audio and video recordings with speaker identification, synchronized playback, and AI-powered analysis. Export to SRT, WebVTT, ASS captions, plain text, or Markdown.

Features

  • Speech-to-Text — Accurate transcription via faster-whisper with word-level timestamps. Supports 99 languages.
  • Speaker Identification — Detect and label speakers using pyannote.audio. Rename speakers for clean exports.
  • GPU Acceleration — CUDA support for NVIDIA GPUs (Windows/Linux). Falls back to CPU automatically.
  • Synchronized Playback — Click any word to seek. Waveform visualization via wavesurfer.js.
  • AI Chat — Ask questions about your transcript. Works with Ollama (local), OpenAI, Anthropic, or any OpenAI-compatible API.
  • Export — SRT, WebVTT, ASS, plain text, Markdown — all with speaker labels.
  • Cross-Platform — Linux, Windows, macOS (Apple Silicon).
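Word-level timestamps are what make click-to-seek and playback highlighting possible. As a rough illustration only (not the app's actual code, which lives in the Svelte frontend), the lookup from a playback time to the word being spoken can be sketched with a binary search. The `Word` type and `word_index_at` function are hypothetical names introduced for this example.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Word:
    """Hypothetical word record: text plus start/end times in seconds."""
    text: str
    start: float
    end: float


def word_index_at(words: List[Word], t: float) -> Optional[int]:
    """Return the index of the word spoken at time t, or None during silence.

    Assumes words are sorted by start time and do not overlap.
    """
    starts = [w.start for w in words]
    # Index of the last word whose start time is <= t.
    i = bisect_right(starts, t) - 1
    if i >= 0 and t < words[i].end:
        return i
    return None
```

Seeking in the other direction is simpler: clicking word `i` would set the player position to `words[i].start`.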

Quick Start

  1. Download the installer from Releases
  2. On first launch, choose the CPU or CUDA sidecar (the AI engine downloads separately, ~500MB to ~2GB)
  3. Import an audio/video file and click Transcribe

See the full User Guide for detailed setup and usage instructions.
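If you are unsure which sidecar to pick, the decision roughly follows the platform rules in this README: CUDA is only offered on Windows and Linux, and only helps with an NVIDIA GPU. The sketch below is a hypothetical helper, not part of the app. It assumes that finding `nvidia-smi` on the PATH is a good enough hint that an NVIDIA driver is installed, which is a heuristic rather than a guarantee.

```python
import platform
import shutil
from typing import Callable, Optional


def has_nvidia_gpu() -> bool:
    """Heuristic (assumption): nvidia-smi on PATH suggests an NVIDIA driver."""
    return shutil.which("nvidia-smi") is not None


def choose_variant(
    system: Optional[str] = None,
    gpu_probe: Callable[[], bool] = has_nvidia_gpu,
) -> str:
    """Suggest 'cuda' or 'cpu' for the sidecar download.

    system uses platform.system() names: 'Linux', 'Windows', 'Darwin'.
    """
    system = system or platform.system()
    # CUDA builds exist only for Linux and Windows; macOS is CPU-only.
    if system in ("Linux", "Windows") and gpu_probe():
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    print(choose_variant())
```

The probe is injectable so the logic can be checked without a GPU present.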

Platform Support

Platform  Architecture          Installers
Linux     x86_64                .deb, .rpm
Windows   x86_64                .msi, .exe (NSIS)
macOS     ARM (Apple Silicon)   .dmg

Architecture

The app is split into two independently versioned components:

  • App (v0.2.x) — Tauri desktop shell with Svelte frontend. Small installer (~50MB).
  • Sidecar (v1.x) — Python ML engine (faster-whisper, pyannote.audio). Downloaded on first launch. CPU (~500MB) or CUDA (~2GB) variants.

This separation means app UI updates don't require re-downloading the sidecar, and sidecar updates don't require reinstalling the app.
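Because the two components are versioned independently, their release tags have to be told apart. The sketch below shows one way to do that. The `sidecar-v*` tag prefix comes from the CI/CD section of this README; the plain `v*` form for app tags is an assumption, and `parse_tag` and `latest` are hypothetical helper names.

```python
import re
from typing import Iterable, Optional, Tuple

Version = Tuple[int, int, int]

# 'sidecar-v1.0.3' is a sidecar release; 'v0.2.7' is assumed to be an app release.
TAG_RE = re.compile(r"^(sidecar-)?v(\d+)\.(\d+)\.(\d+)$")


def parse_tag(tag: str) -> Tuple[str, Version]:
    """Split a release tag into (component, (major, minor, patch))."""
    m = TAG_RE.match(tag)
    if m is None:
        raise ValueError(f"unrecognized release tag: {tag!r}")
    component = "sidecar" if m.group(1) else "app"
    return component, (int(m.group(2)), int(m.group(3)), int(m.group(4)))


def latest(tags: Iterable[str], component: str) -> Optional[Version]:
    """Return the highest version among tags for one component, or None."""
    versions = [v for c, v in map(parse_tag, tags) if c == component]
    return max(versions, default=None)
```

Comparing versions as integer tuples avoids the string-ordering trap where `0.2.10` would sort before `0.2.9`.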

Tech Stack

Component          Technology
Desktop shell      Tauri v2 (Rust + Svelte 5 / TypeScript)
Transcription      faster-whisper (CTranslate2)
Speaker ID         pyannote.audio 3.1
Audio UI           wavesurfer.js
Transcript editor  TipTap (ProseMirror)
AI (local)         Ollama (any model)
AI (cloud)         OpenAI, Anthropic, OpenAI-compatible
Caption export     pysubs2
Database           SQLite (rusqlite)
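To give a feel for what a caption export with speaker labels produces, here is a minimal standard-library sketch of SRT output. The project itself uses pysubs2 for caption export; this is not that code. The `Segment` type, the `to_srt` function, and the `Speaker: text` line layout are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Segment:
    """Hypothetical transcript segment with times in seconds."""
    speaker: str
    start: float
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for n, seg in enumerate(segments, start=1):
        blocks.append(
            f"{n}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.speaker}: {seg.text}\n"
        )
    return "\n".join(blocks)
```

Timestamps are converted to integer milliseconds before splitting into fields, so rounding never produces a value like `00:00:59,1000`.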

Development

Prerequisites

  • Node.js 20+
  • Rust (stable)
  • Python 3.11+ with uv or pip
  • Linux: libgtk-3-dev, libwebkit2gtk-4.1-dev, libappindicator3-dev, librsvg2-dev

Getting Started

# Install frontend dependencies
npm install

# Install Python sidecar dependencies
cd python && pip install -e ".[dev]" && cd ..

# Run in dev mode (uses system Python for the sidecar)
npm run tauri:dev

Building

# Build the frozen Python sidecar (CPU-only)
cd python && python build_sidecar.py --cpu-only && cd ..

# Build with CUDA support
cd python && python build_sidecar.py --with-cuda && cd ..

# Build the Tauri app
npm run tauri build

CI/CD

Two Gitea Actions workflows live in .gitea/workflows/:

release.yml — Triggers on push to main:

  1. Bumps app version (patch), creates git tag and Gitea release
  2. Builds lightweight app installers for all platforms (no sidecar bundled)
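The patch bump in step 1 amounts to incrementing the last component of a `major.minor.patch` version. A minimal sketch follows, assuming that three-part version format; `bump_patch` is a hypothetical name and the actual workflow may do this differently (for example in shell).

```python
def bump_patch(version: str) -> str:
    """Increment the patch component of 'major.minor.patch'.

    A leading 'v' (as in a git tag) is accepted and dropped.
    """
    major, minor, patch = (int(p) for p in version.removeprefix("v").split("."))
    return f"{major}.{minor}.{patch + 1}"
```

For example, `bump_patch("0.2.9")` gives `"0.2.10"`.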

build-sidecar.yml — Triggers on changes to python/ or manual dispatch:

  1. Bumps sidecar version, creates sidecar-v* tag and release
  2. Builds CPU + CUDA variants for Linux/Windows, CPU for macOS
  3. Uploads as separate release assets

Required Secrets

Secret       Purpose
BUILD_TOKEN  Gitea API token for creating releases and pushing tags

Project Structure

src/                        # Svelte 5 frontend
  lib/components/           # UI components (waveform, transcript editor, settings, etc.)
  lib/stores/               # Svelte stores (settings, transcript state)
  routes/                   # SvelteKit pages
src-tauri/                  # Rust backend
  src/sidecar/              # Sidecar process manager (download, extract, IPC)
  src/commands/             # Tauri command handlers
  nsis-hooks.nsh            # Windows uninstall cleanup
python/                     # Python sidecar
  voice_to_notes/           # Python package (transcription, diarization, AI, export)
  build_sidecar.py          # PyInstaller build script
  voice_to_notes.spec       # PyInstaller spec
.gitea/workflows/           # CI/CD (release.yml, build-sidecar.yml)
docs/                       # Documentation

License

MIT
