Claude f9226ee4d0
Fix diarization: use soundfile instead of torchaudio for audio loading
torchaudio 2.10 unconditionally delegates load() to torchcodec, ignoring
the backend parameter. Since torchcodec is excluded from PyInstaller,
this broke our pyannote Audio monkey-patch.

Fix: replace torchaudio.load() with soundfile.read() + torch.from_numpy().
soundfile handles WAV natively (audio is pre-converted to WAV), has no
torchcodec dependency, and is already a transitive dependency.

Also added soundfile to PyInstaller hiddenimports.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 11:49:39 -07:00

Voice to Notes

A desktop application that transcribes audio/video recordings with speaker identification, producing editable transcriptions with synchronized audio playback.

Features

  • Speech-to-Text Transcription — Accurate transcription via faster-whisper (Whisper models) with word-level timestamps
  • Speaker Identification (Diarization) — Detect and distinguish between speakers using pyannote.audio
  • Synchronized Playback — Click any word to seek to that point in the audio (Web Audio API for instant playback)
  • AI Integration — Ask questions about your transcript via OpenAI, Anthropic, or any OpenAI-compatible API (LiteLLM proxies, Ollama, vLLM)
  • Export Formats — SRT, WebVTT, ASS captions, plain text, and Markdown with speaker labels
  • Cross-Platform — Builds for Linux, Windows, and macOS (Apple Silicon)

Platform Support

Platform   Architecture          Status
Linux      x86_64                Supported
Windows    x86_64                Supported
macOS      ARM (Apple Silicon)   Supported

Tech Stack

  • Desktop shell: Tauri v2 (Rust backend + Svelte 5 / TypeScript frontend)
  • ML pipeline: Python sidecar (faster-whisper, pyannote.audio) — frozen via PyInstaller for distribution
  • Audio playback: wavesurfer.js with Web Audio API backend
  • AI providers: OpenAI, Anthropic, OpenAI-compatible endpoints (local or remote)
  • Local AI: Bundled llama-server (llama.cpp)
  • Caption export: pysubs2
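
To show what a caption export with speaker labels involves, here is a hand-rolled SRT writer using only the standard library. The project itself uses pysubs2 for this, so treat the code below as an illustration of the output format rather than the project's implementation. The `Segment` type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str
    start: float  # seconds
    end: float    # seconds
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp SRT expects."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as SRT cues, prefixing each cue with its speaker."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}"
        blocks.append(f"{index}\n{timing}\n{seg.speaker}: {seg.text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(blocks)
```

WebVTT differs mainly in its `WEBVTT` header line and in using a period instead of a comma before the milliseconds.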

Development

Prerequisites

  • Node.js 20+
  • Rust (stable)
  • Python 3.11+ with ML dependencies
  • System: libgtk-3-dev, libwebkit2gtk-4.1-dev (Linux)

Getting Started

# Install frontend dependencies
npm install

# Install Python sidecar dependencies
cd python && pip install -e . && cd ..

# Run in dev mode (uses system Python for the sidecar)
npm run tauri:dev

Building for Distribution

# Build the frozen Python sidecar
npm run sidecar:build

# Build the Tauri app (requires sidecar in src-tauri/binaries/)
npm run tauri build

CI/CD

Gitea Actions workflows are in .gitea/workflows/. The build pipeline:

  1. Build sidecar — PyInstaller-frozen Python binary per platform (CPU-only PyTorch)
  2. Build Tauri app — Bundles the sidecar via externalBin, produces .deb/.AppImage (Linux), .msi (Windows), .dmg (macOS)
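
Tauri resolves an externalBin entry by looking for a file with the configured name plus a Rust target-triple suffix, so the frozen sidecar has to be copied into src-tauri/binaries/ under that name. The sketch below computes the expected path. The sidecar name used in the usage note is hypothetical and not taken from this repository's configuration, and the sketch assumes a single-file sidecar binary.

```python
from pathlib import PurePosixPath


def sidecar_destination(
    name: str,
    target_triple: str,
    binaries_dir: str = "src-tauri/binaries",
) -> PurePosixPath:
    """Return the path Tauri expects for a sidecar on a given target.

    Tauri looks for "<name>-<target-triple>", with ".exe" appended on Windows.
    """
    suffix = ".exe" if "windows" in target_triple else ""
    return PurePosixPath(binaries_dir) / f"{name}-{target_triple}{suffix}"
```

For example, `sidecar_destination("my-sidecar", "x86_64-pc-windows-msvc")` gives `src-tauri/binaries/my-sidecar-x86_64-pc-windows-msvc.exe`. The host triple for the current machine is printed on the `host:` line of `rustc -vV`.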

Required Secrets

Secret                      Purpose                      Required?
TAURI_SIGNING_PRIVATE_KEY   Signs Tauri update bundles   Optional (for auto-updates)

No other secrets are needed for building. AI provider API keys and HuggingFace tokens are configured by end users in the app's Settings.

Project Structure

src/                    # Svelte 5 frontend
src-tauri/              # Rust backend (Tauri commands, sidecar manager, SQLite)
python/                 # Python sidecar (transcription, diarization, AI)
  voice_to_notes/       # Python package
  build_sidecar.py      # PyInstaller build script
  voice_to_notes.spec   # PyInstaller spec
.gitea/workflows/       # Gitea Actions CI/CD

License

MIT

Description
Convert recorded audio to text with speaker identification and text-to-audio scrubbing
2026-03-24 02:04:26 +00:00
Languages
Python 36.6%
Svelte 30.3%
Rust 29.6%
TypeScript 2.2%
Shell 0.5%
Other 0.8%