Claude 45247ae66e
Some checks failed
Build Sidecars / Bump sidecar version and tag (push) Successful in 3s
Release / Bump version and tag (push) Failing after 3s
Release / Build App (Linux) (push) Has been skipped
Release / Build App (Windows) (push) Has been skipped
Release / Build App (macOS) (push) Has been skipped
Build Sidecars / Build Sidecar (macOS) (push) Successful in 5m28s
Build Sidecars / Build Sidecar (Linux) (push) Successful in 13m54s
Build Sidecars / Build Sidecar (Windows) (push) Successful in 37m38s
Decouple sidecar versioning from app versioning
Sidecar now has its own version (1.0.0) and release lifecycle:
- Sidecar tags: sidecar-v1.0.0, sidecar-v1.0.1, etc.
- App tags: v0.2.x (unchanged)
- Sidecar workflow triggers only on python/** changes or manual dispatch
- App release no longer bumps python/pyproject.toml

Sidecar version tracked via sidecar-version.txt in app data dir:
- resolve_sidecar_path() reads version from file instead of CARGO_PKG_VERSION
- download_sidecar() fetches latest sidecar-v* release from Gitea API
- check_sidecar_update() compares local vs remote sidecar versions
- Version file written after successful download

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 07:57:51 -07:00

Voice to Notes

A desktop application that transcribes audio/video recordings with speaker identification, producing editable transcriptions with synchronized audio playback.

Features

  • Speech-to-Text Transcription — Accurate transcription via faster-whisper (Whisper models) with word-level timestamps
  • Speaker Identification (Diarization) — Detect and distinguish between speakers using pyannote.audio
  • Synchronized Playback — Click any word to seek to that point in the audio (Web Audio API for instant playback)
  • AI Integration — Ask questions about your transcript via OpenAI, Anthropic, or any OpenAI-compatible API (LiteLLM proxies, Ollama, vLLM)
  • Export Formats — SRT, WebVTT, ASS captions, plain text, and Markdown with speaker labels
  • Cross-Platform — Builds for Linux, Windows, and macOS (Apple Silicon)

Platform Support

Platform Architecture Status
Linux x86_64 Supported
Windows x86_64 Supported
macOS ARM (Apple Silicon) Supported

Tech Stack

  • Desktop shell: Tauri v2 (Rust backend + Svelte 5 / TypeScript frontend)
  • ML pipeline: Python sidecar (faster-whisper, pyannote.audio) — frozen via PyInstaller for distribution
  • Audio playback: wavesurfer.js with Web Audio API backend
  • AI providers: OpenAI, Anthropic, OpenAI-compatible endpoints (local or remote)
  • Local AI: Bundled llama-server (llama.cpp)
  • Caption export: pysubs2

Development

Prerequisites

  • Node.js 20+
  • Rust (stable)
  • Python 3.11+ with ML dependencies
  • System: libgtk-3-dev, libwebkit2gtk-4.1-dev (Linux)

Getting Started

# Install frontend dependencies
npm install

# Install Python sidecar dependencies
cd python && pip install -e . && cd ..

# Run in dev mode (uses system Python for the sidecar)
npm run tauri:dev

Building for Distribution

# Build the frozen Python sidecar
npm run sidecar:build

# Build the Tauri app (requires sidecar in src-tauri/binaries/)
npm run tauri build

CI/CD

Gitea Actions workflows are in .gitea/workflows/. The build pipeline:

  1. Build sidecar — PyInstaller-frozen Python binary per platform (CPU-only PyTorch)
  2. Build Tauri app — Bundles the sidecar via externalBin, produces .deb/.AppImage (Linux), .msi (Windows), .dmg (macOS)

Required Secrets

Secret Purpose Required?
TAURI_SIGNING_PRIVATE_KEY Signs Tauri update bundles Optional (for auto-updates)

No other secrets are needed for building. AI provider API keys and HuggingFace tokens are configured by end users in the app's Settings.

Project Structure

src/                    # Svelte 5 frontend
src-tauri/              # Rust backend (Tauri commands, sidecar manager, SQLite)
python/                 # Python sidecar (transcription, diarization, AI)
  voice_to_notes/       # Python package
  build_sidecar.py      # PyInstaller build script
  voice_to_notes.spec   # PyInstaller spec
.gitea/workflows/       # Gitea Actions CI/CD

License

MIT

Description
Convert recorded audio to text with speaker identifying and text to audio scrubbing
Readme MIT 1.1 MiB
2026-03-24 02:04:26 +00:00
Languages
Python 36.6%
Svelte 30.3%
Rust 29.6%
TypeScript 2.2%
Shell 0.5%
Other 0.8%