Voice to Notes

A desktop application that transcribes audio and video recordings with speaker identification, producing editable transcripts with synchronized audio playback.

Features

  • Speech-to-Text Transcription — Accurate transcription via faster-whisper (Whisper models) with word-level timestamps (see the sketch after this list)
  • Speaker Identification (Diarization) — Detect and distinguish between speakers using pyannote.audio
  • Synchronized Playback — Click any word to seek to that point in the audio (Web Audio API for instant playback)
  • AI Integration — Ask questions about your transcript via OpenAI, Anthropic, or any OpenAI-compatible API (LiteLLM proxies, Ollama, vLLM)
  • Export Formats — SRT, WebVTT, ASS captions, plain text, and Markdown with speaker labels
  • Cross-Platform — Builds for Linux, Windows, and macOS (Apple Silicon)
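
The transcription and click-to-seek features rely on word-level timestamps from faster-whisper. A minimal sketch of that call, assuming faster-whisper's Python API; the model size, device, and file path are placeholders, and the app's actual sidecar code may differ:

from faster_whisper import WhisperModel

# Placeholder model/device; the app selects and ships its own models.
model = WhisperModel("small", device="cpu", compute_type="int8")

# word_timestamps=True attaches per-word start/end times to every segment.
segments, info = model.transcribe("recording.wav", word_timestamps=True)

for segment in segments:
    for word in segment.words:
        # These per-word timings are what make click-to-seek playback possible.
        print(f"{word.start:6.2f}-{word.end:6.2f}  {word.word}")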

Platform Support

Platform    Architecture           Status
Linux       x86_64                 Supported
Windows     x86_64                 Supported
macOS       ARM (Apple Silicon)    Supported

Tech Stack

  • Desktop shell: Tauri v2 (Rust backend + Svelte 5 / TypeScript frontend)
  • ML pipeline: Python sidecar (faster-whisper, pyannote.audio) — frozen via PyInstaller for distribution
  • Audio playback: wavesurfer.js with Web Audio API backend
  • AI providers: OpenAI, Anthropic, OpenAI-compatible endpoints (local or remote; see the sketch after this list)
  • Local AI: Bundled llama-server (llama.cpp)
  • Caption export: pysubs2
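
OpenAI-compatible endpoints, including the bundled llama-server, all speak the same chat-completions protocol. A minimal sketch of such a request using the openai Python client; the endpoint, port, and model name are placeholders, and the app's real client code may differ:

from openai import OpenAI

# Placeholder endpoint: llama-server defaults to port 8080, but the app may
# launch it elsewhere; any OpenAI-compatible endpoint works the same way.
client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="local-model",  # placeholder; llama-server serves whatever model it loaded
    messages=[
        {"role": "system", "content": "You answer questions about a transcript."},
        {"role": "user", "content": "Summarize the main decisions discussed."},
    ],
)
print(response.choices[0].message.content)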

Development

Prerequisites

  • Node.js 20+
  • Rust (stable)
  • Python 3.11+ with ML dependencies
  • System: libgtk-3-dev, libwebkit2gtk-4.1-dev (Linux)

Getting Started

# Install frontend dependencies
npm install

# Install Python sidecar dependencies
cd python && pip install -e . && cd ..

# Run in dev mode (uses system Python for the sidecar)
npm run tauri:dev

Building for Distribution

# Build the frozen Python sidecar
npm run sidecar:build

# Build the Tauri app (requires sidecar in src-tauri/binaries/)
npm run tauri build

CI/CD

Gitea Actions workflows are in .gitea/workflows/. The build pipeline:

  1. Build sidecar — PyInstaller-frozen Python binary per platform (CPU-only PyTorch); a rough sketch of this step follows the list
  2. Build Tauri app — Bundles the sidecar via externalBin, produces .deb/.AppImage (Linux), .msi (Windows), .dmg (macOS)
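
Step 1 boils down to a PyInstaller freeze driven by python/build_sidecar.py and voice_to_notes.spec. A minimal sketch of that freeze, assuming the spec file encodes the platform details; the real build script also has to name the binary the way Tauri's externalBin lookup expects:

import PyInstaller.__main__

# Build from the spec file; --noconfirm overwrites any previous dist/ output.
# Platform specifics (CPU-only PyTorch, hidden imports) are assumed to live in
# the spec, and the resulting binary still needs the target-triple suffix that
# Tauri's externalBin lookup expects.
PyInstaller.__main__.run([
    "voice_to_notes.spec",
    "--noconfirm",
])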

Required Secrets

Secret                       Purpose                       Required?
TAURI_SIGNING_PRIVATE_KEY    Signs Tauri update bundles    Optional (for auto-updates)

No other secrets are needed for building. AI provider API keys and HuggingFace tokens are configured by end users in the app's Settings.

Project Structure

src/                    # Svelte 5 frontend
src-tauri/              # Rust backend (Tauri commands, sidecar manager, SQLite)
python/                 # Python sidecar (transcription, diarization, AI)
  voice_to_notes/       # Python package
  build_sidecar.py      # PyInstaller build script
  voice_to_notes.spec   # PyInstaller spec
.gitea/workflows/       # Gitea Actions CI/CD

License

MIT
