
# Voice to Notes

A desktop application that transcribes audio/video recordings with speaker identification, producing editable transcriptions with synchronized audio playback.

## Features

- **Speech-to-Text Transcription** — Accurate transcription via faster-whisper (Whisper models) with word-level timestamps
- **Speaker Identification (Diarization)** — Detect and distinguish between speakers using pyannote.audio
- **Synchronized Playback** — Click any word to seek to that point in the audio (Web Audio API for instant playback)
- **AI Integration** — Ask questions about your transcript via OpenAI, Anthropic, or any OpenAI-compatible API (LiteLLM proxies, Ollama, vLLM)
- **Export Formats** — SRT, WebVTT, ASS captions, plain text, and Markdown with speaker labels
- **Cross-Platform** — Builds for Linux, Windows, and macOS (Apple Silicon)
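
An SRT cue pairs an index and a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range with the cue text, so word-level timestamps map directly onto caption timing. The bash sketch below shows only that timestamp arithmetic. `srt_timestamp` and `srt_cue` are hypothetical helpers, the speaker prefix is an illustrative assumption, and the app's actual exporter goes through pysubs2.

```bash
#!/usr/bin/env bash
# Hypothetical helper: format a millisecond offset as an SRT timestamp (HH:MM:SS,mmm).
# WebVTT uses the same fields but puts a '.' before the milliseconds.
srt_timestamp() {
  local ms=$1
  printf '%02d:%02d:%02d,%03d\n' \
    $(( ms / 3600000 )) \
    $(( ms % 3600000 / 60000 )) \
    $(( ms % 60000 / 1000 )) \
    $(( ms % 1000 ))
}

# Hypothetical helper: print one cue (index, time range, text, blank separator line).
srt_cue() {
  local index=$1 start_ms=$2 end_ms=$3 text=$4
  printf '%s\n%s --> %s\n%s\n\n' "$index" \
    "$(srt_timestamp "$start_ms")" "$(srt_timestamp "$end_ms")" "$text"
}

# Example: one cue with an illustrative speaker label.
srt_cue 1 3723456 3725000 "Speaker 1: Hello there."
# prints:
# 1
# 01:02:03,456 --> 01:02:05,000
# Speaker 1: Hello there.
```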

## Platform Support

| Platform | Architecture | Status |
|----------|-------------|--------|
| Linux    | x86_64      | Supported |
| Windows  | x86_64      | Supported |
| macOS    | ARM (Apple Silicon) | Supported |

## Tech Stack

- **Desktop shell:** Tauri v2 (Rust backend + Svelte 5 / TypeScript frontend)
- **ML pipeline:** Python sidecar (faster-whisper, pyannote.audio) — frozen via PyInstaller for distribution
- **Audio playback:** wavesurfer.js with Web Audio API backend
- **AI providers:** OpenAI, Anthropic, OpenAI-compatible endpoints (local or remote)
- **Local AI:** Bundled llama-server (llama.cpp)
- **Caption export:** pysubs2

## Development

### Prerequisites

- Node.js 20+
- Rust (stable)
- Python 3.11+ with ML dependencies
- System: `libgtk-3-dev`, `libwebkit2gtk-4.1-dev` (Linux)
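
A quick way to confirm the toolchain before building is a script like the following. It is a sketch that assumes a bash shell with `python3` on `PATH` (on Windows the interpreter may be named `python`) and, for the Linux package check, a Debian-based distribution with `dpkg`. The `version_ge` and `check` helpers are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: compare installed tools with the minimum versions listed above.

# version_ge A B: succeed when dotted version A >= B (missing components count as 0).
version_ge() {
  local IFS=.
  local -a a b
  local i x y
  a=($1)
  b=($2)
  for (( i = 0; i < ${#a[@]} || i < ${#b[@]}; i++ )); do
    x=${a[i]:-0}
    y=${b[i]:-0}
    (( 10#$x > 10#$y )) && return 0
    (( 10#$x < 10#$y )) && return 1
  done
  return 0
}

# check NAME FOUND MINIMUM: print a status line; fail if the tool is missing or too old.
check() {
  if [ -z "$2" ]; then echo "missing: $1"; return 1; fi
  if version_ge "$2" "$3"; then echo "ok: $1 $2"; return 0; fi
  echo "too old: $1 $2 (need $3+)"
  return 1
}

status=0
node_v=$(node --version 2>/dev/null || true)   # e.g. "v20.11.1"
py_v=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])' 2>/dev/null || true)
check "Node.js" "${node_v#v}" 20 || status=1
check "Python" "$py_v" 3.11 || status=1
command -v rustc >/dev/null && echo "ok: $(rustc --version)" || { echo "missing: rustc"; status=1; }

# Linux only, assuming a Debian-based distribution.
if [ "$(uname -s)" = Linux ] && command -v dpkg >/dev/null; then
  for pkg in libgtk-3-dev libwebkit2gtk-4.1-dev; do
    dpkg -s "$pkg" >/dev/null 2>&1 && echo "ok: $pkg" || { echo "missing: $pkg"; status=1; }
  done
fi
echo "prerequisite check: $([ "$status" -eq 0 ] && echo passed || echo failed)"
```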

### Getting Started

```bash
# Install frontend dependencies
npm install

# Install Python sidecar dependencies
cd python && pip install -e . && cd ..

# Run in dev mode (uses system Python for the sidecar)
npm run tauri:dev
```

### Building for Distribution

```bash
# Build the frozen Python sidecar
npm run sidecar:build

# Build the Tauri app (requires sidecar in src-tauri/binaries/)
npm run tauri build
```
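
Tauri resolves each `externalBin` entry by appending the Rust target triple (plus `.exe` on Windows), so the frozen sidecar must carry that suffix. The sketch below computes the expected file name for the current host. `SIDECAR_BASE` is a placeholder for whatever path `externalBin` lists in `tauri.conf.json`, and `sidecar_path` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: verify the frozen sidecar has the name Tauri's `externalBin` lookup expects:
#   src-tauri/<externalBin entry>-<target-triple>[.exe]
SIDECAR_BASE="binaries/voice-to-notes-sidecar"   # hypothetical; copy the real entry from tauri.conf.json

# sidecar_path BASE TRIPLE: print the path Tauri will look for on that target.
sidecar_path() {
  local suffix=""
  case "$2" in
    *windows*) suffix=".exe" ;;
  esac
  printf 'src-tauri/%s-%s%s\n' "$1" "$2" "$suffix"
}

# The host triple is the `host:` line of `rustc -Vv`, e.g. x86_64-unknown-linux-gnu.
host_triple=$(rustc -Vv 2>/dev/null | sed -n 's/^host: //p')
if [ -n "$host_triple" ]; then
  expected=$(sidecar_path "$SIDECAR_BASE" "$host_triple")
  if [ -f "$expected" ]; then
    echo "found: $expected"
  else
    echo "missing: $expected (run npm run sidecar:build first)"
  fi
else
  echo "rustc not found; cannot determine the target triple"
fi
```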

### CI/CD

Gitea Actions workflows are in `.gitea/workflows/`. The build pipeline:

1. **Build sidecar** — PyInstaller-frozen Python binary per platform (CPU-only PyTorch)
2. **Build Tauri app** — Bundles the sidecar via `externalBin`, produces .deb/.AppImage (Linux), .msi (Windows), .dmg (macOS)
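
Each platform job runs the same two steps as a local build. The sketch below assumes Tauri's default output directory, `src-tauri/target/release/bundle/<format>/`, and a bash shell on every runner (Git Bash reports `MINGW*` or `MSYS*` from `uname -s` on Windows). `bundle_dirs`, `ci_build`, and the `RUN_CI_BUILD` guard are hypothetical and not taken from the workflows in `.gitea/workflows/`.

```bash
#!/usr/bin/env bash
# Sketch of one platform job in the CI pipeline (not the actual workflow file).

# bundle_dirs OS: bundle subdirectories expected for this project's targets, keyed on `uname -s`.
bundle_dirs() {
  case "$1" in
    Linux) echo "deb appimage" ;;
    MINGW*|MSYS*|CYGWIN*) echo "msi" ;;   # bash on Windows runners
    Darwin) echo "dmg" ;;
    *) echo "unsupported OS: $1" >&2; return 1 ;;
  esac
}

ci_build() {
  local dirs d
  dirs=$(bundle_dirs "$(uname -s)") || return 1
  # Step 1: freeze the Python sidecar; step 2: bundle it with the Tauri app.
  npm run sidecar:build && npm run tauri build || return 1
  # Default Tauri output location; adjust if the workspace sets a custom target dir.
  for d in $dirs; do
    ls src-tauri/target/release/bundle/"$d"/
  done
}

# Guarded so the file can be sourced without starting a build.
if [ "${RUN_CI_BUILD:-0}" = 1 ]; then
  ci_build
fi
```

To run the build steps, set the guard explicitly, for example `RUN_CI_BUILD=1 bash ci-build.sh` (the file name is hypothetical).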

#### Required Secrets

| Secret | Purpose | Required? |
|--------|---------|-----------|
| `TAURI_SIGNING_PRIVATE_KEY` | Signs Tauri update bundles | Optional (for auto-updates) |

No other secrets are needed for building. AI provider API keys and HuggingFace tokens are configured by end users in the app's Settings.
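
To enable signed update bundles, generate a key pair once with the Tauri CLI and expose the private key to the build through `TAURI_SIGNING_PRIVATE_KEY` (in CI, as a repository secret). The key path below is an example, and `signing_status` is a hypothetical helper that only reports whether the variable is set; it does not validate the key.

```bash
#!/usr/bin/env bash
# One-time key generation (example path; keep the private key out of version control,
# and put the public key in the app's updater configuration):
#   npm run tauri signer generate -- -w ~/.tauri/voice-to-notes.key
#
# Local signed build (also set TAURI_SIGNING_PRIVATE_KEY_PASSWORD if the key has a password):
#   export TAURI_SIGNING_PRIVATE_KEY="$(cat ~/.tauri/voice-to-notes.key)"
#   npm run tauri build

# Hypothetical helper: report whether the environment provides an updater signing key.
signing_status() {
  if [ -n "${TAURI_SIGNING_PRIVATE_KEY:-}" ]; then
    echo "present"
  else
    echo "absent"
  fi
}

echo "updater signing key: $(signing_status)"
```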

### Project Structure

```
src/                    # Svelte 5 frontend
src-tauri/              # Rust backend (Tauri commands, sidecar manager, SQLite)
python/                 # Python sidecar (transcription, diarization, AI)
  voice_to_notes/       # Python package
  build_sidecar.py      # PyInstaller build script
  voice_to_notes.spec   # PyInstaller spec
.gitea/workflows/       # Gitea Actions CI/CD
```

## License

MIT