Files
voice-to-notes/README.md
Claude cd788026df
All checks were successful
Build Sidecars / Bump sidecar version and tag (push) Successful in 7s
Release / Bump version and tag (push) Successful in 5s
Build Sidecars / Build Sidecar (macOS) (push) Successful in 3m48s
Release / Build App (macOS) (push) Successful in 1m19s
Build Sidecars / Build Sidecar (Linux) (push) Successful in 12m2s
Release / Build App (Linux) (push) Successful in 4m40s
Build Sidecars / Build Sidecar (Windows) (push) Successful in 28m52s
Release / Build App (Windows) (push) Successful in 3m30s
Bundle soundfile with native libs in PyInstaller, link LICENSE in README
soundfile needs collect_all() to include libsndfile native library —
hiddenimports alone wasn't enough, causing 'No module named soundfile'
in the frozen sidecar. This is needed for the pyannote Audio patch
that bypasses torchaudio/torchcodec.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 15:27:12 -07:00

130 lines
4.6 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Voice to Notes
A desktop application that transcribes audio and video recordings with speaker identification, synchronized playback, and AI-powered analysis. Export to SRT, WebVTT, ASS captions, plain text, or Markdown.
## Features
- **Speech-to-Text** — Accurate transcription via faster-whisper with word-level timestamps. Supports 99 languages.
- **Speaker Identification** — Detect and label speakers using pyannote.audio. Rename speakers for clean exports.
- **GPU Acceleration** — CUDA support for NVIDIA GPUs (Windows/Linux). Falls back to CPU automatically.
- **Synchronized Playback** — Click any word to seek. Waveform visualization via wavesurfer.js.
- **AI Chat** — Ask questions about your transcript. Works with Ollama (local), OpenAI, Anthropic, or any OpenAI-compatible API.
- **Export** — SRT, WebVTT, ASS, plain text, Markdown — all with speaker labels.
- **Cross-Platform** — Linux, Windows, macOS (Apple Silicon).
## Quick Start
1. Download the installer from [Releases](https://repo.anhonesthost.net/MacroPad/voice-to-notes/releases)
2. On first launch, choose **CPU** or **CUDA** sidecar (the AI engine downloads separately, ~500MB2GB)
3. Import an audio/video file and click **Transcribe**
See the full [User Guide](docs/USER_GUIDE.md) for detailed setup and usage instructions.
## Platform Support
| Platform | Architecture | Installers |
|----------|-------------|------------|
| Linux | x86_64 | .deb, .rpm |
| Windows | x86_64 | .msi, .exe (NSIS) |
| macOS | ARM (Apple Silicon) | .dmg |
## Architecture
The app is split into two independently versioned components:
- **App** (v0.2.x) — Tauri desktop shell with Svelte frontend. Small installer (~50MB).
- **Sidecar** (v1.x) — Python ML engine (faster-whisper, pyannote.audio). Downloaded on first launch. CPU (~500MB) or CUDA (~2GB) variants.
This separation means app UI updates don't require re-downloading the sidecar, and sidecar updates don't require reinstalling the app.
## Tech Stack
| Component | Technology |
|-----------|-----------|
| Desktop shell | Tauri v2 (Rust + Svelte 5 / TypeScript) |
| Transcription | faster-whisper (CTranslate2) |
| Speaker ID | pyannote.audio 3.1 |
| Audio UI | wavesurfer.js |
| Transcript editor | TipTap (ProseMirror) |
| AI (local) | Ollama (any model) |
| AI (cloud) | OpenAI, Anthropic, OpenAI-compatible |
| Caption export | pysubs2 |
| Database | SQLite (rusqlite) |
## Development
### Prerequisites
- Node.js 20+
- Rust (stable)
- Python 3.11+ with uv or pip
- Linux: `libgtk-3-dev`, `libwebkit2gtk-4.1-dev`, `libappindicator3-dev`, `librsvg2-dev`
### Getting Started
```bash
# Install frontend dependencies
npm install
# Install Python sidecar dependencies
cd python && pip install -e ".[dev]" && cd ..
# Run in dev mode (uses system Python for the sidecar)
npm run tauri:dev
```
### Building
```bash
# Build the frozen Python sidecar (CPU-only)
cd python && python build_sidecar.py --cpu-only && cd ..
# Build with CUDA support
cd python && python build_sidecar.py --with-cuda && cd ..
# Build the Tauri app
npm run tauri build
```
### CI/CD
Two Gitea Actions workflows in `.gitea/workflows/`:
**`release.yml`** — Triggers on push to main:
1. Bumps app version (patch), creates git tag and Gitea release
2. Builds lightweight app installers for all platforms (no sidecar bundled)
**`build-sidecar.yml`** — Triggers on changes to `python/` or manual dispatch:
1. Bumps sidecar version, creates `sidecar-v*` tag and release
2. Builds CPU + CUDA variants for Linux/Windows, CPU for macOS
3. Uploads as separate release assets
#### Required Secrets
| Secret | Purpose |
|--------|---------|
| `BUILD_TOKEN` | Gitea API token for creating releases and pushing tags |
### Project Structure
```
src/ # Svelte 5 frontend
lib/components/ # UI components (waveform, transcript editor, settings, etc.)
lib/stores/ # Svelte stores (settings, transcript state)
routes/ # SvelteKit pages
src-tauri/ # Rust backend
src/sidecar/ # Sidecar process manager (download, extract, IPC)
src/commands/ # Tauri command handlers
nsis-hooks.nsh # Windows uninstall cleanup
python/ # Python sidecar
voice_to_notes/ # Python package (transcription, diarization, AI, export)
build_sidecar.py # PyInstaller build script
voice_to_notes.spec # PyInstaller spec
.gitea/workflows/ # CI/CD (release.yml, build-sidecar.yml)
docs/ # Documentation
```
## License
[MIT](LICENSE)