Update README, add User Guide and Contributing docs

- README: Updated to reflect current architecture (decoupled app/sidecar), Ollama as local AI, CUDA support, split CI workflows
- USER_GUIDE.md: Complete how-to including first-time setup, transcription workflow, speaker detection setup, Ollama configuration, export formats, keyboard shortcuts, and troubleshooting
- CONTRIBUTING.md: Dev setup, project structure, conventions, CI/CD overview

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 12:06:10 -07:00
parent f022c6dfe0
commit 35173c54ce
3 changed files with 420 additions and 40 deletions
--- a/README.md
+++ b/README.md
@@ -1,32 +1,55 @@
 # Voice to Notes

-A desktop application that transcribes audio/video recordings with speaker identification, producing editable transcriptions with synchronized audio playback.
+A desktop application that transcribes audio and video recordings with speaker identification, synchronized playback, and AI-powered analysis. Export to SRT, WebVTT, ASS captions, plain text, or Markdown.

 ## Features

-- **Speech-to-Text Transcription** — Accurate transcription via faster-whisper (Whisper models) with word-level timestamps
-- **Speaker Identification (Diarization)** — Detect and distinguish between speakers using pyannote.audio
-- **Synchronized Playback** — Click any word to seek to that point in the audio (Web Audio API for instant playback)
-- **AI Integration** — Ask questions about your transcript via OpenAI, Anthropic, or any OpenAI-compatible API (LiteLLM proxies, Ollama, vLLM)
-- **Export Formats** — SRT, WebVTT, ASS captions, plain text, and Markdown with speaker labels
-- **Cross-Platform** — Builds for Linux, Windows, and macOS (Apple Silicon)
+- **Speech-to-Text** — Accurate transcription via faster-whisper with word-level timestamps. Supports 99 languages.
+- **Speaker Identification** — Detect and label speakers using pyannote.audio. Rename speakers for clean exports.
+- **GPU Acceleration** — CUDA support for NVIDIA GPUs (Windows/Linux). Falls back to CPU automatically.
+- **Synchronized Playback** — Click any word to seek. Waveform visualization via wavesurfer.js.
+- **AI Chat** — Ask questions about your transcript. Works with Ollama (local), OpenAI, Anthropic, or any OpenAI-compatible API.
+- **Export** — SRT, WebVTT, ASS, plain text, Markdown — all with speaker labels.
+- **Cross-Platform** — Linux, Windows, macOS (Apple Silicon).
+
+## Quick Start
+
+1. Download the installer from [Releases](https://repo.anhonesthost.net/MacroPad/voice-to-notes/releases)
+2. On first launch, choose **CPU** or **CUDA** sidecar (the AI engine downloads separately, ~500MB–2GB)
+3. Import an audio/video file and click **Transcribe**
+
+See the full [User Guide](docs/USER_GUIDE.md) for detailed setup and usage instructions.

 ## Platform Support

-| Platform | Architecture | Status |
-|----------|-------------|--------|
-| Linux    | x86_64      | Supported |
-| Windows  | x86_64      | Supported |
-| macOS    | ARM (Apple Silicon) | Supported |
+| Platform | Architecture | Installers |
+|----------|-------------|------------|
+| Linux    | x86_64      | .deb, .rpm |
+| Windows  | x86_64      | .msi, .exe (NSIS) |
+| macOS    | ARM (Apple Silicon) | .dmg |
+
+## Architecture
+
+The app is split into two independently versioned components:
+
+- **App** (v0.2.x) — Tauri desktop shell with Svelte frontend. Small installer (~50MB).
+- **Sidecar** (v1.x) — Python ML engine (faster-whisper, pyannote.audio). Downloaded on first launch. CPU (~500MB) or CUDA (~2GB) variants.
+
+This separation means app UI updates don't require re-downloading the sidecar, and sidecar updates don't require reinstalling the app.

 ## Tech Stack

-- **Desktop shell:** Tauri v2 (Rust backend + Svelte 5 / TypeScript frontend)
-- **ML pipeline:** Python sidecar (faster-whisper, pyannote.audio) — frozen via PyInstaller for distribution
-- **Audio playback:** wavesurfer.js with Web Audio API backend
-- **AI providers:** OpenAI, Anthropic, OpenAI-compatible endpoints (local or remote)
-- **Local AI:** Bundled llama-server (llama.cpp)
-- **Caption export:** pysubs2
+| Component | Technology |
+|-----------|-----------|
+| Desktop shell | Tauri v2 (Rust + Svelte 5 / TypeScript) |
+| Transcription | faster-whisper (CTranslate2) |
+| Speaker ID | pyannote.audio 3.1 |
+| Audio UI | wavesurfer.js |
+| Transcript editor | TipTap (ProseMirror) |
+| AI (local) | Ollama (any model) |
+| AI (cloud) | OpenAI, Anthropic, OpenAI-compatible |
+| Caption export | pysubs2 |
+| Database | SQLite (rusqlite) |

 ## Development

@@ -34,8 +57,8 @@ A desktop application that transcribes audio/video recordings with speaker ident

 - Node.js 20+
 - Rust (stable)
-- Python 3.11+ with ML dependencies
-- System: `libgtk-3-dev`, `libwebkit2gtk-4.1-dev` (Linux)
+- Python 3.11+ with uv or pip
+- Linux: `libgtk-3-dev`, `libwebkit2gtk-4.1-dev`, `libappindicator3-dev`, `librsvg2-dev`

 ### Getting Started

@@ -44,47 +67,61 @@ A desktop application that transcribes audio/video recordings with speaker ident
 npm install

 # Install Python sidecar dependencies
-cd python && pip install -e . && cd ..
+cd python && pip install -e ".[dev]" && cd ..

 # Run in dev mode (uses system Python for the sidecar)
 npm run tauri:dev
 ```

-### Building for Distribution
+### Building

 ```bash
-# Build the frozen Python sidecar
-npm run sidecar:build
+# Build the frozen Python sidecar (CPU-only)
+cd python && python build_sidecar.py --cpu-only && cd ..

-# Build the Tauri app (requires sidecar in src-tauri/binaries/)
+# Build with CUDA support
+cd python && python build_sidecar.py --with-cuda && cd ..
+
+# Build the Tauri app
 npm run tauri build
 ```

 ### CI/CD

-Gitea Actions workflows are in `.gitea/workflows/`. The build pipeline:
+Two Gitea Actions workflows in `.gitea/workflows/`:

-1. **Build sidecar** — PyInstaller-frozen Python binary per platform (CPU-only PyTorch)
-2. **Build Tauri app** — Bundles the sidecar via `externalBin`, produces .deb/.AppImage (Linux), .msi (Windows), .dmg (macOS)
+**`release.yml`** — Triggers on push to main:
+1. Bumps app version (patch), creates git tag and Gitea release
+2. Builds lightweight app installers for all platforms (no sidecar bundled)
+
+**`build-sidecar.yml`** — Triggers on changes to `python/` or manual dispatch:
+1. Bumps sidecar version, creates `sidecar-v*` tag and release
+2. Builds CPU + CUDA variants for Linux/Windows, CPU for macOS
+3. Uploads as separate release assets

 #### Required Secrets

-| Secret | Purpose | Required? |
-|--------|---------|-----------|
-| `TAURI_SIGNING_PRIVATE_KEY` | Signs Tauri update bundles | Optional (for auto-updates) |
-
-No other secrets are needed for building. AI provider API keys and HuggingFace tokens are configured by end users in the app's Settings.
+| Secret | Purpose |
+|--------|---------|
+| `BUILD_TOKEN` | Gitea API token for creating releases and pushing tags |

 ### Project Structure

 ```
-src/                    # Svelte 5 frontend
-src-tauri/              # Rust backend (Tauri commands, sidecar manager, SQLite)
-python/                 # Python sidecar (transcription, diarization, AI)
-  voice_to_notes/       # Python package
-  build_sidecar.py      # PyInstaller build script
-  voice_to_notes.spec   # PyInstaller spec
-.gitea/workflows/       # Gitea Actions CI/CD
+src/                        # Svelte 5 frontend
+  lib/components/           # UI components (waveform, transcript editor, settings, etc.)
+  lib/stores/               # Svelte stores (settings, transcript state)
+  routes/                   # SvelteKit pages
+src-tauri/                  # Rust backend
+  src/sidecar/              # Sidecar process manager (download, extract, IPC)
+  src/commands/             # Tauri command handlers
+  nsis-hooks.nsh            # Windows uninstall cleanup
+python/                     # Python sidecar
+  voice_to_notes/           # Python package (transcription, diarization, AI, export)
+  build_sidecar.py          # PyInstaller build script
+  voice_to_notes.spec       # PyInstaller spec
+.gitea/workflows/           # CI/CD (release.yml, build-sidecar.yml)
+docs/                       # Documentation
 ```

 ## License