Update README, add User Guide and Contributing docs

- README: Updated to reflect current architecture (decoupled app/sidecar), Ollama as local AI, CUDA support, split CI workflows
- USER_GUIDE.md: Complete how-to including first-time setup, transcription workflow, speaker detection setup, Ollama configuration, export formats, keyboard shortcuts, and troubleshooting
- CONTRIBUTING.md: Dev setup, project structure, conventions, CI/CD overview

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
README.md (117 lines changed)
@@ -1,32 +1,55 @@
 # Voice to Notes

-A desktop application that transcribes audio/video recordings with speaker identification, producing editable transcriptions with synchronized audio playback.
+A desktop application that transcribes audio and video recordings with speaker identification, synchronized playback, and AI-powered analysis. Export to SRT, WebVTT, ASS captions, plain text, or Markdown.

 ## Features

-- **Speech-to-Text Transcription** — Accurate transcription via faster-whisper (Whisper models) with word-level timestamps
-- **Speaker Identification (Diarization)** — Detect and distinguish between speakers using pyannote.audio
-- **Synchronized Playback** — Click any word to seek to that point in the audio (Web Audio API for instant playback)
-- **AI Integration** — Ask questions about your transcript via OpenAI, Anthropic, or any OpenAI-compatible API (LiteLLM proxies, Ollama, vLLM)
-- **Export Formats** — SRT, WebVTT, ASS captions, plain text, and Markdown with speaker labels
-- **Cross-Platform** — Builds for Linux, Windows, and macOS (Apple Silicon)
+- **Speech-to-Text** — Accurate transcription via faster-whisper with word-level timestamps. Supports 99 languages.
+- **Speaker Identification** — Detect and label speakers using pyannote.audio. Rename speakers for clean exports.
+- **GPU Acceleration** — CUDA support for NVIDIA GPUs (Windows/Linux). Falls back to CPU automatically.
+- **Synchronized Playback** — Click any word to seek. Waveform visualization via wavesurfer.js.
+- **AI Chat** — Ask questions about your transcript. Works with Ollama (local), OpenAI, Anthropic, or any OpenAI-compatible API.
+- **Export** — SRT, WebVTT, ASS, plain text, Markdown — all with speaker labels.
+- **Cross-Platform** — Linux, Windows, macOS (Apple Silicon).

+## Quick Start
+
+1. Download the installer from [Releases](https://repo.anhonesthost.net/MacroPad/voice-to-notes/releases)
+2. On first launch, choose **CPU** or **CUDA** sidecar (the AI engine downloads separately, ~500MB–2GB)
+3. Import an audio/video file and click **Transcribe**
+
+See the full [User Guide](docs/USER_GUIDE.md) for detailed setup and usage instructions.
+
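Step 2 of the Quick Start asks the user to pick a sidecar variant. The sketch below shows one way a sensible default could be suggested. It assumes CUDA builds exist only for Linux and Windows, as the Features and CI/CD sections of this README state, and that finding `nvidia-smi` on `PATH` is an acceptable stand-in for "has an NVIDIA GPU". The function names and the detection heuristic are hypothetical and are not taken from the app.

```python
import shutil
import sys


def suggest_sidecar_variant(platform: str, has_nvidia_gpu: bool) -> str:
    """Return "cuda" or "cpu" for a sys.platform-style platform string."""
    cuda_platforms = ("linux", "win32")  # README: CUDA on Windows/Linux only
    if has_nvidia_gpu and platform.startswith(cuda_platforms):
        return "cuda"
    return "cpu"  # macOS, or no NVIDIA GPU detected


def detect_nvidia_gpu() -> bool:
    # Assumption: nvidia-smi on PATH implies a usable NVIDIA driver.
    return shutil.which("nvidia-smi") is not None


if __name__ == "__main__":
    print(suggest_sidecar_variant(sys.platform, detect_nvidia_gpu()))
```

Keeping the decision in a pure function, separate from the hardware probe, makes the fallback rule easy to test without a GPU.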
 ## Platform Support

-| Platform | Architecture | Status |
-|----------|-------------|--------|
-| Linux | x86_64 | Supported |
-| Windows | x86_64 | Supported |
-| macOS | ARM (Apple Silicon) | Supported |
+| Platform | Architecture | Installers |
+|----------|-------------|------------|
+| Linux | x86_64 | .deb, .rpm |
+| Windows | x86_64 | .msi, .exe (NSIS) |
+| macOS | ARM (Apple Silicon) | .dmg |

+## Architecture
+
+The app is split into two independently versioned components:
+
+- **App** (v0.2.x) — Tauri desktop shell with Svelte frontend. Small installer (~50MB).
+- **Sidecar** (v1.x) — Python ML engine (faster-whisper, pyannote.audio). Downloaded on first launch. CPU (~500MB) or CUDA (~2GB) variants.
+
+This separation means app UI updates don't require re-downloading the sidecar, and sidecar updates don't require reinstalling the app.
+
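Independent versioning means the two release streams have to be told apart. The sketch below assumes sidecar releases are tagged `sidecar-v*` (the pattern named in the CI/CD section of this README) and app releases use plain `v*` tags. The app tag pattern and the helper names `latest` and `bump_patch` are assumptions made for illustration only.

```python
import re

SIDECAR_TAG = re.compile(r"^sidecar-v(\d+)\.(\d+)\.(\d+)$")
APP_TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")  # assumed app tag format


def latest(tags, pattern):
    """Return the highest (major, minor, patch) among matching tags, or None."""
    versions = [tuple(map(int, m.groups())) for t in tags if (m := pattern.match(t))]
    return max(versions, default=None)


def bump_patch(version):
    """Increment only the patch component, as a patch-level release would."""
    major, minor, patch = version
    return (major, minor, patch + 1)


tags = ["v0.2.9", "sidecar-v1.0.3", "v0.2.10", "sidecar-v1.1.0"]
print(latest(tags, APP_TAG))      # (0, 2, 10)
print(latest(tags, SIDECAR_TAG))  # (1, 1, 0)
```

Comparing versions as integer tuples rather than strings is what makes `0.2.10` sort after `0.2.9`.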
 ## Tech Stack

-- **Desktop shell:** Tauri v2 (Rust backend + Svelte 5 / TypeScript frontend)
-- **ML pipeline:** Python sidecar (faster-whisper, pyannote.audio) — frozen via PyInstaller for distribution
-- **Audio playback:** wavesurfer.js with Web Audio API backend
-- **AI providers:** OpenAI, Anthropic, OpenAI-compatible endpoints (local or remote)
-- **Local AI:** Bundled llama-server (llama.cpp)
-- **Caption export:** pysubs2
+| Component | Technology |
+|-----------|-----------|
+| Desktop shell | Tauri v2 (Rust + Svelte 5 / TypeScript) |
+| Transcription | faster-whisper (CTranslate2) |
+| Speaker ID | pyannote.audio 3.1 |
+| Audio UI | wavesurfer.js |
+| Transcript editor | TipTap (ProseMirror) |
+| AI (local) | Ollama (any model) |
+| AI (cloud) | OpenAI, Anthropic, OpenAI-compatible |
+| Caption export | pysubs2 |
+| Database | SQLite (rusqlite) |

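The table lists pysubs2 for caption export. To show what a speaker-labelled SRT cue looks like, here is a standard-library-only sketch. It does not use pysubs2, and the `Segment` shape and function names are made up for illustration rather than taken from the sidecar.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float   # seconds
    end: float     # seconds
    speaker: str
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm (SRT puts a comma before milliseconds)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Render segments as numbered SRT cues with a 'Speaker: text' payload."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.speaker}: {seg.text}\n"
        )
    return "\n".join(cues)  # blank line between cues


print(to_srt([Segment(0.0, 2.5, "Alice", "Hello."), Segment(2.5, 4.0, "Bob", "Hi.")]))
```

Renaming a speaker before export, as the Features list describes, would only change the `speaker` field of each segment; the cue layout stays the same.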
 ## Development

@@ -34,8 +57,8 @@ A desktop application that transcribes audio/video recordings with speaker ident

 - Node.js 20+
 - Rust (stable)
-- Python 3.11+ with ML dependencies
-- System: `libgtk-3-dev`, `libwebkit2gtk-4.1-dev` (Linux)
+- Python 3.11+ with uv or pip
+- Linux: `libgtk-3-dev`, `libwebkit2gtk-4.1-dev`, `libappindicator3-dev`, `librsvg2-dev`

 ### Getting Started

@@ -44,47 +67,61 @@ A desktop application that transcribes audio/video recordings with speaker ident
 npm install

 # Install Python sidecar dependencies
-cd python && pip install -e . && cd ..
+cd python && pip install -e ".[dev]" && cd ..

 # Run in dev mode (uses system Python for the sidecar)
 npm run tauri:dev
 ```

-### Building for Distribution
+### Building

 ```bash
-# Build the frozen Python sidecar
-npm run sidecar:build
+# Build the frozen Python sidecar (CPU-only)
+cd python && python build_sidecar.py --cpu-only && cd ..

-# Build the Tauri app (requires sidecar in src-tauri/binaries/)
+# Build with CUDA support
+cd python && python build_sidecar.py --with-cuda && cd ..
+
+# Build the Tauri app
 npm run tauri build
 ```

 ### CI/CD

-Gitea Actions workflows are in `.gitea/workflows/`. The build pipeline:
+Two Gitea Actions workflows in `.gitea/workflows/`:

-1. **Build sidecar** — PyInstaller-frozen Python binary per platform (CPU-only PyTorch)
-2. **Build Tauri app** — Bundles the sidecar via `externalBin`, produces .deb/.AppImage (Linux), .msi (Windows), .dmg (macOS)
+**`release.yml`** — Triggers on push to main:
+1. Bumps app version (patch), creates git tag and Gitea release
+2. Builds lightweight app installers for all platforms (no sidecar bundled)
+
+**`build-sidecar.yml`** — Triggers on changes to `python/` or manual dispatch:
+1. Bumps sidecar version, creates `sidecar-v*` tag and release
+2. Builds CPU + CUDA variants for Linux/Windows, CPU for macOS
+3. Uploads as separate release assets

 #### Required Secrets

-| Secret | Purpose | Required? |
-|--------|---------|-----------|
-| `TAURI_SIGNING_PRIVATE_KEY` | Signs Tauri update bundles | Optional (for auto-updates) |
-
-No other secrets are needed for building. AI provider API keys and HuggingFace tokens are configured by end users in the app's Settings.
+| Secret | Purpose |
+|--------|---------|
+| `BUILD_TOKEN` | Gitea API token for creating releases and pushing tags |

 ### Project Structure

 ```
-src/                    # Svelte 5 frontend
-src-tauri/              # Rust backend (Tauri commands, sidecar manager, SQLite)
-python/                 # Python sidecar (transcription, diarization, AI)
-  voice_to_notes/       # Python package
-  build_sidecar.py      # PyInstaller build script
-  voice_to_notes.spec   # PyInstaller spec
-.gitea/workflows/       # Gitea Actions CI/CD
+src/                    # Svelte 5 frontend
+  lib/components/       # UI components (waveform, transcript editor, settings, etc.)
+  lib/stores/           # Svelte stores (settings, transcript state)
+  routes/               # SvelteKit pages
+src-tauri/              # Rust backend
+  src/sidecar/          # Sidecar process manager (download, extract, IPC)
+  src/commands/         # Tauri command handlers
+  nsis-hooks.nsh        # Windows uninstall cleanup
+python/                 # Python sidecar
+  voice_to_notes/       # Python package (transcription, diarization, AI, export)
+  build_sidecar.py      # PyInstaller build script
+  voice_to_notes.spec   # PyInstaller spec
+.gitea/workflows/       # CI/CD (release.yml, build-sidecar.yml)
+docs/                   # Documentation
 ```

 ## License