Files
voice-to-notes/README.md
Claude 35173c54ce
All checks were successful
Release / Bump version and tag (push) Successful in 9s
Release / Build App (macOS) (push) Successful in 1m17s
Release / Build App (Linux) (push) Successful in 4m50s
Release / Build App (Windows) (push) Successful in 3m21s
Update README, add User Guide and Contributing docs
- README: Updated to reflect current architecture (decoupled app/sidecar),
  Ollama as local AI, CUDA support, split CI workflows
- USER_GUIDE.md: Complete how-to including first-time setup, transcription
  workflow, speaker detection setup, Ollama configuration, export formats,
  keyboard shortcuts, and troubleshooting
- CONTRIBUTING.md: Dev setup, project structure, conventions, CI/CD overview

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 12:06:13 -07:00

130 lines
4.5 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Voice to Notes
A desktop application that transcribes audio and video recordings with speaker identification, synchronized playback, and AI-powered analysis. Export to SRT, WebVTT, ASS captions, plain text, or Markdown.
## Features
- **Speech-to-Text** — Accurate transcription via faster-whisper with word-level timestamps. Supports 99 languages.
- **Speaker Identification** — Detect and label speakers using pyannote.audio. Rename speakers for clean exports.
- **GPU Acceleration** — CUDA support for NVIDIA GPUs (Windows/Linux). Falls back to CPU automatically.
- **Synchronized Playback** — Click any word to seek. Waveform visualization via wavesurfer.js.
- **AI Chat** — Ask questions about your transcript. Works with Ollama (local), OpenAI, Anthropic, or any OpenAI-compatible API.
- **Export** — SRT, WebVTT, ASS, plain text, Markdown — all with speaker labels.
- **Cross-Platform** — Linux, Windows, macOS (Apple Silicon).
## Quick Start
1. Download the installer from [Releases](https://repo.anhonesthost.net/MacroPad/voice-to-notes/releases)
2. On first launch, choose **CPU** or **CUDA** sidecar (the AI engine downloads separately, ~500MB2GB)
3. Import an audio/video file and click **Transcribe**
See the full [User Guide](docs/USER_GUIDE.md) for detailed setup and usage instructions.
## Platform Support
| Platform | Architecture | Installers |
|----------|-------------|------------|
| Linux | x86_64 | .deb, .rpm |
| Windows | x86_64 | .msi, .exe (NSIS) |
| macOS | ARM (Apple Silicon) | .dmg |
## Architecture
The app is split into two independently versioned components:
- **App** (v0.2.x) — Tauri desktop shell with Svelte frontend. Small installer (~50MB).
- **Sidecar** (v1.x) — Python ML engine (faster-whisper, pyannote.audio). Downloaded on first launch. CPU (~500MB) or CUDA (~2GB) variants.
This separation means app UI updates don't require re-downloading the sidecar, and sidecar updates don't require reinstalling the app.
## Tech Stack
| Component | Technology |
|-----------|-----------|
| Desktop shell | Tauri v2 (Rust + Svelte 5 / TypeScript) |
| Transcription | faster-whisper (CTranslate2) |
| Speaker ID | pyannote.audio 3.1 |
| Audio UI | wavesurfer.js |
| Transcript editor | TipTap (ProseMirror) |
| AI (local) | Ollama (any model) |
| AI (cloud) | OpenAI, Anthropic, OpenAI-compatible |
| Caption export | pysubs2 |
| Database | SQLite (rusqlite) |
## Development
### Prerequisites
- Node.js 20+
- Rust (stable)
- Python 3.11+ with uv or pip
- Linux: `libgtk-3-dev`, `libwebkit2gtk-4.1-dev`, `libappindicator3-dev`, `librsvg2-dev`
### Getting Started
```bash
# Install frontend dependencies
npm install
# Install Python sidecar dependencies
cd python && pip install -e ".[dev]" && cd ..
# Run in dev mode (uses system Python for the sidecar)
npm run tauri:dev
```
### Building
```bash
# Build the frozen Python sidecar (CPU-only)
cd python && python build_sidecar.py --cpu-only && cd ..
# Build with CUDA support
cd python && python build_sidecar.py --with-cuda && cd ..
# Build the Tauri app
npm run tauri build
```
### CI/CD
Two Gitea Actions workflows in `.gitea/workflows/`:
**`release.yml`** — Triggers on push to main:
1. Bumps app version (patch), creates git tag and Gitea release
2. Builds lightweight app installers for all platforms (no sidecar bundled)
**`build-sidecar.yml`** — Triggers on changes to `python/` or manual dispatch:
1. Bumps sidecar version, creates `sidecar-v*` tag and release
2. Builds CPU + CUDA variants for Linux/Windows, CPU for macOS
3. Uploads as separate release assets
#### Required Secrets
| Secret | Purpose |
|--------|---------|
| `BUILD_TOKEN` | Gitea API token for creating releases and pushing tags |
### Project Structure
```
src/ # Svelte 5 frontend
lib/components/ # UI components (waveform, transcript editor, settings, etc.)
lib/stores/ # Svelte stores (settings, transcript state)
routes/ # SvelteKit pages
src-tauri/ # Rust backend
src/sidecar/ # Sidecar process manager (download, extract, IPC)
src/commands/ # Tauri command handlers
nsis-hooks.nsh # Windows uninstall cleanup
python/ # Python sidecar
voice_to_notes/ # Python package (transcription, diarization, AI, export)
build_sidecar.py # PyInstaller build script
voice_to_notes.spec # PyInstaller spec
.gitea/workflows/ # CI/CD (release.yml, build-sidecar.yml)
docs/ # Documentation
```
## License
MIT