2026-02-26 08:11:57 -08:00
# Voice to Notes
Update README, add User Guide and Contributing docs
- README: Updated to reflect current architecture (decoupled app/sidecar),
Ollama as local AI, CUDA support, split CI workflows
- USER_GUIDE.md: Complete how-to including first-time setup, transcription
workflow, speaker detection setup, Ollama configuration, export formats,
keyboard shortcuts, and troubleshooting
- CONTRIBUTING.md: Dev setup, project structure, conventions, CI/CD overview
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 12:06:10 -07:00
A desktop application that transcribes audio and video recordings with speaker identification, synchronized playback, and AI-powered analysis. Export to SRT, WebVTT, ASS captions, plain text, or Markdown.
## Features
- **Speech-to-Text** — Accurate transcription via faster-whisper with word-level timestamps. Supports 99 languages.
- **Speaker Identification** — Detect and label speakers using pyannote.audio. Rename speakers for clean exports.
- **GPU Acceleration** — CUDA support for NVIDIA GPUs (Windows/Linux). Falls back to CPU automatically.
- **Synchronized Playback** — Click any word to seek. Waveform visualization via wavesurfer.js.
- **AI Chat** — Ask questions about your transcript. Works with Ollama (local), OpenAI, Anthropic, or any OpenAI-compatible API.
- **Export** — SRT, WebVTT, ASS, plain text, Markdown — all with speaker labels.
- **Cross-Platform** — Linux, Windows, macOS (Apple Silicon).
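
As a rough illustration of what an SRT export with speaker labels involves, here is a minimal pure-Python sketch. The `Segment` class and both function names are hypothetical and are not the app's actual code; the app itself lists pysubs2 for caption export, which this sketch deliberately does not use so that it stays dependency-free.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical transcript segment; not the app's real data model."""
    start: float   # start time in seconds
    end: float     # end time in seconds
    speaker: str   # speaker label, e.g. after renaming
    text: str      # transcribed text


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered SRT cues, prefixing text with the speaker."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.speaker}: {seg.text}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(cues)
```

Calling `to_srt([Segment(0.0, 1.25, "Alice", "Hello")])` yields a single cue whose text line reads `Alice: Hello`.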
## Quick Start
1. Download the installer from [Releases](https://repo.anhonesthost.net/MacroPad/voice-to-notes/releases)
2. On first launch, choose the **CPU** or **CUDA** sidecar (the AI engine downloads separately, ~500MB–2GB)
3. Import an audio/video file and click **Transcribe**
See the full [User Guide](docs/USER_GUIDE.md) for detailed setup and usage instructions.
## Platform Support
| Platform | Architecture | Installers |
|----------|-------------|------------|
| Linux | x86_64 | .deb, .rpm |
| Windows | x86_64 | .msi, .exe (NSIS) |
| macOS | ARM (Apple Silicon) | .dmg |
## Architecture
The app is split into two independently versioned components:
- **App** (v0.2.x) — Tauri desktop shell with Svelte frontend. Small installer (~50MB).
- **Sidecar** (v1.x) — Python ML engine (faster-whisper, pyannote.audio). Downloaded on first launch. CPU (~500MB) or CUDA (~2GB) variants.
This separation means app UI updates don't require re-downloading the sidecar, and sidecar updates don't require reinstalling the app.
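
The choice between the two sidecar variants can be sketched as a small decision function. This is only an illustration under stated assumptions: the function name, the `os_name` strings, and the `has_nvidia_gpu` flag are hypothetical, and the real app lets the user pick the variant on first launch rather than deciding automatically.

```python
def pick_sidecar_variant(os_name: str, has_nvidia_gpu: bool) -> str:
    """Suggest a sidecar variant: "cuda" or "cpu".

    CUDA is suggested only on the platforms this README lists as
    CUDA-capable (Windows/Linux) and only when an NVIDIA GPU is present.
    Everything else, including macOS, gets the CPU variant.
    """
    cuda_platforms = {"linux", "windows"}  # per the Features section
    if has_nvidia_gpu and os_name.lower() in cuda_platforms:
        return "cuda"
    return "cpu"


# Approximate download sizes quoted in this README, in megabytes.
APPROX_SIZE_MB = {"cpu": 500, "cuda": 2000}
```

For example, `pick_sidecar_variant("linux", True)` suggests `"cuda"`, while any macOS machine gets `"cpu"`.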
## Tech Stack
| Component | Technology |
|-----------|-----------|
| Desktop shell | Tauri v2 (Rust + Svelte 5 / TypeScript) |
| Transcription | faster-whisper (CTranslate2) |
| Speaker ID | pyannote.audio 3.1 |
| Audio UI | wavesurfer.js |
| Transcript editor | TipTap (ProseMirror) |
| AI (local) | Ollama (any model) |
| AI (cloud) | OpenAI, Anthropic, OpenAI-compatible |
| Caption export | pysubs2 |
| Database | SQLite (rusqlite) |
## Development
### Prerequisites
- Node.js 20+
- Rust (stable)
- Python 3.11+ with uv or pip
- Linux: `libgtk-3-dev`, `libwebkit2gtk-4.1-dev`, `libappindicator3-dev`, `librsvg2-dev`
### Getting Started
```bash
# Install frontend dependencies
npm install
# Install Python sidecar dependencies
cd python && pip install -e ".[dev]" && cd ..
# Run in dev mode (uses system Python for the sidecar)
npm run tauri:dev
```
### Building
```bash
# Build the frozen Python sidecar (CPU-only)
cd python && python build_sidecar.py --cpu-only && cd ..
# Build with CUDA support
cd python && python build_sidecar.py --with-cuda && cd ..
# Build the Tauri app
npm run tauri build
```
### CI/CD
Two Gitea Actions workflows in `.gitea/workflows/`:
**`release.yml`** triggers on push to main:
1. Bumps app version (patch), creates git tag and Gitea release
2. Builds lightweight app installers for all platforms (no sidecar bundled)
**`build-sidecar.yml`** triggers on changes to `python/` or manual dispatch:
1. Bumps sidecar version, creates `sidecar-v*` tag and release
2. Builds CPU + CUDA variants for Linux/Windows, CPU for macOS
3. Uploads as separate release assets
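
The version-bump and tagging steps in both workflows can be sketched as below. This is a hedged illustration, not the workflows' actual logic: the function names are hypothetical, and it assumes plain `MAJOR.MINOR.PATCH` version strings. Only the `sidecar-v` tag prefix and the patch-level app bump come from the descriptions above.

```python
def bump_patch(version: str) -> str:
    """Increment the patch component of a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


def sidecar_tag(version: str) -> str:
    """Build a sidecar release tag matching the `sidecar-v*` pattern."""
    return f"sidecar-v{version}"
```

Components are compared as integers, so `bump_patch("0.2.9")` gives `0.2.10` rather than rolling over to `0.3.0`.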
#### Required Secrets
| Secret | Purpose |
|--------|---------|
| `BUILD_TOKEN` | Gitea API token for creating releases and pushing tags |
### Project Structure
```
src/                      # Svelte 5 frontend
  lib/components/         # UI components (waveform, transcript editor, settings, etc.)
  lib/stores/             # Svelte stores (settings, transcript state)
  routes/                 # SvelteKit pages
src-tauri/                # Rust backend
  src/sidecar/            # Sidecar process manager (download, extract, IPC)
  src/commands/           # Tauri command handlers
  nsis-hooks.nsh          # Windows uninstall cleanup
python/                   # Python sidecar
  voice_to_notes/         # Python package (transcription, diarization, AI, export)
  build_sidecar.py        # PyInstaller build script
  voice_to_notes.spec     # PyInstaller spec
.gitea/workflows/         # CI/CD (release.yml, build-sidecar.yml)
docs/                     # Documentation
```
## License
MIT