Update README, add User Guide and Contributing docs
- README: Updated to reflect current architecture (decoupled app/sidecar), Ollama as local AI, CUDA support, split CI workflows
- USER_GUIDE.md: Complete how-to including first-time setup, transcription workflow, speaker detection setup, Ollama configuration, export formats, keyboard shortcuts, and troubleshooting
- CONTRIBUTING.md: Dev setup, project structure, conventions, CI/CD overview

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CONTRIBUTING.md (new file, 140 lines)
@@ -0,0 +1,140 @@
# Contributing to Voice to Notes

Thank you for your interest in contributing! This guide covers how to set up the project for development and submit changes.

## Development Setup

### Prerequisites

- **Node.js 20+** and npm
- **Rust** (stable toolchain)
- **Python 3.11+** with [uv](https://docs.astral.sh/uv/) (recommended) or pip
- **System libraries (Linux only):**

  ```bash
  sudo apt install libgtk-3-dev libwebkit2gtk-4.1-dev libappindicator3-dev librsvg2-dev patchelf xdg-utils
  ```

### Clone and Install

```bash
git clone https://repo.anhonesthost.net/MacroPad/voice-to-notes.git
cd voice-to-notes

# Frontend
npm install

# Python sidecar
cd python && pip install -e ".[dev]" && cd ..
```

### Running in Dev Mode

```bash
npm run tauri:dev
```

This runs the Svelte dev server + Tauri with hot-reload. The Python sidecar runs from your system Python (no PyInstaller needed in dev mode).
### Building

```bash
# Build the Python sidecar (frozen binary)
cd python && python build_sidecar.py --cpu-only && cd ..

# Build the full app
npm run tauri build
```

## Project Structure

```
src/                  # Svelte 5 frontend
  lib/components/     # Reusable UI components
  lib/stores/         # Svelte stores (app state)
  routes/             # SvelteKit pages
src-tauri/            # Rust backend (Tauri v2)
  src/sidecar/        # Python sidecar lifecycle (download, extract, IPC)
  src/commands/       # Tauri command handlers
  src/db/             # SQLite database layer
python/               # Python ML sidecar
  voice_to_notes/     # Main package
    services/         # Transcription, diarization, AI, export
    ipc/              # JSON-line IPC protocol
    hardware/         # GPU/CPU detection
.gitea/workflows/     # CI/CD pipelines
docs/                 # Documentation
```

## How It Works

The app has three layers:

1. **Frontend (Svelte)** — UI, audio playback (wavesurfer.js), transcript editing (TipTap)
2. **Backend (Rust/Tauri)** — Desktop integration, file access, SQLite, sidecar process management
3. **Sidecar (Python)** — ML inference (faster-whisper, pyannote.audio), AI chat, export

Rust and Python communicate via **JSON-line IPC** over stdin/stdout pipes. Each request has an `id`, `type`, and `payload`. The Python sidecar runs as a child process managed by `SidecarManager` in Rust.
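As a sketch of the wire format (the `transcribe` message type and payload fields here are hypothetical examples, not the project's actual schema), each message is one newline-terminated JSON object:

```python
import json

def encode_request(req_id: str, msg_type: str, payload: dict) -> bytes:
    """Serialize one IPC request as a single newline-terminated JSON line."""
    msg = {"id": req_id, "type": msg_type, "payload": payload}
    return (json.dumps(msg) + "\n").encode("utf-8")

def decode_message(line: bytes) -> dict:
    """Parse one newline-delimited JSON message and check the envelope."""
    msg = json.loads(line.decode("utf-8"))
    for key in ("id", "type", "payload"):
        if key not in msg:
            raise ValueError(f"malformed IPC message: missing {key!r}")
    return msg

# A request survives a round-trip unchanged:
wire = encode_request("42", "transcribe", {"path": "audio.wav"})
assert decode_message(wire)["type"] == "transcribe"
```

Newline framing keeps the protocol easy to read from a pipe: each side reads one line at a time and never needs a length prefix.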
## Conventions

### Rust

- Follow standard Rust conventions
- Run `cargo fmt` and `cargo clippy` before committing
- Tauri commands go in `src-tauri/src/commands/`

### Python

- Python 3.11+, type hints everywhere
- Use `ruff` for linting: `ruff check python/`
- Tests with pytest: `cd python && pytest`
- IPC messages: JSON-line format with `id`, `type`, `payload` fields

### TypeScript / Svelte

- Svelte 5 runes (`$state`, `$derived`, `$effect`)
- Strict TypeScript
- Components in `src/lib/components/`
- State in `src/lib/stores/`

### General

- All timestamps in milliseconds (integer)
- UUIDs as primary keys in the database
- Don't bundle API keys or secrets — those are user-configured
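To illustrate the timestamp and key conventions (a sketch only; the field names and the real schema in `src-tauri/src/db/` may differ), a new segment record would look like this:

```python
import uuid

def to_millis(seconds: float) -> int:
    """Convert a float seconds timestamp (e.g. from Whisper) to integer milliseconds."""
    return round(seconds * 1000)

def new_segment_row(start_s: float, end_s: float, text: str) -> dict:
    """Build a record following the conventions: UUID primary key, integer-ms times."""
    return {
        "id": str(uuid.uuid4()),   # UUID primary key
        "start_ms": to_millis(start_s),
        "end_ms": to_millis(end_s),
        "text": text,
    }

row = new_segment_row(1.234, 2.5, "hello")
assert row["start_ms"] == 1234 and row["end_ms"] == 2500
```

Storing integer milliseconds everywhere avoids float-precision drift when timestamps cross the Rust/Python/TypeScript boundary.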
## Submitting Changes

1. Fork the repository
2. Create a feature branch: `git checkout -b my-feature`
3. Make your changes
4. Test locally with `npm run tauri:dev`
5. Run linters: `cargo fmt && cargo clippy`, `ruff check python/`
6. Commit with a clear message describing the change
7. Open a Pull Request against `main`

## CI/CD

Pushes to `main` automatically:

- Bump the app version and create a release (`release.yml`)
- Build app installers for all platforms

Changes to `python/` also trigger sidecar builds (`build-sidecar.yml`).

## Areas for Contribution

- UI/UX improvements
- New export formats
- Additional AI provider integrations
- Performance optimizations
- Accessibility improvements
- Documentation and translations
- Bug reports and testing on different platforms

## Reporting Issues

Open an issue on the [repository](https://repo.anhonesthost.net/MacroPad/voice-to-notes/issues) with:

- Steps to reproduce
- Expected vs actual behavior
- Platform and version info
- Sidecar logs (`%LOCALAPPDATA%\com.voicetonotes.app\sidecar.log` on Windows)

## License

By contributing, you agree that your contributions will be licensed under the [MIT License](LICENSE).
README.md (modified, 117 lines)
@@ -1,32 +1,55 @@
 # Voice to Notes
 
-A desktop application that transcribes audio/video recordings with speaker identification, producing editable transcriptions with synchronized audio playback.
+A desktop application that transcribes audio and video recordings with speaker identification, synchronized playback, and AI-powered analysis. Export to SRT, WebVTT, ASS captions, plain text, or Markdown.
 
 ## Features
 
-- **Speech-to-Text Transcription** — Accurate transcription via faster-whisper (Whisper models) with word-level timestamps
-- **Speaker Identification (Diarization)** — Detect and distinguish between speakers using pyannote.audio
-- **Synchronized Playback** — Click any word to seek to that point in the audio (Web Audio API for instant playback)
-- **AI Integration** — Ask questions about your transcript via OpenAI, Anthropic, or any OpenAI-compatible API (LiteLLM proxies, Ollama, vLLM)
-- **Export Formats** — SRT, WebVTT, ASS captions, plain text, and Markdown with speaker labels
-- **Cross-Platform** — Builds for Linux, Windows, and macOS (Apple Silicon)
+- **Speech-to-Text** — Accurate transcription via faster-whisper with word-level timestamps. Supports 99 languages.
+- **Speaker Identification** — Detect and label speakers using pyannote.audio. Rename speakers for clean exports.
+- **GPU Acceleration** — CUDA support for NVIDIA GPUs (Windows/Linux). Falls back to CPU automatically.
+- **Synchronized Playback** — Click any word to seek. Waveform visualization via wavesurfer.js.
+- **AI Chat** — Ask questions about your transcript. Works with Ollama (local), OpenAI, Anthropic, or any OpenAI-compatible API.
+- **Export** — SRT, WebVTT, ASS, plain text, Markdown — all with speaker labels.
+- **Cross-Platform** — Linux, Windows, macOS (Apple Silicon).
+
+## Quick Start
+
+1. Download the installer from [Releases](https://repo.anhonesthost.net/MacroPad/voice-to-notes/releases)
+2. On first launch, choose **CPU** or **CUDA** sidecar (the AI engine downloads separately, ~500MB–2GB)
+3. Import an audio/video file and click **Transcribe**
+
+See the full [User Guide](docs/USER_GUIDE.md) for detailed setup and usage instructions.
 
 ## Platform Support
 
-| Platform | Architecture | Status |
-|----------|-------------|--------|
-| Linux | x86_64 | Supported |
-| Windows | x86_64 | Supported |
-| macOS | ARM (Apple Silicon) | Supported |
+| Platform | Architecture | Installers |
+|----------|-------------|------------|
+| Linux | x86_64 | .deb, .rpm |
+| Windows | x86_64 | .msi, .exe (NSIS) |
+| macOS | ARM (Apple Silicon) | .dmg |
+
+## Architecture
+
+The app is split into two independently versioned components:
+
+- **App** (v0.2.x) — Tauri desktop shell with Svelte frontend. Small installer (~50MB).
+- **Sidecar** (v1.x) — Python ML engine (faster-whisper, pyannote.audio). Downloaded on first launch. CPU (~500MB) or CUDA (~2GB) variants.
+
+This separation means app UI updates don't require re-downloading the sidecar, and sidecar updates don't require reinstalling the app.
 
 ## Tech Stack
 
-- **Desktop shell:** Tauri v2 (Rust backend + Svelte 5 / TypeScript frontend)
-- **ML pipeline:** Python sidecar (faster-whisper, pyannote.audio) — frozen via PyInstaller for distribution
-- **Audio playback:** wavesurfer.js with Web Audio API backend
-- **AI providers:** OpenAI, Anthropic, OpenAI-compatible endpoints (local or remote)
-- **Local AI:** Bundled llama-server (llama.cpp)
-- **Caption export:** pysubs2
+| Component | Technology |
+|-----------|-----------|
+| Desktop shell | Tauri v2 (Rust + Svelte 5 / TypeScript) |
+| Transcription | faster-whisper (CTranslate2) |
+| Speaker ID | pyannote.audio 3.1 |
+| Audio UI | wavesurfer.js |
+| Transcript editor | TipTap (ProseMirror) |
+| AI (local) | Ollama (any model) |
+| AI (cloud) | OpenAI, Anthropic, OpenAI-compatible |
+| Caption export | pysubs2 |
+| Database | SQLite (rusqlite) |
 
 ## Development
 
@@ -34,8 +57,8 @@ A desktop application that transcribes audio/video recordings with speaker ident
 
 - Node.js 20+
 - Rust (stable)
-- Python 3.11+ with ML dependencies
-- System: `libgtk-3-dev`, `libwebkit2gtk-4.1-dev` (Linux)
+- Python 3.11+ with uv or pip
+- Linux: `libgtk-3-dev`, `libwebkit2gtk-4.1-dev`, `libappindicator3-dev`, `librsvg2-dev`
 
 ### Getting Started
 
@@ -44,47 +67,61 @@ A desktop application that transcribes audio/video recordings with speaker ident
 npm install
 
 # Install Python sidecar dependencies
-cd python && pip install -e . && cd ..
+cd python && pip install -e ".[dev]" && cd ..
 
 # Run in dev mode (uses system Python for the sidecar)
 npm run tauri:dev
 ```
 
-### Building for Distribution
+### Building
 
 ```bash
-# Build the frozen Python sidecar
-npm run sidecar:build
+# Build the frozen Python sidecar (CPU-only)
+cd python && python build_sidecar.py --cpu-only && cd ..
+
+# Build with CUDA support
+cd python && python build_sidecar.py --with-cuda && cd ..
 
-# Build the Tauri app (requires sidecar in src-tauri/binaries/)
+# Build the Tauri app
 npm run tauri build
 ```
 
 ### CI/CD
 
-Gitea Actions workflows are in `.gitea/workflows/`. The build pipeline:
+Two Gitea Actions workflows in `.gitea/workflows/`:
 
-1. **Build sidecar** — PyInstaller-frozen Python binary per platform (CPU-only PyTorch)
-2. **Build Tauri app** — Bundles the sidecar via `externalBin`, produces .deb/.AppImage (Linux), .msi (Windows), .dmg (macOS)
+**`release.yml`** — Triggers on push to main:
+
+1. Bumps app version (patch), creates git tag and Gitea release
+2. Builds lightweight app installers for all platforms (no sidecar bundled)
+
+**`build-sidecar.yml`** — Triggers on changes to `python/` or manual dispatch:
+
+1. Bumps sidecar version, creates `sidecar-v*` tag and release
+2. Builds CPU + CUDA variants for Linux/Windows, CPU for macOS
+3. Uploads as separate release assets
 
 #### Required Secrets
 
-| Secret | Purpose | Required? |
-|--------|---------|-----------|
-| `TAURI_SIGNING_PRIVATE_KEY` | Signs Tauri update bundles | Optional (for auto-updates) |
+| Secret | Purpose |
+|--------|---------|
+| `BUILD_TOKEN` | Gitea API token for creating releases and pushing tags |
+
+No other secrets are needed for building. AI provider API keys and HuggingFace tokens are configured by end users in the app's Settings.
 
 ### Project Structure
 
 ```
 src/                  # Svelte 5 frontend
-src-tauri/            # Rust backend (Tauri commands, sidecar manager, SQLite)
-python/               # Python sidecar (transcription, diarization, AI)
-  voice_to_notes/     # Python package
-  build_sidecar.py    # PyInstaller build script
-  voice_to_notes.spec # PyInstaller spec
-.gitea/workflows/     # Gitea Actions CI/CD
+  lib/components/     # UI components (waveform, transcript editor, settings, etc.)
+  lib/stores/         # Svelte stores (settings, transcript state)
+  routes/             # SvelteKit pages
+src-tauri/            # Rust backend
+  src/sidecar/        # Sidecar process manager (download, extract, IPC)
+  src/commands/       # Tauri command handlers
+  nsis-hooks.nsh      # Windows uninstall cleanup
+python/               # Python sidecar
+  voice_to_notes/     # Python package (transcription, diarization, AI, export)
+  build_sidecar.py    # PyInstaller build script
+  voice_to_notes.spec # PyInstaller spec
+.gitea/workflows/     # CI/CD (release.yml, build-sidecar.yml)
+docs/                 # Documentation
 ```
 
 ## License
docs/USER_GUIDE.md (new file, 203 lines)
@@ -0,0 +1,203 @@
# Voice to Notes — User Guide

## Getting Started

### Installation

Download the installer for your platform from the [Releases](https://repo.anhonesthost.net/MacroPad/voice-to-notes/releases) page:

- **Windows:** `.msi` or `-setup.exe`
- **Linux:** `.deb` or `.rpm`
- **macOS:** `.dmg`

### First-Time Setup

On first launch, Voice to Notes will prompt you to download its AI engine (the "sidecar"):

1. Choose **Standard (CPU)** (~500 MB) or **GPU Accelerated (CUDA)** (~2 GB)
   - Choose CUDA if you have an NVIDIA GPU for significantly faster transcription
   - CPU works on all computers
2. Click **Download & Install** and wait for the download to complete
3. The app will proceed to the main interface once the sidecar is ready

The sidecar only needs to be downloaded once. Updates are detected automatically on launch.

---

## Basic Workflow

### 1. Import Audio

- Click **Import Audio** or press **Ctrl+O** (Cmd+O on Mac)
- Supported formats: MP3, WAV, FLAC, OGG, M4A, AAC, WMA, MP4, MKV, AVI, MOV, WebM

### 2. Transcribe

After importing, click **Transcribe** to start the transcription pipeline:

- **Transcription:** Converts speech to text with word-level timestamps
- **Speaker Detection:** Identifies different speakers (if configured — see [Speaker Detection](#speaker-detection))
- A progress bar shows the current stage and percentage

### 3. Review and Edit

- The **waveform** displays at the top — click anywhere to seek
- The **transcript** shows below with speaker labels and timestamps
- **Click any word** in the transcript to jump to that point in the audio
- The current word highlights during playback
- **Edit text** directly in the transcript — word timings are preserved
### 4. Export

Click **Export** and choose a format:

| Format | Extension | Best For |
|--------|-----------|----------|
| SRT | `.srt` | Video subtitles (most compatible) |
| WebVTT | `.vtt` | Web video players, HTML5 |
| ASS/SSA | `.ass` | Styled subtitles with speaker colors |
| Plain Text | `.txt` | Reading, sharing, pasting |
| Markdown | `.md` | Documentation, notes |

All formats include speaker labels when speaker detection is enabled.
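To show what a labeled cue looks like, here is a minimal sketch of the SRT format (the app exports via pysubs2; prefixing the speaker name to the cue text is an illustrative assumption):

```python
def srt_timestamp(ms: int) -> str:
    """Format integer milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, millis = divmod(rem, 1_000)
    return f"{h:02}:{m:02}:{s:02},{millis:03}"

def srt_cue(index: int, start_ms: int, end_ms: int, speaker: str, text: str) -> str:
    """One SRT cue block with the speaker label prefixed to the text."""
    return (
        f"{index}\n"
        f"{srt_timestamp(start_ms)} --> {srt_timestamp(end_ms)}\n"
        f"{speaker}: {text}\n"
    )

print(srt_cue(1, 0, 2500, "Alice", "Welcome, everyone."))
# 1
# 00:00:00,000 --> 00:00:02,500
# Alice: Welcome, everyone.
```

Note the SRT quirk: a comma (not a period) separates seconds from milliseconds.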
### 5. Save Project

- **Ctrl+S** (Cmd+S) saves the current project as a `.vtn` file
- This preserves the full transcript, speaker assignments, and edits
- Reopen later to continue editing or re-export

---

## Playback Controls

| Action | Shortcut |
|--------|----------|
| Play / Pause | **Space** |
| Skip back 5s | **Left Arrow** |
| Skip forward 5s | **Right Arrow** |
| Seek to word | Click any word in the transcript |
| Import audio | **Ctrl+O** / **Cmd+O** |
| Open settings | **Ctrl+,** / **Cmd+,** |

---

## Speaker Detection

Speaker detection (diarization) identifies who is speaking at each point in the audio. It requires a one-time setup:

### Setup

1. Go to **Settings > Speakers**
2. Create a free account at [huggingface.co](https://huggingface.co/join)
3. Accept the license on **all three** model pages:
   - [pyannote/speaker-diarization-3.1](https://huggingface.co/pyannote/speaker-diarization-3.1)
   - [pyannote/segmentation-3.0](https://huggingface.co/pyannote/segmentation-3.0)
   - [pyannote/speaker-diarization-community-1](https://huggingface.co/pyannote/speaker-diarization-community-1)
4. Create a token at [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens) (read access is sufficient)
5. Paste the token in Settings and click **Test & Download Model**

### Speaker Options

- **Number of speakers:** Set to auto-detect or specify a fixed number for faster results
- **Skip speaker detection:** Check this to only transcribe without identifying speakers

### Managing Speakers

After transcription, speakers appear as "Speaker 1", "Speaker 2", etc. in the left sidebar. Double-click a speaker name to rename it — the new name appears throughout the transcript and in exports.

---

## AI Chat

The AI chat panel lets you ask questions about your transcript. The AI sees the full transcript with speaker labels as context.

Example prompts:

- "Summarize this conversation"
- "What were the key action items?"
- "What did Speaker 1 say about the budget?"

### Setting Up Ollama (Local AI)

[Ollama](https://ollama.com) runs AI models locally on your computer — no API keys or internet required.

1. **Install Ollama:**
   - Download from [ollama.com](https://ollama.com)
   - Or on Linux: `curl -fsSL https://ollama.com/install.sh | sh`

2. **Pull a model:**

   ```bash
   ollama pull llama3.2
   ```

   Other good options: `mistral`, `gemma2`, `phi3`

3. **Configure in Voice to Notes:**
   - Go to **Settings > AI Provider**
   - Select **Ollama**
   - URL: `http://localhost:11434` (default, usually no change needed)
   - Model: `llama3.2` (or whichever model you pulled)

4. **Use:** Open the AI chat panel (right sidebar) and start asking questions
### Cloud AI Providers

If you prefer cloud-based AI:

**OpenAI:**

- Select **OpenAI** in Settings > AI Provider
- Enter your API key from [platform.openai.com/api-keys](https://platform.openai.com/api-keys)
- Default model: `gpt-4o-mini`

**Anthropic:**

- Select **Anthropic** in Settings > AI Provider
- Enter your API key from [console.anthropic.com](https://console.anthropic.com)
- Default model: `claude-sonnet-4-6`

**OpenAI Compatible:**

- For any provider with an OpenAI-compatible API (vLLM, LiteLLM, etc.)
- Enter the API base URL, key, and model name

---

## Settings Reference

### Transcription

| Setting | Options | Default |
|---------|---------|---------|
| Whisper Model | tiny, base, small, medium, large-v3 | base |
| Device | CPU, CUDA | CPU |
| Language | Auto-detect, or specify (en, es, fr, etc.) | Auto-detect |

**Model recommendations:**

- **tiny/base:** Fast, good for clear audio with one speaker
- **small:** Best balance of speed and accuracy
- **medium:** Better accuracy, noticeably slower
- **large-v3:** Best accuracy, requires 8GB+ VRAM (GPU) or 16GB+ RAM (CPU)

### Debug

- **Enable Developer Tools:** Opens the browser inspector for debugging

---

## Troubleshooting

### Transcription is slow

- Use a smaller model (tiny or base)
- If you have an NVIDIA GPU, select CUDA in Settings > Transcription > Device
- Ensure you downloaded the CUDA sidecar during setup

### Speaker detection not working

- Verify your HuggingFace token in Settings > Speakers
- Click "Test & Download Model" to re-download
- Make sure you accepted the license on all three model pages

### Audio won't play / No waveform

- Check that the audio file still exists at its original location
- Try re-importing the file
- Supported formats: MP3, WAV, FLAC, OGG, M4A, AAC, WMA

### App shows "Setting up Voice to Notes"

- This is the first-launch sidecar download — it only happens once
- If it fails, check your internet connection and click Retry