# Voice to Notes

A desktop application that transcribes audio and video recordings with speaker identification, synchronized playback, and AI-powered analysis. Export to SRT, WebVTT, ASS captions, plain text, or Markdown.

## Features

- **Speech-to-Text** — Accurate transcription via faster-whisper with word-level timestamps. Supports 99 languages.
- **Speaker Identification** — Detect and label speakers using pyannote.audio. Rename speakers for clean exports.
- **GPU Acceleration** — CUDA support for NVIDIA GPUs (Windows/Linux). Falls back to CPU automatically.
- **Synchronized Playback** — Click any word to seek. Waveform visualization via wavesurfer.js.
- **AI Chat** — Ask questions about your transcript. Works with Ollama (local), OpenAI, Anthropic, or any OpenAI-compatible API.
- **Export** — SRT, WebVTT, ASS, plain text, Markdown — all with speaker labels.
- **Cross-Platform** — Linux, Windows, macOS (Apple Silicon).
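
As an illustration of what a speaker-labelled SRT export involves, here is a minimal sketch using only the Python standard library. It is not the project's exporter (the Tech Stack lists pysubs2 for caption export). The `Segment` type, the `srt_timestamp` and `to_srt` helpers, and the `Speaker: text` prefix convention are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical transcript segment; field names are illustrative."""
    start: float   # start time in seconds
    end: float     # end time in seconds
    speaker: str   # display name of the speaker
    text: str      # transcribed text


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as SRT cues, prefixing each line with its speaker."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.speaker}: {seg.text}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(cues)


if __name__ == "__main__":
    print(to_srt([
        Segment(0.0, 1.5, "Alice", "Hi there."),
        Segment(1.5, 3.0, "Bob", "Hello!"),
    ]))
```

Renaming a speaker before export then only means changing the `speaker` value on the affected segments.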

## Quick Start

1. Download the installer from [Releases](https://repo.anhonesthost.net/MacroPad/voice-to-notes/releases)
2. On first launch, choose **CPU** or **CUDA** sidecar (the AI engine downloads separately, ~500MB–2GB)
3. Import an audio/video file and click **Transcribe**

See the full [User Guide](docs/USER_GUIDE.md) for detailed setup and usage instructions.

## Platform Support

| Platform | Architecture | Installers |
|----------|-------------|------------|
| Linux    | x86_64      | .deb, .rpm |
| Windows  | x86_64      | .msi, .exe (NSIS) |
| macOS    | ARM (Apple Silicon) | .dmg |

## Architecture

The app is split into two independently versioned components:

- **App** (v0.2.x) — Tauri desktop shell with Svelte frontend. Small installer (~50MB).
- **Sidecar** (v1.x) — Python ML engine (faster-whisper, pyannote.audio). Downloaded on first launch. CPU (~500MB) or CUDA (~2GB) variants.

This separation means app UI updates don't require re-downloading the sidecar, and sidecar updates don't require reinstalling the app.
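
One way to picture the two independent version streams is to sort release tags by component. The sketch below assumes app releases are tagged `vX.Y.Z` and sidecar releases `sidecar-vX.Y.Z`. Only the `sidecar-v*` prefix appears in this README, so the app tag format and the helper names `parse_tag` and `latest_per_component` are assumptions.

```python
import re

# Assumed tag formats: "v0.2.3" for the app, "sidecar-v1.0.4" for the sidecar.
TAG_RE = re.compile(
    r"^(?P<prefix>sidecar-v|v)(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)$"
)


def parse_tag(tag: str) -> tuple[str, tuple[int, int, int]]:
    """Split a release tag into (component, (major, minor, patch))."""
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"unrecognized release tag: {tag!r}")
    component = "sidecar" if match["prefix"] == "sidecar-v" else "app"
    version = (int(match["major"]), int(match["minor"]), int(match["patch"]))
    return component, version


def latest_per_component(tags: list[str]) -> dict[str, tuple[int, int, int]]:
    """Return the highest version seen for each component, ignoring other tags."""
    latest: dict[str, tuple[int, int, int]] = {}
    for tag in tags:
        try:
            component, version = parse_tag(tag)
        except ValueError:
            continue  # skip tags that do not follow either assumed format
        if component not in latest or version > latest[component]:
            latest[component] = version
    return latest


if __name__ == "__main__":
    print(latest_per_component(["v0.2.1", "v0.2.10", "sidecar-v1.0.4", "nightly"]))
```

Comparing versions as integer tuples avoids the string-ordering trap where `0.2.10` would sort before `0.2.9`.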

## Tech Stack

| Component | Technology |
|-----------|-----------|
| Desktop shell | Tauri v2 (Rust + Svelte 5 / TypeScript) |
| Transcription | faster-whisper (CTranslate2) |
| Speaker ID | pyannote.audio 3.1 |
| Audio UI | wavesurfer.js |
| Transcript editor | TipTap (ProseMirror) |
| AI (local) | Ollama (any model) |
| AI (cloud) | OpenAI, Anthropic, OpenAI-compatible |
| Caption export | pysubs2 |
| Database | SQLite (rusqlite) |

## Development

### Prerequisites

- Node.js 20+
- Rust (stable)
- Python 3.11+ with uv or pip
- Linux: `libgtk-3-dev`, `libwebkit2gtk-4.1-dev`, `libappindicator3-dev`, `librsvg2-dev`

### Getting Started

```bash
# Install frontend dependencies
npm install

# Install Python sidecar dependencies
cd python && pip install -e ".[dev]" && cd ..

# Run in dev mode (uses system Python for the sidecar)
npm run tauri:dev
```

### Building

```bash
# Build the frozen Python sidecar (CPU-only)
cd python && python build_sidecar.py --cpu-only && cd ..

# Build with CUDA support
cd python && python build_sidecar.py --with-cuda && cd ..

# Build the Tauri app
npm run tauri build
```

### CI/CD

Two Gitea Actions workflows live in `.gitea/workflows/`:

**`release.yml`** — Triggers on push to main:
1. Bumps app version (patch), creates git tag and Gitea release
2. Builds lightweight app installers for all platforms (no sidecar bundled)
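
The patch bump in step 1 can be pictured with a small sketch. This is not the workflow's actual script: `bump_patch` is a hypothetical helper, and it assumes plain `MAJOR.MINOR.PATCH` version strings.

```python
def bump_patch(version: str) -> str:
    """Increment the patch component of a MAJOR.MINOR.PATCH version string."""
    parts = version.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    major, minor, patch = (int(part) for part in parts)
    return f"{major}.{minor}.{patch + 1}"


if __name__ == "__main__":
    print(bump_patch("0.2.9"))
```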

**`build-sidecar.yml`** — Triggers on changes to `python/` or manual dispatch:
1. Bumps sidecar version, creates `sidecar-v*` tag and release
2. Builds CPU + CUDA variants for Linux/Windows, CPU for macOS
3. Uploads as separate release assets

#### Required Secrets

| Secret | Purpose |
|--------|---------|
| `BUILD_TOKEN` | Gitea API token for creating releases and pushing tags |

### Project Structure

```
src/                        # Svelte 5 frontend
  lib/components/           # UI components (waveform, transcript editor, settings, etc.)
  lib/stores/               # Svelte stores (settings, transcript state)
  routes/                   # SvelteKit pages
src-tauri/                  # Rust backend
  src/sidecar/              # Sidecar process manager (download, extract, IPC)
  src/commands/             # Tauri command handlers
  nsis-hooks.nsh            # Windows uninstall cleanup
python/                     # Python sidecar
  voice_to_notes/           # Python package (transcription, diarization, AI, export)
  build_sidecar.py          # PyInstaller build script
  voice_to_notes.spec       # PyInstaller spec
.gitea/workflows/           # CI/CD (release.yml, build-sidecar.yml)
docs/                       # Documentation
```

## License

MIT
-												Initial project setup with README and gitignore

Establish the voice-to-notes project with documentation covering
goals, platform targets, and planned feature set.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

											
										
										
											2026-02-26 08:11:57 -08:00
+								# Voice to Notes
-												Update README, add User Guide and Contributing docs

- README: Updated to reflect current architecture (decoupled app/sidecar),
  Ollama as local AI, CUDA support, split CI workflows
- USER_GUIDE.md: Complete how-to including first-time setup, transcription
  workflow, speaker detection setup, Ollama configuration, export formats,
  keyboard shortcuts, and troubleshooting
- CONTRIBUTING.md: Dev setup, project structure, conventions, CI/CD overview

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

											
										
										
											2026-03-22 12:06:10 -07:00
+								A desktop application that transcribes audio and video recordings with speaker identification, synchronized playback, and AI-powered analysis. Export to SRT, WebVTT, ASS captions, plain text, or Markdown.
-												Initial project setup with README and gitignore

Establish the voice-to-notes project with documentation covering
goals, platform targets, and planned feature set.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

											
										
										
											2026-02-26 08:11:57 -08:00
-												Cross-platform distribution, UI improvements, and performance optimizations

- PyInstaller frozen sidecar: spec file, build script, and ffmpeg path resolver
  for self-contained distribution without Python prerequisites
- Dual-mode sidecar launcher: frozen binary (production) with dev mode fallback
- Parallel transcription + diarization pipeline (~30-40% faster)
- GPU auto-detection for diarization (CUDA when available)
- Async run_pipeline command for real-time progress event delivery
- Web Audio API backend for instant playback and seeking
- OpenAI-compatible provider replacing LiteLLM client-side routing
- Cross-platform RAM detection (Linux/macOS/Windows)
- Settings: speaker count hint, token reveal toggles, dark dropdown styling
- Loading splash screen, flexbox layout fix for viewport overflow
- Gitea Actions CI/CD pipeline (Linux, Windows, macOS ARM)
- Updated README and CLAUDE.md documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

											
										
										
											2026-03-20 21:33:43 -07:00
+								## Features
-												Initial project setup with README and gitignore

Establish the voice-to-notes project with documentation covering
goals, platform targets, and planned feature set.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

											
										
										
											2026-02-26 08:11:57 -08:00
-												Update README, add User Guide and Contributing docs

- README: Updated to reflect current architecture (decoupled app/sidecar),
  Ollama as local AI, CUDA support, split CI workflows
- USER_GUIDE.md: Complete how-to including first-time setup, transcription
  workflow, speaker detection setup, Ollama configuration, export formats,
  keyboard shortcuts, and troubleshooting
- CONTRIBUTING.md: Dev setup, project structure, conventions, CI/CD overview

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

											
										
										
											2026-03-22 12:06:10 -07:00
+								- **Speech-to-Text** — Accurate transcription via faster-whisper with word-level timestamps. Supports 99 languages.
 								- **Speaker Identification** — Detect and label speakers using pyannote.audio. Rename speakers for clean exports.
 								- **GPU Acceleration** — CUDA support for NVIDIA GPUs (Windows/Linux). Falls back to CPU automatically.
 								- **Synchronized Playback** — Click any word to seek. Waveform visualization via wavesurfer.js.
 								- **AI Chat** — Ask questions about your transcript. Works with Ollama (local), OpenAI, Anthropic, or any OpenAI-compatible API.
 								- **Export** — SRT, WebVTT, ASS, plain text, Markdown — all with speaker labels.
 								- **Cross-Platform** — Linux, Windows, macOS (Apple Silicon).
 								## Quick Start
 . Download the installer from [Releases](https://repo.anhonesthost.net/MacroPad/voice-to-notes/releases)
 . On first launch, choose **CPU** or **CUDA** sidecar (the AI engine downloads separately, ~500MB–2GB)
 . Import an audio/video file and click **Transcribe**
 								See the full [User Guide](docs/USER_GUIDE.md) for detailed setup and usage instructions.
-												Initial project setup with README and gitignore

Establish the voice-to-notes project with documentation covering
goals, platform targets, and planned feature set.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

											
										
										
											2026-02-26 08:11:57 -08:00
 								## Platform Support
-												Update README, add User Guide and Contributing docs

- README: Updated to reflect current architecture (decoupled app/sidecar),
  Ollama as local AI, CUDA support, split CI workflows
- USER_GUIDE.md: Complete how-to including first-time setup, transcription
  workflow, speaker detection setup, Ollama configuration, export formats,
  keyboard shortcuts, and troubleshooting
- CONTRIBUTING.md: Dev setup, project structure, conventions, CI/CD overview

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

											
										
										
											2026-03-22 12:06:10 -07:00
+								| Platform | Architecture | Installers |
 								|----------|-------------|------------|
 								| Linux    | x86_64      | .deb, .rpm |
 								| Windows  | x86_64      | .msi, .exe (NSIS) |
 								| macOS    | ARM (Apple Silicon) | .dmg |
 								## Architecture
 								The app is split into two independently versioned components:
 								- **App** (v0.2.x) — Tauri desktop shell with Svelte frontend. Small installer (~50MB).
 								- **Sidecar** (v1.x) — Python ML engine (faster-whisper, pyannote.audio). Downloaded on first launch. CPU (~500MB) or CUDA (~2GB) variants.
 								This separation means app UI updates don't require re-downloading the sidecar, and sidecar updates don't require reinstalling the app.
-												Initial project setup with README and gitignore

Establish the voice-to-notes project with documentation covering
goals, platform targets, and planned feature set.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

											
										
										
											2026-02-26 08:11:57 -08:00
-												Cross-platform distribution, UI improvements, and performance optimizations

- PyInstaller frozen sidecar: spec file, build script, and ffmpeg path resolver
  for self-contained distribution without Python prerequisites
- Dual-mode sidecar launcher: frozen binary (production) with dev mode fallback
- Parallel transcription + diarization pipeline (~30-40% faster)
- GPU auto-detection for diarization (CUDA when available)
- Async run_pipeline command for real-time progress event delivery
- Web Audio API backend for instant playback and seeking
- OpenAI-compatible provider replacing LiteLLM client-side routing
- Cross-platform RAM detection (Linux/macOS/Windows)
- Settings: speaker count hint, token reveal toggles, dark dropdown styling
- Loading splash screen, flexbox layout fix for viewport overflow
- Gitea Actions CI/CD pipeline (Linux, Windows, macOS ARM)
- Updated README and CLAUDE.md documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

											
										
										
											2026-03-20 21:33:43 -07:00
+								## Tech Stack
-												Initial project setup with README and gitignore

Establish the voice-to-notes project with documentation covering
goals, platform targets, and planned feature set.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

											
										
										
											2026-02-26 08:11:57 -08:00
-												Update README, add User Guide and Contributing docs

- README: Updated to reflect current architecture (decoupled app/sidecar),
  Ollama as local AI, CUDA support, split CI workflows
- USER_GUIDE.md: Complete how-to including first-time setup, transcription
  workflow, speaker detection setup, Ollama configuration, export formats,
  keyboard shortcuts, and troubleshooting
- CONTRIBUTING.md: Dev setup, project structure, conventions, CI/CD overview

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

											
										
										
											2026-03-22 12:06:10 -07:00
+								| Component | Technology |
 								|-----------|-----------|
 								| Desktop shell | Tauri v2 (Rust + Svelte 5 / TypeScript) |
 								| Transcription | faster-whisper (CTranslate2) |
 								| Speaker ID | pyannote.audio 3.1 |
 								| Audio UI | wavesurfer.js |
 								| Transcript editor | TipTap (ProseMirror) |
 								| AI (local) | Ollama (any model) |
 								| AI (cloud) | OpenAI, Anthropic, OpenAI-compatible |
 								| Caption export | pysubs2 |
 								| Database | SQLite (rusqlite) |
-												Cross-platform distribution, UI improvements, and performance optimizations

- PyInstaller frozen sidecar: spec file, build script, and ffmpeg path resolver
  for self-contained distribution without Python prerequisites
- Dual-mode sidecar launcher: frozen binary (production) with dev mode fallback
- Parallel transcription + diarization pipeline (~30-40% faster)
- GPU auto-detection for diarization (CUDA when available)
- Async run_pipeline command for real-time progress event delivery
- Web Audio API backend for instant playback and seeking
- OpenAI-compatible provider replacing LiteLLM client-side routing
- Cross-platform RAM detection (Linux/macOS/Windows)
- Settings: speaker count hint, token reveal toggles, dark dropdown styling
- Loading splash screen, flexbox layout fix for viewport overflow
- Gitea Actions CI/CD pipeline (Linux, Windows, macOS ARM)
- Updated README and CLAUDE.md documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

											
										
										
											2026-03-20 21:33:43 -07:00
 								## Development
 								### Prerequisites
 								- Node.js 20+
 								- Rust (stable)
-												Update README, add User Guide and Contributing docs

- README: Updated to reflect current architecture (decoupled app/sidecar),
  Ollama as local AI, CUDA support, split CI workflows
- USER_GUIDE.md: Complete how-to including first-time setup, transcription
  workflow, speaker detection setup, Ollama configuration, export formats,
  keyboard shortcuts, and troubleshooting
- CONTRIBUTING.md: Dev setup, project structure, conventions, CI/CD overview

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

											
										
										
											2026-03-22 12:06:10 -07:00
+								- Python 3.11+ with uv or pip
 								- Linux: `libgtk-3-dev`, `libwebkit2gtk-4.1-dev`, `libappindicator3-dev`, `librsvg2-dev`
-												Cross-platform distribution, UI improvements, and performance optimizations

- PyInstaller frozen sidecar: spec file, build script, and ffmpeg path resolver
  for self-contained distribution without Python prerequisites
- Dual-mode sidecar launcher: frozen binary (production) with dev mode fallback
- Parallel transcription + diarization pipeline (~30-40% faster)
- GPU auto-detection for diarization (CUDA when available)
- Async run_pipeline command for real-time progress event delivery
- Web Audio API backend for instant playback and seeking
- OpenAI-compatible provider replacing LiteLLM client-side routing
- Cross-platform RAM detection (Linux/macOS/Windows)
- Settings: speaker count hint, token reveal toggles, dark dropdown styling
- Loading splash screen, flexbox layout fix for viewport overflow
- Gitea Actions CI/CD pipeline (Linux, Windows, macOS ARM)
- Updated README and CLAUDE.md documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

											
										
										
											2026-03-20 21:33:43 -07:00
 								### Getting Started
 								```bash
 								# Install frontend dependencies
 								npm install
 								# Install Python sidecar dependencies
-												Update README, add User Guide and Contributing docs

- README: Updated to reflect current architecture (decoupled app/sidecar),
  Ollama as local AI, CUDA support, split CI workflows
- USER_GUIDE.md: Complete how-to including first-time setup, transcription
  workflow, speaker detection setup, Ollama configuration, export formats,
  keyboard shortcuts, and troubleshooting
- CONTRIBUTING.md: Dev setup, project structure, conventions, CI/CD overview

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

											
										
										
											2026-03-22 12:06:10 -07:00
+								cd python && pip install -e ".[dev]" && cd ..
-												Cross-platform distribution, UI improvements, and performance optimizations

- PyInstaller frozen sidecar: spec file, build script, and ffmpeg path resolver
  for self-contained distribution without Python prerequisites
- Dual-mode sidecar launcher: frozen binary (production) with dev mode fallback
- Parallel transcription + diarization pipeline (~30-40% faster)
- GPU auto-detection for diarization (CUDA when available)
- Async run_pipeline command for real-time progress event delivery
- Web Audio API backend for instant playback and seeking
- OpenAI-compatible provider replacing LiteLLM client-side routing
- Cross-platform RAM detection (Linux/macOS/Windows)
- Settings: speaker count hint, token reveal toggles, dark dropdown styling
- Loading splash screen, flexbox layout fix for viewport overflow
- Gitea Actions CI/CD pipeline (Linux, Windows, macOS ARM)
- Updated README and CLAUDE.md documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

											
										
										
											2026-03-20 21:33:43 -07:00
 								# Run in dev mode (uses system Python for the sidecar)
 								npm run tauri:dev
 								```
-												Update README, add User Guide and Contributing docs

- README: Updated to reflect current architecture (decoupled app/sidecar),
  Ollama as local AI, CUDA support, split CI workflows
- USER_GUIDE.md: Complete how-to including first-time setup, transcription
  workflow, speaker detection setup, Ollama configuration, export formats,
  keyboard shortcuts, and troubleshooting
- CONTRIBUTING.md: Dev setup, project structure, conventions, CI/CD overview

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

											
										
										
											2026-03-22 12:06:10 -07:00
+								### Building
-												Cross-platform distribution, UI improvements, and performance optimizations

- PyInstaller frozen sidecar: spec file, build script, and ffmpeg path resolver
  for self-contained distribution without Python prerequisites
- Dual-mode sidecar launcher: frozen binary (production) with dev mode fallback
- Parallel transcription + diarization pipeline (~30-40% faster)
- GPU auto-detection for diarization (CUDA when available)
- Async run_pipeline command for real-time progress event delivery
- Web Audio API backend for instant playback and seeking
- OpenAI-compatible provider replacing LiteLLM client-side routing
- Cross-platform RAM detection (Linux/macOS/Windows)
- Settings: speaker count hint, token reveal toggles, dark dropdown styling
- Loading splash screen, flexbox layout fix for viewport overflow
- Gitea Actions CI/CD pipeline (Linux, Windows, macOS ARM)
- Updated README and CLAUDE.md documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

											
										
										
											2026-03-20 21:33:43 -07:00
 								```bash
-												Update README, add User Guide and Contributing docs

- README: Updated to reflect current architecture (decoupled app/sidecar),
  Ollama as local AI, CUDA support, split CI workflows
- USER_GUIDE.md: Complete how-to including first-time setup, transcription
  workflow, speaker detection setup, Ollama configuration, export formats,
  keyboard shortcuts, and troubleshooting
- CONTRIBUTING.md: Dev setup, project structure, conventions, CI/CD overview

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

											
										
										
											2026-03-22 12:06:10 -07:00
+								# Build the frozen Python sidecar (CPU-only)
 								cd python && python build_sidecar.py --cpu-only && cd ..
 								# Build with CUDA support
 								cd python && python build_sidecar.py --with-cuda && cd ..
-												Cross-platform distribution, UI improvements, and performance optimizations

- PyInstaller frozen sidecar: spec file, build script, and ffmpeg path resolver
  for self-contained distribution without Python prerequisites
- Dual-mode sidecar launcher: frozen binary (production) with dev mode fallback
- Parallel transcription + diarization pipeline (~30-40% faster)
- GPU auto-detection for diarization (CUDA when available)
- Async run_pipeline command for real-time progress event delivery
- Web Audio API backend for instant playback and seeking
- OpenAI-compatible provider replacing LiteLLM client-side routing
- Cross-platform RAM detection (Linux/macOS/Windows)
- Settings: speaker count hint, token reveal toggles, dark dropdown styling
- Loading splash screen, flexbox layout fix for viewport overflow
- Gitea Actions CI/CD pipeline (Linux, Windows, macOS ARM)
- Updated README and CLAUDE.md documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

											
										
										
											2026-03-20 21:33:43 -07:00
-												Update README, add User Guide and Contributing docs

- README: Updated to reflect current architecture (decoupled app/sidecar),
  Ollama as local AI, CUDA support, split CI workflows
- USER_GUIDE.md: Complete how-to including first-time setup, transcription
  workflow, speaker detection setup, Ollama configuration, export formats,
  keyboard shortcuts, and troubleshooting
- CONTRIBUTING.md: Dev setup, project structure, conventions, CI/CD overview

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

											
										
										
											2026-03-22 12:06:10 -07:00
+								# Build the Tauri app
-												Cross-platform distribution, UI improvements, and performance optimizations

- PyInstaller frozen sidecar: spec file, build script, and ffmpeg path resolver
  for self-contained distribution without Python prerequisites
- Dual-mode sidecar launcher: frozen binary (production) with dev mode fallback
- Parallel transcription + diarization pipeline (~30-40% faster)
- GPU auto-detection for diarization (CUDA when available)
- Async run_pipeline command for real-time progress event delivery
- Web Audio API backend for instant playback and seeking
- OpenAI-compatible provider replacing LiteLLM client-side routing
- Cross-platform RAM detection (Linux/macOS/Windows)
- Settings: speaker count hint, token reveal toggles, dark dropdown styling
- Loading splash screen, flexbox layout fix for viewport overflow
- Gitea Actions CI/CD pipeline (Linux, Windows, macOS ARM)
- Updated README and CLAUDE.md documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

											
										
										
											2026-03-20 21:33:43 -07:00
+								npm run tauri build
 								```
 								### CI/CD
-												Update README, add User Guide and Contributing docs

- README: Updated to reflect current architecture (decoupled app/sidecar),
  Ollama as local AI, CUDA support, split CI workflows
- USER_GUIDE.md: Complete how-to including first-time setup, transcription
  workflow, speaker detection setup, Ollama configuration, export formats,
  keyboard shortcuts, and troubleshooting
- CONTRIBUTING.md: Dev setup, project structure, conventions, CI/CD overview

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

											
										
										
											2026-03-22 12:06:10 -07:00
+								Two Gitea Actions workflows in `.gitea/workflows/`:
-												Cross-platform distribution, UI improvements, and performance optimizations

- PyInstaller frozen sidecar: spec file, build script, and ffmpeg path resolver
  for self-contained distribution without Python prerequisites
- Dual-mode sidecar launcher: frozen binary (production) with dev mode fallback
- Parallel transcription + diarization pipeline (~30-40% faster)
- GPU auto-detection for diarization (CUDA when available)
- Async run_pipeline command for real-time progress event delivery
- Web Audio API backend for instant playback and seeking
- OpenAI-compatible provider replacing LiteLLM client-side routing
- Cross-platform RAM detection (Linux/macOS/Windows)
- Settings: speaker count hint, token reveal toggles, dark dropdown styling
- Loading splash screen, flexbox layout fix for viewport overflow
- Gitea Actions CI/CD pipeline (Linux, Windows, macOS ARM)
- Updated README and CLAUDE.md documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

											
										
										
											2026-03-20 21:33:43 -07:00
-												Update README, add User Guide and Contributing docs

- README: Updated to reflect current architecture (decoupled app/sidecar),
  Ollama as local AI, CUDA support, split CI workflows
- USER_GUIDE.md: Complete how-to including first-time setup, transcription
  workflow, speaker detection setup, Ollama configuration, export formats,
  keyboard shortcuts, and troubleshooting
- CONTRIBUTING.md: Dev setup, project structure, conventions, CI/CD overview

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

											
										
										
											2026-03-22 12:06:10 -07:00
+								**`release.yml`** — Triggers on push to main:
 . Bumps app version (patch), creates git tag and Gitea release
 . Builds lightweight app installers for all platforms (no sidecar bundled)
-												Cross-platform distribution, UI improvements, and performance optimizations

- PyInstaller frozen sidecar: spec file, build script, and ffmpeg path resolver
  for self-contained distribution without Python prerequisites
- Dual-mode sidecar launcher: frozen binary (production) with dev mode fallback
- Parallel transcription + diarization pipeline (~30-40% faster)
- GPU auto-detection for diarization (CUDA when available)
- Async run_pipeline command for real-time progress event delivery
- Web Audio API backend for instant playback and seeking
- OpenAI-compatible provider replacing LiteLLM client-side routing
- Cross-platform RAM detection (Linux/macOS/Windows)
- Settings: speaker count hint, token reveal toggles, dark dropdown styling
- Loading splash screen, flexbox layout fix for viewport overflow
- Gitea Actions CI/CD pipeline (Linux, Windows, macOS ARM)
- Updated README and CLAUDE.md documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

											
										
										
											2026-03-20 21:33:43 -07:00
-												Update README, add User Guide and Contributing docs

- README: Updated to reflect current architecture (decoupled app/sidecar),
  Ollama as local AI, CUDA support, split CI workflows
- USER_GUIDE.md: Complete how-to including first-time setup, transcription
  workflow, speaker detection setup, Ollama configuration, export formats,
  keyboard shortcuts, and troubleshooting
- CONTRIBUTING.md: Dev setup, project structure, conventions, CI/CD overview

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

											
										
										
											2026-03-22 12:06:10 -07:00
+								**`build-sidecar.yml`** — Triggers on changes to `python/` or manual dispatch:
 . Bumps sidecar version, creates `sidecar-v*` tag and release
 . Builds CPU + CUDA variants for Linux/Windows, CPU for macOS
 . Uploads as separate release assets
-												Cross-platform distribution, UI improvements, and performance optimizations

- PyInstaller frozen sidecar: spec file, build script, and ffmpeg path resolver
  for self-contained distribution without Python prerequisites
- Dual-mode sidecar launcher: frozen binary (production) with dev mode fallback
- Parallel transcription + diarization pipeline (~30-40% faster)
- GPU auto-detection for diarization (CUDA when available)
- Async run_pipeline command for real-time progress event delivery
- Web Audio API backend for instant playback and seeking
- OpenAI-compatible provider replacing LiteLLM client-side routing
- Cross-platform RAM detection (Linux/macOS/Windows)
- Settings: speaker count hint, token reveal toggles, dark dropdown styling
- Loading splash screen, flexbox layout fix for viewport overflow
- Gitea Actions CI/CD pipeline (Linux, Windows, macOS ARM)
- Updated README and CLAUDE.md documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

											
										
										
											2026-03-20 21:33:43 -07:00
-												Update README, add User Guide and Contributing docs

- README: Updated to reflect current architecture (decoupled app/sidecar),
  Ollama as local AI, CUDA support, split CI workflows
- USER_GUIDE.md: Complete how-to including first-time setup, transcription
  workflow, speaker detection setup, Ollama configuration, export formats,
  keyboard shortcuts, and troubleshooting
- CONTRIBUTING.md: Dev setup, project structure, conventions, CI/CD overview

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

											
										
										
											2026-03-22 12:06:10 -07:00
+								#### Required Secrets
-												Cross-platform distribution, UI improvements, and performance optimizations

- PyInstaller frozen sidecar: spec file, build script, and ffmpeg path resolver
  for self-contained distribution without Python prerequisites
- Dual-mode sidecar launcher: frozen binary (production) with dev mode fallback
- Parallel transcription + diarization pipeline (~30-40% faster)
- GPU auto-detection for diarization (CUDA when available)
- Async run_pipeline command for real-time progress event delivery
- Web Audio API backend for instant playback and seeking
- OpenAI-compatible provider replacing LiteLLM client-side routing
- Cross-platform RAM detection (Linux/macOS/Windows)
- Settings: speaker count hint, token reveal toggles, dark dropdown styling
- Loading splash screen, flexbox layout fix for viewport overflow
- Gitea Actions CI/CD pipeline (Linux, Windows, macOS ARM)
- Updated README and CLAUDE.md documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

											
										
										
											2026-03-20 21:33:43 -07:00
-												Update README, add User Guide and Contributing docs

- README: Updated to reflect current architecture (decoupled app/sidecar),
  Ollama as local AI, CUDA support, split CI workflows
- USER_GUIDE.md: Complete how-to including first-time setup, transcription
  workflow, speaker detection setup, Ollama configuration, export formats,
  keyboard shortcuts, and troubleshooting
- CONTRIBUTING.md: Dev setup, project structure, conventions, CI/CD overview

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

											
										
										
											2026-03-22 12:06:10 -07:00
+								| Secret | Purpose |
 								|--------|---------|
 								| `BUILD_TOKEN` | Gitea API token for creating releases and pushing tags |

### Project Structure
```
src/                        # Svelte 5 frontend
  lib/components/           # UI components (waveform, transcript editor, settings, etc.)
  lib/stores/               # Svelte stores (settings, transcript state)
  routes/                   # SvelteKit pages
src-tauri/                  # Rust backend
  src/sidecar/              # Sidecar process manager (download, extract, IPC)
  src/commands/             # Tauri command handlers
  nsis-hooks.nsh            # Windows uninstall cleanup
python/                     # Python sidecar
  voice_to_notes/           # Python package (transcription, diarization, AI, export)
  build_sidecar.py          # PyInstaller build script
  voice_to_notes.spec       # PyInstaller spec
.gitea/workflows/           # CI/CD (release.yml, build-sidecar.yml)
docs/                       # Documentation
```

## License

MIT