Cross-platform distribution, UI improvements, and performance optimizations

- PyInstaller frozen sidecar: spec file, build script, and ffmpeg path resolver for self-contained distribution without Python prerequisites
- Dual-mode sidecar launcher: frozen binary (production) with dev mode fallback
- Parallel transcription + diarization pipeline (~30-40% faster)
- GPU auto-detection for diarization (CUDA when available)
- Async run_pipeline command for real-time progress event delivery
- Web Audio API backend for instant playback and seeking
- OpenAI-compatible provider replacing LiteLLM client-side routing
- Cross-platform RAM detection (Linux/macOS/Windows)
- Settings: speaker count hint, token reveal toggles, dark dropdown styling
- Loading splash screen, flexbox layout fix for viewport overflow
- Gitea Actions CI/CD pipeline (Linux, Windows, macOS ARM)
- Updated README and CLAUDE.md documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 21:33:43 -07:00
parent 42ccd3e21d
commit 58faa83cb3
27 changed files with 1301 additions and 283 deletions
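
The commit message mentions an ffmpeg path resolver for the frozen sidecar, but the resolver itself does not appear in the README hunk below. The following is only a rough sketch of how such a resolver might look, not the project's actual code. It relies on the `sys.frozen` and `sys._MEIPASS` attributes that PyInstaller sets at runtime inside a bundle. The names `is_frozen`, `resolve_ffmpeg`, and `bundle_dir` are hypothetical, and placing ffmpeg at the bundle root is an assumption made for illustration.

```python
import shutil
import sys
from pathlib import Path


def is_frozen() -> bool:
    """Return True when running inside a PyInstaller bundle."""
    return bool(getattr(sys, "frozen", False)) and hasattr(sys, "_MEIPASS")


def resolve_ffmpeg(bundle_dir=None):
    """Return a path to ffmpeg, preferring a bundled copy.

    bundle_dir is a hypothetical override (useful for tests); when omitted
    and the process is frozen, PyInstaller's extraction dir is used.
    Assumption: the bundled binary sits at the root of that directory.
    Returns None if no ffmpeg can be found.
    """
    exe = "ffmpeg.exe" if sys.platform == "win32" else "ffmpeg"

    if bundle_dir is None and is_frozen():
        bundle_dir = Path(sys._MEIPASS)

    if bundle_dir is not None:
        candidate = Path(bundle_dir) / exe
        if candidate.is_file():
            return str(candidate)

    # Dev-mode fallback: whatever ffmpeg is on the system PATH, if any.
    return shutil.which("ffmpeg")
```

Checking the bundled location first and falling back to `PATH` mirrors the dual-mode idea in the commit message: a frozen build would carry its own ffmpeg, while a dev checkout would use whatever is installed on the machine.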
--- a/README.md
+++ b/README.md
@@ -2,28 +2,90 @@

 A desktop application that transcribes audio/video recordings with speaker identification, producing editable transcriptions with synchronized audio playback.

-## Goals
+## Features

- **Speech-to-Text Transcription** — Accurately convert spoken audio from recordings into text
- **Speaker Identification (Diarization)** — Detect and distinguish between different speakers in a conversation
- **Speaker Naming** — Assign and persist speaker names/IDs across the transcription
- **Synchronized Playback** — Click any transcribed text segment to play back the corresponding audio for review and correction
- **Export Formats**
-  - Closed captioning files (SRT, VTT) for video
-  - Plain text documents with speaker labels
- **AI Integration** — Connect to AI providers to ask questions about the conversation and generate condensed notes/summaries
+- **Speech-to-Text Transcription** — Accurate transcription via faster-whisper (Whisper models) with word-level timestamps
+- **Speaker Identification (Diarization)** — Detect and distinguish between speakers using pyannote.audio
+- **Synchronized Playback** — Click any word to seek to that point in the audio (Web Audio API for instant playback)
+- **AI Integration** — Ask questions about your transcript via OpenAI, Anthropic, or any OpenAI-compatible API (LiteLLM proxies, Ollama, vLLM)
+- **Export Formats** — SRT, WebVTT, ASS captions, plain text, and Markdown with speaker labels
+- **Cross-Platform** — Builds for Linux, Windows, and macOS (Apple Silicon)

 ## Platform Support

-| Platform | Status |
-|----------|--------|
-| Linux    | Planned (initial target) |
-| Windows  | Planned (initial target) |
-| macOS    | Future (pending hardware) |
+| Platform | Architecture | Status |
+|----------|-------------|--------|
+| Linux    | x86_64      | Supported |
+| Windows  | x86_64      | Supported |
+| macOS    | ARM (Apple Silicon) | Supported |

-## Project Status
+## Tech Stack

-**Early planning phase** — Architecture and technology decisions in progress.
+- **Desktop shell:** Tauri v2 (Rust backend + Svelte 5 / TypeScript frontend)
+- **ML pipeline:** Python sidecar (faster-whisper, pyannote.audio) — frozen via PyInstaller for distribution
+- **Audio playback:** wavesurfer.js with Web Audio API backend
+- **AI providers:** OpenAI, Anthropic, OpenAI-compatible endpoints (local or remote)
+- **Local AI:** Bundled llama-server (llama.cpp)
+- **Caption export:** pysubs2
+
+## Development
+
+### Prerequisites
+
+- Node.js 20+
+- Rust (stable)
+- Python 3.11+ with ML dependencies
+- System: `libgtk-3-dev`, `libwebkit2gtk-4.1-dev` (Linux)
+
+### Getting Started
+
+```bash
+# Install frontend dependencies
+npm install
+
+# Install Python sidecar dependencies
+cd python && pip install -e . && cd ..
+
+# Run in dev mode (uses system Python for the sidecar)
+npm run tauri:dev
+```
+
+### Building for Distribution
+
+```bash
+# Build the frozen Python sidecar
+npm run sidecar:build
+
+# Build the Tauri app (requires sidecar in src-tauri/binaries/)
+npm run tauri build
+```
+
+### CI/CD
+
+Gitea Actions workflows are in `.gitea/workflows/`. The build pipeline:
+
+1. **Build sidecar** — PyInstaller-frozen Python binary per platform (CPU-only PyTorch)
+2. **Build Tauri app** — Bundles the sidecar via `externalBin`, produces .deb/.AppImage (Linux), .msi (Windows), .dmg (macOS)
+
+#### Required Secrets
+
+| Secret | Purpose | Required? |
+|--------|---------|-----------|
+| `TAURI_SIGNING_PRIVATE_KEY` | Signs Tauri update bundles | Optional (for auto-updates) |
+
+No other secrets are needed for building. AI provider API keys and HuggingFace tokens are configured by end users in the app's Settings.
+
+### Project Structure
+
+```
+src/                    # Svelte 5 frontend
+src-tauri/              # Rust backend (Tauri commands, sidecar manager, SQLite)
+python/                 # Python sidecar (transcription, diarization, AI)
+  voice_to_notes/       # Python package
+  build_sidecar.py      # PyInstaller build script
+  voice_to_notes.spec   # PyInstaller spec
+.gitea/workflows/       # Gitea Actions CI/CD
+```

 ## License