Update README and CLAUDE.md for Tauri rewrite

Update both docs to reflect the new architecture:
- Tauri v2 + Svelte 5 frontend replacing PySide6/Qt
- Headless Python backend with FastAPI control API
- Cross-platform support (Windows, macOS, Linux)
- Deepgram remote transcription (managed/BYOK)
- Gitea CI/CD workflows for automated builds
- New project structure with backend/, src/, src-tauri/
- Updated development commands and build instructions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 13:34:10 -07:00
parent 25d2a55efb
commit 47ca74e75d
2 changed files with 342 additions and 295 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -4,52 +4,108 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 ## Project Overview
-Local Transcription is a desktop application for real-time speech-to-text transcription designed for streamers. It uses Whisper models (via faster-whisper) to transcribe audio locally with optional multi-user server synchronization.
+Local Transcription is a cross-platform desktop application for real-time speech-to-text transcription designed for streamers. It supports local Whisper models and cloud-based Deepgram transcription, with OBS browser source integration and optional multi-user sync.
 **Architecture:** Two-process model — a Tauri v2 shell (Svelte 5 frontend) communicates with a headless Python backend (sidecar) via REST API and WebSocket.
 **Key Features:**
- Standalone desktop GUI (PySide6/Qt)
+- Cross-platform desktop app (Windows, macOS, Linux) via Tauri v2 + Svelte 5
- Local transcription with CPU/GPU support
+- Headless Python backend with FastAPI control API
- Built-in web server for OBS browser source integration
+- Dual transcription modes: local Whisper or cloud Deepgram (managed/BYOK)
- Optional Node.js-based multi-user server for syncing transcriptions across users
+- Built-in web server for OBS browser source at `http://localhost:8080`
- Noise suppression and Voice Activity Detection (VAD)
+- Optional multi-user sync via Node.js server
- Cross-platform builds (Linux/Windows) with PyInstaller
+- CUDA, MPS (Apple Silicon), and CPU support
 - Auto-updates, custom fonts, configurable colors
 > **Legacy GUI:** The original PySide6/Qt GUI (`main.py`, `gui/`) still works during the transition. New features should target the Tauri frontend and headless backend.
 ## Project Structure
 ```
 local-transcription/
-├── client/                   # Core transcription logic
+├── src/                             # Svelte 5 frontend (Tauri UI)
-│   ├── audio_capture.py      # Audio input and buffering
+│   ├── App.svelte                   # Main app shell
-│   ├── transcription_engine.py # Whisper model integration
+│   ├── app.css                      # Global dark theme styles
-│   ├── noise_suppression.py  # VAD and noise reduction
+│   ├── main.ts                      # Svelte mount point
-│   ├── device_utils.py       # CPU/GPU device management
+│   ├── lib/components/              # UI components
-│   ├── config.py             # Configuration management
+│   │   ├── Header.svelte            # Title bar + settings button
-│   └── server_sync.py        # Multi-user server sync client
+│   │   ├── StatusBar.svelte         # State indicator, device, user info
-├── gui/                      # Desktop application UI
+│   │   ├── Controls.svelte          # Start/Stop, Clear, Save buttons
-│   ├── main_window_qt.py     # Main application window (PySide6)
+│   │   ├── TranscriptionDisplay.svelte  # Scrolling transcript view
-│   ├── settings_dialog_qt.py # Settings dialog (PySide6)
+│   │   └── Settings.svelte          # Full settings modal (all sections)
-│   └── transcription_display_qt.py # Display widget
+│   └── lib/stores/                  # Svelte 5 reactive stores ($state/$derived)
-├── server/                   # Web display servers
+│       ├── backend.ts               # WebSocket + REST API client
-│   ├── web_display.py        # FastAPI server for OBS browser source (local)
+│       ├── config.ts                # App configuration fetch/update
-│   └── nodejs/               # Optional multi-user Node.js server
+│       └── transcriptions.ts        # Transcript data management
-│       ├── server.js         # Multi-user sync server with WebSocket
+├── src-tauri/                       # Tauri v2 Rust shell
-│       ├── package.json      # Node.js dependencies
+│   ├── src/lib.rs                   # Plugin registration (shell, dialog, process)
-│       └── README.md         # Server deployment documentation
+│   ├── src/main.rs                  # Entry point
-├── config/                   # Example configuration files
+│   ├── tauri.conf.json              # Window, bundle, plugin config
-│   └── default_config.yaml   # Default settings template
+│   └── Cargo.toml                   # Rust dependencies
-├── main.py                   # GUI application entry point
+├── backend/                         # Headless Python backend (the sidecar)
-├── main_cli.py              # CLI version for testing
+│   ├── app_controller.py            # Core orchestration (engine, sync, config)
-└── pyproject.toml           # Dependencies and build config
+│   ├── api_server.py                # FastAPI REST endpoints + /ws/control
 │   └── main_headless.py             # Headless entry point (prints JSON to stdout)
 ├── client/                          # Core transcription modules (used by backend)
 │   ├── audio_capture.py             # Audio input handling
 │   ├── transcription_engine_realtime.py  # RealtimeSTT / Whisper engine
 │   ├── deepgram_transcription.py    # Deepgram WebSocket cloud transcription
 │   ├── noise_suppression.py         # VAD and noise reduction
 │   ├── device_utils.py              # CPU/GPU/MPS detection
 │   ├── config.py                    # YAML config management (~/.local-transcription/)
 │   ├── server_sync.py               # Multi-user server sync client
 │   ├── instance_lock.py             # Single-instance PID lock
 │   └── update_checker.py            # Gitea release update checker
 ├── gui/                             # Legacy PySide6/Qt GUI (still functional)
 │   ├── main_window_qt.py            # Main window (orchestration lives here in legacy)
 │   ├── settings_dialog_qt.py        # Settings dialog
 │   └── transcription_display_qt.py  # Display widget
 ├── server/
 │   ├── web_display.py               # FastAPI OBS display server (WebSocket + HTML)
 │   └── nodejs/                      # Optional multi-user sync server
 ├── .gitea/workflows/                # CI/CD
 │   ├── release.yml                  # Tauri app builds (Linux/Windows/macOS)
 │   └── build-sidecar.yml            # Python sidecar builds (CUDA + CPU)
 ├── config/default_config.yaml       # Default settings template
 ├── main.py                          # Legacy PySide6 GUI entry point
 ├── main_cli.py                      # CLI version for testing
 ├── version.py                       # Version string (__version__)
 ├── local-transcription.spec         # PyInstaller config (legacy, includes PySide6)
 ├── local-transcription-headless.spec # PyInstaller config (headless sidecar, no Qt)
 ├── pyproject.toml                   # Python deps (uv, CUDA PyTorch index)
 ├── package.json                     # Node/Tauri deps
 └── vite.config.ts                   # Vite build config ($lib alias)
 ```
 ## Development Commands
-### Installation and Setup
+### Frontend (Tauri + Svelte)
 ```bash
-# Install dependencies (creates .venv automatically)
+# Install npm dependencies
 npm install
 # Run Tauri in development mode (hot-reload)
 npm run tauri dev
 # Build frontend only (for testing)
 npx vite build
 # Type-check Svelte
 npx svelte-check
 # Check Rust compiles
 cd src-tauri && cargo check
 ```
 ### Backend (Python)
 ```bash
 # Install Python dependencies
 uv sync
-# Run the GUI application
+# Run the headless backend standalone (for development)
 uv run python -m backend.main_headless --port 8080
 # Run the legacy PySide6 GUI
 uv run python main.py
 # Run CLI version (headless, for testing)
@@ -57,257 +113,154 @@ uv run python main_cli.py
 # List available audio devices
 uv run python main_cli.py --list-devices
 # Install with CUDA support (if needed)
 uv pip install torch --index-url https://download.pytorch.org/whl/cu121
 ```
-### Building Executables
+### Building
 ```bash
-# Linux (includes CUDA support - works on both GPU and CPU systems)
+# Build Tauri app (produces platform installer)
-./build.sh
+npm run tauri build
-# Windows (includes CUDA support - works on both GPU and CPU systems)
+# Build headless Python sidecar (no PySide6)
-build.bat
+uv run pyinstaller local-transcription-headless.spec
 # Output: dist/local-transcription-backend/
-# Manual build with PyInstaller
+# Build legacy PySide6 app
 uv sync                          # Install dependencies (includes CUDA PyTorch)
 uv pip uninstall -q enum34       # Remove incompatible enum34 package
 uv run pyinstaller local-transcription.spec
 # Or use: ./build.sh (Linux) / build.bat (Windows)
 ```
 **Important:** All builds include CUDA support via `pyproject.toml` configuration. CUDA builds can be created on systems without NVIDIA GPUs. The PyTorch CUDA runtime is bundled, and the app automatically falls back to CPU if no GPU is available.
 ### Testing
 ```bash
 # Run component tests
 uv run python test_components.py
 # Check CUDA availability
 uv run python check_cuda.py
 # Test web server manually
 uv run python -m uvicorn server.web_display:app --reload
 ```
-## Architecture
+## Architecture Details
-### Audio Processing Pipeline
+### Communication: Tauri <-> Python Backend
-1. **Audio Capture** ([client/audio_capture.py](client/audio_capture.py))
+The Svelte frontend connects to the Python backend via two channels:
   - Captures audio from microphone/system using sounddevice
   - Handles automatic sample rate detection and resampling
   - Uses chunking with overlap for better transcription quality
   - Default: 3-second chunks with 0.5s overlap
-2. **Noise Suppression** ([client/noise_suppression.py](client/noise_suppression.py))
+**REST API** (on port 8081 by default):
-   - Applies noisereduce for background noise reduction
+- `GET /api/status` — app state, device info, version
-   - Voice Activity Detection (VAD) using webrtcvad
+- `POST /api/start` / `POST /api/stop` — transcription control
-   - Skips silent segments to improve performance
+- `GET /api/config` / `PUT /api/config` — read/write settings (dot-notation keys)
 - `GET /api/audio-devices` / `GET /api/compute-devices` — device enumeration
 - `POST /api/reload-engine` — reload with new model/device
 - `GET /api/transcriptions` / `POST /api/clear` — transcript management
 - `POST /api/save-file` — write text to a file path
 - `GET /api/check-update` / `POST /api/skip-version` — update management
 - `POST /api/login` / `POST /api/register` / `GET /api/balance` — managed mode proxy
-3. **Transcription** ([client/transcription_engine.py](client/transcription_engine.py))
+**WebSocket** `/ws/control`:
-   - Uses faster-whisper for efficient inference
+- Pushes real-time events: `state_changed`, `transcription`, `preview`, `error`, `credits_low`
-   - Supports CPU, CUDA, and Apple MPS (Mac)
+- Client sends keepalive pings
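A client has to route each pushed event to the right handler. The sketch below shows one way to do that in Python. The event names come from the list above, but the frame layout is not specified in this document: the `type` key, the `dispatch_event` helper, and the `handlers` mapping are all assumptions made for illustration.

```python
import json
from typing import Any, Callable, Dict

# Event names listed for /ws/control in this document.
KNOWN_EVENTS = {"state_changed", "transcription", "preview", "error", "credits_low"}

Handler = Callable[[Dict[str, Any]], None]


def dispatch_event(raw: str, handlers: Dict[str, Handler]) -> bool:
    """Parse one text frame and call the matching handler.

    ASSUMPTION: frames are JSON objects whose "type" key holds the event
    name. The real field name may differ; check backend/api_server.py.
    Returns True if a handler ran, False otherwise.
    """
    try:
        message = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(message, dict):
        return False
    name = message.get("type")
    if not isinstance(name, str):
        return False
    if name not in KNOWN_EVENTS or name not in handlers:
        return False
    handlers[name](message)
    return True
```

Returning a boolean rather than raising keeps a malformed or unknown frame from tearing down the receive loop.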
   - Models: tiny, base, small, medium, large
   - Thread-safe model loading with locks
-4. **Display** ([gui/main_window_qt.py](gui/main_window_qt.py))
+The OBS display server runs separately on port 8080 (`GET /` for HTML, `WebSocket /ws` for transcriptions).
   - PySide6/Qt-based desktop GUI
   - Real-time transcription display with scrolling
   - Settings panel with live updates (no restart needed)
-### Web Server Architecture
+### Backend Process Lifecycle
-**Local Web Server** ([server/web_display.py](server/web_display.py))
+1. `main_headless.py` starts, acquires instance lock, creates `AppController`
- Always runs when GUI starts (port 8080 by default)
+2. `AppController.initialize()` starts the OBS web server (port 8080) and engine init thread
- FastAPI with WebSocket for real-time updates
+3. `APIServer` wraps the controller with FastAPI routes, runs on port 8081
- Used for OBS browser source integration
+4. Backend prints `{"event": "ready", "port": 8080}` to stdout for Tauri to discover
- Single-user (displays only local transcriptions)
+5. On shutdown: engine stopped, web server stopped, lock released
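Step 4 implies that whatever launches the backend scans its stdout for the ready line. The following is a minimal Python sketch of that scan, assuming the ready message is a single-line JSON object shaped like the one shown in step 4 and that other stdout lines may be plain log text. The `find_ready_port` helper is hypothetical; the actual Tauri shell does this on its own side.

```python
import json
from typing import Iterable, Optional


def find_ready_port(lines: Iterable[str]) -> Optional[int]:
    """Return the port from the first {"event": "ready", ...} line, if any.

    Non-JSON lines (for example log output) are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(message, dict) and message.get("event") == "ready":
            port = message.get("port")
            if isinstance(port, int):
                return port
    return None
```

In a test harness, `lines` could be the `stdout` stream of a `subprocess.Popen(..., stdout=subprocess.PIPE, text=True)` running `python -m backend.main_headless`.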
-**Multi-User Server** (Optional - for syncing across multiple users)
+### Headless Backend vs Legacy GUI
-**Node.js WebSocket Server** ([server/nodejs/](server/nodejs/)) - **RECOMMENDED**
+The `AppController` class (`backend/app_controller.py`) extracts all orchestration logic from `gui/main_window_qt.py` into a Qt-free class. The mapping:
 - Real-time WebSocket support (< 100ms latency)
 - Handles 100+ concurrent users
 - Easy deployment to VPS/cloud hosting (Railway, Heroku, DigitalOcean, or any VPS)
 - Configurable display options via URL parameters:
  - `timestamps=true/false` - Show/hide timestamps
  - `maxlines=50` - Maximum visible lines (prevents scroll bars in OBS)
  - `fontsize=16` - Font size in pixels
  - `fontfamily=Arial` - Font family
  - `fade=10` - Seconds before text fades (0 = never)
-See [server/nodejs/README.md](server/nodejs/README.md) for deployment instructions
+| Legacy (MainWindow) | Headless (AppController) |
 |---------------------|--------------------------|
 | `_initialize_components()` | `_initialize_engine()` |
 | `_start_transcription()` | `start_transcription()` |
 | `_stop_transcription()` | `stop_transcription()` |
 | `_on_settings_saved()` | `apply_settings()` |
 | `_reload_engine()` | `reload_engine()` |
 | `_start_web_server_if_enabled()` | `_start_web_server()` |
 | `_start_server_sync()` | `_start_server_sync()` |
 | Qt signals | Callbacks (`on_state_changed`, `on_transcription`, etc.) |
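The last row of the table is the key design change: plain callables replace Qt signals. The sketch below shows that pattern under stated assumptions. `MiniController` is a hypothetical stand-in, not the real `AppController`, and the callback signatures and the `"idle"`/`"running"` state names are invented for the example. Only the callback names `on_state_changed` and `on_transcription` come from the table.

```python
from typing import Callable, Optional


class MiniController:
    """Hypothetical stand-in for a Qt-free controller; not the real AppController."""

    def __init__(self) -> None:
        # Callback attributes play the role Qt signals had in MainWindow.
        self.on_state_changed: Optional[Callable[[str], None]] = None
        self.on_transcription: Optional[Callable[[str], None]] = None
        self.state = "idle"  # assumed state name

    def _set_state(self, state: str) -> None:
        self.state = state
        if self.on_state_changed is not None:
            self.on_state_changed(state)

    def start_transcription(self) -> None:
        self._set_state("running")  # assumed state name

    def stop_transcription(self) -> None:
        self._set_state("idle")

    def _emit_transcription(self, text: str) -> None:
        # Would be called by the engine when a final transcript arrives.
        if self.on_transcription is not None:
            self.on_transcription(text)
```

Because the controller only knows about callables, the same class can be driven by an API server, a CLI, or a test without importing any GUI toolkit.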
-### Configuration System
+### Threading Model (Headless)
- Config stored at `~/.local-transcription/config.yaml`
+- Main thread: Uvicorn (FastAPI) event loop
- Managed by [client/config.py](client/config.py)
+- Engine init thread: Downloads models, initializes VAD
- Settings apply immediately without restart (except model changes)
+- Web server thread: Separate asyncio loop for OBS display
- YAML format with nested keys (e.g., `transcription.model`)
+- Audio capture: Runs in engine callback threads
 - All results flow through `AppController` callbacks -> `APIServer` WebSocket broadcast
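Callbacks fired from engine threads cannot touch the event loop directly, so some hand-off is needed before a WebSocket broadcast can happen. One common approach is `asyncio.run_coroutine_threadsafe`, sketched below. Whether the backend uses this exact mechanism is not stated here; `broadcast`, `make_thread_safe_callback`, and `demo` are illustrative names, and the broadcast only appends to a list.

```python
import asyncio
import threading
from typing import Callable, List


async def broadcast(message: str, sink: List[str]) -> None:
    """Stand-in for a WebSocket broadcast; here it only records the message."""
    sink.append(message)


def make_thread_safe_callback(
    loop: asyncio.AbstractEventLoop, sink: List[str]
) -> Callable[[str], None]:
    """Return a callback that worker threads can call safely."""

    def on_transcription(text: str) -> None:
        # Schedule the coroutine on the event loop's thread and return at once.
        asyncio.run_coroutine_threadsafe(broadcast(text, sink), loop)

    return on_transcription


async def demo() -> List[str]:
    loop = asyncio.get_running_loop()
    sink: List[str] = []
    callback = make_thread_safe_callback(loop, sink)
    worker = threading.Thread(target=callback, args=("hello",))
    worker.start()
    # Join in an executor so the loop stays free to run the scheduled coroutine.
    await loop.run_in_executor(None, worker.join)
    await asyncio.sleep(0)  # give the scheduled broadcast a turn
    return sink
```

Running `asyncio.run(demo())` exercises the full path: a worker thread invokes the callback, and the coroutine executes on the loop thread.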
-### Device Management
+### Svelte Frontend
- [client/device_utils.py](client/device_utils.py) handles CPU/GPU detection
+Uses Svelte 5 runes throughout (`$state`, `$derived`, `$effect`, `$props`). No Svelte 4 patterns.
 - Auto-detects CUDA, MPS (Mac), or falls back to CPU
 - Compute types: float32 (best quality), float16 (GPU), int8 (fastest)
 - Thread-safe device selection
-## Key Implementation Details
+**Stores** (`src/lib/stores/`):
 - `backend.ts` — WebSocket connection + REST helpers (`apiGet`, `apiPost`, `apiPut`), auto-reconnect
 - `config.ts` — fetches/updates config from backend API
 - `transcriptions.ts` — manages transcript list, listens for `CustomEvent`s from backend store
-### PyInstaller Build Configuration
+**Key patterns:**
 - Backend store dispatches `CustomEvent`s on `window` for cross-store communication
 - Settings component collects all changed values into a `Record<string, any>` with dot-notation keys, sends via `PUT /api/config`
 - Controls use Tauri dialog plugin for native file save, falls back to blob download
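The dot-notation keys sent via `PUT /api/config` map onto the nested YAML config. The sketch below illustrates that mapping in Python. The `flatten` and `apply_updates` helpers are hypothetical and are not taken from `client/config.py`; only the key names in the usage example appear in this document.

```python
from typing import Any, Dict


def flatten(config: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn nested dicts into {"a.b.c": value} pairs."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def apply_updates(config: Dict[str, Any], updates: Dict[str, Any]) -> None:
    """Write dot-notation updates into a nested config dict in place."""
    for dotted, value in updates.items():
        node = config
        *parents, leaf = dotted.split(".")
        for part in parents:
            # Create intermediate sections that do not exist yet.
            node = node.setdefault(part, {})
        node[leaf] = value
```

For example, `apply_updates(cfg, {"transcription.model": "small"})` changes only that one leaf and leaves sibling keys untouched, which is why sending just the changed values is sufficient.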
- [local-transcription.spec](local-transcription.spec) controls build
+## CI/CD
 - UPX compression enabled for smaller executables
 - Hidden imports required for PySide6, faster-whisper, torch
 - Console mode enabled by default (set `console=False` to hide)
-### Threading Model
+Two Gitea Actions workflows in `.gitea/workflows/`:
- Main thread: Qt GUI event loop
+- **`release.yml`**: Triggers on push to `main`. Auto-bumps version, builds Tauri app on Linux/Windows/macOS, uploads `.deb`, `.rpm`, `.msi`, `.dmg` to Gitea release.
- Audio thread: Captures and processes audio chunks
+- **`build-sidecar.yml`**: Triggers on changes to `client/`, `server/`, `backend/`, `pyproject.toml`. Builds headless Python sidecar via PyInstaller. CUDA + CPU for Linux/Windows, CPU-only for macOS.
 - Web server thread: Runs FastAPI server
 - Transcription: Runs in callback thread from audio capture
 - All transcription results communicated via Qt signals
-### Server Sync (Optional Multi-User Feature)
+Both require a `BUILD_TOKEN` secret (Gitea API token with release write access).
 - [client/server_sync.py](client/server_sync.py) handles server communication
 - Toggle in Settings: "Enable Server Sync"
 - Sends transcriptions to Node.js server via HTTP POST
 - Real-time updates via WebSocket to display page
 - Per-speaker font support (Web-Safe, Google Fonts, Custom uploads)
 - Falls back gracefully if server unavailable
 ## Common Patterns
 ### Adding a New Setting
-1. Add to [config/default_config.yaml](config/default_config.yaml)
+1. Add default to [config/default_config.yaml](config/default_config.yaml)
-2. Update [client/config.py](client/config.py) if validation needed
+2. Add UI control in [src/lib/components/Settings.svelte](src/lib/components/Settings.svelte)
-3. Add UI control in [gui/settings_dialog_qt.py](gui/settings_dialog_qt.py)
+3. Ensure the setting is included in the save handler's config update
-4. Apply setting in relevant component (no restart if possible)
+4. Apply in `AppController.apply_settings()` or the relevant component
-5. Emit signal to update display if needed
+5. For legacy GUI: also update [gui/settings_dialog_qt.py](gui/settings_dialog_qt.py)
 ### Adding a New API Endpoint
 1. Add route in [backend/api_server.py](backend/api_server.py) `_setup_routes()`
 2. Add supporting logic in [backend/app_controller.py](backend/app_controller.py) if needed
 3. Call from Svelte via `backendStore.apiGet/apiPost/apiPut`
 ### Modifying Transcription Display
- Local GUI: [gui/transcription_display_qt.py](gui/transcription_display_qt.py)
+- Tauri UI: [src/lib/components/TranscriptionDisplay.svelte](src/lib/components/TranscriptionDisplay.svelte)
- Local web display (OBS): [server/web_display.py](server/web_display.py) (HTML in `_get_html()`)
+- OBS display: [server/web_display.py](server/web_display.py) (HTML in `_get_html()`)
 - Multi-user display: [server/nodejs/server.js](server/nodejs/server.js) (display page in `/display` route)
 ### Adding a New Model Size
 - Update [client/transcription_engine.py](client/transcription_engine.py)
 - Add to model selector in [gui/settings_dialog_qt.py](gui/settings_dialog_qt.py)
 - Update CLI argument choices in [main_cli.py](main_cli.py)
 ## Dependencies
-**Core:**
+**Frontend:** Tauri v2, Svelte 5, Vite, TypeScript
- `faster-whisper`: Optimized Whisper inference
+**Backend:** Python 3.9+, FastAPI, Uvicorn, RealtimeSTT, faster-whisper, PyTorch (CUDA), sounddevice
- `torch`: ML framework (CUDA-enabled via special index)
+**Build:** PyInstaller (sidecar), Tauri CLI (app), uv (Python packages)
- `PySide6`: Qt6 bindings for GUI
+**CI:** Gitea Actions with platform-specific runners
 - `sounddevice`: Cross-platform audio I/O
 - `noisereduce`, `webrtcvad`: Audio preprocessing
 **Web Server:**
 - `fastapi`, `uvicorn`: Web server and ASGI
 - `websockets`: Real-time communication
 **Build:**
 - `pyinstaller`: Create standalone executables
 - `uv`: Fast package manager
 **PyTorch CUDA Index:**
 - Configured in [pyproject.toml](pyproject.toml) under `[[tool.uv.index]]`
 - Uses PyTorch's custom wheel repository for CUDA builds
 - Automatically installed with `uv sync` when using CUDA build scripts
 ## Platform-Specific Notes
 ### Linux
- Uses PulseAudio/ALSA for audio
+- Tauri needs: `libgtk-3-dev`, `libwebkit2gtk-4.1-dev`, `libappindicator3-dev`, `librsvg2-dev`, `patchelf`
- Build scripts use bash (`.sh` files)
+- Audio: PulseAudio/ALSA via sounddevice
 - Executable: `dist/LocalTranscription/LocalTranscription`
 ### Windows
- Uses Windows Audio/WASAPI
+- Tauri needs: WebView2 (usually pre-installed on Windows 10+)
- Build scripts use batch (`.bat` files)
+- Audio: WASAPI via sounddevice
 - Executable: `dist\LocalTranscription\LocalTranscription.exe`
 - Requires Visual C++ Redistributable on target systems
-### Cross-Building
+### macOS
- **Cannot cross-compile** - must build on target platform
+- Tauri needs: Xcode Command Line Tools
- CI/CD should use platform-specific runners
+- Audio: CoreAudio via sounddevice
-
+- GPU: MPS (Apple Silicon) detected by `device_utils.py`
-## Troubleshooting
+- `Info.plist` must include `NSMicrophoneUsageDescription` for mic access
-
+- No CUDA builds — CPU/MPS only
 ### Model Loading Issues
 - Models download to `~/.cache/huggingface/`
 - First run requires internet connection
 - Check disk space (models: 75MB-3GB depending on size)
 ### Audio Device Issues
 - Run `uv run python main_cli.py --list-devices`
 - Check permissions (microphone access)
 - Try different device indices in settings
 ### GPU Not Detected
 - Run `uv run python check_cuda.py`
 - Install CUDA drivers (not CUDA toolkit - bundled in build)
 - Verify PyTorch sees GPU: `python -c "import torch; print(torch.cuda.is_available())"`
 ### Web Server Port Conflicts
 - Default port: 8080
 - Change in [gui/main_window_qt.py](gui/main_window_qt.py) or config
 - Use `lsof -i :8080` (Linux) or `netstat -ano | findstr :8080` (Windows)
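For a check that behaves the same on every platform, a short Python probe can stand in for `lsof`/`netstat`. This is a sketch only: `port_in_use` is a hypothetical helper, and it detects TCP listeners on the given host rather than identifying which process owns the port.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising.
        return sock.connect_ex((host, port)) == 0
```

Calling `port_in_use(8080)` before starting the app tells you whether the default OBS display port is already taken.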
 ## OBS Integration
 ### Local Display (Single User)
 1. Start Local Transcription app
 2. In OBS: Add "Browser" source
 3. URL: `http://localhost:8080`
 4. Set dimensions (e.g., 1920x300)
 ### Multi-User Display (Node.js Server)
 1. Deploy Node.js server (see [server/nodejs/README.md](server/nodejs/README.md))
 2. Each user configures Server URL: `http://your-server:3000/api/send`
 3. Enter same room name and passphrase
 4. In OBS: Add "Browser" source
 5. URL: `http://your-server:3000/display?room=ROOM&fade=10&timestamps=true&maxlines=50&fontsize=16`
 6. Customize URL parameters as needed:
   - `timestamps=false` - Hide timestamps
   - `maxlines=30` - Show max 30 lines (prevents scroll bars)
   - `fontsize=18` - Larger font
   - `fontfamily=Courier` - Different font
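When several users or scenes need slightly different display URLs, building them programmatically avoids typos. The sketch below uses the query parameters listed above. The `display_url` helper is hypothetical, and the host is the placeholder from step 5.

```python
from urllib.parse import urlencode


def display_url(base: str, room: str, **options) -> str:
    """Build a /display URL; booleans become lowercase true/false."""
    params = {"room": room}
    for key, value in options.items():
        params[key] = str(value).lower() if isinstance(value, bool) else str(value)
    # urlencode escapes spaces and other reserved characters in values.
    return f"{base.rstrip('/')}/display?{urlencode(params)}"
```

For instance, `display_url("http://your-server:3000", "ROOM", fade=10, timestamps=True, maxlines=50, fontsize=16)` reproduces the URL from step 5.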
 ## Performance Optimization
 **For Real-Time Transcription:**
 - Use `tiny` or `base` model (faster)
 - Enable GPU if available (5-10x faster)
 - Increase chunk_duration for better accuracy (higher latency)
 - Decrease chunk_duration for lower latency (less context)
 - Enable VAD to skip silent audio
 **For Build Size Reduction:**
 - Don't bundle models (download on demand)
 - Use CPU-only build if no GPU users
 - Enable UPX compression (already in spec)
 ## Phase Status
 - ✅ **Phase 1**: Standalone desktop application (complete)
 - ✅ **Web Server**: Local OBS integration (complete)
 - ✅ **Builds**: PyInstaller executables (complete)
 - ✅ **Phase 2**: Multi-user Node.js server (complete, optional)
 - ⏸️ **Phase 3+**: Advanced features (see [NEXT_STEPS.md](NEXT_STEPS.md))
 ## Related Documentation
- [README.md](README.md) - User-facing documentation
+- [README.md](README.md) — User-facing documentation
- [BUILD.md](BUILD.md) - Detailed build instructions
+- [BUILD.md](BUILD.md) — Detailed build instructions
- [INSTALL.md](INSTALL.md) - Installation guide
+- [INSTALL.md](INSTALL.md) — Installation guide
- [NEXT_STEPS.md](NEXT_STEPS.md) - Future enhancements
+- [server/nodejs/README.md](server/nodejs/README.md) — Node.js server setup
 - [server/nodejs/README.md](server/nodejs/README.md) - Node.js server setup and deployment
--- a/README.md
+++ b/README.md
@@ -1,13 +1,14 @@
 # Local Transcription
-A real-time speech-to-text desktop application for streamers. Run locally on your machine with GPU or CPU, display transcriptions via OBS browser source, and optionally sync with other users through a multi-user server.
+A real-time speech-to-text desktop application for streamers. Runs locally on your machine with GPU or CPU, displays transcriptions via OBS browser source, and optionally syncs with other users through a multi-user server.
 **Version 1.4.0**
 ## Features
 - **Real-Time Transcription**: Live speech-to-text using Whisper models with minimal latency
- **Standalone Desktop App**: PySide6/Qt GUI that works without any server
+- **Cross-Platform**: Native desktop app for Windows, macOS, and Linux via [Tauri](https://tauri.app/)
 - **Dual Transcription Modes**: Local (Whisper) or cloud (Deepgram) with managed billing or BYOK
 - **CPU & GPU Support**: Automatic detection of CUDA (NVIDIA), MPS (Apple Silicon), or CPU fallback
 - **Advanced Voice Detection**: Dual-layer VAD (WebRTC + Silero) for accurate speech detection
 - **OBS Integration**: Built-in web server for browser source capture at `http://localhost:8080`
@@ -16,36 +17,70 @@ A real-time speech-to-text desktop application for streamers. Run locally on you
 - **Customizable Colors**: User-configurable colors for name, text, and background
 - **Noise Suppression**: Built-in audio preprocessing to reduce background noise
 - **Auto-Updates**: Automatic update checking with release notes display
- **Cross-Platform**: Builds available for Windows and Linux
+
 ## Architecture
 The application uses a two-process architecture:
 1. **Tauri Shell** (Svelte 5 frontend) — lightweight native window (~50MB) rendering the UI
 2. **Python Backend** (sidecar) — headless process running transcription, audio capture, and the OBS web server
 The Tauri frontend communicates with the Python backend via REST API and WebSocket, following the same pattern as [voice-to-notes](https://repo.anhonesthost.net/MacroPad/voice-to-notes).
 ```
 Tauri App (user launches this)
  └─ Spawns Python backend as sidecar
       ├─ FastAPI REST API (control endpoints)
       ├─ WebSocket /ws/control (real-time state + transcriptions)
       ├─ OBS web display at http://localhost:8080
       └─ Transcription engine (Whisper or Deepgram)
 ```
 > **Legacy GUI**: The original PySide6/Qt desktop GUI (`main.py`) still works alongside the new Tauri frontend during the transition period.
 ## Quick Start
 ### Running from Source
 ```bash
-# Install dependencies
+# Install Python dependencies
 uv sync
-# Run the application
+# Run the Tauri app (frontend + backend)
 npm install
 npm run tauri dev
 # Or run just the headless backend (for development)
 uv run python -m backend.main_headless
 # Or run the legacy PySide6 GUI
 uv run python main.py
 ```
 ### Using Pre-Built Executables
-Download the latest release from the [releases page](https://repo.anhonesthost.net/streamer-tools/local-transcription/releases) and run the executable for your platform.
+Download the latest release from the [releases page](https://repo.anhonesthost.net/streamer-tools/local-transcription/releases):
 - **App installer** (Tauri shell): `.msi` (Windows), `.dmg` (macOS), `.deb`/`.rpm`/`.AppImage` (Linux)
 - **Sidecar** (Python backend): Download the matching `sidecar-*` zip for your platform (CUDA or CPU)
 ### Building from Source
 **Linux:**
 ```bash
-./build.sh
+# Build the Tauri app
-# Output: dist/LocalTranscription/LocalTranscription
+npm install
-```
+npm run tauri build
 # Output: src-tauri/target/release/bundle/
-**Windows:**
+# Build the Python sidecar (headless, no Qt)
-```cmd
+uv sync
 uv run pyinstaller local-transcription-headless.spec
 # Output: dist/local-transcription-backend/
 # Build the legacy PySide6 app (Linux)
 ./build.sh
 # Build the legacy PySide6 app (Windows)
 build.bat
 # Output: dist\LocalTranscription\LocalTranscription.exe
 ```
 For detailed build instructions, see [BUILD.md](BUILD.md).
@@ -57,14 +92,23 @@ For detailed build instructions, see [BUILD.md](BUILD.md).
 1. Launch the application
 2. Select your microphone from the audio device dropdown
 3. Choose a Whisper model (smaller = faster, larger = more accurate):
-   - `tiny.en` / `tiny` - Fastest, good for quick captions
+   - `tiny.en` / `tiny` — Fastest, good for quick captions
-   - `base.en` / `base` - Balanced speed and accuracy
+   - `base.en` / `base` — Balanced speed and accuracy
-   - `small.en` / `small` - Better accuracy
+   - `small.en` / `small` — Better accuracy
-   - `medium.en` / `medium` - High accuracy
+   - `medium.en` / `medium` — High accuracy
-   - `large-v3` - Best accuracy (requires more resources)
+   - `large-v3` — Best accuracy (requires more resources)
 4. Click **Start** to begin transcription
 5. Transcriptions appear in the main window and at `http://localhost:8080`
 ### Remote Transcription (Deepgram)
 Instead of local Whisper models, you can use cloud-based transcription:
 - **Managed mode**: Sign up via the transcription proxy for metered billing
 - **BYOK mode**: Bring your own Deepgram API key for direct access
 Configure in Settings > Remote Transcription.
 ### OBS Browser Source Setup
 1. Start the Local Transcription app
@@ -88,7 +132,7 @@ For syncing transcriptions across multiple users (e.g., multi-host streams or tr
 ## Configuration
-Settings are stored at `~/.local-transcription/config.yaml` and can be modified through the GUI settings panel.
+Settings are stored at `~/.local-transcription/config.yaml` and can be modified through the GUI settings panel or the REST API.
 ### Key Settings
@@ -100,6 +144,7 @@ Settings are stored at `~/.local-transcription/config.yaml` and can be modified
 | `transcription.silero_sensitivity` | VAD sensitivity (0-1, lower = more sensitive) | `0.4` |
 | `transcription.post_speech_silence_duration` | Silence before finalizing (seconds) | `0.3` |
 | `transcription.continuous_mode` | Fast speaker mode for quick talkers | `false` |
 | `remote.mode` | Transcription mode (local/managed/byok) | `local` |
 | `display.show_timestamps` | Show timestamps with transcriptions | `true` |
 | `display.fade_after_seconds` | Fade out time (0 = never) | `10` |
 | `display.font_source` | Font type (System Font/Web-Safe/Google Font/Custom File) | `System Font` |
@@ -111,67 +156,114 @@ See [config/default_config.yaml](config/default_config.yaml) for all available o
 ```
 local-transcription/
-├── client/                      # Core transcription modules
+├── src/                             # Svelte 5 frontend (Tauri UI)
-│   ├── audio_capture.py         # Audio input handling
+│   ├── App.svelte                   # Main app shell
-│   ├── transcription_engine_realtime.py  # RealtimeSTT integration
+│   ├── lib/components/              # UI components
-│   ├── noise_suppression.py     # VAD and noise reduction
+│   │   ├── Header.svelte
-│   ├── device_utils.py          # CPU/GPU detection
+│   │   ├── StatusBar.svelte
-│   ├── config.py                # Configuration management
+│   │   ├── Controls.svelte
-│   ├── server_sync.py           # Multi-user server client
+│   │   ├── TranscriptionDisplay.svelte
-│   └── update_checker.py        # Auto-update functionality
+│   │   └── Settings.svelte
-├── gui/                         # Desktop application UI
+│   └── lib/stores/                  # Reactive state management
-│   ├── main_window_qt.py        # Main application window
+│       ├── backend.ts               # WebSocket + REST API client
-│   ├── settings_dialog_qt.py    # Settings dialog
+│       ├── config.ts                # App configuration
-│   └── transcription_display_qt.py  # Display widget
+│       └── transcriptions.ts        # Transcription data
-├── server/                      # Web servers
+├── src-tauri/                       # Tauri v2 Rust shell
-│   ├── web_display.py           # Local FastAPI server for OBS
+│   ├── src/main.rs
-│   └── nodejs/                  # Multi-user sync server
+│   └── tauri.conf.json
-│       ├── server.js            # Express + WebSocket server
+├── backend/                         # Headless Python backend (sidecar)
-│       └── README.md            # Deployment instructions
+│   ├── app_controller.py            # Orchestration logic (engine, sync, config)
 │   ├── api_server.py                # FastAPI REST + WebSocket control API
 │   └── main_headless.py             # Headless entry point
 ├── client/                          # Core transcription modules
 │   ├── audio_capture.py             # Audio input handling
 │   ├── transcription_engine_realtime.py  # RealtimeSTT / Whisper
 │   ├── deepgram_transcription.py    # Deepgram cloud transcription
 │   ├── noise_suppression.py         # VAD and noise reduction
 │   ├── device_utils.py              # CPU/GPU/MPS detection
 │   ├── config.py                    # Configuration management
 │   ├── server_sync.py               # Multi-user server client
 │   └── update_checker.py            # Auto-update functionality
 ├── gui/                             # Legacy PySide6/Qt GUI
 │   ├── main_window_qt.py
 │   ├── settings_dialog_qt.py
 │   └── transcription_display_qt.py
 ├── server/                          # Web servers
 │   ├── web_display.py               # Local FastAPI server for OBS
 │   └── nodejs/                      # Multi-user sync server
 ├── .gitea/workflows/                # CI/CD
 │   ├── release.yml                  # Tauri app builds (all platforms)
 │   └── build-sidecar.yml            # Python sidecar builds (CUDA + CPU)
 ├── config/
-│   └── default_config.yaml      # Default settings template
+│   └── default_config.yaml          # Default settings template
-├── main.py                      # GUI entry point
+├── main.py                          # Legacy GUI entry point
-├── main_cli.py                  # CLI version (for testing)
+├── main_cli.py                      # CLI version (for testing)
-├── build.sh                     # Linux build script
+├── local-transcription.spec         # PyInstaller config (legacy, with PySide6)
-├── build.bat                    # Windows build script
+├── local-transcription-headless.spec # PyInstaller config (headless sidecar)
-└── local-transcription.spec     # PyInstaller configuration
+├── pyproject.toml                   # Python dependencies
 └── package.json                     # Node.js / Tauri dependencies
 ```
 ## Technology Stack
-### Desktop Application
+### Frontend (Tauri)
 - **Tauri v2** — Native cross-platform shell (Rust)
 - **Svelte 5** — Reactive UI framework (TypeScript)
 - **Vite** — Frontend build tool
 ### Backend (Python Sidecar)
 - **Python 3.9+**
- **PySide6** - Qt6 GUI framework
+- **FastAPI + Uvicorn** — REST API and WebSocket server
- **RealtimeSTT** - Real-time speech-to-text with advanced VAD
+- **RealtimeSTT** — Real-time speech-to-text with advanced VAD
- **faster-whisper** - Optimized Whisper model inference
+- **faster-whisper** — Optimized Whisper model inference (CTranslate2)
- **PyTorch** - ML framework (CUDA-enabled)
+- **PyTorch** — ML framework (CUDA-enabled builds available)
- **sounddevice** - Cross-platform audio capture
+- **sounddevice** — Cross-platform audio capture
- **webrtcvad + silero_vad** - Voice activity detection
+- **webrtcvad + silero_vad** — Voice activity detection
 - **noisereduce** - Noise suppression
-### Web Servers
+### Multi-User Server (Optional)
- **FastAPI + Uvicorn** - Local web display server
+- **Node.js + Express + WebSocket** — Real-time sync server
 - **Node.js + Express + WebSocket** - Multi-user sync server
-### Build Tools
+### Build & CI/CD
- **PyInstaller** - Executable packaging
+- **PyInstaller** — Python sidecar packaging
- **uv** - Fast Python package manager
+- **Tauri CLI** — App bundling (.msi, .dmg, .deb, .rpm, .AppImage)
 - **Gitea Actions** — Automated cross-platform builds
 - **uv** — Fast Python package manager
 ## CI/CD
 Two Gitea Actions workflows in `.gitea/workflows/`:
 | Workflow | Trigger | Produces |
 |----------|---------|----------|
 | `release.yml` | Push to `main` | Tauri app installers for all platforms |
 | `build-sidecar.yml` | Changes to `client/`, `server/`, `backend/`, or `pyproject.toml` | Python sidecar zips (CUDA + CPU) |
 Both workflows require a `BUILD_TOKEN` secret in the repo settings (Gitea API token with release write access).
 ### Release Artifacts
 | Platform | App Installer | Sidecar (CUDA) | Sidecar (CPU) |
 |----------|--------------|----------------|---------------|
 | Linux x86_64 | `.deb`, `.rpm`, `.AppImage` | `sidecar-linux-x86_64-cuda.zip` | `sidecar-linux-x86_64-cpu.zip` |
 | Windows x86_64 | `.msi`, `-setup.exe` | `sidecar-windows-x86_64-cuda.zip` | `sidecar-windows-x86_64-cpu.zip` |
 | macOS ARM64 | `.dmg` | — | `sidecar-macos-aarch64-cpu.zip` |
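
The sidecar file names in the table above follow a regular `sidecar-<os>-<arch>-<variant>.zip` pattern. The sketch below shows one way a download script might pick a name from it. The `SIDECAR_VARIANTS` table and the `sidecar_artifact` function are hypothetical names used only for illustration; they are not part of the project's code.

```python
# Hypothetical lookup built from the Release Artifacts table above.
SIDECAR_VARIANTS = {
    ("linux", "x86_64"): ("cuda", "cpu"),
    ("windows", "x86_64"): ("cuda", "cpu"),
    ("macos", "aarch64"): ("cpu",),  # no CUDA sidecar is published for macOS
}


def sidecar_artifact(os_name: str, arch: str, want_cuda: bool) -> str:
    """Return the sidecar zip name for a platform, falling back to CPU."""
    variants = SIDECAR_VARIANTS.get((os_name, arch))
    if variants is None:
        raise ValueError(f"no sidecar build listed for {os_name}/{arch}")
    variant = "cuda" if want_cuda and "cuda" in variants else "cpu"
    return f"sidecar-{os_name}-{arch}-{variant}.zip"


if __name__ == "__main__":
    print(sidecar_artifact("linux", "x86_64", want_cuda=True))
    print(sidecar_artifact("macos", "aarch64", want_cuda=True))
```

Requesting CUDA on a platform without a CUDA build returns the CPU artifact rather than failing, which mirrors the app's CPU fallback behavior.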
 ## System Requirements
 ### Minimum
 - Python 3.9+
 - 4GB RAM
 - Any modern CPU
-### Recommended (for real-time performance)
+### Recommended (for local real-time transcription)
 - 8GB+ RAM
 - NVIDIA GPU with CUDA support (for GPU acceleration)
 - FFmpeg (installed automatically with dependencies)
 ### For Building
- **Linux**: gcc, Python dev headers
+- **Tauri app**: Node.js 20+, Rust stable, platform SDK (see [Tauri prerequisites](https://tauri.app/start/prerequisites/))
- **Windows**: Visual Studio Build Tools, Python dev headers
+- **Python sidecar**: Python 3.9+, uv, PyInstaller
 - **Linux**: `libgtk-3-dev`, `libwebkit2gtk-4.1-dev`, `libappindicator3-dev`, `librsvg2-dev`, `patchelf`
 - **Windows**: Visual Studio Build Tools, WebView2
 - **macOS**: Xcode Command Line Tools
 ## Troubleshooting
@@ -185,7 +277,7 @@ local-transcription/
 # List available audio devices
 uv run python main_cli.py --list-devices
 ```
- Ensure microphone permissions are granted
+- Ensure microphone permissions are granted (especially on macOS)
 - Try different device indices in settings
 ### GPU Not Detected
@@ -193,13 +285,13 @@ uv run python main_cli.py --list-devices
 # Check CUDA availability
 uv run python -c "import torch; print(torch.cuda.is_available())"
 ```
- Install NVIDIA drivers (CUDA toolkit is bundled)
+- Install NVIDIA drivers (CUDA toolkit is bundled in CUDA sidecar builds)
 - The app automatically falls back to CPU if no GPU is available
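
As a rough illustration of the CPU fallback described above, the following sketch picks a device string from what PyTorch reports. It is not the code in `client/device_utils.py`; the function name `pick_device` is an assumption, and the sketch treats a missing PyTorch install as CPU-only.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" based on what PyTorch reports."""
    try:
        import torch  # imported lazily so the sketch runs without PyTorch
    except ImportError:
        return "cpu"

    if torch.cuda.is_available():
        return "cuda"

    # Apple Silicon: the MPS backend exists only in newer PyTorch builds.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


if __name__ == "__main__":
    print(f"Selected device: {pick_device()}")
```

If this prints `cpu` on a machine with an NVIDIA GPU, the usual causes are missing drivers or a CPU-only sidecar build.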
 ### Web Server Port Conflicts
- Default port is 8080
+- Default port is 8080; the app tries ports 8080-8084 automatically
 - Change the port in settings or edit the config file
- Check for conflicts: `lsof -i :8080` (Linux) or `netstat -ano | findstr :8080` (Windows)
+- Check for conflicts: `lsof -i :8080` (Linux/macOS) or `netstat -ano | findstr :8080` (Windows)
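
The automatic 8080-8084 probing mentioned above can be approximated with a short bind test. This is a minimal sketch under the assumption that the server binds to `127.0.0.1`; the function name `first_free_port` and its parameters are hypothetical and not taken from the project.

```python
import socket
from typing import Optional


def first_free_port(start: int = 8080, count: int = 5,
                    host: str = "127.0.0.1") -> Optional[int]:
    """Return the first port in [start, start + count) that can be bound."""
    for port in range(start, start + count):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                continue  # port is in use (or not permitted); try the next
            return port
    return None  # every candidate port was taken


if __name__ == "__main__":
    port = first_free_port()
    print("No free port in range" if port is None else f"Free port: {port}")
```

A probe like this is only advisory: another process can claim the port between the check and the real server start, so the server should still handle a bind failure.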
 ## Use Cases
@@ -222,3 +314,5 @@ MIT License
 - [OpenAI Whisper](https://github.com/openai/whisper) for the speech recognition model
 - [RealtimeSTT](https://github.com/KoljaB/RealtimeSTT) for real-time transcription capabilities
 - [faster-whisper](https://github.com/guillaumekln/faster-whisper) for optimized inference
 - [Tauri](https://tauri.app/) for the cross-platform desktop framework
 - [Deepgram](https://deepgram.com/) for cloud transcription API