# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Local Transcription is a cross-platform desktop application for real-time speech-to-text transcription designed for streamers. It supports local Whisper models and cloud-based Deepgram transcription, with OBS browser source integration and optional multi-user sync.

**Architecture:** Two-process model — a Tauri v2 shell (Svelte 5 frontend) communicates with a headless Python backend (sidecar) via REST API and WebSocket.

**Key Features:**
- Cross-platform desktop app (Windows, macOS, Linux) via Tauri v2 + Svelte 5
- Headless Python backend with FastAPI control API
- Dual transcription modes: local Whisper or cloud Deepgram (managed/BYOK)
- Built-in web server for OBS browser source at `http://localhost:8080`
- Optional multi-user sync via Node.js server
- CUDA, MPS (Apple Silicon), and CPU support
- Auto-updates, custom fonts, configurable colors

> **Legacy GUI:** The original PySide6/Qt GUI (`main.py`, `gui/`) still works during the transition. New features should target the Tauri frontend and headless backend.

## Project Structure

```
local-transcription/
├── src/                             # Svelte 5 frontend (Tauri UI)
│   ├── App.svelte                   # Main app shell
│   ├── app.css                      # Global dark theme styles
│   ├── main.ts                      # Svelte mount point
│   ├── lib/components/              # UI components
│   │   ├── Header.svelte            # Title bar + settings button
│   │   ├── StatusBar.svelte         # State indicator, device, user info
│   │   ├── Controls.svelte          # Start/Stop, Clear, Save buttons
│   │   ├── TranscriptionDisplay.svelte  # Scrolling transcript view
│   │   └── Settings.svelte          # Full settings modal (all sections)
│   └── lib/stores/                  # Svelte 5 reactive stores ($state/$derived)
│       ├── backend.ts               # WebSocket + REST API client
│       ├── config.ts                # App configuration fetch/update
│       └── transcriptions.ts        # Transcript data management
├── src-tauri/                       # Tauri v2 Rust shell
│   ├── src/lib.rs                   # Plugin registration (shell, dialog, process)
│   ├── src/main.rs                  # Entry point
│   ├── tauri.conf.json              # Window, bundle, plugin config
│   └── Cargo.toml                   # Rust dependencies
├── backend/                         # Headless Python backend (the sidecar)
│   ├── app_controller.py            # Core orchestration (engine, sync, config)
│   ├── api_server.py                # FastAPI REST endpoints + /ws/control
│   └── main_headless.py             # Headless entry point (prints JSON to stdout)
├── client/                          # Core transcription modules (used by backend)
│   ├── audio_capture.py             # Audio input handling
│   ├── transcription_engine_realtime.py  # RealtimeSTT / Whisper engine
│   ├── deepgram_transcription.py    # Deepgram WebSocket cloud transcription
│   ├── noise_suppression.py         # VAD and noise reduction
│   ├── device_utils.py              # CPU/GPU/MPS detection
│   ├── config.py                    # YAML config management (~/.local-transcription/)
│   ├── server_sync.py               # Multi-user server sync client
│   ├── instance_lock.py             # Single-instance PID lock
│   └── update_checker.py            # Gitea release update checker
├── gui/                             # Legacy PySide6/Qt GUI (still functional)
│   ├── main_window_qt.py            # Main window (orchestration lives here in legacy)
│   ├── settings_dialog_qt.py        # Settings dialog
│   └── transcription_display_qt.py  # Display widget
├── server/
│   ├── web_display.py               # FastAPI OBS display server (WebSocket + HTML)
│   └── nodejs/                      # Optional multi-user sync server
├── .gitea/workflows/                # CI/CD
│   ├── release.yml                  # Tauri app builds (Linux/Windows/macOS)
│   └── build-sidecar.yml            # Python sidecar builds (CUDA + CPU)
├── config/default_config.yaml       # Default settings template
├── main.py                          # Legacy PySide6 GUI entry point
├── main_cli.py                      # CLI version for testing
├── version.py                       # Version string (__version__)
├── local-transcription.spec         # PyInstaller config (legacy, includes PySide6)
├── local-transcription-headless.spec # PyInstaller config (headless sidecar, no Qt)
├── pyproject.toml                   # Python deps (uv, CUDA PyTorch index)
├── package.json                     # Node/Tauri deps
└── vite.config.ts                   # Vite build config ($lib alias)
```

## Development Commands

### Frontend (Tauri + Svelte)
```bash
# Install npm dependencies
npm install

# Run Tauri in development mode (hot-reload)
npm run tauri dev

# Build frontend only (for testing)
npx vite build

# Type-check Svelte
npx svelte-check

# Check Rust compiles
cd src-tauri && cargo check
```

### Backend (Python)
```bash
# Install Python dependencies
uv sync

# Run the headless backend standalone (for development)
uv run python -m backend.main_headless --port 8080

# Run the legacy PySide6 GUI
uv run python main.py

# Run CLI version (headless, for testing)
uv run python main_cli.py

# List available audio devices
uv run python main_cli.py --list-devices
```

### Building
```bash
# Build Tauri app (produces platform installer)
npm run tauri build

# Build headless Python sidecar (no PySide6)
uv run pyinstaller local-transcription-headless.spec
# Output: dist/local-transcription-backend/

# Build legacy PySide6 app
uv run pyinstaller local-transcription.spec
# Or use: ./build.sh (Linux) / build.bat (Windows)
```

### Testing
```bash
uv run python test_components.py
uv run python check_cuda.py
```

## Architecture Details

### Communication: Tauri <-> Python Backend

The Svelte frontend connects to the Python backend via two channels:

**REST API** (on port 8081 by default):
- `GET /api/status` — app state, device info, version
- `POST /api/start` / `POST /api/stop` — transcription control
- `GET /api/config` / `PUT /api/config` — read/write settings (dot-notation keys)
- `GET /api/audio-devices` / `GET /api/compute-devices` — device enumeration
- `POST /api/reload-engine` — reload with new model/device
- `GET /api/transcriptions` / `POST /api/clear` — transcript management
- `POST /api/save-file` — write text to a file path
- `GET /api/check-update` / `POST /api/skip-version` — update management
- `POST /api/login` / `POST /api/register` / `GET /api/balance` — managed mode proxy
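
As a rough illustration of how a script might talk to these endpoints outside the Svelte frontend, here is a minimal sketch using only the Python standard library. The helper names `build_request` and `call` are hypothetical, and it assumes that requests and responses are JSON; the exact response fields are not specified in this document.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "http://localhost:8081"  # default REST port listed above


def build_request(method: str, path: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a request for the control API.

    Assumption: the API accepts and returns JSON bodies.
    """
    data = None
    headers = {"Accept": "application/json"}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(API_BASE + path, data=data, headers=headers, method=method)


def call(method: str, path: str, payload: Optional[dict] = None, timeout: float = 5.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_request(method, path, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the backend running, `call("GET", "/api/status")` would fetch the status payload and `call("POST", "/api/start")` would begin transcription.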

**WebSocket** `/ws/control`:
- Pushes real-time events: `state_changed`, `transcription`, `preview`, `error`, `credits_low`
- Client sends keepalive pings
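
A client consuming these events usually routes each message to a handler by event name. The sketch below shows one way to do that. It assumes each message is a JSON object whose `type` field carries the event name; that wire format is an assumption, not something this document specifies, and `EventRouter` is a hypothetical helper.

```python
import json
from typing import Callable, Dict, List

# Event names taken from the list above.
EVENT_NAMES = {"state_changed", "transcription", "preview", "error", "credits_low"}


class EventRouter:
    """Route decoded /ws/control messages to registered handlers."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[dict], None]]] = {}

    def on(self, event: str, handler: Callable[[dict], None]) -> None:
        """Register a handler for one of the known event names."""
        if event not in EVENT_NAMES:
            raise ValueError(f"unknown event: {event}")
        self._handlers.setdefault(event, []).append(handler)

    def dispatch(self, raw: str) -> bool:
        """Decode one message and call its handlers.

        Returns True if at least one handler ran.
        Assumption: the event name lives in the "type" field.
        """
        message = json.loads(raw)
        handlers = self._handlers.get(message.get("type"), [])
        for handler in handlers:
            handler(message)
        return bool(handlers)
```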

The OBS display server runs separately on port 8080 (`GET /` for HTML, `WebSocket /ws` for transcriptions).

### Backend Process Lifecycle

1. `main_headless.py` starts, acquires instance lock, creates `AppController`
2. `AppController.initialize()` starts the OBS web server (port 8080) and engine init thread
3. `APIServer` wraps the controller with FastAPI routes, runs on port 8081
4. Backend prints `{"event": "ready", "port": 8081}` to stdout so Tauri can discover the API port
5. On shutdown: engine stopped, web server stopped, lock released
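
For tooling that launches the backend itself, the ready event can be picked out of the process output. This is a minimal sketch, assuming the event is printed as a single JSON line and that other, non-JSON log lines may also appear on stdout; `find_ready_port` is a hypothetical helper, not part of the codebase.

```python
import json
from typing import Iterable, Optional


def find_ready_port(lines: Iterable[str]) -> Optional[int]:
    """Return the port from the first ready event found, or None."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip plain log output
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that only look like JSON
        if isinstance(message, dict) and message.get("event") == "ready":
            port = message.get("port")
            if isinstance(port, int):
                return port
    return None
```

When the backend is started with `subprocess.Popen(..., stdout=subprocess.PIPE, text=True)`, passing `process.stdout` to this function blocks until the ready line arrives or the stream closes.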

### Headless Backend vs Legacy GUI

The `AppController` class (`backend/app_controller.py`) extracts all orchestration logic from `gui/main_window_qt.py` into a Qt-free class. The mapping:

| Legacy (MainWindow) | Headless (AppController) |
|---------------------|--------------------------|
| `_initialize_components()` | `_initialize_engine()` |
| `_start_transcription()` | `start_transcription()` |
| `_stop_transcription()` | `stop_transcription()` |
| `_on_settings_saved()` | `apply_settings()` |
| `_reload_engine()` | `reload_engine()` |
| `_start_web_server_if_enabled()` | `_start_web_server()` |
| `_start_server_sync()` | `_start_server_sync()` |
| Qt signals | Callbacks (`on_state_changed`, `on_transcription`, etc.) |
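
The last row of the table is the main structural change: plain callables stand in for Qt signals. The sketch below illustrates the idea with a hypothetical `MiniController`. The state names and the `_emit_transcription` helper are invented for illustration, and the real `AppController` may wire its callbacks differently.

```python
from typing import Callable, Optional


class MiniController:
    """Hypothetical stand-in showing callbacks in place of Qt signals."""

    def __init__(self) -> None:
        self.state = "idle"  # illustrative state name
        # Callers assign plain callables instead of connecting to signals.
        self.on_state_changed: Optional[Callable[[str], None]] = None
        self.on_transcription: Optional[Callable[[str], None]] = None

    def _set_state(self, new_state: str) -> None:
        self.state = new_state
        if self.on_state_changed is not None:
            self.on_state_changed(new_state)

    def start_transcription(self) -> None:
        self._set_state("transcribing")

    def stop_transcription(self) -> None:
        self._set_state("idle")

    def _emit_transcription(self, text: str) -> None:
        """Called by the engine side when a final result is available."""
        if self.on_transcription is not None:
            self.on_transcription(text)
```

Because the callbacks are ordinary attributes, an API layer can attach its own broadcast functions without importing any Qt module.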

### Threading Model (Headless)

- Main thread: Uvicorn (FastAPI) event loop
- Engine init thread: Downloads models, initializes VAD
- Web server thread: Separate asyncio loop for OBS display
- Audio capture: Runs in engine callback threads
- All results flow through `AppController` callbacks -> `APIServer` WebSocket broadcast
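
Results produced on a worker thread have to be handed to the asyncio event loop before they can be broadcast. One common way to do this is `loop.call_soon_threadsafe`, sketched below with an `asyncio.Queue`. Whether the backend uses this exact mechanism is an assumption, and `collect_from_thread` is a hypothetical helper.

```python
import asyncio
import threading
from typing import List


async def collect_from_thread(texts: List[str]) -> List[str]:
    """Receive items produced on a worker thread inside the event loop."""
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()

    def engine_worker() -> None:
        # Runs off the event loop, like an engine callback thread.
        for text in texts:
            loop.call_soon_threadsafe(queue.put_nowait, text)
        # None marks the end of the stream in this sketch.
        loop.call_soon_threadsafe(queue.put_nowait, None)

    worker = threading.Thread(target=engine_worker, daemon=True)
    worker.start()

    received: List[str] = []
    while True:
        item = await queue.get()
        if item is None:
            break
        received.append(item)  # a real server would broadcast here
    worker.join()
    return received
```

Calling `queue.put_nowait` directly from the worker thread would not be safe, because `asyncio.Queue` is not thread-safe; scheduling the call on the loop avoids that.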

### Svelte Frontend

Uses Svelte 5 runes throughout (`$state`, `$derived`, `$effect`, `$props`). No Svelte 4 patterns.

**Stores** (`src/lib/stores/`):
- `backend.ts` — WebSocket connection + REST helpers (`apiGet`, `apiPost`, `apiPut`), auto-reconnect
- `config.ts` — fetches/updates config from backend API
- `transcriptions.ts` — manages transcript list, listens for `CustomEvent`s from backend store

**Key patterns:**
- Backend store dispatches `CustomEvent`s on `window` for cross-store communication
- Settings component collects all changed values into a `Record<string, any>` with dot-notation keys, sends via `PUT /api/config`
- Controls use the Tauri dialog plugin for native file save and fall back to a blob download
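
On the receiving side, a flat mapping of dot-notation keys has to be expanded into the nested config structure. The following is a minimal sketch of that expansion, not the backend's actual code; `apply_dot_updates` and the key names in the usage note are hypothetical.

```python
from typing import Any, Dict


def apply_dot_updates(config: Dict[str, Any], updates: Dict[str, Any]) -> Dict[str, Any]:
    """Apply {"a.b.c": value} style updates to a nested dict in place."""
    for dotted_key, value in updates.items():
        *parents, leaf = dotted_key.split(".")
        node = config
        for part in parents:
            child = node.get(part)
            if not isinstance(child, dict):
                # Create (or overwrite) intermediate levels as needed.
                child = {}
                node[part] = child
            node = child
        node[leaf] = value
    return config
```

For example, applying `{"display.font.size": 14}` to an empty config yields `{"display": {"font": {"size": 14}}}`.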

## CI/CD

Two Gitea Actions workflows in `.gitea/workflows/`:

- **`release.yml`**: Triggers on push to `main`. Auto-bumps version, builds Tauri app on Linux/Windows/macOS, uploads `.deb`, `.rpm`, `.msi`, `.dmg` to Gitea release.
- **`build-sidecar.yml`**: Triggers on changes to `client/`, `server/`, `backend/`, `pyproject.toml`. Builds headless Python sidecar via PyInstaller. CUDA + CPU for Linux/Windows, CPU-only for macOS.

Both require a `BUILD_TOKEN` secret (Gitea API token with release write access).

## Common Patterns

### Adding a New Setting

1. Add default to [config/default_config.yaml](config/default_config.yaml)
2. Add UI control in [src/lib/components/Settings.svelte](src/lib/components/Settings.svelte)
3. Ensure the setting is included in the save handler's config update
4. Apply in `AppController.apply_settings()` or the relevant component
5. For legacy GUI: also update [gui/settings_dialog_qt.py](gui/settings_dialog_qt.py)
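
Step 1 matters because a new key should have a value even for users whose saved config predates it. The sketch below shows one way defaults and user settings could be combined. It assumes a recursive merge in which user values win; this document does not confirm that `client/config.py` works this way, and `merge_defaults` is a hypothetical helper.

```python
import copy
from typing import Any, Dict


def merge_defaults(defaults: Dict[str, Any], user: Dict[str, Any]) -> Dict[str, Any]:
    """Return defaults overlaid with user values, without mutating either."""
    merged = copy.deepcopy(defaults)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections key by key.
            merged[key] = merge_defaults(merged[key], value)
        else:
            # Scalars, lists, and new keys replace the default outright.
            merged[key] = copy.deepcopy(value)
    return merged
```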

### Adding a New API Endpoint

1. Add route in [backend/api_server.py](backend/api_server.py) `_setup_routes()`
2. Add supporting logic in [backend/app_controller.py](backend/app_controller.py) if needed
3. Call from Svelte via `backendStore.apiGet/apiPost/apiPut`

### Modifying Transcription Display

- Tauri UI: [src/lib/components/TranscriptionDisplay.svelte](src/lib/components/TranscriptionDisplay.svelte)
- OBS display: [server/web_display.py](server/web_display.py) (HTML in `_get_html()`)
- Multi-user display: [server/nodejs/server.js](server/nodejs/server.js) (display page in `/display` route)

## Dependencies

- **Frontend:** Tauri v2, Svelte 5, Vite, TypeScript
- **Backend:** Python 3.9+, FastAPI, Uvicorn, RealtimeSTT, faster-whisper, PyTorch (CUDA), sounddevice
- **Build:** PyInstaller (sidecar), Tauri CLI (app), uv (Python packages)
- **CI:** Gitea Actions with platform-specific runners

## Platform-Specific Notes

### Linux
- Tauri needs: `libgtk-3-dev`, `libwebkit2gtk-4.1-dev`, `libappindicator3-dev`, `librsvg2-dev`, `patchelf`
- Audio: PulseAudio/ALSA via sounddevice

### Windows
- Tauri needs: WebView2 (usually pre-installed on Windows 10+)
- Audio: WASAPI via sounddevice

### macOS
- Tauri needs: Xcode Command Line Tools
- Audio: CoreAudio via sounddevice
- GPU: MPS (Apple Silicon) detected by `device_utils.py`
- `Info.plist` must include `NSMicrophoneUsageDescription` for mic access
- No CUDA builds — CPU/MPS only

## Related Documentation

- [README.md](README.md) — User-facing documentation
- [BUILD.md](BUILD.md) — Detailed build instructions
- [INSTALL.md](INSTALL.md) — Installation guide
- [server/nodejs/README.md](server/nodejs/README.md) — Node.js server setup