# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Local Transcription is a cross-platform desktop application for real-time speech-to-text transcription designed for streamers. It supports local Whisper models and cloud-based Deepgram transcription, with OBS browser source integration and optional multi-user sync.

**Architecture:** Two-process model — a Tauri v2 shell (Svelte 5 frontend) communicates with a headless Python backend (sidecar) via REST API and WebSocket.

**Key Features:**
- Cross-platform desktop app (Windows, macOS, Linux) via Tauri v2 + Svelte 5
- Headless Python backend with FastAPI control API
- Cloud-first: defaults to Deepgram (BYOK) transcription; local Whisper also supported
- Settings UI hides local-only options (model, VAD, timing) when in cloud mode
- Start button gated on API key / login — shows guidance if not configured
- Shared Captions: create rooms, share via codes, join with one click (hosted at caption.shadowdao.com)
- Built-in web server for OBS browser source at `http://localhost:8080`
- CUDA, MPS (Apple Silicon), and CPU support
- Auto-updates, custom fonts, configurable colors

> **Legacy GUI:** The original PySide6/Qt GUI (`main.py`, `gui/`) still works during the transition. New features should target the Tauri frontend and headless backend.

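
The two processes meet at startup through stdout. According to the Backend Process Lifecycle section, the backend prints a JSON line such as `{"event": "ready", "port": 8080}` once it is up. Below is a minimal Python sketch of how a script might wait for that line, for example a test harness that runs the backend outside Tauri. It is not the Tauri shell's implementation. `find_ready_port` and `launch_backend` are hypothetical helpers. The sketch also assumes that the ready event is a single JSON object on its own line, and that it runs from the repository root with the project's dependencies available (as under `uv run`).

```python
import json
import os
import subprocess
import sys
from typing import Iterable, Optional, Tuple


def find_ready_port(stdout_lines: Iterable[str]) -> Optional[int]:
    """Return the port from the first {"event": "ready", ...} line, or None.

    Lines that are not JSON objects (e.g. log output) are skipped, so stray
    prints before the ready event do not break the handshake.
    """
    for raw in stdout_lines:
        line = raw.strip()
        if not line.startswith("{"):
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(message, dict) and message.get("event") == "ready":
            port = message.get("port")
            if isinstance(port, int):
                return port
    # Stream ended (e.g. the backend exited) without a ready event.
    return None


def launch_backend(port: int = 8080) -> Tuple[subprocess.Popen, Optional[int]]:
    """Hypothetical helper: start the headless backend and wait for "ready".

    Mirrors `uv run python -m backend.main_headless --port 8080`, but reuses
    the current interpreter. stdout stays piped, so the caller should keep
    draining it while the backend runs; otherwise a full pipe can block it.
    """
    proc = subprocess.Popen(
        [sys.executable, "-m", "backend.main_headless", "--port", str(port)],
        stdout=subprocess.PIPE,
        text=True,
        # Unbuffered child stdout so the ready line arrives immediately.
        env={**os.environ, "PYTHONUNBUFFERED": "1"},
    )
    assert proc.stdout is not None  # guaranteed by stdout=PIPE
    return proc, find_ready_port(proc.stdout)
```

The parser skips non-JSON lines instead of failing on them. This keeps the handshake tolerant of any log output the backend or its libraries may write to stdout before the ready event.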
## Project Structure

```
local-transcription/
├── src/                                  # Svelte 5 frontend (Tauri UI)
│   ├── App.svelte                        # Main app shell
│   ├── app.css                           # Global dark theme styles
│   ├── main.ts                           # Svelte mount point
│   ├── lib/components/                   # UI components
│   │   ├── Header.svelte                 # Title bar + settings button
│   │   ├── StatusBar.svelte              # State indicator, device, user info
│   │   ├── Controls.svelte               # Start/Stop, Clear, Save buttons
│   │   ├── TranscriptionDisplay.svelte   # Scrolling transcript view
│   │   └── Settings.svelte               # Full settings modal (all sections)
│   └── lib/stores/                       # Svelte 5 reactive stores ($state/$derived)
│       ├── backend.ts                    # WebSocket + REST API client
│       ├── config.ts                     # App configuration fetch/update
│       └── transcriptions.ts             # Transcript data management
├── src-tauri/                            # Tauri v2 Rust shell
│   ├── src/lib.rs                        # Plugin registration (shell, dialog, process)
│   ├── src/main.rs                       # Entry point
│   ├── tauri.conf.json                   # Window, bundle, plugin config
│   └── Cargo.toml                        # Rust dependencies
├── backend/                              # Headless Python backend (the sidecar)
│   ├── app_controller.py                 # Core orchestration (engine, sync, config)
│   ├── api_server.py                     # FastAPI REST endpoints + /ws/control
│   └── main_headless.py                  # Headless entry point (prints JSON to stdout)
├── client/                               # Core transcription modules (used by backend)
│   ├── audio_capture.py                  # Audio input handling
│   ├── transcription_engine_realtime.py  # RealtimeSTT / Whisper engine
│   ├── deepgram_transcription.py         # Deepgram WebSocket cloud transcription
│   ├── noise_suppression.py              # VAD and noise reduction
│   ├── device_utils.py                   # CPU/GPU/MPS detection
│   ├── config.py                         # YAML config management (~/.local-transcription/)
│   ├── server_sync.py                    # Multi-user server sync client
│   ├── instance_lock.py                  # Single-instance PID lock
│   └── update_checker.py                 # Gitea release update checker
├── gui/                                  # Legacy PySide6/Qt GUI (still functional)
│   ├── main_window_qt.py                 # Main window (orchestration lives here in legacy)
│   ├── settings_dialog_qt.py             # Settings dialog
│   └── transcription_display_qt.py       # Display widget
├── server/
│   ├── web_display.py                    # FastAPI OBS display server (WebSocket + HTML)
│   └── nodejs/                           # Optional multi-user sync server
├── .gitea/workflows/                     # CI/CD
│   ├── release.yml                       # Coordinator: version bump, tag, release creation
│   ├── build-app-linux.yml               # Linux Tauri app build (triggered by v* tag)
│   ├── build-app-windows.yml             # Windows Tauri app build (triggered by v* tag)
│   ├── build-app-macos.yml               # macOS Tauri app build (triggered by v* tag)
│   ├── sidecar-release.yml               # Sidecar coordinator: version bump, tag, release
│   ├── build-sidecar-linux.yml           # Linux sidecar build (triggered by sidecar-v* tag)
│   ├── build-sidecar-windows.yml         # Windows sidecar build (triggered by sidecar-v* tag)
│   └── build-sidecar-macos.yml           # macOS sidecar build (triggered by sidecar-v* tag)
├── config/default_config.yaml            # Default settings template
├── main.py                               # Legacy PySide6 GUI entry point
├── main_cli.py                           # CLI version for testing
├── version.py                            # Version string (__version__)
├── local-transcription.spec              # PyInstaller config (legacy, includes PySide6)
├── local-transcription-headless.spec     # PyInstaller config (headless sidecar, no Qt)
├── pyproject.toml                        # Python deps (uv, CUDA PyTorch index)
├── package.json                          # Node/Tauri deps
└── vite.config.ts                        # Vite build config ($lib alias)
```

## Development Commands

### Frontend (Tauri + Svelte)

```bash
# Install npm dependencies
npm install

# Run Tauri in development mode (hot-reload)
npm run tauri dev

# Build frontend only (for testing)
npx vite build

# Type-check Svelte
npx svelte-check

# Check Rust compiles
cd src-tauri && cargo check
```

### Backend (Python)

```bash
# Install Python dependencies
uv sync

# Run the headless backend standalone (for development)
uv run python -m backend.main_headless --port 8080

# Run the legacy PySide6 GUI
uv run python main.py

# Run CLI version (headless, for testing)
uv run python main_cli.py

# List available audio devices
uv run python main_cli.py --list-devices
```

### Building

```bash
# Build Tauri app (produces platform installer)
npm run tauri build

# Build headless Python sidecar (no PySide6)
uv run pyinstaller local-transcription-headless.spec
# Output: dist/local-transcription-backend/

# Build legacy PySide6 app
uv run pyinstaller local-transcription.spec
# Or use: ./build.sh (Linux) / build.bat (Windows)
```

### Testing

```bash
uv run python test_components.py
uv run python check_cuda.py
```

## Architecture Details

### Communication: Tauri <-> Python Backend

The Svelte frontend connects to the Python backend via two channels:

**REST API** (on port 8081 by default):
- `GET /api/status` — app state, device info, version
- `POST /api/start` / `POST /api/stop` — transcription control
- `GET /api/config` / `PUT /api/config` — read/write settings (dot-notation keys)
- `GET /api/audio-devices` / `GET /api/compute-devices` — device enumeration
- `POST /api/reload-engine` — reload with new model/device
- `GET /api/transcriptions` / `POST /api/clear` — transcript management
- `POST /api/save-file` — write text to a file path
- `GET /api/check-update` / `POST /api/skip-version` — update management
- `POST /api/login` / `POST /api/register` / `GET /api/balance` — managed mode proxy

**WebSocket** `/ws/control`:
- Pushes real-time events: `state_changed`, `transcription`, `preview`, `error`, `credits_low`
- Client sends keepalive pings

The OBS display server runs separately on port 8080 (`GET /` for HTML, `WebSocket /ws` for transcriptions).

### Backend Process Lifecycle

1. `main_headless.py` starts, acquires instance lock, creates `AppController`
2. `AppController.initialize()` starts the OBS web server (port 8080) and engine init thread
3. `APIServer` wraps the controller with FastAPI routes, runs on port 8081
4. Backend prints `{"event": "ready", "port": 8080}` to stdout for Tauri to discover
5. On shutdown: engine stopped, web server stopped, lock released

### Headless Backend vs Legacy GUI

The `AppController` class (`backend/app_controller.py`) extracts all orchestration logic from `gui/main_window_qt.py` into a Qt-free class. The mapping:

| Legacy (MainWindow) | Headless (AppController) |
|---------------------|--------------------------|
| `_initialize_components()` | `_initialize_engine()` |
| `_start_transcription()` | `start_transcription()` |
| `_stop_transcription()` | `stop_transcription()` |
| `_on_settings_saved()` | `apply_settings()` |
| `_reload_engine()` | `reload_engine()` |
| `_start_web_server_if_enabled()` | `_start_web_server()` |
| `_start_server_sync()` | `_start_server_sync()` |
| Qt signals | Callbacks (`on_state_changed`, `on_transcription`, etc.) |

### Threading Model (Headless)

- Main thread: Uvicorn (FastAPI) event loop
- Engine init thread: Downloads models, initializes VAD
- Web server thread: Separate asyncio loop for OBS display
- Audio capture: Runs in engine callback threads
- All results flow through `AppController` callbacks -> `APIServer` WebSocket broadcast

### Svelte Frontend

Uses Svelte 5 runes throughout (`$state`, `$derived`, `$effect`, `$props`). No Svelte 4 patterns.

**Stores** (`src/lib/stores/`):
- `backend.ts` — WebSocket connection + REST helpers (`apiGet`, `apiPost`, `apiPut`), auto-reconnect
- `config.ts` — fetches/updates config from backend API
- `transcriptions.ts` — manages transcript list, listens for `CustomEvent`s from backend store

**Key patterns:**
- Backend store dispatches `CustomEvent`s on `window` for cross-store communication
- Settings component collects all changed values into a `Record` with dot-notation keys, sends via `PUT /api/config`
- Controls use the Tauri dialog plugin for native file saves and fall back to a blob download

## CI/CD

Eight Gitea Actions workflows in `.gitea/workflows/`, split into coordinators and per-OS builders:

**App release (Tauri):**
- **`release.yml`**: Coordinator. Triggers on push to `main`. Auto-bumps version in package.json/tauri.conf.json/Cargo.toml/version.py, commits, tags `v{VERSION}`, creates Gitea release.
- **`build-app-linux.yml`**: Triggers on `v*` tag push or `workflow_dispatch`. Builds Tauri app, uploads `.deb`/`.rpm`/`.AppImage`.
- **`build-app-windows.yml`**: Triggers on `v*` tag push or `workflow_dispatch`. Builds Tauri app, uploads `.msi`/`*-setup.exe`.
- **`build-app-macos.yml`**: Triggers on `v*` tag push or `workflow_dispatch`. Builds Tauri app, uploads `.dmg`.

**Sidecar release (Python backend):**
- **`sidecar-release.yml`**: Coordinator. Triggers on push to `main` with changes in `client/`, `server/`, `backend/`, `pyproject.toml`, or `local-transcription-headless.spec`. Bumps version in pyproject.toml/version.py, tags `sidecar-v{VERSION}`, creates Gitea release.
- **`build-sidecar-linux.yml`**: Triggers on `sidecar-v*` tag push or `workflow_dispatch`. Builds CUDA + CPU sidecars via PyInstaller.
- **`build-sidecar-windows.yml`**: Triggers on `sidecar-v*` tag push or `workflow_dispatch`. Builds CUDA + CPU sidecars via PyInstaller.
- **`build-sidecar-macos.yml`**: Triggers on `sidecar-v*` tag push or `workflow_dispatch`. Builds CPU-only sidecar via PyInstaller.

All per-OS build workflows can be re-run independently via `workflow_dispatch` with an optional `tag` input. All require a `BUILD_TOKEN` secret (Gitea API token with release write access).

## Common Patterns

### Adding a New Setting

1. Add default to [config/default_config.yaml](config/default_config.yaml)
2. Add UI control in [src/lib/components/Settings.svelte](src/lib/components/Settings.svelte)
3. Ensure the setting is included in the save handler's config update
4. Apply in `AppController.apply_settings()` or the relevant component
5. For legacy GUI: also update [gui/settings_dialog_qt.py](gui/settings_dialog_qt.py)

### Adding a New API Endpoint

1. Add route in [backend/api_server.py](backend/api_server.py) `_setup_routes()`
2. Add supporting logic in [backend/app_controller.py](backend/app_controller.py) if needed
3. Call from Svelte via `backendStore.apiGet/apiPost/apiPut`

### Modifying Transcription Display

- Tauri UI: [src/lib/components/TranscriptionDisplay.svelte](src/lib/components/TranscriptionDisplay.svelte)
- OBS display: [server/web_display.py](server/web_display.py) (HTML in `_get_html()`)
- Multi-user display: [server/nodejs/server.js](server/nodejs/server.js) (display page in `/display` route)

## Dependencies

**Frontend:** Tauri v2, Svelte 5, Vite, TypeScript

**Backend:** Python 3.9+, FastAPI, Uvicorn, RealtimeSTT, faster-whisper, PyTorch (CUDA), sounddevice

**Build:** PyInstaller (sidecar), Tauri CLI (app), uv (Python packages)

**CI:** Gitea Actions with platform-specific runners

## Platform-Specific Notes

### Linux

- Tauri needs: `libgtk-3-dev`, `libwebkit2gtk-4.1-dev`, `libappindicator3-dev`, `librsvg2-dev`, `patchelf`
- Audio: PulseAudio/ALSA via sounddevice

### Windows

- Tauri needs: WebView2 (usually pre-installed on Windows 10+)
- Audio: WASAPI via sounddevice

### macOS

- Tauri needs: Xcode Command Line Tools
- Audio: CoreAudio via sounddevice
- GPU: MPS (Apple Silicon) detected by `device_utils.py`
- `Info.plist` must include `NSMicrophoneUsageDescription` for mic access
- No CUDA builds — CPU/MPS only

## Code Signing

Code signing is configured for Windows and macOS to eliminate install warnings (SmartScreen / Gatekeeper). See [SIGNING.md](SIGNING.md) for full setup details.

**Status (as of 2026-04-10):** CI workflow changes are committed. Waiting on identity verification for both platforms before secrets can be configured.

**How it works:**
- macOS: Tauri auto-signs when `APPLE_CERTIFICATE` and related env vars are set in CI. Notarization uses an App Store Connect API key.
- Windows: Azure Artifact Signing via `signtool.exe` + dlib. The CI workflow injects `signCommand` into `tauri.conf.json` at build time when `AZURE_CLIENT_ID` is set.
- Both are no-ops when secrets aren't configured — unsigned builds work as before.

**Key files:**
- `src-tauri/Entitlements.plist` — macOS hardened runtime entitlements (mic, network)
- `src-tauri/Info.plist` — macOS microphone usage description
- `.gitea/workflows/build-app-macos.yml` — Apple signing + notarization
- `.gitea/workflows/build-app-windows.yml` — Azure Artifact Signing

**Secrets required (12 total):** See [SIGNING.md](SIGNING.md) for the full list — 6 Apple secrets, 6 Azure secrets.

## Related Documentation

- [README.md](README.md) — User-facing documentation
- [BUILD.md](BUILD.md) — Detailed build instructions
- [INSTALL.md](INSTALL.md) — Installation guide
- [SIGNING.md](SIGNING.md) — Code signing setup guide
- [server/nodejs/README.md](server/nodejs/README.md) — Node.js server setup