local-transcription/CLAUDE.md
Developer aa4033b412
Split CI workflows into per-OS files for independent re-runs
Refactored from 2 monolithic workflows into 8 targeted ones:

Coordinators (version bump + tag + release creation):
- release.yml: bumps app version, tags v*, creates Gitea release
- sidecar-release.yml: bumps sidecar version, tags sidecar-v*

Per-OS app builds (triggered by v* tags or workflow_dispatch):
- build-app-linux.yml: .deb, .rpm, .AppImage
- build-app-windows.yml: .msi, -setup.exe
- build-app-macos.yml: .dmg

Per-OS sidecar builds (triggered by sidecar-v* tags or workflow_dispatch):
- build-sidecar-linux.yml: CUDA + CPU variants
- build-sidecar-windows.yml: CUDA + CPU variants
- build-sidecar-macos.yml: CPU only

Each build workflow can be re-triggered independently without
re-running the version bump or rebuilding other platforms.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 17:35:25 -07:00


CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Local Transcription is a cross-platform desktop application for real-time speech-to-text transcription designed for streamers. It supports local Whisper models and cloud-based Deepgram transcription, with OBS browser source integration and optional multi-user sync.

Architecture: Two-process model — a Tauri v2 shell (Svelte 5 frontend) communicates with a headless Python backend (sidecar) via REST API and WebSocket.

Key Features:

  • Cross-platform desktop app (Windows, macOS, Linux) via Tauri v2 + Svelte 5
  • Headless Python backend with FastAPI control API
  • Dual transcription modes: local Whisper or cloud Deepgram (managed/BYOK)
  • Built-in web server for OBS browser source at http://localhost:8080
  • Optional multi-user sync via Node.js server
  • CUDA, MPS (Apple Silicon), and CPU support
  • Auto-updates, custom fonts, configurable colors

Legacy GUI: The original PySide6/Qt GUI (main.py, gui/) still works during the transition. New features should target the Tauri frontend and headless backend.

Project Structure

local-transcription/
├── src/                             # Svelte 5 frontend (Tauri UI)
│   ├── App.svelte                   # Main app shell
│   ├── app.css                      # Global dark theme styles
│   ├── main.ts                      # Svelte mount point
│   ├── lib/components/              # UI components
│   │   ├── Header.svelte            # Title bar + settings button
│   │   ├── StatusBar.svelte         # State indicator, device, user info
│   │   ├── Controls.svelte          # Start/Stop, Clear, Save buttons
│   │   ├── TranscriptionDisplay.svelte  # Scrolling transcript view
│   │   └── Settings.svelte          # Full settings modal (all sections)
│   └── lib/stores/                  # Svelte 5 reactive stores ($state/$derived)
│       ├── backend.ts               # WebSocket + REST API client
│       ├── config.ts                # App configuration fetch/update
│       └── transcriptions.ts        # Transcript data management
├── src-tauri/                       # Tauri v2 Rust shell
│   ├── src/lib.rs                   # Plugin registration (shell, dialog, process)
│   ├── src/main.rs                  # Entry point
│   ├── tauri.conf.json              # Window, bundle, plugin config
│   └── Cargo.toml                   # Rust dependencies
├── backend/                         # Headless Python backend (the sidecar)
│   ├── app_controller.py            # Core orchestration (engine, sync, config)
│   ├── api_server.py                # FastAPI REST endpoints + /ws/control
│   └── main_headless.py             # Headless entry point (prints JSON to stdout)
├── client/                          # Core transcription modules (used by backend)
│   ├── audio_capture.py             # Audio input handling
│   ├── transcription_engine_realtime.py  # RealtimeSTT / Whisper engine
│   ├── deepgram_transcription.py    # Deepgram WebSocket cloud transcription
│   ├── noise_suppression.py         # VAD and noise reduction
│   ├── device_utils.py              # CPU/GPU/MPS detection
│   ├── config.py                    # YAML config management (~/.local-transcription/)
│   ├── server_sync.py               # Multi-user server sync client
│   ├── instance_lock.py             # Single-instance PID lock
│   └── update_checker.py            # Gitea release update checker
├── gui/                             # Legacy PySide6/Qt GUI (still functional)
│   ├── main_window_qt.py            # Main window (orchestration lives here in legacy)
│   ├── settings_dialog_qt.py        # Settings dialog
│   └── transcription_display_qt.py  # Display widget
├── server/
│   ├── web_display.py               # FastAPI OBS display server (WebSocket + HTML)
│   └── nodejs/                      # Optional multi-user sync server
├── .gitea/workflows/                # CI/CD
│   ├── release.yml                  # Coordinator: version bump, tag, release creation
│   ├── build-app-linux.yml          # Linux Tauri app build (triggered by v* tag)
│   ├── build-app-windows.yml        # Windows Tauri app build (triggered by v* tag)
│   ├── build-app-macos.yml          # macOS Tauri app build (triggered by v* tag)
│   ├── sidecar-release.yml          # Sidecar coordinator: version bump, tag, release
│   ├── build-sidecar-linux.yml      # Linux sidecar build (triggered by sidecar-v* tag)
│   ├── build-sidecar-windows.yml    # Windows sidecar build (triggered by sidecar-v* tag)
│   └── build-sidecar-macos.yml      # macOS sidecar build (triggered by sidecar-v* tag)
├── config/default_config.yaml       # Default settings template
├── main.py                          # Legacy PySide6 GUI entry point
├── main_cli.py                      # CLI version for testing
├── version.py                       # Version string (__version__)
├── local-transcription.spec         # PyInstaller config (legacy, includes PySide6)
├── local-transcription-headless.spec # PyInstaller config (headless sidecar, no Qt)
├── pyproject.toml                   # Python deps (uv, CUDA PyTorch index)
├── package.json                     # Node/Tauri deps
└── vite.config.ts                   # Vite build config ($lib alias)

Development Commands

Frontend (Tauri + Svelte)

# Install npm dependencies
npm install

# Run Tauri in development mode (hot-reload)
npm run tauri dev

# Build frontend only (for testing)
npx vite build

# Type-check Svelte
npx svelte-check

# Check Rust compiles
cd src-tauri && cargo check

Backend (Python)

# Install Python dependencies
uv sync

# Run the headless backend standalone (for development)
uv run python -m backend.main_headless --port 8081

# Run the legacy PySide6 GUI
uv run python main.py

# Run CLI version (headless, for testing)
uv run python main_cli.py

# List available audio devices
uv run python main_cli.py --list-devices

Building

# Build Tauri app (produces platform installer)
npm run tauri build

# Build headless Python sidecar (no PySide6)
uv run pyinstaller local-transcription-headless.spec
# Output: dist/local-transcription-backend/

# Build legacy PySide6 app
uv run pyinstaller local-transcription.spec
# Or use: ./build.sh (Linux) / build.bat (Windows)

Testing

uv run python test_components.py
uv run python check_cuda.py

Architecture Details

Communication: Tauri <-> Python Backend

The Svelte frontend connects to the Python backend via two channels:

REST API (on port 8081 by default):

  • GET /api/status — app state, device info, version
  • POST /api/start / POST /api/stop — transcription control
  • GET /api/config / PUT /api/config — read/write settings (dot-notation keys)
  • GET /api/audio-devices / GET /api/compute-devices — device enumeration
  • POST /api/reload-engine — reload with new model/device
  • GET /api/transcriptions / POST /api/clear — transcript management
  • POST /api/save-file — write text to a file path
  • GET /api/check-update / POST /api/skip-version — update management
  • POST /api/login / POST /api/register / GET /api/balance — managed mode proxy
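For quick manual testing of these endpoints without the Svelte client, here is a minimal standard-library sketch. It assumes the API listens on 127.0.0.1:8081 and exchanges JSON bodies; the exact payload shapes are assumptions, not the documented schema. `build_request` and `call` are hypothetical helpers. `build_request` only constructs the request, so it can be inspected offline.

```python
import json
import urllib.request
from typing import Any, Optional

API_BASE = "http://127.0.0.1:8081"  # assumed default REST port


def build_request(method: str, path: str, body: Optional[dict[str, Any]] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON request for the control API."""
    data = None
    headers = {"Accept": "application/json"}
    if body is not None:
        # Assumption: endpoints such as PUT /api/config accept a JSON object body.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(API_BASE + path, data=data, headers=headers, method=method)


def call(method: str, path: str, body: Optional[dict[str, Any]] = None, timeout: float = 5.0) -> Any:
    """Send the request and decode the JSON response (requires a running backend)."""
    with urllib.request.urlopen(build_request(method, path, body), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the backend running, `call("GET", "/api/status")` or `call("POST", "/api/start")` exercise the control endpoints. For `PUT /api/config`, pass a dict keyed by a real dot-notation key from config/default_config.yaml.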

WebSocket /ws/control:

  • Pushes real-time events: state_changed, transcription, preview, error, credits_low
  • Client sends keepalive pings
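The event names above are fixed, but the message envelope is not described here. This sketch assumes each frame is a JSON object with a `type` field naming the event and a `data` payload. `dispatch` and the handler table are hypothetical, and the sketch shows only how a consumer could route pushed events. It does not cover the keepalive pings.

```python
import json
from typing import Any, Callable

Handler = Callable[[Any], None]

# Event names documented for /ws/control.
KNOWN_EVENTS = {"state_changed", "transcription", "preview", "error", "credits_low"}


def dispatch(raw: str, handlers: dict[str, Handler]) -> bool:
    """Decode one WebSocket text frame and route it to a handler.

    Assumes frames look like {"type": "<event>", "data": {...}}.
    Returns True if a handler ran, False for malformed, unknown, or unhandled events.
    """
    try:
        message = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(message, dict):
        return False
    event = message.get("type")
    if event not in KNOWN_EVENTS or event not in handlers:
        return False
    handlers[event](message.get("data"))
    return True
```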

The OBS display server runs separately on port 8080 (GET / for HTML, WebSocket /ws for transcriptions).

Backend Process Lifecycle

  1. main_headless.py starts, acquires the instance lock, and creates the AppController
  2. AppController.initialize() starts the OBS web server (port 8080) and the engine init thread
  3. APIServer wraps the controller with FastAPI routes, runs on port 8081
  4. Backend prints {"event": "ready", "port": 8081} to stdout so Tauri can discover the API port
  5. On shutdown: engine stopped, web server stopped, lock released
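Because the backend announces readiness on stdout, the launcher only has to scan output lines until the ready event appears. Below is a minimal Python sketch of that discovery step. It assumes one JSON object per line and treats any other output as logging. `parse_ready` and `wait_for_ready` are hypothetical; the real launcher is the Tauri shell.

```python
import json
from typing import Iterable, Optional


def parse_ready(line: str) -> Optional[int]:
    """Return the announced port if `line` is the ready event, else None."""
    line = line.strip()
    if not line.startswith("{"):
        return None  # ordinary log output, not a JSON event
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        return None
    if isinstance(event, dict) and event.get("event") == "ready":
        port = event.get("port")
        if isinstance(port, int):
            return port
    return None


def wait_for_ready(lines: Iterable[str]) -> Optional[int]:
    """Consume stdout lines until the ready event appears."""
    for line in lines:
        port = parse_ready(line)
        if port is not None:
            return port
    return None  # stream closed before the backend became ready
```

To use it, pass the `stdout` of a `subprocess.Popen([...], stdout=subprocess.PIPE, text=True)` running `python -m backend.main_headless`. A text-mode pipe iterates line by line.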

Headless Backend vs Legacy GUI

The AppController class (backend/app_controller.py) extracts all orchestration logic from gui/main_window_qt.py into a Qt-free class. The mapping:

Legacy (MainWindow)               Headless (AppController)
_initialize_components()          _initialize_engine()
_start_transcription()            start_transcription()
_stop_transcription()             stop_transcription()
_on_settings_saved()              apply_settings()
_reload_engine()                  reload_engine()
_start_web_server_if_enabled()    _start_web_server()
_start_server_sync()              _start_server_sync()
Qt signals                        Callbacks (on_state_changed, on_transcription, etc.)
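In place of Qt signals, the controller exposes plain callback attributes. Here is a minimal sketch of that pattern, assuming callbacks are optional attributes that default to None. `MiniController`, `_set_state`, `_handle_result`, and the state names are hypothetical stand-ins, not the real AppController API.

```python
from typing import Callable, Optional


class MiniController:
    """Qt-free controller that reports events through optional callbacks."""

    def __init__(self) -> None:
        self.state = "idle"
        # Consumers (e.g. the API server) assign these after construction.
        self.on_state_changed: Optional[Callable[[str], None]] = None
        self.on_transcription: Optional[Callable[[str], None]] = None

    def _set_state(self, new_state: str) -> None:
        self.state = new_state
        if self.on_state_changed is not None:
            self.on_state_changed(new_state)  # plays the role of signal.emit(new_state)

    def start_transcription(self) -> None:
        self._set_state("running")

    def stop_transcription(self) -> None:
        self._set_state("idle")

    def _handle_result(self, text: str) -> None:
        # Called by the engine; forwards text to whoever registered interest.
        if self.on_transcription is not None:
            self.on_transcription(text)
```

Unlike a Qt queued connection, a plain callback runs on whichever thread invokes it. A consumer that owns an event loop must therefore hand results over to that loop itself.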

Threading Model (Headless)

  • Main thread: Uvicorn (FastAPI) event loop
  • Engine init thread: Downloads models, initializes VAD
  • Web server thread: Separate asyncio loop for OBS display
  • Audio capture: Runs in engine callback threads
  • All results flow through AppController callbacks -> APIServer WebSocket broadcast
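Audio results arrive on engine threads, but WebSocket clients live on the Uvicorn event loop. A callback therefore has to schedule work on that loop rather than touch asyncio objects directly. This minimal sketch uses `loop.call_soon_threadsafe` with one `asyncio.Queue` per client. The names are hypothetical, and this is not necessarily how APIServer implements its broadcast.

```python
import asyncio


class Broadcaster:
    """Fan out messages from worker threads to per-client asyncio queues."""

    def __init__(self, loop: asyncio.AbstractEventLoop) -> None:
        self.loop = loop
        self.clients: set[asyncio.Queue] = set()

    def subscribe(self) -> asyncio.Queue:
        # Must be called on the event loop thread (e.g. when a client connects).
        queue: asyncio.Queue = asyncio.Queue()
        self.clients.add(queue)
        return queue

    def _deliver(self, message: dict) -> None:
        # Runs on the loop thread, so touching the queues is safe here.
        for queue in self.clients:
            queue.put_nowait(message)

    def publish_from_thread(self, message: dict) -> None:
        # Safe to call from any thread, such as an engine callback.
        self.loop.call_soon_threadsafe(self._deliver, message)


async def demo() -> dict:
    broadcaster = Broadcaster(asyncio.get_running_loop())
    queue = broadcaster.subscribe()
    message = {"type": "transcription", "data": {"text": "hello"}}
    # Publish from a worker thread, as an engine callback would.
    await asyncio.to_thread(broadcaster.publish_from_thread, message)
    return await queue.get()
```

In a real server, each WebSocket handler would loop on `await queue.get()` and send every message to its client.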

Svelte Frontend

Uses Svelte 5 runes throughout ($state, $derived, $effect, $props). No Svelte 4 patterns.

Stores (src/lib/stores/):

  • backend.ts — WebSocket connection + REST helpers (apiGet, apiPost, apiPut), auto-reconnect
  • config.ts — fetches/updates config from backend API
  • transcriptions.ts — manages the transcript list and listens for CustomEvents dispatched by the backend store

Key patterns:

  • Backend store dispatches CustomEvents on window for cross-store communication
  • Settings component collects all changed values into a Record<string, any> with dot-notation keys and sends them via PUT /api/config
  • Controls use the Tauri dialog plugin for native file saves and fall back to a blob download
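Settings travel as a flat map of dot-notation keys, for example `display.font_size` (a hypothetical key). This minimal sketch shows one way the backend could apply such an update to the nested YAML-backed config. `apply_dotted` is hypothetical, and the sketch assumes missing intermediate sections should be created.

```python
from typing import Any


def apply_dotted(config: dict[str, Any], updates: dict[str, Any]) -> dict[str, Any]:
    """Apply {"section.sub.key": value} updates to a nested dict in place."""
    for dotted_key, value in updates.items():
        *parents, leaf = dotted_key.split(".")
        node = config
        for part in parents:
            child = node.get(part)
            if not isinstance(child, dict):
                # Assumption: missing or non-dict sections become a new dict.
                child = {}
                node[part] = child
            node = child
        node[leaf] = value
    return config
```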

CI/CD

Eight Gitea Actions workflows in .gitea/workflows/, split into coordinators and per-OS builders:

App release (Tauri):

  • release.yml: Coordinator. Triggers on push to main. Auto-bumps version in package.json/tauri.conf.json/Cargo.toml/version.py, commits, tags v{VERSION}, creates Gitea release.
  • build-app-linux.yml: Triggers on v* tag push or workflow_dispatch. Builds Tauri app, uploads .deb/.rpm/.AppImage.
  • build-app-windows.yml: Triggers on v* tag push or workflow_dispatch. Builds Tauri app, uploads .msi/*-setup.exe.
  • build-app-macos.yml: Triggers on v* tag push or workflow_dispatch. Builds Tauri app, uploads .dmg.

Sidecar release (Python backend):

  • sidecar-release.yml: Coordinator. Triggers on push to main with changes in client/, server/, backend/, pyproject.toml, or local-transcription-headless.spec. Bumps version in pyproject.toml/version.py, tags sidecar-v{VERSION}, creates Gitea release.
  • build-sidecar-linux.yml: Triggers on sidecar-v* tag push or workflow_dispatch. Builds CUDA + CPU sidecars via PyInstaller.
  • build-sidecar-windows.yml: Triggers on sidecar-v* tag push or workflow_dispatch. Builds CUDA + CPU sidecars via PyInstaller.
  • build-sidecar-macos.yml: Triggers on sidecar-v* tag push or workflow_dispatch. Builds CPU-only sidecar via PyInstaller.
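The sidecar coordinator runs only when backend-related files change. The authoritative filter is the workflow's own path trigger. As a rough illustration, this Python sketch applies the same path list to a set of changed files, treating directory entries as prefixes. `triggers_sidecar_release` is hypothetical.

```python
# Paths listed for sidecar-release.yml in the description above.
SIDECAR_PATHS = (
    "client/",
    "server/",
    "backend/",
    "pyproject.toml",
    "local-transcription-headless.spec",
)


def triggers_sidecar_release(changed_files: list[str]) -> bool:
    """Return True if any changed path falls under the sidecar trigger list."""
    for path in changed_files:
        for pattern in SIDECAR_PATHS:
            if pattern.endswith("/"):
                if path.startswith(pattern):  # directory entry: prefix match
                    return True
            elif path == pattern:  # single file: exact match
                return True
    return False
```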

All per-OS build workflows can be re-run independently via workflow_dispatch with an optional tag input. All require a BUILD_TOKEN secret (Gitea API token with release write access).
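Both tag families contain a `v`, so it is worth confirming they cannot cross-trigger. Tag filters are matched against the whole tag name, so `v*` does not match `sidecar-v1.2.3`. This sketch approximates that with `fnmatch.fnmatchcase`, which is close to, but not identical to, the workflow glob syntax. The version numbers in the check are made up.

```python
from fnmatch import fnmatchcase

# Tag filters taken from the workflow descriptions above.
WORKFLOW_TAG_FILTERS = {
    "build-app-linux.yml": "v*",
    "build-app-windows.yml": "v*",
    "build-app-macos.yml": "v*",
    "build-sidecar-linux.yml": "sidecar-v*",
    "build-sidecar-windows.yml": "sidecar-v*",
    "build-sidecar-macos.yml": "sidecar-v*",
}


def workflows_for_tag(tag: str) -> list[str]:
    """Return the build workflows whose tag filter matches the pushed tag."""
    return sorted(name for name, pattern in WORKFLOW_TAG_FILTERS.items() if fnmatchcase(tag, pattern))
```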

Common Patterns

Adding a New Setting

  1. Add default to config/default_config.yaml
  2. Add UI control in src/lib/components/Settings.svelte
  3. Ensure the setting is included in the save handler's config update
  4. Apply in AppController.apply_settings() or the relevant component
  5. For legacy GUI: also update gui/settings_dialog_qt.py
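Step 1 matters because existing users already have a saved config without the new key. One common approach, which is an assumption about what client/config.py does, is to deep-merge the user's saved values over the defaults at load time so new keys pick up their default. Here is a minimal sketch; `merge_defaults` is hypothetical.

```python
import copy
from typing import Any


def merge_defaults(defaults: dict[str, Any], user: dict[str, Any]) -> dict[str, Any]:
    """Return defaults overlaid with user values, recursing into nested sections."""
    merged = copy.deepcopy(defaults)  # never mutate the defaults template
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_defaults(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged
```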

Adding a New API Endpoint

  1. Add route in backend/api_server.py _setup_routes()
  2. Add supporting logic in backend/app_controller.py if needed
  3. Call from Svelte via backendStore.apiGet/apiPost/apiPut

Modifying Transcription Display

Dependencies

Frontend: Tauri v2, Svelte 5, Vite, TypeScript
Backend: Python 3.9+, FastAPI, Uvicorn, RealtimeSTT, faster-whisper, PyTorch (CUDA), sounddevice
Build: PyInstaller (sidecar), Tauri CLI (app), uv (Python packages)
CI: Gitea Actions with platform-specific runners

Platform-Specific Notes

Linux

  • Tauri needs: libgtk-3-dev, libwebkit2gtk-4.1-dev, libappindicator3-dev, librsvg2-dev, patchelf
  • Audio: PulseAudio/ALSA via sounddevice

Windows

  • Tauri needs: WebView2 (usually pre-installed on Windows 10+)
  • Audio: WASAPI via sounddevice

macOS

  • Tauri needs: Xcode Command Line Tools
  • Audio: CoreAudio via sounddevice
  • GPU: MPS (Apple Silicon) detected by device_utils.py
  • Info.plist must include NSMicrophoneUsageDescription for mic access
  • No CUDA builds — CPU/MPS only
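device_utils.py chooses between CUDA, MPS, and CPU, but the exact order is not spelled out here. This sketch assumes the usual preference of CUDA, then MPS, then CPU, and honors an explicit request when that device is usable. It takes availability flags as arguments so it runs without PyTorch. In the real module, those flags would presumably come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`. `select_device` is hypothetical.

```python
from typing import Optional


def select_device(cuda_available: bool, mps_available: bool, requested: Optional[str] = None) -> str:
    """Pick a compute device name, honoring an explicit request when it is usable."""
    available = {"cpu": True, "cuda": cuda_available, "mps": mps_available}
    if requested in available and available[requested]:
        return requested
    # Fall back to the assumed preference order; CPU is always usable.
    for name in ("cuda", "mps"):
        if available[name]:
            return name
    return "cpu"
```

On macOS builds `cuda_available` is always False, so the choice reduces to MPS or CPU.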