Developer 47ca74e75d Update README and CLAUDE.md for Tauri rewrite
Update both docs to reflect the new architecture:
- Tauri v2 + Svelte 5 frontend replacing PySide6/Qt
- Headless Python backend with FastAPI control API
- Cross-platform support (Windows, macOS, Linux)
- Deepgram remote transcription (managed/BYOK)
- Gitea CI/CD workflows for automated builds
- New project structure with backend/, src/, src-tauri/
- Updated development commands and build instructions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 13:34:10 -07:00

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Local Transcription is a cross-platform desktop application for real-time speech-to-text transcription designed for streamers. It supports local Whisper models and cloud-based Deepgram transcription, with OBS browser source integration and optional multi-user sync.

Architecture: Two-process model — a Tauri v2 shell (Svelte 5 frontend) communicates with a headless Python backend (sidecar) via REST API and WebSocket.

Key Features:

  • Cross-platform desktop app (Windows, macOS, Linux) via Tauri v2 + Svelte 5
  • Headless Python backend with FastAPI control API
  • Dual transcription modes: local Whisper or cloud Deepgram (managed/BYOK)
  • Built-in web server for OBS browser source at http://localhost:8080
  • Optional multi-user sync via Node.js server
  • CUDA, MPS (Apple Silicon), and CPU support
  • Auto-updates, custom fonts, configurable colors

Legacy GUI: The original PySide6/Qt GUI (main.py, gui/) still works during the transition. New features should target the Tauri frontend and headless backend.

Project Structure

local-transcription/
├── src/                             # Svelte 5 frontend (Tauri UI)
│   ├── App.svelte                   # Main app shell
│   ├── app.css                      # Global dark theme styles
│   ├── main.ts                      # Svelte mount point
│   ├── lib/components/              # UI components
│   │   ├── Header.svelte            # Title bar + settings button
│   │   ├── StatusBar.svelte         # State indicator, device, user info
│   │   ├── Controls.svelte          # Start/Stop, Clear, Save buttons
│   │   ├── TranscriptionDisplay.svelte  # Scrolling transcript view
│   │   └── Settings.svelte          # Full settings modal (all sections)
│   └── lib/stores/                  # Svelte 5 reactive stores ($state/$derived)
│       ├── backend.ts               # WebSocket + REST API client
│       ├── config.ts                # App configuration fetch/update
│       └── transcriptions.ts        # Transcript data management
├── src-tauri/                       # Tauri v2 Rust shell
│   ├── src/lib.rs                   # Plugin registration (shell, dialog, process)
│   ├── src/main.rs                  # Entry point
│   ├── tauri.conf.json              # Window, bundle, plugin config
│   └── Cargo.toml                   # Rust dependencies
├── backend/                         # Headless Python backend (the sidecar)
│   ├── app_controller.py            # Core orchestration (engine, sync, config)
│   ├── api_server.py                # FastAPI REST endpoints + /ws/control
│   └── main_headless.py             # Headless entry point (prints JSON to stdout)
├── client/                          # Core transcription modules (used by backend)
│   ├── audio_capture.py             # Audio input handling
│   ├── transcription_engine_realtime.py  # RealtimeSTT / Whisper engine
│   ├── deepgram_transcription.py    # Deepgram WebSocket cloud transcription
│   ├── noise_suppression.py         # VAD and noise reduction
│   ├── device_utils.py              # CPU/GPU/MPS detection
│   ├── config.py                    # YAML config management (~/.local-transcription/)
│   ├── server_sync.py               # Multi-user server sync client
│   ├── instance_lock.py             # Single-instance PID lock
│   └── update_checker.py            # Gitea release update checker
├── gui/                             # Legacy PySide6/Qt GUI (still functional)
│   ├── main_window_qt.py            # Main window (orchestration lives here in legacy)
│   ├── settings_dialog_qt.py        # Settings dialog
│   └── transcription_display_qt.py  # Display widget
├── server/
│   ├── web_display.py               # FastAPI OBS display server (WebSocket + HTML)
│   └── nodejs/                      # Optional multi-user sync server
├── .gitea/workflows/                # CI/CD
│   ├── release.yml                  # Tauri app builds (Linux/Windows/macOS)
│   └── build-sidecar.yml            # Python sidecar builds (CUDA + CPU)
├── config/default_config.yaml       # Default settings template
├── main.py                          # Legacy PySide6 GUI entry point
├── main_cli.py                      # CLI version for testing
├── version.py                       # Version string (__version__)
├── local-transcription.spec         # PyInstaller config (legacy, includes PySide6)
├── local-transcription-headless.spec # PyInstaller config (headless sidecar, no Qt)
├── pyproject.toml                   # Python deps (uv, CUDA PyTorch index)
├── package.json                     # Node/Tauri deps
└── vite.config.ts                   # Vite build config ($lib alias)

Development Commands

Frontend (Tauri + Svelte)

# Install npm dependencies
npm install

# Run Tauri in development mode (hot-reload)
npm run tauri dev

# Build frontend only (for testing)
npx vite build

# Type-check Svelte
npx svelte-check

# Check Rust compiles
cd src-tauri && cargo check

Backend (Python)

# Install Python dependencies
uv sync

# Run the headless backend standalone (for development)
uv run python -m backend.main_headless --port 8080

# Run the legacy PySide6 GUI
uv run python main.py

# Run CLI version (headless, for testing)
uv run python main_cli.py

# List available audio devices
uv run python main_cli.py --list-devices

Building

# Build Tauri app (produces platform installer)
npm run tauri build

# Build headless Python sidecar (no PySide6)
uv run pyinstaller local-transcription-headless.spec
# Output: dist/local-transcription-backend/

# Build legacy PySide6 app
uv run pyinstaller local-transcription.spec
# Or use: ./build.sh (Linux) / build.bat (Windows)

Testing

uv run python test_components.py
uv run python check_cuda.py

Architecture Details

Communication: Tauri <-> Python Backend

The Svelte frontend connects to the Python backend via two channels:

REST API (on port 8081 by default):

  • GET /api/status — app state, device info, version
  • POST /api/start / POST /api/stop — transcription control
  • GET /api/config / PUT /api/config — read/write settings (dot-notation keys)
  • GET /api/audio-devices / GET /api/compute-devices — device enumeration
  • POST /api/reload-engine — reload with new model/device
  • GET /api/transcriptions / POST /api/clear — transcript management
  • POST /api/save-file — write text to a file path
  • GET /api/check-update / POST /api/skip-version — update management
  • POST /api/login / POST /api/register / GET /api/balance — managed mode proxy
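
For quick manual checks of the endpoints above, a small script can be handy. The following is a minimal sketch using only the standard library. It assumes the default API port 8081 noted above; `build_request` and `call_api` are hypothetical helper names that do not exist in the repository, and the response shapes are not specified here.

```python
import json
import urllib.request

API_BASE = "http://localhost:8081"  # assumed default control API port


def build_request(method, path, body=None):
    """Build (but do not send) a request for the control API."""
    data = None
    headers = {}
    if body is not None:
        # JSON-encode the payload; the exact schema each endpoint expects is an assumption.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(API_BASE + path, data=data, headers=headers, method=method)


def call_api(method, path, body=None, timeout=5.0):
    """Send the request and decode the JSON response (needs a running backend)."""
    with urllib.request.urlopen(build_request(method, path, body), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the backend running, `call_api("GET", "/api/status")` would fetch the status object and `call_api("POST", "/api/start")` would begin transcription.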

WebSocket /ws/control:

  • Pushes real-time events: state_changed, transcription, preview, error, credits_low
  • Client sends keepalive pings
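
A client consuming these events typically routes each message by its event name. The sketch below shows one way to do that. It assumes each message is a JSON object carrying the event name in a `type` field, which this document does not confirm; `dispatch_event` is a hypothetical helper.

```python
import json

# Event names listed above; the "type" field name is an assumption.
KNOWN_EVENTS = {"state_changed", "transcription", "preview", "error", "credits_low"}


def dispatch_event(raw_message, handlers):
    """Decode one JSON message and call the handler registered for its type.

    Returns True if a handler ran, False if the message was ignored.
    """
    try:
        message = json.loads(raw_message)
    except json.JSONDecodeError:
        return False
    if not isinstance(message, dict):
        return False
    event_type = message.get("type")
    if not isinstance(event_type, str) or event_type not in KNOWN_EVENTS:
        return False
    handler = handlers.get(event_type)
    if handler is None:
        return False
    handler(message)
    return True
```

Ignoring unknown or malformed messages, rather than raising, keeps a client tolerant of event types added later.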

The OBS display server runs separately on port 8080 (GET / for HTML, WebSocket /ws for transcriptions).

Backend Process Lifecycle

  1. main_headless.py starts, acquires instance lock, creates AppController
  2. AppController.initialize() starts the OBS web server (port 8080) and engine init thread
  3. APIServer wraps the controller with FastAPI routes, runs on port 8081
  4. Backend prints {"event": "ready", "port": 8080} to stdout for Tauri to discover
  5. On shutdown: engine stopped, web server stopped, lock released
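
The stdout handshake in step 4 can be illustrated with a small parser. In the real app the Tauri shell reads the sidecar's output; this Python sketch only demonstrates the idea, and `parse_ready_line` is a hypothetical name. It assumes the ready event is a single-line JSON object in the form shown above and that other stdout lines (logs, progress) may be interleaved.

```python
import json


def parse_ready_line(line):
    """Return the port from a ready event line, or None for any other output."""
    line = line.strip()
    if not line.startswith("{"):
        return None  # plain log output, not a JSON event
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(event, dict) or event.get("event") != "ready":
        return None
    port = event.get("port")
    return port if isinstance(port, int) else None
```

A launcher would feed each stdout line to this function and stop waiting once it returns a port.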

Headless Backend vs Legacy GUI

The AppController class (backend/app_controller.py) extracts all orchestration logic from gui/main_window_qt.py into a Qt-free class. The mapping:

Legacy (MainWindow)              | Headless (AppController)
-------------------------------- | ---------------------------------------------------
_initialize_components()         | _initialize_engine()
_start_transcription()           | start_transcription()
_stop_transcription()            | stop_transcription()
_on_settings_saved()             | apply_settings()
_reload_engine()                 | reload_engine()
_start_web_server_if_enabled()   | _start_web_server()
_start_server_sync()             | _start_server_sync()
Qt signals                       | Callbacks (on_state_changed, on_transcription, etc.)
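
The last row of the mapping, callbacks in place of Qt signals, can be sketched as follows. `MiniController` is a hypothetical stand-in, not the real AppController; the state names "running" and "idle" are assumptions, and only the callback attribute names come from this document.

```python
class MiniController:
    """Hypothetical stand-in for AppController showing callback attributes."""

    def __init__(self):
        self.state = "idle"  # assumed state name
        # Callers assign plain callables here instead of connecting Qt signals.
        self.on_state_changed = None
        self.on_transcription = None

    def _set_state(self, new_state):
        """Record the new state and notify the listener, if one is attached."""
        self.state = new_state
        if self.on_state_changed is not None:
            self.on_state_changed(new_state)

    def start_transcription(self):
        self._set_state("running")

    def stop_transcription(self):
        self._set_state("idle")
```

Because the callbacks are plain attributes, the controller has no Qt dependency and can be driven from an API server, a CLI, or a test.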

Threading Model (Headless)

  • Main thread: Uvicorn (FastAPI) event loop
  • Engine init thread: Downloads models, initializes VAD
  • Web server thread: Separate asyncio loop for OBS display
  • Audio capture: Runs in engine callback threads
  • All results flow through AppController callbacks -> APIServer WebSocket broadcast
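
Moving results from engine callback threads onto the event loop is the delicate part of this model. The sketch below shows one common way to do it with `loop.call_soon_threadsafe` and an `asyncio.Queue`. It is an illustration of the pattern under stated assumptions, not the repository's implementation; `collect_from_worker` and the `None` sentinel are hypothetical.

```python
import asyncio
import threading


async def collect_from_worker(texts):
    """Run a worker thread that emits results and gather them on the event loop."""
    loop = asyncio.get_running_loop()
    queue = asyncio.Queue()

    def on_transcription(text):
        # Called from the worker thread; hand the result to the loop thread safely.
        loop.call_soon_threadsafe(queue.put_nowait, text)

    def worker():
        for text in texts:
            on_transcription(text)
        on_transcription(None)  # sentinel: no more results

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()

    received = []
    while True:
        item = await queue.get()
        if item is None:
            break
        # A real server would broadcast to connected WebSocket clients here.
        received.append(item)
    thread.join()
    return received
```

Calling `asyncio.Queue` methods directly from another thread is not safe, which is why the callback schedules `put_nowait` on the loop instead.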

Svelte Frontend

Uses Svelte 5 runes throughout ($state, $derived, $effect, $props). No Svelte 4 patterns.

Stores (src/lib/stores/):

  • backend.ts — WebSocket connection + REST helpers (apiGet, apiPost, apiPut), auto-reconnect
  • config.ts — fetches/updates config from backend API
  • transcriptions.ts — manages transcript list, listens for CustomEvents from backend store

Key patterns:

  • Backend store dispatches CustomEvents on window for cross-store communication
  • Settings component collects all changed values into a Record<string, any> with dot-notation keys, sends via PUT /api/config
  • The Controls component uses the Tauri dialog plugin for native file save and falls back to a blob download
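
On the backend side, a flat mapping of dot-notation keys has to be folded into the nested config. A minimal sketch of that step follows. The helper names and the example keys (`transcription.model`, `display.font_size`) are hypothetical and may not match the real config schema or the logic in client/config.py.

```python
def set_by_dotted_key(config, dotted_key, value):
    """Set config['a']['b'] = value for dotted_key 'a.b', creating dicts as needed."""
    parts = dotted_key.split(".")
    node = config
    for part in parts[:-1]:
        child = node.get(part)
        if not isinstance(child, dict):
            # Missing or non-dict intermediate value: replace it with a new section.
            child = {}
            node[part] = child
        node = child
    node[parts[-1]] = value


def apply_updates(config, updates):
    """Apply a flat {dotted_key: value} mapping to a nested config dict in place."""
    for key, value in updates.items():
        set_by_dotted_key(config, key, value)
    return config
```

For example, applying `{"transcription.model": "small"}` to `{"transcription": {"model": "base"}}` changes only the nested `model` value and leaves sibling keys untouched.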

CI/CD

Two Gitea Actions workflows in .gitea/workflows/:

  • release.yml: Triggers on push to main. Auto-bumps the version, builds the Tauri app on Linux/Windows/macOS, and uploads the .deb, .rpm, .msi, and .dmg artifacts to the Gitea release.
  • build-sidecar.yml: Triggers on changes to client/, server/, backend/, pyproject.toml. Builds headless Python sidecar via PyInstaller. CUDA + CPU for Linux/Windows, CPU-only for macOS.

Both require a BUILD_TOKEN secret (Gitea API token with release write access).

Common Patterns

Adding a New Setting

  1. Add default to config/default_config.yaml
  2. Add UI control in src/lib/components/Settings.svelte
  3. Ensure the setting is included in the save handler's config update
  4. Apply in AppController.apply_settings() or the relevant component
  5. For legacy GUI: also update gui/settings_dialog_qt.py

Adding a New API Endpoint

  1. Add route in backend/api_server.py _setup_routes()
  2. Add supporting logic in backend/app_controller.py if needed
  3. Call from Svelte via backendStore.apiGet/apiPost/apiPut

Modifying Transcription Display

Dependencies

  • Frontend: Tauri v2, Svelte 5, Vite, TypeScript
  • Backend: Python 3.9+, FastAPI, Uvicorn, RealtimeSTT, faster-whisper, PyTorch (CUDA), sounddevice
  • Build: PyInstaller (sidecar), Tauri CLI (app), uv (Python packages)
  • CI: Gitea Actions with platform-specific runners

Platform-Specific Notes

Linux

  • Tauri needs: libgtk-3-dev, libwebkit2gtk-4.1-dev, libappindicator3-dev, librsvg2-dev, patchelf
  • Audio: PulseAudio/ALSA via sounddevice

Windows

  • Tauri needs: WebView2 (usually pre-installed on Windows 10+)
  • Audio: WASAPI via sounddevice

macOS

  • Tauri needs: Xcode Command Line Tools
  • Audio: CoreAudio via sounddevice
  • GPU: MPS (Apple Silicon) detected by device_utils.py
  • Info.plist must include NSMicrophoneUsageDescription for mic access
  • No CUDA builds — CPU/MPS only
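
The device selection described across these platform notes (CUDA where available, MPS on Apple Silicon, otherwise CPU) might look roughly like the sketch below. `detect_compute_device` is a hypothetical function, and the actual logic in client/device_utils.py may differ; only the PyTorch availability checks are real API calls.

```python
def detect_compute_device():
    """Return "cuda", "mps", or "cpu", preferring GPU backends when usable."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch installed: only the CPU path is possible
    if torch.cuda.is_available():
        return "cuda"
    # Older PyTorch builds have no MPS backend, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```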