CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project Overview
Local Transcription is a cross-platform desktop application for real-time speech-to-text transcription designed for streamers. It supports local Whisper models and cloud-based Deepgram transcription, with OBS browser source integration and optional multi-user sync.
Architecture: Two-process model — a Tauri v2 shell (Svelte 5 frontend) communicates with a headless Python backend (sidecar) via REST API and WebSocket.
Key Features:
- Cross-platform desktop app (Windows, macOS, Linux) via Tauri v2 + Svelte 5
- Headless Python backend with FastAPI control API
- Cloud-first: defaults to Deepgram (BYOK) transcription; local Whisper also supported
- Settings UI hides local-only options (model, VAD, timing) when in cloud mode
- Start button gated on API key / login — shows guidance if not configured
- Shared Captions: create rooms, share via codes, join with one click (hosted at caption.shadowdao.com)
- Built-in web server for OBS browser source at http://localhost:8080
- CUDA, MPS (Apple Silicon), and CPU support
- Auto-updates, custom fonts, configurable colors
Legacy GUI: The original PySide6/Qt GUI (main.py, gui/) still works during the transition. New features should target the Tauri frontend and headless backend.
Project Structure
local-transcription/
├── src/ # Svelte 5 frontend (Tauri UI)
│ ├── App.svelte # Main app shell
│ ├── app.css # Global dark theme styles
│ ├── main.ts # Svelte mount point
│ ├── lib/components/ # UI components
│ │ ├── Header.svelte # Title bar + settings button
│ │ ├── StatusBar.svelte # State indicator, device, user info
│ │ ├── Controls.svelte # Start/Stop, Clear, Save buttons
│ │ ├── TranscriptionDisplay.svelte # Scrolling transcript view
│ │ └── Settings.svelte # Full settings modal (all sections)
│ └── lib/stores/ # Svelte 5 reactive stores ($state/$derived)
│ ├── backend.ts # WebSocket + REST API client
│ ├── config.ts # App configuration fetch/update
│ └── transcriptions.ts # Transcript data management
├── src-tauri/ # Tauri v2 Rust shell
│ ├── src/lib.rs # Plugin registration (shell, dialog, process)
│ ├── src/main.rs # Entry point
│ ├── tauri.conf.json # Window, bundle, plugin config
│ └── Cargo.toml # Rust dependencies
├── backend/ # Headless Python backend (the sidecar)
│ ├── app_controller.py # Core orchestration (engine, sync, config)
│ ├── api_server.py # FastAPI REST endpoints + /ws/control
│ └── main_headless.py # Headless entry point (prints JSON to stdout)
├── client/ # Core transcription modules (used by backend)
│ ├── audio_capture.py # Audio input handling
│ ├── transcription_engine_realtime.py # RealtimeSTT / Whisper engine
│ ├── deepgram_transcription.py # Deepgram WebSocket cloud transcription
│ ├── noise_suppression.py # VAD and noise reduction
│ ├── device_utils.py # CPU/GPU/MPS detection
│ ├── config.py # YAML config management (~/.local-transcription/)
│ ├── server_sync.py # Multi-user server sync client
│ ├── instance_lock.py # Single-instance PID lock
│ └── update_checker.py # Gitea release update checker
├── gui/ # Legacy PySide6/Qt GUI (still functional)
│ ├── main_window_qt.py # Main window (orchestration lives here in legacy)
│ ├── settings_dialog_qt.py # Settings dialog
│ └── transcription_display_qt.py # Display widget
├── server/
│ ├── web_display.py # FastAPI OBS display server (WebSocket + HTML)
│ └── nodejs/ # Optional multi-user sync server
├── .gitea/workflows/ # CI/CD
│ ├── release.yml # Coordinator: version bump, tag, release creation
│ ├── build-app-linux.yml # Linux Tauri app build (triggered by v* tag)
│ ├── build-app-windows.yml # Windows Tauri app build (triggered by v* tag)
│ ├── build-app-macos.yml # macOS Tauri app build (triggered by v* tag)
│ ├── sidecar-release.yml # Sidecar coordinator: version bump, tag, release
│ ├── build-sidecar-linux.yml # Linux sidecar build (triggered by sidecar-v* tag)
│ ├── build-sidecar-windows.yml # Windows sidecar build (triggered by sidecar-v* tag)
│ └── build-sidecar-macos.yml # macOS sidecar build (triggered by sidecar-v* tag)
├── config/default_config.yaml # Default settings template
├── main.py # Legacy PySide6 GUI entry point
├── main_cli.py # CLI version for testing
├── version.py # Version string (__version__)
├── local-transcription.spec # PyInstaller config (legacy, includes PySide6)
├── local-transcription-headless.spec # PyInstaller config (headless sidecar, no Qt)
├── pyproject.toml # Python deps (uv, CUDA PyTorch index)
├── package.json # Node/Tauri deps
└── vite.config.ts # Vite build config ($lib alias)
Development Commands
Frontend (Tauri + Svelte)
# Install npm dependencies
npm install
# Run Tauri in development mode (hot-reload)
npm run tauri dev
# Build frontend only (for testing)
npx vite build
# Type-check Svelte
npx svelte-check
# Check Rust compiles
cd src-tauri && cargo check
Backend (Python)
# Install Python dependencies
uv sync
# Run the headless backend standalone (for development)
uv run python -m backend.main_headless --port 8080
# Run the legacy PySide6 GUI
uv run python main.py
# Run CLI version (headless, for testing)
uv run python main_cli.py
# List available audio devices
uv run python main_cli.py --list-devices
Building
# Build Tauri app (produces platform installer)
npm run tauri build
# Build headless Python sidecar (no PySide6)
uv run pyinstaller local-transcription-headless.spec
# Output: dist/local-transcription-backend/
# Build legacy PySide6 app
uv run pyinstaller local-transcription.spec
# Or use: ./build.sh (Linux) / build.bat (Windows)
Testing
uv run python test_components.py
uv run python check_cuda.py
Architecture Details
Communication: Tauri <-> Python Backend
The Svelte frontend connects to the Python backend via two channels:
REST API (on port 8081 by default):
- GET /api/status: app state, device info, version
- POST /api/start / POST /api/stop: transcription control
- GET /api/config / PUT /api/config: read/write settings (dot-notation keys)
- GET /api/audio-devices / GET /api/compute-devices: device enumeration
- POST /api/reload-engine: reload with new model/device
- GET /api/transcriptions / POST /api/clear: transcript management
- POST /api/save-file: write text to a file path
- GET /api/check-update / POST /api/skip-version: update management
- POST /api/login / POST /api/register / GET /api/balance: managed mode proxy
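A minimal sketch of how a script might address these endpoints. It assumes the backend is reachable at http://127.0.0.1:8081 and that PUT /api/config accepts a flat JSON object keyed by dot-notation names. The helper names (build_request, send) and the payload shape are illustrative assumptions, not the documented contract.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://127.0.0.1:8081"  # assumed host; 8081 is the documented default port


def build_request(method: str, path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON request for the control API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def send(request: urllib.request.Request, timeout: float = 5.0) -> dict:
    """Send a request and decode the JSON response (needs a running backend)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


status_request = build_request("GET", "/api/status")
# Assumed payload shape: a flat object of dot-notation keys.
config_request = build_request("PUT", "/api/config", {"remote.mode": "byok"})
```

Building the request separately from sending it keeps the example inspectable without a live backend; call send(status_request) once the sidecar is running.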
WebSocket /ws/control:
- Pushes real-time events: state_changed, transcription, preview, error, credits_low
- Client sends keepalive pings
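One way a client could route these events is sketched below. The event names come from the list above, but the message envelope (a JSON object with a "type" field and a "text" field) and the EventDispatcher class are assumptions made for illustration; check backend/api_server.py for the actual shape.

```python
import json
from typing import Callable, Dict, List

# Event names from the list above; the message envelope is an assumption.
KNOWN_EVENTS = {"state_changed", "transcription", "preview", "error", "credits_low"}


class EventDispatcher:
    """Route decoded /ws/control messages to per-event handlers."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[dict], None]]] = {}

    def on(self, event: str, handler: Callable[[dict], None]) -> None:
        if event not in KNOWN_EVENTS:
            raise ValueError(f"unknown event: {event}")
        self._handlers.setdefault(event, []).append(handler)

    def dispatch(self, raw_message: str) -> bool:
        """Run every handler for the message's event; return True if any ran."""
        message = json.loads(raw_message)
        handlers = self._handlers.get(message.get("type"), [])
        for handler in handlers:
            handler(message)
        return bool(handlers)


received: List[str] = []
dispatcher = EventDispatcher()
dispatcher.on("transcription", lambda msg: received.append(msg["text"]))
handled = dispatcher.dispatch('{"type": "transcription", "text": "hello"}')
ignored = dispatcher.dispatch('{"type": "preview", "text": "hel"}')
```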
The OBS display server runs separately on port 8080 (GET / for HTML, WebSocket /ws for transcriptions).
Backend Process Lifecycle
- main_headless.py starts, acquires instance lock, creates AppController
- AppController.initialize() starts the OBS web server (port 8080) and engine init thread
- APIServer wraps the controller with FastAPI routes, runs on port 8081
- Backend prints {"event": "ready", "port": 8080} to stdout for Tauri to discover
- On shutdown: engine stopped, web server stopped, lock released
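The stdout handshake in the lifecycle above can be sketched as a small parser. This is an illustration in Python rather than the Tauri-side code; find_ready_port and the sample log line are hypothetical, and only the ready message format is taken from this document.

```python
import json
from typing import Iterable, Optional


def find_ready_port(stdout_lines: Iterable[str]) -> Optional[int]:
    """Return the port from the first {"event": "ready"} line, or None."""
    for line in stdout_lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip plain log output
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that only look like JSON
        if isinstance(message, dict) and message.get("event") == "ready":
            port = message.get("port")
            return port if isinstance(port, int) else None
    return None


sample_output = [
    "Loading configuration...",          # hypothetical log line
    '{"event": "ready", "port": 8080}',  # format quoted in the lifecycle above
]
port = find_ready_port(sample_output)
```

Ignoring non-JSON lines keeps the parent process tolerant of any other output the sidecar writes before it is ready.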
Headless Backend vs Legacy GUI
The AppController class (backend/app_controller.py) extracts all orchestration logic from gui/main_window_qt.py into a Qt-free class. The mapping:
| Legacy (MainWindow) | Headless (AppController) |
|---|---|
| _initialize_components() | _initialize_engine() |
| _start_transcription() | start_transcription() |
| _stop_transcription() | stop_transcription() |
| _on_settings_saved() | apply_settings() |
| _reload_engine() | reload_engine() |
| _start_web_server_if_enabled() | _start_web_server() |
| _start_server_sync() | _start_server_sync() |
| Qt signals | Callbacks (on_state_changed, on_transcription, etc.) |
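The last row of the table, Qt signals replaced by plain callbacks, might look roughly like this. MiniController is a hypothetical stand-in, not the real AppController; the state names "idle" and "transcribing" and the emit_text helper are assumptions.

```python
from typing import Callable, List, Optional


class MiniController:
    """Hypothetical stand-in for AppController: plain callbacks, no Qt."""

    def __init__(self) -> None:
        self.state = "idle"
        # Attribute names mirror the callbacks named in the table.
        self.on_state_changed: Optional[Callable[[str], None]] = None
        self.on_transcription: Optional[Callable[[str], None]] = None

    def _set_state(self, new_state: str) -> None:
        self.state = new_state
        if self.on_state_changed is not None:
            self.on_state_changed(new_state)

    def start_transcription(self) -> None:
        self._set_state("transcribing")

    def stop_transcription(self) -> None:
        self._set_state("idle")

    def emit_text(self, text: str) -> None:
        # Stand-in for the engine delivering a finished transcript line.
        if self.on_transcription is not None:
            self.on_transcription(text)


events: List[str] = []
controller = MiniController()
controller.on_state_changed = lambda state: events.append(f"state:{state}")
controller.on_transcription = lambda text: events.append(f"text:{text}")
controller.start_transcription()
controller.emit_text("hello world")
controller.stop_transcription()
```

Because the callbacks are ordinary attributes, the same controller can be driven by the API server, a CLI, or a test without importing Qt.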
Threading Model (Headless)
- Main thread: Uvicorn (FastAPI) event loop
- Engine init thread: Downloads models, initializes VAD
- Web server thread: Separate asyncio loop for OBS display
- Audio capture: Runs in engine callback threads
- All results flow through AppController callbacks -> APIServer WebSocket broadcast
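Since engine callbacks fire on worker threads while the WebSocket broadcast lives on the event loop, results have to cross that boundary safely. The sketch below shows one common way to do it with loop.call_soon_threadsafe and an asyncio.Queue; it is an assumption about the technique, not a copy of what APIServer does, and the worker function is a stand-in for the engine.

```python
import asyncio
import threading
from typing import List


async def main() -> List[str]:
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()

    def on_transcription(text: str) -> None:
        # Called from a worker thread: hand the result to the loop thread safely.
        loop.call_soon_threadsafe(queue.put_nowait, text)

    def engine_worker() -> None:
        # Stand-in for the engine's callback thread.
        for text in ("first line", "second line"):
            on_transcription(text)

    worker = threading.Thread(target=engine_worker)
    worker.start()

    broadcast: List[str] = []
    for _ in range(2):
        # In a real server this is where a WebSocket broadcast would happen.
        broadcast.append(await asyncio.wait_for(queue.get(), timeout=5))
    worker.join()
    return broadcast


result = asyncio.run(main())
```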
Svelte Frontend
Uses Svelte 5 runes throughout ($state, $derived, $effect, $props). No Svelte 4 patterns.
Stores (src/lib/stores/):
- backend.ts: WebSocket connection + REST helpers (apiGet, apiPost, apiPut), auto-reconnect
- config.ts: fetches/updates config from backend API
- transcriptions.ts: manages transcript list, listens for CustomEvents from backend store
Key patterns:
- Backend store dispatches CustomEvents on window for cross-store communication
- Settings component collects all changed values into a Record<string, any> with dot-notation keys and sends them via PUT /api/config
- Controls use the Tauri dialog plugin for native file save and fall back to blob download
CI/CD
Eight Gitea Actions workflows in .gitea/workflows/, split into coordinators and per-OS builders:
App release (Tauri):
- release.yml: Coordinator. Triggers on push to main. Auto-bumps version in package.json/tauri.conf.json/Cargo.toml/version.py, commits, tags v{VERSION}, creates Gitea release.
- build-app-linux.yml: Triggers on v* tag push or workflow_dispatch. Builds Tauri app, uploads .deb/.rpm/.AppImage.
- build-app-windows.yml: Triggers on v* tag push or workflow_dispatch. Builds Tauri app, uploads .msi/*-setup.exe.
- build-app-macos.yml: Triggers on v* tag push or workflow_dispatch. Builds Tauri app, uploads .dmg.
Sidecar release (Python backend):
- sidecar-release.yml: Coordinator. Triggers on push to main with changes in client/, server/, backend/, pyproject.toml, or local-transcription-headless.spec. Bumps version in pyproject.toml/version.py, tags sidecar-v{VERSION}, creates Gitea release.
- build-sidecar-linux.yml: Triggers on sidecar-v* tag push or workflow_dispatch. Builds CUDA + CPU sidecars via PyInstaller.
- build-sidecar-windows.yml: Triggers on sidecar-v* tag push or workflow_dispatch. Builds CUDA + CPU sidecars via PyInstaller.
- build-sidecar-macos.yml: Triggers on sidecar-v* tag push or workflow_dispatch. Builds CPU-only sidecar via PyInstaller.
All per-OS build workflows can be re-run independently via workflow_dispatch with an optional tag input. All require a BUILD_TOKEN secret (Gitea API token with release write access).
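To make the tag routing concrete, the sketch below maps a pushed tag to the build workflows it would trigger. It approximates the tag filters with Python's fnmatch, which may differ from Gitea Actions' own glob matching in edge cases; the version numbers in the usage lines are made up.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

# Tag patterns and workflow names as listed above.
TAG_TRIGGERS: Dict[str, List[str]] = {
    "v*": ["build-app-linux.yml", "build-app-windows.yml", "build-app-macos.yml"],
    "sidecar-v*": [
        "build-sidecar-linux.yml",
        "build-sidecar-windows.yml",
        "build-sidecar-macos.yml",
    ],
}


def workflows_for_tag(tag: str) -> List[str]:
    """Return the build workflows whose tag pattern matches the given tag."""
    matched: List[str] = []
    for pattern, workflows in TAG_TRIGGERS.items():
        if fnmatchcase(tag, pattern):
            matched.extend(workflows)
    return matched


app_builds = workflows_for_tag("v1.4.0")              # hypothetical app tag
sidecar_builds = workflows_for_tag("sidecar-v1.4.0")  # hypothetical sidecar tag
```

The two patterns do not overlap: a sidecar-v* tag starts with "s", so it never matches v*, which is what keeps app and sidecar releases independent.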
Common Patterns
Adding a New Setting
- Add default to config/default_config.yaml
- Add UI control in src/lib/components/Settings.svelte
- Ensure the setting is included in the save handler's config update
- Apply in AppController.apply_settings() or the relevant component
- For legacy GUI: also update gui/settings_dialog_qt.py
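Because the settings UI sends dot-notation keys, a new setting ultimately has to land in the nested config. The following is a minimal sketch of that merge, assuming a plain nested dict; apply_dotted_updates and the display.* keys are hypothetical, and the real logic lives in client/config.py and AppController.apply_settings().

```python
from typing import Any, Dict


def apply_dotted_updates(config: Dict[str, Any], updates: Dict[str, Any]) -> Dict[str, Any]:
    """Apply {"a.b.c": value} style updates to a nested dict in place."""
    for dotted_key, value in updates.items():
        *parents, leaf = dotted_key.split(".")
        node = config
        for part in parents:
            child = node.get(part)
            if not isinstance(child, dict):
                child = {}
                node[part] = child  # create a section (or replace a scalar with one)
            node = child
        node[leaf] = value
    return config


# Example keys are illustrative; only the dot-notation convention is from this document.
config = {"remote": {"mode": "local"}, "display": {"font_size": 12}}
apply_dotted_updates(config, {"remote.mode": "byok", "display.color.text": "#FFFFFF"})
```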
Adding a New API Endpoint
- Add route in backend/api_server.py _setup_routes()
- Add supporting logic in backend/app_controller.py if needed
- Call from Svelte via backendStore.apiGet/apiPost/apiPut
Modifying Transcription Display
- Tauri UI: src/lib/components/TranscriptionDisplay.svelte
- OBS display: server/web_display.py (HTML in _get_html())
- Multi-user display: server/nodejs/server.js (display page in /display route)
Dependencies
- Frontend: Tauri v2, Svelte 5, Vite, TypeScript
- Backend: Python 3.9+, FastAPI, Uvicorn, RealtimeSTT, faster-whisper, PyTorch (CUDA), sounddevice
- Build: PyInstaller (sidecar), Tauri CLI (app), uv (Python packages)
- CI: Gitea Actions with platform-specific runners
Platform-Specific Notes
Linux
- Tauri needs: libgtk-3-dev, libwebkit2gtk-4.1-dev, libappindicator3-dev, librsvg2-dev, patchelf
- Audio: PulseAudio/ALSA via sounddevice
Windows
- Tauri needs: WebView2 (usually pre-installed on Windows 10+)
- Audio: WASAPI via sounddevice
macOS
- Tauri needs: Xcode Command Line Tools
- Audio: CoreAudio via sounddevice
- GPU: MPS (Apple Silicon) detected by device_utils.py
- Info.plist must include NSMicrophoneUsageDescription for mic access
- No CUDA builds (CPU/MPS only)
Related Documentation
- README.md — User-facing documentation
- BUILD.md — Detailed build instructions
- INSTALL.md — Installation guide
- server/nodejs/README.md — Node.js server setup