# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Local Transcription is a cross-platform desktop application for real-time speech-to-text transcription designed for streamers. It supports local Whisper models and cloud-based Deepgram transcription, with OBS browser source integration and optional multi-user sync.

**Architecture:** Two-process model: a Tauri v2 shell (Svelte 5 frontend) communicates with a headless Python backend (sidecar) via REST API and WebSocket.

**Key Features:**
- Cross-platform desktop app (Windows, macOS, Linux) via Tauri v2 + Svelte 5
- Headless Python backend with FastAPI control API
- Cloud-first: defaults to Deepgram (BYOK) transcription; local Whisper also supported
- Settings UI hides local-only options (model, VAD, timing) when in cloud mode
- Start button gated on API key / login; shows guidance if not configured
- Shared Captions: create rooms, share via codes, join with one click (hosted at caption.shadowdao.com)
- Built-in web server for OBS browser source at `http://localhost:8080`
- CUDA, MPS (Apple Silicon), and CPU support
- Auto-updates, custom fonts, configurable colors

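
The start-button gating in the feature list can be pictured as a small check on the loaded config. This is a minimal sketch, not the app's actual code: the mode names `local`, `byok`, and `managed` follow the wording used in this file, while the key names `remote.deepgram_api_key` and `remote.session_token` and the guidance strings are assumptions made for illustration.

```python
from typing import Optional, Tuple


def can_start(config: dict) -> Tuple[bool, Optional[str]]:
    """Return (allowed, guidance); guidance is None when starting is allowed."""
    remote = config.get("remote", {})
    mode = remote.get("mode", "byok")  # assumed default, matching the cloud-first behavior

    if mode == "local":
        # Local Whisper needs no credentials.
        return True, None
    if mode == "byok":
        if remote.get("deepgram_api_key"):  # hypothetical key name
            return True, None
        return False, "Add a Deepgram API key in Settings to start."
    if mode == "managed":
        if remote.get("session_token"):  # hypothetical key name
            return True, None
        return False, "Log in to start."
    return False, f"Unknown mode: {mode!r}"
```

A UI layer could disable the Start button whenever the first element is `False` and show the second element as guidance text.
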
> **Legacy GUI:** The original PySide6/Qt GUI (`main.py`, `gui/`) still works during the transition. New features should target the Tauri frontend and headless backend.

## Project Structure

```
local-transcription/
├── src/                                  # Svelte 5 frontend (Tauri UI)
│   ├── App.svelte                        # Main app shell
│   ├── app.css                           # Global dark theme styles
│   ├── main.ts                           # Svelte mount point
│   ├── lib/components/                   # UI components
│   │   ├── Header.svelte                 # Title bar + settings button
│   │   ├── StatusBar.svelte              # State indicator, device, user info
│   │   ├── Controls.svelte               # Start/Stop, Clear, Save buttons
│   │   ├── TranscriptionDisplay.svelte   # Scrolling transcript view
│   │   └── Settings.svelte               # Full settings modal (all sections)
│   └── lib/stores/                       # Svelte 5 reactive stores ($state/$derived)
│       ├── backend.ts                    # WebSocket + REST API client
│       ├── config.ts                     # App configuration fetch/update
│       └── transcriptions.ts             # Transcript data management
├── src-tauri/                            # Tauri v2 Rust shell
│   ├── src/lib.rs                        # Plugin registration (shell, dialog, process)
│   ├── src/main.rs                       # Entry point
│   ├── tauri.conf.json                   # Window, bundle, plugin config
│   └── Cargo.toml                        # Rust dependencies
├── backend/                              # Headless Python backend (the sidecar)
│   ├── app_controller.py                 # Core orchestration (engine, sync, config)
│   ├── api_server.py                     # FastAPI REST endpoints + /ws/control
│   └── main_headless.py                  # Headless entry point (prints JSON to stdout)
├── client/                               # Core transcription modules (used by backend)
│   ├── audio_capture.py                  # Audio input handling
│   ├── transcription_engine_realtime.py  # RealtimeSTT / Whisper engine
│   ├── deepgram_transcription.py         # Deepgram WebSocket cloud transcription
│   ├── noise_suppression.py              # VAD and noise reduction
│   ├── device_utils.py                   # CPU/GPU/MPS detection
│   ├── config.py                         # YAML config management (~/.local-transcription/)
│   ├── server_sync.py                    # Multi-user server sync client
│   ├── instance_lock.py                  # Single-instance PID lock
│   └── update_checker.py                 # Gitea release update checker
├── gui/                                  # Legacy PySide6/Qt GUI (still functional)
│   ├── main_window_qt.py                 # Main window (orchestration lives here in legacy)
│   ├── settings_dialog_qt.py             # Settings dialog
│   └── transcription_display_qt.py       # Display widget
├── server/
│   ├── web_display.py                    # FastAPI OBS display server (WebSocket + HTML)
│   └── nodejs/                           # Optional multi-user sync server
├── .gitea/workflows/                     # CI/CD
│   ├── release.yml                       # Coordinator: version bump, tag, release creation
│   ├── build-app-linux.yml               # Linux Tauri app build (triggered by v* tag)
│   ├── build-app-windows.yml             # Windows Tauri app build (triggered by v* tag)
│   ├── build-app-macos.yml               # macOS Tauri app build (triggered by v* tag)
│   ├── sidecar-release.yml               # Sidecar coordinator: version bump, tag, release
│   ├── build-sidecar-linux.yml           # Linux sidecar build (triggered by sidecar-v* tag)
│   ├── build-sidecar-windows.yml         # Windows sidecar build (triggered by sidecar-v* tag)
│   └── build-sidecar-macos.yml           # macOS sidecar build (triggered by sidecar-v* tag)
├── config/default_config.yaml            # Default settings template
├── main.py                               # Legacy PySide6 GUI entry point
├── main_cli.py                           # CLI version for testing
├── version.py                            # Version string (__version__)
├── local-transcription.spec              # PyInstaller config (legacy, includes PySide6)
├── local-transcription-headless.spec     # PyInstaller config (headless sidecar, no Qt)
├── pyproject.toml                        # Python deps (uv, CUDA PyTorch index)
├── package.json                          # Node/Tauri deps
└── vite.config.ts                        # Vite build config ($lib alias)
```

## Development Commands

### Frontend (Tauri + Svelte)
```bash
# Install npm dependencies
npm install

# Run Tauri in development mode (hot-reload)
npm run tauri dev

# Build frontend only (for testing)
npx vite build

# Type-check Svelte
npx svelte-check

# Check Rust compiles
cd src-tauri && cargo check
```

### Backend (Python)
```bash
# Install Python dependencies
uv sync

# Run the headless backend standalone (for development)
uv run python -m backend.main_headless --port 8080

# Run the legacy PySide6 GUI
uv run python main.py

# Run CLI version (headless, for testing)
uv run python main_cli.py

# List available audio devices
uv run python main_cli.py --list-devices
```

### Building
```bash
# Build Tauri app (produces platform installer)
npm run tauri build

# Build headless Python sidecar (no PySide6)
uv run pyinstaller local-transcription-headless.spec
# Output: dist/local-transcription-backend/

# Build legacy PySide6 app
uv run pyinstaller local-transcription.spec
# Or use: ./build.sh (Linux) / build.bat (Windows)
```

### Testing
```bash
uv run python test_components.py
uv run python check_cuda.py
```

## Architecture Details

### Communication: Tauri <-> Python Backend

The Svelte frontend connects to the Python backend via two channels:

**REST API** (on port 8081 by default):
- `GET /api/status`: app state, device info, version
- `POST /api/start` / `POST /api/stop`: transcription control
- `GET /api/config` / `PUT /api/config`: read/write settings (dot-notation keys)
- `GET /api/audio-devices` / `GET /api/compute-devices`: device enumeration
- `POST /api/reload-engine`: reload with new model/device
- `GET /api/transcriptions` / `POST /api/clear`: transcript management
- `POST /api/save-file`: write text to a file path
- `GET /api/check-update` / `POST /api/skip-version`: update management
- `POST /api/login` / `POST /api/register` / `GET /api/balance`: managed mode proxy

**WebSocket** `/ws/control`:
- Pushes real-time events: `state_changed`, `transcription`, `preview`, `error`, `credits_low`
- Client sends keepalive pings

The OBS display server runs separately on port 8080 (`GET /` for HTML, `WebSocket /ws` for transcriptions).

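
For poking at the REST API from a script while the backend runs standalone, a standard-library client is enough. The sketch below only builds requests for the endpoints listed above. The helper names `build_request` and `send` are made up for this example, and treating request and response bodies as JSON is an assumption about the API.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "http://localhost:8081"  # default REST port described above


def build_request(method: str, path: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a request against the control API."""
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(API_BASE + path, data=data, headers=headers, method=method)


def send(request: urllib.request.Request, timeout: float = 5.0) -> dict:
    """Send a request and decode the body, assuming the API answers with JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Settings are written with dot-notation keys via PUT /api/config.
status_request = build_request("GET", "/api/status")
config_request = build_request("PUT", "/api/config", {"remote.mode": "byok"})
```

With the backend running, `send(status_request)` would perform the call; nothing is sent until then.
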
### Backend Process Lifecycle

1. `main_headless.py` starts, acquires instance lock, creates `AppController`
2. `AppController.initialize()` starts the OBS web server (port 8080) and engine init thread
3. `APIServer` wraps the controller with FastAPI routes, runs on port 8081
4. Backend prints `{"event": "ready", "port": 8080}` to stdout for Tauri to discover
5. On shutdown: engine stopped, web server stopped, lock released

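
Step 4 means a parent process can discover the backend by scanning its stdout for the ready line. The sketch below shows one way to do that in Python. The function name `find_ready_port` is hypothetical, and the assumption that other stdout lines may be ordinary log text is mine.

```python
import json
from typing import Iterable, Optional


def find_ready_port(lines: Iterable[str]) -> Optional[int]:
    """Scan output lines for the ready event and return its port, if any."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip ordinary log output
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # looked like JSON but was not
        if isinstance(message, dict) and message.get("event") == "ready":
            port = message.get("port")
            if isinstance(port, int):
                return port
    return None


print(find_ready_port(["loading model...", '{"event": "ready", "port": 8080}']))  # 8080
```

In practice the lines would come from the sidecar's stdout, for example the `stdout` pipe of a `subprocess.Popen(..., stdout=subprocess.PIPE, text=True)` call.
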
### Headless Backend vs Legacy GUI

The `AppController` class (`backend/app_controller.py`) extracts all orchestration logic from `gui/main_window_qt.py` into a Qt-free class. The mapping:

| Legacy (MainWindow) | Headless (AppController) |
|---------------------|--------------------------|
| `_initialize_components()` | `_initialize_engine()` |
| `_start_transcription()` | `start_transcription()` |
| `_stop_transcription()` | `stop_transcription()` |
| `_on_settings_saved()` | `apply_settings()` |
| `_reload_engine()` | `reload_engine()` |
| `_start_web_server_if_enabled()` | `_start_web_server()` |
| `_start_server_sync()` | `_start_server_sync()` |
| Qt signals | Callbacks (`on_state_changed`, `on_transcription`, etc.) |

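
The last row of the mapping, Qt signals becoming plain callbacks, is the part most likely to trip up a port from the legacy GUI. The sketch below illustrates the pattern with a hypothetical `MiniController`. It is not the real `AppController`, and the state names `running` and `idle` are assumptions.

```python
from typing import Callable, Optional


class MiniController:
    """Hypothetical stand-in for AppController showing callback-style events."""

    def __init__(self) -> None:
        self.state = "idle"
        # Callers assign a plain callable instead of connecting a Qt signal.
        self.on_state_changed: Optional[Callable[[str], None]] = None

    def _set_state(self, state: str) -> None:
        self.state = state
        if self.on_state_changed is not None:
            self.on_state_changed(state)

    def start_transcription(self) -> None:
        self._set_state("running")

    def stop_transcription(self) -> None:
        self._set_state("idle")


seen = []
controller = MiniController()
controller.on_state_changed = seen.append
controller.start_transcription()
controller.stop_transcription()
print(seen)  # ['running', 'idle']
```

Because the callback is an ordinary attribute, the same controller can be driven by an API server, a CLI, or a test without importing Qt.
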
### Threading Model (Headless)

- Main thread: Uvicorn (FastAPI) event loop
- Engine init thread: Downloads models, initializes VAD
- Web server thread: Separate asyncio loop for OBS display
- Audio capture: Runs in engine callback threads
- All results flow through `AppController` callbacks -> `APIServer` WebSocket broadcast

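
Since results originate in worker threads but are broadcast from an asyncio event loop, the hand-off between the two has to be thread-safe. A common way to do this in Python is `loop.call_soon_threadsafe`. The sketch below shows that technique in isolation. It is not taken from `api_server.py`, and the `{"type": ..., "text": ...}` message shape is an assumption.

```python
import asyncio
import threading


async def main() -> list:
    loop = asyncio.get_running_loop()
    events: asyncio.Queue = asyncio.Queue()

    def on_transcription(text: str) -> None:
        # Runs in a worker thread; schedule the queue write on the event loop.
        loop.call_soon_threadsafe(events.put_nowait, {"type": "transcription", "text": text})

    worker = threading.Thread(target=on_transcription, args=("hello world",))
    worker.start()

    # A broadcaster task would normally loop here and forward to WebSocket clients.
    received = [await asyncio.wait_for(events.get(), timeout=2.0)]
    worker.join()
    return received


result = asyncio.run(main())
print(result)  # [{'type': 'transcription', 'text': 'hello world'}]
```

Calling `events.put_nowait` directly from the worker thread would not be safe, because `asyncio.Queue` is not thread-safe.
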
### Svelte Frontend

Uses Svelte 5 runes throughout (`$state`, `$derived`, `$effect`, `$props`). No Svelte 4 patterns.

**Stores** (`src/lib/stores/`):
- `backend.ts`: WebSocket connection + REST helpers (`apiGet`, `apiPost`, `apiPut`), auto-reconnect
- `config.ts`: fetches/updates config from backend API
- `transcriptions.ts`: manages transcript list, listens for `CustomEvent`s from backend store

**Key patterns:**
- Backend store dispatches `CustomEvent`s on `window` for cross-store communication
- Settings component collects all changed values into a `Record<string, any>` with dot-notation keys and sends them via `PUT /api/config`
- Controls use the Tauri dialog plugin for native file save and fall back to a blob download

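
On the receiving end, a flat map of dot-notation keys has to be folded back into the nested config. The sketch below shows one straightforward way to do that in Python. The function name `apply_updates` and the `display.*` keys are made up for illustration, and the real handling in the backend may differ.

```python
from typing import Any, Dict


def apply_updates(config: Dict[str, Any], updates: Dict[str, Any]) -> Dict[str, Any]:
    """Apply {"a.b.c": value} style updates to a nested dict in place."""
    for dotted_key, value in updates.items():
        *parents, leaf = dotted_key.split(".")
        node = config
        for part in parents:
            # Create intermediate sections on demand.
            node = node.setdefault(part, {})
        node[leaf] = value
    return config


config = {"remote": {"mode": "local"}, "display": {"font_size": 12}}
apply_updates(config, {"remote.mode": "byok", "display.color": "#ffffff"})
print(config["remote"]["mode"])  # byok
```

The sketch assumes every intermediate key is a section (a dict); a stricter version would reject updates that try to descend into a scalar value.
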
## CI/CD

Eight Gitea Actions workflows live in `.gitea/workflows/`, split into coordinators and per-OS builders:

**App release (Tauri):**
- **`release.yml`**: Coordinator. Triggers on push to `main`. Auto-bumps version in package.json/tauri.conf.json/Cargo.toml/version.py, commits, tags `v{VERSION}`, creates Gitea release.
- **`build-app-linux.yml`**: Triggers on `v*` tag push or `workflow_dispatch`. Builds Tauri app, uploads `.deb`/`.rpm`/`.AppImage`.
- **`build-app-windows.yml`**: Triggers on `v*` tag push or `workflow_dispatch`. Builds Tauri app, uploads `.msi`/`*-setup.exe`.
- **`build-app-macos.yml`**: Triggers on `v*` tag push or `workflow_dispatch`. Builds Tauri app, uploads `.dmg`.

**Sidecar release (Python backend):**
- **`sidecar-release.yml`**: Coordinator. Triggers on push to `main` with changes in `client/`, `server/`, `backend/`, `pyproject.toml`, or `local-transcription-headless.spec`. Bumps version in pyproject.toml/version.py, tags `sidecar-v{VERSION}`, creates Gitea release.
- **`build-sidecar-linux.yml`**: Triggers on `sidecar-v*` tag push or `workflow_dispatch`. Builds CUDA + CPU sidecars via PyInstaller.
- **`build-sidecar-windows.yml`**: Triggers on `sidecar-v*` tag push or `workflow_dispatch`. Builds CUDA + CPU sidecars via PyInstaller.
- **`build-sidecar-macos.yml`**: Triggers on `sidecar-v*` tag push or `workflow_dispatch`. Builds CPU-only sidecar via PyInstaller.

All per-OS build workflows can be re-run independently via `workflow_dispatch` with an optional `tag` input. All require a `BUILD_TOKEN` secret (Gitea API token with release write access).

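
To check which builders a given tag would start, the two tag patterns can be modelled directly. The sketch below uses Python's `fnmatch` as a stand-in for the workflow tag filters, which is an approximation that holds for these simple patterns. The `TAG_TRIGGERS` table and `workflows_for_tag` are names invented for this example.

```python
from fnmatch import fnmatchcase
from typing import List

# Tag pattern -> builder workflows, as described in the lists above.
TAG_TRIGGERS = {
    "v*": ["build-app-linux.yml", "build-app-windows.yml", "build-app-macos.yml"],
    "sidecar-v*": [
        "build-sidecar-linux.yml",
        "build-sidecar-windows.yml",
        "build-sidecar-macos.yml",
    ],
}


def workflows_for_tag(tag: str) -> List[str]:
    """Return the builder workflows whose tag pattern matches the given tag."""
    triggered = []
    for pattern, workflows in TAG_TRIGGERS.items():
        if fnmatchcase(tag, pattern):
            triggered.extend(workflows)
    return triggered


print(workflows_for_tag("sidecar-v1.2.3"))
```

A `sidecar-v*` tag does not also match `v*`, because the pattern is anchored at the start of the tag name, so app and sidecar releases stay independent.
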
## Common Patterns

### Adding a New Setting

1. Add default to [config/default_config.yaml](config/default_config.yaml)
2. Add UI control in [src/lib/components/Settings.svelte](src/lib/components/Settings.svelte)
3. Ensure the setting is included in the save handler's config update
4. Apply in `AppController.apply_settings()` or the relevant component
5. For legacy GUI: also update [gui/settings_dialog_qt.py](gui/settings_dialog_qt.py)

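
Step 1 matters because existing user config files will not contain a newly added key. One common way to handle that is to layer the user's values over the defaults at load time. The sketch below shows such a deep merge. It assumes, without having checked `client/config.py`, that the config loader works along these lines, and `merge_defaults` and the `display.font_size` key are hypothetical.

```python
import copy
from typing import Any, Dict


def merge_defaults(defaults: Dict[str, Any], user: Dict[str, Any]) -> Dict[str, Any]:
    """Return defaults overlaid with user values, recursing into nested sections."""
    merged = copy.deepcopy(defaults)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_defaults(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


defaults = {"remote": {"mode": "byok"}, "display": {"font_size": 12}}
user = {"display": {"font_size": 20}}
print(merge_defaults(defaults, user))
# {'remote': {'mode': 'byok'}, 'display': {'font_size': 20}}
```

With this approach a setting added to the defaults shows up for every user on the next load, while values they already changed are preserved.
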
### Adding a New API Endpoint

1. Add route in [backend/api_server.py](backend/api_server.py) `_setup_routes()`
2. Add supporting logic in [backend/app_controller.py](backend/app_controller.py) if needed
3. Call from Svelte via `backendStore.apiGet/apiPost/apiPut`

### Modifying Transcription Display

- Tauri UI: [src/lib/components/TranscriptionDisplay.svelte](src/lib/components/TranscriptionDisplay.svelte)
- OBS display: [server/web_display.py](server/web_display.py) (HTML in `_get_html()`)
- Multi-user display: [server/nodejs/server.js](server/nodejs/server.js) (display page in `/display` route)

## Dependencies

**Frontend:** Tauri v2, Svelte 5, Vite, TypeScript
**Backend:** Python 3.9+, FastAPI, Uvicorn, RealtimeSTT, faster-whisper, PyTorch (CUDA), sounddevice
**Build:** PyInstaller (sidecar), Tauri CLI (app), uv (Python packages)
**CI:** Gitea Actions with platform-specific runners

## Platform-Specific Notes

### Linux
- Tauri needs: `libgtk-3-dev`, `libwebkit2gtk-4.1-dev`, `libappindicator3-dev`, `librsvg2-dev`, `patchelf`
- Audio: PulseAudio/ALSA via sounddevice

### Windows
- Tauri needs: WebView2 (usually pre-installed on Windows 10+)
- Audio: WASAPI via sounddevice

### macOS
- Tauri needs: Xcode Command Line Tools
- Audio: CoreAudio via sounddevice
- GPU: MPS (Apple Silicon) detected by `device_utils.py`
- `Info.plist` must include `NSMicrophoneUsageDescription` for mic access
- No CUDA builds: CPU/MPS only

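
The platform notes boil down to a preference order among compute devices. The sketch below shows one plausible selection rule. The CUDA, then MPS, then CPU order and both function names are assumptions, not the contents of `device_utils.py`. The PyTorch calls used (`torch.cuda.is_available()` and `torch.backends.mps.is_available()`) are real, and the code falls back to CPU when PyTorch is not installed.

```python
def select_device(has_cuda: bool, has_mps: bool) -> str:
    """Pick a device name from availability flags (assumed preference order)."""
    if has_cuda:
        return "cuda"
    if has_mps:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Query PyTorch for available accelerators, defaulting to CPU without it."""
    try:
        import torch
    except ImportError:
        return "cpu"
    has_mps = hasattr(torch.backends, "mps") and torch.backends.mps.is_available()
    return select_device(torch.cuda.is_available(), has_mps)


print(select_device(False, True))  # mps
```

Keeping the decision in a pure function makes the preference order easy to unit-test on machines that have no GPU at all.
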
## Related Documentation

- [README.md](README.md): User-facing documentation
- [BUILD.md](BUILD.md): Detailed build instructions
- [INSTALL.md](INSTALL.md): Installation guide
- [server/nodejs/README.md](server/nodejs/README.md): Node.js server setup