streamer-tools/local-transcription

Developer cd325102e2

Update docs for cloud-first UX and shared captions

- README: document cloud-first quick start, shared captions workflow
  (create room, join via share code, share existing room), and
  self-hosting option
- README: update default remote.mode from local to byok in config table
- CLAUDE.md: reflect cloud-first default, settings gating, and shared
  captions features

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-10 16:10:46 -07:00

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Local Transcription is a cross-platform desktop application for real-time speech-to-text transcription designed for streamers. It supports local Whisper models and cloud-based Deepgram transcription, with OBS browser source integration and optional multi-user sync.

Architecture: Two-process model — a Tauri v2 shell (Svelte 5 frontend) communicates with a headless Python backend (sidecar) via REST API and WebSocket.

Key Features:

Cross-platform desktop app (Windows, macOS, Linux) via Tauri v2 + Svelte 5
Headless Python backend with FastAPI control API
Cloud-first: defaults to Deepgram (BYOK) transcription; local Whisper also supported
Settings UI hides local-only options (model, VAD, timing) when in cloud mode
Start button gated on API key / login — shows guidance if not configured
Shared Captions: create rooms, share via codes, join with one click (hosted at caption.shadowdao.com)
Built-in web server for OBS browser source at http://localhost:8080
CUDA, MPS (Apple Silicon), and CPU support
Auto-updates, custom fonts, configurable colors

Legacy GUI: The original PySide6/Qt GUI (main.py, gui/) still works during the transition. New features should target the Tauri frontend and headless backend.

Project Structure

local-transcription/
├── src/                             # Svelte 5 frontend (Tauri UI)
│   ├── App.svelte                   # Main app shell
│   ├── app.css                      # Global dark theme styles
│   ├── main.ts                      # Svelte mount point
│   ├── lib/components/              # UI components
│   │   ├── Header.svelte            # Title bar + settings button
│   │   ├── StatusBar.svelte         # State indicator, device, user info
│   │   ├── Controls.svelte          # Start/Stop, Clear, Save buttons
│   │   ├── TranscriptionDisplay.svelte  # Scrolling transcript view
│   │   └── Settings.svelte          # Full settings modal (all sections)
│   └── lib/stores/                  # Svelte 5 reactive stores ($state/$derived)
│       ├── backend.ts               # WebSocket + REST API client
│       ├── config.ts                # App configuration fetch/update
│       └── transcriptions.ts        # Transcript data management
├── src-tauri/                       # Tauri v2 Rust shell
│   ├── src/lib.rs                   # Plugin registration (shell, dialog, process)
│   ├── src/main.rs                  # Entry point
│   ├── tauri.conf.json              # Window, bundle, plugin config
│   └── Cargo.toml                   # Rust dependencies
├── backend/                         # Headless Python backend (the sidecar)
│   ├── app_controller.py            # Core orchestration (engine, sync, config)
│   ├── api_server.py                # FastAPI REST endpoints + /ws/control
│   └── main_headless.py             # Headless entry point (prints JSON to stdout)
├── client/                          # Core transcription modules (used by backend)
│   ├── audio_capture.py             # Audio input handling
│   ├── transcription_engine_realtime.py  # RealtimeSTT / Whisper engine
│   ├── deepgram_transcription.py    # Deepgram WebSocket cloud transcription
│   ├── noise_suppression.py         # VAD and noise reduction
│   ├── device_utils.py              # CPU/GPU/MPS detection
│   ├── config.py                    # YAML config management (~/.local-transcription/)
│   ├── server_sync.py               # Multi-user server sync client
│   ├── instance_lock.py             # Single-instance PID lock
│   └── update_checker.py            # Gitea release update checker
├── gui/                             # Legacy PySide6/Qt GUI (still functional)
│   ├── main_window_qt.py            # Main window (orchestration lives here in legacy)
│   ├── settings_dialog_qt.py        # Settings dialog
│   └── transcription_display_qt.py  # Display widget
├── server/
│   ├── web_display.py               # FastAPI OBS display server (WebSocket + HTML)
│   └── nodejs/                      # Optional multi-user sync server
├── .gitea/workflows/                # CI/CD
│   ├── release.yml                  # Coordinator: version bump, tag, release creation
│   ├── build-app-linux.yml          # Linux Tauri app build (triggered by v* tag)
│   ├── build-app-windows.yml        # Windows Tauri app build (triggered by v* tag)
│   ├── build-app-macos.yml          # macOS Tauri app build (triggered by v* tag)
│   ├── sidecar-release.yml          # Sidecar coordinator: version bump, tag, release
│   ├── build-sidecar-linux.yml      # Linux sidecar build (triggered by sidecar-v* tag)
│   ├── build-sidecar-windows.yml    # Windows sidecar build (triggered by sidecar-v* tag)
│   └── build-sidecar-macos.yml      # macOS sidecar build (triggered by sidecar-v* tag)
├── config/default_config.yaml       # Default settings template
├── main.py                          # Legacy PySide6 GUI entry point
├── main_cli.py                      # CLI version for testing
├── version.py                       # Version string (__version__)
├── local-transcription.spec         # PyInstaller config (legacy, includes PySide6)
├── local-transcription-headless.spec # PyInstaller config (headless sidecar, no Qt)
├── pyproject.toml                   # Python deps (uv, CUDA PyTorch index)
├── package.json                     # Node/Tauri deps
└── vite.config.ts                   # Vite build config ($lib alias)

Development Commands

Frontend (Tauri + Svelte)

# Install npm dependencies
npm install

# Run Tauri in development mode (hot-reload)
npm run tauri dev

# Build frontend only (for testing)
npx vite build

# Type-check Svelte
npx svelte-check

# Check Rust compiles
cd src-tauri && cargo check

Backend (Python)

# Install Python dependencies
uv sync

# Run the headless backend standalone (for development)
uv run python -m backend.main_headless --port 8080

# Run the legacy PySide6 GUI
uv run python main.py

# Run CLI version (headless, for testing)
uv run python main_cli.py

# List available audio devices
uv run python main_cli.py --list-devices

Building

# Build Tauri app (produces platform installer)
npm run tauri build

# Build headless Python sidecar (no PySide6)
uv run pyinstaller local-transcription-headless.spec
# Output: dist/local-transcription-backend/

# Build legacy PySide6 app
uv run pyinstaller local-transcription.spec
# Or use: ./build.sh (Linux) / build.bat (Windows)

Testing

uv run python test_components.py
uv run python check_cuda.py

Architecture Details

Communication: Tauri <-> Python Backend

The Svelte frontend connects to the Python backend via two channels:

REST API (on port 8081 by default):

GET /api/status — app state, device info, version
POST /api/start / POST /api/stop — transcription control
GET /api/config / PUT /api/config — read/write settings (dot-notation keys)
GET /api/audio-devices / GET /api/compute-devices — device enumeration
POST /api/reload-engine — reload with new model/device
GET /api/transcriptions / POST /api/clear — transcript management
POST /api/save-file — write text to a file path
GET /api/check-update / POST /api/skip-version — update management
POST /api/login / POST /api/register / GET /api/balance — managed mode proxy
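
As a rough illustration of how a script might talk to these endpoints, here is a minimal sketch using only the standard library. It assumes the API is reachable at localhost on the default port 8081, that responses are JSON, and that PUT /api/config accepts a flat JSON object of dot-notation keys; `build_request` and `call` are hypothetical helper names, not part of the repository.

```python
import json
import urllib.request
from typing import Optional

# Assumption: control API on localhost at the documented default port.
API_BASE = "http://localhost:8081"


def build_request(method: str, path: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a request for one of the endpoints listed above."""
    data = None
    headers = {}
    if payload is not None:
        # Assumption: request bodies are JSON objects.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(API_BASE + path, data=data, headers=headers, method=method)


def call(method: str, path: str, payload: Optional[dict] = None) -> dict:
    """Send the request and decode a JSON response (needs a running backend)."""
    with urllib.request.urlopen(build_request(method, path, payload), timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example usage against a running backend:
#   call("GET", "/api/status")
#   call("PUT", "/api/config", {"remote.mode": "byok"})
#   call("POST", "/api/start")
```

Keeping request construction separate from sending makes the URL and body easy to inspect without a backend running.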

WebSocket /ws/control:

Pushes real-time events: state_changed, transcription, preview, error, credits_low
Client sends keepalive pings
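
One way a client could route these pushed events is a small dispatcher keyed on the event name. The sketch below is illustrative only: the event names come from the list above, but the message envelope (a JSON object with the event name in a `type` field) and the `EventDispatcher` class are assumptions, so check `backend/api_server.py` for the real shape.

```python
import json
from typing import Callable, Dict, List

# Event names are taken from the list above.
KNOWN_EVENTS = {"state_changed", "transcription", "preview", "error", "credits_low"}


class EventDispatcher:
    """Routes decoded /ws/control messages to per-event handlers (hypothetical)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[dict], None]]] = {}

    def on(self, event: str, handler: Callable[[dict], None]) -> None:
        """Register a handler for one of the known event names."""
        if event not in KNOWN_EVENTS:
            raise ValueError(f"unknown event: {event}")
        self._handlers.setdefault(event, []).append(handler)

    def dispatch(self, raw: str) -> bool:
        """Decode one text frame; return True if any handler ran."""
        message = json.loads(raw)
        # Assumption: the event name is carried in a "type" field.
        handlers = self._handlers.get(message.get("type"), [])
        for handler in handlers:
            handler(message)
        return bool(handlers)
```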

The OBS display server runs separately on port 8080 (GET / for HTML, WebSocket /ws for transcriptions).

Backend Process Lifecycle

1. main_headless.py starts, acquires instance lock, creates AppController
2. AppController.initialize() starts the OBS web server (port 8080) and engine init thread
3. APIServer wraps the controller with FastAPI routes, runs on port 8081
4. Backend prints {"event": "ready", "port": 8080} to stdout for Tauri to discover
5. On shutdown: engine stopped, web server stopped, lock released
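
The ready handshake can be sketched as a line parser over the backend's stdout. In the real app the Tauri shell does this discovery; the Python below only illustrates the parsing logic, and `parse_ready_line` is a hypothetical helper. It assumes the ready event is a single JSON object on one line and that other stdout lines may be ordinary log text.

```python
import json
from typing import Optional


def parse_ready_line(line: str) -> Optional[int]:
    """Return the port from a ready event line, or None for any other output."""
    try:
        message = json.loads(line)
    except json.JSONDecodeError:
        return None  # ordinary log output, not JSON
    if not isinstance(message, dict) or message.get("event") != "ready":
        return None
    port = message.get("port")
    return port if isinstance(port, int) else None
```

A launcher could read the sidecar's stdout line by line and stop scanning once this returns a port.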

Headless Backend vs Legacy GUI

The AppController class (backend/app_controller.py) extracts all orchestration logic from gui/main_window_qt.py into a Qt-free class. The mapping:

| Legacy (MainWindow) | Headless (AppController) |
| --- | --- |
| `_initialize_components()` | `_initialize_engine()` |
| `_start_transcription()` | `start_transcription()` |
| `_stop_transcription()` | `stop_transcription()` |
| `_on_settings_saved()` | `apply_settings()` |
| `_reload_engine()` | `reload_engine()` |
| `_start_web_server_if_enabled()` | `_start_web_server()` |
| `_start_server_sync()` | `_start_server_sync()` |
| Qt signals | Callbacks (`on_state_changed`, `on_transcription`, etc.) |
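
To make the last row concrete, the following sketch shows the general shape of replacing Qt signals with plain callback attributes. `MiniController` is a hypothetical stand-in, not the real AppController: the callback names match the table, but their signatures and the state names `"idle"` and `"transcribing"` are assumptions.

```python
from typing import Callable, Optional


class MiniController:
    """Hypothetical stand-in for AppController showing the callback style."""

    def __init__(self) -> None:
        # Callback names follow the table above; signatures are assumed.
        self.on_state_changed: Optional[Callable[[str], None]] = None
        self.on_transcription: Optional[Callable[[str], None]] = None
        self.state = "idle"

    def _set_state(self, state: str) -> None:
        """Record the new state and notify the listener, if one is attached."""
        self.state = state
        if self.on_state_changed is not None:
            self.on_state_changed(state)

    def start_transcription(self) -> None:
        self._set_state("transcribing")

    def stop_transcription(self) -> None:
        self._set_state("idle")

    def emit_text(self, text: str) -> None:
        """Forward a finished transcription to the listener, if any."""
        if self.on_transcription is not None:
            self.on_transcription(text)
```

Because the callbacks are ordinary attributes, an API server can attach its own broadcast functions without importing Qt.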

Threading Model (Headless)

Main thread: Uvicorn (FastAPI) event loop
Engine init thread: Downloads models, initializes VAD
Web server thread: Separate asyncio loop for OBS display
Audio capture: Runs in engine callback threads
All results flow through AppController callbacks -> APIServer WebSocket broadcast
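
Handing results from an engine thread to the asyncio event loop has to be done thread-safely. The sketch below shows one common approach, `loop.call_soon_threadsafe` feeding an `asyncio.Queue`; it is an illustration of the pattern, not the repository's actual implementation, and `collect_results` and `engine_callback` are hypothetical names.

```python
import asyncio
import threading
from typing import List


async def collect_results(count: int) -> List[str]:
    """Receive `count` results produced on a worker thread."""
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()

    def engine_callback(text: str) -> None:
        # Runs on the worker thread; schedule the put on the event loop.
        loop.call_soon_threadsafe(queue.put_nowait, text)

    def worker() -> None:
        # Stand-in for the engine emitting transcription segments.
        for i in range(count):
            engine_callback(f"segment {i}")

    thread = threading.Thread(target=worker)
    thread.start()
    # A real server would broadcast each item to WebSocket clients here.
    received = [await queue.get() for _ in range(count)]
    thread.join()
    return received
```

Calling `queue.put_nowait` directly from the worker thread would not be safe, because `asyncio.Queue` is not thread-safe; scheduling through the loop keeps all queue access on the event loop thread.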

Svelte Frontend

Uses Svelte 5 runes throughout ($state, $derived, $effect, $props). No Svelte 4 patterns.

Stores (src/lib/stores/):

backend.ts — WebSocket connection + REST helpers (apiGet, apiPost, apiPut), auto-reconnect
config.ts — fetches/updates config from backend API
transcriptions.ts — manages transcript list, listens for CustomEvents from backend store

Key patterns:

Backend store dispatches CustomEvents on window for cross-store communication
Settings component collects all changed values into a Record<string, any> with dot-notation keys, sends via PUT /api/config
Controls use the Tauri dialog plugin for native file save and fall back to a blob download
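
On the receiving side, a flat record of dot-notation keys has to be expanded into the nested config. The sketch below shows one plausible way to do that; `apply_dot_updates` is a hypothetical helper, and apart from `remote.mode` the keys in the usage example are invented for illustration.

```python
from typing import Any, Dict


def apply_dot_updates(config: Dict[str, Any], updates: Dict[str, Any]) -> Dict[str, Any]:
    """Apply {"a.b.c": value} style updates to a nested dict in place."""
    for dotted_key, value in updates.items():
        node = config
        *parents, leaf = dotted_key.split(".")
        for part in parents:
            # Create intermediate sections on demand.
            node = node.setdefault(part, {})
        node[leaf] = value
    return config


# Example usage (keys other than remote.mode are made up):
#   cfg = {"remote": {"mode": "local"}}
#   apply_dot_updates(cfg, {"remote.mode": "byok", "display.colors.text": "#fff"})
```

This sketch assumes every intermediate key is either missing or already a dict; a production version would need to handle a scalar sitting where a section is expected.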

CI/CD

Eight Gitea Actions workflows in .gitea/workflows/, split into coordinators and per-OS builders:

App release (Tauri):

release.yml: Coordinator. Triggers on push to main. Auto-bumps version in package.json/tauri.conf.json/Cargo.toml/version.py, commits, tags v{VERSION}, creates Gitea release.
build-app-linux.yml: Triggers on v* tag push or workflow_dispatch. Builds Tauri app, uploads .deb/.rpm/.AppImage.
build-app-windows.yml: Triggers on v* tag push or workflow_dispatch. Builds Tauri app, uploads .msi/*-setup.exe.
build-app-macos.yml: Triggers on v* tag push or workflow_dispatch. Builds Tauri app, uploads .dmg.

Sidecar release (Python backend):

sidecar-release.yml: Coordinator. Triggers on push to main with changes in client/, server/, backend/, pyproject.toml, or local-transcription-headless.spec. Bumps version in pyproject.toml/version.py, tags sidecar-v{VERSION}, creates Gitea release.
build-sidecar-linux.yml: Triggers on sidecar-v* tag push or workflow_dispatch. Builds CUDA + CPU sidecars via PyInstaller.
build-sidecar-windows.yml: Triggers on sidecar-v* tag push or workflow_dispatch. Builds CUDA + CPU sidecars via PyInstaller.
build-sidecar-macos.yml: Triggers on sidecar-v* tag push or workflow_dispatch. Builds CPU-only sidecar via PyInstaller.

All per-OS build workflows can be re-run independently via workflow_dispatch with an optional tag input. All require a BUILD_TOKEN secret (Gitea API token with release write access).
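
The split between the two tag families can be summarised with a small lookup. This sketch uses Python's `fnmatch` to approximate the tag filters, which is an assumption: Gitea Actions applies its own glob rules, so treat this as a mental model rather than a faithful reimplementation. `workflows_for_tag` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from typing import List

# Tag patterns and workflow file names as listed above.
TAG_TRIGGERS = {
    "v*": [
        "build-app-linux.yml",
        "build-app-windows.yml",
        "build-app-macos.yml",
    ],
    "sidecar-v*": [
        "build-sidecar-linux.yml",
        "build-sidecar-windows.yml",
        "build-sidecar-macos.yml",
    ],
}


def workflows_for_tag(tag: str) -> List[str]:
    """Return the build workflows whose tag pattern matches `tag`."""
    matched: List[str] = []
    for pattern, workflows in TAG_TRIGGERS.items():
        if fnmatchcase(tag, pattern):
            matched.extend(workflows)
    return matched
```

Because the patterns are matched against the whole tag, a `sidecar-v...` tag does not also match `v*`, so app and sidecar releases stay independent.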

Common Patterns

Adding a New Setting

1. Add default to config/default_config.yaml
2. Add UI control in src/lib/components/Settings.svelte
3. Ensure the setting is included in the save handler's config update
4. Apply in AppController.apply_settings() or the relevant component
5. For legacy GUI: also update gui/settings_dialog_qt.py

Adding a New API Endpoint

1. Add route in backend/api_server.py _setup_routes()
2. Add supporting logic in backend/app_controller.py if needed
3. Call from Svelte via backendStore.apiGet/apiPost/apiPut

Modifying Transcription Display

Tauri UI: src/lib/components/TranscriptionDisplay.svelte
OBS display: server/web_display.py (HTML in _get_html())
Multi-user display: server/nodejs/server.js (display page in /display route)

Dependencies

Frontend: Tauri v2, Svelte 5, Vite, TypeScript
Backend: Python 3.9+, FastAPI, Uvicorn, RealtimeSTT, faster-whisper, PyTorch (CUDA), sounddevice
Build: PyInstaller (sidecar), Tauri CLI (app), uv (Python packages)
CI: Gitea Actions with platform-specific runners

Platform-Specific Notes

Linux

Tauri needs: libgtk-3-dev, libwebkit2gtk-4.1-dev, libappindicator3-dev, librsvg2-dev, patchelf
Audio: PulseAudio/ALSA via sounddevice

Windows

Tauri needs: WebView2 (usually pre-installed on Windows 10+)
Audio: WASAPI via sounddevice

macOS

Tauri needs: Xcode Command Line Tools
Audio: CoreAudio via sounddevice
GPU: MPS (Apple Silicon) detected by device_utils.py
Info.plist must include NSMicrophoneUsageDescription for mic access
No CUDA builds — CPU/MPS only
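
Across the three platforms, compute device selection comes down to choosing among CUDA, MPS, and CPU. The sketch below shows one plausible preference order (CUDA, then MPS, then CPU); the order and both function names are assumptions, not a description of what `client/device_utils.py` actually does. `torch.cuda.is_available()` and `torch.backends.mps.is_available()` are real PyTorch calls, and the probe falls back gracefully if PyTorch is not installed.

```python
from typing import Tuple


def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Choose a device name; the CUDA > MPS > CPU order is an assumption."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def probe_torch() -> Tuple[bool, bool]:
    """Report (cuda, mps) availability, or (False, False) without PyTorch."""
    try:
        import torch
    except ImportError:
        return (False, False)
    # Older PyTorch builds have no MPS backend, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    return (torch.cuda.is_available(), bool(mps and mps.is_available()))


# Example usage:
#   device = pick_device(*probe_torch())
```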

Related Documentation

README.md — User-facing documentation
BUILD.md — Detailed build instructions
INSTALL.md — Installation guide
server/nodejs/README.md — Node.js server setup
