From cd325102e29cc3ae8d4a21302c52ae839acf667f Mon Sep 17 00:00:00 2001 From: Developer Date: Fri, 10 Apr 2026 16:10:38 -0700 Subject: [PATCH] Update docs for cloud-first UX and shared captions - README: document cloud-first quick start, shared captions workflow (create room, join via share code, share existing room), and self-hosting option - README: update default remote.mode from local to byok in config table - CLAUDE.md: reflect cloud-first default, settings gating, and shared captions features Co-Authored-By: Claude Opus 4.6 (1M context) --- CLAUDE.md | 6 ++-- README.md | 85 ++++++++++++++++++++++++++++++++++++------------------- 2 files changed, 60 insertions(+), 31 deletions(-) diff --git a/CLAUDE.md b/CLAUDE.md index b4b7a59..8c6ba6d 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -11,9 +11,11 @@ Local Transcription is a cross-platform desktop application for real-time speech **Key Features:** - Cross-platform desktop app (Windows, macOS, Linux) via Tauri v2 + Svelte 5 - Headless Python backend with FastAPI control API -- Dual transcription modes: local Whisper or cloud Deepgram (managed/BYOK) +- Cloud-first: defaults to Deepgram (BYOK) transcription; local Whisper also supported +- Settings UI hides local-only options (model, VAD, timing) when in cloud mode +- Start button gated on API key / login — shows guidance if not configured +- Shared Captions: create rooms, share via codes, join with one click (hosted at caption.shadowdao.com) - Built-in web server for OBS browser source at `http://localhost:8080` -- Optional multi-user sync via Node.js server - CUDA, MPS (Apple Silicon), and CPU support - Auto-updates, custom fonts, configurable colors diff --git a/README.md b/README.md index 987d3cf..005f9bb 100644 --- a/README.md +++ b/README.md @@ -7,14 +7,14 @@ A real-time speech-to-text desktop application for streamers. 
Runs locally on yo ## Features - **Real-Time Transcription**: Live speech-to-text using Whisper models with minimal latency +- **Cloud-First**: Defaults to Deepgram cloud transcription — get started with just an API key - **Cross-Platform**: Native desktop app for Windows, macOS, and Linux via [Tauri](https://tauri.app/) -- **Dual Transcription Modes**: Local (Whisper) or cloud (Deepgram) with managed billing or BYOK -- **CPU & GPU Support**: Automatic detection of CUDA (NVIDIA), MPS (Apple Silicon), or CPU fallback -- **Advanced Voice Detection**: Dual-layer VAD (WebRTC + Silero) for accurate speech detection +- **Dual Transcription Modes**: Cloud (Deepgram) or local (Whisper) with automatic GPU/CPU detection +- **Shared Captions**: Create a room and share a code so others can join — no server setup needed - **OBS Integration**: Built-in web server for browser source capture at `http://localhost:8080` -- **Multi-User Sync**: Optional Node.js server to sync transcriptions across multiple users - **Custom Fonts**: Support for system fonts, web-safe fonts, Google Fonts, and custom font files - **Customizable Colors**: User-configurable colors for name, text, and background +- **Advanced Voice Detection**: Dual-layer VAD (WebRTC + Silero) for accurate speech detection - **Noise Suppression**: Built-in audio preprocessing to reduce background noise - **Auto-Updates**: Automatic update checking with release notes display @@ -87,27 +87,30 @@ For detailed build instructions, see [BUILD.md](BUILD.md). ## Usage -### Standalone Mode +### Quick Setup (Cloud — Recommended) 1. Launch the application -2. Select your microphone from the audio device dropdown -3. Choose a Whisper model (smaller = faster, larger = more accurate): +2. Open **Settings** — the transcription mode defaults to **Cloud (Deepgram)** +3. Get a free API key at [console.deepgram.com](https://console.deepgram.com) and paste it in Settings +4. Select your microphone from the audio device dropdown +5. 
Click **Start Transcription** +6. Transcriptions appear in the main window and at `http://localhost:8080` + +> The Start button is disabled until an API key is entered. Local-only settings (model, VAD, timing) are hidden in cloud mode to keep things simple. + +### Local Mode (Whisper) + +For offline/on-device transcription, switch to **Local (Whisper)** in Settings: + +1. Choose a Whisper model (smaller = faster, larger = more accurate): - `tiny.en` / `tiny` — Fastest, good for quick captions - `base.en` / `base` — Balanced speed and accuracy - `small.en` / `small` — Better accuracy - `medium.en` / `medium` — High accuracy - `large-v3` — Best accuracy (requires more resources) -4. Click **Start** to begin transcription -5. Transcriptions appear in the main window and at `http://localhost:8080` - -### Remote Transcription (Deepgram) - -Instead of local Whisper models, you can use cloud-based transcription: - -- **Managed mode**: Sign up via the transcription proxy for metered billing -- **BYOK mode**: Bring your own Deepgram API key for direct access - -Configure in Settings > Remote Transcription. +2. Select compute device (Auto/CUDA/CPU) and compute type +3. Tune VAD sensitivity and timing settings as needed +4. Click **Start Transcription** ### OBS Browser Source Setup @@ -117,18 +120,42 @@ Configure in Settings > Remote Transcription. 4. Set dimensions (e.g., 1920x300) 5. Check "Shutdown source when not visible" for performance -### Multi-User Mode (Optional) +### Shared Captions (Multi-User) -For syncing transcriptions across multiple users (e.g., multi-host streams or translation teams): +Share live captions across multiple users using the hosted service at `https://caption.shadowdao.com/` — no server setup required. -1. Deploy the Node.js server (see [server/nodejs/README.md](server/nodejs/README.md)) -2. In the app settings, enable **Server Sync** -3. Enter the server URL (e.g., `http://your-server:3000/api/send`) -4. 
Set a room name and passphrase (shared with other users) -5. In OBS, use the server's display URL with your room name: - ``` - http://your-server:3000/display?room=YOURROOM&timestamps=true&maxlines=50 - ``` +#### Creating a Room + +1. Open **Settings** and enable **Shared Captions** +2. Click **Create Room** — this generates a room name and passphrase automatically +3. A **share code** is generated and copied to your clipboard +4. Send the share code to anyone who should join + +#### Joining a Room + +1. Open **Settings** and enable **Shared Captions** +2. Paste the share code you received into the **"Paste share code to join"** field +3. Click **Join** — the server URL, room, and passphrase are auto-filled +4. Click **Save** + +#### Sharing an Existing Room + +If you already have a room configured and want to invite others: + +1. Open **Settings** and scroll to **Shared Captions** +2. Click **Share Current Room** — generates a share code from your current config and copies it to the clipboard +3. Send the code to others + +#### OBS Display for Shared Rooms + +In OBS, add a Browser source pointing to the server's display URL: +``` +https://caption.shadowdao.com/display?room=YOURROOM&timestamps=true&maxlines=50 +``` + +#### Self-Hosting + +You can also self-host the sync server. See [server/nodejs/README.md](server/nodejs/README.md) for setup instructions, then enter your own server URL in the Shared Captions settings. 
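The display URL takes `room`, `timestamps`, and `maxlines` as query parameters. A minimal sketch of building it programmatically (the `display_url` helper is hypothetical, not part of the app; parameter names are taken from the URL documented above):

```python
from urllib.parse import urlencode

def display_url(room: str, timestamps: bool = True, maxlines: int = 50,
                base: str = "https://caption.shadowdao.com/display") -> str:
    """Build a browser-source display URL for a shared-captions room."""
    # Query keys mirror the documented URL: room, timestamps, maxlines.
    params = {"room": room,
              "timestamps": str(timestamps).lower(),  # "true" / "false"
              "maxlines": maxlines}
    return f"{base}?{urlencode(params)}"

print(display_url("YOURROOM"))
# -> https://caption.shadowdao.com/display?room=YOURROOM&timestamps=true&maxlines=50
```

Using `urlencode` keeps room names with spaces or special characters valid in the URL.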
## Configuration @@ -144,7 +171,7 @@ Settings are stored at `~/.local-transcription/config.yaml` and can be modified | `transcription.silero_sensitivity` | VAD sensitivity (0-1, lower = more sensitive) | `0.4` | | `transcription.post_speech_silence_duration` | Silence before finalizing (seconds) | `0.3` | | `transcription.continuous_mode` | Fast speaker mode for quick talkers | `false` | -| `remote.mode` | Transcription mode (local/managed/byok) | `local` | +| `remote.mode` | Transcription mode (local/managed/byok) | `byok` | | `display.show_timestamps` | Show timestamps with transcriptions | `true` | | `display.fade_after_seconds` | Fade out time (0 = never) | `10` | | `display.font_source` | Font type (System Font/Web-Safe/Google Font/Custom File) | `System Font` |
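Assuming the dotted keys in the table map onto nested YAML sections, a sketch of `~/.local-transcription/config.yaml` using the documented defaults (with `remote.mode` at its new `byok` default) might look like:

```yaml
# Sketch only — keys and defaults taken from the configuration table above.
transcription:
  silero_sensitivity: 0.4              # 0-1, lower = more sensitive
  post_speech_silence_duration: 0.3    # seconds before finalizing
  continuous_mode: false               # fast speaker mode
remote:
  mode: byok                           # local | managed | byok
display:
  show_timestamps: true
  fade_after_seconds: 10               # 0 = never fade
  font_source: System Font             # System Font | Web-Safe | Google Font | Custom File
```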