Update docs for cloud-first UX and shared captions
All checks were successful
Tests / Python Backend Tests (push) Successful in 5s
Tests / Frontend Tests (push) Successful in 7s
Tests / Rust Sidecar Tests (push) Successful in 2m13s

- README: document cloud-first quick start, shared captions workflow
  (create room, join via share code, share existing room), and
  self-hosting option
- README: update default remote.mode from local to byok in config table
- CLAUDE.md: reflect cloud-first default, settings gating, and shared
  captions features

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Developer
2026-04-10 16:10:38 -07:00
parent d220158dd7
commit cd325102e2
2 changed files with 60 additions and 31 deletions


@@ -11,9 +11,11 @@ Local Transcription is a cross-platform desktop application for real-time speech
**Key Features:**
- Cross-platform desktop app (Windows, macOS, Linux) via Tauri v2 + Svelte 5
- Headless Python backend with FastAPI control API
- Cloud-first: defaults to Deepgram (BYOK) transcription; local Whisper also supported
- Settings UI hides local-only options (model, VAD, timing) when in cloud mode
- Start button gated on API key / login — shows guidance if not configured
- Shared Captions: create rooms, share via codes, join with one click (hosted at caption.shadowdao.com)
- Built-in web server for OBS browser source at `http://localhost:8080`
- CUDA, MPS (Apple Silicon), and CPU support
- Auto-updates, custom fonts, configurable colors


@@ -7,14 +7,14 @@ A real-time speech-to-text desktop application for streamers. Runs locally on yo
## Features

- **Real-Time Transcription**: Live speech-to-text using Whisper models with minimal latency
- **Cloud-First**: Defaults to Deepgram cloud transcription — get started with just an API key
- **Cross-Platform**: Native desktop app for Windows, macOS, and Linux via [Tauri](https://tauri.app/)
- **Dual Transcription Modes**: Cloud (Deepgram) or local (Whisper) with automatic GPU/CPU detection
- **Shared Captions**: Create a room and share a code so others can join — no server setup needed
- **OBS Integration**: Built-in web server for browser source capture at `http://localhost:8080`
- **Custom Fonts**: Support for system fonts, web-safe fonts, Google Fonts, and custom font files
- **Customizable Colors**: User-configurable colors for name, text, and background
- **Advanced Voice Detection**: Dual-layer VAD (WebRTC + Silero) for accurate speech detection
- **Noise Suppression**: Built-in audio preprocessing to reduce background noise
- **Auto-Updates**: Automatic update checking with release notes display
@@ -87,27 +87,30 @@ For detailed build instructions, see [BUILD.md](BUILD.md).
## Usage

### Quick Setup (Cloud — Recommended)

1. Launch the application
2. Open **Settings** — the transcription mode defaults to **Cloud (Deepgram)**
3. Get a free API key at [console.deepgram.com](https://console.deepgram.com) and paste it in Settings
4. Select your microphone from the audio device dropdown
5. Click **Start Transcription**
6. Transcriptions appear in the main window and at `http://localhost:8080`

> The Start button is disabled until an API key is entered. Local-only settings (model, VAD, timing) are hidden in cloud mode to keep things simple.
### Local Mode (Whisper)

For offline/on-device transcription, switch to **Local (Whisper)** in Settings:

1. Choose a Whisper model (smaller = faster, larger = more accurate):
   - `tiny.en` / `tiny` — Fastest, good for quick captions
   - `base.en` / `base` — Balanced speed and accuracy
   - `small.en` / `small` — Better accuracy
   - `medium.en` / `medium` — High accuracy
   - `large-v3` — Best accuracy (requires more resources)
2. Select compute device (Auto/CUDA/CPU) and compute type
3. Tune VAD sensitivity and timing settings as needed
4. Click **Start Transcription**
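The **Auto** compute-device option can be pictured with a small sketch. This is illustrative only, not the app's actual detection code; the real backend would query its ML runtime, while this version uses crude platform heuristics:

```python
# Hypothetical sketch of "Auto" device selection: prefer MPS on Apple
# Silicon, CUDA when an NVIDIA driver is present, otherwise fall back to CPU.
import platform
import shutil

def pick_device() -> str:
    """Guess the best compute device for local Whisper inference."""
    # Apple Silicon Macs expose Metal Performance Shaders (MPS)
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "mps"
    # Crude CUDA probe: nvidia-smi on PATH suggests a usable NVIDIA GPU
    if shutil.which("nvidia-smi"):
        return "cuda"
    return "cpu"

print(pick_device())
```

The result varies by machine; the point is only that "Auto" resolves to one of the three device strings before transcription starts.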
### OBS Browser Source Setup
@@ -117,18 +120,42 @@ Configure in Settings > Remote Transcription.
4. Set dimensions (e.g., 1920x300)
5. Check "Shutdown source when not visible" for performance
### Shared Captions (Multi-User)

Share live captions across multiple users using the hosted service at `https://caption.shadowdao.com/` — no server setup required.

#### Creating a Room

1. Open **Settings** and enable **Shared Captions**
2. Click **Create Room** — this generates a room name and passphrase automatically
3. A **share code** is generated and copied to your clipboard
4. Send the share code to anyone who should join

#### Joining a Room

1. Open **Settings** and enable **Shared Captions**
2. Paste the share code you received into the **"Paste share code to join"** field
3. Click **Join** — the server URL, room, and passphrase are auto-filled
4. Click **Save**
#### Sharing an Existing Room

If you already have a room configured and want to invite others:

1. Open **Settings** and scroll to **Shared Captions**
2. Click **Share Current Room** — generates a share code from your current config and copies it to the clipboard
3. Send the code to others
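Conceptually, a share code just bundles the server URL, room name, and passphrase into one copy-pasteable string. The app's actual encoding is not documented here; a minimal sketch, assuming a hypothetical base64-encoded JSON payload:

```python
# Hypothetical share-code format: base64(JSON) — NOT the app's real encoding,
# just an illustration of packing room details into one string.
import base64
import json

def encode_share_code(server_url: str, room: str, passphrase: str) -> str:
    """Pack room details into a single URL-safe string."""
    payload = json.dumps({"server": server_url, "room": room, "pass": passphrase})
    return base64.urlsafe_b64encode(payload.encode()).decode()

def decode_share_code(code: str) -> dict:
    """Recover the room details from a share code."""
    return json.loads(base64.urlsafe_b64decode(code.encode()).decode())

code = encode_share_code("https://caption.shadowdao.com", "MYROOM", "s3cret")
print(decode_share_code(code)["room"])  # MYROOM
```

Whatever the real format, the workflow above is the same: one party encodes its current config, the other decodes it and auto-fills server, room, and passphrase.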
#### OBS Display for Shared Rooms

In OBS, add a Browser source pointing to the server's display URL:
```
https://caption.shadowdao.com/display?room=YOURROOM&timestamps=true&maxlines=50
```
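If you build display URLs in a helper script, standard URL encoding keeps room names containing spaces or symbols safe. A small sketch (the `display_url` helper and its defaults are illustrative, not part of the app):

```python
# Build the OBS browser-source URL for a shared room, with proper
# percent/plus encoding of the query parameters.
from urllib.parse import urlencode

def display_url(room: str, timestamps: bool = True, maxlines: int = 50,
                base: str = "https://caption.shadowdao.com/display") -> str:
    """Return the display URL for a shared-captions room."""
    query = urlencode({
        "room": room,
        "timestamps": str(timestamps).lower(),
        "maxlines": maxlines,
    })
    return f"{base}?{query}"

print(display_url("YOURROOM"))
# https://caption.shadowdao.com/display?room=YOURROOM&timestamps=true&maxlines=50
```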
#### Self-Hosting

You can also self-host the sync server. See [server/nodejs/README.md](server/nodejs/README.md) for setup instructions, then enter your own server URL in the Shared Captions settings.
## Configuration
@@ -144,7 +171,7 @@ Settings are stored at `~/.local-transcription/config.yaml` and can be modified
| `transcription.silero_sensitivity` | VAD sensitivity (0-1, lower = more sensitive) | `0.4` |
| `transcription.post_speech_silence_duration` | Silence before finalizing (seconds) | `0.3` |
| `transcription.continuous_mode` | Fast speaker mode for quick talkers | `false` |
| `remote.mode` | Transcription mode (local/managed/byok) | `byok` |
| `display.show_timestamps` | Show timestamps with transcriptions | `true` |
| `display.fade_after_seconds` | Fade out time (0 = never) | `10` |
| `display.font_source` | Font type (System Font/Web-Safe/Google Font/Custom File) | `System Font` |
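Putting the cloud-first default together, a minimal `~/.local-transcription/config.yaml` might look like the excerpt below. This assumes the YAML nesting mirrors the dotted key names in the table; only a few keys are shown:

```yaml
remote:
  mode: byok                # local | managed | byok (byok is now the default)
transcription:
  silero_sensitivity: 0.4
  post_speech_silence_duration: 0.3
  continuous_mode: false
display:
  show_timestamps: true
  fade_after_seconds: 10
  font_source: System Font
```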