Update docs for cloud-first UX and shared captions

- README: document cloud-first quick start, shared captions workflow (create room, join via share code, share existing room), and self-hosting option
- README: update default remote.mode from local to byok in config table
- CLAUDE.md: reflect cloud-first default, settings gating, and shared captions features

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 16:10:38 -07:00
parent d220158dd7
commit cd325102e2
2 changed files with 60 additions and 31 deletions
@@ -11,9 +11,11 @@ Local Transcription is a cross-platform desktop application for real-time speech
 **Key Features:**
 - Cross-platform desktop app (Windows, macOS, Linux) via Tauri v2 + Svelte 5
 - Headless Python backend with FastAPI control API
-- Dual transcription modes: local Whisper or cloud Deepgram (managed/BYOK)
+- Cloud-first: defaults to Deepgram (BYOK) transcription; local Whisper also supported
+- Settings UI hides local-only options (model, VAD, timing) when in cloud mode
+- Start button gated on API key / login — shows guidance if not configured
+- Shared Captions: create rooms, share via codes, join with one click (hosted at caption.shadowdao.com)
 - Built-in web server for OBS browser source at `http://localhost:8080`
-- Optional multi-user sync via Node.js server
 - CUDA, MPS (Apple Silicon), and CPU support
 - Auto-updates, custom fonts, configurable colors
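
The Start-button gating listed in the hunk above can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the `RemoteSettings` class, its `api_key` and `logged_in` fields, and the guidance strings are hypothetical; only the three mode values (`local`, `managed`, `byok`) come from the README config table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RemoteSettings:
    # Hypothetical container; field names other than the mode values are assumptions.
    mode: str = "byok"        # "local", "managed", or "byok"
    api_key: str = ""         # assumed: Deepgram key used in BYOK mode
    logged_in: bool = False   # assumed: login state used in managed mode


def start_guidance(settings: RemoteSettings) -> Optional[str]:
    """Return None when Start may be enabled, otherwise a guidance message."""
    if settings.mode == "local":
        # Local Whisper needs neither a key nor a login.
        return None
    if settings.mode == "byok":
        if settings.api_key.strip():
            return None
        return "Enter a Deepgram API key in Settings."
    if settings.mode == "managed":
        if settings.logged_in:
            return None
        return "Log in to use managed transcription."
    return f"Unknown transcription mode: {settings.mode!r}"
```

Returning a message instead of a bare boolean lets the same check both disable the button and supply the guidance text shown to the user.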

@@ -7,14 +7,14 @@ A real-time speech-to-text desktop application for streamers. Runs locally on yo
 ## Features

 - **Real-Time Transcription**: Live speech-to-text using Whisper models with minimal latency
+- **Cloud-First**: Defaults to Deepgram cloud transcription — get started with just an API key
 - **Cross-Platform**: Native desktop app for Windows, macOS, and Linux via [Tauri](https://tauri.app/)
-- **Dual Transcription Modes**: Local (Whisper) or cloud (Deepgram) with managed billing or BYOK
-- **CPU & GPU Support**: Automatic detection of CUDA (NVIDIA), MPS (Apple Silicon), or CPU fallback
-- **Advanced Voice Detection**: Dual-layer VAD (WebRTC + Silero) for accurate speech detection
+- **Dual Transcription Modes**: Cloud (Deepgram) or local (Whisper) with automatic GPU/CPU detection
+- **Shared Captions**: Create a room and share a code so others can join — no server setup needed
 - **OBS Integration**: Built-in web server for browser source capture at `http://localhost:8080`
-- **Multi-User Sync**: Optional Node.js server to sync transcriptions across multiple users
 - **Custom Fonts**: Support for system fonts, web-safe fonts, Google Fonts, and custom font files
 - **Customizable Colors**: User-configurable colors for name, text, and background
+- **Advanced Voice Detection**: Dual-layer VAD (WebRTC + Silero) for accurate speech detection
 - **Noise Suppression**: Built-in audio preprocessing to reduce background noise
 - **Auto-Updates**: Automatic update checking with release notes display

@@ -87,27 +87,30 @@ For detailed build instructions, see [BUILD.md](BUILD.md).

 ## Usage

-### Standalone Mode
+### Quick Setup (Cloud — Recommended)

 1. Launch the application
-2. Select your microphone from the audio device dropdown
-3. Choose a Whisper model (smaller = faster, larger = more accurate):
+2. Open **Settings** — the transcription mode defaults to **Cloud (Deepgram)**
+3. Get a free API key at [console.deepgram.com](https://console.deepgram.com) and paste it in Settings
+4. Select your microphone from the audio device dropdown
+5. Click **Start Transcription**
+6. Transcriptions appear in the main window and at `http://localhost:8080`
+
+> The Start button is disabled until an API key is entered. Local-only settings (model, VAD, timing) are hidden in cloud mode to keep things simple.
+
+### Local Mode (Whisper)
+
+For offline/on-device transcription, switch to **Local (Whisper)** in Settings:
+
+1. Choose a Whisper model (smaller = faster, larger = more accurate):
   - `tiny.en` / `tiny` — Fastest, good for quick captions
   - `base.en` / `base` — Balanced speed and accuracy
   - `small.en` / `small` — Better accuracy
   - `medium.en` / `medium` — High accuracy
   - `large-v3` — Best accuracy (requires more resources)
-4. Click **Start** to begin transcription
-5. Transcriptions appear in the main window and at `http://localhost:8080`
-
-### Remote Transcription (Deepgram)
-
-Instead of local Whisper models, you can use cloud-based transcription:
-
-- **Managed mode**: Sign up via the transcription proxy for metered billing
-- **BYOK mode**: Bring your own Deepgram API key for direct access
-
-Configure in Settings > Remote Transcription.
+2. Select compute device (Auto/CUDA/CPU) and compute type
+3. Tune VAD sensitivity and timing settings as needed
+4. Click **Start Transcription**

 ### OBS Browser Source Setup

@@ -117,19 +120,43 @@ Configure in Settings > Remote Transcription.
 4. Set dimensions (e.g., 1920x300)
 5. Check "Shutdown source when not visible" for performance

-### Multi-User Mode (Optional)
+### Shared Captions (Multi-User)

-For syncing transcriptions across multiple users (e.g., multi-host streams or translation teams):
+Share live captions across multiple users using the hosted service at `https://caption.shadowdao.com/` — no server setup required.

-1. Deploy the Node.js server (see [server/nodejs/README.md](server/nodejs/README.md))
-2. In the app settings, enable **Server Sync**
-3. Enter the server URL (e.g., `http://your-server:3000/api/send`)
-4. Set a room name and passphrase (shared with other users)
-5. In OBS, use the server's display URL with your room name:
+#### Creating a Room
+
+1. Open **Settings** and enable **Shared Captions**
+2. Click **Create Room** — this generates a room name and passphrase automatically
+3. A **share code** is generated and copied to your clipboard
+4. Send the share code to anyone who should join
+
+#### Joining a Room
+
+1. Open **Settings** and enable **Shared Captions**
+2. Paste the share code you received into the **"Paste share code to join"** field
+3. Click **Join** — the server URL, room, and passphrase are auto-filled
+4. Click **Save**
+
+#### Sharing an Existing Room
+
+If you already have a room configured and want to invite others:
+
+1. Open **Settings** and scroll to **Shared Captions**
+2. Click **Share Current Room** — generates a share code from your current config and copies it to the clipboard
+3. Send the code to others
+
+#### OBS Display for Shared Rooms
+
+In OBS, add a Browser source pointing to the server's display URL:
 ```
-   http://your-server:3000/display?room=YOURROOM&timestamps=true&maxlines=50
+https://caption.shadowdao.com/display?room=YOURROOM&timestamps=true&maxlines=50
 ```

+#### Self-Hosting
+
+You can also self-host the sync server. See [server/nodejs/README.md](server/nodejs/README.md) for setup instructions, then enter your own server URL in the Shared Captions settings.
+
 ## Configuration

 Settings are stored at `~/.local-transcription/config.yaml` and can be modified through the GUI settings panel or the REST API.
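
The display URL shown in the Shared Captions hunk above can be assembled programmatically. The sketch below uses only the query parameters that appear in the README (`room`, `timestamps`, `maxlines`); the helper name `display_url` is hypothetical, and whether the server accepts `timestamps=false` is an assumption, since only `true` appears in the source.

```python
from urllib.parse import urlencode


def display_url(base_url: str, room: str, timestamps: bool = True, maxlines: int = 50) -> str:
    """Build a display URL for an OBS Browser source (hypothetical helper)."""
    query = urlencode({
        "room": room,                            # URL-encoded, so spaces are safe
        "timestamps": str(timestamps).lower(),   # "true" / "false" (the latter is assumed)
        "maxlines": maxlines,
    })
    return f"{base_url.rstrip('/')}/display?{query}"


print(display_url("https://caption.shadowdao.com/", "YOURROOM"))
# https://caption.shadowdao.com/display?room=YOURROOM&timestamps=true&maxlines=50
```

For a self-hosted server, pass its base URL instead of the hosted one.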
@@ -144,7 +171,7 @@ Settings are stored at `~/.local-transcription/config.yaml` and can be modified
 | `transcription.silero_sensitivity` | VAD sensitivity (0-1, lower = more sensitive) | `0.4` |
 | `transcription.post_speech_silence_duration` | Silence before finalizing (seconds) | `0.3` |
 | `transcription.continuous_mode` | Fast speaker mode for quick talkers | `false` |
-| `remote.mode` | Transcription mode (local/managed/byok) | `local` |
+| `remote.mode` | Transcription mode (local/managed/byok) | `byok` |
 | `display.show_timestamps` | Show timestamps with transcriptions | `true` |
 | `display.fade_after_seconds` | Fade out time (0 = never) | `10` |
 | `display.font_source` | Font type (System Font/Web-Safe/Google Font/Custom File) | `System Font` |
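
To illustrate the `remote.mode` default change in the hunk above, here is a minimal sketch of resolving the mode from an already-parsed config. It assumes the dotted key `remote.mode` maps to a nested `remote` mapping with a `mode` entry in `config.yaml`; the function and constant names are hypothetical and not taken from the codebase.

```python
VALID_MODES = ("local", "managed", "byok")  # values listed in the config table
DEFAULT_MODE = "byok"                       # new default after this commit


def resolve_mode(config: dict) -> str:
    """Return the transcription mode, falling back to the default when unset."""
    remote = config.get("remote") or {}     # tolerate a missing or empty section
    mode = remote.get("mode", DEFAULT_MODE)
    if mode not in VALID_MODES:
        raise ValueError(f"remote.mode must be one of {VALID_MODES}, got {mode!r}")
    return mode
```

Under this sketch, a config that explicitly sets `remote.mode` to `local` stays in local mode; only configs that omit the key pick up `byok`.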