Update docs for cloud-first UX and shared captions
All checks were successful
Tests / Python Backend Tests (push) Successful in 5s
Tests / Frontend Tests (push) Successful in 7s
Tests / Rust Sidecar Tests (push) Successful in 2m13s

- README: document cloud-first quick start, shared captions workflow
  (create room, join via share code, share existing room), and
  self-hosting option
- README: update default remote.mode from local to byok in config table
- CLAUDE.md: reflect cloud-first default, settings gating, and shared
  captions features

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Developer
2026-04-10 16:10:38 -07:00
parent d220158dd7
commit cd325102e2
2 changed files with 60 additions and 31 deletions


@@ -11,9 +11,11 @@ Local Transcription is a cross-platform desktop application for real-time speech
**Key Features:**
- Cross-platform desktop app (Windows, macOS, Linux) via Tauri v2 + Svelte 5
- Headless Python backend with FastAPI control API
- Cloud-first: defaults to Deepgram (BYOK) transcription; local Whisper also supported
- Settings UI hides local-only options (model, VAD, timing) when in cloud mode
- Start button gated on API key / login — shows guidance if not configured
- Shared Captions: create rooms, share via codes, join with one click (hosted at caption.shadowdao.com)
- Built-in web server for OBS browser source at `http://localhost:8080`
- CUDA, MPS (Apple Silicon), and CPU support
- Auto-updates, custom fonts, configurable colors


@@ -7,14 +7,14 @@ A real-time speech-to-text desktop application for streamers. Runs locally on yo
## Features

- **Real-Time Transcription**: Live speech-to-text using Whisper models with minimal latency
- **Cloud-First**: Defaults to Deepgram cloud transcription — get started with just an API key
- **Cross-Platform**: Native desktop app for Windows, macOS, and Linux via [Tauri](https://tauri.app/)
- **Dual Transcription Modes**: Cloud (Deepgram) or local (Whisper) with automatic GPU/CPU detection
- **Shared Captions**: Create a room and share a code so others can join — no server setup needed
- **OBS Integration**: Built-in web server for browser source capture at `http://localhost:8080`
- **Custom Fonts**: Support for system fonts, web-safe fonts, Google Fonts, and custom font files
- **Customizable Colors**: User-configurable colors for name, text, and background
- **Advanced Voice Detection**: Dual-layer VAD (WebRTC + Silero) for accurate speech detection
- **Noise Suppression**: Built-in audio preprocessing to reduce background noise
- **Auto-Updates**: Automatic update checking with release notes display
@@ -87,27 +87,30 @@ For detailed build instructions, see [BUILD.md](BUILD.md).
## Usage

### Quick Setup (Cloud — Recommended)

1. Launch the application
2. Open **Settings** — the transcription mode defaults to **Cloud (Deepgram)**
3. Get a free API key at [console.deepgram.com](https://console.deepgram.com) and paste it in Settings
4. Select your microphone from the audio device dropdown
5. Click **Start Transcription**
6. Transcriptions appear in the main window and at `http://localhost:8080`

> The Start button is disabled until an API key is entered. Local-only settings (model, VAD, timing) are hidden in cloud mode to keep things simple.
### Local Mode (Whisper)

For offline/on-device transcription, switch to **Local (Whisper)** in Settings:

1. Choose a Whisper model (smaller = faster, larger = more accurate):
   - `tiny.en` / `tiny` — Fastest, good for quick captions
   - `base.en` / `base` — Balanced speed and accuracy
   - `small.en` / `small` — Better accuracy
   - `medium.en` / `medium` — High accuracy
   - `large-v3` — Best accuracy (requires more resources)
2. Select compute device (Auto/CUDA/CPU) and compute type
3. Tune VAD sensitivity and timing settings as needed
4. Click **Start Transcription**
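The **Auto** compute-device option can be pictured with a small sketch. This is illustrative only, not the app's actual detection code; the real backend would query its ML runtime, while this version uses crude platform heuristics:

```python
# Hypothetical sketch of "Auto" device selection: prefer MPS on Apple
# Silicon, CUDA when an NVIDIA driver is present, otherwise fall back to CPU.
import platform
import shutil

def pick_device() -> str:
    """Guess the best compute device for local Whisper inference."""
    # Apple Silicon Macs expose Metal Performance Shaders (MPS)
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "mps"
    # Crude CUDA probe: nvidia-smi on PATH suggests a usable NVIDIA GPU
    if shutil.which("nvidia-smi"):
        return "cuda"
    return "cpu"

print(pick_device())
```

The result varies by machine; the point is only that "Auto" resolves to one of the three device strings before transcription starts.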
### OBS Browser Source Setup
@@ -117,18 +120,42 @@ Configure in Settings > Remote Transcription.
4. Set dimensions (e.g., 1920x300)
5. Check "Shutdown source when not visible" for performance
### Shared Captions (Multi-User)

Share live captions across multiple users using the hosted service at `https://caption.shadowdao.com/` — no server setup required.

#### Creating a Room

1. Open **Settings** and enable **Shared Captions**
2. Click **Create Room** — this generates a room name and passphrase automatically
3. A **share code** is generated and copied to your clipboard
4. Send the share code to anyone who should join

#### Joining a Room

1. Open **Settings** and enable **Shared Captions**
2. Paste the share code you received into the **"Paste share code to join"** field
3. Click **Join** — the server URL, room, and passphrase are auto-filled
4. Click **Save**
#### Sharing an Existing Room

If you already have a room configured and want to invite others:

1. Open **Settings** and scroll to **Shared Captions**
2. Click **Share Current Room** — generates a share code from your current config and copies it to the clipboard
3. Send the code to others
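Conceptually, a share code just bundles the server URL, room name, and passphrase into one copy-pasteable string. The app's actual encoding is not documented here; a minimal sketch, assuming a hypothetical base64-encoded JSON payload:

```python
# Hypothetical share-code format: base64(JSON) — NOT the app's real encoding,
# just an illustration of packing room details into one string.
import base64
import json

def encode_share_code(server_url: str, room: str, passphrase: str) -> str:
    """Pack room details into a single URL-safe string."""
    payload = json.dumps({"server": server_url, "room": room, "pass": passphrase})
    return base64.urlsafe_b64encode(payload.encode()).decode()

def decode_share_code(code: str) -> dict:
    """Recover the room details from a share code."""
    return json.loads(base64.urlsafe_b64decode(code.encode()).decode())

code = encode_share_code("https://caption.shadowdao.com", "MYROOM", "s3cret")
print(decode_share_code(code)["room"])  # MYROOM
```

Whatever the real format, the workflow above is the same: one party encodes its current config, the other decodes it and auto-fills server, room, and passphrase.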
#### OBS Display for Shared Rooms

In OBS, add a Browser source pointing to the server's display URL:
```
https://caption.shadowdao.com/display?room=YOURROOM&timestamps=true&maxlines=50
```
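If you build display URLs in a helper script, standard URL encoding keeps room names containing spaces or symbols safe. A small sketch (the `display_url` helper and its defaults are illustrative, not part of the app):

```python
# Build the OBS browser-source URL for a shared room, with proper
# percent/plus encoding of the query parameters.
from urllib.parse import urlencode

def display_url(room: str, timestamps: bool = True, maxlines: int = 50,
                base: str = "https://caption.shadowdao.com/display") -> str:
    """Return the display URL for a shared-captions room."""
    query = urlencode({
        "room": room,
        "timestamps": str(timestamps).lower(),
        "maxlines": maxlines,
    })
    return f"{base}?{query}"

print(display_url("YOURROOM"))
# https://caption.shadowdao.com/display?room=YOURROOM&timestamps=true&maxlines=50
```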
#### Self-Hosting

You can also self-host the sync server. See [server/nodejs/README.md](server/nodejs/README.md) for setup instructions, then enter your own server URL in the Shared Captions settings.
## Configuration
@@ -144,7 +171,7 @@ Settings are stored at `~/.local-transcription/config.yaml` and can be modified
| `transcription.silero_sensitivity` | VAD sensitivity (0-1, lower = more sensitive) | `0.4` |
| `transcription.post_speech_silence_duration` | Silence before finalizing (seconds) | `0.3` |
| `transcription.continuous_mode` | Fast speaker mode for quick talkers | `false` |
| `remote.mode` | Transcription mode (local/managed/byok) | `byok` |
| `display.show_timestamps` | Show timestamps with transcriptions | `true` |
| `display.fade_after_seconds` | Fade out time (0 = never) | `10` |
| `display.font_source` | Font type (System Font/Web-Safe/Google Font/Custom File) | `System Font` |
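Putting the cloud-first default together, a minimal `~/.local-transcription/config.yaml` might look like the excerpt below. This assumes the YAML nesting mirrors the dotted key names in the table; only a few keys are shown:

```yaml
remote:
  mode: byok                # local | managed | byok (byok is now the default)
transcription:
  silero_sensitivity: 0.4
  post_speech_silence_duration: 0.3
  continuous_mode: false
display:
  show_timestamps: true
  fade_after_seconds: 10
  font_source: System Font
```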