Update docs for cloud-first UX and shared captions
- README: document cloud-first quick start, shared captions workflow (create room, join via share code, share existing room), and self-hosting option
- README: update default remote.mode from local to byok in config table
- CLAUDE.md: reflect cloud-first default, settings gating, and shared captions features

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -11,9 +11,11 @@ Local Transcription is a cross-platform desktop application for real-time speech
**Key Features:**
- Cross-platform desktop app (Windows, macOS, Linux) via Tauri v2 + Svelte 5
- Headless Python backend with FastAPI control API
- Dual transcription modes: local Whisper or cloud Deepgram (managed/BYOK)
- Cloud-first: defaults to Deepgram (BYOK) transcription; local Whisper also supported
- Settings UI hides local-only options (model, VAD, timing) when in cloud mode
- Start button gated on API key / login — shows guidance if not configured
- Shared Captions: create rooms, share via codes, join with one click (hosted at caption.shadowdao.com)
- Built-in web server for OBS browser source at `http://localhost:8080`
- Optional multi-user sync via Node.js server
- CUDA, MPS (Apple Silicon), and CPU support
- Auto-updates, custom fonts, configurable colors
81
README.md
@@ -7,14 +7,14 @@ A real-time speech-to-text desktop application for streamers. Runs locally on yo
## Features

- **Real-Time Transcription**: Live speech-to-text using Whisper models with minimal latency
- **Cloud-First**: Defaults to Deepgram cloud transcription — get started with just an API key
- **Cross-Platform**: Native desktop app for Windows, macOS, and Linux via [Tauri](https://tauri.app/)
- **Dual Transcription Modes**: Local (Whisper) or cloud (Deepgram) with managed billing or BYOK
- **CPU & GPU Support**: Automatic detection of CUDA (NVIDIA), MPS (Apple Silicon), or CPU fallback
- **Advanced Voice Detection**: Dual-layer VAD (WebRTC + Silero) for accurate speech detection
- **Dual Transcription Modes**: Cloud (Deepgram) or local (Whisper) with automatic GPU/CPU detection
- **Shared Captions**: Create a room and share a code so others can join — no server setup needed
- **OBS Integration**: Built-in web server for browser source capture at `http://localhost:8080`
- **Multi-User Sync**: Optional Node.js server to sync transcriptions across multiple users
- **Custom Fonts**: Support for system fonts, web-safe fonts, Google Fonts, and custom font files
- **Customizable Colors**: User-configurable colors for name, text, and background
- **Advanced Voice Detection**: Dual-layer VAD (WebRTC + Silero) for accurate speech detection
- **Noise Suppression**: Built-in audio preprocessing to reduce background noise
- **Auto-Updates**: Automatic update checking with release notes display
@@ -87,27 +87,30 @@ For detailed build instructions, see [BUILD.md](BUILD.md).
## Usage

### Standalone Mode
### Quick Setup (Cloud — Recommended)

1. Launch the application
2. Select your microphone from the audio device dropdown
3. Choose a Whisper model (smaller = faster, larger = more accurate):
2. Open **Settings** — the transcription mode defaults to **Cloud (Deepgram)**
3. Get a free API key at [console.deepgram.com](https://console.deepgram.com) and paste it in Settings
4. Select your microphone from the audio device dropdown
5. Click **Start Transcription**
6. Transcriptions appear in the main window and at `http://localhost:8080`

> The Start button is disabled until an API key is entered. Local-only settings (model, VAD, timing) are hidden in cloud mode to keep things simple.
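
The gating rule in the note above can be sketched as a small predicate. This is only an illustration: `can_start` is a hypothetical helper, not a function from the app, and the mapping of `byok` to an API key and `managed` to a login is an assumption based on the mode names in the config table.

```python
def can_start(mode: str, api_key: str = "", logged_in: bool = False) -> bool:
    """Return True if the Start button should be enabled (hypothetical helper)."""
    if mode == "local":
        # Local Whisper needs no credentials.
        return True
    if mode == "byok":
        # Assumption: BYOK only needs a non-empty Deepgram API key.
        return bool(api_key.strip())
    if mode == "managed":
        # Assumption: managed billing requires a logged-in account.
        return logged_in
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    print(can_start("byok"))                 # no key entered yet
    print(can_start("byok", api_key="abc"))  # key entered
```

A check like this keeps the "shows guidance if not configured" behaviour in one place, so the UI only has to ask whether to enable the button.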

### Local Mode (Whisper)

For offline/on-device transcription, switch to **Local (Whisper)** in Settings:

1. Choose a Whisper model (smaller = faster, larger = more accurate):
   - `tiny.en` / `tiny` — Fastest, good for quick captions
   - `base.en` / `base` — Balanced speed and accuracy
   - `small.en` / `small` — Better accuracy
   - `medium.en` / `medium` — High accuracy
   - `large-v3` — Best accuracy (requires more resources)
4. Click **Start** to begin transcription
5. Transcriptions appear in the main window and at `http://localhost:8080`

### Remote Transcription (Deepgram)

Instead of local Whisper models, you can use cloud-based transcription:

- **Managed mode**: Sign up via the transcription proxy for metered billing
- **BYOK mode**: Bring your own Deepgram API key for direct access

Configure in Settings > Remote Transcription.
2. Select compute device (Auto/CUDA/CPU) and compute type
3. Tune VAD sensitivity and timing settings as needed
4. Click **Start Transcription**

### OBS Browser Source Setup
@@ -117,19 +120,43 @@ Configure in Settings > Remote Transcription.
4. Set dimensions (e.g., 1920x300)
5. Check "Shutdown source when not visible" for performance

### Multi-User Mode (Optional)
### Shared Captions (Multi-User)

For syncing transcriptions across multiple users (e.g., multi-host streams or translation teams):
Share live captions across multiple users using the hosted service at `https://caption.shadowdao.com/` — no server setup required.

1. Deploy the Node.js server (see [server/nodejs/README.md](server/nodejs/README.md))
2. In the app settings, enable **Server Sync**
3. Enter the server URL (e.g., `http://your-server:3000/api/send`)
4. Set a room name and passphrase (shared with other users)
5. In OBS, use the server's display URL with your room name:
#### Creating a Room

1. Open **Settings** and enable **Shared Captions**
2. Click **Create Room** — this generates a room name and passphrase automatically
3. A **share code** is generated and copied to your clipboard
4. Send the share code to anyone who should join

#### Joining a Room

1. Open **Settings** and enable **Shared Captions**
2. Paste the share code you received into the **"Paste share code to join"** field
3. Click **Join** — the server URL, room, and passphrase are auto-filled
4. Click **Save**

#### Sharing an Existing Room

If you already have a room configured and want to invite others:

1. Open **Settings** and scroll to **Shared Captions**
2. Click **Share Current Room** — generates a share code from your current config and copies it to the clipboard
3. Send the code to others

#### OBS Display for Shared Rooms

In OBS, add a Browser source pointing to the server's display URL:
```
http://your-server:3000/display?room=YOURROOM&timestamps=true&maxlines=50
https://caption.shadowdao.com/display?room=YOURROOM&timestamps=true&maxlines=50
```
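
For rooms whose names contain spaces or other special characters, the display URL is easier to get right when built programmatically. The sketch below is illustrative only: `display_url` is a hypothetical helper, and it assumes the `/display` endpoint takes exactly the `room`, `timestamps`, and `maxlines` query parameters shown in the URL above.

```python
from urllib.parse import urlencode


def display_url(room: str,
                base: str = "https://caption.shadowdao.com",
                timestamps: bool = True,
                maxlines: int = 50) -> str:
    """Build an OBS browser-source URL for a shared room (hypothetical helper)."""
    query = urlencode({
        "room": room,                           # percent-encoded by urlencode
        "timestamps": str(timestamps).lower(),  # "true" / "false"
        "maxlines": maxlines,
    })
    return f"{base.rstrip('/')}/display?{query}"


if __name__ == "__main__":
    print(display_url("YOURROOM"))
```

For a self-hosted server, pass its address as `base`, for example `display_url("YOURROOM", base="http://your-server:3000")`.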

#### Self-Hosting

You can also self-host the sync server. See [server/nodejs/README.md](server/nodejs/README.md) for setup instructions, then enter your own server URL in the Shared Captions settings.

## Configuration

Settings are stored at `~/.local-transcription/config.yaml` and can be modified through the GUI settings panel or the REST API.
@@ -144,7 +171,7 @@ Settings are stored at `~/.local-transcription/config.yaml` and can be modified
| `transcription.silero_sensitivity` | VAD sensitivity (0-1, lower = more sensitive) | `0.4` |
| `transcription.post_speech_silence_duration` | Silence before finalizing (seconds) | `0.3` |
| `transcription.continuous_mode` | Fast speaker mode for quick talkers | `false` |
| `remote.mode` | Transcription mode (local/managed/byok) | `local` |
| `remote.mode` | Transcription mode (local/managed/byok) | `byok` |
| `display.show_timestamps` | Show timestamps with transcriptions | `true` |
| `display.fade_after_seconds` | Fade out time (0 = never) | `10` |
| `display.font_source` | Font type (System Font/Web-Safe/Google Font/Custom File) | `System Font` |
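
As a rough illustration of how the dotted keys in this table could be resolved, the sketch below looks a key up in a user config and falls back to the defaults listed above (using the new `byok` default for `remote.mode`). It assumes the dotted names correspond to nested sections in `config.yaml`; `DEFAULTS` and `get_setting` are hypothetical names, not part of the app.

```python
# Defaults copied from the table rows above; the nested layout is an assumption.
DEFAULTS = {
    "transcription": {
        "silero_sensitivity": 0.4,
        "post_speech_silence_duration": 0.3,
        "continuous_mode": False,
    },
    "remote": {"mode": "byok"},
    "display": {
        "show_timestamps": True,
        "fade_after_seconds": 10,
        "font_source": "System Font",
    },
}


def get_setting(config: dict, dotted_key: str, defaults: dict = DEFAULTS):
    """Resolve 'section.key' in config, falling back to defaults."""
    for source in (config, defaults):
        node = source
        for part in dotted_key.split("."):
            if not isinstance(node, dict) or part not in node:
                break  # not present in this source, try the next one
            node = node[part]
        else:
            return node  # every path segment was found
    raise KeyError(dotted_key)


if __name__ == "__main__":
    user_config = {"remote": {"mode": "local"}}  # user opted into Whisper
    print(get_setting(user_config, "remote.mode"))
    print(get_setting(user_config, "display.fade_after_seconds"))
```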