Update docs for cloud-first UX and shared captions
- README: document cloud-first quick start, shared captions workflow (create room, join via share code, share existing room), and self-hosting option
- README: update default remote.mode from local to byok in config table
- CLAUDE.md: reflect cloud-first default, settings gating, and shared captions features

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -11,9 +11,11 @@ Local Transcription is a cross-platform desktop application for real-time speech
 **Key Features:**
 - Cross-platform desktop app (Windows, macOS, Linux) via Tauri v2 + Svelte 5
 - Headless Python backend with FastAPI control API
-- Dual transcription modes: local Whisper or cloud Deepgram (managed/BYOK)
+- Cloud-first: defaults to Deepgram (BYOK) transcription; local Whisper also supported
+- Settings UI hides local-only options (model, VAD, timing) when in cloud mode
+- Start button gated on API key / login — shows guidance if not configured
+- Shared Captions: create rooms, share via codes, join with one click (hosted at caption.shadowdao.com)
 - Built-in web server for OBS browser source at `http://localhost:8080`
-- Optional multi-user sync via Node.js server
 - CUDA, MPS (Apple Silicon), and CPU support
 - Auto-updates, custom fonts, configurable colors
 
85
README.md
@@ -7,14 +7,14 @@ A real-time speech-to-text desktop application for streamers. Runs locally on yo
 ## Features
 
 - **Real-Time Transcription**: Live speech-to-text using Whisper models with minimal latency
+- **Cloud-First**: Defaults to Deepgram cloud transcription — get started with just an API key
 - **Cross-Platform**: Native desktop app for Windows, macOS, and Linux via [Tauri](https://tauri.app/)
-- **Dual Transcription Modes**: Local (Whisper) or cloud (Deepgram) with managed billing or BYOK
-- **CPU & GPU Support**: Automatic detection of CUDA (NVIDIA), MPS (Apple Silicon), or CPU fallback
-- **Advanced Voice Detection**: Dual-layer VAD (WebRTC + Silero) for accurate speech detection
+- **Dual Transcription Modes**: Cloud (Deepgram) or local (Whisper) with automatic GPU/CPU detection
+- **Shared Captions**: Create a room and share a code so others can join — no server setup needed
 - **OBS Integration**: Built-in web server for browser source capture at `http://localhost:8080`
-- **Multi-User Sync**: Optional Node.js server to sync transcriptions across multiple users
 - **Custom Fonts**: Support for system fonts, web-safe fonts, Google Fonts, and custom font files
 - **Customizable Colors**: User-configurable colors for name, text, and background
+- **Advanced Voice Detection**: Dual-layer VAD (WebRTC + Silero) for accurate speech detection
 - **Noise Suppression**: Built-in audio preprocessing to reduce background noise
 - **Auto-Updates**: Automatic update checking with release notes display
 
@@ -87,27 +87,30 @@ For detailed build instructions, see [BUILD.md](BUILD.md).
 
 ## Usage
 
-### Standalone Mode
+### Quick Setup (Cloud — Recommended)
 
 1. Launch the application
-2. Select your microphone from the audio device dropdown
-3. Choose a Whisper model (smaller = faster, larger = more accurate):
+2. Open **Settings** — the transcription mode defaults to **Cloud (Deepgram)**
+3. Get a free API key at [console.deepgram.com](https://console.deepgram.com) and paste it in Settings
+4. Select your microphone from the audio device dropdown
+5. Click **Start Transcription**
+6. Transcriptions appear in the main window and at `http://localhost:8080`
+
+> The Start button is disabled until an API key is entered. Local-only settings (model, VAD, timing) are hidden in cloud mode to keep things simple.
+
+### Local Mode (Whisper)
+
+For offline/on-device transcription, switch to **Local (Whisper)** in Settings:
+
+1. Choose a Whisper model (smaller = faster, larger = more accurate):
 - `tiny.en` / `tiny` — Fastest, good for quick captions
 - `base.en` / `base` — Balanced speed and accuracy
 - `small.en` / `small` — Better accuracy
 - `medium.en` / `medium` — High accuracy
 - `large-v3` — Best accuracy (requires more resources)
-4. Click **Start** to begin transcription
-5. Transcriptions appear in the main window and at `http://localhost:8080`
-
-### Remote Transcription (Deepgram)
-
-Instead of local Whisper models, you can use cloud-based transcription:
-
-- **Managed mode**: Sign up via the transcription proxy for metered billing
-- **BYOK mode**: Bring your own Deepgram API key for direct access
-
-Configure in Settings > Remote Transcription.
+2. Select compute device (Auto/CUDA/CPU) and compute type
+3. Tune VAD sensitivity and timing settings as needed
+4. Click **Start Transcription**
 
 ### OBS Browser Source Setup
 
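The gating behaviour described in this hunk (Start disabled until credentials exist, local-only settings hidden in cloud mode) could be sketched roughly as follows. This is only an illustration: the mode strings come from the `remote.mode` config row, while `AppState`, `start_blocker`, `visible_settings`, and the guidance messages are hypothetical names, not the application's actual code.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: these identifiers stand in for the "model, VAD, timing" settings.
LOCAL_ONLY_SETTINGS = ("model", "vad", "timing")


@dataclass
class AppState:
    mode: str = "byok"       # documented default for remote.mode
    api_key: str = ""        # Deepgram key, relevant in BYOK mode
    logged_in: bool = False  # account login, relevant in managed mode


def start_blocker(state: AppState) -> Optional[str]:
    """Return guidance text if Start should be disabled, otherwise None."""
    if state.mode == "byok" and not state.api_key.strip():
        return "Enter a Deepgram API key in Settings to start."
    if state.mode == "managed" and not state.logged_in:
        return "Log in to start managed transcription."
    return None  # local mode needs no credentials


def visible_settings(state: AppState, all_settings: List[str]) -> List[str]:
    """Hide local-only settings unless the mode is local."""
    if state.mode == "local":
        return list(all_settings)
    return [name for name in all_settings if name not in LOCAL_ONLY_SETTINGS]
```

Returning the guidance string rather than a boolean lets the UI disable the button and show the reason from a single check.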
@@ -117,18 +120,42 @@ Configure in Settings > Remote Transcription.
 4. Set dimensions (e.g., 1920x300)
 5. Check "Shutdown source when not visible" for performance
 
-### Multi-User Mode (Optional)
+### Shared Captions (Multi-User)
 
-For syncing transcriptions across multiple users (e.g., multi-host streams or translation teams):
+Share live captions across multiple users using the hosted service at `https://caption.shadowdao.com/` — no server setup required.
 
-1. Deploy the Node.js server (see [server/nodejs/README.md](server/nodejs/README.md))
-2. In the app settings, enable **Server Sync**
-3. Enter the server URL (e.g., `http://your-server:3000/api/send`)
-4. Set a room name and passphrase (shared with other users)
-5. In OBS, use the server's display URL with your room name:
-```
-http://your-server:3000/display?room=YOURROOM&timestamps=true&maxlines=50
-```
+#### Creating a Room
+
+1. Open **Settings** and enable **Shared Captions**
+2. Click **Create Room** — this generates a room name and passphrase automatically
+3. A **share code** is generated and copied to your clipboard
+4. Send the share code to anyone who should join
+
+#### Joining a Room
+
+1. Open **Settings** and enable **Shared Captions**
+2. Paste the share code you received into the **"Paste share code to join"** field
+3. Click **Join** — the server URL, room, and passphrase are auto-filled
+4. Click **Save**
+
+#### Sharing an Existing Room
+
+If you already have a room configured and want to invite others:
+
+1. Open **Settings** and scroll to **Shared Captions**
+2. Click **Share Current Room** — generates a share code from your current config and copies it to the clipboard
+3. Send the code to others
+
+#### OBS Display for Shared Rooms
+
+In OBS, add a Browser source pointing to the server's display URL:
+```
+https://caption.shadowdao.com/display?room=YOURROOM&timestamps=true&maxlines=50
+```
+
+#### Self-Hosting
+
+You can also self-host the sync server. See [server/nodejs/README.md](server/nodejs/README.md) for setup instructions, then enter your own server URL in the Shared Captions settings.
 
 ## Configuration
 
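The README text in this hunk says a share code auto-fills the server URL, room, and passphrase, but it does not describe the code's format. As a minimal sketch, assuming the code is URL-safe base64 over a JSON object (an assumption for illustration, not the app's confirmed encoding), the round trip and the OBS display URL could look like this. The function and field names are hypothetical; only the `/display` path and its `room`, `timestamps`, and `maxlines` parameters come from the documented URL.

```python
import base64
import json
from urllib.parse import urlencode

REQUIRED_FIELDS = {"server_url", "room", "passphrase"}


def encode_share_code(server_url: str, room: str, passphrase: str) -> str:
    """Pack the three join fields into one copy-pasteable string (assumed format)."""
    payload = json.dumps(
        {"server_url": server_url, "room": room, "passphrase": passphrase},
        separators=(",", ":"),
    )
    return base64.urlsafe_b64encode(payload.encode("utf-8")).decode("ascii")


def decode_share_code(code: str) -> dict:
    """Unpack a share code; raises ValueError if it is malformed or incomplete."""
    raw = base64.urlsafe_b64decode(code.strip().encode("ascii"))
    fields = json.loads(raw)
    if not isinstance(fields, dict):
        raise ValueError("share code does not contain an object")
    missing = REQUIRED_FIELDS - set(fields)
    if missing:
        raise ValueError(f"share code is missing fields: {sorted(missing)}")
    return fields


def display_url(server_url: str, room: str, timestamps: bool = True, maxlines: int = 50) -> str:
    """Build the browser-source URL for a shared room."""
    query = urlencode(
        {"room": room, "timestamps": str(timestamps).lower(), "maxlines": maxlines}
    )
    return f"{server_url.rstrip('/')}/display?{query}"
```

Under this assumed format the passphrase is only encoded, not encrypted, so anyone holding the share code can read it; the code should be treated like the passphrase itself.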
@@ -144,7 +171,7 @@ Settings are stored at `~/.local-transcription/config.yaml` and can be modified
 | `transcription.silero_sensitivity` | VAD sensitivity (0-1, lower = more sensitive) | `0.4` |
 | `transcription.post_speech_silence_duration` | Silence before finalizing (seconds) | `0.3` |
 | `transcription.continuous_mode` | Fast speaker mode for quick talkers | `false` |
-| `remote.mode` | Transcription mode (local/managed/byok) | `local` |
+| `remote.mode` | Transcription mode (local/managed/byok) | `byok` |
 | `display.show_timestamps` | Show timestamps with transcriptions | `true` |
 | `display.fade_after_seconds` | Fade out time (0 = never) | `10` |
 | `display.font_source` | Font type (System Font/Web-Safe/Google Font/Custom File) | `System Font` |
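To illustrate what the changed default means in practice, here is a hedged sketch of a dotted-key lookup that falls back to the table's defaults. It operates on an already-parsed dictionary, since reading `config.yaml` itself would need a YAML parser. `DEFAULTS`, `get_setting`, and `transcription_mode` are hypothetical helpers, not the application's loader; the keys and default values are copied from the table rows in this hunk.

```python
from typing import Any

# Defaults taken from the config table rows shown in this hunk.
DEFAULTS = {
    "transcription": {
        "silero_sensitivity": 0.4,
        "post_speech_silence_duration": 0.3,
        "continuous_mode": False,
    },
    "remote": {"mode": "byok"},
    "display": {
        "show_timestamps": True,
        "fade_after_seconds": 10,
        "font_source": "System Font",
    },
}
VALID_MODES = {"local", "managed", "byok"}


def get_setting(config: dict, dotted_key: str) -> Any:
    """Resolve e.g. 'remote.mode' in the user config, then in DEFAULTS."""
    for source in (config, DEFAULTS):
        node: Any = source
        for part in dotted_key.split("."):
            if not isinstance(node, dict) or part not in node:
                break  # key path missing in this source; try the next one
            node = node[part]
        else:
            return node
    raise KeyError(dotted_key)


def transcription_mode(config: dict) -> str:
    """Return the effective remote.mode, rejecting unknown values."""
    mode = get_setting(config, "remote.mode")
    if mode not in VALID_MODES:
        raise ValueError(f"unknown remote.mode: {mode!r}")
    return mode
```

With a fallback of this shape, a config file that already sets `remote.mode: local` explicitly would keep local mode, and only configs that omit the key would pick up the new `byok` default.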