Local Transcription
A real-time speech-to-text desktop application for streamers. Runs locally on your machine with GPU or CPU, displays transcriptions via OBS browser source, and optionally syncs with other users through a multi-user server.
Version 1.4.0
Features
- Real-Time Transcription: Live speech-to-text using Whisper models with minimal latency
- Cloud-First: Defaults to Deepgram cloud transcription — get started with just an API key
- Cross-Platform: Native desktop app for Windows, macOS, and Linux via Tauri
- Dual Transcription Modes: Cloud (Deepgram) or local (Whisper) with automatic GPU/CPU detection
- Shared Captions: Create a room and share a code so others can join — no server setup needed
- OBS Integration: Built-in web server for browser source capture at http://localhost:8080
- Custom Fonts: Support for system fonts, web-safe fonts, Google Fonts, and custom font files
- Customizable Colors: User-configurable colors for name, text, and background
- Advanced Voice Detection: Dual-layer VAD (WebRTC + Silero) for accurate speech detection
- Noise Suppression: Built-in audio preprocessing to reduce background noise
- Auto-Updates: Automatic update checking with release notes display
Architecture
The application uses a two-process architecture:
- Tauri Shell (Svelte 5 frontend) — lightweight native window (~50MB) rendering the UI
- Python Backend (sidecar) — headless process running transcription, audio capture, and the OBS web server
The Tauri frontend communicates with the Python backend via REST API and WebSocket, following the same pattern as voice-to-notes.
```
Tauri App (user launches this)
└─ Spawns Python backend as sidecar
├─ FastAPI REST API (control endpoints)
├─ WebSocket /ws/control (real-time state + transcriptions)
├─ OBS web display at http://localhost:8080
└─ Transcription engine (Whisper or Deepgram)
```
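Any WebSocket client can watch the control channel. Below is a minimal sketch using the third-party websockets package; the /ws/control path comes from the diagram above, but the port and the JSON message format are assumptions (check the backend's startup output):

```python
# Minimal sketch: subscribe to backend state and transcription events.
# The /ws/control path is from the architecture diagram; the port and
# JSON encoding are assumptions, not confirmed by the project docs.
import asyncio
import json

import websockets  # pip install websockets


async def watch_control(url: str = "ws://127.0.0.1:8765/ws/control") -> None:
    async with websockets.connect(url) as ws:
        async for message in ws:
            event = json.loads(message)  # state updates and transcriptions
            print(event)


asyncio.run(watch_control())
```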
Legacy GUI: The original PySide6/Qt desktop GUI (main.py) still works alongside the new Tauri frontend during the transition period.
Quick Start
Running from Source
```bash
# Install Python dependencies
uv sync
# Run the Tauri app (frontend + backend)
npm install
npm run tauri dev
# Or run just the headless backend (for development)
uv run python -m backend.main_headless
# Or run the legacy PySide6 GUI
uv run python main.py
```
Using Pre-Built Executables
Download the latest release from the releases page:
- App installer (Tauri shell): .msi (Windows), .dmg (macOS), .deb/.rpm/.AppImage (Linux)
- Sidecar (Python backend): Download the matching sidecar-*.zip for your platform (CUDA or CPU)
Building from Source
```bash
# Build the Tauri app
npm install
npm run tauri build
# Output: src-tauri/target/release/bundle/
# Build the Python sidecar (headless, no Qt)
uv sync
uv run pyinstaller local-transcription-headless.spec
# Output: dist/local-transcription-backend/
# Build the legacy PySide6 app (Linux)
./build.sh
# Build the legacy PySide6 app (Windows)
build.bat
```
For detailed build instructions, see BUILD.md.
Usage
Quick Setup (Cloud — Recommended)
- Launch the application
- Open Settings — the transcription mode defaults to Cloud (Deepgram)
- Get a free API key at console.deepgram.com and paste it in Settings
- Select your microphone from the audio device dropdown
- Click Start Transcription
- Transcriptions appear in the main window and at http://localhost:8080
The Start button is disabled until an API key is entered. Local-only settings (model, VAD, timing) are hidden in cloud mode to keep things simple.
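If you want to verify a key outside the app, Deepgram's prerecorded REST endpoint accepts raw audio with a Token header. A hedged sketch (the file name is a placeholder, and this is not the app's own code path):

```python
# Minimal sketch: sanity-check a Deepgram API key against the
# prerecorded /v1/listen endpoint. "sample.wav" is a placeholder.
import requests  # pip install requests

with open("sample.wav", "rb") as audio:
    resp = requests.post(
        "https://api.deepgram.com/v1/listen",
        headers={
            "Authorization": "Token YOUR_DEEPGRAM_API_KEY",
            "Content-Type": "audio/wav",
        },
        data=audio,
    )
resp.raise_for_status()
print(resp.json()["results"]["channels"][0]["alternatives"][0]["transcript"])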
Local Mode (Whisper)
For offline/on-device transcription, switch to Local (Whisper) in Settings:
- Choose a Whisper model (smaller = faster, larger = more accurate):
  - tiny.en / tiny — Fastest, good for quick captions
  - base.en / base — Balanced speed and accuracy
  - small.en / small — Better accuracy
  - medium.en / medium — High accuracy
  - large-v3 — Best accuracy (requires more resources)
- Select compute device (Auto/CUDA/CPU) and compute type (see the sketch after this list)
- Tune VAD sensitivity and timing settings as needed
- Click Start Transcription
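These options map directly onto faster-whisper, the library behind local mode. The sketch below shows the same trade-offs outside the app; the audio file name is a placeholder:

```python
# Minimal sketch using faster-whisper directly (the app's local engine).
# Smaller models are faster; "auto" picks CUDA when available, else CPU.
from faster_whisper import WhisperModel  # pip install faster-whisper

model = WhisperModel("base.en", device="auto", compute_type="int8")
segments, info = model.transcribe("sample.wav", vad_filter=True)
for seg in segments:
    print(f"[{seg.start:.1f}s -> {seg.end:.1f}s] {seg.text.strip()}")
```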
OBS Browser Source Setup
- Start the Local Transcription app
- In OBS, add a Browser source
- Set URL to http://localhost:8080
- Set dimensions (e.g., 1920x300)
- Check "Shutdown source when not visible" for performance
Shared Captions (Multi-User)
Share live captions across multiple users using the hosted service at https://caption.shadowdao.com/ — no server setup required.
Creating a Room
- Open Settings and enable Shared Captions
- Click Create Room — this generates a room name and passphrase automatically
- A share code is generated and copied to your clipboard
- Send the share code to anyone who should join
Joining a Room
- Open Settings and enable Shared Captions
- Paste the share code you received into the "Paste share code to join" field
- Click Join — the server URL, room, and passphrase are auto-filled
- Click Save
Sharing an Existing Room
If you already have a room configured and want to invite others:
- Open Settings and scroll to Shared Captions
- Click Share Current Room — generates a share code from your current config and copies it to the clipboard
- Send the code to others
OBS Display for Shared Rooms
In OBS, add a Browser source pointing to the server's display URL:
https://caption.shadowdao.com/display?room=YOURROOM&timestamps=true&maxlines=50
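If you generate the source URL from a script, standard URL encoding keeps room names with special characters safe (the room name below is a placeholder):

```python
# Minimal sketch: build the shared-room display URL safely.
from urllib.parse import urlencode

params = {"room": "YOURROOM", "timestamps": "true", "maxlines": 50}
print("https://caption.shadowdao.com/display?" + urlencode(params))
```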
Self-Hosting
You can also self-host the sync server. See server/nodejs/README.md for setup instructions, then enter your own server URL in the Shared Captions settings.
Configuration
Settings are stored at ~/.local-transcription/config.yaml and can be modified through the GUI settings panel or the REST API.
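For scripted setups you can also edit the file directly. A minimal sketch assuming the PyYAML package; key names follow the table below:

```python
# Minimal sketch: change a setting in config.yaml from a script.
# Restart the app (or use its REST API) for the change to take effect.
from pathlib import Path

import yaml  # pip install pyyaml

cfg_path = Path.home() / ".local-transcription" / "config.yaml"
cfg = yaml.safe_load(cfg_path.read_text()) or {}
cfg.setdefault("transcription", {})["model"] = "small.en"
cfg_path.write_text(yaml.safe_dump(cfg))
```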
Key Settings
| Setting | Description | Default |
|---|---|---|
| transcription.model | Whisper model to use | base.en |
| transcription.device | Processing device (auto/cuda/cpu) | auto |
| transcription.enable_realtime_transcription | Show preview while speaking | false |
| transcription.silero_sensitivity | VAD sensitivity (0-1, lower = more sensitive) | 0.4 |
| transcription.post_speech_silence_duration | Silence before finalizing (seconds) | 0.3 |
| transcription.continuous_mode | Fast speaker mode for quick talkers | false |
| remote.mode | Transcription mode (local/managed/byok) | byok |
| display.show_timestamps | Show timestamps with transcriptions | true |
| display.fade_after_seconds | Fade out time (0 = never) | 10 |
| display.font_source | Font type (System Font/Web-Safe/Google Font/Custom File) | System Font |
| web_server.port | Local web server port | 8080 |
See config/default_config.yaml for all available options.
Project Structure
```
local-transcription/
├── src/ # Svelte 5 frontend (Tauri UI)
│ ├── App.svelte # Main app shell
│ ├── lib/components/ # UI components
│ │ ├── Header.svelte
│ │ ├── StatusBar.svelte
│ │ ├── Controls.svelte
│ │ ├── TranscriptionDisplay.svelte
│ │ └── Settings.svelte
│ └── lib/stores/ # Reactive state management
│ ├── backend.ts # WebSocket + REST API client
│ ├── config.ts # App configuration
│ └── transcriptions.ts # Transcription data
├── src-tauri/ # Tauri v2 Rust shell
│ ├── src/main.rs
│ └── tauri.conf.json
├── backend/ # Headless Python backend (sidecar)
│ ├── app_controller.py # Orchestration logic (engine, sync, config)
│ ├── api_server.py # FastAPI REST + WebSocket control API
│ └── main_headless.py # Headless entry point
├── client/ # Core transcription modules
│ ├── audio_capture.py # Audio input handling
│ ├── transcription_engine_realtime.py # RealtimeSTT / Whisper
│ ├── deepgram_transcription.py # Deepgram cloud transcription
│ ├── noise_suppression.py # VAD and noise reduction
│ ├── device_utils.py # CPU/GPU/MPS detection
│ ├── config.py # Configuration management
│ ├── server_sync.py # Multi-user server client
│ └── update_checker.py # Auto-update functionality
├── gui/ # Legacy PySide6/Qt GUI
│ ├── main_window_qt.py
│ ├── settings_dialog_qt.py
│ └── transcription_display_qt.py
├── server/ # Web servers
│ ├── web_display.py # Local FastAPI server for OBS
│ └── nodejs/ # Multi-user sync server
├── .gitea/workflows/ # CI/CD
│ ├── release.yml # Tauri app builds (all platforms)
│ └── build-sidecar.yml # Python sidecar builds (CUDA + CPU)
├── config/
│ └── default_config.yaml # Default settings template
├── main.py # Legacy GUI entry point
├── main_cli.py # CLI version (for testing)
├── local-transcription.spec # PyInstaller config (legacy, with PySide6)
├── local-transcription-headless.spec # PyInstaller config (headless sidecar)
├── pyproject.toml # Python dependencies
└── package.json # Node.js / Tauri dependencies
```
Technology Stack
Frontend (Tauri)
- Tauri v2 — Native cross-platform shell (Rust)
- Svelte 5 — Reactive UI framework (TypeScript)
- Vite — Frontend build tool
Backend (Python Sidecar)
- Python 3.9+
- FastAPI + Uvicorn — REST API and WebSocket server
- RealtimeSTT — Real-time speech-to-text with advanced VAD
- faster-whisper — Optimized Whisper model inference (CTranslate2)
- PyTorch — ML framework (CUDA-enabled builds available)
- sounddevice — Cross-platform audio capture
- webrtcvad + silero_vad — Voice activity detection
Multi-User Server (Optional)
- Node.js + Express + WebSocket — Real-time sync server
Build & CI/CD
- PyInstaller — Python sidecar packaging
- Tauri CLI — App bundling (.msi, .dmg, .deb, .rpm, .AppImage)
- Gitea Actions — Automated cross-platform builds
- uv — Fast Python package manager
CI/CD
Two Gitea Actions workflows in .gitea/workflows/:
| Workflow | Trigger | Produces |
|---|---|---|
| release.yml | Push to main | Tauri app installers for all platforms |
| build-sidecar.yml | Changes to client/, server/, backend/, or pyproject.toml | Python sidecar zips (CUDA + CPU) |
Both workflows require a BUILD_TOKEN secret in the repo settings (Gitea API token with release write access).
Release Artifacts
| Platform | App Installer | Sidecar (CUDA) | Sidecar (CPU) |
|---|---|---|---|
| Linux x86_64 | .deb, .rpm, .AppImage | sidecar-linux-x86_64-cuda.zip | sidecar-linux-x86_64-cpu.zip |
| Windows x86_64 | .msi, -setup.exe | sidecar-windows-x86_64-cuda.zip | sidecar-windows-x86_64-cpu.zip |
| macOS ARM64 | .dmg | — | sidecar-macos-aarch64-cpu.zip |
System Requirements
Minimum
- 4GB RAM
- Any modern CPU
Recommended (for local real-time transcription)
- 8GB+ RAM
- NVIDIA GPU with CUDA support (for GPU acceleration)
For Building
- Tauri app: Node.js 20+, Rust stable, platform SDK (see Tauri prerequisites)
- Python sidecar: Python 3.9+, uv, PyInstaller
- Linux: libgtk-3-dev, libwebkit2gtk-4.1-dev, libappindicator3-dev, librsvg2-dev, patchelf
- Windows: Visual Studio Build Tools, WebView2
- macOS: Xcode Command Line Tools
Troubleshooting
macOS: "App is damaged and can't be opened"
macOS Gatekeeper blocks unsigned applications. Since the app is not yet signed with an Apple Developer certificate, you need to remove the quarantine flag before opening:
```bash
xattr -cr "/Applications/Local Transcription.app"
```
Then open the app normally. You only need to do this once after downloading.
Model Loading Issues
- Models download automatically on first use to ~/.cache/huggingface/
- First run requires internet connection
- Check disk space (models range from 75MB to 3GB)
Audio Device Issues
```bash
# List available audio devices
uv run python main_cli.py --list-devices
```
- Ensure microphone permissions are granted (especially on macOS)
- Try different device indices in settings
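Since the backend captures audio through sounddevice, you can also enumerate input devices straight from Python (a sketch independent of the app):

```python
# Minimal sketch: list input-capable devices with their indices,
# using the same sounddevice library the backend relies on.
import sounddevice as sd  # pip install sounddevice

for idx, dev in enumerate(sd.query_devices()):
    if dev["max_input_channels"] > 0:
        print(idx, dev["name"])
```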
GPU Not Detected
```bash
# Check CUDA availability
uv run python -c "import torch; print(torch.cuda.is_available())"
```
- Install NVIDIA drivers (CUDA toolkit is bundled in CUDA sidecar builds)
- The app automatically falls back to CPU if no GPU is available
Web Server Port Conflicts
- Default port is 8080; the app tries ports 8080-8084 automatically
- Change in settings or edit config file
- Check for conflicts: lsof -i :8080 (Linux/macOS) or netstat -ano | findstr :8080 (Windows)
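To see which of the fallback ports are taken, here is a quick cross-platform probe (a sketch; the 8080-8084 range matches the note above):

```python
# Minimal sketch: probe the app's fallback port range on localhost.
# connect_ex returns 0 when something is already listening there.
import socket

for port in range(8080, 8085):
    with socket.socket() as s:
        s.settimeout(0.5)
        in_use = s.connect_ex(("127.0.0.1", port)) == 0
        print(f"{port}: {'in use' if in_use else 'free'}")
```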
Use Cases
- Live Streaming Captions: Add real-time captions to your Twitch/YouTube streams
- Multi-Language Translation: Multiple translators transcribing in different languages
- Accessibility: Provide captions for hearing-impaired viewers
- Podcast Recording: Real-time transcription for multi-host shows
- Gaming Commentary: Track who said what in multiplayer sessions
Contributing
Contributions are welcome! Please feel free to submit issues or pull requests at the repository.
License
MIT License
Acknowledgments
- OpenAI Whisper for the speech recognition model
- RealtimeSTT for real-time transcription capabilities
- faster-whisper for optimized inference
- Tauri for the cross-platform desktop framework
- Deepgram for cloud transcription API