Update README and CLAUDE.md for Tauri rewrite
Update both docs to reflect the new architecture:

- Tauri v2 + Svelte 5 frontend replacing PySide6/Qt
- Headless Python backend with FastAPI control API
- Cross-platform support (Windows, macOS, Linux)
- Deepgram remote transcription (managed/BYOK)
- Gitea CI/CD workflows for automated builds
- New project structure with backend/, src/, src-tauri/
- Updated development commands and build instructions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
413
CLAUDE.md
@@ -4,52 +4,108 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
|
|||||||
|
|
||||||
## Project Overview
|
## Project Overview
|
||||||
|
|
||||||
Local Transcription is a desktop application for real-time speech-to-text transcription designed for streamers. It uses Whisper models (via faster-whisper) to transcribe audio locally with optional multi-user server synchronization.
|
Local Transcription is a cross-platform desktop application for real-time speech-to-text transcription designed for streamers. It supports local Whisper models and cloud-based Deepgram transcription, with OBS browser source integration and optional multi-user sync.
|
||||||
|
|
||||||
|
**Architecture:** Two-process model — a Tauri v2 shell (Svelte 5 frontend) communicates with a headless Python backend (sidecar) via REST API and WebSocket.
|
||||||
|
|
||||||
**Key Features:**
|
**Key Features:**
|
||||||
- Standalone desktop GUI (PySide6/Qt)
|
- Cross-platform desktop app (Windows, macOS, Linux) via Tauri v2 + Svelte 5
|
||||||
- Local transcription with CPU/GPU support
|
- Headless Python backend with FastAPI control API
|
||||||
- Built-in web server for OBS browser source integration
|
- Dual transcription modes: local Whisper or cloud Deepgram (managed/BYOK)
|
||||||
- Optional Node.js-based multi-user server for syncing transcriptions across users
|
- Built-in web server for OBS browser source at `http://localhost:8080`
|
||||||
- Noise suppression and Voice Activity Detection (VAD)
|
- Optional multi-user sync via Node.js server
|
||||||
- Cross-platform builds (Linux/Windows) with PyInstaller
|
- CUDA, MPS (Apple Silicon), and CPU support
|
||||||
|
- Auto-updates, custom fonts, configurable colors
|
||||||
|
|
||||||
|
> **Legacy GUI:** The original PySide6/Qt GUI (`main.py`, `gui/`) still works during the transition. New features should target the Tauri frontend and headless backend.
|
||||||
|
|
||||||
## Project Structure
|
## Project Structure
|
||||||
|
|
||||||
```
|
```
|
||||||
local-transcription/
|
local-transcription/
|
||||||
├── client/ # Core transcription logic
|
├── src/ # Svelte 5 frontend (Tauri UI)
|
||||||
│ ├── audio_capture.py # Audio input and buffering
|
│ ├── App.svelte # Main app shell
|
||||||
│ ├── transcription_engine.py # Whisper model integration
|
│ ├── app.css # Global dark theme styles
|
||||||
│ ├── noise_suppression.py # VAD and noise reduction
|
│ ├── main.ts # Svelte mount point
|
||||||
│ ├── device_utils.py # CPU/GPU device management
|
│ ├── lib/components/ # UI components
|
||||||
│ ├── config.py # Configuration management
|
│ │ ├── Header.svelte # Title bar + settings button
|
||||||
│ └── server_sync.py # Multi-user server sync client
|
│ │ ├── StatusBar.svelte # State indicator, device, user info
|
||||||
├── gui/ # Desktop application UI
|
│ │ ├── Controls.svelte # Start/Stop, Clear, Save buttons
|
||||||
│ ├── main_window_qt.py # Main application window (PySide6)
|
│ │ ├── TranscriptionDisplay.svelte # Scrolling transcript view
|
||||||
│ ├── settings_dialog_qt.py # Settings dialog (PySide6)
|
│ │ └── Settings.svelte # Full settings modal (all sections)
|
||||||
│ └── transcription_display_qt.py # Display widget
|
│ └── lib/stores/ # Svelte 5 reactive stores ($state/$derived)
|
||||||
├── server/ # Web display servers
|
│ ├── backend.ts # WebSocket + REST API client
|
||||||
│ ├── web_display.py # FastAPI server for OBS browser source (local)
|
│ ├── config.ts # App configuration fetch/update
|
||||||
│ └── nodejs/ # Optional multi-user Node.js server
|
│ └── transcriptions.ts # Transcript data management
|
||||||
│ ├── server.js # Multi-user sync server with WebSocket
|
├── src-tauri/ # Tauri v2 Rust shell
|
||||||
│ ├── package.json # Node.js dependencies
|
│ ├── src/lib.rs # Plugin registration (shell, dialog, process)
|
||||||
│ └── README.md # Server deployment documentation
|
│ ├── src/main.rs # Entry point
|
||||||
├── config/ # Example configuration files
|
│ ├── tauri.conf.json # Window, bundle, plugin config
|
||||||
│ └── default_config.yaml # Default settings template
|
│ └── Cargo.toml # Rust dependencies
|
||||||
├── main.py # GUI application entry point
|
├── backend/ # Headless Python backend (the sidecar)
|
||||||
├── main_cli.py # CLI version for testing
|
│ ├── app_controller.py # Core orchestration (engine, sync, config)
|
||||||
└── pyproject.toml # Dependencies and build config
|
│ ├── api_server.py # FastAPI REST endpoints + /ws/control
|
||||||
|
│ └── main_headless.py # Headless entry point (prints JSON to stdout)
|
||||||
|
├── client/ # Core transcription modules (used by backend)
|
||||||
|
│ ├── audio_capture.py # Audio input handling
|
||||||
|
│ ├── transcription_engine_realtime.py # RealtimeSTT / Whisper engine
|
||||||
|
│ ├── deepgram_transcription.py # Deepgram WebSocket cloud transcription
|
||||||
|
│ ├── noise_suppression.py # VAD and noise reduction
|
||||||
|
│ ├── device_utils.py # CPU/GPU/MPS detection
|
||||||
|
│ ├── config.py # YAML config management (~/.local-transcription/)
|
||||||
|
│ ├── server_sync.py # Multi-user server sync client
|
||||||
|
│ ├── instance_lock.py # Single-instance PID lock
|
||||||
|
│ └── update_checker.py # Gitea release update checker
|
||||||
|
├── gui/ # Legacy PySide6/Qt GUI (still functional)
|
||||||
|
│ ├── main_window_qt.py # Main window (orchestration lives here in legacy)
|
||||||
|
│ ├── settings_dialog_qt.py # Settings dialog
|
||||||
|
│ └── transcription_display_qt.py # Display widget
|
||||||
|
├── server/
|
||||||
|
│ ├── web_display.py # FastAPI OBS display server (WebSocket + HTML)
|
||||||
|
│ └── nodejs/ # Optional multi-user sync server
|
||||||
|
├── .gitea/workflows/ # CI/CD
|
||||||
|
│ ├── release.yml # Tauri app builds (Linux/Windows/macOS)
|
||||||
|
│ └── build-sidecar.yml # Python sidecar builds (CUDA + CPU)
|
||||||
|
├── config/default_config.yaml # Default settings template
|
||||||
|
├── main.py # Legacy PySide6 GUI entry point
|
||||||
|
├── main_cli.py # CLI version for testing
|
||||||
|
├── version.py # Version string (__version__)
|
||||||
|
├── local-transcription.spec # PyInstaller config (legacy, includes PySide6)
|
||||||
|
├── local-transcription-headless.spec # PyInstaller config (headless sidecar, no Qt)
|
||||||
|
├── pyproject.toml # Python deps (uv, CUDA PyTorch index)
|
||||||
|
├── package.json # Node/Tauri deps
|
||||||
|
└── vite.config.ts # Vite build config ($lib alias)
|
||||||
```
|
```
|
||||||
|
|
||||||
## Development Commands
|
## Development Commands
|
||||||
|
|
||||||
### Installation and Setup
|
### Frontend (Tauri + Svelte)
|
||||||
```bash
|
```bash
|
||||||
# Install dependencies (creates .venv automatically)
|
# Install npm dependencies
|
||||||
|
npm install
|
||||||
|
|
||||||
|
# Run Tauri in development mode (hot-reload)
|
||||||
|
npm run tauri dev
|
||||||
|
|
||||||
|
# Build frontend only (for testing)
|
||||||
|
npx vite build
|
||||||
|
|
||||||
|
# Type-check Svelte
|
||||||
|
npx svelte-check
|
||||||
|
|
||||||
|
# Check Rust compiles
|
||||||
|
cd src-tauri && cargo check
|
||||||
|
```
|
||||||
|
|
||||||
|
### Backend (Python)
|
||||||
|
```bash
|
||||||
|
# Install Python dependencies
|
||||||
uv sync
|
uv sync
|
||||||
|
|
||||||
# Run the GUI application
|
# Run the headless backend standalone (for development)
|
||||||
|
uv run python -m backend.main_headless --port 8080
|
||||||
|
|
||||||
|
# Run the legacy PySide6 GUI
|
||||||
uv run python main.py
|
uv run python main.py
|
||||||
|
|
||||||
# Run CLI version (headless, for testing)
|
# Run CLI version (headless, for testing)
|
||||||
@@ -57,257 +113,154 @@ uv run python main_cli.py
|
|||||||
|
|
||||||
# List available audio devices
|
# List available audio devices
|
||||||
uv run python main_cli.py --list-devices
|
uv run python main_cli.py --list-devices
|
||||||
|
|
||||||
# Install with CUDA support (if needed)
|
|
||||||
uv pip install torch --index-url https://download.pytorch.org/whl/cu121
|
|
||||||
```
|
```
|
||||||
|
|
||||||
### Building Executables
|
### Building
|
||||||
```bash
|
```bash
|
||||||
# Linux (includes CUDA support - works on both GPU and CPU systems)
|
# Build Tauri app (produces platform installer)
|
||||||
./build.sh
|
npm run tauri build
|
||||||
|
|
||||||
# Windows (includes CUDA support - works on both GPU and CPU systems)
|
# Build headless Python sidecar (no PySide6)
|
||||||
build.bat
|
uv run pyinstaller local-transcription-headless.spec
|
||||||
|
# Output: dist/local-transcription-backend/
|
||||||
|
|
||||||
# Manual build with PyInstaller
|
# Build legacy PySide6 app
|
||||||
uv sync # Install dependencies (includes CUDA PyTorch)
|
|
||||||
uv pip uninstall -q enum34 # Remove incompatible enum34 package
|
|
||||||
uv run pyinstaller local-transcription.spec
|
uv run pyinstaller local-transcription.spec
|
||||||
|
# Or use: ./build.sh (Linux) / build.bat (Windows)
|
||||||
```
|
```
|
||||||
|
|
||||||
**Important:** All builds include CUDA support via `pyproject.toml` configuration. CUDA builds can be created on systems without NVIDIA GPUs. The PyTorch CUDA runtime is bundled, and the app automatically falls back to CPU if no GPU is available.
|
|
||||||
|
|
||||||
### Testing
|
### Testing
|
||||||
```bash
|
```bash
|
||||||
# Run component tests
|
|
||||||
uv run python test_components.py
|
uv run python test_components.py
|
||||||
|
|
||||||
# Check CUDA availability
|
|
||||||
uv run python check_cuda.py
|
uv run python check_cuda.py
|
||||||
|
|
||||||
# Test web server manually
|
|
||||||
uv run python -m uvicorn server.web_display:app --reload
|
|
||||||
```
|
```
|
||||||
|
|
||||||
## Architecture
|
## Architecture Details
|
||||||
|
|
||||||
### Audio Processing Pipeline
|
### Communication: Tauri <-> Python Backend
|
||||||
|
|
||||||
1. **Audio Capture** ([client/audio_capture.py](client/audio_capture.py))
|
The Svelte frontend connects to the Python backend via two channels:
|
||||||
- Captures audio from microphone/system using sounddevice
|
|
||||||
- Handles automatic sample rate detection and resampling
|
|
||||||
- Uses chunking with overlap for better transcription quality
|
|
||||||
- Default: 3-second chunks with 0.5s overlap
|
|
||||||
|
|
||||||
2. **Noise Suppression** ([client/noise_suppression.py](client/noise_suppression.py))
|
**REST API** (on port 8081 by default):
|
||||||
- Applies noisereduce for background noise reduction
|
- `GET /api/status` — app state, device info, version
|
||||||
- Voice Activity Detection (VAD) using webrtcvad
|
- `POST /api/start` / `POST /api/stop` — transcription control
|
||||||
- Skips silent segments to improve performance
|
- `GET /api/config` / `PUT /api/config` — read/write settings (dot-notation keys)
|
||||||
|
- `GET /api/audio-devices` / `GET /api/compute-devices` — device enumeration
|
||||||
|
- `POST /api/reload-engine` — reload with new model/device
|
||||||
|
- `GET /api/transcriptions` / `POST /api/clear` — transcript management
|
||||||
|
- `POST /api/save-file` — write text to a file path
|
||||||
|
- `GET /api/check-update` / `POST /api/skip-version` — update management
|
||||||
|
- `POST /api/login` / `POST /api/register` / `GET /api/balance` — managed mode proxy
|
||||||
|
|
||||||
3. **Transcription** ([client/transcription_engine.py](client/transcription_engine.py))
|
**WebSocket** `/ws/control`:
|
||||||
- Uses faster-whisper for efficient inference
|
- Pushes real-time events: `state_changed`, `transcription`, `preview`, `error`, `credits_low`
|
||||||
- Supports CPU, CUDA, and Apple MPS (Mac)
|
- Client sends keepalive pings
|
||||||
- Models: tiny, base, small, medium, large
|
|
||||||
- Thread-safe model loading with locks
|
|
||||||
|
|
||||||
4. **Display** ([gui/main_window_qt.py](gui/main_window_qt.py))
|
The OBS display server runs separately on port 8080 (`GET /` for HTML, `WebSocket /ws` for transcriptions).
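
As a rough illustration of how a client of `/ws/control` might route the pushed events, here is a minimal Python sketch. The event names come from the list above; the message shape (a JSON object with a `type` field and a `data` payload) and the `EventRouter` class are assumptions made for illustration, not the backend's documented wire format.

```python
import json
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

# Event names listed for /ws/control in this document.
KNOWN_EVENTS = {"state_changed", "transcription", "preview", "error", "credits_low"}

Handler = Callable[[Any], None]


class EventRouter:
    """Hypothetical helper: routes decoded WebSocket messages to handlers."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def on(self, event: str, handler: Handler) -> None:
        """Register a handler for one of the known event names."""
        if event not in KNOWN_EVENTS:
            raise ValueError(f"unknown event: {event}")
        self._handlers[event].append(handler)

    def dispatch(self, raw: str) -> bool:
        """Decode one text frame; return True if at least one handler ran."""
        try:
            message = json.loads(raw)
        except json.JSONDecodeError:
            return False
        if not isinstance(message, dict):
            return False
        # ASSUMPTION: messages look like {"type": "<event>", "data": ...}.
        event = message.get("type")
        if not isinstance(event, str):
            return False
        handlers = self._handlers.get(event, [])
        for handler in handlers:
            handler(message.get("data"))
        return bool(handlers)
```

A caller would register handlers once (for example `router.on("transcription", handle_text)`) and feed every received text frame to `router.dispatch`. Frames that are not valid JSON or carry an unregistered event are ignored rather than raising.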
|
||||||
- PySide6/Qt-based desktop GUI
|
|
||||||
- Real-time transcription display with scrolling
|
|
||||||
- Settings panel with live updates (no restart needed)
|
|
||||||
|
|
||||||
### Web Server Architecture
|
### Backend Process Lifecycle
|
||||||
|
|
||||||
**Local Web Server** ([server/web_display.py](server/web_display.py))
|
1. `main_headless.py` starts, acquires instance lock, creates `AppController`
|
||||||
- Always runs when GUI starts (port 8080 by default)
|
2. `AppController.initialize()` starts the OBS web server (port 8080) and engine init thread
|
||||||
- FastAPI with WebSocket for real-time updates
|
3. `APIServer` wraps the controller with FastAPI routes, runs on port 8081
|
||||||
- Used for OBS browser source integration
|
4. Backend prints `{"event": "ready", "port": 8080}` to stdout for Tauri to discover
|
||||||
- Single-user (displays only local transcriptions)
|
5. On shutdown: engine stopped, web server stopped, lock released
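
The discovery handshake in step 4 could be consumed roughly as follows. This is a Python sketch for illustration only (the real consumer is the Tauri shell); `parse_ready_line` and `wait_for_ready` are hypothetical helpers, and the only thing taken from this document is the JSON shape of the ready event.

```python
import json
from typing import Iterable, Optional


def parse_ready_line(line: str) -> Optional[int]:
    """Return the port from a ready-event line, or None for any other output."""
    try:
        payload = json.loads(line)
    except json.JSONDecodeError:
        return None  # ordinary log output, not a JSON event
    if not isinstance(payload, dict) or payload.get("event") != "ready":
        return None
    port = payload.get("port")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(port, bool) or not isinstance(port, int):
        return None
    return port


def wait_for_ready(lines: Iterable[str]) -> Optional[int]:
    """Scan an iterable of stdout lines until the ready event appears."""
    for line in lines:
        port = parse_ready_line(line.strip())
        if port is not None:
            return port
    return None
```

In practice `lines` would be the sidecar's stdout stream (for instance `process.stdout` from `subprocess.Popen(..., text=True)`), so non-JSON log lines printed before the event are skipped.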
|
||||||
|
|
||||||
**Multi-User Server** (Optional - for syncing across multiple users)
|
### Headless Backend vs Legacy GUI
|
||||||
|
|
||||||
**Node.js WebSocket Server** ([server/nodejs/](server/nodejs/)) - **RECOMMENDED**
|
The `AppController` class (`backend/app_controller.py`) extracts all orchestration logic from `gui/main_window_qt.py` into a Qt-free class. The mapping:
|
||||||
- Real-time WebSocket support (< 100ms latency)
|
|
||||||
- Handles 100+ concurrent users
|
|
||||||
- Easy deployment to VPS/cloud hosting (Railway, Heroku, DigitalOcean, or any VPS)
|
|
||||||
- Configurable display options via URL parameters:
|
|
||||||
- `timestamps=true/false` - Show/hide timestamps
|
|
||||||
- `maxlines=50` - Maximum visible lines (prevents scroll bars in OBS)
|
|
||||||
- `fontsize=16` - Font size in pixels
|
|
||||||
- `fontfamily=Arial` - Font family
|
|
||||||
- `fade=10` - Seconds before text fades (0 = never)
|
|
||||||
|
|
||||||
See [server/nodejs/README.md](server/nodejs/README.md) for deployment instructions
|
| Legacy (MainWindow) | Headless (AppController) |
|
||||||
|
|---------------------|--------------------------|
|
||||||
|
| `_initialize_components()` | `_initialize_engine()` |
|
||||||
|
| `_start_transcription()` | `start_transcription()` |
|
||||||
|
| `_stop_transcription()` | `stop_transcription()` |
|
||||||
|
| `_on_settings_saved()` | `apply_settings()` |
|
||||||
|
| `_reload_engine()` | `reload_engine()` |
|
||||||
|
| `_start_web_server_if_enabled()` | `_start_web_server()` |
|
||||||
|
| `_start_server_sync()` | `_start_server_sync()` |
|
||||||
|
| Qt signals | Callbacks (`on_state_changed`, `on_transcription`, etc.) |
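
The last row of the table, Qt signals becoming plain callbacks, can be sketched like this. The callback names `on_state_changed` and `on_transcription` and the method names `start_transcription` / `stop_transcription` come from the table; the `ControllerSketch` class, `emit_text`, and the state strings are hypothetical stand-ins, not the real `AppController`.

```python
from typing import Callable, Optional


class ControllerSketch:
    """Hypothetical stand-in for a Qt-free controller that reports via callbacks."""

    def __init__(self) -> None:
        # Callers assign plain callables instead of connecting Qt signals.
        self.on_state_changed: Optional[Callable[[str], None]] = None
        self.on_transcription: Optional[Callable[[str], None]] = None
        self.state = "idle"  # ASSUMPTION: state names are illustrative only

    def _set_state(self, state: str) -> None:
        self.state = state
        if self.on_state_changed is not None:
            self.on_state_changed(state)

    def start_transcription(self) -> None:
        self._set_state("running")

    def stop_transcription(self) -> None:
        self._set_state("idle")

    def emit_text(self, text: str) -> None:
        """Simulates the engine producing a final transcription."""
        if self.on_transcription is not None:
            self.on_transcription(text)
```

Because the callbacks are optional attributes, the controller can run with no listeners attached, which is what makes it usable without a GUI.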
|
||||||
|
|
||||||
### Configuration System
|
### Threading Model (Headless)
|
||||||
|
|
||||||
- Config stored at `~/.local-transcription/config.yaml`
|
- Main thread: Uvicorn (FastAPI) event loop
|
||||||
- Managed by [client/config.py](client/config.py)
|
- Engine init thread: Downloads models, initializes VAD
|
||||||
- Settings apply immediately without restart (except model changes)
|
- Web server thread: Separate asyncio loop for OBS display
|
||||||
- YAML format with nested keys (e.g., `transcription.model`)
|
- Audio capture: Runs in engine callback threads
|
||||||
|
- All results flow through `AppController` callbacks -> `APIServer` WebSocket broadcast
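
The last bullet describes results crossing from engine callback threads into the Uvicorn event loop. One common way to do that in Python is `asyncio.run_coroutine_threadsafe`; whether the backend uses this exact mechanism is an assumption, and `ThreadBridge` and `broadcast` below are hypothetical names.

```python
import asyncio
import threading
from typing import List


class ThreadBridge:
    """Hypothetical bridge: worker threads hand results to an asyncio loop."""

    def __init__(self, loop: asyncio.AbstractEventLoop) -> None:
        self._loop = loop
        self.sent: List[str] = []  # stands in for connected WebSocket clients

    async def broadcast(self, text: str) -> None:
        # A real server would send the text to each connected client here.
        self.sent.append(text)

    def on_transcription(self, text: str) -> None:
        """Call from a non-loop thread; blocks until the loop has run broadcast."""
        future = asyncio.run_coroutine_threadsafe(self.broadcast(text), self._loop)
        future.result(timeout=5)


async def demo() -> List[str]:
    loop = asyncio.get_running_loop()
    bridge = ThreadBridge(loop)
    worker = threading.Thread(target=bridge.on_transcription, args=("hello",))
    worker.start()
    # Join in the default executor so the loop stays free to run broadcast().
    await loop.run_in_executor(None, worker.join)
    return bridge.sent
```

Running `asyncio.run(demo())` exercises the full path: the worker thread schedules `broadcast` on the loop and waits for it to finish. Calling `on_transcription` from the loop's own thread would deadlock, so it is only meant for worker threads.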
|
||||||
|
|
||||||
### Device Management
|
### Svelte Frontend
|
||||||
|
|
||||||
- [client/device_utils.py](client/device_utils.py) handles CPU/GPU detection
|
Uses Svelte 5 runes throughout (`$state`, `$derived`, `$effect`, `$props`). No Svelte 4 patterns.
|
||||||
- Auto-detects CUDA, MPS (Mac), or falls back to CPU
|
|
||||||
- Compute types: float32 (best quality), float16 (GPU), int8 (fastest)
|
|
||||||
- Thread-safe device selection
|
|
||||||
|
|
||||||
## Key Implementation Details
|
**Stores** (`src/lib/stores/`):
|
||||||
|
- `backend.ts` — WebSocket connection + REST helpers (`apiGet`, `apiPost`, `apiPut`), auto-reconnect
|
||||||
|
- `config.ts` — fetches/updates config from backend API
|
||||||
|
- `transcriptions.ts` — manages transcript list, listens for `CustomEvent`s from backend store
|
||||||
|
|
||||||
### PyInstaller Build Configuration
|
**Key patterns:**
|
||||||
|
- Backend store dispatches `CustomEvent`s on `window` for cross-store communication
|
||||||
|
- Settings component collects all changed values into a `Record<string, any>` with dot-notation keys, sends via `PUT /api/config`
|
||||||
|
- Controls use Tauri dialog plugin for native file save, falls back to blob download
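
To make the dot-notation convention concrete, here is a minimal Python sketch of how a flat update such as `{"transcription.model": "base"}` could be merged into a nested config. `apply_dotted_updates` is a hypothetical helper; how the backend actually applies these keys is not shown in this document.

```python
import copy
from typing import Any, Dict


def apply_dotted_updates(config: Dict[str, Any], updates: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of config with each 'a.b.c' key written into nested dicts."""
    result = copy.deepcopy(config)
    for dotted_key, value in updates.items():
        *parents, leaf = dotted_key.split(".")
        node = result
        for part in parents:
            child = node.get(part)
            if not isinstance(child, dict):
                child = {}  # create a missing level or replace a non-dict value
                node[part] = child
            node = child
        node[leaf] = value
    return result
```

The function copies the input rather than mutating it, so a caller can compare old and new config to decide whether a change (a new `transcription.model`, say) needs an engine reload.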
|
||||||
|
|
||||||
- [local-transcription.spec](local-transcription.spec) controls build
|
## CI/CD
|
||||||
- UPX compression enabled for smaller executables
|
|
||||||
- Hidden imports required for PySide6, faster-whisper, torch
|
|
||||||
- Console mode enabled by default (set `console=False` to hide)
|
|
||||||
|
|
||||||
### Threading Model
|
Two Gitea Actions workflows in `.gitea/workflows/`:
|
||||||
|
|
||||||
- Main thread: Qt GUI event loop
|
- **`release.yml`**: Triggers on push to `main`. Auto-bumps version, builds Tauri app on Linux/Windows/macOS, uploads `.deb`, `.rpm`, `.msi`, `.dmg` to Gitea release.
|
||||||
- Audio thread: Captures and processes audio chunks
|
- **`build-sidecar.yml`**: Triggers on changes to `client/`, `server/`, `backend/`, `pyproject.toml`. Builds headless Python sidecar via PyInstaller. CUDA + CPU for Linux/Windows, CPU-only for macOS.
|
||||||
- Web server thread: Runs FastAPI server
|
|
||||||
- Transcription: Runs in callback thread from audio capture
|
|
||||||
- All transcription results communicated via Qt signals
|
|
||||||
|
|
||||||
### Server Sync (Optional Multi-User Feature)
|
Both require a `BUILD_TOKEN` secret (Gitea API token with release write access).
|
||||||
|
|
||||||
- [client/server_sync.py](client/server_sync.py) handles server communication
|
|
||||||
- Toggle in Settings: "Enable Server Sync"
|
|
||||||
- Sends transcriptions to Node.js server via HTTP POST
|
|
||||||
- Real-time updates via WebSocket to display page
|
|
||||||
- Per-speaker font support (Web-Safe, Google Fonts, Custom uploads)
|
|
||||||
- Falls back gracefully if server unavailable
|
|
||||||
|
|
||||||
## Common Patterns
|
## Common Patterns
|
||||||
|
|
||||||
### Adding a New Setting
|
### Adding a New Setting
|
||||||
|
|
||||||
1. Add to [config/default_config.yaml](config/default_config.yaml)
|
1. Add default to [config/default_config.yaml](config/default_config.yaml)
|
||||||
2. Update [client/config.py](client/config.py) if validation needed
|
2. Add UI control in [src/lib/components/Settings.svelte](src/lib/components/Settings.svelte)
|
||||||
3. Add UI control in [gui/settings_dialog_qt.py](gui/settings_dialog_qt.py)
|
3. Ensure the setting is included in the save handler's config update
|
||||||
4. Apply setting in relevant component (no restart if possible)
|
4. Apply in `AppController.apply_settings()` or the relevant component
|
||||||
5. Emit signal to update display if needed
|
5. For legacy GUI: also update [gui/settings_dialog_qt.py](gui/settings_dialog_qt.py)
|
||||||
|
|
||||||
|
### Adding a New API Endpoint
|
||||||
|
|
||||||
|
1. Add route in [backend/api_server.py](backend/api_server.py) `_setup_routes()`
|
||||||
|
2. Add supporting logic in [backend/app_controller.py](backend/app_controller.py) if needed
|
||||||
|
3. Call from Svelte via `backendStore.apiGet/apiPost/apiPut`
|
||||||
|
|
||||||
### Modifying Transcription Display
|
### Modifying Transcription Display
|
||||||
|
|
||||||
- Local GUI: [gui/transcription_display_qt.py](gui/transcription_display_qt.py)
|
- Tauri UI: [src/lib/components/TranscriptionDisplay.svelte](src/lib/components/TranscriptionDisplay.svelte)
|
||||||
- Local web display (OBS): [server/web_display.py](server/web_display.py) (HTML in `_get_html()`)
|
- OBS display: [server/web_display.py](server/web_display.py) (HTML in `_get_html()`)
|
||||||
- Multi-user display: [server/nodejs/server.js](server/nodejs/server.js) (display page in `/display` route)
|
- Multi-user display: [server/nodejs/server.js](server/nodejs/server.js) (display page in `/display` route)
|
||||||
|
|
||||||
### Adding a New Model Size
|
|
||||||
|
|
||||||
- Update [client/transcription_engine.py](client/transcription_engine.py)
|
|
||||||
- Add to model selector in [gui/settings_dialog_qt.py](gui/settings_dialog_qt.py)
|
|
||||||
- Update CLI argument choices in [main_cli.py](main_cli.py)
|
|
||||||
|
|
||||||
## Dependencies
|
## Dependencies
|
||||||
|
|
||||||
**Core:**
|
**Frontend:** Tauri v2, Svelte 5, Vite, TypeScript
|
||||||
- `faster-whisper`: Optimized Whisper inference
|
**Backend:** Python 3.9+, FastAPI, Uvicorn, RealtimeSTT, faster-whisper, PyTorch (CUDA), sounddevice
|
||||||
- `torch`: ML framework (CUDA-enabled via special index)
|
**Build:** PyInstaller (sidecar), Tauri CLI (app), uv (Python packages)
|
||||||
- `PySide6`: Qt6 bindings for GUI
|
**CI:** Gitea Actions with platform-specific runners
|
||||||
- `sounddevice`: Cross-platform audio I/O
|
|
||||||
- `noisereduce`, `webrtcvad`: Audio preprocessing
|
|
||||||
|
|
||||||
**Web Server:**
|
|
||||||
- `fastapi`, `uvicorn`: Web server and ASGI
|
|
||||||
- `websockets`: Real-time communication
|
|
||||||
|
|
||||||
**Build:**
|
|
||||||
- `pyinstaller`: Create standalone executables
|
|
||||||
- `uv`: Fast package manager
|
|
||||||
|
|
||||||
**PyTorch CUDA Index:**
|
|
||||||
- Configured in [pyproject.toml](pyproject.toml) under `[[tool.uv.index]]`
|
|
||||||
- Uses PyTorch's custom wheel repository for CUDA builds
|
|
||||||
- Automatically installed with `uv sync` when using CUDA build scripts
|
|
||||||
|
|
||||||
## Platform-Specific Notes
|
## Platform-Specific Notes
|
||||||
|
|
||||||
### Linux
|
### Linux
|
||||||
- Uses PulseAudio/ALSA for audio
|
- Tauri needs: `libgtk-3-dev`, `libwebkit2gtk-4.1-dev`, `libappindicator3-dev`, `librsvg2-dev`, `patchelf`
|
||||||
- Build scripts use bash (`.sh` files)
|
- Audio: PulseAudio/ALSA via sounddevice
|
||||||
- Executable: `dist/LocalTranscription/LocalTranscription`
|
|
||||||
|
|
||||||
### Windows
|
### Windows
|
||||||
- Uses Windows Audio/WASAPI
|
- Tauri needs: WebView2 (usually pre-installed on Windows 10+)
|
||||||
- Build scripts use batch (`.bat` files)
|
- Audio: WASAPI via sounddevice
|
||||||
- Executable: `dist\LocalTranscription\LocalTranscription.exe`
|
|
||||||
- Requires Visual C++ Redistributable on target systems
|
|
||||||
|
|
||||||
### Cross-Building
|
### macOS
|
||||||
- **Cannot cross-compile** - must build on target platform
|
- Tauri needs: Xcode Command Line Tools
|
||||||
- CI/CD should use platform-specific runners
|
- Audio: CoreAudio via sounddevice
|
||||||
|
- GPU: MPS (Apple Silicon) detected by `device_utils.py`
|
||||||
## Troubleshooting
|
- `Info.plist` must include `NSMicrophoneUsageDescription` for mic access
|
||||||
|
- No CUDA builds — CPU/MPS only
|
||||||
### Model Loading Issues
|
|
||||||
- Models download to `~/.cache/huggingface/`
|
|
||||||
- First run requires internet connection
|
|
||||||
- Check disk space (models: 75MB-3GB depending on size)
|
|
||||||
|
|
||||||
### Audio Device Issues
|
|
||||||
- Run `uv run python main_cli.py --list-devices`
|
|
||||||
- Check permissions (microphone access)
|
|
||||||
- Try different device indices in settings
|
|
||||||
|
|
||||||
### GPU Not Detected
|
|
||||||
- Run `uv run python check_cuda.py`
|
|
||||||
- Install CUDA drivers (not CUDA toolkit - bundled in build)
|
|
||||||
- Verify PyTorch sees GPU: `python -c "import torch; print(torch.cuda.is_available())"`
|
|
||||||
|
|
||||||
### Web Server Port Conflicts
|
|
||||||
- Default port: 8080
|
|
||||||
- Change in [gui/main_window_qt.py](gui/main_window_qt.py) or config
|
|
||||||
- Use `lsof -i :8080` (Linux) or `netstat -ano | findstr :8080` (Windows)
|
|
||||||
|
|
||||||
## OBS Integration
|
|
||||||
|
|
||||||
### Local Display (Single User)
|
|
||||||
1. Start Local Transcription app
|
|
||||||
2. In OBS: Add "Browser" source
|
|
||||||
3. URL: `http://localhost:8080`
|
|
||||||
4. Set dimensions (e.g., 1920x300)
|
|
||||||
|
|
||||||
### Multi-User Display (Node.js Server)
|
|
||||||
1. Deploy Node.js server (see [server/nodejs/README.md](server/nodejs/README.md))
|
|
||||||
2. Each user configures Server URL: `http://your-server:3000/api/send`
|
|
||||||
3. Enter same room name and passphrase
|
|
||||||
4. In OBS: Add "Browser" source
|
|
||||||
5. URL: `http://your-server:3000/display?room=ROOM&fade=10&timestamps=true&maxlines=50&fontsize=16`
|
|
||||||
6. Customize URL parameters as needed:
|
|
||||||
- `timestamps=false` - Hide timestamps
|
|
||||||
- `maxlines=30` - Show max 30 lines (prevents scroll bars)
|
|
||||||
- `fontsize=18` - Larger font
|
|
||||||
- `fontfamily=Courier` - Different font
|
|
||||||
|
|
||||||
## Performance Optimization
|
|
||||||
|
|
||||||
**For Real-Time Transcription:**
|
|
||||||
- Use `tiny` or `base` model (faster)
|
|
||||||
- Enable GPU if available (5-10x faster)
|
|
||||||
- Increase chunk_duration for better accuracy (higher latency)
|
|
||||||
- Decrease chunk_duration for lower latency (less context)
|
|
||||||
- Enable VAD to skip silent audio
|
|
||||||
|
|
||||||
**For Build Size Reduction:**
|
|
||||||
- Don't bundle models (download on demand)
|
|
||||||
- Use CPU-only build if no GPU users
|
|
||||||
- Enable UPX compression (already in spec)
|
|
||||||
|
|
||||||
## Phase Status
|
|
||||||
|
|
||||||
- ✅ **Phase 1**: Standalone desktop application (complete)
|
|
||||||
- ✅ **Web Server**: Local OBS integration (complete)
|
|
||||||
- ✅ **Builds**: PyInstaller executables (complete)
|
|
||||||
- ✅ **Phase 2**: Multi-user Node.js server (complete, optional)
|
|
||||||
- ⏸️ **Phase 3+**: Advanced features (see [NEXT_STEPS.md](NEXT_STEPS.md))
|
|
||||||
|
|
||||||
## Related Documentation
|
## Related Documentation
|
||||||
|
|
||||||
- [README.md](README.md) - User-facing documentation
|
- [README.md](README.md) — User-facing documentation
|
||||||
- [BUILD.md](BUILD.md) - Detailed build instructions
|
- [BUILD.md](BUILD.md) — Detailed build instructions
|
||||||
- [INSTALL.md](INSTALL.md) - Installation guide
|
- [INSTALL.md](INSTALL.md) — Installation guide
|
||||||
- [NEXT_STEPS.md](NEXT_STEPS.md) - Future enhancements
|
- [server/nodejs/README.md](server/nodejs/README.md) — Node.js server setup
|
||||||
- [server/nodejs/README.md](server/nodejs/README.md) - Node.js server setup and deployment
|
|
||||||
|
|||||||
224
README.md
@@ -1,13 +1,14 @@
|
|||||||
# Local Transcription
|
# Local Transcription
|
||||||
|
|
||||||
A real-time speech-to-text desktop application for streamers. Run locally on your machine with GPU or CPU, display transcriptions via OBS browser source, and optionally sync with other users through a multi-user server.
|
A real-time speech-to-text desktop application for streamers. Runs locally on your machine with GPU or CPU, displays transcriptions via OBS browser source, and optionally syncs with other users through a multi-user server.
|
||||||
|
|
||||||
**Version 1.4.0**
|
**Version 1.4.0**
|
||||||
|
|
||||||
## Features
|
## Features
|
||||||
|
|
||||||
- **Real-Time Transcription**: Live speech-to-text using Whisper models with minimal latency
|
- **Real-Time Transcription**: Live speech-to-text using Whisper models with minimal latency
|
||||||
- **Standalone Desktop App**: PySide6/Qt GUI that works without any server
|
- **Cross-Platform**: Native desktop app for Windows, macOS, and Linux via [Tauri](https://tauri.app/)
|
||||||
|
- **Dual Transcription Modes**: Local (Whisper) or cloud (Deepgram) with managed billing or BYOK
|
||||||
- **CPU & GPU Support**: Automatic detection of CUDA (NVIDIA), MPS (Apple Silicon), or CPU fallback
|
- **CPU & GPU Support**: Automatic detection of CUDA (NVIDIA), MPS (Apple Silicon), or CPU fallback
|
||||||
- **Advanced Voice Detection**: Dual-layer VAD (WebRTC + Silero) for accurate speech detection
|
- **Advanced Voice Detection**: Dual-layer VAD (WebRTC + Silero) for accurate speech detection
|
||||||
- **OBS Integration**: Built-in web server for browser source capture at `http://localhost:8080`
|
- **OBS Integration**: Built-in web server for browser source capture at `http://localhost:8080`
|
||||||
@@ -16,36 +17,70 @@ A real-time speech-to-text desktop application for streamers. Run locally on you
|
|||||||
- **Customizable Colors**: User-configurable colors for name, text, and background
|
- **Customizable Colors**: User-configurable colors for name, text, and background
|
||||||
- **Noise Suppression**: Built-in audio preprocessing to reduce background noise
|
- **Noise Suppression**: Built-in audio preprocessing to reduce background noise
|
||||||
- **Auto-Updates**: Automatic update checking with release notes display
|
- **Auto-Updates**: Automatic update checking with release notes display
|
||||||
- **Cross-Platform**: Builds available for Windows and Linux
|
|
||||||
|
## Architecture
|
||||||
|
|
||||||
|
The application uses a two-process architecture:
|
||||||
|
|
||||||
|
1. **Tauri Shell** (Svelte 5 frontend) — lightweight native window (~50MB) rendering the UI
|
||||||
|
2. **Python Backend** (sidecar) — headless process running transcription, audio capture, and the OBS web server
|
||||||
|
|
||||||
|
The Tauri frontend communicates with the Python backend via REST API and WebSocket, following the same pattern as [voice-to-notes](https://repo.anhonesthost.net/MacroPad/voice-to-notes).
|
||||||
|
|
||||||
|
```
|
||||||
|
Tauri App (user launches this)
|
||||||
|
└─ Spawns Python backend as sidecar
|
||||||
|
├─ FastAPI REST API (control endpoints)
|
||||||
|
├─ WebSocket /ws/control (real-time state + transcriptions)
|
||||||
|
├─ OBS web display at http://localhost:8080
|
||||||
|
└─ Transcription engine (Whisper or Deepgram)
|
||||||
|
```
|
||||||
|
|
||||||
|
> **Legacy GUI**: The original PySide6/Qt desktop GUI (`main.py`) still works alongside the new Tauri frontend during the transition period.
|
||||||
|
|
||||||
## Quick Start
|
## Quick Start
|
||||||
|
|
||||||
### Running from Source
|
### Running from Source
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# Install dependencies
|
# Install Python dependencies
|
||||||
uv sync
|
uv sync
|
||||||
|
|
||||||
# Run the application
|
# Run the Tauri app (frontend + backend)
|
||||||
|
npm install
|
||||||
|
npm run tauri dev
|
||||||
|
|
||||||
|
# Or run just the headless backend (for development)
|
||||||
|
uv run python -m backend.main_headless
|
||||||
|
|
||||||
|
# Or run the legacy PySide6 GUI
|
||||||
uv run python main.py
|
uv run python main.py
|
||||||
```
|
```
|
||||||
|
|
||||||
### Using Pre-Built Executables
|
### Using Pre-Built Executables
|
||||||
|
|
||||||
Download the latest release from the [releases page](https://repo.anhonesthost.net/streamer-tools/local-transcription/releases) and run the executable for your platform.
|
Download the latest release from the [releases page](https://repo.anhonesthost.net/streamer-tools/local-transcription/releases):
|
||||||
|
|
||||||
|
- **App installer** (Tauri shell): `.msi` (Windows), `.dmg` (macOS), `.deb`/`.rpm`/`.AppImage` (Linux)
|
||||||
|
- **Sidecar** (Python backend): Download the matching `sidecar-*` zip for your platform (CUDA or CPU)
|
||||||
|
|
||||||
### Building from Source
|
### Building from Source
|
||||||
|
|
||||||
**Linux:**
|
|
||||||
```bash
|
```bash
|
||||||
./build.sh
|
# Build the Tauri app
|
||||||
# Output: dist/LocalTranscription/LocalTranscription
|
npm install
|
||||||
```
|
npm run tauri build
|
||||||
|
# Output: src-tauri/target/release/bundle/
|
||||||
|
|
||||||
**Windows:**
|
# Build the Python sidecar (headless, no Qt)
|
||||||
```cmd
|
uv sync
|
||||||
|
uv run pyinstaller local-transcription-headless.spec
|
||||||
|
# Output: dist/local-transcription-backend/
|
||||||
|
|
||||||
|
# Build the legacy PySide6 app (Linux)
|
||||||
|
./build.sh
|
||||||
|
# Build the legacy PySide6 app (Windows)
|
||||||
build.bat
|
build.bat
|
||||||
# Output: dist\LocalTranscription\LocalTranscription.exe
|
|
||||||
```
|
```
|
||||||
|
|
||||||
For detailed build instructions, see [BUILD.md](BUILD.md).
|
For detailed build instructions, see [BUILD.md](BUILD.md).
|
||||||
@@ -57,14 +92,23 @@ For detailed build instructions, see [BUILD.md](BUILD.md).
|
|||||||
1. Launch the application
|
1. Launch the application
|
||||||
2. Select your microphone from the audio device dropdown
|
2. Select your microphone from the audio device dropdown
|
||||||
3. Choose a Whisper model (smaller = faster, larger = more accurate):
|
3. Choose a Whisper model (smaller = faster, larger = more accurate):
|
||||||
- `tiny.en` / `tiny` - Fastest, good for quick captions
|
- `tiny.en` / `tiny` — Fastest, good for quick captions
|
||||||
- `base.en` / `base` - Balanced speed and accuracy
|
- `base.en` / `base` — Balanced speed and accuracy
|
||||||
- `small.en` / `small` - Better accuracy
|
- `small.en` / `small` — Better accuracy
|
||||||
- `medium.en` / `medium` - High accuracy
|
- `medium.en` / `medium` — High accuracy
|
||||||
- `large-v3` - Best accuracy (requires more resources)
|
- `large-v3` — Best accuracy (requires more resources)
|
||||||
4. Click **Start** to begin transcription
|
4. Click **Start** to begin transcription
|
||||||
5. Transcriptions appear in the main window and at `http://localhost:8080`
|
5. Transcriptions appear in the main window and at `http://localhost:8080`
|
||||||
|
|
||||||
|
### Remote Transcription (Deepgram)
|
||||||
|
|
||||||
|
Instead of local Whisper models, you can use cloud-based transcription:
|
||||||
|
|
||||||
|
- **Managed mode**: Sign up via the transcription proxy for metered billing
|
||||||
|
- **BYOK mode**: Bring your own Deepgram API key for direct access
|
||||||
|
|
||||||
|
Configure in Settings > Remote Transcription.
|
||||||
|
|
||||||
### OBS Browser Source Setup
|
### OBS Browser Source Setup
|
||||||
|
|
||||||
1. Start the Local Transcription app
|
1. Start the Local Transcription app
|
||||||
@@ -88,7 +132,7 @@ For syncing transcriptions across multiple users (e.g., multi-host streams or tr
|
|||||||
|
|
||||||
## Configuration
|
## Configuration
|
||||||
|
|
||||||
Settings are stored at `~/.local-transcription/config.yaml` and can be modified through the GUI settings panel.
|
Settings are stored at `~/.local-transcription/config.yaml` and can be modified through the GUI settings panel or the REST API.
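Setting names in this README are written in dotted form, for example `remote.mode`. The sketch below shows one way to resolve such a key against a nested mapping. It assumes the YAML file nests each section as a mapping, which the dotted names suggest but this README does not state; `get_setting` is a hypothetical helper, not part of the app. Loading the file itself would need a YAML parser such as PyYAML's `yaml.safe_load`, which is left out so the block stays standard-library only.

```python
from pathlib import Path
from typing import Any

# Location stated in this README.
CONFIG_PATH = Path.home() / ".local-transcription" / "config.yaml"


def get_setting(config: dict, dotted_key: str, default: Any = None) -> Any:
    """Walk a nested dict using a dotted key such as 'remote.mode'."""
    node: Any = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


# Example data shaped like the documented keys, using the documented defaults.
example = {
    "remote": {"mode": "local"},
    "display": {"show_timestamps": True, "fade_after_seconds": 10},
}
print(get_setting(example, "remote.mode"))
print(get_setting(example, "display.font_source", "System Font"))
```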
|
||||||
|
|
||||||
### Key Settings
|
### Key Settings
|
||||||
|
|
||||||
@@ -100,6 +144,7 @@ Settings are stored at `~/.local-transcription/config.yaml` and can be modified
|
|||||||
| `transcription.silero_sensitivity` | VAD sensitivity (0-1, lower = more sensitive) | `0.4` |
|
| `transcription.silero_sensitivity` | VAD sensitivity (0-1, lower = more sensitive) | `0.4` |
|
||||||
| `transcription.post_speech_silence_duration` | Silence before finalizing (seconds) | `0.3` |
|
| `transcription.post_speech_silence_duration` | Silence before finalizing (seconds) | `0.3` |
|
||||||
| `transcription.continuous_mode` | Fast speaker mode for quick talkers | `false` |
|
| `transcription.continuous_mode` | Fast speaker mode for quick talkers | `false` |
|
||||||
|
| `remote.mode` | Transcription mode (local/managed/byok) | `local` |
|
||||||
| `display.show_timestamps` | Show timestamps with transcriptions | `true` |
|
| `display.show_timestamps` | Show timestamps with transcriptions | `true` |
|
||||||
| `display.fade_after_seconds` | Fade out time (0 = never) | `10` |
|
| `display.fade_after_seconds` | Fade out time (0 = never) | `10` |
|
||||||
| `display.font_source` | Font type (System Font/Web-Safe/Google Font/Custom File) | `System Font` |
|
| `display.font_source` | Font type (System Font/Web-Safe/Google Font/Custom File) | `System Font` |
|
||||||
@@ -111,67 +156,114 @@ See [config/default_config.yaml](config/default_config.yaml) for all available o
|
|||||||
|
|
||||||
```
|
```
|
||||||
local-transcription/
|
local-transcription/
|
||||||
├── client/ # Core transcription modules
|
├── src/ # Svelte 5 frontend (Tauri UI)
|
||||||
│ ├── audio_capture.py # Audio input handling
|
│ ├── App.svelte # Main app shell
|
||||||
│ ├── transcription_engine_realtime.py # RealtimeSTT integration
|
│ ├── lib/components/ # UI components
|
||||||
│ ├── noise_suppression.py # VAD and noise reduction
|
│ │ ├── Header.svelte
|
||||||
│ ├── device_utils.py # CPU/GPU detection
|
│ │ ├── StatusBar.svelte
|
||||||
│ ├── config.py # Configuration management
|
│ │ ├── Controls.svelte
|
||||||
│ ├── server_sync.py # Multi-user server client
|
│ │ ├── TranscriptionDisplay.svelte
|
||||||
│ └── update_checker.py # Auto-update functionality
|
│ │ └── Settings.svelte
|
||||||
├── gui/ # Desktop application UI
|
│ └── lib/stores/ # Reactive state management
|
||||||
│ ├── main_window_qt.py # Main application window
|
│ ├── backend.ts # WebSocket + REST API client
|
||||||
│ ├── settings_dialog_qt.py # Settings dialog
|
│ ├── config.ts # App configuration
|
||||||
│ └── transcription_display_qt.py # Display widget
|
│ └── transcriptions.ts # Transcription data
|
||||||
├── server/ # Web servers
|
├── src-tauri/ # Tauri v2 Rust shell
|
||||||
│ ├── web_display.py # Local FastAPI server for OBS
|
│ ├── src/main.rs
|
||||||
│ └── nodejs/ # Multi-user sync server
|
│ └── tauri.conf.json
|
||||||
│ ├── server.js # Express + WebSocket server
|
├── backend/ # Headless Python backend (sidecar)
|
||||||
│ └── README.md # Deployment instructions
|
│ ├── app_controller.py # Orchestration logic (engine, sync, config)
|
||||||
|
│ ├── api_server.py # FastAPI REST + WebSocket control API
|
||||||
|
│ └── main_headless.py # Headless entry point
|
||||||
|
├── client/ # Core transcription modules
|
||||||
|
│ ├── audio_capture.py # Audio input handling
|
||||||
|
│ ├── transcription_engine_realtime.py # RealtimeSTT / Whisper
|
||||||
|
│ ├── deepgram_transcription.py # Deepgram cloud transcription
|
||||||
|
│ ├── noise_suppression.py # VAD and noise reduction
|
||||||
|
│ ├── device_utils.py # CPU/GPU/MPS detection
|
||||||
|
│ ├── config.py # Configuration management
|
||||||
|
│ ├── server_sync.py # Multi-user server client
|
||||||
|
│ └── update_checker.py # Auto-update functionality
|
||||||
|
├── gui/ # Legacy PySide6/Qt GUI
|
||||||
|
│ ├── main_window_qt.py
|
||||||
|
│ ├── settings_dialog_qt.py
|
||||||
|
│ └── transcription_display_qt.py
|
||||||
|
├── server/ # Web servers
|
||||||
|
│ ├── web_display.py # Local FastAPI server for OBS
|
||||||
|
│ └── nodejs/ # Multi-user sync server
|
||||||
|
├── .gitea/workflows/ # CI/CD
|
||||||
|
│ ├── release.yml # Tauri app builds (all platforms)
|
||||||
|
│ └── build-sidecar.yml # Python sidecar builds (CUDA + CPU)
|
||||||
├── config/
|
├── config/
|
||||||
│ └── default_config.yaml # Default settings template
|
│ └── default_config.yaml # Default settings template
|
||||||
├── main.py # GUI entry point
|
├── main.py # Legacy GUI entry point
|
||||||
├── main_cli.py # CLI version (for testing)
|
├── main_cli.py # CLI version (for testing)
|
||||||
├── build.sh # Linux build script
|
├── local-transcription.spec # PyInstaller config (legacy, with PySide6)
|
||||||
├── build.bat # Windows build script
|
├── local-transcription-headless.spec # PyInstaller config (headless sidecar)
|
||||||
└── local-transcription.spec # PyInstaller configuration
|
├── pyproject.toml # Python dependencies
|
||||||
|
└── package.json # Node.js / Tauri dependencies
|
||||||
```
|
```
|
||||||
|
|
||||||
## Technology Stack
|
## Technology Stack
|
||||||
|
|
||||||
### Desktop Application
|
### Frontend (Tauri)
|
||||||
|
- **Tauri v2** — Native cross-platform shell (Rust)
|
||||||
|
- **Svelte 5** — Reactive UI framework (TypeScript)
|
||||||
|
- **Vite** — Frontend build tool
|
||||||
|
|
||||||
|
### Backend (Python Sidecar)
|
||||||
- **Python 3.9+**
|
- **Python 3.9+**
|
||||||
- **PySide6** - Qt6 GUI framework
|
- **FastAPI + Uvicorn** — REST API and WebSocket server
|
||||||
- **RealtimeSTT** - Real-time speech-to-text with advanced VAD
|
- **RealtimeSTT** — Real-time speech-to-text with advanced VAD
|
||||||
- **faster-whisper** - Optimized Whisper model inference
|
- **faster-whisper** — Optimized Whisper model inference (CTranslate2)
|
||||||
- **PyTorch** - ML framework (CUDA-enabled)
|
- **PyTorch** — ML framework (CUDA-enabled builds available)
|
||||||
- **sounddevice** - Cross-platform audio capture
|
- **sounddevice** — Cross-platform audio capture
|
||||||
- **webrtcvad + silero_vad** - Voice activity detection
|
- **webrtcvad + silero_vad** — Voice activity detection
|
||||||
- **noisereduce** - Noise suppression
|
|
||||||
|
|
||||||
### Web Servers
|
### Multi-User Server (Optional)
|
||||||
- **FastAPI + Uvicorn** - Local web display server
|
- **Node.js + Express + WebSocket** — Real-time sync server
|
||||||
- **Node.js + Express + WebSocket** - Multi-user sync server
|
|
||||||
|
|
||||||
### Build Tools
|
### Build & CI/CD
|
||||||
- **PyInstaller** - Executable packaging
|
- **PyInstaller** — Python sidecar packaging
|
||||||
- **uv** - Fast Python package manager
|
- **Tauri CLI** — App bundling (.msi, .dmg, .deb, .rpm, .AppImage)
|
||||||
|
- **Gitea Actions** — Automated cross-platform builds
|
||||||
|
- **uv** — Fast Python package manager
|
||||||
|
|
||||||
|
## CI/CD
|
||||||
|
|
||||||
|
Two Gitea Actions workflows in `.gitea/workflows/`:
|
||||||
|
|
||||||
|
| Workflow | Trigger | Produces |
|
||||||
|
|----------|---------|----------|
|
||||||
|
| `release.yml` | Push to `main` | Tauri app installers for all platforms |
|
||||||
|
| `build-sidecar.yml` | Changes to `client/`, `server/`, `backend/`, or `pyproject.toml` | Python sidecar zips (CUDA + CPU) |
|
||||||
|
|
||||||
|
Both workflows require a `BUILD_TOKEN` secret in the repo settings (Gitea API token with release write access).
|
||||||
|
|
||||||
|
### Release Artifacts
|
||||||
|
|
||||||
|
| Platform | App Installer | Sidecar (CUDA) | Sidecar (CPU) |
|
||||||
|
|----------|--------------|----------------|---------------|
|
||||||
|
| Linux x86_64 | `.deb`, `.rpm`, `.AppImage` | `sidecar-linux-x86_64-cuda.zip` | `sidecar-linux-x86_64-cpu.zip` |
|
||||||
|
| Windows x86_64 | `.msi`, `-setup.exe` | `sidecar-windows-x86_64-cuda.zip` | `sidecar-windows-x86_64-cpu.zip` |
|
||||||
|
| macOS ARM64 | `.dmg` | — | `sidecar-macos-aarch64-cpu.zip` |
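The sidecar names in the table follow a `sidecar-<os>-<arch>-<variant>.zip` pattern. As a rough sketch, a script could pick the matching asset like this. The function name is hypothetical, and only the three platform rows listed in the table are accepted; anything else raises an error instead of guessing a file name.

```python
# Asset names follow the Release Artifacts table; no macOS CUDA build is listed there.
_OS = {"Linux": "linux", "Windows": "windows", "Darwin": "macos"}
_ARCH = {"x86_64": "x86_64", "AMD64": "x86_64", "arm64": "aarch64", "aarch64": "aarch64"}
_LISTED = {("linux", "x86_64"), ("windows", "x86_64"), ("macos", "aarch64")}


def sidecar_asset(system: str, machine: str, want_cuda: bool) -> str:
    """Map platform identifiers to a sidecar zip name from the table."""
    os_name = _OS.get(system)
    arch = _ARCH.get(machine)
    if (os_name, arch) not in _LISTED:
        raise ValueError(f"no sidecar listed for {system}/{machine}")
    # macOS only has a CPU build, so a CUDA request falls back to CPU there.
    variant = "cuda" if want_cuda and os_name != "macos" else "cpu"
    return f"sidecar-{os_name}-{arch}-{variant}.zip"


print(sidecar_asset("Linux", "x86_64", want_cuda=True))
```

On a real machine the first two arguments would typically come from `platform.system()` and `platform.machine()` in the standard library.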
|
||||||
|
|
||||||
## System Requirements
|
## System Requirements
|
||||||
|
|
||||||
### Minimum
|
### Minimum
|
||||||
- Python 3.9+
|
|
||||||
- 4GB RAM
|
- 4GB RAM
|
||||||
- Any modern CPU
|
- Any modern CPU
|
||||||
|
|
||||||
### Recommended (for real-time performance)
|
### Recommended (for local real-time transcription)
|
||||||
- 8GB+ RAM
|
- 8GB+ RAM
|
||||||
- NVIDIA GPU with CUDA support (for GPU acceleration)
|
- NVIDIA GPU with CUDA support (for GPU acceleration)
|
||||||
- FFmpeg (installed automatically with dependencies)
|
|
||||||
|
|
||||||
### For Building
|
### For Building
|
||||||
- **Linux**: gcc, Python dev headers
|
- **Tauri app**: Node.js 20+, Rust stable, platform SDK (see [Tauri prerequisites](https://tauri.app/start/prerequisites/))
|
||||||
- **Windows**: Visual Studio Build Tools, Python dev headers
|
- **Python sidecar**: Python 3.9+, uv, PyInstaller
|
||||||
|
- **Linux**: `libgtk-3-dev`, `libwebkit2gtk-4.1-dev`, `libappindicator3-dev`, `librsvg2-dev`, `patchelf`
|
||||||
|
- **Windows**: Visual Studio Build Tools, WebView2
|
||||||
|
- **macOS**: Xcode Command Line Tools
|
||||||
|
|
||||||
## Troubleshooting
|
## Troubleshooting
|
||||||
|
|
||||||
@@ -185,7 +277,7 @@ local-transcription/
|
|||||||
# List available audio devices
|
# List available audio devices
|
||||||
uv run python main_cli.py --list-devices
|
uv run python main_cli.py --list-devices
|
||||||
```
|
```
|
||||||
- Ensure microphone permissions are granted
|
- Ensure microphone permissions are granted (especially on macOS)
|
||||||
- Try different device indices in settings
|
- Try different device indices in settings
|
||||||
|
|
||||||
### GPU Not Detected
|
### GPU Not Detected
|
||||||
@@ -193,13 +285,13 @@ uv run python main_cli.py --list-devices
|
|||||||
# Check CUDA availability
|
# Check CUDA availability
|
||||||
uv run python -c "import torch; print(torch.cuda.is_available())"
|
uv run python -c "import torch; print(torch.cuda.is_available())"
|
||||||
```
|
```
|
||||||
- Install NVIDIA drivers (CUDA toolkit is bundled)
|
- Install NVIDIA drivers (CUDA toolkit is bundled in CUDA sidecar builds)
|
||||||
- The app automatically falls back to CPU if no GPU is available
|
- The app automatically falls back to CPU if no GPU is available
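The CPU fallback described above can be sketched in a few lines. This is an illustration only, not the code in `client/device_utils.py`; the function name `pick_device` is hypothetical. It returns `"cpu"` when PyTorch is missing or reports no accelerator.

```python
def pick_device() -> str:
    """Return 'cuda', 'mps', or 'cpu' depending on what PyTorch reports."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch not installed, so only the CPU is usable
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # absent on older PyTorch builds
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(pick_device())
```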
|
||||||
|
|
||||||
### Web Server Port Conflicts
|
### Web Server Port Conflicts
|
||||||
- Default port is 8080
|
- Default port is 8080; the app tries ports 8080-8084 automatically
|
||||||
- Change in settings or edit config file
|
- Change in settings or edit config file
|
||||||
- Check for conflicts: `lsof -i :8080` (Linux) or `netstat -ano | findstr :8080` (Windows)
|
- Check for conflicts: `lsof -i :8080` (Linux/macOS) or `netstat -ano | findstr :8080` (Windows)
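To see which port in the 8080-8084 range is currently free, a small probe like the one below can help. It is a sketch of the general "try each port in turn" idea, not the app's own implementation; `first_free_port` is a hypothetical name.

```python
import socket
from typing import Optional


def first_free_port(host: str = "127.0.0.1", start: int = 8080, end: int = 8084) -> Optional[int]:
    """Return the first port in [start, end] that can be bound, else None."""
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port in use or not permitted; try the next one
            return port
    return None


print(first_free_port())
```

A result of `None` means every port in the range is taken, in which case changing the port in settings is the remaining option.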
|
||||||
|
|
||||||
## Use Cases
|
## Use Cases
|
||||||
|
|
||||||
@@ -222,3 +314,5 @@ MIT License
|
|||||||
- [OpenAI Whisper](https://github.com/openai/whisper) for the speech recognition model
|
- [OpenAI Whisper](https://github.com/openai/whisper) for the speech recognition model
|
||||||
- [RealtimeSTT](https://github.com/KoljaB/RealtimeSTT) for real-time transcription capabilities
|
- [RealtimeSTT](https://github.com/KoljaB/RealtimeSTT) for real-time transcription capabilities
|
||||||
- [faster-whisper](https://github.com/guillaumekln/faster-whisper) for optimized inference
|
- [faster-whisper](https://github.com/guillaumekln/faster-whisper) for optimized inference
|
||||||
|
- [Tauri](https://tauri.app/) for the cross-platform desktop framework
|
||||||
|
- [Deepgram](https://deepgram.com/) for cloud transcription API
|
||||||
|
|||||||