# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
Local Transcription is a desktop application for real-time speech-to-text transcription designed for streamers. It uses Whisper models (via faster-whisper) to transcribe audio locally with optional multi-user server synchronization.
**Key Features:**
- Standalone desktop GUI (PySide6/Qt)
- Local transcription with CPU/GPU support
- Built-in web server for OBS browser source integration
- Optional Node.js-based multi-user server for syncing transcriptions across users
- Noise suppression and Voice Activity Detection (VAD)
- Cross-platform builds (Linux/Windows) with PyInstaller
## Project Structure

```
local-transcription/
├── client/                          # Core transcription logic
│   ├── audio_capture.py             # Audio input and buffering
│   ├── transcription_engine.py      # Whisper model integration
│   ├── noise_suppression.py         # VAD and noise reduction
│   ├── device_utils.py              # CPU/GPU device management
│   ├── config.py                    # Configuration management
│   └── server_sync.py               # Multi-user server sync client
├── gui/                             # Desktop application UI
│   ├── main_window_qt.py            # Main application window (PySide6)
│   ├── settings_dialog_qt.py        # Settings dialog (PySide6)
│   └── transcription_display_qt.py  # Display widget
├── server/                          # Web display servers
│   ├── web_display.py               # FastAPI server for OBS browser source (local)
│   └── nodejs/                      # Optional multi-user Node.js server
│       ├── server.js                # Multi-user sync server with WebSocket
│       ├── package.json             # Node.js dependencies
│       └── README.md                # Server deployment documentation
├── config/                          # Example configuration files
│   └── default_config.yaml          # Default settings template
├── main.py                          # GUI application entry point
├── main_cli.py                      # CLI version for testing
└── pyproject.toml                   # Dependencies and build config
```
## Development Commands

### Installation and Setup

```bash
# Install dependencies (creates .venv automatically)
uv sync

# Run the GUI application
uv run python main.py

# Run CLI version (headless, for testing)
uv run python main_cli.py

# List available audio devices
uv run python main_cli.py --list-devices

# Install with CUDA support (if needed)
uv pip install torch --index-url https://download.pytorch.org/whl/cu121
```
### Building Executables

```bash
# Linux (includes CUDA support - works on both GPU and CPU systems)
./build.sh

# Windows (includes CUDA support - works on both GPU and CPU systems)
build.bat

# Manual build with PyInstaller
uv sync                      # Install dependencies (includes CUDA PyTorch)
uv pip uninstall -q enum34   # Remove incompatible enum34 package
uv run pyinstaller local-transcription.spec
```

**Important:** All builds include CUDA support via the `pyproject.toml` configuration. CUDA builds can be created on systems without NVIDIA GPUs: the PyTorch CUDA runtime is bundled, and the app automatically falls back to CPU if no GPU is available.
### Testing

```bash
# Run component tests
uv run python test_components.py

# Check CUDA availability
uv run python check_cuda.py

# Test web server manually
uv run python -m uvicorn server.web_display:app --reload
```
## Architecture

### Audio Processing Pipeline

1. **Audio Capture** (`client/audio_capture.py`)
   - Captures audio from the microphone/system using sounddevice
   - Handles automatic sample rate detection and resampling
   - Uses chunking with overlap for better transcription quality
   - Default: 3-second chunks with 0.5s overlap (see the sketch after this list)
2. **Noise Suppression** (`client/noise_suppression.py`)
   - Applies noisereduce for background noise reduction
   - Voice Activity Detection (VAD) using webrtcvad
   - Skips silent segments to improve performance
3. **Transcription** (`client/transcription_engine.py`)
   - Uses faster-whisper for efficient inference
   - Supports CPU, CUDA, and Apple MPS (Mac)
   - Models: tiny, base, small, medium, large
   - Thread-safe model loading with locks
4. **Display** (`gui/main_window_qt.py`)
   - PySide6/Qt-based desktop GUI
   - Real-time transcription display with scrolling
   - Settings panel with live updates (no restart needed)
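The overlap matters at chunk boundaries: a word split at the end of one chunk reappears intact at the start of the next. A minimal sketch of the idea, assuming 16 kHz mono float32 audio (illustrative helper, not the actual `audio_capture.py` code):

```python
import numpy as np

SAMPLE_RATE = 16000      # Hz; Whisper models expect 16 kHz input
CHUNK_SECONDS = 3.0      # default chunk duration
OVERLAP_SECONDS = 0.5    # default overlap between consecutive chunks

chunk_len = int(SAMPLE_RATE * CHUNK_SECONDS)
overlap_len = int(SAMPLE_RATE * OVERLAP_SECONDS)

def iter_chunks(buffer: np.ndarray):
    """Yield fixed-size chunks, each one starting overlap_len samples
    before the previous chunk ended."""
    step = chunk_len - overlap_len
    for start in range(0, max(len(buffer) - chunk_len + 1, 0), step):
        yield buffer[start:start + chunk_len]

audio = np.zeros(SAMPLE_RATE * 10, dtype=np.float32)  # 10 s of silence
for i, chunk in enumerate(iter_chunks(audio)):
    print(f"chunk {i} starts at {i * (CHUNK_SECONDS - OVERLAP_SECONDS):.1f}s")
```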
### Web Server Architecture

**Local Web Server** (`server/web_display.py`)
- Always runs when the GUI starts (port 8080 by default)
- FastAPI with WebSocket for real-time updates
- Used for OBS browser source integration
- Single-user (displays only local transcriptions)
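The local server follows the standard FastAPI WebSocket push pattern. A minimal sketch of that pattern, not the actual `web_display.py` implementation:

```python
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()
clients: set[WebSocket] = set()

@app.websocket("/ws")
async def ws_endpoint(ws: WebSocket):
    """Each OBS browser source holds one of these connections open."""
    await ws.accept()
    clients.add(ws)
    try:
        while True:
            await ws.receive_text()  # keep the connection alive
    except WebSocketDisconnect:
        clients.discard(ws)

async def broadcast(text: str):
    """Push a new transcription line to every connected client."""
    for ws in list(clients):
        try:
            await ws.send_json({"text": text})
        except Exception:
            clients.discard(ws)
```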
**Multi-User Server** (optional - for syncing transcriptions across multiple users)

**Node.js WebSocket Server** (`server/nodejs/`) - RECOMMENDED
- Real-time WebSocket support (< 100ms latency)
- Handles 100+ concurrent users
- Easy deployment to VPS/cloud hosting (Railway, Heroku, DigitalOcean, or any VPS)
- Configurable display options via URL parameters:
  - `timestamps=true/false` - Show/hide timestamps
  - `maxlines=50` - Maximum visible lines (prevents scroll bars in OBS)
  - `fontsize=16` - Font size in pixels
  - `fontfamily=Arial` - Font family
  - `fade=10` - Seconds before text fades (0 = never)
- See `server/nodejs/README.md` for deployment instructions
### Configuration System

- Config stored at `~/.local-transcription/config.yaml`
- Managed by `client/config.py`
- Settings apply immediately without restart (except model changes)
- YAML format with nested keys (e.g., `transcription.model`; see the lookup sketch below)
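A dotted key like `transcription.model` resolves by walking the nested YAML mapping. A sketch of that lookup (`get_setting` is a hypothetical helper; the real logic lives in `client/config.py`):

```python
import yaml

def get_setting(config: dict, dotted_key: str, default=None):
    """Resolve a dotted key like 'transcription.model' against a nested dict."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node

config = yaml.safe_load("""
transcription:
  model: base
  device: auto
""")
print(get_setting(config, "transcription.model"))  # -> base
```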
### Device Management

- `client/device_utils.py` handles CPU/GPU detection
- Auto-detects CUDA, MPS (Mac), or falls back to CPU
- Compute types: `float32` (best quality), `float16` (GPU), `int8` (fastest)
- Thread-safe device selection
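The fallback order described above amounts to a few torch checks. A sketch under the assumption that torch is the detection mechanism (the actual `device_utils.py` may differ):

```python
import torch

def pick_device() -> str:
    """Prefer CUDA, then Apple MPS, then CPU."""
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"

# float16 only pays off on GPU; int8 is the fast CPU option.
device = pick_device()
compute_type = "float16" if device == "cuda" else "int8"
print(device, compute_type)
```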
## Key Implementation Details

### PyInstaller Build Configuration

- `local-transcription.spec` controls the build
- UPX compression enabled for smaller executables
- Hidden imports required for PySide6, faster-whisper, and torch
- Console mode enabled by default (set `console=False` to hide it)
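Hidden imports are declared in the spec's `Analysis` block. An illustrative fragment (module names are examples only; see `local-transcription.spec` for the real list). Spec files are plain Python executed by PyInstaller, so `Analysis` is available there without an import:

```python
# Fragment of a PyInstaller spec file (run via `pyinstaller foo.spec`).
# hiddenimports lists modules PyInstaller's static analysis misses,
# e.g. backends loaded dynamically at runtime.
a = Analysis(
    ["main.py"],
    hiddenimports=[
        "PySide6.QtNetwork",   # example entries only -
        "faster_whisper",      # check local-transcription.spec
        "torch",               # for the authoritative list
    ],
)
```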
### Threading Model
- Main thread: Qt GUI event loop
- Audio thread: Captures and processes audio chunks
- Web server thread: Runs FastAPI server
- Transcription: Runs in callback thread from audio capture
- All transcription results communicated via Qt signals
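Because Qt widgets must only be touched from the main thread, worker threads hand results over via signals and Qt queues the delivery onto the GUI thread. A minimal sketch of the pattern (`display.append_line` is a hypothetical slot):

```python
from PySide6.QtCore import QObject, Signal

class TranscriptionBridge(QObject):
    # Emitted from the audio/transcription callback thread; Qt delivers
    # it on the GUI thread via a queued connection.
    new_text = Signal(str)

bridge = TranscriptionBridge()
# GUI thread:    bridge.new_text.connect(display.append_line)
# Worker thread: bridge.new_text.emit("transcribed sentence")
```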
### Server Sync (Optional Multi-User Feature)

- `client/server_sync.py` handles server communication
- Toggle in Settings: "Enable Server Sync"
- Sends transcriptions to the PHP server via POST
- A separate web display shows merged transcriptions from all users
- Falls back gracefully if the server is unavailable
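The graceful fallback boils down to never letting a failed POST interrupt local transcription. A sketch of that behavior (the endpoint is taken from the OBS section below; the payload field names are assumptions, see `client/server_sync.py` for the real ones):

```python
import requests

def send_transcription(url: str, room: str, text: str) -> bool:
    """POST one transcription line; swallow network errors so a dead
    server never breaks local transcription."""
    try:
        resp = requests.post(url, json={"room": room, "text": text}, timeout=2)
        return resp.ok
    except requests.RequestException:
        return False  # graceful fallback: keep transcribing locally

send_transcription("http://your-server:3000/api/send", "my-room", "hello")
```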
## Common Patterns

### Adding a New Setting

1. Add the setting to `config/default_config.yaml`
2. Update `client/config.py` if validation is needed
3. Add a UI control in `gui/settings_dialog_qt.py`
4. Apply the setting in the relevant component (without a restart if possible)
5. Emit a signal to update the display if needed

### Modifying Transcription Display

- Local GUI: `gui/transcription_display_qt.py`
- Web display (OBS): `server/web_display.py` (HTML in `_get_html()`)
- Multi-user display: `server/php/display.php`

### Adding a New Model Size

1. Update `client/transcription_engine.py`
2. Add it to the model selector in `gui/settings_dialog_qt.py`
3. Update the CLI argument choices in `main_cli.py`
## Dependencies

**Core:**
- `faster-whisper`: Optimized Whisper inference
- `torch`: ML framework (CUDA-enabled via a custom package index)
- `PySide6`: Qt6 bindings for the GUI
- `sounddevice`: Cross-platform audio I/O
- `noisereduce`, `webrtcvad`: Audio preprocessing

**Web Server:**
- `fastapi`, `uvicorn`: Web server and ASGI runtime
- `websockets`: Real-time communication

**Build:**
- `pyinstaller`: Creates standalone executables
- `uv`: Fast package manager
**PyTorch CUDA Index:**
- Configured in `pyproject.toml` under `[[tool.uv.index]]`
- Uses PyTorch's custom wheel repository for CUDA builds
- Installed automatically by `uv sync`, which the build scripts run
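To confirm that `uv sync` pulled the CUDA wheel rather than the CPU one (version strings below are examples):

```python
import torch

print(torch.__version__)          # e.g. "2.4.0+cu121" for a CUDA wheel
print(torch.version.cuda)         # bundled CUDA runtime version; None on CPU wheels
print(torch.cuda.is_available())  # False is expected without an NVIDIA GPU
```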
## Platform-Specific Notes

### Linux

- Uses PulseAudio/ALSA for audio
- Build scripts use bash (`.sh` files)
- Executable: `dist/LocalTranscription/LocalTranscription`

### Windows

- Uses Windows Audio/WASAPI
- Build scripts use batch (`.bat` files)
- Executable: `dist\LocalTranscription\LocalTranscription.exe`
- Requires the Visual C++ Redistributable on target systems

### Cross-Building

- Cannot cross-compile - builds must run on the target platform
- CI/CD should use platform-specific runners
## Troubleshooting

### Model Loading Issues

- Models download to `~/.cache/huggingface/`
- First run requires an internet connection
- Check disk space (models range from 75 MB to 3 GB depending on size)
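Models can be pre-fetched into the cache for offline machines by instantiating the model once via the public faster-whisper API:

```python
from faster_whisper import WhisperModel

# Instantiating the model downloads it into the Hugging Face cache
# (~/.cache/huggingface/) if it is not already present, so this
# one-liner pre-fetches a model for later offline use.
model = WhisperModel("base", device="cpu", compute_type="int8")
```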
### Audio Device Issues

- Run `uv run python main_cli.py --list-devices`
- Check permissions (microphone access)
- Try different device indices in settings
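Roughly the same device information is available interactively through sounddevice, the library the app uses for capture:

```python
import sounddevice as sd

print(sd.query_devices())   # table of all input/output devices with indices
print(sd.default.device)    # current (input, output) default device indices
```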
### GPU Not Detected

- Run `uv run python check_cuda.py`
- Install NVIDIA GPU drivers (not the CUDA toolkit - the runtime is bundled in the build)
- Verify PyTorch sees the GPU: `python -c "import torch; print(torch.cuda.is_available())"`

### Web Server Port Conflicts

- Default port: 8080
- Change it in `gui/main_window_qt.py` or the config
- Use `lsof -i :8080` (Linux) or `netstat -ano | findstr :8080` (Windows) to find the conflicting process
## OBS Integration

### Local Display (Single User)

1. Start the Local Transcription app
2. In OBS, add a "Browser" source
3. URL: `http://localhost:8080`
4. Set dimensions (e.g., 1920x300)

### Multi-User Display (Node.js Server)

1. Deploy the Node.js server (see `server/nodejs/README.md`)
2. Each user configures the Server URL: `http://your-server:3000/api/send`
3. Enter the same room name and passphrase
4. In OBS, add a "Browser" source
5. URL: `http://your-server:3000/display?room=ROOM&fade=10&timestamps=true&maxlines=50&fontsize=16`
6. Customize URL parameters as needed (a sketch for building the URL follows this list):
   - `timestamps=false` - Hide timestamps
   - `maxlines=30` - Show at most 30 lines (prevents scroll bars)
   - `fontsize=18` - Larger font
   - `fontfamily=Courier` - Different font
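When hand-editing the query string gets error-prone, the URL can be assembled programmatically. A small sketch (`your-server` is a placeholder for the deployed host):

```python
from urllib.parse import urlencode

params = {
    "room": "my-room",
    "fade": 10,           # seconds before text fades (0 = never)
    "timestamps": "true",
    "maxlines": 50,
    "fontsize": 16,
    "fontfamily": "Arial",
}
print(f"http://your-server:3000/display?{urlencode(params)}")
```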
## Performance Optimization

**For real-time transcription:**
- Use the `tiny` or `base` model (faster)
- Enable the GPU if available (5-10x faster)
- Increase `chunk_duration` for better accuracy (at the cost of latency)
- Decrease `chunk_duration` for lower latency (less context)
- Enable VAD to skip silent audio

**For build size reduction:**
- Don't bundle models (they download on demand)
- Use a CPU-only build if no users need GPU support
- Enable UPX compression (already enabled in the spec)
## Phase Status
- ✅ Phase 1: Standalone desktop application (complete)
- ✅ Web Server: Local OBS integration (complete)
- ✅ Builds: PyInstaller executables (complete)
- ✅ Phase 2: Multi-user Node.js server (complete, optional)
- ⏸️ Phase 3+: Advanced features (see NEXT_STEPS.md)
## Related Documentation

- `README.md` - User-facing documentation
- `BUILD.md` - Detailed build instructions
- `INSTALL.md` - Installation guide
- `NEXT_STEPS.md` - Future enhancements
- `server/nodejs/README.md` - Node.js server setup and deployment