# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
Local Transcription is a desktop application for real-time speech-to-text transcription designed for streamers. It uses Whisper models (via faster-whisper) to transcribe audio locally with optional multi-user server synchronization.
**Key Features:**
- Standalone desktop GUI (PySide6/Qt)
- Local transcription with CPU/GPU support
- Built-in web server for OBS browser source integration
- Optional Node.js-based multi-user server for syncing transcriptions across users
- Noise suppression and Voice Activity Detection (VAD)
- Cross-platform builds (Linux/Windows) with PyInstaller
## Project Structure

```
local-transcription/
├── client/                          # Core transcription logic
│   ├── audio_capture.py             # Audio input and buffering
│   ├── transcription_engine.py      # Whisper model integration
│   ├── noise_suppression.py         # VAD and noise reduction
│   ├── device_utils.py              # CPU/GPU device management
│   ├── config.py                    # Configuration management
│   └── server_sync.py               # Multi-user server sync client
├── gui/                             # Desktop application UI
│   ├── main_window_qt.py            # Main application window (PySide6)
│   ├── settings_dialog_qt.py        # Settings dialog (PySide6)
│   └── transcription_display_qt.py  # Display widget
├── server/                          # Web display servers
│   ├── web_display.py               # FastAPI server for OBS browser source (local)
│   └── nodejs/                      # Optional multi-user Node.js server
│       ├── server.js                # Multi-user sync server with WebSocket
│       ├── package.json             # Node.js dependencies
│       └── README.md                # Server deployment documentation
├── config/                          # Example configuration files
│   └── default_config.yaml          # Default settings template
├── main.py                          # GUI application entry point
├── main_cli.py                      # CLI version for testing
└── pyproject.toml                   # Dependencies and build config
```
## Development Commands

### Installation and Setup

```bash
# Install dependencies (creates .venv automatically)
uv sync

# Run the GUI application
uv run python main.py

# Run CLI version (headless, for testing)
uv run python main_cli.py

# List available audio devices
uv run python main_cli.py --list-devices

# Install with CUDA support (if needed)
uv pip install torch --index-url https://download.pytorch.org/whl/cu121
```
### Building Executables

```bash
# Linux (CPU-only)
./build.sh

# Linux (with CUDA support - works on both GPU and CPU systems)
./build-cuda.sh

# Windows (CPU-only)
build.bat

# Windows (with CUDA support)
build-cuda.bat

# Manual build with PyInstaller
uv run pyinstaller local-transcription.spec
```

**Important:** CUDA builds can be created on systems without NVIDIA GPUs. The PyTorch CUDA runtime is bundled, and the app automatically falls back to CPU if no GPU is available.
### Testing

```bash
# Run component tests
uv run python test_components.py

# Check CUDA availability
uv run python check_cuda.py

# Test web server manually
uv run python -m uvicorn server.web_display:app --reload
```
## Architecture

### Audio Processing Pipeline

1. **Audio Capture** (`client/audio_capture.py`)
   - Captures audio from microphone/system using sounddevice
   - Handles automatic sample rate detection and resampling
   - Uses chunking with overlap for better transcription quality
   - Default: 3-second chunks with 0.5 s overlap
2. **Noise Suppression** (`client/noise_suppression.py`)
   - Applies noisereduce for background noise reduction
   - Voice Activity Detection (VAD) using webrtcvad
   - Skips silent segments to improve performance
3. **Transcription** (`client/transcription_engine.py`)
   - Uses faster-whisper for efficient inference
   - Supports CPU, CUDA, and Apple MPS (Mac)
   - Models: tiny, base, small, medium, large
   - Thread-safe model loading with locks
4. **Display** (`gui/main_window_qt.py`)
   - PySide6/Qt-based desktop GUI
   - Real-time transcription display with scrolling
   - Settings panel with live updates (no restart needed)
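The chunk-with-overlap scheme in step 1 can be sketched as follows. The function name and buffer handling are illustrative, not the actual `audio_capture.py` implementation:

```python
# Sketch of chunked capture with overlap: each emitted chunk repeats the
# last `overlap_s` seconds of the previous chunk for transcription context.
# Illustrative only - not the real audio_capture.py code.

def chunk_with_overlap(samples, sample_rate=16000, chunk_s=3.0, overlap_s=0.5):
    """Yield fixed-size chunks where consecutive chunks share overlap_s seconds."""
    chunk_len = int(chunk_s * sample_rate)
    step = int((chunk_s - overlap_s) * sample_rate)  # advance per chunk
    for start in range(0, max(len(samples) - chunk_len, 0) + 1, step):
        yield samples[start:start + chunk_len]

# Example: 10 s of "audio" as a flat list of sample indices
audio = list(range(10 * 16000))
chunks = list(chunk_with_overlap(audio))
# Consecutive chunks overlap by 0.5 s (8000 samples at 16 kHz)
assert chunks[0][-8000:] == chunks[1][:8000]
```

The overlap means words split across a chunk boundary are heard in full by at least one chunk, at the cost of transcribing each overlap region twice.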
### Web Server Architecture

**Local Web Server** (`server/web_display.py`)

- Always runs when the GUI starts (port 8080 by default)
- FastAPI with WebSocket for real-time updates
- Used for OBS browser source integration
- Single-user (displays only local transcriptions)
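The fan-out pattern behind the local server (one transcription event pushed to every connected display) can be sketched without FastAPI as a plain broadcast hub. This is an illustrative stand-in; the real `web_display.py` does this over FastAPI WebSockets:

```python
# Minimal broadcast hub sketching the WebSocket fan-out of the local
# display server: every subscriber receives each transcription line.
# Illustrative stand-in - not the actual web_display.py code.

class BroadcastHub:
    def __init__(self):
        self._subscribers = []  # one outbound queue per connected display

    def subscribe(self):
        queue = []
        self._subscribers.append(queue)
        return queue

    def publish(self, line: str):
        # Push the new transcription line to every connected display
        for queue in self._subscribers:
            queue.append(line)

hub = BroadcastHub()
obs_view = hub.subscribe()     # e.g. the OBS browser source
popup_view = hub.subscribe()   # e.g. a second browser tab
hub.publish("hello world")
assert obs_view == ["hello world"] and popup_view == ["hello world"]
```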
**Multi-User Server** (optional, for syncing across multiple users)

**Node.js WebSocket Server** (`server/nodejs/`) - RECOMMENDED

- Real-time WebSocket support (< 100 ms latency)
- Handles 100+ concurrent users
- Easy deployment to cloud hosting (Railway, Heroku, DigitalOcean, or any VPS)
- Configurable display options via URL parameters:
  - `timestamps=true/false`: Show/hide timestamps
  - `maxlines=50`: Maximum visible lines (prevents scroll bars in OBS)
  - `fontsize=16`: Font size in pixels
  - `fontfamily=Arial`: Font family
  - `fade=10`: Seconds before text fades (0 = never)

See `server/nodejs/README.md` for deployment instructions.
### Configuration System

- Config stored at `~/.local-transcription/config.yaml`
- Managed by `client/config.py`
- Settings apply immediately without restart (except model changes)
- YAML format with nested keys (e.g., `transcription.model`)
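Dotted keys like `transcription.model` can be resolved against the nested YAML data with a small helper. The helper name and behavior here are illustrative and not necessarily `config.py`'s actual API:

```python
# Resolve a dotted key such as "transcription.model" in a nested dict,
# as loaded from config.yaml. Illustrative helper, not config.py's API.

def get_nested(config: dict, dotted_key: str, default=None):
    """Walk the nested dict one key segment at a time."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node

config = {"transcription": {"model": "base", "language": "en"}}
assert get_nested(config, "transcription.model") == "base"
assert get_nested(config, "server.port", 8080) == 8080  # falls back to default
```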
### Device Management

- `client/device_utils.py` handles CPU/GPU detection
- Auto-detects CUDA, MPS (Mac), or falls back to CPU
- Compute types: `float32` (best quality), `float16` (GPU), `int8` (fastest)
- Thread-safe device selection
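The CUDA → MPS → CPU fallback order can be sketched as a pure function; `device_utils.py`'s actual logic and compute-type pairings may differ:

```python
# Sketch of the device fallback order described above. In the app the
# availability flags would come from torch, e.g.:
#   pick_device(torch.cuda.is_available(), torch.backends.mps.is_available())
# Function name and exact compute-type pairing are illustrative.

def pick_device(cuda_available: bool, mps_available: bool) -> tuple:
    """Return (device, compute_type) with graceful CPU fallback."""
    if cuda_available:
        return "cuda", "float16"   # half precision on NVIDIA GPUs
    if mps_available:
        return "mps", "float16"    # Apple Silicon
    return "cpu", "int8"           # fastest CPU option

assert pick_device(False, False) == ("cpu", "int8")
```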
## Key Implementation Details

### PyInstaller Build Configuration

- `local-transcription.spec` controls the build
- UPX compression enabled for smaller executables
- Hidden imports required for PySide6, faster-whisper, torch
- Console mode enabled by default (set `console=False` to hide)
### Threading Model

- Main thread: Qt GUI event loop
- Audio thread: captures and processes audio chunks
- Web server thread: runs the FastAPI server
- Transcription: runs in a callback thread from audio capture
- All transcription results are communicated via Qt signals
### Server Sync (Optional Multi-User Feature)

- `client/server_sync.py` handles server communication
- Toggle in Settings: "Enable Server Sync"
- Sends transcriptions to the Node.js server via HTTP POST (`/api/send`)
- Separate web display shows merged transcriptions from all users
- Falls back gracefully if the server is unavailable
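The "fail soft" behavior can be sketched with stdlib `urllib`: build the payload, POST it, and swallow network errors so local transcription keeps running when the server is unreachable. The endpoint path matches this document; the JSON field names are illustrative, not `server_sync.py`'s actual wire format:

```python
# Sketch of graceful-fallback sync: a failed POST returns False instead
# of raising, so the caller keeps transcribing locally.
# JSON field names are illustrative assumptions.
import json
import urllib.error
import urllib.request

def build_payload(room: str, passphrase: str, text: str) -> bytes:
    return json.dumps({"room": room, "passphrase": passphrase,
                       "text": text}).encode("utf-8")

def send_transcription(server_url: str, payload: bytes) -> bool:
    req = urllib.request.Request(
        server_url, data=payload,
        headers={"Content-Type": "application/json"})
    try:
        with urllib.request.urlopen(req, timeout=2):
            return True
    except OSError:          # covers URLError, timeouts, refused connections
        return False         # graceful fallback: keep running locally

payload = build_payload("myroom", "secret", "hello")
assert json.loads(payload)["room"] == "myroom"
```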
## Common Patterns

### Adding a New Setting

1. Add to `config/default_config.yaml`
2. Update `client/config.py` if validation is needed
3. Add a UI control in `gui/settings_dialog_qt.py`
4. Apply the setting in the relevant component (no restart if possible)
5. Emit a signal to update the display if needed
### Modifying Transcription Display

- Local GUI: `gui/transcription_display_qt.py`
- Web display (OBS): `server/web_display.py` (HTML in `_get_html()`)
- Multi-user display: served by `server/nodejs/server.js` (`/display` endpoint)
### Adding a New Model Size

1. Update `client/transcription_engine.py`
2. Add to the model selector in `gui/settings_dialog_qt.py`
3. Update CLI argument choices in `main_cli.py`
## Dependencies

**Core:**

- `faster-whisper`: Optimized Whisper inference
- `torch`: ML framework (CUDA-enabled via special index)
- `PySide6`: Qt6 bindings for GUI
- `sounddevice`: Cross-platform audio I/O
- `noisereduce`, `webrtcvad`: Audio preprocessing

**Web Server:**

- `fastapi`, `uvicorn`: Web server and ASGI
- `websockets`: Real-time communication

**Build:**

- `pyinstaller`: Create standalone executables
- `uv`: Fast package manager
**PyTorch CUDA Index:**

- Configured in `pyproject.toml` under `[[tool.uv.index]]`
- Uses PyTorch's custom wheel repository for CUDA builds
- Automatically installed with `uv sync` when using CUDA build scripts
## Platform-Specific Notes

### Linux

- Uses PulseAudio/ALSA for audio
- Build scripts use bash (`.sh` files)
- Executable: `dist/LocalTranscription/LocalTranscription`

### Windows

- Uses Windows Audio/WASAPI
- Build scripts use batch (`.bat` files)
- Executable: `dist\LocalTranscription\LocalTranscription.exe`
- Requires Visual C++ Redistributable on target systems

### Cross-Building

- Cannot cross-compile; must build on the target platform
- CI/CD should use platform-specific runners
## Troubleshooting

### Model Loading Issues

- Models download to `~/.cache/huggingface/`
- First run requires an internet connection
- Check disk space (models: 75 MB-3 GB depending on size)

### Audio Device Issues

- Run `uv run python main_cli.py --list-devices`
- Check permissions (microphone access)
- Try different device indices in settings

### GPU Not Detected

- Run `uv run python check_cuda.py`
- Install CUDA drivers (not the CUDA toolkit, which is bundled in the build)
- Verify PyTorch sees the GPU: `python -c "import torch; print(torch.cuda.is_available())"`

### Web Server Port Conflicts

- Default port: 8080
- Change in `gui/main_window_qt.py` or config
- Find the conflicting process with `lsof -i :8080` (Linux) or `netstat -ano | findstr :8080` (Windows)
## OBS Integration

### Local Display (Single User)

1. Start the Local Transcription app
2. In OBS: Add a "Browser" source
3. URL: `http://localhost:8080`
4. Set dimensions (e.g., 1920x300)
### Multi-User Display (Node.js Server)

1. Deploy the Node.js server (see `server/nodejs/README.md`)
2. Each user configures the Server URL: `http://your-server:3000/api/send`
3. Enter the same room name and passphrase
4. In OBS: Add a "Browser" source
5. URL: `http://your-server:3000/display?room=ROOM&fade=10&timestamps=true&maxlines=50&fontsize=16`
6. Customize URL parameters as needed:
   - `timestamps=false`: Hide timestamps
   - `maxlines=30`: Show at most 30 lines (prevents scroll bars)
   - `fontsize=18`: Larger font
   - `fontfamily=Courier`: Different font
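A display URL with the documented parameters can be assembled with stdlib `urllib.parse`; the host, port, and room name below are placeholders:

```python
# Assemble an OBS display URL from the documented query parameters.
# Host/port/room are placeholders, matching the examples above.
from urllib.parse import urlencode

def display_url(base: str, room: str, **params) -> str:
    query = urlencode({"room": room, **params})
    return f"{base}/display?{query}"

url = display_url("http://your-server:3000", "myroom",
                  fade=10, timestamps="true", maxlines=50, fontsize=16)
assert url == ("http://your-server:3000/display?"
               "room=myroom&fade=10&timestamps=true&maxlines=50&fontsize=16")
```

`urlencode` also percent-escapes values, which matters for `fontfamily` names containing spaces.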
## Performance Optimization

**For Real-Time Transcription:**

- Use the `tiny` or `base` model (faster)
- Enable GPU if available (5-10x faster)
- Increase `chunk_duration` for better accuracy (higher latency)
- Decrease `chunk_duration` for lower latency (less context)
- Enable VAD to skip silent audio

**For Build Size Reduction:**

- Don't bundle models (download on demand)
- Use a CPU-only build if no users have GPUs
- Enable UPX compression (already in the spec)
## Phase Status
- ✅ Phase 1: Standalone desktop application (complete)
- ✅ Web Server: Local OBS integration (complete)
- ✅ Builds: PyInstaller executables (complete)
- ✅ Phase 2: Multi-user Node.js server (complete, optional)
- ⏸️ Phase 3+: Advanced features (see NEXT_STEPS.md)
## Related Documentation
- README.md - User-facing documentation
- BUILD.md - Detailed build instructions
- INSTALL.md - Installation guide
- NEXT_STEPS.md - Future enhancements
- server/nodejs/README.md - Node.js server setup and deployment