jknapp d34d272cf0 Simplify build process: CUDA support now included by default
Since pyproject.toml is configured to use PyTorch CUDA index by default,
all builds automatically include CUDA support. Removed redundant separate
CUDA build scripts and updated documentation.

Changes:
- Removed build-cuda.sh and build-cuda.bat (no longer needed)
- Updated build.sh and build.bat to include CUDA by default
  - Added "uv sync" step to ensure CUDA PyTorch is installed
  - Updated messages to clarify CUDA support is included
- Updated BUILD.md to reflect simplified build process
  - Removed separate CUDA build sections
  - Clarified all builds include CUDA support
  - Updated GPU support section
- Updated CLAUDE.md with simplified build commands

Benefits:
- Simpler build process (one script per platform instead of two)
- Less confusion about which script to use
- All builds work on any system (GPU or CPU)
- Automatic fallback to CPU if no GPU available
- pyproject.toml is single source of truth for dependencies

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-28 19:09:36 -08:00


# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
Local Transcription is a desktop application for real-time speech-to-text transcription designed for streamers. It uses Whisper models (via faster-whisper) to transcribe audio locally with optional multi-user server synchronization.
**Key Features:**
- Standalone desktop GUI (PySide6/Qt)
- Local transcription with CPU/GPU support
- Built-in web server for OBS browser source integration
- Optional Node.js-based multi-user server for syncing transcriptions across users
- Noise suppression and Voice Activity Detection (VAD)
- Cross-platform builds (Linux/Windows) with PyInstaller
## Project Structure
```
local-transcription/
├── client/                          # Core transcription logic
│   ├── audio_capture.py             # Audio input and buffering
│   ├── transcription_engine.py      # Whisper model integration
│   ├── noise_suppression.py         # VAD and noise reduction
│   ├── device_utils.py              # CPU/GPU device management
│   ├── config.py                    # Configuration management
│   └── server_sync.py               # Multi-user server sync client
├── gui/                             # Desktop application UI
│   ├── main_window_qt.py            # Main application window (PySide6)
│   ├── settings_dialog_qt.py        # Settings dialog (PySide6)
│   └── transcription_display_qt.py  # Display widget
├── server/                          # Web display servers
│   ├── web_display.py               # FastAPI server for OBS browser source (local)
│   └── nodejs/                      # Optional multi-user Node.js server
│       ├── server.js                # Multi-user sync server with WebSocket
│       ├── package.json             # Node.js dependencies
│       └── README.md                # Server deployment documentation
├── config/                          # Example configuration files
│   └── default_config.yaml          # Default settings template
├── main.py                          # GUI application entry point
├── main_cli.py                      # CLI version for testing
└── pyproject.toml                   # Dependencies and build config
```
## Development Commands
### Installation and Setup
```bash
# Install dependencies (creates .venv automatically)
uv sync
# Run the GUI application
uv run python main.py
# Run CLI version (headless, for testing)
uv run python main_cli.py
# List available audio devices
uv run python main_cli.py --list-devices
# Install with CUDA support (if needed)
uv pip install torch --index-url https://download.pytorch.org/whl/cu121
```
### Building Executables
```bash
# Linux (includes CUDA support - works on both GPU and CPU systems)
./build.sh
# Windows (includes CUDA support - works on both GPU and CPU systems)
build.bat
# Manual build with PyInstaller
uv sync # Install dependencies (includes CUDA PyTorch)
uv pip uninstall -q enum34 # Remove incompatible enum34 package
uv run pyinstaller local-transcription.spec
```
**Important:** All builds include CUDA support via `pyproject.toml` configuration. CUDA builds can be created on systems without NVIDIA GPUs. The PyTorch CUDA runtime is bundled, and the app automatically falls back to CPU if no GPU is available.
### Testing
```bash
# Run component tests
uv run python test_components.py
# Check CUDA availability
uv run python check_cuda.py
# Test web server manually
uv run python -m uvicorn server.web_display:app --reload
```
## Architecture
### Audio Processing Pipeline
1. **Audio Capture** ([client/audio_capture.py](client/audio_capture.py))
   - Captures audio from microphone/system using sounddevice
   - Handles automatic sample rate detection and resampling
   - Uses chunking with overlap for better transcription quality
   - Default: 3-second chunks with 0.5s overlap
2. **Noise Suppression** ([client/noise_suppression.py](client/noise_suppression.py))
   - Applies noisereduce for background noise reduction
   - Voice Activity Detection (VAD) using webrtcvad
   - Skips silent segments to improve performance
3. **Transcription** ([client/transcription_engine.py](client/transcription_engine.py))
   - Uses faster-whisper for efficient inference
   - Supports CPU, CUDA, and Apple MPS (Mac)
   - Models: tiny, base, small, medium, large
   - Thread-safe model loading with locks
4. **Display** ([gui/main_window_qt.py](gui/main_window_qt.py))
   - PySide6/Qt-based desktop GUI
   - Real-time transcription display with scrolling
   - Settings panel with live updates (no restart needed)
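The thread-safe model loading noted in step 3 can be sketched as a lock-guarded lazy initializer. This is a minimal illustration, not the actual `transcription_engine.py` API: `LazyModel` is a hypothetical name, and `loader` stands in for constructing a `faster_whisper.WhisperModel`.

```python
import threading

class LazyModel:
    """Load an expensive model exactly once, even under concurrent callers."""

    def __init__(self, loader):
        self._loader = loader      # e.g. lambda: WhisperModel("base")
        self._model = None
        self._lock = threading.Lock()

    def get(self):
        with self._lock:           # serialize first-time loading
            if self._model is None:
                self._model = self._loader()
        return self._model
```

Without the lock, two audio callbacks arriving at once could each trigger a multi-gigabyte model load.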
### Web Server Architecture
**Local Web Server** ([server/web_display.py](server/web_display.py))
- Always runs when GUI starts (port 8080 by default)
- FastAPI with WebSocket for real-time updates
- Used for OBS browser source integration
- Single-user (displays only local transcriptions)
**Multi-User Server** (Optional - for syncing across multiple users)
**Node.js WebSocket Server** ([server/nodejs/](server/nodejs/)) - **RECOMMENDED**
- Real-time WebSocket support (< 100ms latency)
- Handles 100+ concurrent users
- Easy deployment to VPS/cloud hosting (Railway, Heroku, DigitalOcean, or any VPS)
- Configurable display options via URL parameters:
  - `timestamps=true/false` - Show/hide timestamps
  - `maxlines=50` - Maximum visible lines (prevents scroll bars in OBS)
  - `fontsize=16` - Font size in pixels
  - `fontfamily=Arial` - Font family
  - `fade=10` - Seconds before text fades (0 = never)
See [server/nodejs/README.md](server/nodejs/README.md) for deployment instructions.
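The parameter defaulting above is handled by the Node.js server; the logic can be sketched in Python for consistency with the rest of this document. `display_options` is a hypothetical helper, and `DEFAULTS` mirrors the documented default values.

```python
from urllib.parse import urlparse, parse_qs

# Documented defaults for the display URL parameters
DEFAULTS = {"timestamps": "true", "maxlines": "50", "fontsize": "16",
            "fontfamily": "Arial", "fade": "10"}

def display_options(url):
    """Merge query-string parameters over the defaults."""
    qs = parse_qs(urlparse(url).query)
    return {k: qs.get(k, [v])[0] for k, v in DEFAULTS.items()}
```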
### Configuration System
- Config stored at `~/.local-transcription/config.yaml`
- Managed by [client/config.py](client/config.py)
- Settings apply immediately without restart (except model changes)
- YAML format with nested keys (e.g., `transcription.model`)
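A nested dotted key like `transcription.model` resolves against the parsed YAML as a dict of dicts. A minimal sketch (`get_setting` is illustrative, not the actual `config.py` API):

```python
def get_setting(config, dotted_key, default=None):
    """Resolve a nested key like 'transcription.model' in a parsed YAML dict."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node
```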
### Device Management
- [client/device_utils.py](client/device_utils.py) handles CPU/GPU detection
- Auto-detects CUDA, MPS (Mac), or falls back to CPU
- Compute types: float32 (best quality), float16 (GPU), int8 (fastest)
- Thread-safe device selection
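The detection priority can be sketched as pure selection logic. In the real `device_utils.py` the flags come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`; the compute-type pairings here are illustrative defaults, not the app's exact choices.

```python
def pick_device(cuda_available, mps_available):
    """Mirror the documented priority: CUDA, then Apple MPS, then CPU."""
    if cuda_available:
        return "cuda", "float16"   # half precision on GPU
    if mps_available:
        return "mps", "float32"
    return "cpu", "int8"           # fastest on CPU
```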
## Key Implementation Details
### PyInstaller Build Configuration
- [local-transcription.spec](local-transcription.spec) controls build
- UPX compression enabled for smaller executables
- Hidden imports required for PySide6, faster-whisper, torch
- Console mode enabled by default (set `console=False` to hide)
### Threading Model
- Main thread: Qt GUI event loop
- Audio thread: Captures and processes audio chunks
- Web server thread: Runs FastAPI server
- Transcription: Runs in callback thread from audio capture
- All transcription results communicated via Qt signals
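The hand-off from the transcription callback thread to the GUI thread can be sketched with a queue standing in for a Qt signal (in the real app, a `Signal` emitted from the callback thread is delivered on the GUI event loop; `TranscriptionBridge` is a hypothetical name):

```python
import queue
import threading

class TranscriptionBridge:
    """Pass results from the audio/transcription thread to the GUI thread.

    Qt widgets are not thread-safe, so workers never touch them directly:
    the worker calls emit(), and the GUI thread drains the queue from its
    event loop (what a Signal/Slot connection does for you in Qt).
    """

    def __init__(self):
        self._q = queue.Queue()

    def emit(self, text):          # safe to call from any thread
        self._q.put(text)

    def drain(self):               # called on the GUI thread
        out = []
        while not self._q.empty():
            out.append(self._q.get())
        return out
```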
### Server Sync (Optional Multi-User Feature)
- [client/server_sync.py](client/server_sync.py) handles server communication
- Toggle in Settings: "Enable Server Sync"
- Sends transcriptions to the sync server (Node.js) via HTTP POST
- Separate web display shows merged transcriptions from all users
- Falls back gracefully if server unavailable
## Common Patterns
### Adding a New Setting
1. Add to [config/default_config.yaml](config/default_config.yaml)
2. Update [client/config.py](client/config.py) if validation needed
3. Add UI control in [gui/settings_dialog_qt.py](gui/settings_dialog_qt.py)
4. Apply setting in relevant component (no restart if possible)
5. Emit signal to update display if needed
### Modifying Transcription Display
- Local GUI: [gui/transcription_display_qt.py](gui/transcription_display_qt.py)
- Web display (OBS): [server/web_display.py](server/web_display.py) (HTML in `_get_html()`)
- Multi-user display: served by the Node.js server at `/display` (see [server/nodejs/](server/nodejs/))
### Adding a New Model Size
- Update [client/transcription_engine.py](client/transcription_engine.py)
- Add to model selector in [gui/settings_dialog_qt.py](gui/settings_dialog_qt.py)
- Update CLI argument choices in [main_cli.py](main_cli.py)
## Dependencies
**Core:**
- `faster-whisper`: Optimized Whisper inference
- `torch`: ML framework (CUDA-enabled via special index)
- `PySide6`: Qt6 bindings for GUI
- `sounddevice`: Cross-platform audio I/O
- `noisereduce`, `webrtcvad`: Audio preprocessing
**Web Server:**
- `fastapi`, `uvicorn`: Web server and ASGI
- `websockets`: Real-time communication
**Build:**
- `pyinstaller`: Create standalone executables
- `uv`: Fast package manager
**PyTorch CUDA Index:**
- Configured in [pyproject.toml](pyproject.toml) under `[[tool.uv.index]]`
- Uses PyTorch's custom wheel repository for CUDA builds
- Automatically installed with `uv sync` (run by both build scripts)
## Platform-Specific Notes
### Linux
- Uses PulseAudio/ALSA for audio
- Build scripts use bash (`.sh` files)
- Executable: `dist/LocalTranscription/LocalTranscription`
### Windows
- Uses Windows Audio/WASAPI
- Build scripts use batch (`.bat` files)
- Executable: `dist\LocalTranscription\LocalTranscription.exe`
- Requires Visual C++ Redistributable on target systems
### Cross-Building
- **Cannot cross-compile** - must build on target platform
- CI/CD should use platform-specific runners
## Troubleshooting
### Model Loading Issues
- Models download to `~/.cache/huggingface/`
- First run requires internet connection
- Check disk space (models: 75MB-3GB depending on size)
### Audio Device Issues
- Run `uv run python main_cli.py --list-devices`
- Check permissions (microphone access)
- Try different device indices in settings
### GPU Not Detected
- Run `uv run python check_cuda.py`
- Install CUDA drivers (not CUDA toolkit - bundled in build)
- Verify PyTorch sees GPU: `python -c "import torch; print(torch.cuda.is_available())"`
### Web Server Port Conflicts
- Default port: 8080
- Change in [gui/main_window_qt.py](gui/main_window_qt.py) or config
- Use `lsof -i :8080` (Linux) or `netstat -ano | findstr :8080` (Windows)
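The same check can be done programmatically before starting the server. This is a sketch, not part of the app; `port_free` is a hypothetical helper.

```python
import socket

def port_free(port, host="127.0.0.1"):
    """Return True if nothing is accepting connections on the port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) != 0  # 0 means something answered
```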
## OBS Integration
### Local Display (Single User)
1. Start Local Transcription app
2. In OBS: Add "Browser" source
3. URL: `http://localhost:8080`
4. Set dimensions (e.g., 1920x300)
### Multi-User Display (Node.js Server)
1. Deploy Node.js server (see [server/nodejs/README.md](server/nodejs/README.md))
2. Each user configures Server URL: `http://your-server:3000/api/send`
3. Enter same room name and passphrase
4. In OBS: Add "Browser" source
5. URL: `http://your-server:3000/display?room=ROOM&fade=10&timestamps=true&maxlines=50&fontsize=16`
6. Customize URL parameters as needed:
   - `timestamps=false` - Hide timestamps
   - `maxlines=30` - Show max 30 lines (prevents scroll bars)
   - `fontsize=18` - Larger font
   - `fontfamily=Courier` - Different font
## Performance Optimization
**For Real-Time Transcription:**
- Use `tiny` or `base` model (faster)
- Enable GPU if available (5-10x faster)
- Increase chunk_duration for better accuracy (higher latency)
- Decrease chunk_duration for lower latency (less context)
- Enable VAD to skip silent audio
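The chunk-duration trade-off above follows from how overlapping chunks are cut: longer chunks give the model more context per call but delay results. A minimal chunker (parameter names are illustrative; defaults mirror the documented 3-second chunks with 0.5 s overlap):

```python
def chunk_with_overlap(samples, sample_rate=16000, chunk_s=3.0, overlap_s=0.5):
    """Yield chunks that share overlap_s seconds of audio, so words on a
    chunk boundary appear in two consecutive chunks."""
    chunk = int(chunk_s * sample_rate)
    step = int((chunk_s - overlap_s) * sample_rate)  # advance 2.5 s per chunk
    for start in range(0, max(len(samples) - chunk + 1, 1), step):
        yield samples[start:start + chunk]
```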
**For Build Size Reduction:**
- Don't bundle models (download on demand)
- Use CPU-only build if no GPU users
- Enable UPX compression (already in spec)
## Phase Status
- ✅ **Phase 1**: Standalone desktop application (complete)
- ✅ **Web Server**: Local OBS integration (complete)
- ✅ **Builds**: PyInstaller executables (complete)
- ✅ **Phase 2**: Multi-user Node.js server (complete, optional)
- ⏸️ **Phase 3+**: Advanced features (see [NEXT_STEPS.md](NEXT_STEPS.md))
## Related Documentation
- [README.md](README.md) - User-facing documentation
- [BUILD.md](BUILD.md) - Detailed build instructions
- [INSTALL.md](INSTALL.md) - Installation guide
- [NEXT_STEPS.md](NEXT_STEPS.md) - Future enhancements
- [server/nodejs/README.md](server/nodejs/README.md) - Node.js server setup and deployment