# Installation Guide
## Prerequisites
- Python 3.9 or higher
- uv (Python package installer)
- FFmpeg (required by faster-whisper)
- CUDA-capable GPU (optional, for GPU acceleration)
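As a quick sanity check before installing, the prerequisites above can be verified with a short script. This is a sketch using only the standard library; `check_prereqs` is a hypothetical helper, not part of the project:

```python
import shutil
import sys

def check_prereqs() -> list:
    """Return a list of missing prerequisites (empty means ready to install)."""
    missing = []
    if sys.version_info < (3, 9):
        missing.append("Python 3.9+")
    # shutil.which() searches PATH the same way the shell would
    for tool in ("uv", "ffmpeg"):
        if shutil.which(tool) is None:
            missing.append(tool)
    return missing

if __name__ == "__main__":
    problems = check_prereqs()
    print("All prerequisites found" if not problems
          else "Missing: " + ", ".join(problems))
```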
### Installing uv
If you don't have uv installed:
```bash
# On macOS and Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# On Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

# Or with pip
pip install uv
```
### Installing FFmpeg
On Ubuntu/Debian:
```bash
sudo apt update
sudo apt install ffmpeg
```
On macOS (with Homebrew):
```bash
brew install ffmpeg
```
On Windows:
Download a build from [ffmpeg.org](https://ffmpeg.org/) and add it to your PATH.
## Installation Steps
### 1. Navigate to the Project Directory
```bash
cd /home/jknapp/code/local-transcription
```
### 2. Install Dependencies with uv
```bash
# uv will automatically create a virtual environment and install dependencies
uv sync
```
This single command will:
- Create a virtual environment (`.venv/`)
- Install all dependencies from `pyproject.toml`
- Lock dependencies for reproducibility
**Note for CUDA users:** If you have an NVIDIA GPU, install PyTorch with CUDA support:
```bash
# For CUDA 11.8
uv pip install torch --index-url https://download.pytorch.org/whl/cu118

# For CUDA 12.1
uv pip install torch --index-url https://download.pytorch.org/whl/cu121
```
### 3. Run the Application
```bash
# Option 1: Using uv run (automatically uses the venv)
uv run python main.py

# Option 2: Activate the venv manually
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
python main.py
```
On first run, the application will:
- Download the Whisper model (this may take a few minutes)
- Create a configuration file at `~/.local-transcription/config.yaml`
## Quick Start Commands
```bash
# Install everything
uv sync

# Run the application
uv run python main.py

# Install with server dependencies (for Phase 2+)
uv sync --extra server

# Update dependencies
uv sync --upgrade
```
## Configuration
Settings can be changed through the GUI (Settings button) or by editing `~/.local-transcription/config.yaml`.
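For orientation, a config file of this kind typically looks something like the sketch below. The keys shown are illustrative assumptions, not the project's actual schema; check the file generated on first run for the real option names:

```yaml
# ~/.local-transcription/config.yaml -- illustrative sketch only
transcription:
  model: base          # tiny / base / small / medium
  device: auto         # auto / cpu / cuda
display:
  font_size: 24
  show_timestamps: false
  fade_duration: 5     # seconds before a caption fades on the web display
```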
## Troubleshooting
### Audio Device Issues
If no audio devices are detected:
```bash
uv run python -c "import sounddevice as sd; print(sd.query_devices())"
```
### GPU Not Detected
Check if CUDA is available:
```bash
uv run python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
```
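The same check can be wrapped in a helper that degrades gracefully when PyTorch is not installed at all (a sketch; `pick_device` is a hypothetical name, and the application's own detection logic may differ):

```python
def pick_device() -> str:
    """Return "cuda" when an NVIDIA GPU is usable, otherwise "cpu"."""
    try:
        import torch  # optional dependency; only present on CUDA installs
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"

print(f"Transcription will run on: {pick_device()}")
```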
### Model Download Fails
Models are downloaded to `~/.cache/huggingface/`. If the download fails:
- Check your internet connection
- Ensure sufficient disk space (~1-3 GB, depending on model size)
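Since a failed download can also be a silently full disk, the free space on the drive holding the cache can be checked with the standard library (a sketch; the 3 GB figure matches the upper bound above):

```python
import shutil
from pathlib import Path

def free_gb_for_cache(cache_dir: str = "~/.cache/huggingface") -> float:
    """Free space in GB on the drive that holds the model cache."""
    path = Path(cache_dir).expanduser()
    # Walk up to the nearest existing parent so this works even
    # before the cache directory has been created.
    while not path.exists():
        path = path.parent
    return shutil.disk_usage(path).free / 1e9

if free_gb_for_cache() < 3.0:
    print("Warning: less than 3 GB free; larger models may fail to download")
```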
### uv Command Not Found
Make sure uv is in your PATH:
```bash
# Add to ~/.bashrc or ~/.zshrc
export PATH="$HOME/.cargo/bin:$PATH"
```
## Performance Tips
For best real-time performance:
- **Use a GPU if available**: 5-10x faster than CPU
- **Start with smaller models**:
  - `tiny`: fastest, ~39M parameters, 1-2 s latency
  - `base`: good balance, ~74M parameters, 2-3 s latency
  - `small`: better accuracy, ~244M parameters, 3-5 s latency
- **Enable VAD** (Voice Activity Detection) to skip silent audio
- **Adjust chunk duration**: smaller chunks reduce latency; larger chunks improve accuracy
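The chunk-duration trade-off can be made concrete with a back-of-envelope latency model (the real-time factors below are illustrative assumptions, not measured numbers for this application):

```python
def estimated_latency(chunk_seconds: float, realtime_factor: float) -> float:
    """Rough end-to-end latency: wait for the chunk to fill, then for
    transcription (chunk length times the model's real-time factor)."""
    return chunk_seconds + chunk_seconds * realtime_factor

# Illustrative: a GPU transcribing at 0.2x real time vs. a CPU at 0.8x
for chunk in (2.0, 5.0):
    print(f"{chunk:.0f}s chunks: GPU ~{estimated_latency(chunk, 0.2):.1f}s, "
          f"CPU ~{estimated_latency(chunk, 0.8):.1f}s")
```

Under this model, shorter chunks always cut latency, but each chunk gives the model less context, which is the accuracy cost the tip above refers to.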
## System Requirements

**Minimum:**
- CPU: dual-core 2 GHz+
- RAM: 4 GB
- Model: `tiny` or `base`

**Recommended:**
- CPU: quad-core 3 GHz+ or GPU (NVIDIA GTX 1060+)
- RAM: 8 GB
- Model: `base` or `small`

**For Best Performance:**
- GPU: NVIDIA RTX 2060 or better
- RAM: 16 GB
- Model: `small` or `medium`
## Development
Install development dependencies:

```bash
uv sync --extra dev
```

Run tests:

```bash
uv run pytest
```

Format and lint code:

```bash
uv run black .
uv run ruff check .
```
## Why uv?
uv is significantly faster than pip:
- 10-100x faster dependency resolution
- Automatic virtual environment management
- Reproducible builds with lockfile
- Drop-in replacement for pip commands
Learn more at [astral.sh/uv](https://astral.sh/uv).