Installation Guide

Prerequisites

  • Python 3.9 or higher
  • uv (Python package installer)
  • FFmpeg (required by faster-whisper)
  • CUDA-capable GPU (optional, for GPU acceleration)

Installing uv

If you don't have uv installed:

# On macOS and Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# On Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

# Or with pip
pip install uv
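
Verify the installation:

uv --version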

Installing FFmpeg

On Ubuntu/Debian:

sudo apt update
sudo apt install ffmpeg

On macOS (with Homebrew):

brew install ffmpeg

On Windows:

Download from ffmpeg.org and add to PATH.
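
Verify that FFmpeg is on your PATH:

ffmpeg -version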

Installation Steps

1. Navigate to Project Directory

# Adjust to wherever you cloned or unpacked the project
cd /path/to/local-transcription

2. Install Dependencies with uv

# uv will automatically create a virtual environment and install dependencies
uv sync

This single command will:

  • Create a virtual environment (.venv/)
  • Install all dependencies from pyproject.toml
  • Lock dependencies for reproducibility
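
To reproduce the exact same environment later (for example on another machine or in CI), install strictly from the lockfile:

# Install exactly what uv.lock records, without re-resolving dependencies
uv sync --frozen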

Note for CUDA users: If you have an NVIDIA GPU, install PyTorch with CUDA support:

# For CUDA 11.8
uv pip install torch --index-url https://download.pytorch.org/whl/cu118

# For CUDA 12.1
uv pip install torch --index-url https://download.pytorch.org/whl/cu121
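
Afterwards you can confirm which build was picked up:

# Prints the CUDA version the wheel was built against (None on CPU-only builds)
uv run python -c "import torch; print(torch.__version__, torch.version.cuda)"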

3. Run the Application

# Option 1: Using uv run (automatically uses the venv)
uv run python main.py

# Option 2: Activate venv manually
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
python main.py

On first run, the application will:

  • Download the Whisper model (this may take a few minutes)
  • Create a configuration file at ~/.local-transcription/config.yaml

Quick Start Commands

# Install everything
uv sync

# Run the application
uv run python main.py

# Install with server dependencies (for Phase 2+)
uv sync --extra server

# Update dependencies
uv sync --upgrade

Configuration

Settings can be changed through the GUI (Settings button) or by editing:

~/.local-transcription/config.yaml
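
The file is plain YAML. As a rough illustration only - the actual key names and defaults depend on the app version, so check the generated file - it looks something like:

# Illustrative sketch; key names here are examples, not guaranteed
transcription:
  model: base          # tiny / base / small / medium
  device: auto         # auto / cpu / cuda
display:
  font_size: 24
  show_timestamps: true
  fade_duration: 5     # seconds before web captions fade out
server:
  port: 8080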

Troubleshooting

Audio Device Issues

If no audio devices are detected:

uv run python -c "import sounddevice as sd; print(sd.query_devices())"

GPU Not Detected

Check if CUDA is available:

uv run python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"

Model Download Fails

Models are downloaded to ~/.cache/huggingface/. If download fails:

  • Check internet connection
  • Ensure sufficient disk space (~1-3 GB depending on model size)
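
You can also pre-fetch a model explicitly from a machine with connectivity ('base' is just an example size):

# Downloads into ~/.cache/huggingface/ if not already present
uv run python -c "from faster_whisper import WhisperModel; WhisperModel('base', device='cpu')"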

uv Command Not Found

Make sure uv is in your PATH:

# Add to ~/.bashrc or ~/.zshrc (newer uv installers use ~/.local/bin;
# older versions used ~/.cargo/bin)
export PATH="$HOME/.local/bin:$HOME/.cargo/bin:$PATH"

Performance Tips

For best real-time performance:

  1. Use a GPU if available - 5-10x faster than CPU
  2. Start with smaller models:
    • tiny: Fastest, ~39M parameters, 1-2s latency
    • base: Good balance, ~74M parameters, 2-3s latency
    • small: Better accuracy, ~244M parameters, 3-5s latency
  3. Enable VAD (Voice Activity Detection) to skip silent audio
  4. Adjust chunk duration: Smaller = lower latency, larger = better accuracy
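
These knobs map directly onto faster-whisper's API. A minimal sketch (the file name and option values are examples; the app's settings dialog exposes its own equivalents):

from faster_whisper import WhisperModel

# Smaller model + int8 quantization for speed; vad_filter skips silence
model = WhisperModel("base", device="auto", compute_type="int8")
segments, info = model.transcribe("sample.wav", vad_filter=True)
for seg in segments:
    print(f"[{seg.start:5.1f}s -> {seg.end:5.1f}s] {seg.text}")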

System Requirements

Minimum:

  • CPU: Dual-core 2GHz+
  • RAM: 4GB
  • Model: tiny or base

Recommended:

  • CPU: Quad-core 3GHz+ or GPU (NVIDIA GTX 1060+)
  • RAM: 8GB
  • Model: base or small

For Best Performance:

  • GPU: NVIDIA RTX 2060 or better
  • RAM: 16GB
  • Model: small or medium

Development

Install development dependencies:

uv sync --extra dev

Run tests:

uv run pytest

Format code:

uv run black .
uv run ruff check .

Why uv?

uv is significantly faster than pip:

  • 10-100x faster dependency resolution
  • Automatic virtual environment management
  • Reproducible builds with lockfile
  • Drop-in replacement for pip commands
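
For example, familiar pip invocations work unchanged under the uv pip prefix:

uv pip install requests
uv pip list
uv pip freeze > requirements.txt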

Learn more at astral.sh/uv