# Installation Guide

## Prerequisites

- **Python 3.9 or higher**
- **uv** (Python package installer)
- **FFmpeg** (required by faster-whisper)
- **CUDA-capable GPU** (optional, for GPU acceleration)

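To confirm these prerequisites are in place, a small stdlib-only Python check can be run (this script is a convenience sketch, not part of the project):

```python
# Stdlib-only sketch: verify the prerequisites listed above.
import shutil
import sys

# Python 3.9+ is required
ok = sys.version_info >= (3, 9)
print(f"Python {sys.version.split()[0]}: {'OK' if ok else 'need 3.9+'}")

# ffmpeg and uv must be discoverable on PATH
for tool in ("ffmpeg", "uv"):
    print(f"{tool}: {shutil.which(tool) or 'NOT FOUND'}")
```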
### Installing uv

If you don't have `uv` installed:

```bash
# On macOS and Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# On Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

# Or with pip
pip install uv
```

### Installing FFmpeg

#### On Ubuntu/Debian:

```bash
sudo apt update
sudo apt install ffmpeg
```

#### On macOS (with Homebrew):

```bash
brew install ffmpeg
```

#### On Windows:

Download from [ffmpeg.org](https://ffmpeg.org/download.html) and add it to your PATH.

## Installation Steps

### 1. Navigate to Project Directory

```bash
cd /home/jknapp/code/local-transcription
```

### 2. Install Dependencies with uv

```bash
# uv will automatically create a virtual environment and install dependencies
uv sync
```

This single command will:

- Create a virtual environment (`.venv/`)
- Install all dependencies from `pyproject.toml`
- Lock dependencies for reproducibility

**Note for CUDA users:** If you have an NVIDIA GPU, install PyTorch with CUDA support:

```bash
# For CUDA 11.8
uv pip install torch --index-url https://download.pytorch.org/whl/cu118

# For CUDA 12.1
uv pip install torch --index-url https://download.pytorch.org/whl/cu121
```

### 3. Run the Application

```bash
# Option 1: Using uv run (automatically uses the venv)
uv run python main.py

# Option 2: Activate the venv manually
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
python main.py
```

On first run, the application will:

- Download the Whisper model (this may take a few minutes)
- Create a configuration file at `~/.local-transcription/config.yaml`
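
A quick stdlib check (illustrative only, not part of the project) can confirm the first run completed and created the config file:

```python
# Check that first-run setup created the config file; the path below is
# the config location documented above.
from pathlib import Path

config = Path.home() / ".local-transcription" / "config.yaml"
print(f"Config exists: {config.exists()}")
```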

## Quick Start Commands

```bash
# Install everything
uv sync

# Run the application
uv run python main.py

# Install with server dependencies (for Phase 2+)
uv sync --extra server

# Update dependencies
uv sync --upgrade
```

## Configuration

Settings can be changed through the GUI (Settings button) or by editing:

```
~/.local-transcription/config.yaml
```

## Troubleshooting

### Audio Device Issues

If no audio devices are detected:

```bash
uv run python -c "import sounddevice as sd; print(sd.query_devices())"
```

### GPU Not Detected

Check if CUDA is available:

```bash
uv run python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
```

### Model Download Fails

Models are downloaded to `~/.cache/huggingface/`. If the download fails:

- Check your internet connection
- Ensure sufficient disk space (~1-3 GB depending on model size)
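
Both conditions above can be checked at once with a stdlib-only sketch (the cache path comes from the docs; the script itself is not part of the project):

```python
# Stdlib-only sketch: check cache presence and free disk space before retrying.
import shutil
from pathlib import Path

cache = Path.home() / ".cache" / "huggingface"   # download location from above
free_gb = shutil.disk_usage(Path.home()).free / 1e9

print(f"Cache dir present: {cache.exists()}")
print(f"Free space: {free_gb:.1f} GB (models need roughly 1-3 GB)")
```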

### uv Command Not Found

Make sure uv is in your PATH:

```bash
# Add to ~/.bashrc or ~/.zshrc
export PATH="$HOME/.cargo/bin:$PATH"
```

## Performance Tips

For best real-time performance:

1. **Use GPU if available** - 5-10x faster than CPU
2. **Start with smaller models**:
   - `tiny`: Fastest, ~39M parameters, 1-2s latency
   - `base`: Good balance, ~74M parameters, 2-3s latency
   - `small`: Better accuracy, ~244M parameters, 3-5s latency
3. **Enable VAD** (Voice Activity Detection) to skip silent audio
4. **Adjust chunk duration**: smaller = lower latency, larger = better accuracy
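
The chunk-duration tradeoff in tip 4 can be illustrated with a small sketch; `chunk_samples` here is a hypothetical helper, not part of this project:

```python
# Hypothetical illustration of tip 4: splitting an audio stream into
# fixed-duration chunks. Smaller chunks reach the transcriber sooner
# (lower latency) but give it less context (lower accuracy).
def chunk_samples(samples, sample_rate, chunk_seconds):
    size = int(sample_rate * chunk_seconds)
    return [samples[i:i + size] for i in range(0, len(samples), size)]

one_second = [0.0] * 16000                       # 1 s of 16 kHz mono audio
short = chunk_samples(one_second, 16000, 0.25)   # 4 chunks, ~0.25 s latency each
long = chunk_samples(one_second, 16000, 0.5)     # 2 chunks, more context per chunk
print(len(short), len(long))  # 4 2
```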

## System Requirements

### Minimum:

- CPU: Dual-core 2GHz+
- RAM: 4GB
- Model: tiny or base

### Recommended:

- CPU: Quad-core 3GHz+ or GPU (NVIDIA GTX 1060+)
- RAM: 8GB
- Model: base or small

### For Best Performance:

- GPU: NVIDIA RTX 2060 or better
- RAM: 16GB
- Model: small or medium

## Development

### Install development dependencies:

```bash
uv sync --extra dev
```

### Run tests:

```bash
uv run pytest
```

### Format code:

```bash
uv run black .
uv run ruff check .
```

## Why uv?

`uv` is significantly faster than pip:

- **10-100x faster** dependency resolution
- **Automatic virtual environment** management
- **Reproducible builds** with a lockfile
- **Drop-in replacement** for pip commands

Learn more at [astral.sh/uv](https://astral.sh/uv)