Initial commit: Local Transcription App v1.0

Phase 1 Complete - Standalone Desktop Application

Features:
- Real-time speech-to-text with Whisper (faster-whisper)
- PySide6 desktop GUI with settings dialog
- Web server for OBS browser source integration
- Audio capture with automatic sample rate detection and resampling
- Noise suppression with Voice Activity Detection (VAD)
- Configurable display settings (font, timestamps, fade duration)
- Settings apply without restart (with automatic model reloading)
- Auto-fade for web display transcriptions
- CPU/GPU support with automatic device detection
- Standalone executable builds (PyInstaller)
- CUDA build support (works on systems without CUDA hardware)

Components:
- Audio capture with sounddevice
- Noise reduction with noisereduce + webrtcvad
- Transcription with faster-whisper
- GUI with PySide6
- Web server with FastAPI + WebSocket
- Configuration system with YAML

Build System:
- Standard builds (CPU-only): build.sh / build.bat
- CUDA builds (universal): build-cuda.sh / build-cuda.bat
- Comprehensive BUILD.md documentation
- Cross-platform support (Linux, Windows)

Documentation:
- README.md with project overview and quick start
- BUILD.md with detailed build instructions
- NEXT_STEPS.md with future enhancement roadmap
- INSTALL.md with setup instructions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
commit 472233aec4 (2025-12-25 18:48:23 -08:00)
31 changed files with 5116 additions and 0 deletions

INSTALL.md (194 lines, new file)

# Installation Guide
## Prerequisites
- **Python 3.9 or higher**
- **uv** (Python package installer)
- **FFmpeg** (required by faster-whisper)
- **CUDA-capable GPU** (optional, for GPU acceleration)
### Installing uv
If you don't have `uv` installed:
```bash
# On macOS and Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# On Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
# Or with pip
pip install uv
```
### Installing FFmpeg
#### On Ubuntu/Debian:
```bash
sudo apt update
sudo apt install ffmpeg
```
#### On macOS (with Homebrew):
```bash
brew install ffmpeg
```
#### On Windows:
Download from [ffmpeg.org](https://ffmpeg.org/download.html) and add to PATH.
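To confirm FFmpeg is discoverable after installing it, a quick stdlib-only check (this just looks for the binary on PATH; it does not verify the version):

```python
# Check whether the ffmpeg binary is on PATH (faster-whisper needs it).
import shutil

ffmpeg_path = shutil.which("ffmpeg")
print(ffmpeg_path or "ffmpeg not found on PATH")
```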
## Installation Steps
### 1. Navigate to Project Directory
```bash
cd /home/jknapp/code/local-transcription
```
### 2. Install Dependencies with uv
```bash
# uv will automatically create a virtual environment and install dependencies
uv sync
```
This single command will:
- Create a virtual environment (`.venv/`)
- Install all dependencies from `pyproject.toml`
- Lock dependencies for reproducibility
**Note for CUDA users:** If you have an NVIDIA GPU, install PyTorch with CUDA support:
```bash
# For CUDA 11.8
uv pip install torch --index-url https://download.pytorch.org/whl/cu118
# For CUDA 12.1
uv pip install torch --index-url https://download.pytorch.org/whl/cu121
```
### 3. Run the Application
```bash
# Option 1: Using uv run (automatically uses the venv)
uv run python main.py
# Option 2: Activate venv manually
source .venv/bin/activate # On Windows: .venv\Scripts\activate
python main.py
```
On first run, the application will:
- Download the Whisper model (this may take a few minutes)
- Create a configuration file at `~/.local-transcription/config.yaml`
## Quick Start Commands
```bash
# Install everything
uv sync
# Run the application
uv run python main.py
# Install with server dependencies (for Phase 2+)
uv sync --extra server
# Update dependencies
uv sync --upgrade
```
## Configuration
Settings can be changed through the GUI (Settings button) or by editing:
```
~/.local-transcription/config.yaml
```
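The file is plain YAML; a sketch of what it might contain (the key names below are illustrative, not the app's actual schema — the real file is written by the app on first run):

```yaml
# Illustrative example only; see the generated config.yaml for real keys.
transcription:
  model: base          # tiny / base / small / medium
  device: auto         # auto / cpu / cuda
audio:
  vad_enabled: true
display:
  font_size: 24
  fade_duration: 5.0
```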
## Troubleshooting
### Audio Device Issues
If no audio devices are detected:
```bash
uv run python -c "import sounddevice as sd; print(sd.query_devices())"
```
### GPU Not Detected
Check if CUDA is available:
```bash
uv run python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
```
### Model Download Fails
Models are downloaded to `~/.cache/huggingface/`. If download fails:
- Check internet connection
- Ensure sufficient disk space (~1-3 GB depending on model size)
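A stdlib-only way to check free space and the cache location before the first download (the 3 GiB threshold is illustrative, matching the upper end of the model sizes above):

```python
# Report free disk space on the home partition, where the HF cache lives.
import shutil
from pathlib import Path

cache_dir = Path.home() / ".cache" / "huggingface"
free_gib = shutil.disk_usage(Path.home()).free / 2**30
print(f"Cache dir: {cache_dir}")
print(f"Free space: {free_gib:.1f} GiB")
if free_gib < 3:
    print("Warning: less than 3 GiB free; larger models may not fit")
```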
### uv Command Not Found
Make sure uv is in your PATH:
```bash
# Add to ~/.bashrc or ~/.zshrc
# (recent uv installs to ~/.local/bin; older versions used ~/.cargo/bin)
export PATH="$HOME/.local/bin:$HOME/.cargo/bin:$PATH"
```
## Performance Tips
For best real-time performance:
1. **Use GPU if available** - 5-10x faster than CPU
2. **Start with smaller models**:
- `tiny`: Fastest, ~39M parameters, 1-2s latency
- `base`: Good balance, ~74M parameters, 2-3s latency
- `small`: Better accuracy, ~244M parameters, 3-5s latency
3. **Enable VAD** (Voice Activity Detection) to skip silent audio
4. **Adjust chunk duration**: Smaller = lower latency, larger = better accuracy
## System Requirements
### Minimum:
- CPU: Dual-core 2GHz+
- RAM: 4GB
- Model: tiny or base
### Recommended:
- CPU: Quad-core 3GHz+ or GPU (NVIDIA GTX 1060+)
- RAM: 8GB
- Model: base or small
### For Best Performance:
- GPU: NVIDIA RTX 2060 or better
- RAM: 16GB
- Model: small or medium
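These tiers can be expressed as a small helper for picking a starting model. This is a sketch only; the thresholds mirror the table above and are not part of the app's code:

```python
def pick_model(ram_gb: float, has_gpu: bool) -> str:
    """Suggest a Whisper model size from available RAM and GPU presence.

    Thresholds mirror the requirement tiers above (illustrative only).
    """
    if has_gpu and ram_gb >= 16:
        return "small"   # best-performance tier; "medium" on strong GPUs
    if ram_gb >= 8:
        return "base"    # recommended tier
    return "tiny"        # minimum tier

print(pick_model(4, False))   # minimum spec -> tiny
print(pick_model(16, True))   # best-performance spec -> small
```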
## Development
### Install development dependencies:
```bash
uv sync --extra dev
```
### Run tests:
```bash
uv run pytest
```
### Format code:
```bash
uv run black .
uv run ruff check .
```
## Why uv?
`uv` is significantly faster than pip:
- **10-100x faster** dependency resolution
- **Automatic virtual environment** management
- **Reproducible builds** with lockfile
- **Drop-in replacement** for pip commands
Learn more at [astral.sh/uv](https://astral.sh/uv)