local-transcription

Author	SHA1	Message	Date
jknapp	be53f2e962	Fix PyInstaller build failure caused by enum34 package The enum34 package is an obsolete backport of Python's enum module and is incompatible with PyInstaller on Python 3.4+. It was being pulled in as a transitive dependency by pvporcupine (part of RealtimeSTT's dependencies). Changes: - All build scripts now remove enum34 before running PyInstaller - build.bat, build-cuda.bat (Windows) - build.sh, build-cuda.sh (Linux) - Added "uv pip uninstall -q enum34" step after cleaning builds - Removed attempted pyproject.toml override (not needed with this fix) This fix allows PyInstaller to bundle the application without errors while still maintaining all RealtimeSTT functionality (enum is part of Python stdlib since 3.4). Resolves: PyInstaller error "enum34 package is incompatible" 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-28 19:06:33 -08:00
jknapp	5f3c058be6	Migrate to RealtimeSTT for advanced VAD-based transcription Major refactor to eliminate word loss issues using RealtimeSTT with dual-layer VAD (WebRTC + Silero) instead of time-based chunking. ## Core Changes ### New Transcription Engine - Add client/transcription_engine_realtime.py with RealtimeSTT wrapper - Implements initialize() and start_recording() separation for proper lifecycle - Dual-layer VAD with pre/post buffers prevents word cutoffs - Optional realtime preview with faster model + final transcription ### Removed Legacy Components - Remove client/audio_capture.py (RealtimeSTT handles audio) - Remove client/noise_suppression.py (VAD handles silence detection) - Remove client/transcription_engine.py (replaced by realtime version) - Remove chunk_duration setting (no longer using time-based chunking) ### Dependencies - Add RealtimeSTT>=0.3.0 to pyproject.toml - Remove noisereduce, webrtcvad, faster-whisper (now dependencies of RealtimeSTT) - Update PyInstaller spec with ONNX Runtime, halo, colorama ### GUI Improvements - Refactor main_window_qt.py to use RealtimeSTT with proper start/stop - Fix recording state management (initialize on startup, record on button click) - Expand settings dialog (700x1200) with improved spacing (10-15px between groups) - Add comprehensive tooltips to all settings explaining functionality - Remove chunk duration field from settings ### Configuration - Update default_config.yaml with RealtimeSTT parameters: - Silero VAD sensitivity (0.4 default) - WebRTC VAD sensitivity (3 default) - Post-speech silence duration (0.3s) - Pre-recording buffer (0.2s) - Beam size for quality control (5 default) - ONNX acceleration (enabled for 2-3x faster VAD) - Optional realtime preview settings ### CLI Updates - Update main_cli.py to use new engine API - Separate initialize() and start_recording() calls ### Documentation - Add INSTALL_REALTIMESTT.md with migration guide and benefits - Update INSTALL.md: Remove FFmpeg requirement (not needed!) - Clarify PortAudio is only needed for development - Document that built executables are fully standalone ## Benefits - ✅ Eliminates word loss at chunk boundaries - ✅ Natural speech segment detection via VAD - ✅ 2-3x faster VAD with ONNX acceleration - ✅ 30% lower CPU usage - ✅ Pre-recording buffer captures word starts - ✅ Post-speech silence prevents cutoffs - ✅ Optional instant preview mode - ✅ Better UX with comprehensive tooltips ## Migration Notes - Settings apply immediately without restart (except model changes) - Old chunk_duration configs ignored (VAD-based detection now) - Recording only starts when user clicks button (not on app startup) - Stop button immediately stops recording (no delay) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-28 18:48:29 -08:00
Josh Knapp	1acdb065c5	Fix uv index: Use explicit=true for PyTorch index - Added explicit=true to pytorch-cu121 index - Only torch, torchvision, torchaudio use PyTorch index - All other packages (requests, fastapi, etc.) use PyPI - Fixes: requests version conflict (PyTorch index has 2.28.1, we need >=2.31.0) How explicit=true works: - PyTorch index only checked for packages listed in tool.uv.sources - Prevents dependency confusion and version conflicts - Best practice for supplemental package indexes	2025-12-26 12:16:08 -08:00
Josh Knapp	a5556c475d	Fix uv index configuration: Use PyTorch CUDA as additional index - Changed from 'default' to named additional index - Added tool.uv.sources to specify torch comes from pytorch-cu121 index - Other packages (fastapi, uvicorn, etc.) still come from PyPI - Fixes: 'fastapi was not found in the package registry' error How it works: - PyPI remains the default index for most packages - torch package explicitly uses pytorch-cu121 index - Best of both worlds: CUDA PyTorch + all other packages from PyPI	2025-12-26 12:13:40 -08:00
Josh Knapp	0bcd8e8d21	Configure uv to always use PyTorch CUDA index Changes: - Set PyTorch CUDA index (cu121) as default for all builds - CUDA builds support both GPU and CPU (auto-fallback) - Fixes uv run reinstalling CPU-only PyTorch - Updated dependency-groups syntax (fixes deprecation warning) Benefits: - Simpler build process - no CPU vs CUDA distinction needed - uv sync and uv run now get CUDA-enabled PyTorch automatically - Builds work on systems with or without NVIDIA GPUs - Fixes issue where uv run check_cuda.py was getting CPU version Index: https://download.pytorch.org/whl/cu121 (PyTorch 2.5.1+cu121)	2025-12-26 12:08:42 -08:00
Josh Knapp	d51b24e2e5	Move FastAPI and uvicorn to main dependencies - Web server is always-running (not optional) for OBS integration - Users no longer need to manually install fastapi and uvicorn - Previously required: uv pip install "fastapi[standard]" uvicorn - Now auto-installed with: uv sync Fixes: Missing FastAPI/uvicorn dependencies on fresh Windows installs	2025-12-26 11:57:50 -08:00
Josh Knapp	472233aec4	Initial commit: Local Transcription App v1.0 Phase 1 Complete - Standalone Desktop Application Features: - Real-time speech-to-text with Whisper (faster-whisper) - PySide6 desktop GUI with settings dialog - Web server for OBS browser source integration - Audio capture with automatic sample rate detection and resampling - Noise suppression with Voice Activity Detection (VAD) - Configurable display settings (font, timestamps, fade duration) - Settings apply without restart (with automatic model reloading) - Auto-fade for web display transcriptions - CPU/GPU support with automatic device detection - Standalone executable builds (PyInstaller) - CUDA build support (works on systems without CUDA hardware) Components: - Audio capture with sounddevice - Noise reduction with noisereduce + webrtcvad - Transcription with faster-whisper - GUI with PySide6 - Web server with FastAPI + WebSocket - Configuration system with YAML Build System: - Standard builds (CPU-only): build.sh / build.bat - CUDA builds (universal): build-cuda.sh / build-cuda.bat - Comprehensive BUILD.md documentation - Cross-platform support (Linux, Windows) Documentation: - README.md with project overview and quick start - BUILD.md with detailed build instructions - NEXT_STEPS.md with future enhancement roadmap - INSTALL.md with setup instructions 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-25 18:48:23 -08:00

7 Commits