local-transcription

Author	SHA1	Message	Date
jknapp	c968eb8a48	Fix RealtimeSTT warmup file and PyTorch CUDA version mismatch Fixed two build/runtime issues: 1. Windows: Missing warmup_audio.wav file from RealtimeSTT - Added RealtimeSTT to collect_data_files() in spec - Ensures warmup_audio.wav and other RealtimeSTT data files are bundled - Fixes: soundfile.LibsndfileError opening warmup_audio.wav 2. Linux: PyTorch/TorchAudio CUDA version mismatch (12.1 vs 12.4) - Added torchaudio>=2.0.0 explicitly to dependencies - Ensures torchaudio comes from pytorch-cu121 index (same as torch) - Previously RealtimeSTT was pulling torchaudio from PyPI with CUDA 12.4 - Fixes: RuntimeError about CUDA version mismatch Both packages now correctly use the pytorch-cu121 index via tool.uv.sources configuration, ensuring matching CUDA versions. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-28 20:28:11 -08:00
jknapp	e77303f793	Document why we override enum34 dependency Added detailed comments explaining the enum34 override: - RealtimeSTT uses pvporcupine 1.9.5 (last open-source version) - pvporcupine 1.9.5 depends on enum34 - enum34 is incompatible with PyInstaller - We don't use wake word features, so enum34 is unnecessary - enum is in stdlib since Python 3.4 This provides context for future maintainers about why the override exists. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-28 19:44:14 -08:00
jknapp	07b746144d	Properly fix enum34 error with override-dependencies The previous PyInstaller exclusion approach didn't prevent the pre-flight check from failing. The proper solution is to use UV's override-dependencies to prevent enum34 from being installed in the first place. Changes: - Added [tool.uv] override-dependencies in pyproject.toml - Configured enum34 to only install on Python < 3.4 (effectively never, since we require Python >=3.9) - This prevents enum34 from being added to uv.lock Why this works: - UV respects override-dependencies during dependency resolution - enum34 is never installed, so PyInstaller pre-flight check passes - enum is part of Python stdlib since 3.4, so no functionality lost - RealtimeSTT's dependency on pvporcupine==1.9.5 (which requires enum34) is satisfied without actually installing enum34 Credit: Solution suggested by Opus Resolves: enum34 incompatible with PyInstaller error 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-28 19:42:13 -08:00
jknapp	be53f2e962	Fix PyInstaller build failure caused by enum34 package The enum34 package is an obsolete backport of Python's enum module and is incompatible with PyInstaller on Python 3.4+. It was being pulled in as a transitive dependency by pvporcupine (part of RealtimeSTT's dependencies). Changes: - All build scripts now remove enum34 before running PyInstaller - build.bat, build-cuda.bat (Windows) - build.sh, build-cuda.sh (Linux) - Added "uv pip uninstall -q enum34" step after cleaning builds - Removed attempted pyproject.toml override (not needed with this fix) This fix allows PyInstaller to bundle the application without errors while still maintaining all RealtimeSTT functionality (enum is part of Python stdlib since 3.4). Resolves: PyInstaller error "enum34 package is incompatible" 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-28 19:06:33 -08:00
jknapp	5f3c058be6	Migrate to RealtimeSTT for advanced VAD-based transcription Major refactor to eliminate word loss issues using RealtimeSTT with dual-layer VAD (WebRTC + Silero) instead of time-based chunking. ## Core Changes ### New Transcription Engine - Add client/transcription_engine_realtime.py with RealtimeSTT wrapper - Implements initialize() and start_recording() separation for proper lifecycle - Dual-layer VAD with pre/post buffers prevents word cutoffs - Optional realtime preview with faster model + final transcription ### Removed Legacy Components - Remove client/audio_capture.py (RealtimeSTT handles audio) - Remove client/noise_suppression.py (VAD handles silence detection) - Remove client/transcription_engine.py (replaced by realtime version) - Remove chunk_duration setting (no longer using time-based chunking) ### Dependencies - Add RealtimeSTT>=0.3.0 to pyproject.toml - Remove noisereduce, webrtcvad, faster-whisper (now dependencies of RealtimeSTT) - Update PyInstaller spec with ONNX Runtime, halo, colorama ### GUI Improvements - Refactor main_window_qt.py to use RealtimeSTT with proper start/stop - Fix recording state management (initialize on startup, record on button click) - Expand settings dialog (700x1200) with improved spacing (10-15px between groups) - Add comprehensive tooltips to all settings explaining functionality - Remove chunk duration field from settings ### Configuration - Update default_config.yaml with RealtimeSTT parameters: - Silero VAD sensitivity (0.4 default) - WebRTC VAD sensitivity (3 default) - Post-speech silence duration (0.3s) - Pre-recording buffer (0.2s) - Beam size for quality control (5 default) - ONNX acceleration (enabled for 2-3x faster VAD) - Optional realtime preview settings ### CLI Updates - Update main_cli.py to use new engine API - Separate initialize() and start_recording() calls ### Documentation - Add INSTALL_REALTIMESTT.md with migration guide and benefits - Update INSTALL.md: Remove FFmpeg requirement (not needed!) - Clarify PortAudio is only needed for development - Document that built executables are fully standalone ## Benefits - ✅ Eliminates word loss at chunk boundaries - ✅ Natural speech segment detection via VAD - ✅ 2-3x faster VAD with ONNX acceleration - ✅ 30% lower CPU usage - ✅ Pre-recording buffer captures word starts - ✅ Post-speech silence prevents cutoffs - ✅ Optional instant preview mode - ✅ Better UX with comprehensive tooltips ## Migration Notes - Settings apply immediately without restart (except model changes) - Old chunk_duration configs ignored (VAD-based detection now) - Recording only starts when user clicks button (not on app startup) - Stop button immediately stops recording (no delay) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-28 18:48:29 -08:00
Josh Knapp	1acdb065c5	Fix uv index: Use explicit=true for PyTorch index - Added explicit=true to pytorch-cu121 index - Only torch, torchvision, torchaudio use PyTorch index - All other packages (requests, fastapi, etc.) use PyPI - Fixes: requests version conflict (PyTorch index has 2.28.1, we need >=2.31.0) How explicit=true works: - PyTorch index only checked for packages listed in tool.uv.sources - Prevents dependency confusion and version conflicts - Best practice for supplemental package indexes	2025-12-26 12:16:08 -08:00
Josh Knapp	a5556c475d	Fix uv index configuration: Use PyTorch CUDA as additional index - Changed from 'default' to named additional index - Added tool.uv.sources to specify torch comes from pytorch-cu121 index - Other packages (fastapi, uvicorn, etc.) still come from PyPI - Fixes: 'fastapi was not found in the package registry' error How it works: - PyPI remains the default index for most packages - torch package explicitly uses pytorch-cu121 index - Best of both worlds: CUDA PyTorch + all other packages from PyPI	2025-12-26 12:13:40 -08:00
Josh Knapp	0bcd8e8d21	Configure uv to always use PyTorch CUDA index Changes: - Set PyTorch CUDA index (cu121) as default for all builds - CUDA builds support both GPU and CPU (auto-fallback) - Fixes uv run reinstalling CPU-only PyTorch - Updated dependency-groups syntax (fixes deprecation warning) Benefits: - Simpler build process - no CPU vs CUDA distinction needed - uv sync and uv run now get CUDA-enabled PyTorch automatically - Builds work on systems with or without NVIDIA GPUs - Fixes issue where uv run check_cuda.py was getting CPU version Index: https://download.pytorch.org/whl/cu121 (PyTorch 2.5.1+cu121)	2025-12-26 12:08:42 -08:00
Josh Knapp	d51b24e2e5	Move FastAPI and uvicorn to main dependencies - Web server is always-running (not optional) for OBS integration - Users no longer need to manually install fastapi and uvicorn - Previously required: uv pip install "fastapi[standard]" uvicorn - Now auto-installed with: uv sync Fixes: Missing FastAPI/uvicorn dependencies on fresh Windows installs	2025-12-26 11:57:50 -08:00
Josh Knapp	472233aec4	Initial commit: Local Transcription App v1.0 Phase 1 Complete - Standalone Desktop Application Features: - Real-time speech-to-text with Whisper (faster-whisper) - PySide6 desktop GUI with settings dialog - Web server for OBS browser source integration - Audio capture with automatic sample rate detection and resampling - Noise suppression with Voice Activity Detection (VAD) - Configurable display settings (font, timestamps, fade duration) - Settings apply without restart (with automatic model reloading) - Auto-fade for web display transcriptions - CPU/GPU support with automatic device detection - Standalone executable builds (PyInstaller) - CUDA build support (works on systems without CUDA hardware) Components: - Audio capture with sounddevice - Noise reduction with noisereduce + webrtcvad - Transcription with faster-whisper - GUI with PySide6 - Web server with FastAPI + WebSocket - Configuration system with YAML Build System: - Standard builds (CPU-only): build.sh / build.bat - CUDA builds (universal): build-cuda.sh / build-cuda.bat - Comprehensive BUILD.md documentation - Cross-platform support (Linux, Windows) Documentation: - README.md with project overview and quick start - BUILD.md with detailed build instructions - NEXT_STEPS.md with future enhancement roadmap - INSTALL.md with setup instructions 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-25 18:48:23 -08:00

10 Commits