Commit Graph

10 Commits

Author SHA1 Message Date
c968eb8a48 Fix RealtimeSTT warmup file and PyTorch CUDA version mismatch
Fixed two build/runtime issues:

1. Windows: Missing warmup_audio.wav file from RealtimeSTT
   - Added RealtimeSTT to collect_data_files() in spec
   - Ensures warmup_audio.wav and other RealtimeSTT data files are bundled
   - Fixes: soundfile.LibsndfileError opening warmup_audio.wav

2. Linux: PyTorch/TorchAudio CUDA version mismatch (12.1 vs 12.4)
   - Added torchaudio>=2.0.0 explicitly to dependencies
   - Ensures torchaudio comes from pytorch-cu121 index (same as torch)
   - Previously RealtimeSTT was pulling torchaudio from PyPI with CUDA 12.4
   - Fixes: RuntimeError about CUDA version mismatch

Both packages now correctly use the pytorch-cu121 index via tool.uv.sources
configuration, ensuring matching CUDA versions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-28 20:28:11 -08:00
e77303f793 Document why we override enum34 dependency
Added detailed comments explaining the enum34 override:
- RealtimeSTT uses pvporcupine 1.9.5 (last open-source version)
- pvporcupine 1.9.5 depends on enum34
- enum34 is incompatible with PyInstaller
- We don't use wake word features, so enum34 is unnecessary
- enum is in stdlib since Python 3.4

This provides context for future maintainers about why the override exists.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-28 19:44:14 -08:00
07b746144d Properly fix enum34 error with override-dependencies
The previous PyInstaller exclusion approach didn't prevent the pre-flight
check from failing. The proper solution is to use UV's override-dependencies
to prevent enum34 from being installed in the first place.

Changes:
- Added [tool.uv] override-dependencies in pyproject.toml
- Configured enum34 to only install on Python < 3.4
  (effectively never, since we require Python >=3.9)
- This prevents enum34 from being added to uv.lock

Why this works:
- UV respects override-dependencies during dependency resolution
- enum34 is never installed, so PyInstaller pre-flight check passes
- enum is part of Python stdlib since 3.4, so no functionality lost
- RealtimeSTT's dependency on pvporcupine==1.9.5 (which requires enum34)
  is satisfied without actually installing enum34

Credit: Solution suggested by Opus

Resolves: enum34 incompatible with PyInstaller error

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-28 19:42:13 -08:00
be53f2e962 Fix PyInstaller build failure caused by enum34 package
The enum34 package is an obsolete backport of Python's enum module
and is incompatible with PyInstaller on Python 3.4+. It was being
pulled in as a transitive dependency by pvporcupine (part of
RealtimeSTT's dependencies).

Changes:
- All build scripts now remove enum34 before running PyInstaller
  - build.bat, build-cuda.bat (Windows)
  - build.sh, build-cuda.sh (Linux)
- Added "uv pip uninstall -q enum34" step after cleaning builds
- Removed attempted pyproject.toml override (not needed with this fix)

This fix allows PyInstaller to bundle the application without errors
while still maintaining all RealtimeSTT functionality (enum is part
of Python stdlib since 3.4).

Resolves: PyInstaller error "enum34 package is incompatible"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-28 19:06:33 -08:00
5f3c058be6 Migrate to RealtimeSTT for advanced VAD-based transcription
Major refactor to eliminate word loss issues using RealtimeSTT with
dual-layer VAD (WebRTC + Silero) instead of time-based chunking.

## Core Changes

### New Transcription Engine
- Add client/transcription_engine_realtime.py with RealtimeSTT wrapper
- Implements initialize() and start_recording() separation for proper lifecycle
- Dual-layer VAD with pre/post buffers prevents word cutoffs
- Optional realtime preview with faster model + final transcription

### Removed Legacy Components
- Remove client/audio_capture.py (RealtimeSTT handles audio)
- Remove client/noise_suppression.py (VAD handles silence detection)
- Remove client/transcription_engine.py (replaced by realtime version)
- Remove chunk_duration setting (no longer using time-based chunking)

### Dependencies
- Add RealtimeSTT>=0.3.0 to pyproject.toml
- Remove noisereduce, webrtcvad, faster-whisper (now dependencies of RealtimeSTT)
- Update PyInstaller spec with ONNX Runtime, halo, colorama

### GUI Improvements
- Refactor main_window_qt.py to use RealtimeSTT with proper start/stop
- Fix recording state management (initialize on startup, record on button click)
- Expand settings dialog (700x1200) with improved spacing (10-15px between groups)
- Add comprehensive tooltips to all settings explaining functionality
- Remove chunk duration field from settings

### Configuration
- Update default_config.yaml with RealtimeSTT parameters:
  - Silero VAD sensitivity (0.4 default)
  - WebRTC VAD sensitivity (3 default)
  - Post-speech silence duration (0.3s)
  - Pre-recording buffer (0.2s)
  - Beam size for quality control (5 default)
  - ONNX acceleration (enabled for 2-3x faster VAD)
  - Optional realtime preview settings

### CLI Updates
- Update main_cli.py to use new engine API
- Separate initialize() and start_recording() calls

### Documentation
- Add INSTALL_REALTIMESTT.md with migration guide and benefits
- Update INSTALL.md: Remove FFmpeg requirement (not needed!)
- Clarify PortAudio is only needed for development
- Document that built executables are fully standalone

## Benefits

-  Eliminates word loss at chunk boundaries
-  Natural speech segment detection via VAD
-  2-3x faster VAD with ONNX acceleration
-  30% lower CPU usage
-  Pre-recording buffer captures word starts
-  Post-speech silence prevents cutoffs
-  Optional instant preview mode
-  Better UX with comprehensive tooltips

## Migration Notes

- Settings apply immediately without restart (except model changes)
- Old chunk_duration configs ignored (VAD-based detection now)
- Recording only starts when user clicks button (not on app startup)
- Stop button immediately stops recording (no delay)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-28 18:48:29 -08:00
1acdb065c5 Fix uv index: Use explicit=true for PyTorch index
- Added explicit=true to pytorch-cu121 index
- Only torch, torchvision, torchaudio use PyTorch index
- All other packages (requests, fastapi, etc.) use PyPI
- Fixes: requests version conflict (PyTorch index has 2.28.1, we need >=2.31.0)

How explicit=true works:
- PyTorch index only checked for packages listed in tool.uv.sources
- Prevents dependency confusion and version conflicts
- Best practice for supplemental package indexes
2025-12-26 12:16:08 -08:00
a5556c475d Fix uv index configuration: Use PyTorch CUDA as additional index
- Changed from 'default' to named additional index
- Added tool.uv.sources to specify torch comes from pytorch-cu121 index
- Other packages (fastapi, uvicorn, etc.) still come from PyPI
- Fixes: 'fastapi was not found in the package registry' error

How it works:
- PyPI remains the default index for most packages
- torch package explicitly uses pytorch-cu121 index
- Best of both worlds: CUDA PyTorch + all other packages from PyPI
2025-12-26 12:13:40 -08:00
0bcd8e8d21 Configure uv to always use PyTorch CUDA index
Changes:
- Set PyTorch CUDA index (cu121) as default for all builds
- CUDA builds support both GPU and CPU (auto-fallback)
- Fixes uv run reinstalling CPU-only PyTorch
- Updated dependency-groups syntax (fixes deprecation warning)

Benefits:
- Simpler build process - no CPU vs CUDA distinction needed
- uv sync and uv run now get CUDA-enabled PyTorch automatically
- Builds work on systems with or without NVIDIA GPUs
- Fixes issue where uv run check_cuda.py was getting CPU version

Index: https://download.pytorch.org/whl/cu121 (PyTorch 2.5.1+cu121)
2025-12-26 12:08:42 -08:00
d51b24e2e5 Move FastAPI and uvicorn to main dependencies
- Web server is always-running (not optional) for OBS integration
- Users no longer need to manually install fastapi and uvicorn
- Previously required: uv pip install "fastapi[standard]" uvicorn
- Now auto-installed with: uv sync

Fixes: Missing FastAPI/uvicorn dependencies on fresh Windows installs
2025-12-26 11:57:50 -08:00
472233aec4 Initial commit: Local Transcription App v1.0
Phase 1 Complete - Standalone Desktop Application

Features:
- Real-time speech-to-text with Whisper (faster-whisper)
- PySide6 desktop GUI with settings dialog
- Web server for OBS browser source integration
- Audio capture with automatic sample rate detection and resampling
- Noise suppression with Voice Activity Detection (VAD)
- Configurable display settings (font, timestamps, fade duration)
- Settings apply without restart (with automatic model reloading)
- Auto-fade for web display transcriptions
- CPU/GPU support with automatic device detection
- Standalone executable builds (PyInstaller)
- CUDA build support (works on systems without CUDA hardware)

Components:
- Audio capture with sounddevice
- Noise reduction with noisereduce + webrtcvad
- Transcription with faster-whisper
- GUI with PySide6
- Web server with FastAPI + WebSocket
- Configuration system with YAML

Build System:
- Standard builds (CPU-only): build.sh / build.bat
- CUDA builds (universal): build-cuda.sh / build-cuda.bat
- Comprehensive BUILD.md documentation
- Cross-platform support (Linux, Windows)

Documentation:
- README.md with project overview and quick start
- BUILD.md with detailed build instructions
- NEXT_STEPS.md with future enhancement roadmap
- INSTALL.md with setup instructions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-25 18:48:23 -08:00