Commit Graph

7 Commits

Author SHA1 Message Date
be53f2e962 Fix PyInstaller build failure caused by enum34 package
The enum34 package is an obsolete backport of Python's enum module
and is incompatible with PyInstaller on Python 3.4+. It was being
pulled in as a transitive dependency by pvporcupine (part of
RealtimeSTT's dependencies).

Changes:
- All build scripts now remove enum34 before running PyInstaller
  - build.bat, build-cuda.bat (Windows)
  - build.sh, build-cuda.sh (Linux)
- Added "uv pip uninstall -q enum34" step after cleaning builds
- Removed attempted pyproject.toml override (not needed with this fix)

This fix allows PyInstaller to bundle the application without errors
while still maintaining all RealtimeSTT functionality (enum is part
of Python stdlib since 3.4).

Resolves: PyInstaller error "enum34 package is incompatible"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-28 19:06:33 -08:00
5f3c058be6 Migrate to RealtimeSTT for advanced VAD-based transcription
Major refactor to eliminate word loss issues using RealtimeSTT with
dual-layer VAD (WebRTC + Silero) instead of time-based chunking.

## Core Changes

### New Transcription Engine
- Add client/transcription_engine_realtime.py with RealtimeSTT wrapper
- Implements initialize() and start_recording() separation for proper lifecycle
- Dual-layer VAD with pre/post buffers prevents word cutoffs
- Optional realtime preview with faster model + final transcription

### Removed Legacy Components
- Remove client/audio_capture.py (RealtimeSTT handles audio)
- Remove client/noise_suppression.py (VAD handles silence detection)
- Remove client/transcription_engine.py (replaced by realtime version)
- Remove chunk_duration setting (no longer using time-based chunking)

### Dependencies
- Add RealtimeSTT>=0.3.0 to pyproject.toml
- Remove noisereduce, webrtcvad, faster-whisper (now dependencies of RealtimeSTT)
- Update PyInstaller spec with ONNX Runtime, halo, colorama

### GUI Improvements
- Refactor main_window_qt.py to use RealtimeSTT with proper start/stop
- Fix recording state management (initialize on startup, record on button click)
- Expand settings dialog (700x1200) with improved spacing (10-15px between groups)
- Add comprehensive tooltips to all settings explaining functionality
- Remove chunk duration field from settings

### Configuration
- Update default_config.yaml with RealtimeSTT parameters:
  - Silero VAD sensitivity (0.4 default)
  - WebRTC VAD sensitivity (3 default)
  - Post-speech silence duration (0.3s)
  - Pre-recording buffer (0.2s)
  - Beam size for quality control (5 default)
  - ONNX acceleration (enabled for 2-3x faster VAD)
  - Optional realtime preview settings

### CLI Updates
- Update main_cli.py to use new engine API
- Separate initialize() and start_recording() calls

### Documentation
- Add INSTALL_REALTIMESTT.md with migration guide and benefits
- Update INSTALL.md: Remove FFmpeg requirement (not needed!)
- Clarify PortAudio is only needed for development
- Document that built executables are fully standalone

## Benefits

-  Eliminates word loss at chunk boundaries
-  Natural speech segment detection via VAD
-  2-3x faster VAD with ONNX acceleration
-  30% lower CPU usage
-  Pre-recording buffer captures word starts
-  Post-speech silence prevents cutoffs
-  Optional instant preview mode
-  Better UX with comprehensive tooltips

## Migration Notes

- Settings apply immediately without restart (except model changes)
- Old chunk_duration configs ignored (VAD-based detection now)
- Recording only starts when user clicks button (not on app startup)
- Stop button immediately stops recording (no delay)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-28 18:48:29 -08:00
1acdb065c5 Fix uv index: Use explicit=true for PyTorch index
- Added explicit=true to pytorch-cu121 index
- Only torch, torchvision, torchaudio use PyTorch index
- All other packages (requests, fastapi, etc.) use PyPI
- Fixes: requests version conflict (PyTorch index has 2.28.1, we need >=2.31.0)

How explicit=true works:
- PyTorch index only checked for packages listed in tool.uv.sources
- Prevents dependency confusion and version conflicts
- Best practice for supplemental package indexes
2025-12-26 12:16:08 -08:00
a5556c475d Fix uv index configuration: Use PyTorch CUDA as additional index
- Changed from 'default' to named additional index
- Added tool.uv.sources to specify torch comes from pytorch-cu121 index
- Other packages (fastapi, uvicorn, etc.) still come from PyPI
- Fixes: 'fastapi was not found in the package registry' error

How it works:
- PyPI remains the default index for most packages
- torch package explicitly uses pytorch-cu121 index
- Best of both worlds: CUDA PyTorch + all other packages from PyPI
2025-12-26 12:13:40 -08:00
0bcd8e8d21 Configure uv to always use PyTorch CUDA index
Changes:
- Set PyTorch CUDA index (cu121) as default for all builds
- CUDA builds support both GPU and CPU (auto-fallback)
- Fixes uv run reinstalling CPU-only PyTorch
- Updated dependency-groups syntax (fixes deprecation warning)

Benefits:
- Simpler build process - no CPU vs CUDA distinction needed
- uv sync and uv run now get CUDA-enabled PyTorch automatically
- Builds work on systems with or without NVIDIA GPUs
- Fixes issue where uv run check_cuda.py was getting CPU version

Index: https://download.pytorch.org/whl/cu121 (PyTorch 2.5.1+cu121)
2025-12-26 12:08:42 -08:00
d51b24e2e5 Move FastAPI and uvicorn to main dependencies
- Web server is always-running (not optional) for OBS integration
- Users no longer need to manually install fastapi and uvicorn
- Previously required: uv pip install "fastapi[standard]" uvicorn
- Now auto-installed with: uv sync

Fixes: Missing FastAPI/uvicorn dependencies on fresh Windows installs
2025-12-26 11:57:50 -08:00
472233aec4 Initial commit: Local Transcription App v1.0
Phase 1 Complete - Standalone Desktop Application

Features:
- Real-time speech-to-text with Whisper (faster-whisper)
- PySide6 desktop GUI with settings dialog
- Web server for OBS browser source integration
- Audio capture with automatic sample rate detection and resampling
- Noise suppression with Voice Activity Detection (VAD)
- Configurable display settings (font, timestamps, fade duration)
- Settings apply without restart (with automatic model reloading)
- Auto-fade for web display transcriptions
- CPU/GPU support with automatic device detection
- Standalone executable builds (PyInstaller)
- CUDA build support (works on systems without CUDA hardware)

Components:
- Audio capture with sounddevice
- Noise reduction with noisereduce + webrtcvad
- Transcription with faster-whisper
- GUI with PySide6
- Web server with FastAPI + WebSocket
- Configuration system with YAML

Build System:
- Standard builds (CPU-only): build.sh / build.bat
- CUDA builds (universal): build-cuda.sh / build-cuda.bat
- Comprehensive BUILD.md documentation
- Cross-platform support (Linux, Windows)

Documentation:
- README.md with project overview and quick start
- BUILD.md with detailed build instructions
- NEXT_STEPS.md with future enhancement roadmap
- INSTALL.md with setup instructions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-25 18:48:23 -08:00