Temporarily enable console output to diagnose the "failed to start
recording" error in the PyInstaller build. This shows all print() output
and error messages that are currently hidden.
Change console=False to console=True in the spec file.
Once the issue is identified and fixed, set back to console=False for
a production build without the console window.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
The icon wasn't working in frozen executables because:
1. LocalTranscription.png wasn't being bundled in the PyInstaller build
2. The code was using Path(__file__).parent which doesn't work in frozen exes
Changes:
- Added LocalTranscription.png to datas in local-transcription.spec
- Updated main.py to use sys._MEIPASS for frozen executables
- Updated gui/main_window_qt.py to use sys._MEIPASS for frozen executables
- Both files now detect if running frozen and adjust icon path accordingly
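The frozen-aware lookup described above might look roughly like the following. This is a sketch, not the code from main.py: the helper name `resource_path` and its `dev_base` parameter are hypothetical, while `sys.frozen` and `sys._MEIPASS` are the attributes PyInstaller sets at runtime.

```python
import sys
from pathlib import Path
from typing import Optional


def resource_path(relative: str, dev_base: Optional[Path] = None) -> Path:
    """Return the path to a bundled resource for frozen and source runs.

    `dev_base` is a hypothetical override for the source-tree location.
    """
    if getattr(sys, "frozen", False) and hasattr(sys, "_MEIPASS"):
        # PyInstaller unpacks bundled data files under sys._MEIPASS.
        base = Path(sys._MEIPASS)
    elif dev_base is not None:
        base = Path(dev_base)
    else:
        # Running from source: resolve relative to this file.
        base = Path(__file__).resolve().parent
    return base / relative
```

The result can then be passed to the icon loader, for example as `str(resource_path("LocalTranscription.png"))`.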
The icon will now appear correctly in:
- Window titlebar
- Taskbar (Windows) / Dock (macOS)
- Alt-Tab switcher
Fixed two build/runtime issues:
1. Windows: Missing warmup_audio.wav file from RealtimeSTT
   - Added RealtimeSTT to collect_data_files() in spec
   - Ensures warmup_audio.wav and other RealtimeSTT data files are bundled
   - Fixes: soundfile.LibsndfileError opening warmup_audio.wav
2. Linux: PyTorch/TorchAudio CUDA version mismatch (12.1 vs 12.4)
   - Added torchaudio>=2.0.0 explicitly to dependencies
   - Ensures torchaudio comes from pytorch-cu121 index (same as torch)
   - Previously RealtimeSTT was pulling torchaudio from PyPI with CUDA 12.4
   - Fixes: RuntimeError about CUDA version mismatch
Both packages now correctly use the pytorch-cu121 index via tool.uv.sources
configuration, ensuring matching CUDA versions.
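A quick way to confirm that both packages resolved to the same build is to compare the local version label of the installed wheels. This is a sketch under the assumption that wheels from the PyTorch index carry a label such as `+cu121`; the helper names are hypothetical and the version strings in the usage note are illustrative only.

```python
from importlib import metadata
from typing import Optional


def local_build_tag(version: str) -> Optional[str]:
    """Return the local version label after '+', e.g. 'cu121', or None."""
    _, sep, local = version.partition("+")
    return local if sep else None


def same_build(torch_version: str, torchaudio_version: str) -> bool:
    """True when both version strings carry the same local label."""
    return local_build_tag(torch_version) == local_build_tag(torchaudio_version)


def installed_versions_match() -> bool:
    """Compare the installed torch and torchaudio wheels.

    Raises importlib.metadata.PackageNotFoundError if either is missing.
    """
    return same_build(metadata.version("torch"), metadata.version("torchaudio"))
```

For example, `same_build("2.1.0+cu121", "2.1.0+cu124")` is false, which is the kind of mismatch this commit addresses.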
PyInstaller wasn't bundling pvporcupine's resource files (keyword_files
and lib directories), causing a FileNotFoundError at runtime when
pvporcupine tried to access its resources directory.
Changes:
- Added code to detect and include pvporcupine resources and lib folders
- Falls back gracefully if pvporcupine is not installed
- Resources are bundled even though we don't use wake word features
(pvporcupine initializes and checks for these on import)
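The detection with graceful fallback could be sketched as below. The helper name `package_dir_datas` is hypothetical and this is not the code from the spec file; it only assumes that PyInstaller `datas` entries are `(source, destination)` tuples and uses `importlib.util.find_spec` to locate the package without importing it.

```python
import importlib.util
from pathlib import Path
from typing import List, Sequence, Tuple


def package_dir_datas(package: str, subdirs: Sequence[str]) -> List[Tuple[str, str]]:
    """Build (source, dest) datas entries for subdirectories of a package.

    Returns an empty list when the package is not installed, so the spec
    can still run without it.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return []
    root = Path(next(iter(spec.submodule_search_locations)))
    datas = []
    for sub in subdirs:
        src = root / sub
        if src.is_dir():
            # Keep the same relative layout inside the bundle.
            datas.append((str(src), str(Path(package) / sub)))
    return datas
```

In a spec file the result would be appended to the existing list, for example `datas += package_dir_datas("pvporcupine", ["resources", "lib"])`.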
This fixes the runtime error:
FileNotFoundError: [WinError 3] The system cannot find the path
specified: '...\pvporcupine\resources/keyword_files\windows'
The previous approach of uninstalling enum34 before PyInstaller didn't
work because 'uv run' re-syncs dependencies. The proper solution is to
exclude enum34 directly in the PyInstaller spec file.
Changes:
- Added hooks/hook-enum34.py: Custom PyInstaller hook to exclude enum34
- Updated local-transcription.spec:
  - Added 'hooks' to hookspath
  - Added 'enum34' to excludes list
- Updated build.sh and build.bat:
  - Removed enum34 uninstall step (no longer needed)
  - Added comment explaining enum34 is excluded in spec
Why this works:
- PyInstaller's excludes list prevents enum34 from being bundled
- The custom hook provides documentation and explicit exclusion
- enum34 can remain installed in venv (won't break anything)
- Works regardless of 'uv run' re-syncing dependencies
enum34 is an obsolete backport of the standard-library enum module for
Python versions before 3.4. It is incompatible with PyInstaller and
unnecessary on Python 3.4+, where enum is in the stdlib.
Added platform-specific icon support for both the running application
and compiled executables:
New files:
- create_icons.py: Script to convert PNG to platform-specific formats
  - Generates .ico for Windows (16, 32, 48, 256px sizes)
  - Generates .iconset for macOS (ready for iconutil conversion)
- LocalTranscription.png: Source icon image
- LocalTranscription.ico: Windows icon file (multi-size)
- LocalTranscription.iconset/: macOS icon set (needs iconutil on macOS)
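For reference, iconutil expects the PNGs inside an .iconset folder to follow a fixed naming scheme. The sketch below lists those names and pixel sizes. The function name `iconset_entries` is hypothetical, and whether create_icons.py generates exactly this set is an assumption.

```python
from typing import List, Sequence, Tuple


def iconset_entries(
    base_sizes: Sequence[int] = (16, 32, 128, 256, 512),
) -> List[Tuple[str, int]]:
    """Return (filename, pixel_size) pairs for a macOS .iconset folder."""
    entries = []
    for size in base_sizes:
        # Standard-resolution image.
        entries.append((f"icon_{size}x{size}.png", size))
        # Retina (@2x) image: same logical size, twice the pixels.
        entries.append((f"icon_{size}x{size}@2x.png", size * 2))
    return entries
```

Each entry would be produced by resizing LocalTranscription.png to the given pixel size and saving it under that filename.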
GUI changes:
- main.py: Set application-wide icon for taskbar/dock
- main_window_qt.py: Set window icon for GUI window
Build configuration:
- local-transcription.spec: Use platform-specific icons in PyInstaller
  - Windows builds use LocalTranscription.ico
  - macOS builds use LocalTranscription.icns (when generated)
To generate the .icns file (run on macOS):
    iconutil -c icns LocalTranscription.iconset
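The per-platform icon choice in the spec could be sketched like this. The function name `icon_for_platform` is hypothetical and the actual spec logic may differ. It returns None on other platforms so no icon is set.

```python
import sys
from typing import Optional


def icon_for_platform(platform: str = sys.platform) -> Optional[str]:
    """Pick the icon file for the platform the build runs on."""
    if platform.startswith("win"):
        return "LocalTranscription.ico"
    if platform == "darwin":
        # Assumes the .icns file has already been generated with iconutil.
        return "LocalTranscription.icns"
    return None
```

The returned value would be passed as the `icon` argument in the spec's executable definition.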
Major refactor to eliminate word loss issues using RealtimeSTT with
dual-layer VAD (WebRTC + Silero) instead of time-based chunking.
## Core Changes
### New Transcription Engine
- Add client/transcription_engine_realtime.py with RealtimeSTT wrapper
- Implements initialize() and start_recording() separation for proper lifecycle
- Dual-layer VAD with pre/post buffers prevents word cutoffs
- Optional realtime preview with faster model + final transcription
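The initialize()/start_recording() separation can be illustrated with a minimal state sketch. The class below is hypothetical and contains no RealtimeSTT calls. It only shows the intended ordering: expensive setup once, recording toggled on demand.

```python
class EngineLifecycle:
    """Hypothetical stand-in for the engine's lifecycle states."""

    def __init__(self) -> None:
        self.initialized = False
        self.recording = False

    def initialize(self) -> None:
        # Placeholder for expensive one-time setup (model loading etc.).
        self.initialized = True

    def start_recording(self) -> bool:
        """Start recording; returns False if already recording."""
        if not self.initialized:
            raise RuntimeError("initialize() must be called before start_recording()")
        if self.recording:
            return False
        self.recording = True
        return True

    def stop_recording(self) -> bool:
        """Stop recording; returns True if a recording was active."""
        was_recording = self.recording
        self.recording = False
        return was_recording
```

With this split, the application can call initialize() at startup and defer start_recording() until the user asks for it.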
### Removed Legacy Components
- Remove client/audio_capture.py (RealtimeSTT handles audio)
- Remove client/noise_suppression.py (VAD handles silence detection)
- Remove client/transcription_engine.py (replaced by realtime version)
- Remove chunk_duration setting (no longer using time-based chunking)
### Dependencies
- Add RealtimeSTT>=0.3.0 to pyproject.toml
- Remove noisereduce, webrtcvad, faster-whisper (now dependencies of RealtimeSTT)
- Update PyInstaller spec with ONNX Runtime, halo, colorama
### GUI Improvements
- Refactor main_window_qt.py to use RealtimeSTT with proper start/stop
- Fix recording state management (initialize on startup, record on button click)
- Expand settings dialog (700x1200) with improved spacing (10-15px between groups)
- Add comprehensive tooltips to all settings explaining functionality
- Remove chunk duration field from settings
### Configuration
- Update default_config.yaml with RealtimeSTT parameters:
  - Silero VAD sensitivity (0.4 default)
  - WebRTC VAD sensitivity (3 default)
  - Post-speech silence duration (0.3s)
  - Pre-recording buffer (0.2s)
  - Beam size for quality control (5 default)
  - ONNX acceleration (enabled for 2-3x faster VAD)
  - Optional realtime preview settings
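As a rough illustration, the defaults listed above could be merged with user overrides before being handed to the recorder. The key names below are assumed to match the keyword arguments of RealtimeSTT's AudioToTextRecorder and should be checked against the installed version. The actual YAML key names are not shown here, `recorder_kwargs` is a hypothetical helper, and the preview default of False is an assumption based on it being optional.

```python
from typing import Any, Dict

# Defaults taken from the list above; key names are assumptions.
DEFAULTS: Dict[str, Any] = {
    "silero_sensitivity": 0.4,
    "webrtc_sensitivity": 3,
    "post_speech_silence_duration": 0.3,
    "pre_recording_buffer_duration": 0.2,
    "beam_size": 5,
    "silero_use_onnx": True,
    "enable_realtime_transcription": False,  # assumed off by default
}


def recorder_kwargs(overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Merge user overrides onto the defaults, rejecting unknown keys."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown settings: {sorted(unknown)}")
    return {**DEFAULTS, **overrides}
```

Rejecting unknown keys surfaces stale settings such as an old chunk_duration entry instead of passing them through silently.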
### CLI Updates
- Update main_cli.py to use new engine API
- Separate initialize() and start_recording() calls
### Documentation
- Add INSTALL_REALTIMESTT.md with migration guide and benefits
- Update INSTALL.md: Remove FFmpeg requirement (not needed!)
- Clarify PortAudio is only needed for development
- Document that built executables are fully standalone
## Benefits
- ✅ Eliminates word loss at chunk boundaries
- ✅ Natural speech segment detection via VAD
- ✅ 2-3x faster VAD with ONNX acceleration
- ✅ 30% lower CPU usage
- ✅ Pre-recording buffer captures word starts
- ✅ Post-speech silence prevents cutoffs
- ✅ Optional instant preview mode
- ✅ Better UX with comprehensive tooltips
## Migration Notes
- Settings apply immediately without restart (except model changes)
- chunk_duration entries in existing configs are ignored (detection is now VAD-based)
- Recording only starts when user clicks button (not on app startup)
- Stop button immediately stops recording (no delay)
- Hide console window on compiled desktop app (console=False in spec)
- Add 20-second auto-fade to "Connected" status in OBS display
- Keep "Disconnected" status visible until reconnection
- Add PM2 deployment configuration and documentation
Research findings:
- collect_all() has design flaws and poor performance with pydantic
- Pydantic ships compiled extension modules, whose imports PyInstaller's module discovery cannot see
- collect_submodules() is the recommended approach per PyInstaller docs
Changes:
- Replaced collect_all() with collect_submodules() for better reliability
- Now collects 105 pydantic submodules (vs unreliable collect_all)
- Added collect_data_files() for packages requiring data files
- Added explicit pydantic dependencies: colorsys, decimal, json, etc.
- Applies to both Windows AND Linux (no longer platform-specific)
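The collection step can be sketched as a small helper. `collect_submodules` is the real function from PyInstaller.utils.hooks. The wrapper name `gather_hidden_imports` and its injectable `collect` parameter are hypothetical, added so the logic can be exercised without PyInstaller installed.

```python
from typing import Callable, Iterable, List, Optional


def gather_hidden_imports(
    packages: Iterable[str],
    collect: Optional[Callable[[str], List[str]]] = None,
) -> List[str]:
    """Collect submodule names for each package, skipping failures."""
    if collect is None:
        # Imported lazily so the helper also works outside a build.
        from PyInstaller.utils.hooks import collect_submodules as collect
    hidden: List[str] = []
    for package in packages:
        try:
            found = collect(package)
        except Exception as exc:
            print(f"Could not collect {package}: {exc}")
            continue
        print(f"Collected {len(found)} submodules from {package}")
        hidden.extend(found)
    # De-duplicate while keeping a stable order for reproducible builds.
    return sorted(set(hidden))
```

In the spec, the result would be added to hiddenimports for packages such as fastapi, starlette, pydantic, and pydantic_core.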
Results:
✓ Collected 52 submodules from fastapi
✓ Collected 34 submodules from starlette
✓ Collected 105 submodules from pydantic
✓ Collected 3 submodules from pydantic_core
✓ Plus uvicorn, websockets, h11, anyio
Fixes: ModuleNotFoundError: No module named 'fastapi' on Windows
Based on: https://github.com/pyinstaller/pyinstaller/issues/5359
- On Windows, PyInstaller wasn't properly bundling FastAPI dependencies
- Added platform-specific collection using PyInstaller.utils.hooks.collect_all
- Only applies aggressive collection on Windows to keep Linux builds stable
- Collects all submodules and data files for: fastapi, starlette, pydantic,
pydantic_core, anyio, uvicorn, websockets, h11
- Linux builds remain unchanged and continue to work as before
Fixes: ModuleNotFoundError: No module named 'fastapi' on Windows executable
Fixed PyInstaller build error on Windows:
"ModuleNotFoundError: No module named 'fastapi'"
Added to hiddenimports:
- FastAPI and its core modules
- Starlette (FastAPI framework base)
- Pydantic (data validation)
- anyio, sniffio (async libraries)
- h11, websockets (protocol implementations)
- requests and dependencies (for server sync client)
This ensures all web server dependencies are bundled in the executable.
Fixed PyInstaller build error where the Voice Activity Detection (VAD)
model was missing from the compiled executable.
Changes:
- Added faster_whisper/assets folder to PyInstaller datas
- Includes silero_vad_v6.onnx (1.2MB) in the build
- Resolves ONNXRuntimeError on transcription start
Error fixed:
[ONNXRuntimeError] : 3 : NO_SUCHFILE : Load model from
.../faster_whisper/assets/silero_vad_v6.onnx failed: File doesn't exist
Phase 1 Complete - Standalone Desktop Application
Features:
- Real-time speech-to-text with Whisper (faster-whisper)
- PySide6 desktop GUI with settings dialog
- Web server for OBS browser source integration
- Audio capture with automatic sample rate detection and resampling
- Noise suppression with Voice Activity Detection (VAD)
- Configurable display settings (font, timestamps, fade duration)
- Settings apply without restart (with automatic model reloading)
- Auto-fade for web display transcriptions
- CPU/GPU support with automatic device detection
- Standalone executable builds (PyInstaller)
- CUDA build support (works on systems without CUDA hardware)
Components:
- Audio capture with sounddevice
- Noise reduction with noisereduce + webrtcvad
- Transcription with faster-whisper
- GUI with PySide6
- Web server with FastAPI + WebSocket
- Configuration system with YAML
Build System:
- Standard builds (CPU-only): build.sh / build.bat
- CUDA builds (universal): build-cuda.sh / build-cuda.bat
- Comprehensive BUILD.md documentation
- Cross-platform support (Linux, Windows)
Documentation:
- README.md with project overview and quick start
- BUILD.md with detailed build instructions
- NEXT_STEPS.md with future enhancement roadmap
- INSTALL.md with setup instructions