local-transcription

Author	SHA1	Message	Date
jknapp	ee6dfe00d8	Enable console window for debugging PyInstaller build issues Temporarily enable console output to diagnose "failed to start recording" error in the PyInstaller build. This will show all print() statements and error messages that are currently being hidden. Change console=False to console=True in the spec file. Once the issue is identified and fixed, set back to console=False for a production build without the console window. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-28 20:40:20 -08:00
jknapp	95e9e8ebad	Fix application icon not showing in PyInstaller builds The icon wasn't working in frozen executables because: 1. LocalTranscription.png wasn't being bundled in the PyInstaller build 2. The code was using Path(__file__).parent which doesn't work in frozen exes Changes: - Added LocalTranscription.png to datas in local-transcription.spec - Updated main.py to use sys._MEIPASS for frozen executables - Updated gui/main_window_qt.py to use sys._MEIPASS for frozen executables - Both files now detect if running frozen and adjust icon path accordingly The icon will now appear correctly in: - Window titlebar - Taskbar (Windows) / Dock (macOS) - Alt-Tab switcher 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-28 20:35:53 -08:00
jknapp	c968eb8a48	Fix RealtimeSTT warmup file and PyTorch CUDA version mismatch Fixed two build/runtime issues: 1. Windows: Missing warmup_audio.wav file from RealtimeSTT - Added RealtimeSTT to collect_data_files() in spec - Ensures warmup_audio.wav and other RealtimeSTT data files are bundled - Fixes: soundfile.LibsndfileError opening warmup_audio.wav 2. Linux: PyTorch/TorchAudio CUDA version mismatch (12.1 vs 12.4) - Added torchaudio>=2.0.0 explicitly to dependencies - Ensures torchaudio comes from pytorch-cu121 index (same as torch) - Previously RealtimeSTT was pulling torchaudio from PyPI with CUDA 12.4 - Fixes: RuntimeError about CUDA version mismatch Both packages now correctly use the pytorch-cu121 index via tool.uv.sources configuration, ensuring matching CUDA versions. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-28 20:28:11 -08:00
jknapp	4d6dd6d35d	Include pvporcupine resource files in PyInstaller build PyInstaller wasn't bundling pvporcupine's resource files (keyword_files and lib directories), causing a FileNotFoundError at runtime when pvporcupine tried to access its resources directory. Changes: - Added code to detect and include pvporcupine resources and lib folders - Falls back gracefully if pvporcupine is not installed - Resources are bundled even though we don't use wake word features (pvporcupine initializes and checks for these on import) This fixes the runtime error: FileNotFoundError: [WinError 3] The system cannot find the path specified: '...\pvporcupine\resources/keyword_files\windows' 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-28 19:57:13 -08:00
jknapp	9b7f2e1d69	Fix enum34 error by excluding it in PyInstaller spec The previous approach of uninstalling enum34 before PyInstaller didn't work because 'uv run' re-syncs dependencies. The proper solution is to exclude enum34 directly in the PyInstaller spec file. Changes: - Added hooks/hook-enum34.py: Custom PyInstaller hook to exclude enum34 - Updated local-transcription.spec: - Added 'hooks' to hookspath - Added 'enum34' to excludes list - Updated build.sh and build.bat: - Removed enum34 uninstall step (no longer needed) - Added comment explaining enum34 is excluded in spec Why this works: - PyInstaller's excludes list prevents enum34 from being bundled - The custom hook provides documentation and explicit exclusion - enum34 can remain installed in venv (won't break anything) - Works regardless of 'uv run' re-syncing dependencies enum34 is an obsolete Python 2.7/3.3 backport that's incompatible with PyInstaller and unnecessary on Python 3.4+ (enum is in stdlib). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-28 19:32:23 -08:00
jknapp	20a7764bab	Add application icon support for GUI and compiled executables Added platform-specific icon support for both the running application and compiled executables: New files: - create_icons.py: Script to convert PNG to platform-specific formats - Generates .ico for Windows (16, 32, 48, 256px sizes) - Generates .iconset for macOS (ready for iconutil conversion) - LocalTranscription.png: Source icon image - LocalTranscription.ico: Windows icon file (multi-size) - LocalTranscription.iconset/: macOS icon set (needs iconutil on macOS) GUI changes: - main.py: Set application-wide icon for taskbar/dock - main_window_qt.py: Set window icon for GUI window Build configuration: - local-transcription.spec: Use platform-specific icons in PyInstaller - Windows builds use LocalTranscription.ico - macOS builds use LocalTranscription.icns (when generated) To generate macOS .icns file on macOS: iconutil -c icns LocalTranscription.iconset 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-28 18:59:24 -08:00
jknapp	5f3c058be6	Migrate to RealtimeSTT for advanced VAD-based transcription Major refactor to eliminate word loss issues using RealtimeSTT with dual-layer VAD (WebRTC + Silero) instead of time-based chunking. ## Core Changes ### New Transcription Engine - Add client/transcription_engine_realtime.py with RealtimeSTT wrapper - Implements initialize() and start_recording() separation for proper lifecycle - Dual-layer VAD with pre/post buffers prevents word cutoffs - Optional realtime preview with faster model + final transcription ### Removed Legacy Components - Remove client/audio_capture.py (RealtimeSTT handles audio) - Remove client/noise_suppression.py (VAD handles silence detection) - Remove client/transcription_engine.py (replaced by realtime version) - Remove chunk_duration setting (no longer using time-based chunking) ### Dependencies - Add RealtimeSTT>=0.3.0 to pyproject.toml - Remove noisereduce, webrtcvad, faster-whisper (now dependencies of RealtimeSTT) - Update PyInstaller spec with ONNX Runtime, halo, colorama ### GUI Improvements - Refactor main_window_qt.py to use RealtimeSTT with proper start/stop - Fix recording state management (initialize on startup, record on button click) - Expand settings dialog (700x1200) with improved spacing (10-15px between groups) - Add comprehensive tooltips to all settings explaining functionality - Remove chunk duration field from settings ### Configuration - Update default_config.yaml with RealtimeSTT parameters: - Silero VAD sensitivity (0.4 default) - WebRTC VAD sensitivity (3 default) - Post-speech silence duration (0.3s) - Pre-recording buffer (0.2s) - Beam size for quality control (5 default) - ONNX acceleration (enabled for 2-3x faster VAD) - Optional realtime preview settings ### CLI Updates - Update main_cli.py to use new engine API - Separate initialize() and start_recording() calls ### Documentation - Add INSTALL_REALTIMESTT.md with migration guide and benefits - Update INSTALL.md: Remove FFmpeg requirement (not needed!) - Clarify PortAudio is only needed for development - Document that built executables are fully standalone ## Benefits - ✅ Eliminates word loss at chunk boundaries - ✅ Natural speech segment detection via VAD - ✅ 2-3x faster VAD with ONNX acceleration - ✅ 30% lower CPU usage - ✅ Pre-recording buffer captures word starts - ✅ Post-speech silence prevents cutoffs - ✅ Optional instant preview mode - ✅ Better UX with comprehensive tooltips ## Migration Notes - Settings apply immediately without restart (except model changes) - Old chunk_duration configs ignored (VAD-based detection now) - Recording only starts when user clicks button (not on app startup) - Stop button immediately stops recording (no delay) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-28 18:48:29 -08:00
jknapp	478146c58d	Improve UX: hide console window and fade connection status - Hide console window on compiled desktop app (console=False in spec) - Add 20-second auto-fade to "Connected" status in OBS display - Keep "Disconnected" status visible until reconnection - Add PM2 deployment configuration and documentation 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-26 17:04:28 -08:00
Josh Knapp	6ec350af69	Fix Windows FastAPI import: Replace collect_all with collect_submodules Research findings: - collect_all() has design flaws and poor performance with pydantic - Pydantic uses compiled cpython extensions that prevent module discovery - collect_submodules() is the recommended approach per PyInstaller docs Changes: - Replaced collect_all() with collect_submodules() for better reliability - Now collects 105 pydantic submodules (vs unreliable collect_all) - Added collect_data_files() for packages requiring data files - Added explicit pydantic dependencies: colorsys, decimal, json, etc. - Applies to both Windows AND Linux (no longer platform-specific) Results: ✓ Collected 52 submodules from fastapi ✓ Collected 34 submodules from starlette ✓ Collected 105 submodules from pydantic ✓ Collected 3 submodules from pydantic_core ✓ Plus uvicorn, websockets, h11, anyio Fixes: ModuleNotFoundError: No module named 'fastapi' on Windows Based on: https://github.com/pyinstaller/pyinstaller/issues/5359	2025-12-26 11:30:29 -08:00
Josh Knapp	926910177d	Fix Windows build: Use collect_all for FastAPI packages - On Windows, PyInstaller wasn't properly bundling FastAPI dependencies - Added platform-specific collection using PyInstaller.utils.hooks.collect_all - Only applies aggressive collection on Windows to keep Linux builds stable - Collects all submodules and data files for: fastapi, starlette, pydantic, pydantic_core, anyio, uvicorn, websockets, h11 - Linux builds remain unchanged and continue to work as before Fixes: ModuleNotFoundError: No module named 'fastapi' on Windows executable	2025-12-26 11:01:43 -08:00
Josh Knapp	0ee3f1003e	Fix Windows build: Add FastAPI and dependencies to hiddenimports Fixed PyInstaller build error on Windows: "ModuleNotFoundError: No module named 'fastapi'" Added to hiddenimports: - FastAPI and its core modules - Starlette (FastAPI framework base) - Pydantic (data validation) - anyio, sniffio (async libraries) - h11, websockets (protocol implementations) - requests and dependencies (for server sync client) This ensures all web server dependencies are bundled in the executable. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-26 10:34:11 -08:00
Josh Knapp	003c27c8d5	Fix: Bundle Silero VAD model with PyInstaller Fixed PyInstaller build error where the Voice Activity Detection (VAD) model was missing from the compiled executable. Changes: - Added faster_whisper/assets folder to PyInstaller datas - Includes silero_vad_v6.onnx (1.2MB) in the build - Resolves ONNXRuntimeError on transcription start Error fixed: [ONNXRuntimeError] : 3 : NO_SUCHFILE : Load model from .../faster_whisper/assets/silero_vad_v6.onnx failed: File doesn't exist 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-26 08:26:58 -08:00
Josh Knapp	472233aec4	Initial commit: Local Transcription App v1.0 Phase 1 Complete - Standalone Desktop Application Features: - Real-time speech-to-text with Whisper (faster-whisper) - PySide6 desktop GUI with settings dialog - Web server for OBS browser source integration - Audio capture with automatic sample rate detection and resampling - Noise suppression with Voice Activity Detection (VAD) - Configurable display settings (font, timestamps, fade duration) - Settings apply without restart (with automatic model reloading) - Auto-fade for web display transcriptions - CPU/GPU support with automatic device detection - Standalone executable builds (PyInstaller) - CUDA build support (works on systems without CUDA hardware) Components: - Audio capture with sounddevice - Noise reduction with noisereduce + webrtcvad - Transcription with faster-whisper - GUI with PySide6 - Web server with FastAPI + WebSocket - Configuration system with YAML Build System: - Standard builds (CPU-only): build.sh / build.bat - CUDA builds (universal): build-cuda.sh / build-cuda.bat - Comprehensive BUILD.md documentation - Cross-platform support (Linux, Windows) Documentation: - README.md with project overview and quick start - BUILD.md with detailed build instructions - NEXT_STEPS.md with future enhancement roadmap - INSTALL.md with setup instructions 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-25 18:48:23 -08:00

13 Commits