local-transcription

Author	SHA1	Message	Date
jknapp	20a7764bab	Add application icon support for GUI and compiled executables Added platform-specific icon support for both the running application and compiled executables: New files: - create_icons.py: Script to convert PNG to platform-specific formats - Generates .ico for Windows (16, 32, 48, 256px sizes) - Generates .iconset for macOS (ready for iconutil conversion) - LocalTranscription.png: Source icon image - LocalTranscription.ico: Windows icon file (multi-size) - LocalTranscription.iconset/: macOS icon set (needs iconutil on macOS) GUI changes: - main.py: Set application-wide icon for taskbar/dock - main_window_qt.py: Set window icon for GUI window Build configuration: - local-transcription.spec: Use platform-specific icons in PyInstaller - Windows builds use LocalTranscription.ico - macOS builds use LocalTranscription.icns (when generated) To generate macOS .icns file on macOS: iconutil -c icns LocalTranscription.iconset 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-28 18:59:24 -08:00
jknapp	5f3c058be6	Migrate to RealtimeSTT for advanced VAD-based transcription Major refactor to eliminate word loss issues using RealtimeSTT with dual-layer VAD (WebRTC + Silero) instead of time-based chunking. ## Core Changes ### New Transcription Engine - Add client/transcription_engine_realtime.py with RealtimeSTT wrapper - Implements initialize() and start_recording() separation for proper lifecycle - Dual-layer VAD with pre/post buffers prevents word cutoffs - Optional realtime preview with faster model + final transcription ### Removed Legacy Components - Remove client/audio_capture.py (RealtimeSTT handles audio) - Remove client/noise_suppression.py (VAD handles silence detection) - Remove client/transcription_engine.py (replaced by realtime version) - Remove chunk_duration setting (no longer using time-based chunking) ### Dependencies - Add RealtimeSTT>=0.3.0 to pyproject.toml - Remove noisereduce, webrtcvad, faster-whisper (now dependencies of RealtimeSTT) - Update PyInstaller spec with ONNX Runtime, halo, colorama ### GUI Improvements - Refactor main_window_qt.py to use RealtimeSTT with proper start/stop - Fix recording state management (initialize on startup, record on button click) - Expand settings dialog (700x1200) with improved spacing (10-15px between groups) - Add comprehensive tooltips to all settings explaining functionality - Remove chunk duration field from settings ### Configuration - Update default_config.yaml with RealtimeSTT parameters: - Silero VAD sensitivity (0.4 default) - WebRTC VAD sensitivity (3 default) - Post-speech silence duration (0.3s) - Pre-recording buffer (0.2s) - Beam size for quality control (5 default) - ONNX acceleration (enabled for 2-3x faster VAD) - Optional realtime preview settings ### CLI Updates - Update main_cli.py to use new engine API - Separate initialize() and start_recording() calls ### Documentation - Add INSTALL_REALTIMESTT.md with migration guide and benefits - Update INSTALL.md: Remove FFmpeg requirement (not needed!) - Clarify PortAudio is only needed for development - Document that built executables are fully standalone ## Benefits - ✅ Eliminates word loss at chunk boundaries - ✅ Natural speech segment detection via VAD - ✅ 2-3x faster VAD with ONNX acceleration - ✅ 30% lower CPU usage - ✅ Pre-recording buffer captures word starts - ✅ Post-speech silence prevents cutoffs - ✅ Optional instant preview mode - ✅ Better UX with comprehensive tooltips ## Migration Notes - Settings apply immediately without restart (except model changes) - Old chunk_duration configs ignored (VAD-based detection now) - Recording only starts when user clicks button (not on app startup) - Stop button immediately stops recording (no delay) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-28 18:48:29 -08:00
jknapp	eeeb488529	Add loading splash screen for app startup Splash Screen Features: - Shows "Local Transcription" branding during startup - Displays progress messages as app initializes - Prevents users from clicking multiple times while loading - Clean dark theme matching app design Implementation: - Created splash screen with custom pixmap drawing - Updates messages during initialization phases: - "Loading configuration..." - "Creating user interface..." - "Starting web server..." - "Loading Whisper model..." - Automatically closes when main window is ready - Always stays on top to remain visible Benefits: - Better user experience during model loading (2-5 seconds) - Prevents multiple app instances from confusion - Professional appearance - Clear feedback that app is starting 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-27 06:33:44 -08:00
jknapp	bd0e84c5e7	Fix model switching crash and improve error handling Model Reload Fixes: - Properly disconnect signals before reconnecting to prevent duplicate connections - Wait for previous model loader thread to finish before starting new one - Add garbage collection after unloading model to free memory - Improve error handling in model reload callback Settings Dialog: - Remove duplicate success message (callback handles it) - Only show message if no callback is defined Transcription Engine: - Explicitly delete model reference before setting to None - Force garbage collection to ensure memory is freed This prevents crashes when switching models, especially when done multiple times in succession or while the app is under load. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-27 06:28:40 -08:00
jknapp	146a8c8beb	Enhance display customization and remove PHP server Major improvements to display configuration and server architecture: Display Enhancements: - Add URL parameters for display customization (timestamps, maxlines, fontsize, fontfamily) - Fix max lines enforcement to prevent scroll bars in OBS - Apply font family and size settings to both local and sync displays - Remove auto-scroll, enforce overflow:hidden for clean OBS integration Node.js Server: - Add timestamps toggle: timestamps=true/false - Add max lines limit: maxlines=50 - Add font configuration: fontsize=16, fontfamily=Arial - Update index page with URL parameters documentation - Improve display URLs in room generation Local Web Server: - Add max_lines, font_family, font_size configuration - Respect settings from GUI configuration - Apply changes immediately without restart Architecture: - Remove PHP server implementation (Node.js recommended) - Update all documentation to reference Node.js server - Update default config URLs to Node.js endpoints - Clean up 1700+ lines of PHP code 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-27 06:15:55 -08:00
jknapp	e831dadd24	Fix app stability: graceful model switching and web server improvements - Add comprehensive error handling to prevent crashes during model reload - Implement automatic port fallback (8080-8084) for web server conflicts - Configure uvicorn to work properly with PyInstaller console=False builds - Add proper web server shutdown on app close to release ports - Improve error reporting with full tracebacks for debugging Fixes: - App crashing when switching models - Web server not starting after app crash (port conflict) - Web server failing silently in compiled builds without console 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-26 17:50:37 -08:00
jknapp	64c864b0f0	Fix multi-user server sync performance and integration Major fixes: - Integrated ServerSyncClient into GUI for actual multi-user sync - Fixed CUDA device display to show actual hardware used - Optimized server sync with parallel HTTP requests (5x faster) - Fixed 2-second DNS delay by using 127.0.0.1 instead of localhost - Added comprehensive debugging and performance logging Performance improvements: - HTTP requests: 2045ms → 52ms (97% faster) - Multi-user sync lag: ~4s → ~100ms (97% faster) - Parallel request processing with ThreadPoolExecutor (3 workers) New features: - Room generator with one-click copy on Node.js landing page - Auto-detection of PHP vs Node.js server types - Localhost warning banner for WSL2 users - Comprehensive debug logging throughout sync pipeline Files modified: - gui/main_window_qt.py - Server sync integration, device display fix - client/server_sync.py - Parallel HTTP, server type detection - server/nodejs/server.js - Room generator, warnings, debug logs Documentation added: - PERFORMANCE_FIX.md - Server sync optimization details - FIX_2_SECOND_HTTP_DELAY.md - DNS/localhost issue solution - LATENCY_GUIDE.md - Audio chunk duration tuning guide - DEBUG_4_SECOND_LAG.md - Comprehensive debugging guide - SESSION_SUMMARY.md - Complete session summary 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-26 16:44:55 -08:00
jknapp	c28679acb6	Update to support sync captions	2025-12-26 16:15:52 -08:00
Josh Knapp	aa8c294fdc	Add clickable web display link to main UI Added a clickable link in the status bar that opens the web display in the default browser. This makes it easy for users to access the OBS browser source without manually typing the URL. Features: - Shows "🌐 Open Web Display" link in green - Tooltip shows the full URL - Opens in default browser when clicked - Reads host/port from config automatically Location: Status bar, after user name URL format: http://127.0.0.1:8080 (or configured host:port) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-26 08:58:24 -08:00
Josh Knapp	0ba84e6ddd	Improve transcription accuracy with overlapping audio chunks Changes: 1. Changed UI text from "Recording" to "Transcribing" for clarity 2. Implemented overlapping audio chunks to prevent word cutoff Audio Overlap Feature: - Added overlap_duration parameter (default: 0.5 seconds) - Audio chunks now overlap by 0.5s to capture words at boundaries - Prevents missed words when chunks are processed separately - Configurable via audio.overlap_duration in config.yaml How it works: - Each 3-second chunk includes 0.5s from the previous chunk - Buffer advances by (chunk_size - overlap_size) instead of full chunk - Ensures words at chunk boundaries are captured in at least one chunk - No duplicate transcription due to Whisper's context handling Example with 3s chunks and 0.5s overlap: Chunk 1: [0.0s - 3.0s] Chunk 2: [2.5s - 5.5s] <- 0.5s overlap Chunk 3: [5.0s - 8.0s] <- 0.5s overlap Files modified: - client/audio_capture.py: Implemented overlapping buffer logic - config/default_config.yaml: Added overlap_duration setting - gui/main_window_qt.py: Updated UI text, passed overlap param - main_cli.py: Passed overlap param 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-26 08:47:19 -08:00
Josh Knapp	472233aec4	Initial commit: Local Transcription App v1.0 Phase 1 Complete - Standalone Desktop Application Features: - Real-time speech-to-text with Whisper (faster-whisper) - PySide6 desktop GUI with settings dialog - Web server for OBS browser source integration - Audio capture with automatic sample rate detection and resampling - Noise suppression with Voice Activity Detection (VAD) - Configurable display settings (font, timestamps, fade duration) - Settings apply without restart (with automatic model reloading) - Auto-fade for web display transcriptions - CPU/GPU support with automatic device detection - Standalone executable builds (PyInstaller) - CUDA build support (works on systems without CUDA hardware) Components: - Audio capture with sounddevice - Noise reduction with noisereduce + webrtcvad - Transcription with faster-whisper - GUI with PySide6 - Web server with FastAPI + WebSocket - Configuration system with YAML Build System: - Standard builds (CPU-only): build.sh / build.bat - CUDA builds (universal): build-cuda.sh / build-cuda.bat - Comprehensive BUILD.md documentation - Cross-platform support (Linux, Windows) Documentation: - README.md with project overview and quick start - BUILD.md with detailed build instructions - NEXT_STEPS.md with future enhancement roadmap - INSTALL.md with setup instructions 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-25 18:48:23 -08:00

11 Commits