local-transcription

Author	SHA1	Message	Date
jknapp	5f3c058be6	Migrate to RealtimeSTT for advanced VAD-based transcription Major refactor to eliminate word loss issues using RealtimeSTT with dual-layer VAD (WebRTC + Silero) instead of time-based chunking. ## Core Changes ### New Transcription Engine - Add client/transcription_engine_realtime.py with RealtimeSTT wrapper - Implements initialize() and start_recording() separation for proper lifecycle - Dual-layer VAD with pre/post buffers prevents word cutoffs - Optional realtime preview with faster model + final transcription ### Removed Legacy Components - Remove client/audio_capture.py (RealtimeSTT handles audio) - Remove client/noise_suppression.py (VAD handles silence detection) - Remove client/transcription_engine.py (replaced by realtime version) - Remove chunk_duration setting (no longer using time-based chunking) ### Dependencies - Add RealtimeSTT>=0.3.0 to pyproject.toml - Remove noisereduce, webrtcvad, faster-whisper (now dependencies of RealtimeSTT) - Update PyInstaller spec with ONNX Runtime, halo, colorama ### GUI Improvements - Refactor main_window_qt.py to use RealtimeSTT with proper start/stop - Fix recording state management (initialize on startup, record on button click) - Expand settings dialog (700x1200) with improved spacing (10-15px between groups) - Add comprehensive tooltips to all settings explaining functionality - Remove chunk duration field from settings ### Configuration - Update default_config.yaml with RealtimeSTT parameters: - Silero VAD sensitivity (0.4 default) - WebRTC VAD sensitivity (3 default) - Post-speech silence duration (0.3s) - Pre-recording buffer (0.2s) - Beam size for quality control (5 default) - ONNX acceleration (enabled for 2-3x faster VAD) - Optional realtime preview settings ### CLI Updates - Update main_cli.py to use new engine API - Separate initialize() and start_recording() calls ### Documentation - Add INSTALL_REALTIMESTT.md with migration guide and benefits - Update INSTALL.md: Remove FFmpeg requirement (not needed!) - Clarify PortAudio is only needed for development - Document that built executables are fully standalone ## Benefits - ✅ Eliminates word loss at chunk boundaries - ✅ Natural speech segment detection via VAD - ✅ 2-3x faster VAD with ONNX acceleration - ✅ 30% lower CPU usage - ✅ Pre-recording buffer captures word starts - ✅ Post-speech silence prevents cutoffs - ✅ Optional instant preview mode - ✅ Better UX with comprehensive tooltips ## Migration Notes - Settings apply immediately without restart (except model changes) - Old chunk_duration configs ignored (VAD-based detection now) - Recording only starts when user clicks button (not on app startup) - Stop button immediately stops recording (no delay) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-28 18:48:29 -08:00
jknapp	146a8c8beb	Enhance display customization and remove PHP server Major improvements to display configuration and server architecture: Display Enhancements: - Add URL parameters for display customization (timestamps, maxlines, fontsize, fontfamily) - Fix max lines enforcement to prevent scroll bars in OBS - Apply font family and size settings to both local and sync displays - Remove auto-scroll, enforce overflow:hidden for clean OBS integration Node.js Server: - Add timestamps toggle: timestamps=true/false - Add max lines limit: maxlines=50 - Add font configuration: fontsize=16, fontfamily=Arial - Update index page with URL parameters documentation - Improve display URLs in room generation Local Web Server: - Add max_lines, font_family, font_size configuration - Respect settings from GUI configuration - Apply changes immediately without restart Architecture: - Remove PHP server implementation (Node.js recommended) - Update all documentation to reference Node.js server - Update default config URLs to Node.js endpoints - Clean up 1700+ lines of PHP code 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-27 06:15:55 -08:00
Josh Knapp	9c3a0d7678	Add multi-user server sync (PHP server + client) Phase 2 implementation: Multiple streamers can now merge their captions into a single stream using a PHP server. PHP Server (server/php/): - server.php: API endpoint for sending/streaming transcriptions - display.php: Web page for viewing merged captions in OBS - config.php: Server configuration - .htaccess: Security settings - README.md: Comprehensive deployment guide Features: - Room-based isolation (multiple groups on same server) - Passphrase authentication per room - Real-time streaming via Server-Sent Events (SSE) - Different colors for each user - File-based storage (no database required) - Auto-cleanup of old rooms - Works on standard PHP hosting Client-Side: - client/server_sync.py: HTTP client for sending to PHP server - Settings dialog updated with server sync options - Config updated with server_sync section Server Configuration: - URL: Server endpoint (e.g., http://example.com/transcription/server.php) - Room: Unique room name for your group - Passphrase: Shared secret for authentication OBS Integration: Display URL format: http://example.com/transcription/display.php?room=ROOM&passphrase=PASS&fade=10&timestamps=true NOTE: Main window integration pending (client sends transcriptions) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-26 10:09:12 -08:00
Josh Knapp	0ba84e6ddd	Improve transcription accuracy with overlapping audio chunks Changes: 1. Changed UI text from "Recording" to "Transcribing" for clarity 2. Implemented overlapping audio chunks to prevent word cutoff Audio Overlap Feature: - Added overlap_duration parameter (default: 0.5 seconds) - Audio chunks now overlap by 0.5s to capture words at boundaries - Prevents missed words when chunks are processed separately - Configurable via audio.overlap_duration in config.yaml How it works: - Each 3-second chunk includes 0.5s from the previous chunk - Buffer advances by (chunk_size - overlap_size) instead of full chunk - Ensures words at chunk boundaries are captured in at least one chunk - No duplicate transcription due to Whisper's context handling Example with 3s chunks and 0.5s overlap: Chunk 1: [0.0s - 3.0s] Chunk 2: [2.5s - 5.5s] <- 0.5s overlap Chunk 3: [5.0s - 8.0s] <- 0.5s overlap Files modified: - client/audio_capture.py: Implemented overlapping buffer logic - config/default_config.yaml: Added overlap_duration setting - gui/main_window_qt.py: Updated UI text, passed overlap param - main_cli.py: Passed overlap param 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-26 08:47:19 -08:00
Josh Knapp	472233aec4	Initial commit: Local Transcription App v1.0 Phase 1 Complete - Standalone Desktop Application Features: - Real-time speech-to-text with Whisper (faster-whisper) - PySide6 desktop GUI with settings dialog - Web server for OBS browser source integration - Audio capture with automatic sample rate detection and resampling - Noise suppression with Voice Activity Detection (VAD) - Configurable display settings (font, timestamps, fade duration) - Settings apply without restart (with automatic model reloading) - Auto-fade for web display transcriptions - CPU/GPU support with automatic device detection - Standalone executable builds (PyInstaller) - CUDA build support (works on systems without CUDA hardware) Components: - Audio capture with sounddevice - Noise reduction with noisereduce + webrtcvad - Transcription with faster-whisper - GUI with PySide6 - Web server with FastAPI + WebSocket - Configuration system with YAML Build System: - Standard builds (CPU-only): build.sh / build.bat - CUDA builds (universal): build-cuda.sh / build-cuda.bat - Comprehensive BUILD.md documentation - Cross-platform support (Linux, Windows) Documentation: - README.md with project overview and quick start - BUILD.md with detailed build instructions - NEXT_STEPS.md with future enhancement roadmap - INSTALL.md with setup instructions 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-25 18:48:23 -08:00

5 Commits