New files:
- client/deepgram_transcription.py — DeepgramTranscriptionEngine with
managed mode (proxy) and BYOK mode (direct Deepgram). Sends raw binary
PCM audio over WebSocket, handles both proxy and Deepgram response formats.
Modified files:
- config/default_config.yaml — Replace remote_processing with new remote
section (mode, server_url, auth_token, byok_api_key, deepgram_model, language)
- client/config.py — Add migration from old remote_processing config
- gui/settings_dialog_qt.py — Replace Remote Processing group with
Transcription Mode section (Local/Managed/BYOK radio buttons, login/register
dialogs, balance display, model selector)
- gui/main_window_qt.py — Select engine based on remote.mode config,
add error and credits_low handlers
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add UpdateChecker class to query Gitea API for latest releases
- Show update dialog with release notes when new version available
- Open browser to release page for download (handles large files)
- Allow users to skip specific versions or defer updates
- Add "Check for Updates Now" button in settings
- Check automatically on startup (respects 24-hour interval)
- Pre-configured for repo.anhonesthost.net/streamer-tools
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Font changes:
- Consolidate font settings into single Display Settings section
- Support Web-Safe, Google Fonts, and Custom File uploads for both displays
- Fix Google Fonts URL encoding (use + instead of %2B for spaces)
- Fix per-speaker font inline style quote escaping in Node.js display
- Add font debug logging to help diagnose font issues
- Update web server to sync all font settings on settings change
- Remove deprecated PHP server documentation files
New features:
- Add remote transcription service for GPU offloading
- Add instance lock to prevent multiple app instances
- Add version tracking
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Phase 1 Complete - Standalone Desktop Application
Features:
- Real-time speech-to-text with Whisper (faster-whisper)
- PySide6 desktop GUI with settings dialog
- Web server for OBS browser source integration
- Audio capture with automatic sample rate detection and resampling
- Noise suppression with Voice Activity Detection (VAD)
- Configurable display settings (font, timestamps, fade duration)
- Settings apply without restart (with automatic model reloading)
- Auto-fade for web display transcriptions
- CPU/GPU support with automatic device detection
- Standalone executable builds (PyInstaller)
- CUDA build support (works on systems without CUDA hardware)
Components:
- Audio capture with sounddevice
- Noise reduction with noisereduce + webrtcvad
- Transcription with faster-whisper
- GUI with PySide6
- Web server with FastAPI + WebSocket
- Configuration system with YAML
Build System:
- Standard builds (CPU-only): build.sh / build.bat
- CUDA builds (universal): build-cuda.sh / build-cuda.bat
- Comprehensive BUILD.md documentation
- Cross-platform support (Linux, Windows)
Documentation:
- README.md with project overview and quick start
- BUILD.md with detailed build instructions
- NEXT_STEPS.md with future enhancement roadmap
- INSTALL.md with setup instructions
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>