4 Commits

Author SHA1 Message Date
Developer
9ff883e2e3 Phase 6: Add Deepgram remote transcription (managed + BYOK modes)
New files:
- client/deepgram_transcription.py — DeepgramTranscriptionEngine with
  managed mode (proxy) and BYOK mode (direct Deepgram). Sends raw binary
  PCM audio over WebSocket, handles both proxy and Deepgram response formats.

Modified files:
- config/default_config.yaml — Replace remote_processing with new remote
  section (mode, server_url, auth_token, byok_api_key, deepgram_model, language)
- client/config.py — Add migration from old remote_processing config
- gui/settings_dialog_qt.py — Replace Remote Processing group with
  Transcription Mode section (Local/Managed/BYOK radio buttons, login/register
  dialogs, balance display, model selector)
- gui/main_window_qt.py — Select engine based on remote.mode config,
  add error and credits_low handlers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 11:45:30 -07:00
b7ab57f21f Add auto-update feature with Gitea release checking
- Add UpdateChecker class to query Gitea API for latest releases
- Show update dialog with release notes when new version available
- Open browser to release page for download (handles large files)
- Allow users to skip specific versions or defer updates
- Add "Check for Updates Now" button in settings
- Check automatically on startup (respects 24-hour interval)
- Pre-configured for repo.anhonesthost.net/streamer-tools

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 17:40:13 -08:00
ff067b3368 Add unified per-speaker font support and remote transcription service
Font changes:
- Consolidate font settings into single Display Settings section
- Support Web-Safe, Google Fonts, and Custom File uploads for both displays
- Fix Google Fonts URL encoding (use + instead of %2B for spaces)
- Fix per-speaker font inline style quote escaping in Node.js display
- Add font debug logging to help diagnose font issues
- Update web server to sync all font settings on settings change
- Remove deprecated PHP server documentation files

New features:
- Add remote transcription service for GPU offloading
- Add instance lock to prevent multiple app instances
- Add version tracking

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-11 19:09:57 -08:00
472233aec4 Initial commit: Local Transcription App v1.0
Phase 1 Complete - Standalone Desktop Application

Features:
- Real-time speech-to-text with Whisper (faster-whisper)
- PySide6 desktop GUI with settings dialog
- Web server for OBS browser source integration
- Audio capture with automatic sample rate detection and resampling
- Noise suppression with Voice Activity Detection (VAD)
- Configurable display settings (font, timestamps, fade duration)
- Settings apply without restart (with automatic model reloading)
- Auto-fade for web display transcriptions
- CPU/GPU support with automatic device detection
- Standalone executable builds (PyInstaller)
- CUDA build support (works on systems without CUDA hardware)

Components:
- Audio capture with sounddevice
- Noise reduction with noisereduce + webrtcvad
- Transcription with faster-whisper
- GUI with PySide6
- Web server with FastAPI + WebSocket
- Configuration system with YAML

Build System:
- Standard builds (CPU-only): build.sh / build.bat
- CUDA builds (universal): build-cuda.sh / build-cuda.bat
- Comprehensive BUILD.md documentation
- Cross-platform support (Linux, Windows)

Documentation:
- README.md with project overview and quick start
- BUILD.md with detailed build instructions
- NEXT_STEPS.md with future enhancement roadmap
- INSTALL.md with setup instructions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-25 18:48:23 -08:00