local-transcription

Author	SHA1	Message	Date
Developer	1c8c6ad7e8	Fix display user not updating locally until app restart All checks were successful Tests / Python Backend Tests (push) Successful in 5s Details Tests / Frontend Tests (push) Successful in 7s Details Tests / Rust Sidecar Tests (push) Successful in 3m12s Details Engines now read user.name from the config object at transcription time instead of caching it at init, so name changes take effect immediately. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 10:40:46 -07:00
Developer	3d3d7ec3c5	Add cloud-only sidecar variant (~50MB vs 500MB-2GB) All checks were successful Tests / Python Backend Tests (push) Successful in 6s Details Tests / Frontend Tests (push) Successful in 7s Details Tests / Rust Sidecar Tests (push) Successful in 1m59s Details Lightweight Deepgram-only sidecar that excludes PyTorch, faster-whisper, RealtimeSTT, and CUDA. Only includes audio capture + WebSocket streaming to Deepgram. Requires a Deepgram API key (BYOK or managed mode). Changes: - client/models.py: Extracted TranscriptionResult into standalone module so deepgram_transcription.py doesn't transitively import torch - backend/app_controller.py: Made RealtimeTranscriptionEngine and DeviceManager imports lazy (only loaded when remote.mode == "local") - local-transcription-cloud.spec: PyInstaller spec excluding all ML deps - SidecarSetup.svelte: Added "Cloud Only (Deepgram)" variant option - build-sidecar-cloud.yml: CI workflow building cloud sidecar for all 3 OS - sidecar-release.yml: Dispatches cloud build alongside CPU/CUDA builds Sidecar download options are now: - Standard (CPU): ~500 MB - local Whisper on any computer - GPU Accelerated (CUDA): ~2 GB - local Whisper with NVIDIA GPU - Cloud Only (Deepgram): ~50 MB - requires API key, no local models Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 16:57:43 -07:00
jknapp	ff067b3368	Add unified per-speaker font support and remote transcription service Font changes: - Consolidate font settings into single Display Settings section - Support Web-Safe, Google Fonts, and Custom File uploads for both displays - Fix Google Fonts URL encoding (use + instead of %2B for spaces) - Fix per-speaker font inline style quote escaping in Node.js display - Add font debug logging to help diagnose font issues - Update web server to sync all font settings on settings change - Remove deprecated PHP server documentation files New features: - Add remote transcription service for GPU offloading - Add instance lock to prevent multiple app instances - Add version tracking Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-11 19:09:57 -08:00
jknapp	5f3c058be6	Migrate to RealtimeSTT for advanced VAD-based transcription Major refactor to eliminate word loss issues using RealtimeSTT with dual-layer VAD (WebRTC + Silero) instead of time-based chunking. ## Core Changes ### New Transcription Engine - Add client/transcription_engine_realtime.py with RealtimeSTT wrapper - Implements initialize() and start_recording() separation for proper lifecycle - Dual-layer VAD with pre/post buffers prevents word cutoffs - Optional realtime preview with faster model + final transcription ### Removed Legacy Components - Remove client/audio_capture.py (RealtimeSTT handles audio) - Remove client/noise_suppression.py (VAD handles silence detection) - Remove client/transcription_engine.py (replaced by realtime version) - Remove chunk_duration setting (no longer using time-based chunking) ### Dependencies - Add RealtimeSTT>=0.3.0 to pyproject.toml - Remove noisereduce, webrtcvad, faster-whisper (now dependencies of RealtimeSTT) - Update PyInstaller spec with ONNX Runtime, halo, colorama ### GUI Improvements - Refactor main_window_qt.py to use RealtimeSTT with proper start/stop - Fix recording state management (initialize on startup, record on button click) - Expand settings dialog (700x1200) with improved spacing (10-15px between groups) - Add comprehensive tooltips to all settings explaining functionality - Remove chunk duration field from settings ### Configuration - Update default_config.yaml with RealtimeSTT parameters: - Silero VAD sensitivity (0.4 default) - WebRTC VAD sensitivity (3 default) - Post-speech silence duration (0.3s) - Pre-recording buffer (0.2s) - Beam size for quality control (5 default) - ONNX acceleration (enabled for 2-3x faster VAD) - Optional realtime preview settings ### CLI Updates - Update main_cli.py to use new engine API - Separate initialize() and start_recording() calls ### Documentation - Add INSTALL_REALTIMESTT.md with migration guide and benefits - Update INSTALL.md: Remove FFmpeg requirement (not needed!) - Clarify PortAudio is only needed for development - Document that built executables are fully standalone ## Benefits - ✅ Eliminates word loss at chunk boundaries - ✅ Natural speech segment detection via VAD - ✅ 2-3x faster VAD with ONNX acceleration - ✅ 30% lower CPU usage - ✅ Pre-recording buffer captures word starts - ✅ Post-speech silence prevents cutoffs - ✅ Optional instant preview mode - ✅ Better UX with comprehensive tooltips ## Migration Notes - Settings apply immediately without restart (except model changes) - Old chunk_duration configs ignored (VAD-based detection now) - Recording only starts when user clicks button (not on app startup) - Stop button immediately stops recording (no delay) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-28 18:48:29 -08:00

4 Commits