local-transcription

Author	SHA1	Message	Date
Developer	ef188e1f67	Fix managed mode WebSocket URL when server_url uses https:// All checks were successful Tests / Python Backend Tests (push) Successful in 5s Details Tests / Frontend Tests (push) Successful in 7s Details Tests / Rust Sidecar Tests (push) Successful in 2m21s Details The URL builder was prepending wss:// to the full https:// URL, producing an invalid wss://https://... URL. Now properly converts https→wss and http→ws before appending the /ws/transcribe path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-10 20:03:58 -07:00
Developer	352615c15c	Fix Deepgram broken pipe: wait for WebSocket before starting audio All checks were successful Tests / Python Backend Tests (push) Successful in 5s Details Tests / Frontend Tests (push) Successful in 8s Details Tests / Rust Sidecar Tests (push) Successful in 2m0s Details Audio capture started immediately after spawning the WebSocket thread, but the WebSocket hadn't connected yet. Audio chunks sent to the unconnected WebSocket caused a broken pipe error. Fix: added a threading.Event that start_recording() waits on (up to 15s) before opening the audio stream. The event is set in _ws_lifecycle after the WebSocket connects and handshake completes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 12:18:47 -07:00
Developer	a3bcc5bee5	Show transcription start errors in UI, improve error logging All checks were successful Tests / Python Backend Tests (push) Successful in 5s Details Tests / Frontend Tests (push) Successful in 8s Details Tests / Rust Sidecar Tests (push) Successful in 2m5s Details Start Transcription button now shows the error message when it fails instead of silently reverting. Common causes: - Missing PortAudio library on Linux - Audio device not accessible - Deepgram connection failure Also added error details to backend console output and captured the last error from the Deepgram engine for better diagnostics. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 12:15:43 -07:00
Developer	3d3d7ec3c5	Add cloud-only sidecar variant (~50MB vs 500MB-2GB) All checks were successful Tests / Python Backend Tests (push) Successful in 6s Details Tests / Frontend Tests (push) Successful in 7s Details Tests / Rust Sidecar Tests (push) Successful in 1m59s Details Lightweight Deepgram-only sidecar that excludes PyTorch, faster-whisper, RealtimeSTT, and CUDA. Only includes audio capture + WebSocket streaming to Deepgram. Requires a Deepgram API key (BYOK or managed mode). Changes: - client/models.py: Extracted TranscriptionResult into standalone module so deepgram_transcription.py doesn't transitively import torch - backend/app_controller.py: Made RealtimeTranscriptionEngine and DeviceManager imports lazy (only loaded when remote.mode == "local") - local-transcription-cloud.spec: PyInstaller spec excluding all ML deps - SidecarSetup.svelte: Added "Cloud Only (Deepgram)" variant option - build-sidecar-cloud.yml: CI workflow building cloud sidecar for all 3 OS - sidecar-release.yml: Dispatches cloud build alongside CPU/CUDA builds Sidecar download options are now: - Standard (CPU): ~500 MB - local Whisper on any computer - GPU Accelerated (CUDA): ~2 GB - local Whisper with NVIDIA GPU - Cloud Only (Deepgram): ~50 MB - requires API key, no local models Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 16:57:43 -07:00
Developer	9dcb14e92c	Fix Deepgram streaming latency All checks were successful Tests / Python Backend Tests (push) Successful in 5s Details Tests / Frontend Tests (push) Successful in 9s Details Tests / Rust Sidecar Tests (push) Successful in 2m5s Details Three changes to reduce transcription delay: 1. Send loop: queue.get() was blocking the asyncio event loop, stalling the receive loop and delaying transcription results. Now uses run_in_executor() to avoid blocking the event loop. 2. Block size: reduced from 4096 (~256ms) to 1024 (~64ms) for more frequent, smaller audio chunks. Deepgram handles streaming better with smaller packets. 3. Added punctuate=true and smart_format=true to Deepgram BYOK params for cleaner transcription output. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 16:31:50 -07:00
Developer	9ff883e2e3	Phase 6: Add Deepgram remote transcription (managed + BYOK modes) New files: - client/deepgram_transcription.py — DeepgramTranscriptionEngine with managed mode (proxy) and BYOK mode (direct Deepgram). Sends raw binary PCM audio over WebSocket, handles both proxy and Deepgram response formats. Modified files: - config/default_config.yaml — Replace remote_processing with new remote section (mode, server_url, auth_token, byok_api_key, deepgram_model, language) - client/config.py — Add migration from old remote_processing config - gui/settings_dialog_qt.py — Replace Remote Processing group with Transcription Mode section (Local/Managed/BYOK radio buttons, login/register dialogs, balance display, model selector) - gui/main_window_qt.py — Select engine based on remote.mode config, add error and credits_low handlers Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 11:45:30 -07:00

6 Commits