local-transcription

Author	SHA1	Message	Date
Developer	a3bcc5bee5	Show transcription start errors in UI, improve error logging. The Start Transcription button now shows the error message when it fails instead of silently reverting. Common causes: missing PortAudio library on Linux; audio device not accessible; Deepgram connection failure. Also added error details to backend console output and captured the last error from the Deepgram engine for better diagnostics. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 12:15:43 -07:00
Developer	3d3d7ec3c5	Add cloud-only sidecar variant (~50 MB vs 500 MB-2 GB). Lightweight Deepgram-only sidecar that excludes PyTorch, faster-whisper, RealtimeSTT, and CUDA. It only includes audio capture and WebSocket streaming to Deepgram, and requires a Deepgram API key (BYOK or managed mode). Changes: client/models.py: extracted TranscriptionResult into a standalone module so deepgram_transcription.py doesn't transitively import torch; backend/app_controller.py: made RealtimeTranscriptionEngine and DeviceManager imports lazy (only loaded when remote.mode == "local"); local-transcription-cloud.spec: PyInstaller spec excluding all ML deps; SidecarSetup.svelte: added "Cloud Only (Deepgram)" variant option; build-sidecar-cloud.yml: CI workflow building the cloud sidecar for all 3 OS; sidecar-release.yml: dispatches the cloud build alongside the CPU/CUDA builds. Sidecar download options are now: Standard (CPU), ~500 MB, local Whisper on any computer; GPU Accelerated (CUDA), ~2 GB, local Whisper with NVIDIA GPU; Cloud Only (Deepgram), ~50 MB, requires API key, no local models. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 16:57:43 -07:00
Developer	9dcb14e92c	Fix Deepgram streaming latency. Three changes to reduce transcription delay: (1) Send loop: queue.get() was blocking the asyncio event loop, stalling the receive loop and delaying transcription results; it now uses run_in_executor() to avoid blocking the event loop. (2) Block size: reduced from 4096 (~256 ms) to 1024 (~64 ms) for more frequent, smaller audio chunks; Deepgram handles streaming better with smaller packets. (3) Added punctuate=true and smart_format=true to Deepgram BYOK params for cleaner transcription output. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 16:31:50 -07:00
Developer	9ff883e2e3	Phase 6: Add Deepgram remote transcription (managed + BYOK modes). New files: client/deepgram_transcription.py: DeepgramTranscriptionEngine with managed mode (proxy) and BYOK mode (direct Deepgram); sends raw binary PCM audio over WebSocket and handles both proxy and Deepgram response formats. Modified files: config/default_config.yaml: replace remote_processing with new remote section (mode, server_url, auth_token, byok_api_key, deepgram_model, language); client/config.py: add migration from old remote_processing config; gui/settings_dialog_qt.py: replace Remote Processing group with Transcription Mode section (Local/Managed/BYOK radio buttons, login/register dialogs, balance display, model selector); gui/main_window_qt.py: select engine based on remote.mode config, add error and credits_low handlers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 11:45:30 -07:00

4 Commits
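
Commit 9dcb14e92c describes moving a blocking queue.get() off the asyncio event loop with run_in_executor() and shrinking the audio block size. The following is a minimal sketch of that general pattern, not the repository's code. The names send_loop, fake_send, and receive_loop, the None end-of-stream sentinel, and the 16 kHz sample rate are assumptions made for illustration; 16 kHz was chosen only because it matches the quoted figures of 4096 samples at about 256 ms and 1024 samples at about 64 ms.

```python
import asyncio
import queue

# Assumption: 16 kHz mono audio, which matches the durations quoted in the commit.
SAMPLE_RATE_HZ = 16_000


def block_duration_ms(block_size: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Duration of one audio block in milliseconds."""
    return block_size * 1000 / sample_rate_hz


async def send_loop(audio_queue: queue.Queue, send) -> int:
    """Forward queued chunks to `send` without blocking the event loop."""
    loop = asyncio.get_running_loop()
    sent = 0
    while True:
        # queue.Queue.get() blocks its calling thread, so it runs in the
        # default thread pool and the event loop stays free for other tasks.
        chunk = await loop.run_in_executor(None, audio_queue.get)
        if chunk is None:  # hypothetical end-of-stream sentinel
            return sent
        await send(chunk)
        sent += 1


async def demo() -> tuple[int, int]:
    """Run a sender next to a stand-in receive loop; return (chunks sent, receiver ticks)."""
    audio_queue: queue.Queue = queue.Queue()
    received = []
    ticks = 0

    async def fake_send(chunk: bytes) -> None:
        # Stands in for a WebSocket send call.
        received.append(chunk)

    async def receive_loop() -> None:
        # Stands in for the WebSocket receive loop that must keep running.
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    receiver = asyncio.create_task(receive_loop())
    sender = asyncio.create_task(send_loop(audio_queue, fake_send))

    # The queue is empty here, so the sender is waiting in a worker thread
    # while the receive loop keeps ticking on the event loop.
    await asyncio.sleep(0.1)

    for _ in range(3):
        audio_queue.put(b"\x00" * 2048)  # 1024 16-bit samples
    audio_queue.put(None)

    sent = await sender
    receiver.cancel()
    return sent, ticks


if __name__ == "__main__":
    print(block_duration_ms(4096), block_duration_ms(1024))
    print(asyncio.run(demo()))
```

If audio_queue.get() were called directly inside the coroutine, the receive loop in this sketch would stop ticking whenever the queue was empty, which is the stall the commit message describes.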