Commit Graph

6 Commits

Author SHA1 Message Date
Developer
ef188e1f67 Fix managed mode WebSocket URL when server_url uses https://
The URL builder was prepending wss:// to the full https:// URL, producing
an invalid wss://https://... URL. Now properly converts https→wss and
http→ws before appending the /ws/transcribe path.
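
The scheme conversion described above can be sketched as follows. This is a minimal illustration, not the project's actual URL builder: the function name `build_ws_url` and its signature are assumptions, and only the `/ws/transcribe` path and the https→wss / http→ws mapping come from the commit message.

```python
from urllib.parse import urlsplit, urlunsplit

# Maps the scheme found in server_url to the WebSocket scheme to use.
_SCHEME_MAP = {"https": "wss", "http": "ws", "wss": "wss", "ws": "ws"}


def build_ws_url(server_url: str, path: str = "/ws/transcribe") -> str:
    """Hypothetical helper: turn an HTTP(S) server URL into a WebSocket URL."""
    parts = urlsplit(server_url)
    if parts.scheme not in _SCHEME_MAP:
        raise ValueError(f"unsupported scheme in server_url: {server_url!r}")
    # Drop a trailing slash so the appended path does not produce "//".
    base_path = parts.path.rstrip("/")
    return urlunsplit(
        (_SCHEME_MAP[parts.scheme], parts.netloc, base_path + path, "", "")
    )
```

Replacing the scheme rather than prepending one is what avoids the invalid `wss://https://...` form.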

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 20:03:58 -07:00
Developer
352615c15c Fix Deepgram broken pipe: wait for WebSocket before starting audio
Audio capture started immediately after spawning the WebSocket thread,
but the WebSocket hadn't connected yet. Audio chunks sent to the
unconnected WebSocket caused a broken pipe error.

Fix: added a threading.Event that start_recording() waits on (up to
15s) before opening the audio stream. The event is set in _ws_lifecycle
after the WebSocket connects and handshake completes.
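
A minimal sketch of the wait-for-connection pattern, assuming a simplified stand-in class. Only `start_recording`, `_ws_lifecycle`, `threading.Event`, and the 15-second timeout come from the commit message; the class name, the attributes, and the simulated connection delay are hypothetical.

```python
import threading
import time


class RecorderSketch:
    """Hypothetical stand-in for the Deepgram engine, for illustration only."""

    def __init__(self, connect_delay: float = 0.05, timeout: float = 15.0):
        self._ws_ready = threading.Event()
        self._connect_delay = connect_delay  # simulated connect + handshake time
        self._timeout = timeout
        self.audio_started = False

    def _ws_lifecycle(self) -> None:
        # Stands in for opening the WebSocket and completing the handshake.
        time.sleep(self._connect_delay)
        self._ws_ready.set()

    def start_recording(self) -> bool:
        self._ws_ready.clear()
        threading.Thread(target=self._ws_lifecycle, daemon=True).start()
        # Block until the WebSocket thread signals readiness or the timeout expires.
        if not self._ws_ready.wait(timeout=self._timeout):
            return False
        # Real code would open the audio stream here.
        self.audio_started = True
        return True
```

`Event.wait()` returns `False` on timeout, so the caller can report a connection failure instead of sending audio into an unconnected socket.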

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 12:18:47 -07:00
Developer
a3bcc5bee5 Show transcription start errors in UI, improve error logging
The Start Transcription button now shows the error message when startup
fails instead of silently reverting. Common causes of failure:
- Missing PortAudio library on Linux
- Audio device not accessible
- Deepgram connection failure

Also added error details to backend console output and captured
the last error from the Deepgram engine for better diagnostics.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 12:15:43 -07:00
Developer
3d3d7ec3c5 Add cloud-only sidecar variant (~50MB vs 500MB-2GB)
Lightweight Deepgram-only sidecar that excludes PyTorch, faster-whisper,
RealtimeSTT, and CUDA. Only includes audio capture + WebSocket streaming
to Deepgram. Requires a Deepgram API key (BYOK or managed mode).

Changes:
- client/models.py: Extracted TranscriptionResult into standalone module
  so deepgram_transcription.py doesn't transitively import torch
- backend/app_controller.py: Made RealtimeTranscriptionEngine and
  DeviceManager imports lazy (only loaded when remote.mode == "local")
- local-transcription-cloud.spec: PyInstaller spec excluding all ML deps
- SidecarSetup.svelte: Added "Cloud Only (Deepgram)" variant option
- build-sidecar-cloud.yml: CI workflow building the cloud sidecar for all three operating systems
- sidecar-release.yml: Dispatches cloud build alongside CPU/CUDA builds
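
The lazy-import change can be sketched as below. This is an illustration under assumptions, not the code in backend/app_controller.py: the function name `resolve_engine`, the `ENGINE_LOCATIONS` table, and the module path for the local engine are hypothetical. Only the class names, the `client/deepgram_transcription.py` file, and the `remote.mode == "local"` condition appear in the commit message.

```python
import importlib

# (module path, class name) per engine kind. The "local" module path is a
# hypothetical placeholder; the commit does not say where that class lives.
ENGINE_LOCATIONS = {
    "local": ("client.transcription_engine", "RealtimeTranscriptionEngine"),
    "remote": ("client.deepgram_transcription", "DeepgramTranscriptionEngine"),
}


def resolve_engine(mode: str, locations: dict = ENGINE_LOCATIONS):
    """Import and return the engine class only when it is actually needed."""
    key = "local" if mode == "local" else "remote"
    module_name, class_name = locations[key]
    # The import runs here, at call time, so a cloud-only build never
    # touches the module that pulls in the ML dependencies.
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```

Deferring the import to call time is what lets a PyInstaller spec exclude the ML dependencies without the application failing at startup.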

Sidecar download options are now:
- Standard (CPU): ~500 MB - local Whisper on any computer
- GPU Accelerated (CUDA): ~2 GB - local Whisper with NVIDIA GPU
- Cloud Only (Deepgram): ~50 MB - requires API key, no local models

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 16:57:43 -07:00
Developer
9dcb14e92c Fix Deepgram streaming latency
Three changes to reduce transcription delay:

1. Send loop: queue.get() was blocking the asyncio event loop, stalling
   the receive loop and delaying transcription results. Now uses
   run_in_executor() to avoid blocking the event loop.

2. Block size: reduced from 4096 (~256ms) to 1024 (~64ms) for more
   frequent, smaller audio chunks. Deepgram handles streaming better
   with smaller packets.

3. Added punctuate=true and smart_format=true to Deepgram BYOK
   params for cleaner transcription output.
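
The first change can be sketched as follows. This is a minimal illustration assuming a thread-safe `queue.Queue` filled by the audio callback and an async `send` callable; the function name `send_loop` and the `None` stop sentinel are assumptions, not the project's actual code.

```python
import asyncio
import queue


async def send_loop(audio_queue: queue.Queue, send, stop_sentinel=None) -> None:
    """Forward queued audio chunks without blocking the event loop."""
    loop = asyncio.get_running_loop()
    while True:
        # queue.get() blocks, so run it in the default thread pool executor.
        # Other coroutines (such as a receive loop) keep running meanwhile.
        chunk = await loop.run_in_executor(None, audio_queue.get)
        if chunk is stop_sentinel:
            break
        await send(chunk)
```

The block-size figures in the second change are consistent with a 16 kHz sample rate (an assumption, since the commit does not state the rate): 4096 / 16000 = 0.256 s and 1024 / 16000 = 0.064 s.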

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 16:31:50 -07:00
Developer
9ff883e2e3 Phase 6: Add Deepgram remote transcription (managed + BYOK modes)
New files:
- client/deepgram_transcription.py — DeepgramTranscriptionEngine with
  managed mode (proxy) and BYOK mode (direct Deepgram). Sends raw binary
  PCM audio over WebSocket, handles both proxy and Deepgram response formats.

Modified files:
- config/default_config.yaml — Replace remote_processing with new remote
  section (mode, server_url, auth_token, byok_api_key, deepgram_model, language)
- client/config.py — Add migration from old remote_processing config
- gui/settings_dialog_qt.py — Replace Remote Processing group with
  Transcription Mode section (Local/Managed/BYOK radio buttons, login/register
  dialogs, balance display, model selector)
- gui/main_window_qt.py — Select engine based on remote.mode config,
  add error and credits_low handlers
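
One way the config migration could look is sketched below. The key names of the new `remote` section come from the commit message; everything else is an assumption for illustration: the default values, the function name `migrate_remote_config`, and the rule that old `remote_processing` values are carried over only when a key of the same name exists in the new section. The layout of the old section is not shown in the commit.

```python
import copy

# Keys listed in the commit message; the default values are assumptions.
REMOTE_DEFAULTS = {
    "mode": "local",
    "server_url": "",
    "auth_token": "",
    "byok_api_key": "",
    "deepgram_model": "",
    "language": "",
}


def migrate_remote_config(config: dict) -> dict:
    """Hypothetical migration: return a copy with remote_processing replaced by remote."""
    migrated = copy.deepcopy(config)
    old = migrated.pop("remote_processing", None)
    if "remote" in migrated:
        # Already migrated: just drop the stale section.
        return migrated
    remote = dict(REMOTE_DEFAULTS)
    if isinstance(old, dict):
        # Carry over only values whose key also exists in the new section.
        for key in remote:
            if key in old:
                remote[key] = old[key]
    migrated["remote"] = remote
    return migrated
```

Working on a deep copy keeps the caller's dictionary unchanged, which makes the migration safe to run on every load.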

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 11:45:30 -07:00