Add speech-to-text via Faster Whisper container #1

Merged
jknapp merged 3 commits from feature/stt into main 2026-04-13 03:35:40 +00:00

Summary

  • Adds a mic button to the terminal UI for speech-to-text input via a Faster Whisper sidecar container
  • New stt-container/ with FastAPI transcription server (Dockerfile + server.py)
  • Rust backend: STT container lifecycle management + transcribe_audio IPC command (proxied via reqwest multipart)
  • Frontend: useSTT hook, SttButton (floating overlay), SttSettings panel, WAV encoder utility
  • Gitea Actions CI workflow for multi-arch (amd64/arm64) STT image builds, dual-push to Gitea + GHCR

How it works

  1. User enables STT in Settings and pulls/builds the STT container image
  2. Clicking the mic button in the terminal starts recording (16kHz mono PCM via AudioWorklet)
  3. Clicking again stops recording, encodes to WAV, sends through Tauri IPC to Rust backend
  4. Rust backend POSTs the WAV to the Whisper container's /transcribe endpoint
  5. Transcribed text is injected into the terminal as if typed
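
Step 3 above (encoding the recorded 16kHz mono PCM to WAV) can be sketched as follows. This is a minimal illustration, not the PR's actual encoder: the function name `encodeWav`, its signature, and the choice of 16-bit samples are assumptions, since the description only states that the audio is 16kHz mono PCM wrapped as WAV.

```typescript
// Hypothetical helper; the WAV encoder shipped in this PR is not shown here.
// Wraps mono float samples (range -1..1) in a 44-byte RIFF/WAVE header
// as 16-bit little-endian PCM. The 16-bit depth is an assumption.
function encodeWav(samples: Float32Array, sampleRate: number = 16000): Uint8Array {
  const numChannels = 1;
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // size of everything after this field
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  view.setUint16(22, numChannels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * numChannels * bytesPerSample, true); // byte rate
  view.setUint16(32, numChannels * bytesPerSample, true); // block align
  view.setUint16(34, 8 * bytesPerSample, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(44 + i * bytesPerSample, Math.round(scaled), true);
  }
  return new Uint8Array(buffer);
}
```

The resulting bytes are what would be handed to the `transcribe_audio` IPC command; how the PR serializes them across the Tauri boundary is not specified in the description.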

Risk assessment

  • Zero impact on existing users — STT is disabled by default and requires explicit opt-in (Settings toggle + image pull)
  • No changes to existing terminal I/O, voice mode, or container management
  • STT container binds to 127.0.0.1 only (not exposed to network)
  • Model cache persisted via named Docker volume (triple-c-stt-model-cache)

Test plan

  • Build STT container locally: docker build -t triple-c-stt ./stt-container
  • Run and test transcription endpoint: docker run -p 9876:9876 -e WHISPER_MODEL=tiny triple-c-stt then curl -F 'file=@test.wav' http://localhost:9876/transcribe
  • Enable STT in Settings → pull image → start container
  • Click mic button in terminal → speak → verify transcribed text appears
  • Verify existing voice mode (/voice) still works independently
  • Verify app startup/shutdown with STT container running (cleanup on close)
  • Test model switching (tiny → small) triggers container recreation

🤖 Generated with Claude Code

jknapp added 1 commit 2026-04-13 03:04:16 +00:00
Add speech-to-text feature using Faster Whisper container
Some checks failed
Build App / compute-version (pull_request) Successful in 3s
Build App / build-macos (pull_request) Successful in 2m28s
Build STT Container / build-stt-container (pull_request) Successful in 3m18s
Build App / build-windows (pull_request) Successful in 4m40s
Build App / build-linux (pull_request) Failing after 1m46s
Build App / create-tag (pull_request) Has been skipped
Build App / sync-to-github (pull_request) Has been skipped
532de77927
Adds a mic button to the terminal UI that captures speech, transcribes
it via a Faster Whisper sidecar container, and injects the text into
the terminal input. Includes settings panel for model selection
(tiny/small/medium), port config, and container lifecycle management.

- stt-container/: Dockerfile + FastAPI server for Whisper transcription
- Rust backend: STT container management, transcribe_audio IPC command
- Frontend: useSTT hook, SttButton, SttSettings, WAV encoder
- CI: Gitea Actions workflow for multi-arch STT image builds

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jknapp added 1 commit 2026-04-13 03:20:21 +00:00
Fix tauri-plugin-dialog version mismatch (2.6.0 → 2.7.0)
Some checks failed
Build App / compute-version (pull_request) Successful in 2s
Build App / build-macos (pull_request) Failing after 6s
Build STT Container / build-stt-container (pull_request) Successful in 12s
Build App / build-windows (pull_request) Failing after 24s
Build App / build-linux (pull_request) Successful in 4m50s
Build App / create-tag (pull_request) Has been skipped
Build App / sync-to-github (pull_request) Has been skipped
765ba91d7b
Cargo had resolved to 2.6.0 while npm had 2.7.0, causing the Tauri
build version check to fail.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
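
The mismatch described in the commit above can be illustrated with a small comparison. This is only a sketch of the kind of check involved, assuming the build compares the major and minor components of the crate and npm package versions; `sameMajorMinor` is a hypothetical name and not part of Tauri's tooling.

```typescript
// Hypothetical illustration of a major.minor version comparison.
// Assumes plain "MAJOR.MINOR.PATCH" strings with no range prefixes.
function sameMajorMinor(crateVersion: string, npmVersion: string): boolean {
  const majorMinor = (version: string): string =>
    version.split(".").slice(0, 2).join(".");
  return majorMinor(crateVersion) === majorMinor(npmVersion);
}

// The pairing reported in the commit message: crate 2.6.0 vs npm 2.7.0.
const aligned: boolean = sameMajorMinor("2.6.0", "2.7.0"); // false
```

Under this assumption, bringing both sides to 2.7.0 makes the comparison pass.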
jknapp added 1 commit 2026-04-13 03:28:10 +00:00
Update @tauri-apps/plugin-dialog npm package to 2.7.0
All checks were successful
Build App / compute-version (pull_request) Successful in 4s
Build STT Container / build-stt-container (pull_request) Successful in 14s
Build App / build-macos (pull_request) Successful in 2m23s
Build App / build-windows (pull_request) Successful in 4m5s
Build App / build-linux (pull_request) Successful in 4m38s
Build App / create-tag (pull_request) Has been skipped
Build App / sync-to-github (pull_request) Has been skipped
caf3e26816
Aligns the npm lockfile with the Cargo crate version to fix the Tauri
build version mismatch check.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jknapp merged commit 702ebb7247 into main 2026-04-13 03:35:40 +00:00
jknapp deleted branch feature/stt 2026-04-13 03:35:47 +00:00

Reference: CyberCoveLLC/Triple-C#1