Add speech-to-text via Faster Whisper container #1

2026-04-13T03:04:16Z

jknapp commented

2026-04-13 03:04:16 +00:00

## Summary

- Adds a **mic button** to the terminal UI for speech-to-text input via a Faster Whisper sidecar container
- New `stt-container/` with FastAPI transcription server (Dockerfile + `server.py`)
- Rust backend: STT container lifecycle management + `transcribe_audio` IPC command (proxied via reqwest multipart)
- Frontend: `useSTT` hook, `SttButton` (floating overlay), `SttSettings` panel, WAV encoder utility
- Gitea Actions CI workflow for multi-arch (amd64/arm64) STT image builds, dual-push to Gitea + GHCR

## How it works

1. User enables STT in Settings and pulls/builds the STT container image
2. Clicking the mic button in the terminal starts recording (16 kHz mono PCM via AudioWorklet)
3. Clicking again stops recording, encodes to WAV, sends through Tauri IPC to Rust backend
4. Rust backend POSTs the WAV to the Whisper container's `/transcribe` endpoint
5. Transcribed text is injected into the terminal as if typed
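
To make step 3 concrete, here is a minimal Python sketch of encoding 16 kHz mono samples as a WAV file. It is an illustration only: the PR's actual encoder is a frontend utility, and the 16-bit sample width and the `encode_wav` name are assumptions, since the description only says "16 kHz mono PCM".

```python
import io
import struct
import wave

SAMPLE_RATE = 16_000  # 16 kHz mono, as stated in step 2


def encode_wav(samples: list[float], sample_rate: int = SAMPLE_RATE) -> bytes:
    """Pack float samples in [-1.0, 1.0] into a mono PCM WAV file.

    Assumption: 16-bit samples. The PR does not state the bit depth.
    """
    # Clamp each sample, then scale it to the signed 16-bit range.
    pcm = struct.pack(
        f"<{len(samples)}h",
        *(int(max(-1.0, min(1.0, s)) * 32767) for s in samples),
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

The result is a standard RIFF/WAVE byte string that can be written to disk or posted as the `file` field of a multipart request.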

## Risk assessment

- **Zero impact on existing users**: STT is disabled by default and requires explicit opt-in (Settings toggle + image pull)
- No changes to existing terminal I/O, voice mode, or container management
- STT container binds to `127.0.0.1` only (not exposed to network)
- Model cache persisted via named Docker volume (`triple-c-stt-model-cache`)
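
As a rough illustration of the last two points, the sketch below builds an equivalent `docker run` command line with a loopback-only port binding and the named cache volume. It does not show how the Rust backend actually starts the container. The function name and the in-container cache path (`CACHE_DIR`) are hypothetical; the port and the `WHISPER_MODEL` variable come from the test plan.

```python
# Hypothetical mount point inside the container; the PR does not state it.
CACHE_DIR = "/root/.cache/huggingface"


def stt_run_args(image: str, model: str, port: int = 9876) -> list[str]:
    """Return a docker CLI argv that mirrors the constraints listed above."""
    return [
        "docker", "run", "--detach",
        # Prefixing the host address keeps the port off external interfaces.
        "--publish", f"127.0.0.1:{port}:{port}",
        "--env", f"WHISPER_MODEL={model}",
        # A named volume survives container recreation, so models are not re-downloaded.
        "--volume", f"triple-c-stt-model-cache:{CACHE_DIR}",
        image,
    ]
```

Note that a plain `-p 9876:9876`, as used in the manual test command, publishes on all host interfaces; the `127.0.0.1:` prefix is what restricts the binding to loopback.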

## Test plan

- [ ] Build STT container locally: `docker build -t triple-c-stt ./stt-container`
- [ ] Run and test transcription endpoint: `docker run -p 9876:9876 -e WHISPER_MODEL=tiny triple-c-stt` then `curl -F 'file=@test.wav' http://localhost:9876/transcribe`
- [ ] Enable STT in Settings → pull image → start container
- [ ] Click mic button in terminal → speak → verify transcribed text appears
- [ ] Verify existing voice mode (`/voice`) still works independently
- [ ] Verify app startup/shutdown with STT container running (cleanup on close)
- [ ] Test model switching (tiny → small) triggers container recreation
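
For scripted checks, the `curl` call in the second item can be approximated with the Python standard library. This is a sketch under assumptions: the helper names are hypothetical, the `file` field name and URL are taken from the `curl` command, and the response is returned as raw bytes because the PR does not describe its schema.

```python
import urllib.request
import uuid


def build_multipart(field: str, filename: str, payload: bytes, boundary: str) -> bytes:
    """Build a single-part multipart/form-data body, like curl -F 'file=@...'."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode("ascii")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return head + payload + tail


def transcribe(wav_bytes: bytes, url: str = "http://localhost:9876/transcribe") -> bytes:
    """POST a WAV file to the endpoint and return the raw response body."""
    boundary = uuid.uuid4().hex
    body = build_multipart("file", "test.wav", wav_bytes, boundary)
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.read()  # response format is not specified in the PR
```

With the container running, `transcribe(open("test.wav", "rb").read())` should exercise the same path as the `curl` command.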

🤖 Generated with [Claude Code](https://claude.com/claude-code)

jknapp added 1 commit 2026-04-13 03:04:16 +00:00

Add speech-to-text feature using Faster Whisper container

Build App / compute-version (pull_request) Successful in 3s

Build App / build-macos (pull_request) Successful in 2m28s

Build STT Container / build-stt-container (pull_request) Successful in 3m18s

Build App / build-windows (pull_request) Successful in 4m40s

Build App / build-linux (pull_request) Failing after 1m46s

Build App / create-tag (pull_request) Has been skipped

Build App / sync-to-github (pull_request) Has been skipped

532de77927

Adds a mic button to the terminal UI that captures speech, transcribes
it via a Faster Whisper sidecar container, and injects the text into
the terminal input. Includes settings panel for model selection
(tiny/small/medium), port config, and container lifecycle management.

- stt-container/: Dockerfile + FastAPI server for Whisper transcription
- Rust backend: STT container management, transcribe_audio IPC command
- Frontend: useSTT hook, SttButton, SttSettings, WAV encoder
- CI: Gitea Actions workflow for multi-arch STT image builds

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

jknapp added 1 commit 2026-04-13 03:20:21 +00:00

Fix tauri-plugin-dialog version mismatch (2.6.0 → 2.7.0)

Build App / compute-version (pull_request) Successful in 2s

Build App / build-macos (pull_request) Failing after 6s

Build STT Container / build-stt-container (pull_request) Successful in 12s

Build App / build-windows (pull_request) Failing after 24s

Build App / build-linux (pull_request) Successful in 4m50s

Build App / create-tag (pull_request) Has been skipped

Build App / sync-to-github (pull_request) Has been skipped

765ba91d7b

Cargo had resolved to 2.6.0 while npm had 2.7.0, causing the Tauri
build version check to fail.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
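
The mismatch described in this commit (Cargo at 2.6.0, npm at 2.7.0) can be illustrated with a small comparison. This is only a sketch: it assumes the build check compares the major and minor components of the crate and npm package versions, which the commit message does not spell out, and `same_minor` is a hypothetical helper name.

```python
def same_minor(crate_version: str, npm_version: str) -> bool:
    """Return True when both versions share the same major.minor pair."""
    return crate_version.split(".")[:2] == npm_version.split(".")[:2]
```

Under that assumption, `same_minor("2.6.0", "2.7.0")` is false, which matches the failure, while patch-level differences such as `2.7.0` versus `2.7.3` would pass.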

jknapp added 1 commit 2026-04-13 03:28:10 +00:00

Update @tauri-apps/plugin-dialog npm package to 2.7.0

Build App / compute-version (pull_request) Successful in 4s

Build STT Container / build-stt-container (pull_request) Successful in 14s

Build App / build-macos (pull_request) Successful in 2m23s

Build App / build-windows (pull_request) Successful in 4m5s

Build App / build-linux (pull_request) Successful in 4m38s

Build App / create-tag (pull_request) Has been skipped

Build App / sync-to-github (pull_request) Has been skipped

caf3e26816

Aligns the npm lockfile with the Cargo crate version to fix the Tauri
build version mismatch check.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

jknapp merged commit 702ebb7247 into main

2026-04-13 03:35:40 +00:00

jknapp referenced this issue from a commit

2026-04-13 03:35:41 +00:00

Merge pull request 'Add speech-to-text via Faster Whisper container' (#1) from feature/stt into main

jknapp deleted branch feature/stt

2026-04-13 03:35:47 +00:00


Reference: CyberCoveLLC/Triple-C#1