Add speech-to-text feature using Faster Whisper container
Some checks failed
Build App / compute-version (pull_request) Successful in 3s
Build App / build-macos (pull_request) Successful in 2m28s
Build STT Container / build-stt-container (pull_request) Successful in 3m18s
Build App / build-windows (pull_request) Successful in 4m40s
Build App / build-linux (pull_request) Failing after 1m46s
Build App / create-tag (pull_request) Has been skipped
Build App / sync-to-github (pull_request) Has been skipped

Adds a mic button to the terminal UI that captures speech, transcribes
it via a Faster Whisper sidecar container, and injects the text into
the terminal input. Includes settings panel for model selection
(tiny/small/medium), port config, and container lifecycle management.

- stt-container/: Dockerfile + FastAPI server for Whisper transcription
- Rust backend: STT container management, transcribe_audio IPC command
- Frontend: useSTT hook, SttButton, SttSettings, WAV encoder
- CI: Gitea Actions workflow for multi-arch STT image builds

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
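
Illustrative note on the WAV encoder listed above: the actual encoder lives in the frontend and is not shown in this excerpt. The sketch below is a Rust rendering of the same idea, under assumptions: mono, 16-bit PCM, a canonical 44-byte RIFF header, and hypothetical function names (encode_wav_pcm16, f32_to_i16). The sample rate the app really uses is not stated here.

```rust
/// Encode mono 16-bit PCM samples into a minimal RIFF/WAVE byte buffer.
/// Hypothetical helper; not taken from the commit.
fn encode_wav_pcm16(samples: &[i16], sample_rate: u32) -> Vec<u8> {
    const CHANNELS: u16 = 1;
    const BITS_PER_SAMPLE: u16 = 16;
    let block_align: u16 = CHANNELS * BITS_PER_SAMPLE / 8; // bytes per frame
    let byte_rate: u32 = sample_rate * u32::from(block_align);
    let data_len: u32 = (samples.len() * 2) as u32; // 2 bytes per sample

    let mut out = Vec::with_capacity(44 + samples.len() * 2);
    // RIFF container header: size field counts everything after itself.
    out.extend_from_slice(b"RIFF");
    out.extend_from_slice(&(36 + data_len).to_le_bytes());
    out.extend_from_slice(b"WAVE");
    // "fmt " chunk: 16-byte body describing uncompressed PCM.
    out.extend_from_slice(b"fmt ");
    out.extend_from_slice(&16u32.to_le_bytes());
    out.extend_from_slice(&1u16.to_le_bytes()); // format tag 1 = PCM
    out.extend_from_slice(&CHANNELS.to_le_bytes());
    out.extend_from_slice(&sample_rate.to_le_bytes());
    out.extend_from_slice(&byte_rate.to_le_bytes());
    out.extend_from_slice(&block_align.to_le_bytes());
    out.extend_from_slice(&BITS_PER_SAMPLE.to_le_bytes());
    // "data" chunk: little-endian samples.
    out.extend_from_slice(b"data");
    out.extend_from_slice(&data_len.to_le_bytes());
    for s in samples {
        out.extend_from_slice(&s.to_le_bytes());
    }
    out
}

/// Convert a float sample to i16, clamping input outside [-1.0, 1.0].
fn f32_to_i16(x: f32) -> i16 {
    (x.clamp(-1.0, 1.0) * 32767.0).round() as i16
}
```

A caller would typically map captured float samples through f32_to_i16, pass the result to encode_wav_pcm16 with the capture sample rate, and hand the bytes to the transcription request.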
2026-04-12 20:02:39 -07:00
parent 8301fd3690
commit 532de77927
19 changed files with 1121 additions and 2 deletions

@@ -76,6 +76,48 @@ pub struct AppSettings {
pub dismissed_image_digest: Option<String>,
#[serde(default)]
pub web_terminal: WebTerminalSettings,
#[serde(default)]
pub stt: SttSettings,
}
fn default_stt_model() -> String {
"tiny".to_string()
}
fn default_stt_port() -> u16 {
9876
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SttSettings {
#[serde(default)]
pub enabled: bool,
#[serde(default = "default_stt_model")]
pub model: String,
#[serde(default = "default_stt_port")]
pub port: u16,
#[serde(default)]
pub language: Option<String>,
}
impl Default for SttSettings {
fn default() -> Self {
Self {
enabled: false,
model: default_stt_model(),
port: default_stt_port(),
language: None,
}
}
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SttStatus {
pub container_exists: bool,
pub running: bool,
pub port: u16,
pub model: String,
pub image_exists: bool,
}
fn default_web_terminal_port() -> u16 {
@@ -120,6 +162,7 @@ impl Default for AppSettings {
default_microphone: None,
dismissed_image_digest: None,
web_terminal: WebTerminalSettings::default(),
stt: SttSettings::default(),
}
}
}
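
Review sketch, not part of the commit: since `model` is a free-form `String`, a settings check before starting the container may be worth considering. The example below mirrors `SttSettings` without the serde derives so it stands alone. The `ALLOWED_MODELS` list is taken from the commit message (tiny/small/medium); `validate_stt_settings` and the non-zero port rule are assumptions for illustration.

```rust
/// Stand-alone mirror of the SttSettings struct from the diff (serde derives omitted).
#[derive(Debug, Clone, PartialEq)]
struct SttSettings {
    enabled: bool,
    model: String,
    port: u16,
    language: Option<String>,
}

fn default_stt_model() -> String {
    "tiny".to_string()
}

fn default_stt_port() -> u16 {
    9876
}

impl Default for SttSettings {
    fn default() -> Self {
        Self {
            enabled: false,
            model: default_stt_model(),
            port: default_stt_port(),
            language: None,
        }
    }
}

/// Model names offered in the settings panel, per the commit message.
const ALLOWED_MODELS: [&str; 3] = ["tiny", "small", "medium"];

/// Hypothetical validation: reject unknown model names and port 0.
fn validate_stt_settings(settings: &SttSettings) -> Result<(), String> {
    if !ALLOWED_MODELS.contains(&settings.model.as_str()) {
        return Err(format!("unknown STT model: {}", settings.model));
    }
    if settings.port == 0 {
        return Err("STT port must be non-zero".to_string());
    }
    Ok(())
}
```

With this shape, `SttSettings::default()` passes validation, while a hand-edited settings file containing an unsupported model name would be rejected before any container is started.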