Compare commits

57 Commits

Author SHA1 Message Date
Gitea Actions
34a165fc05 chore: bump version to 2.0.16 [skip ci] 2026-04-11 02:15:32 +00:00
Developer
8f4e5cc099 Default managed mode to transcribe.shadowdao.com and simplify login UI
All checks were successful
Tests / Python Backend Tests (push) Successful in 5s
Tests / Frontend Tests (push) Successful in 8s
Tests / Rust Sidecar Tests (push) Successful in 2m4s
- Set default server_url to https://transcribe.shadowdao.com
- Remove Server URL field from managed mode settings (users don't need to configure it)
- Replace Register button with link to website signup page
- Add fallback to default URL in login handler for existing users with empty config
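The fallback in the last bullet could look roughly like the sketch below. This is an illustration only: the function name `effective_server_url` and the flat `server_url` key are assumptions, not the project's actual login handler.

```python
DEFAULT_SERVER_URL = "https://transcribe.shadowdao.com"


def effective_server_url(config: dict) -> str:
    """Return the configured URL, or the default when it is missing or blank.

    Hypothetical helper: existing users may have an empty `server_url`
    saved from before the default was introduced.
    """
    url = (config.get("server_url") or "").strip()
    return url or DEFAULT_SERVER_URL
```

Treating an empty string the same as a missing key is what lets older configs pick up the new default without a migration step.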

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 19:12:50 -07:00
Developer
16f9ac2ab8 Add code signing config for Windows (Azure Artifact Signing) and macOS (Apple notarization)
All checks were successful
Tests / Python Backend Tests (push) Successful in 5s
Tests / Frontend Tests (push) Successful in 8s
Tests / Rust Sidecar Tests (push) Successful in 1m58s
CI workflows now support code signing when secrets are configured:
- macOS: Apple Developer certificate + App Store Connect API key for notarization
- Windows: Azure Artifact Signing via signtool + dlib
- Both are no-ops when secrets aren't set (backwards-compatible)
- Add Entitlements.plist (mic, network) and Info.plist (NSMicrophoneUsageDescription)
- Add SIGNING.md with full setup guide for both platforms

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 18:02:46 -07:00
Developer
cd325102e2 Update docs for cloud-first UX and shared captions
All checks were successful
Tests / Python Backend Tests (push) Successful in 5s
Tests / Frontend Tests (push) Successful in 7s
Tests / Rust Sidecar Tests (push) Successful in 2m13s
- README: document cloud-first quick start, shared captions workflow
  (create room, join via share code, share existing room), and
  self-hosting option
- README: update default remote.mode from local to byok in config table
- CLAUDE.md: reflect cloud-first default, settings gating, and shared
  captions features

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 16:10:46 -07:00
Gitea Actions
d220158dd7 chore: bump version to 2.0.15 [skip ci] 2026-04-10 19:38:00 +00:00
Developer
8670e19acc Add "Share Current Room" button to copy existing room config as share code
All checks were successful
Tests / Python Backend Tests (push) Successful in 5s
Tests / Frontend Tests (push) Successful in 7s
Tests / Rust Sidecar Tests (push) Successful in 1m58s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 12:26:29 -07:00
Gitea Actions
812cc4ac5e chore: bump version to 2.0.14 [skip ci] 2026-04-10 19:15:02 +00:00
Developer
4aa19eee86 Fix test: align remote.mode in no-reload settings test
All checks were successful
Tests / Python Backend Tests (push) Successful in 5s
Tests / Frontend Tests (push) Successful in 8s
Tests / Rust Sidecar Tests (push) Successful in 1m59s
The default remote.mode changed from 'local' to 'byok', causing
the apply_settings test to detect a mode mismatch and trigger an
unexpected engine reload. Pin remote.mode to 'local' in the test
to match the controller's assumed current mode.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 12:01:11 -07:00
Developer
b8dfe0f1ba Cloud-first UX: default to Deepgram, gate start button, add room sharing
Some checks failed
Tests / Python Backend Tests (push) Failing after 6s
Tests / Frontend Tests (push) Successful in 9s
Tests / Rust Sidecar Tests (push) Successful in 2m1s
- Change default transcription mode from local to byok (cloud/Deepgram)
- Move Transcription Mode selector to top of settings for visibility
- Hide local-only settings (model, VAD, timing) when cloud mode selected
- Disable Start button until API key (byok) or login (managed) is configured
- Add room creation and share code flow to Shared Captions section
- Add POST /api/create-room endpoint to Node.js sync server
- Update default sync URL placeholder to caption.shadowdao.com

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 11:58:49 -07:00
Gitea Actions
5837b97a20 chore: bump sidecar version to 1.0.11 [skip ci] 2026-04-08 21:15:05 +00:00
Gitea Actions
ab09a3e9da chore: bump version to 2.0.13 [skip ci] 2026-04-08 21:09:40 +00:00
Developer
5343a28a08 Bundle sounddevice PortAudio library in sidecar builds
All checks were successful
Tests / Python Backend Tests (push) Successful in 5s
Tests / Frontend Tests (push) Successful in 8s
Tests / Rust Sidecar Tests (push) Successful in 2m4s
On macOS, sounddevice ships its own PortAudio dylib in the
_sounddevice_data directory. PyInstaller wasn't collecting it,
causing "Error querying device -1" when the sidecar tried to
open an audio stream.

Added data collection for _sounddevice_data in both cloud and
headless PyInstaller specs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 14:07:04 -07:00
Developer
f0bf026133 Handle ExitRequested to stop sidecar on macOS Cmd+Q
All checks were successful
Tests / Python Backend Tests (push) Successful in 4s
Tests / Frontend Tests (push) Successful in 7s
Tests / Rust Sidecar Tests (push) Successful in 1m56s
On macOS, Cmd+Q triggers ExitRequested before Exit. If the app is
force-quit or closed via Cmd+Q, the Exit event may not fire,
leaving the sidecar process orphaned with ports 8080/8081 in use.

Now handles both ExitRequested and Exit to ensure the sidecar is
always stopped when the app closes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 14:02:25 -07:00
Developer
37a029d1c6 Show app version from Tauri instead of sidecar
All checks were successful
Tests / Python Backend Tests (push) Successful in 5s
Tests / Frontend Tests (push) Successful in 8s
Tests / Rust Sidecar Tests (push) Successful in 1m59s
The version label was reading from backendStore.version which comes
from the sidecar's version.py (hardcoded at build time). Now uses
Tauri's getVersion() API which reads from tauri.conf.json -- the
actual app version that gets bumped by the release workflow.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 13:45:53 -07:00
Gitea Actions
5ec030387f chore: bump sidecar version to 1.0.10 [skip ci] 2026-04-08 20:27:00 +00:00
Gitea Actions
4d9bdba903 chore: bump version to 2.0.12 [skip ci] 2026-04-08 20:22:08 +00:00
Gitea Actions
a7a3bcd102 chore: bump version to 2.0.11 [skip ci] 2026-04-08 20:10:12 +00:00
Developer
115d93482a Always poll status after start/stop, even on API error
All checks were successful
Tests / Python Backend Tests (push) Successful in 4s
Tests / Frontend Tests (push) Successful in 7s
Tests / Rust Sidecar Tests (push) Successful in 1m55s
When apiPost throws (e.g. 400 "Already transcribing"), pollStatus
never ran because it was in the try block. The button stayed stuck
on "Start" even though transcription was running.

Moved pollStatus to the finally block so it always syncs the UI
with actual backend state. Also suppresses the error message for
400 responses since they just mean the state is already correct.
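The control flow described above can be sketched as follows. The real change is in the frontend; this Python version only mirrors the try/finally shape, and `ApiError`, `api_post`, `poll_status`, and `show_error` are stand-in names assumed for illustration.

```python
from typing import Callable


class ApiError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed call."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(message)
        self.status = status
        self.message = message


def on_start_clicked(
    api_post: Callable[[str], None],
    poll_status: Callable[[], None],
    show_error: Callable[[str], None],
) -> None:
    try:
        api_post("/api/start")
    except ApiError as err:
        # A 400 means the backend is already in the requested state,
        # so it is not surfaced to the user.
        if err.status != 400:
            show_error(err.message)
    finally:
        # Runs on success and on failure, so the UI always resyncs.
        poll_status()
```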

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 13:03:34 -07:00
Developer
fb672cbaef Update Cargo.lock
Some checks failed
Tests / Python Backend Tests (push) Successful in 4s
Tests / Frontend Tests (push) Successful in 7s
Tests / Rust Sidecar Tests (push) Has been cancelled
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 13:02:03 -07:00
Gitea Actions
d8c79be094 chore: bump sidecar version to 1.0.9 [skip ci] 2026-04-08 19:52:47 +00:00
Developer
2811f5bb9c Fix release workflow false failure on successful dispatch
All checks were successful
Tests / Python Backend Tests (push) Successful in 5s
Tests / Frontend Tests (push) Successful in 8s
Tests / Rust Sidecar Tests (push) Successful in 1m59s
The line [ "$HTTP_CODE" != "204" ] && cat ... returns exit code 1
when the condition is false (all dispatches succeeded). Since it
was the last command in the loop, the step reported failure.
Changed to if/then/fi which doesn't leak the test exit code.
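The exit-status difference can be reproduced in isolation. The sketch below runs two one-line scripts through `sh` (assumed to be on PATH) with a hard-coded code of 204; the `echo` stands in for the original `cat ...`, and the helper name `last_status` is illustrative.

```python
import subprocess


def last_status(script: str) -> int:
    """Run a snippet with `sh -c` and return its exit status."""
    return subprocess.run(["sh", "-c", script], capture_output=True).returncode


# The test is false (204 == 204), so `&&` short-circuits and the failed
# test's status becomes the status of the whole line.
SHORT_CIRCUIT = '[ "204" != "204" ] && echo "dispatch failed"'

# An `if` whose condition is false and which has no `else` exits with 0.
GUARDED = 'if [ "204" != "204" ]; then echo "dispatch failed"; fi'

if __name__ == "__main__":
    print(last_status(SHORT_CIRCUIT), last_status(GUARDED))
```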

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 12:49:53 -07:00
Gitea Actions
30127d68e7 chore: bump version to 2.0.10 [skip ci] 2026-04-08 19:46:58 +00:00
Developer
ae61c8c75a Fix Start button not updating: unblock the event loop
All checks were successful
Tests / Python Backend Tests (push) Successful in 5s
Tests / Frontend Tests (push) Successful in 8s
Tests / Rust Sidecar Tests (push) Successful in 2m4s
start_transcription() blocks up to 15s waiting for the Deepgram
WebSocket to connect. Running it synchronously in the async endpoint
blocked the entire uvicorn event loop, preventing:
- pollStatus from completing (frozen HTTP request)
- WebSocket broadcasts from being sent
- Any other API requests from being handled

Fix: run start/stop/reload in thread pool via run_in_executor so
the event loop stays responsive during long-running operations.
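A minimal sketch of the pattern, not the project's endpoint code: `start_transcription_blocking` is a stand-in that sleeps instead of connecting to Deepgram, and the tick counter exists only to show that the loop keeps servicing other coroutines while the blocking call runs in the default thread pool.

```python
import asyncio
import time
from typing import Tuple


def start_transcription_blocking(delay: float = 0.3) -> str:
    """Stand-in for a blocking start call (hypothetical; it only sleeps)."""
    time.sleep(delay)
    return "started"


async def count_ticks(stop: asyncio.Event) -> int:
    """Count loop wake-ups until told to stop."""
    ticks = 0
    while not stop.is_set():
        ticks += 1
        await asyncio.sleep(0.01)
    return ticks


async def start_endpoint() -> Tuple[str, int]:
    loop = asyncio.get_running_loop()
    stop = asyncio.Event()
    ticker = asyncio.create_task(count_ticks(stop))
    # Off-load the blocking call so the event loop stays free.
    result = await loop.run_in_executor(None, start_transcription_blocking)
    stop.set()
    ticks = await ticker
    return result, ticks


if __name__ == "__main__":
    print(asyncio.run(start_endpoint()))
```

Calling `start_transcription_blocking()` directly inside the coroutine would leave the tick count near zero, which is the frozen behaviour the commit describes.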

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 12:43:49 -07:00
Developer
2654200fe9 Switch sidecar-release to GITHUB_ENV to match release.yml
All checks were successful
Tests / Python Backend Tests (push) Successful in 5s
Tests / Frontend Tests (push) Successful in 7s
Tests / Rust Sidecar Tests (push) Successful in 2m4s
Same fix as release.yml -- replaced step outputs with GITHUB_ENV
variables to avoid the act runner format bug. Also removed the
has_changes conditional since sidecar-release is now manual-only
(workflow_dispatch always means we want to build).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 12:33:28 -07:00
Developer
cae0c0b265 Fix false job failure: use GITHUB_ENV instead of step outputs
Some checks failed
Tests / Python Backend Tests (push) Successful in 5s
Tests / Frontend Tests (push) Successful in 8s
Tests / Rust Sidecar Tests (push) Has been cancelled
The act runner has a Go format bug that evaluates step outputs at
cleanup time and crashes with %!t(string=...), marking the job as
failed even though all steps succeeded.

Replaced steps.bump.outputs.* with GITHUB_ENV variables which
persist across steps without triggering the runner's output
evaluation bug.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 12:32:20 -07:00
Gitea Actions
91b27ac22e chore: bump sidecar version to 1.0.8 [skip ci] 2026-04-08 19:27:02 +00:00
Gitea Actions
1210acd07f chore: bump version to 2.0.9 [skip ci] 2026-04-08 19:23:05 +00:00
Developer
352615c15c Fix Deepgram broken pipe: wait for WebSocket before starting audio
All checks were successful
Tests / Python Backend Tests (push) Successful in 5s
Tests / Frontend Tests (push) Successful in 8s
Tests / Rust Sidecar Tests (push) Successful in 2m0s
Audio capture started immediately after spawning the WebSocket thread,
but the WebSocket hadn't connected yet. Audio chunks sent to the
unconnected WebSocket caused a broken pipe error.

Fix: added a threading.Event that start_recording() waits on (up to
15s) before opening the audio stream. The event is set in _ws_lifecycle
after the WebSocket connects and handshake completes.
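The handshake gate can be sketched with a `threading.Event`. The class below is a simplified stand-in under stated assumptions: the connection is simulated with a sleep, and only the names `start_recording` and `_ws_lifecycle` come from the commit message.

```python
import threading
import time


class DeepgramSessionSketch:
    """Illustrative only; simulates the connect delay instead of using a socket."""

    def __init__(self, connect_delay: float) -> None:
        self._connect_delay = connect_delay
        self._connected = threading.Event()
        self.audio_started = False

    def _ws_lifecycle(self) -> None:
        time.sleep(self._connect_delay)  # simulated connect + handshake
        self._connected.set()            # signal that sending is now safe

    def start_recording(self, timeout: float = 15.0) -> bool:
        threading.Thread(target=self._ws_lifecycle, daemon=True).start()
        # Block until the WebSocket thread reports readiness or we time out.
        if not self._connected.wait(timeout):
            return False
        self.audio_started = True  # only now would the audio stream be opened
        return True
```

`Event.wait` returns `False` on timeout, which gives the caller a clean way to report a connection failure without ever opening the audio stream.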

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 12:18:47 -07:00
Developer
a3bcc5bee5 Show transcription start errors in UI, improve error logging
All checks were successful
Tests / Python Backend Tests (push) Successful in 5s
Tests / Frontend Tests (push) Successful in 8s
Tests / Rust Sidecar Tests (push) Successful in 2m5s
Start Transcription button now shows the error message when it fails
instead of silently reverting. Common causes:
- Missing PortAudio library on Linux
- Audio device not accessible
- Deepgram connection failure

Also added error details to backend console output and captured
the last error from the Deepgram engine for better diagnostics.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 12:15:43 -07:00
Developer
b91fe876f9 Stop sidecar process when the app exits
All checks were successful
Tests / Python Backend Tests (push) Successful in 5s
Tests / Frontend Tests (push) Successful in 8s
Tests / Rust Sidecar Tests (push) Successful in 2m6s
The sidecar process was orphaned when the Tauri app closed, leaving
ports 8080/8081 in use. On next launch the new sidecar couldn't bind
those ports and failed to start.

Added RunEvent::Exit handler that stops the sidecar before the app
process terminates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 11:54:53 -07:00
Developer
7e04d6b4af Fix Linux CPU sidecar bundling CUDA, add cleanup workflow
All checks were successful
Tests / Python Backend Tests (push) Successful in 6s
Tests / Frontend Tests (push) Successful in 7s
Tests / Rust Sidecar Tests (push) Successful in 2m9s
Linux CPU sidecar: PyPI's default torch build for Linux includes CUDA
(~800MB). UV_NO_SOURCES only bypasses our custom CUDA index; the
install still pulls CUDA-enabled torch from PyPI. The workflow now
explicitly installs CPU-only torch from pytorch.org/whl/cpu after
sync. Same fix applied to Windows.

New cleanup-releases.yml workflow (manual trigger):
- Configurable: keep N app releases, keep N sidecar releases
- Dry run mode (default) shows what would be deleted without deleting
- Protects v1.4.0 (last pre-Tauri release)
- Shows release sizes in MB
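The keep-N, dry-run, and protected-tag rules could be combined as in the sketch below. It is not the workflow's actual script: the function names are hypothetical, and it assumes the tag list is already sorted newest first.

```python
from typing import Callable, List, Sequence


def plan_cleanup(
    tags_newest_first: Sequence[str],
    keep: int,
    protected: Sequence[str] = ("v1.4.0",),
) -> List[str]:
    """Return the tags that would be deleted, oldest candidates last."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    return [t for t in tags_newest_first[keep:] if t not in protected]


def cleanup(
    tags_newest_first: Sequence[str],
    keep: int,
    delete: Callable[[str], None],
    dry_run: bool = True,
) -> List[str]:
    """Delete old releases, or only report them when dry_run is set."""
    doomed = plan_cleanup(tags_newest_first, keep)
    if not dry_run:
        for tag in doomed:
            delete(tag)
    return doomed
```

Defaulting `dry_run` to `True` mirrors the workflow's stated default, so a destructive run has to be requested explicitly.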

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 11:21:16 -07:00
Developer
15c4e262b9 Document macOS quarantine workaround in README
All checks were successful
Tests / Python Backend Tests (push) Successful in 6s
Tests / Frontend Tests (push) Successful in 7s
Tests / Rust Sidecar Tests (push) Successful in 2m1s
macOS Gatekeeper blocks unsigned apps with a "damaged" error.
Added xattr -cr command to Troubleshooting section.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 11:02:55 -07:00
Gitea Actions
2246723220 chore: bump sidecar version to 1.0.7 [skip ci] 2026-04-08 17:05:36 +00:00
Gitea Actions
1c586738f3 chore: bump version to 2.0.8 [skip ci] 2026-04-08 16:58:00 +00:00
Developer
fb02a24334 Remove CUDA sidecar builds, keep CPU + Cloud only
All checks were successful
Tests / Python Backend Tests (push) Successful in 6s
Tests / Frontend Tests (push) Successful in 8s
Tests / Rust Sidecar Tests (push) Successful in 2m3s
CUDA sidecars are ~2GB and too slow to upload from the Windows runner.
Cloud (Deepgram) provides faster transcription anyway. Removed:

- CUDA build steps from Windows and Linux sidecar workflows
- CUDA option from the SidecarSetup download screen

Remaining sidecar variants:
- Cloud (Deepgram): ~50 MB - recommended for most users
- Local CPU: ~500 MB - for offline/privacy use

CUDA can be revisited once the managed Deepgram service is ready.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 09:49:36 -07:00
Developer
ce64cacc5e Use max compression for sidecar zips to reduce upload size
All checks were successful
Tests / Python Backend Tests (push) Successful in 5s
Tests / Frontend Tests (push) Successful in 7s
Tests / Rust Sidecar Tests (push) Successful in 2m1s
zip -9 on Linux, 7z -mx=9 on Windows. Compression takes longer but
produces smaller files which upload faster over the network.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 09:42:26 -07:00
Gitea Actions
14a7ca3b30 chore: bump sidecar version to 1.0.6 [skip ci] 2026-04-08 16:26:36 +00:00
Gitea Actions
5b7387f9c6 chore: bump version to 2.0.7 [skip ci] 2026-04-08 16:21:51 +00:00
Developer
293362baa1 Cloud sidecar auto-detects variant and guides user to configure
All checks were successful
Tests / Python Backend Tests (push) Successful in 5s
Tests / Frontend Tests (push) Successful in 8s
Tests / Rust Sidecar Tests (push) Successful in 2m7s
On first launch, the cloud sidecar now:
1. Detects it's the cloud variant (DeviceManager import fails)
2. Auto-switches config from "local" to "byok" mode
3. Shows "Setup needed: Open Settings > Remote Transcription >
   enter your Deepgram API key" as a friendly status message
4. Stays in READY state so the UI is fully accessible

The user can then open Settings, enter their Deepgram API key,
save, and start transcribing without needing to know about modes.
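Steps 1 to 3 can be sketched as below. The nested `remote.mode` layout and the status text follow the commit message; the function names and the idea of probing a module by name are assumptions made for illustration.

```python
import copy
import importlib
from typing import Optional, Tuple

SETUP_MESSAGE = (
    "Setup needed: Open Settings > Remote Transcription > "
    "enter your Deepgram API key"
)


def local_engine_available(module_name: str) -> bool:
    """True if the local-engine module can be imported in this build."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


def resolve_startup_config(
    config: dict, local_module: str
) -> Tuple[dict, Optional[str]]:
    """Switch local -> byok when the local engine is not bundled."""
    updated = copy.deepcopy(config)
    remote = updated.setdefault("remote", {})
    if remote.get("mode", "local") == "local" and not local_engine_available(local_module):
        remote["mode"] = "byok"
        return updated, SETUP_MESSAGE
    return updated, None
```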

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 09:17:06 -07:00
Developer
41f50dedec Fix cloud sidecar crash on first launch
All checks were successful
Tests / Python Backend Tests (push) Successful in 5s
Tests / Frontend Tests (push) Successful in 8s
Tests / Rust Sidecar Tests (push) Successful in 3m11s
The cloud sidecar excludes the local Whisper engine module, but on
first launch the config defaults to remote.mode="local" which tries
to import it. Now catches the ImportError gracefully and shows an
error message telling the user to switch to Cloud (Deepgram) mode
in Settings. The API server still starts so Settings is accessible.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 09:12:17 -07:00
Developer
d8b7811153 Fix NaN% in sidecar download progress
All checks were successful
Tests / Python Backend Tests (push) Successful in 6s
Tests / Frontend Tests (push) Successful in 8s
Tests / Rust Sidecar Tests (push) Successful in 2m4s
The Rust backend emits {downloaded, total, phase, message} but the
Svelte component was reading event.payload.progress which doesn't
exist, resulting in NaN. Now calculates percentage from downloaded/total.
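The calculation can be sketched as follows. The real fix is in the Svelte component; this Python version only illustrates the arithmetic, and returning `None` for an unknown total is an assumption about how such a case might be handled.

```python
from typing import Optional


def percent_complete(payload: dict) -> Optional[int]:
    """Derive a 0-100 percentage from a {downloaded, total, ...} payload."""
    downloaded = payload.get("downloaded")
    total = payload.get("total")
    if not isinstance(downloaded, (int, float)) or not isinstance(total, (int, float)):
        return None  # field missing: nothing sensible to display
    if total <= 0:
        return None  # avoid dividing by zero when the size is unknown
    return min(100, round(100 * downloaded / total))
```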

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 09:06:39 -07:00
Developer
ec8922672c Fix Stop Transcription button not updating after click
All checks were successful
Tests / Python Backend Tests (push) Successful in 6s
Tests / Frontend Tests (push) Successful in 8s
Tests / Rust Sidecar Tests (push) Successful in 2m9s
After calling POST /api/stop, the button stayed on "Stop Transcription"
because the state update depended on the WebSocket broadcast which can
be delayed or missed (event loop threading issue).

Fix: poll GET /api/status immediately after start/stop API calls to
update the UI state directly, rather than waiting for the WebSocket.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 07:26:06 -07:00
Gitea Actions
375669f657 chore: bump sidecar version to 1.0.5 [skip ci] 2026-04-08 00:43:01 +00:00
Gitea Actions
c8b11fb0ad chore: bump version to 2.0.6 [skip ci] 2026-04-08 00:37:28 +00:00
Developer
273a926f03 Fix YAML parse error: use block scalar for echo with colons
All checks were successful
Tests / Python Backend Tests (push) Successful in 5s
Tests / Frontend Tests (push) Successful in 7s
Tests / Rust Sidecar Tests (push) Successful in 2m7s
Gitea's YAML parser treats `echo "text: value"` as a mapping when
on a single `run:` line. Using block scalar (`run: |`) avoids this.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 17:21:42 -07:00
Gitea Actions
5bbbc38875 chore: bump version to 2.0.5 [skip ci] 2026-04-08 00:19:25 +00:00
Developer
d50be6654d Fix dispatch failures and disable automatic cleanup
All checks were successful
Tests / Python Backend Tests (push) Successful in 5s
Tests / Frontend Tests (push) Successful in 8s
Tests / Rust Sidecar Tests (push) Successful in 2m8s
1. Quote RELEASE_TAG env vars in all workflow files. Unquoted
   ${{ inputs.tag }} caused YAML parse errors on some Gitea runners,
   making dispatch return HTTP 500 for Linux/macOS.

2. Disable automatic release cleanup in both coordinators. The cleanup
   races with async builds -- it deletes the release before builds
   finish uploading their assets. Clean up old releases manually
   from the Gitea UI instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 17:16:36 -07:00
Developer
68abf49018 Log dispatch error responses for debugging
Some checks failed
Tests / Python Backend Tests (push) Successful in 5s
Tests / Frontend Tests (push) Successful in 7s
Tests / Rust Sidecar Tests (push) Has been cancelled
Show the Gitea API response body when dispatch returns non-204,
to help diagnose why Linux/macOS dispatches return HTTP 500.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 17:14:23 -07:00
Gitea Actions
8cc2a3ec7a chore: bump version to 2.0.4 [skip ci] 2026-04-08 00:09:39 +00:00
Developer
8aa9dfc644 Update Cargo.lock and generated Tauri schemas
All checks were successful
Tests / Python Backend Tests (push) Successful in 5s
Tests / Frontend Tests (push) Successful in 7s
Tests / Rust Sidecar Tests (push) Successful in 2m10s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 17:03:40 -07:00
Developer
3f16aa838d Add ability to change transcription engine from Settings
Some checks failed
Tests / Python Backend Tests (push) Successful in 6s
Tests / Frontend Tests (push) Successful in 8s
Tests / Rust Sidecar Tests (push) Has been cancelled
New features:
- Settings > Transcription Engine > "Change Transcription Engine"
  button stops the sidecar, deletes downloaded files, and reloads
  the app to show the engine selection screen
- Improved SidecarSetup descriptions with detailed explanations
  of each variant and "Recommended" tag on Cloud (Deepgram)
- Cloud option listed first as the recommended choice
- New reset_sidecar Tauri command that cleans up sidecar files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 17:02:31 -07:00
Developer
3d3d7ec3c5 Add cloud-only sidecar variant (~50MB vs 500MB-2GB)
All checks were successful
Tests / Python Backend Tests (push) Successful in 6s
Tests / Frontend Tests (push) Successful in 7s
Tests / Rust Sidecar Tests (push) Successful in 1m59s
Lightweight Deepgram-only sidecar that excludes PyTorch, faster-whisper,
RealtimeSTT, and CUDA. Only includes audio capture + WebSocket streaming
to Deepgram. Requires a Deepgram API key (BYOK or managed mode).

Changes:
- client/models.py: Extracted TranscriptionResult into standalone module
  so deepgram_transcription.py doesn't transitively import torch
- backend/app_controller.py: Made RealtimeTranscriptionEngine and
  DeviceManager imports lazy (only loaded when remote.mode == "local")
- local-transcription-cloud.spec: PyInstaller spec excluding all ML deps
- SidecarSetup.svelte: Added "Cloud Only (Deepgram)" variant option
- build-sidecar-cloud.yml: CI workflow building cloud sidecar for all 3 OS
- sidecar-release.yml: Dispatches cloud build alongside CPU/CUDA builds

Sidecar download options are now:
- Standard (CPU): ~500 MB - local Whisper on any computer
- GPU Accelerated (CUDA): ~2 GB - local Whisper with NVIDIA GPU
- Cloud Only (Deepgram): ~50 MB - requires API key, no local models

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 16:57:43 -07:00
Developer
bb039399fc Add font source/family settings matching v1.4.0 feature set
All checks were successful
Tests / Python Backend Tests (push) Successful in 6s
Tests / Frontend Tests (push) Successful in 8s
Tests / Rust Sidecar Tests (push) Successful in 2m11s
Restored the font configuration that was missing from the Tauri
rewrite. Settings now include:

- Font Source: System Font, Web-Safe, Google Font
- System Font: text input for any installed font family
- Web-Safe: dropdown with 13 universal fonts (Arial, Courier New, etc.)
- Google Font: dropdown with 35 fonts organized by category
  (Sans Serif, Serif, Monospace, Display, Handwriting)
- Font Size: range slider (8-32px)

All font settings are saved to config and applied to the OBS web
display and server sync.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 16:40:52 -07:00
Developer
9dcb14e92c Fix Deepgram streaming latency
All checks were successful
Tests / Python Backend Tests (push) Successful in 5s
Tests / Frontend Tests (push) Successful in 9s
Tests / Rust Sidecar Tests (push) Successful in 2m5s
Three changes to reduce transcription delay:

1. Send loop: queue.get() was blocking the asyncio event loop, stalling
   the receive loop and delaying transcription results. Now uses
   run_in_executor() to avoid blocking the event loop.

2. Block size: reduced from 4096 (~256ms) to 1024 (~64ms) for more
   frequent, smaller audio chunks. Deepgram handles streaming better
   with smaller packets.

3. Added punctuate=true and smart_format=true to Deepgram BYOK
   params for cleaner transcription output.
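Items 1 and 2 can be illustrated together. The quoted durations are consistent with a 16 kHz sample rate, which is inferred from the figures rather than stated in the commit; the `None` sentinel and the `send` callback are likewise assumptions.

```python
import asyncio
import queue
from typing import Awaitable, Callable, Optional


def block_duration_ms(block_size: int, sample_rate_hz: int = 16_000) -> float:
    """Milliseconds of audio held in one block of `block_size` samples."""
    return 1000 * block_size / sample_rate_hz


async def send_loop(
    audio_queue: "queue.Queue[Optional[bytes]]",
    send: Callable[[bytes], Awaitable[None]],
) -> None:
    loop = asyncio.get_running_loop()
    while True:
        # queue.get() blocks, so it runs in a worker thread, not on the loop.
        chunk = await loop.run_in_executor(None, audio_queue.get)
        if chunk is None:  # sentinel (an assumption) that ends the stream
            break
        await send(chunk)
```

With these assumptions, `block_duration_ms(4096)` gives 256.0 and `block_duration_ms(1024)` gives 64.0, matching the figures in item 2.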

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 16:31:50 -07:00
Developer
8db9b8298b Fix dev mode sidecar launch and engine reload on mode change
All checks were successful
Tests / Python Backend Tests (push) Successful in 6s
Tests / Frontend Tests (push) Successful in 7s
Tests / Rust Sidecar Tests (push) Successful in 1m57s
1. Dev mode: use `uv run python` instead of bare `python` to ensure
   the project venv is used. Also use CARGO_MANIFEST_DIR to find the
   project root reliably.

2. Engine reload: changing remote.mode (local/managed/byok) now
   triggers a full engine reload. Previously only model and device
   changes triggered reload, so switching to Deepgram had no effect
   until the app was restarted.
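The reload decision in item 2 might be expressed like this. It is a sketch under assumptions: settings are shown as a flat dict with dotted keys, and `RELOAD_KEYS` and `needs_engine_reload` are hypothetical names.

```python
# Settings whose change requires tearing down and rebuilding the engine.
RELOAD_KEYS = ("model", "device", "remote.mode")


def needs_engine_reload(current: dict, new: dict) -> bool:
    """True if any reload-relevant setting differs between the two dicts."""
    return any(current.get(key) != new.get(key) for key in RELOAD_KEYS)
```

Before the fix, the equivalent of `RELOAD_KEYS` would have lacked `remote.mode`, so a switch from local to Deepgram compared as "no change".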

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 16:25:07 -07:00
Developer
411779f578 Make release and sidecar-release manual-only while testing
All checks were successful
Tests / Python Backend Tests (push) Successful in 4s
Tests / Frontend Tests (push) Successful in 7s
Tests / Rust Sidecar Tests (push) Successful in 1m59s
Removed push triggers from both coordinator workflows. They now
only run via workflow_dispatch (manual "Run workflow" button).
Re-enable push triggers once the build pipeline is stable.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 16:04:06 -07:00
Developer
bc6055a707 Add workflow_dispatch trigger to release.yml
Some checks failed
Tests / Python Backend Tests (push) Successful in 6s
Tests / Frontend Tests (push) Successful in 8s
Tests / Rust Sidecar Tests (push) Has been cancelled
Allows manually triggering app releases from the Gitea Actions UI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 16:03:18 -07:00
41 changed files with 1735 additions and 409 deletions

View File

@@ -13,10 +13,11 @@ jobs:
     runs-on: ubuntu-latest
     env:
       NODE_VERSION: "20"
-      RELEASE_TAG: ${{ inputs.tag }}
+      RELEASE_TAG: "${{ inputs.tag }}"
     steps:
       - name: Show tag
-        run: echo "Building for tag: ${RELEASE_TAG}"
+        run: |
+          echo "Building for tag: ${RELEASE_TAG}"
       - uses: actions/checkout@v4
         with:

View File

@@ -13,10 +13,11 @@ jobs:
     runs-on: macos-latest
     env:
       NODE_VERSION: "20"
-      RELEASE_TAG: ${{ inputs.tag }}
+      RELEASE_TAG: "${{ inputs.tag }}"
     steps:
       - name: Show tag
-        run: echo "Building for tag: ${RELEASE_TAG}"
+        run: |
+          echo "Building for tag: ${RELEASE_TAG}"
       - uses: actions/checkout@v4
         with:
@@ -38,7 +39,27 @@ jobs:
       - name: Install npm dependencies
         run: npm ci
+      - name: Setup code signing
+        env:
+          APPLE_API_KEY: ${{ secrets.APPLE_API_KEY }}
+          APPLE_API_KEY_CONTENT: ${{ secrets.APPLE_API_KEY_CONTENT }}
+        run: |
+          if [ -n "${APPLE_API_KEY_CONTENT}" ]; then
+            echo "Setting up notarization API key..."
+            mkdir -p ~/private_keys
+            echo "${APPLE_API_KEY_CONTENT}" > ~/private_keys/AuthKey_${APPLE_API_KEY}.p8
+          else
+            echo "No signing secrets configured, skipping code signing setup"
+          fi
       - name: Build Tauri app
+        env:
+          APPLE_CERTIFICATE: ${{ secrets.APPLE_CERTIFICATE }}
+          APPLE_CERTIFICATE_PASSWORD: ${{ secrets.APPLE_CERTIFICATE_PASSWORD }}
+          APPLE_SIGNING_IDENTITY: ${{ secrets.APPLE_SIGNING_IDENTITY }}
+          APPLE_API_KEY: ${{ secrets.APPLE_API_KEY }}
+          APPLE_API_ISSUER: ${{ secrets.APPLE_API_ISSUER }}
+          APPLE_API_KEY_PATH: ~/private_keys/AuthKey_${{ secrets.APPLE_API_KEY }}.p8
         run: npm run tauri build
       - name: Upload to release
@@ -90,3 +111,6 @@ jobs:
"${REPO_API}/releases/${RELEASE_ID}/assets?name=${encoded_name}")
           echo "Upload response: HTTP ${HTTP_CODE}"
         done
+      - name: Cleanup signing artifacts
+        run: rm -rf ~/private_keys

View File

@@ -15,7 +15,7 @@ jobs:
     name: Build App (Windows)
     runs-on: windows-latest
     env:
-      RELEASE_TAG: ${{ inputs.tag }}
+      RELEASE_TAG: "${{ inputs.tag }}"
     steps:
       - name: Show tag
         shell: powershell
@@ -46,8 +46,45 @@ jobs:
         shell: powershell
         run: npm ci
+      - name: Setup Azure Artifact Signing
+        shell: powershell
+        env:
+          AZURE_CLIENT_ID: ${{ secrets.AZURE_CLIENT_ID }}
+          AZURE_SIGNING_ENDPOINT: ${{ secrets.AZURE_SIGNING_ENDPOINT }}
+          AZURE_SIGNING_ACCOUNT: ${{ secrets.AZURE_SIGNING_ACCOUNT }}
+          AZURE_CERT_PROFILE: ${{ secrets.AZURE_CERT_PROFILE }}
+        run: |
+          if (-not $env:AZURE_CLIENT_ID) {
+            Write-Host "No Azure signing secrets configured, skipping code signing setup"
+            return
+          }
+          Write-Host "Setting up Azure Artifact Signing..."
+          # Install Artifact Signing client tools
+          nuget install Microsoft.ArtifactSigning.Client -x -OutputDirectory .\signing-tools
+          $dlibPath = (Resolve-Path ".\signing-tools\Microsoft.ArtifactSigning.Client*\bin\x64\Azure.CodeSigning.Dlib.dll").Path
+          # Write metadata.json
+          @{
+            Endpoint = $env:AZURE_SIGNING_ENDPOINT
+            CodeSigningAccountName = $env:AZURE_SIGNING_ACCOUNT
+            CertificateProfileName = $env:AZURE_CERT_PROFILE
+          } | ConvertTo-Json | Out-File -Encoding UTF8 metadata.json
+          $metadataPath = (Resolve-Path "metadata.json").Path
+          # Inject signCommand into tauri.conf.json for this build
+          $conf = Get-Content src-tauri\tauri.conf.json -Raw | ConvertFrom-Json
+          $signCmd = "signtool.exe sign /v /fd SHA256 /tr http://timestamp.acs.microsoft.com /td SHA256 /dlib `"$dlibPath`" /dmdf `"$metadataPath`" %1"
+          $conf.bundle.windows | Add-Member -NotePropertyName "signCommand" -NotePropertyValue $signCmd -Force
+          $conf | ConvertTo-Json -Depth 10 | Set-Content src-tauri\tauri.conf.json -Encoding UTF8
       - name: Build Tauri app
         shell: powershell
+        env:
+          AZURE_CLIENT_ID: ${{ secrets.AZURE_CLIENT_ID }}
+          AZURE_CLIENT_SECRET: ${{ secrets.AZURE_CLIENT_SECRET }}
+          AZURE_TENANT_ID: ${{ secrets.AZURE_TENANT_ID }}
         run: npm run tauri build
       - name: Upload to release

View File

@@ -0,0 +1,229 @@
name: Build Sidecar (Cloud)
on:
workflow_dispatch:
inputs:
tag:
description: 'Sidecar release tag to build (e.g. sidecar-v1.0.5)'
required: true
jobs:
build-cloud-linux:
name: Build Cloud Sidecar (Linux)
runs-on: ubuntu-latest
env:
PYTHON_VERSION: "3.11"
RELEASE_TAG: "${{ inputs.tag }}"
steps:
- name: Show tag
run: |
echo "Building cloud sidecar for tag ${RELEASE_TAG}"
- uses: actions/checkout@v4
with:
ref: ${{ inputs.tag }}
- name: Install uv
run: |
curl -LsSf https://astral.sh/uv/install.sh | sh
echo "$HOME/.local/bin" >> $GITHUB_PATH
- name: Set up Python
run: uv python install ${{ env.PYTHON_VERSION }}
- name: Install system dependencies
run: |
sudo apt-get update
sudo apt-get install -y portaudio19-dev
- name: Build cloud sidecar
env:
UV_NO_SOURCES: "1"
run: |
uv venv
uv pip install pyinstaller numpy sounddevice fastapi uvicorn websockets pydantic requests pyyaml packaging
.venv/bin/pyinstaller local-transcription-cloud.spec
- name: Package
run: |
cd dist/local-transcription-backend && zip -r ../../sidecar-linux-x86_64-cloud.zip .
- name: Upload to release
env:
BUILD_TOKEN: ${{ secrets.BUILD_TOKEN }}
run: |
sudo apt-get install -y jq
REPO_API="${GITHUB_SERVER_URL}/api/v1/repos/${GITHUB_REPOSITORY}"
TAG="${RELEASE_TAG}"
for i in $(seq 1 30); do
RELEASE_ID=$(curl -s -H "Authorization: token ${BUILD_TOKEN}" \
"${REPO_API}/releases/tags/${TAG}" | jq -r '.id // empty')
if [ -n "${RELEASE_ID}" ] && [ "${RELEASE_ID}" != "null" ]; then
echo "Found release ${TAG} (ID: ${RELEASE_ID})"
break
fi
echo "Attempt ${i}/30: waiting for release..."
sleep 10
done
if [ -z "${RELEASE_ID}" ] || [ "${RELEASE_ID}" = "null" ]; then
echo "ERROR: Release not found"; exit 1
fi
for file in sidecar-*-cloud.zip; do
filename=$(basename "$file")
ASSET_ID=$(curl -s -H "Authorization: token ${BUILD_TOKEN}" \
"${REPO_API}/releases/${RELEASE_ID}/assets" | jq -r ".[] | select(.name == \"${filename}\") | .id // empty")
[ -n "${ASSET_ID}" ] && curl -s -X DELETE -H "Authorization: token ${BUILD_TOKEN}" "${REPO_API}/releases/${RELEASE_ID}/assets/${ASSET_ID}"
curl -s -o /dev/null -w "Upload ${filename}: HTTP %{http_code}\n" -X POST \
-H "Authorization: token ${BUILD_TOKEN}" -H "Content-Type: application/octet-stream" \
-T "$file" "${REPO_API}/releases/${RELEASE_ID}/assets?name=${filename}"
done
build-cloud-windows:
name: Build Cloud Sidecar (Windows)
runs-on: windows-latest
env:
PYTHON_VERSION: "3.11"
RELEASE_TAG: "${{ inputs.tag }}"
steps:
- name: Show tag
shell: powershell
run: Write-Host "Building cloud sidecar for tag $env:RELEASE_TAG"
- uses: actions/checkout@v4
with:
ref: ${{ inputs.tag }}
- name: Install uv
shell: powershell
run: |
if (Get-Command uv -ErrorAction SilentlyContinue) {
Write-Host "uv already installed"
} else {
irm https://astral.sh/uv/install.ps1 | iex
$uvPaths = @("$env:USERPROFILE\.local\bin", "$env:USERPROFILE\.cargo\bin", "$env:LOCALAPPDATA\uv\bin")
foreach ($p in $uvPaths) { if (Test-Path $p) { echo $p | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append } }
}
- name: Set up Python
shell: powershell
run: uv python install ${{ env.PYTHON_VERSION }}
- name: Build cloud sidecar
shell: powershell
env:
UV_NO_SOURCES: "1"
run: |
uv venv
uv pip install pyinstaller numpy sounddevice fastapi uvicorn websockets pydantic requests pyyaml packaging
.venv\Scripts\pyinstaller.exe local-transcription-cloud.spec
- name: Package
shell: powershell
run: |
if (-not (Get-Command 7z -ErrorAction SilentlyContinue)) { choco install 7zip -y }
7z a -tzip -mx=5 sidecar-windows-x86_64-cloud.zip .\dist\local-transcription-backend\*
- name: Upload to release
shell: powershell
env:
BUILD_TOKEN: ${{ secrets.BUILD_TOKEN }}
run: |
$REPO_API = "${{ github.server_url }}/api/v1/repos/${{ github.repository }}"
$Headers = @{ "Authorization" = "token $env:BUILD_TOKEN" }
$TAG = $env:RELEASE_TAG
$RELEASE_ID = $null
for ($i = 1; $i -le 30; $i++) {
try {
$release = Invoke-RestMethod -Uri "$REPO_API/releases/tags/$TAG" -Headers $Headers -ErrorAction Stop
$RELEASE_ID = $release.id
if ($RELEASE_ID) { Write-Host "Found release $TAG (ID: $RELEASE_ID)"; break }
} catch {}
Write-Host "Attempt ${i}/30: waiting..."; Start-Sleep -Seconds 10
}
if (-not $RELEASE_ID) { Write-Host "ERROR: Release not found"; exit 1 }
Get-ChildItem -Path . -Filter "sidecar-*-cloud.zip" | ForEach-Object {
$fn = $_.Name; $enc = [System.Uri]::EscapeDataString($fn)
try {
$assets = Invoke-RestMethod -Uri "$REPO_API/releases/$RELEASE_ID/assets" -Headers $Headers
$existing = $assets | Where-Object { $_.name -eq $fn }
if ($existing) { Invoke-RestMethod -Uri "$REPO_API/releases/$RELEASE_ID/assets/$($existing.id)" -Method Delete -Headers $Headers }
} catch {}
curl.exe --fail -s -X POST -H "Authorization: token $env:BUILD_TOKEN" -H "Content-Type: application/octet-stream" -T "$($_.FullName)" "$REPO_API/releases/$RELEASE_ID/assets?name=$enc"
Write-Host "Uploaded $fn"
}
build-cloud-macos:
name: Build Cloud Sidecar (macOS)
runs-on: macos-latest
env:
PYTHON_VERSION: "3.11"
RELEASE_TAG: "${{ inputs.tag }}"
steps:
- name: Show tag
run: |
echo "Building cloud sidecar for tag ${RELEASE_TAG}"
- uses: actions/checkout@v4
with:
ref: ${{ inputs.tag }}
- name: Install uv
run: |
curl -LsSf https://astral.sh/uv/install.sh | sh
echo "$HOME/.local/bin" >> $GITHUB_PATH
- name: Set up Python
run: uv python install ${{ env.PYTHON_VERSION }}
- name: Install system dependencies
run: brew install portaudio
- name: Build cloud sidecar
env:
UV_NO_SOURCES: "1"
run: |
uv venv
uv pip install pyinstaller numpy sounddevice fastapi uvicorn websockets pydantic requests pyyaml packaging
.venv/bin/pyinstaller local-transcription-cloud.spec
- name: Package
run: |
cd dist/local-transcription-backend && zip -r ../../sidecar-macos-aarch64-cloud.zip .
- name: Upload to release
env:
BUILD_TOKEN: ${{ secrets.BUILD_TOKEN }}
run: |
which jq || brew install jq
REPO_API="${GITHUB_SERVER_URL}/api/v1/repos/${GITHUB_REPOSITORY}"
TAG="${RELEASE_TAG}"
for i in $(seq 1 30); do
RELEASE_ID=$(curl -s -H "Authorization: token ${BUILD_TOKEN}" \
"${REPO_API}/releases/tags/${TAG}" | jq -r '.id // empty')
if [ -n "${RELEASE_ID}" ] && [ "${RELEASE_ID}" != "null" ]; then
echo "Found release ${TAG} (ID: ${RELEASE_ID})"
break
fi
echo "Attempt ${i}/30: waiting for release..."
sleep 10
done
if [ -z "${RELEASE_ID}" ] || [ "${RELEASE_ID}" = "null" ]; then
echo "ERROR: Release not found"; exit 1
fi
for file in sidecar-*-cloud.zip; do
filename=$(basename "$file")
ASSET_ID=$(curl -s -H "Authorization: token ${BUILD_TOKEN}" \
"${REPO_API}/releases/${RELEASE_ID}/assets" | jq -r ".[] | select(.name == \"${filename}\") | .id // empty")
[ -n "${ASSET_ID}" ] && curl -s -X DELETE -H "Authorization: token ${BUILD_TOKEN}" "${REPO_API}/releases/${RELEASE_ID}/assets/${ASSET_ID}"
curl -s -o /dev/null -w "Upload ${filename}: HTTP %{http_code}\n" -X POST \
-H "Authorization: token ${BUILD_TOKEN}" -H "Content-Type: application/octet-stream" \
-T "$file" "${REPO_API}/releases/${RELEASE_ID}/assets?name=${filename}"
done
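
The three upload steps in this workflow share one pattern: poll the releases API up to 30 times, 10 seconds apart, and fail if the release never appears. A minimal sketch of that pattern as a reusable Bash function follows. The name `retry` is hypothetical and does not appear in the workflow.

```bash
# Hypothetical helper, not part of the workflow: run a command until it
# succeeds, at most ATTEMPTS times, sleeping DELAY seconds between tries.
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # Skip the sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}
```

A call such as `retry 30 10 release_exists "$TAG"` would match the workflow's timing, where `release_exists` is an assumed wrapper around the `curl ... | jq` lookup shown in the upload steps.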


@@ -13,10 +13,11 @@ jobs:
runs-on: ubuntu-latest
env:
PYTHON_VERSION: "3.11"
RELEASE_TAG: ${{ inputs.tag }}
RELEASE_TAG: "${{ inputs.tag }}"
steps:
- name: Show tag
run: echo "Building for tag: ${RELEASE_TAG}"
run: |
echo "Building for tag: ${RELEASE_TAG}"
- uses: actions/checkout@v4
with:
@@ -39,26 +40,17 @@ jobs:
sudo apt-get update
sudo apt-get install -y portaudio19-dev
- name: Build sidecar (CUDA)
run: |
uv sync --frozen || uv sync
uv run pyinstaller local-transcription-headless.spec
- name: Package sidecar (CUDA)
run: |
cd dist/local-transcription-backend && zip -r ../../sidecar-linux-x86_64-cuda.zip .
- name: Build sidecar (CPU)
run: |
rm -rf dist/local-transcription-backend build/
uv sync --no-sources
# PyPI's default torch on Linux includes CUDA (~800MB).
# Replace with CPU-only torch from the dedicated index.
uv pip install torch torchaudio --index-url https://download.pytorch.org/whl/cpu --force-reinstall
# Run pyinstaller directly from venv to prevent uv run from
# re-resolving torch back to the CUDA version via pyproject.toml sources
.venv/bin/pyinstaller local-transcription-headless.spec
- name: Package sidecar (CPU)
run: |
cd dist/local-transcription-backend && zip -r ../../sidecar-linux-x86_64-cpu.zip .
cd dist/local-transcription-backend && zip -9 -r ../../sidecar-linux-x86_64-cpu.zip .
- name: Upload to sidecar release
env:


@@ -13,10 +13,11 @@ jobs:
runs-on: macos-latest
env:
PYTHON_VERSION: "3.11"
RELEASE_TAG: ${{ inputs.tag }}
RELEASE_TAG: "${{ inputs.tag }}"
steps:
- name: Show tag
run: echo "Building for tag: ${RELEASE_TAG}"
run: |
echo "Building for tag: ${RELEASE_TAG}"
- uses: actions/checkout@v4
with:


@@ -13,7 +13,7 @@ jobs:
runs-on: windows-latest
env:
PYTHON_VERSION: "3.11"
RELEASE_TAG: ${{ inputs.tag }}
RELEASE_TAG: "${{ inputs.tag }}"
steps:
- name: Show tag
shell: powershell
@@ -54,29 +54,19 @@ jobs:
choco install 7zip -y
}
- name: Build sidecar (CUDA)
shell: powershell
run: |
uv sync --frozen
if ($LASTEXITCODE -ne 0) { uv sync }
uv run pyinstaller local-transcription-headless.spec
- name: Package sidecar (CUDA)
shell: powershell
run: |
7z a -tzip -mx=5 sidecar-windows-x86_64-cuda.zip .\dist\local-transcription-backend\*
- name: Build sidecar (CPU)
shell: powershell
run: |
Remove-Item -Recurse -Force dist\local-transcription-backend, build -ErrorAction SilentlyContinue
$env:UV_NO_SOURCES = "1"
uv sync
# PyPI's default torch includes CUDA. Replace with CPU-only.
uv pip install torch torchaudio --index-url https://download.pytorch.org/whl/cpu --force-reinstall
.venv\Scripts\pyinstaller.exe local-transcription-headless.spec
- name: Package sidecar (CPU)
shell: powershell
run: |
7z a -tzip -mx=5 sidecar-windows-x86_64-cpu.zip .\dist\local-transcription-backend\*
7z a -tzip -mx=9 sidecar-windows-x86_64-cpu.zip .\dist\local-transcription-backend\*
- name: Upload to sidecar release
shell: powershell


@@ -0,0 +1,102 @@
name: Cleanup Old Releases
on:
workflow_dispatch:
inputs:
keep_app_releases:
description: 'Number of app releases to keep'
required: false
default: '3'
keep_sidecar_releases:
description: 'Number of sidecar releases to keep'
required: false
default: '2'
dry_run:
description: 'Dry run (show what would be deleted without deleting)'
required: false
default: 'true'
jobs:
cleanup:
name: Cleanup Old Releases
runs-on: ubuntu-latest
steps:
- name: Cleanup releases
env:
BUILD_TOKEN: ${{ secrets.BUILD_TOKEN }}
run: |
REPO_API="${GITHUB_SERVER_URL}/api/v1/repos/${GITHUB_REPOSITORY}"
KEEP_APP="${{ inputs.keep_app_releases }}"
KEEP_SIDECAR="${{ inputs.keep_sidecar_releases }}"
DRY_RUN="${{ inputs.dry_run }}"
echo "=== Cleanup Configuration ==="
echo "Keep app releases: ${KEEP_APP}"
echo "Keep sidecar releases: ${KEEP_SIDECAR}"
echo "Dry run: ${DRY_RUN}"
echo ""
# Fetch all releases
ALL_RELEASES=$(curl -s -H "Authorization: token ${BUILD_TOKEN}" \
"${REPO_API}/releases?limit=50")
# ── App releases (v* tags, not sidecar-v*) ──
echo "=== App Releases ==="
APP_RELEASES=$(echo "$ALL_RELEASES" | jq -c '[.[] | select(.tag_name | startswith("v")) | select(.tag_name | startswith("sidecar") | not)]')
APP_TOTAL=$(echo "$APP_RELEASES" | jq 'length')
echo "Found ${APP_TOTAL} app releases, keeping ${KEEP_APP}"
if [ "$APP_TOTAL" -gt "$KEEP_APP" ]; then
echo "$APP_RELEASES" | jq -c ".[$KEEP_APP:][]" | while read -r release; do
ID=$(echo "$release" | jq -r '.id')
TAG=$(echo "$release" | jq -r '.tag_name')
SIZE=$(echo "$release" | jq '[.assets[]?.size // 0] | add // 0')
SIZE_MB=$(echo "scale=1; $SIZE / 1048576" | bc 2>/dev/null || echo "?")
# Protect v1.4.0 (last pre-Tauri release)
if [ "$TAG" = "v1.4.0" ]; then
echo " PROTECT ${TAG} (${SIZE_MB} MB)"
continue
fi
if [ "$DRY_RUN" = "true" ]; then
echo " WOULD DELETE ${TAG} (ID: ${ID}, ${SIZE_MB} MB)"
else
echo " DELETING ${TAG} (ID: ${ID}, ${SIZE_MB} MB)..."
curl -s -X DELETE -H "Authorization: token ${BUILD_TOKEN}" \
"${REPO_API}/releases/${ID}"
fi
done
else
echo " Nothing to clean up"
fi
echo ""
# ── Sidecar releases (sidecar-v* tags) ──
echo "=== Sidecar Releases ==="
SIDECAR_RELEASES=$(echo "$ALL_RELEASES" | jq -c '[.[] | select(.tag_name | startswith("sidecar-v"))]')
SIDECAR_TOTAL=$(echo "$SIDECAR_RELEASES" | jq 'length')
echo "Found ${SIDECAR_TOTAL} sidecar releases, keeping ${KEEP_SIDECAR}"
if [ "$SIDECAR_TOTAL" -gt "$KEEP_SIDECAR" ]; then
echo "$SIDECAR_RELEASES" | jq -c ".[$KEEP_SIDECAR:][]" | while read -r release; do
ID=$(echo "$release" | jq -r '.id')
TAG=$(echo "$release" | jq -r '.tag_name')
SIZE=$(echo "$release" | jq '[.assets[]?.size // 0] | add // 0')
SIZE_MB=$(echo "scale=1; $SIZE / 1048576" | bc 2>/dev/null || echo "?")
if [ "$DRY_RUN" = "true" ]; then
echo " WOULD DELETE ${TAG} (ID: ${ID}, ${SIZE_MB} MB)"
else
echo " DELETING ${TAG} (ID: ${ID}, ${SIZE_MB} MB)..."
curl -s -X DELETE -H "Authorization: token ${BUILD_TOKEN}" \
"${REPO_API}/releases/${ID}"
fi
done
else
echo " Nothing to clean up"
fi
echo ""
echo "=== Done ==="
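
The retention rule in this workflow is: skip the first `KEEP` releases, then delete the rest unless the tag is protected. The sketch below reproduces that rule on a plain list of tags, without `jq` or network calls, so the selection can be checked in isolation. It assumes the tags are passed newest first, which the workflow also relies on when it slices `.[$KEEP_APP:]`. The function name `tags_to_delete` is hypothetical.

```bash
# Hypothetical helper, not part of the workflow.
# Usage: tags_to_delete KEEP PROTECTED_TAG TAG...
# Prints every tag after the first KEEP entries, except PROTECTED_TAG.
# Tags are assumed to be ordered newest first.
tags_to_delete() {
  local keep="$1" protect="$2"
  shift 2
  local index=0 tag
  for tag in "$@"; do
    index=$((index + 1))
    if [ "$index" -le "$keep" ]; then
      continue            # one of the newest KEEP releases
    fi
    if [ "$tag" = "$protect" ]; then
      continue            # protected release, never deleted
    fi
    printf '%s\n' "$tag"
  done
}
```

Note that a protected tag still occupies a position in the list, so it does not increase the number of other releases that are kept.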


@@ -1,14 +1,7 @@
name: Release
on:
push:
branches: [main]
paths:
- 'src/**'
- 'src-tauri/**'
- 'package.json'
- 'vite.config.ts'
- 'index.html'
workflow_dispatch:
jobs:
test:
@@ -43,9 +36,6 @@ jobs:
name: Bump version and tag
needs: test
runs-on: ubuntu-latest
outputs:
new_version: ${{ steps.bump.outputs.new_version }}
tag: ${{ steps.bump.outputs.tag }}
steps:
- uses: actions/checkout@v4
with:
@@ -57,7 +47,6 @@ jobs:
git config user.email "actions@gitea.local"
- name: Bump patch version
id: bump
run: |
CURRENT=$(grep '"version"' package.json | head -1 | sed 's/.*"version": *"\([^"]*\)".*/\1/')
echo "Current version: ${CURRENT}"
@@ -75,35 +64,34 @@ jobs:
sed -i "s/__version__ = \"${CURRENT}\"/__version__ = \"${NEW_VERSION}\"/" version.py
sed -i "s/__version_info__ = .*/__version_info__ = (${MAJOR}, ${MINOR}, ${NEW_PATCH})/" version.py
echo "new_version=${NEW_VERSION}" >> $GITHUB_OUTPUT
echo "tag=v${NEW_VERSION}" >> $GITHUB_OUTPUT
# Write to env file instead of step outputs (avoids act runner bug)
echo "NEW_VERSION=${NEW_VERSION}" >> $GITHUB_ENV
echo "RELEASE_TAG=v${NEW_VERSION}" >> $GITHUB_ENV
- name: Commit and tag
env:
BUILD_TOKEN: ${{ secrets.BUILD_TOKEN }}
run: |
NEW_VERSION="${{ steps.bump.outputs.new_version }}"
git add package.json src-tauri/tauri.conf.json src-tauri/Cargo.toml version.py
git commit -m "chore: bump version to ${NEW_VERSION} [skip ci]"
git tag "v${NEW_VERSION}"
git tag "${RELEASE_TAG}"
REMOTE_URL=$(git remote get-url origin | sed "s|://|://gitea-actions:${BUILD_TOKEN}@|")
git pull --rebase "${REMOTE_URL}" main || true
git push "${REMOTE_URL}" HEAD:main
git push "${REMOTE_URL}" "v${NEW_VERSION}"
git push "${REMOTE_URL}" "${RELEASE_TAG}"
- name: Create Gitea release
env:
BUILD_TOKEN: ${{ secrets.BUILD_TOKEN }}
run: |
REPO_API="${GITHUB_SERVER_URL}/api/v1/repos/${GITHUB_REPOSITORY}"
TAG="${{ steps.bump.outputs.tag }}"
RELEASE_NAME="Local Transcription ${TAG}"
RELEASE_NAME="Local Transcription ${RELEASE_TAG}"
curl -s -X POST \
-H "Authorization: token ${BUILD_TOKEN}" \
-H "Content-Type: application/json" \
-d "{\"tag_name\": \"${TAG}\", \"name\": \"${RELEASE_NAME}\", \"body\": \"Automated build.\", \"draft\": false, \"prerelease\": false}" \
-d "{\"tag_name\": \"${RELEASE_TAG}\", \"name\": \"${RELEASE_NAME}\", \"body\": \"Automated build.\", \"draft\": false, \"prerelease\": false}" \
"${REPO_API}/releases"
echo "Created release: ${RELEASE_NAME}"
@@ -112,54 +100,14 @@ jobs:
BUILD_TOKEN: ${{ secrets.BUILD_TOKEN }}
run: |
REPO_API="${GITHUB_SERVER_URL}/api/v1/repos/${GITHUB_REPOSITORY}"
TAG="${{ steps.bump.outputs.tag }}"
for workflow in build-app-linux.yml build-app-windows.yml build-app-macos.yml; do
echo "Dispatching ${workflow} for ${TAG}..."
HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" -X POST \
echo "Dispatching ${workflow} for ${RELEASE_TAG}..."
HTTP_CODE=$(curl -s -w "%{http_code}" -o /tmp/dispatch_resp.txt -X POST \
-H "Authorization: token ${BUILD_TOKEN}" \
-H "Content-Type: application/json" \
-d "{\"ref\": \"main\", \"inputs\": {\"tag\": \"${TAG}\"}}" \
-d "{\"ref\": \"main\", \"inputs\": {\"tag\": \"${RELEASE_TAG}\"}}" \
"${REPO_API}/actions/workflows/${workflow}/dispatches")
echo " -> HTTP ${HTTP_CODE}"
if [ "$HTTP_CODE" != "204" ]; then cat /tmp/dispatch_resp.txt; echo ""; fi
done
- name: Clean up old app releases
env:
BUILD_TOKEN: ${{ secrets.BUILD_TOKEN }}
run: |
REPO_API="${GITHUB_SERVER_URL}/api/v1/repos/${GITHUB_REPOSITORY}"
KEEP=3
PROTECT_TAG="v1.4.0"
echo "Cleaning up old app releases (keeping latest ${KEEP} + ${PROTECT_TAG})..."
# Get all app releases (v* tags, not sidecar-v*)
RELEASES=$(curl -s -H "Authorization: token ${BUILD_TOKEN}" \
"${REPO_API}/releases?limit=50" | jq -c '[.[] | select(.tag_name | startswith("v")) | select(.tag_name | startswith("sidecar") | not)]')
TOTAL=$(echo "$RELEASES" | jq 'length')
echo "Found ${TOTAL} app releases"
if [ "$TOTAL" -le "$KEEP" ]; then
echo "Nothing to clean up"
exit 0
fi
# Skip the newest KEEP releases, delete the rest (except protected)
echo "$RELEASES" | jq -c ".[$KEEP:][]" | while read -r release; do
ID=$(echo "$release" | jq -r '.id')
TAG=$(echo "$release" | jq -r '.tag_name')
if [ "$TAG" = "$PROTECT_TAG" ]; then
echo " Protecting ${TAG}"
continue
fi
echo " Deleting release ${TAG} (ID: ${ID})..."
curl -s -X DELETE -H "Authorization: token ${BUILD_TOKEN}" \
"${REPO_API}/releases/${ID}"
# Keep the git tag -- only delete the release (assets).
# Deleting tags breaks builds that haven't checked out yet.
done
echo "Cleanup complete"
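
The bump step in this workflow derives `NEW_VERSION` from the current version by incrementing the patch component; the lines that do the arithmetic fall outside the hunks shown here. A minimal sketch of that calculation, assuming a plain `MAJOR.MINOR.PATCH` version with no pre-release suffix, could look like this. The function name `bump_patch` is hypothetical.

```bash
# Hypothetical helper, not part of the workflow.
# Splits MAJOR.MINOR.PATCH on dots and prints the version with PATCH + 1.
bump_patch() {
  local major minor patch
  IFS=. read -r major minor patch <<< "$1"
  # 10# forces base 10 so a patch such as "08" is not parsed as octal.
  printf '%s.%s.%s\n' "$major" "$minor" "$((10#$patch + 1))"
}
```

With this helper, the tag written to `GITHUB_ENV` would be `v$(bump_patch "$CURRENT")`.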


@@ -1,14 +1,6 @@
name: Sidecar Release
on:
push:
branches: [main]
paths:
- 'client/**'
- 'server/**'
- 'backend/**'
- 'pyproject.toml'
- 'local-transcription-headless.spec'
workflow_dispatch:
jobs:
@@ -35,40 +27,17 @@ jobs:
needs: test
if: "!contains(github.event.head_commit.message, '[skip ci]')"
runs-on: ubuntu-latest
outputs:
version: ${{ steps.bump.outputs.version }}
tag: ${{ steps.bump.outputs.tag }}
has_changes: ${{ steps.check_changes.outputs.has_changes }}
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 2
- name: Check for backend changes
id: check_changes
run: |
if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
echo "has_changes=true" >> $GITHUB_OUTPUT
exit 0
fi
CHANGED=$(git diff --name-only HEAD~1 HEAD -- client/ server/ backend/ pyproject.toml local-transcription-headless.spec 2>/dev/null || echo "")
if [ -n "$CHANGED" ]; then
echo "has_changes=true" >> $GITHUB_OUTPUT
echo "Backend changes detected: $CHANGED"
else
echo "has_changes=false" >> $GITHUB_OUTPUT
echo "No backend changes detected, skipping sidecar build"
fi
- name: Configure git
if: steps.check_changes.outputs.has_changes == 'true'
run: |
git config user.name "Gitea Actions"
git config user.email "actions@gitea.local"
- name: Bump sidecar patch version
if: steps.check_changes.outputs.has_changes == 'true'
id: bump
run: |
CURRENT=$(grep '^version = ' pyproject.toml | head -1 | sed 's/version = "\(.*\)"/\1/')
echo "Current sidecar version: ${CURRENT}"
@@ -82,91 +51,52 @@ jobs:
sed -i "s/^version = \"${CURRENT}\"/version = \"${NEW_VERSION}\"/" pyproject.toml
echo "version=${NEW_VERSION}" >> $GITHUB_OUTPUT
echo "tag=sidecar-v${NEW_VERSION}" >> $GITHUB_OUTPUT
# Write to env file instead of step outputs (avoids act runner bug)
echo "NEW_VERSION=${NEW_VERSION}" >> $GITHUB_ENV
echo "RELEASE_TAG=sidecar-v${NEW_VERSION}" >> $GITHUB_ENV
- name: Commit and tag
if: steps.check_changes.outputs.has_changes == 'true'
env:
BUILD_TOKEN: ${{ secrets.BUILD_TOKEN }}
run: |
NEW_VERSION="${{ steps.bump.outputs.version }}"
TAG="${{ steps.bump.outputs.tag }}"
git add pyproject.toml
git commit -m "chore: bump sidecar version to ${NEW_VERSION} [skip ci]"
git tag "${TAG}"
git tag "${RELEASE_TAG}"
REMOTE_URL=$(git remote get-url origin | sed "s|://|://gitea-actions:${BUILD_TOKEN}@|")
git pull --rebase "${REMOTE_URL}" main || true
git push "${REMOTE_URL}" HEAD:main
git push "${REMOTE_URL}" "${TAG}"
git push "${REMOTE_URL}" "${RELEASE_TAG}"
- name: Create Gitea release
if: steps.check_changes.outputs.has_changes == 'true'
env:
BUILD_TOKEN: ${{ secrets.BUILD_TOKEN }}
run: |
REPO_API="${GITHUB_SERVER_URL}/api/v1/repos/${GITHUB_REPOSITORY}"
TAG="${{ steps.bump.outputs.tag }}"
VERSION="${{ steps.bump.outputs.version }}"
RELEASE_NAME="Sidecar v${VERSION}"
RELEASE_NAME="Sidecar v${NEW_VERSION}"
curl -s -X POST \
-H "Authorization: token ${BUILD_TOKEN}" \
-H "Content-Type: application/json" \
-d "{\"tag_name\": \"${TAG}\", \"name\": \"${RELEASE_NAME}\", \"body\": \"Automated sidecar build.\", \"draft\": false, \"prerelease\": false}" \
-d "{\"tag_name\": \"${RELEASE_TAG}\", \"name\": \"${RELEASE_NAME}\", \"body\": \"Automated sidecar build.\", \"draft\": false, \"prerelease\": false}" \
"${REPO_API}/releases"
echo "Created release: ${RELEASE_NAME}"
- name: Trigger per-OS sidecar builds
if: steps.check_changes.outputs.has_changes == 'true'
env:
BUILD_TOKEN: ${{ secrets.BUILD_TOKEN }}
run: |
REPO_API="${GITHUB_SERVER_URL}/api/v1/repos/${GITHUB_REPOSITORY}"
TAG="${{ steps.bump.outputs.tag }}"
for workflow in build-sidecar-linux.yml build-sidecar-windows.yml build-sidecar-macos.yml; do
echo "Dispatching ${workflow} for ${TAG}..."
for workflow in build-sidecar-linux.yml build-sidecar-windows.yml build-sidecar-macos.yml build-sidecar-cloud.yml; do
echo "Dispatching ${workflow} for ${RELEASE_TAG}..."
HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" -X POST \
-H "Authorization: token ${BUILD_TOKEN}" \
-H "Content-Type: application/json" \
-d "{\"ref\": \"main\", \"inputs\": {\"tag\": \"${TAG}\"}}" \
-d "{\"ref\": \"main\", \"inputs\": {\"tag\": \"${RELEASE_TAG}\"}}" \
"${REPO_API}/actions/workflows/${workflow}/dispatches")
echo " -> HTTP ${HTTP_CODE}"
done
- name: Clean up old sidecar releases
if: steps.check_changes.outputs.has_changes == 'true'
env:
BUILD_TOKEN: ${{ secrets.BUILD_TOKEN }}
run: |
REPO_API="${GITHUB_SERVER_URL}/api/v1/repos/${GITHUB_REPOSITORY}"
KEEP=2
echo "Cleaning up old sidecar releases (keeping latest ${KEEP})..."
# Get all sidecar releases (sidecar-v* tags)
RELEASES=$(curl -s -H "Authorization: token ${BUILD_TOKEN}" \
"${REPO_API}/releases?limit=50" | jq -c '[.[] | select(.tag_name | startswith("sidecar-v"))]')
TOTAL=$(echo "$RELEASES" | jq 'length')
echo "Found ${TOTAL} sidecar releases"
if [ "$TOTAL" -le "$KEEP" ]; then
echo "Nothing to clean up"
exit 0
fi
# Skip the newest KEEP releases, delete the rest
echo "$RELEASES" | jq -c ".[$KEEP:][]" | while read -r release; do
ID=$(echo "$release" | jq -r '.id')
TAG=$(echo "$release" | jq -r '.tag_name')
echo " Deleting sidecar release ${TAG} (ID: ${ID})..."
curl -s -X DELETE -H "Authorization: token ${BUILD_TOKEN}" \
"${REPO_API}/releases/${ID}"
# Keep the git tag -- only delete the release (assets).
# Deleting tags breaks builds that haven't checked out yet.
done
echo "Cleanup complete"
# NOTE: Automatic cleanup disabled -- it races with async builds.
# Clean up old releases manually from the Gitea UI when needed.


@@ -11,9 +11,11 @@ Local Transcription is a cross-platform desktop application for real-time speech
**Key Features:**
- Cross-platform desktop app (Windows, macOS, Linux) via Tauri v2 + Svelte 5
- Headless Python backend with FastAPI control API
- Dual transcription modes: local Whisper or cloud Deepgram (managed/BYOK)
- Cloud-first: defaults to Deepgram (BYOK) transcription; local Whisper also supported
- Settings UI hides local-only options (model, VAD, timing) when in cloud mode
- Start button gated on API key / login — shows guidance if not configured
- Shared Captions: create rooms, share via codes, join with one click (hosted at caption.shadowdao.com)
- Built-in web server for OBS browser source at `http://localhost:8080`
- Optional multi-user sync via Node.js server
- CUDA, MPS (Apple Silicon), and CPU support
- Auto-updates, custom fonts, configurable colors
@@ -273,9 +275,29 @@ All per-OS build workflows can be re-run independently via `workflow_dispatch` w
- `Info.plist` must include `NSMicrophoneUsageDescription` for mic access
- No CUDA builds — CPU/MPS only
## Code Signing
Code signing is configured for Windows and macOS to eliminate install warnings (SmartScreen / Gatekeeper). See [SIGNING.md](SIGNING.md) for full setup details.
**Status (as of 2026-04-10):** CI workflow changes are committed. Waiting on identity verification for both platforms before secrets can be configured.
**How it works:**
- macOS: Tauri auto-signs when `APPLE_CERTIFICATE` and related env vars are set in CI. Notarization uses an App Store Connect API key.
- Windows: Azure Artifact Signing via `signtool.exe` + dlib. CI workflow injects `signCommand` into `tauri.conf.json` at build time when `AZURE_CLIENT_ID` is set.
- Both are no-ops when secrets aren't configured — unsigned builds work as before.
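
The "no-op when secrets are missing" behavior can be sketched as a small Bash check. This is an illustration only: the function name `signing_mode` and its output strings are hypothetical, and the real workflows may test additional variables beyond the two named above.

```bash
# Hypothetical helper, not part of the repository.
# Reports which signing path a build for the given platform would take,
# based only on the variables named in the list above.
signing_mode() {
  case "$1" in
    windows)
      if [ -n "${AZURE_CLIENT_ID:-}" ]; then echo "azure"; else echo "unsigned"; fi
      ;;
    macos)
      if [ -n "${APPLE_CERTIFICATE:-}" ]; then echo "apple"; else echo "unsigned"; fi
      ;;
    *)
      # Linux builds are not signed in this setup.
      echo "unsigned"
      ;;
  esac
}
```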
**Key files:**
- `src-tauri/Entitlements.plist` — macOS hardened runtime entitlements (mic, network)
- `src-tauri/Info.plist` — macOS microphone usage description
- `.gitea/workflows/build-app-macos.yml` — Apple signing + notarization
- `.gitea/workflows/build-app-windows.yml` — Azure Artifact Signing
**Secrets required (12 total):** See [SIGNING.md](SIGNING.md) for the full list — 6 Apple secrets, 6 Azure secrets.
## Related Documentation
- [README.md](README.md) — User-facing documentation
- [BUILD.md](BUILD.md) — Detailed build instructions
- [INSTALL.md](INSTALL.md) — Installation guide
- [SIGNING.md](SIGNING.md) — Code signing setup guide
- [server/nodejs/README.md](server/nodejs/README.md) — Node.js server setup


@@ -7,14 +7,14 @@ A real-time speech-to-text desktop application for streamers. Runs locally on yo
## Features
- **Real-Time Transcription**: Live speech-to-text using Whisper models with minimal latency
- **Cloud-First**: Defaults to Deepgram cloud transcription — get started with just an API key
- **Cross-Platform**: Native desktop app for Windows, macOS, and Linux via [Tauri](https://tauri.app/)
- **Dual Transcription Modes**: Local (Whisper) or cloud (Deepgram) with managed billing or BYOK
- **CPU & GPU Support**: Automatic detection of CUDA (NVIDIA), MPS (Apple Silicon), or CPU fallback
- **Advanced Voice Detection**: Dual-layer VAD (WebRTC + Silero) for accurate speech detection
- **Dual Transcription Modes**: Cloud (Deepgram) or local (Whisper) with automatic GPU/CPU detection
- **Shared Captions**: Create a room and share a code so others can join — no server setup needed
- **OBS Integration**: Built-in web server for browser source capture at `http://localhost:8080`
- **Multi-User Sync**: Optional Node.js server to sync transcriptions across multiple users
- **Custom Fonts**: Support for system fonts, web-safe fonts, Google Fonts, and custom font files
- **Customizable Colors**: User-configurable colors for name, text, and background
- **Advanced Voice Detection**: Dual-layer VAD (WebRTC + Silero) for accurate speech detection
- **Noise Suppression**: Built-in audio preprocessing to reduce background noise
- **Auto-Updates**: Automatic update checking with release notes display
@@ -87,27 +87,30 @@ For detailed build instructions, see [BUILD.md](BUILD.md).
## Usage
### Standalone Mode
### Quick Setup (Cloud — Recommended)
1. Launch the application
2. Select your microphone from the audio device dropdown
3. Choose a Whisper model (smaller = faster, larger = more accurate):
2. Open **Settings** — the transcription mode defaults to **Cloud (Deepgram)**
3. Get a free API key at [console.deepgram.com](https://console.deepgram.com) and paste it in Settings
4. Select your microphone from the audio device dropdown
5. Click **Start Transcription**
6. Transcriptions appear in the main window and at `http://localhost:8080`
> The Start button is disabled until an API key is entered. Local-only settings (model, VAD, timing) are hidden in cloud mode to keep things simple.
### Local Mode (Whisper)
For offline/on-device transcription, switch to **Local (Whisper)** in Settings:
1. Choose a Whisper model (smaller = faster, larger = more accurate):
- `tiny.en` / `tiny` — Fastest, good for quick captions
- `base.en` / `base` — Balanced speed and accuracy
- `small.en` / `small` — Better accuracy
- `medium.en` / `medium` — High accuracy
- `large-v3` — Best accuracy (requires more resources)
4. Click **Start** to begin transcription
5. Transcriptions appear in the main window and at `http://localhost:8080`
### Remote Transcription (Deepgram)
Instead of local Whisper models, you can use cloud-based transcription:
- **Managed mode**: Sign up via the transcription proxy for metered billing
- **BYOK mode**: Bring your own Deepgram API key for direct access
Configure in Settings > Remote Transcription.
2. Select compute device (Auto/CUDA/CPU) and compute type
3. Tune VAD sensitivity and timing settings as needed
4. Click **Start Transcription**
### OBS Browser Source Setup
@@ -117,18 +120,42 @@ Configure in Settings > Remote Transcription.
4. Set dimensions (e.g., 1920x300)
5. Check "Shutdown source when not visible" for performance
### Multi-User Mode (Optional)
### Shared Captions (Multi-User)
For syncing transcriptions across multiple users (e.g., multi-host streams or translation teams):
Share live captions across multiple users using the hosted service at `https://caption.shadowdao.com/` — no server setup required.
1. Deploy the Node.js server (see [server/nodejs/README.md](server/nodejs/README.md))
2. In the app settings, enable **Server Sync**
3. Enter the server URL (e.g., `http://your-server:3000/api/send`)
4. Set a room name and passphrase (shared with other users)
5. In OBS, use the server's display URL with your room name:
```
http://your-server:3000/display?room=YOURROOM&timestamps=true&maxlines=50
```
#### Creating a Room
1. Open **Settings** and enable **Shared Captions**
2. Click **Create Room** — this generates a room name and passphrase automatically
3. A **share code** is generated and copied to your clipboard
4. Send the share code to anyone who should join
#### Joining a Room
1. Open **Settings** and enable **Shared Captions**
2. Paste the share code you received into the **"Paste share code to join"** field
3. Click **Join** — the server URL, room, and passphrase are auto-filled
4. Click **Save**
#### Sharing an Existing Room
If you already have a room configured and want to invite others:
1. Open **Settings** and scroll to **Shared Captions**
2. Click **Share Current Room** — generates a share code from your current config and copies it to the clipboard
3. Send the code to others
#### OBS Display for Shared Rooms
In OBS, add a Browser source pointing to the server's display URL:
```
https://caption.shadowdao.com/display?room=YOURROOM&timestamps=true&maxlines=50
```
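
If a room name contains spaces or punctuation, it should be percent-encoded before it goes into the `room` parameter. The Bash sketch below builds the display URL from a room name and an optional line limit. It handles ASCII input only, and it assumes the hosted server decodes the query string in the standard way. The function names `urlencode` and `display_url` are hypothetical.

```bash
# Hypothetical helpers, not part of the application.
# Percent-encode an ASCII string for use as a URL query value.
urlencode() {
  local s="$1" out="" c hex i
  for ((i = 0; i < ${#s}; i++)); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;          # unreserved characters pass through
      *) printf -v hex '%%%02X' "'$c"; out+="$hex" ;;
    esac
  done
  printf '%s\n' "$out"
}

# Build the OBS display URL for a room; the line limit defaults to 50.
display_url() {
  printf 'https://caption.shadowdao.com/display?room=%s&timestamps=true&maxlines=%s\n' \
    "$(urlencode "$1")" "${2:-50}"
}
```

For example, `display_url "My Room"` yields a URL whose `room` value is `My%20Room`.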
#### Self-Hosting
You can also self-host the sync server. See [server/nodejs/README.md](server/nodejs/README.md) for setup instructions, then enter your own server URL in the Shared Captions settings.
## Configuration
@@ -144,7 +171,7 @@ Settings are stored at `~/.local-transcription/config.yaml` and can be modified
| `transcription.silero_sensitivity` | VAD sensitivity (0-1, lower = more sensitive) | `0.4` |
| `transcription.post_speech_silence_duration` | Silence before finalizing (seconds) | `0.3` |
| `transcription.continuous_mode` | Fast speaker mode for quick talkers | `false` |
| `remote.mode` | Transcription mode (local/managed/byok) | `local` |
| `remote.mode` | Transcription mode (local/managed/byok) | `byok` |
| `display.show_timestamps` | Show timestamps with transcriptions | `true` |
| `display.fade_after_seconds` | Fade out time (0 = never) | `10` |
| `display.font_source` | Font type (System Font/Web-Safe/Google Font/Custom File) | `System Font` |
@@ -267,6 +294,15 @@ Both workflows require a `BUILD_TOKEN` secret in the repo settings (Gitea API to
## Troubleshooting
### macOS: "App is damaged and can't be opened"
macOS Gatekeeper blocks unsigned applications. Since the app is not yet signed with an Apple Developer certificate, you need to remove the quarantine flag before opening:
```bash
xattr -cr "/Applications/Local Transcription.app"
```
Then open the app normally. You only need to do this once after downloading.
### Model Loading Issues
- Models download automatically on first use to `~/.cache/huggingface/`
- First run requires internet connection

SIGNING.md Normal file

@@ -0,0 +1,136 @@
# Code Signing Setup
This document explains how to configure code signing for Local Transcription so that Windows and macOS installers are trusted by the operating system.
## Overview
Without code signing:
- **Windows**: SmartScreen shows "Windows protected your PC" warnings
- **macOS**: Gatekeeper blocks the app — "app can't be opened because it is from an unidentified developer"
The CI/CD workflows are configured to sign automatically when the required secrets are present. Without secrets, builds still work — they just produce unsigned installers.
---
## Windows — Azure Artifact Signing
**Cost**: ~$9.99/month (up to 5,000 signatures)
### 1. Create an Azure Account
Sign up at https://azure.microsoft.com if you don't already have one.
### 2. Set Up Artifact Signing
1. In the Azure Portal, search for **Artifact Signing**
2. Create a new **Artifact Signing Account**
- Choose a region (e.g., West US 2) — note this for the endpoint URL
- The endpoint will look like `https://wus2.codesigning.azure.net/`
3. Complete **Identity Verification** (required before you can create certificate profiles)
4. Create a **Certificate Profile** with type "Public Trust" for code signing
### 3. Create an App Registration (Service Principal)
This allows CI to authenticate to Azure:
1. Go to **Microsoft Entra ID** (formerly Azure Active Directory) > **App registrations** > **New registration**
2. Name it (e.g., `local-transcription-signing`)
3. After creation, note the **Application (client) ID** and **Directory (tenant) ID**
4. Go to **Certificates & secrets** > **New client secret** — note the secret value
5. Grant the app registration the **Artifact Signing Certificate Profile Signer** role on your Artifact Signing Account
### 4. Add Gitea Secrets
In your Gitea repository, go to **Settings** > **Actions** > **Secrets** and add:
| Secret Name | Value |
|-------------|-------|
| `AZURE_CLIENT_ID` | App registration Application (client) ID |
| `AZURE_CLIENT_SECRET` | App registration client secret value |
| `AZURE_TENANT_ID` | Directory (tenant) ID |
| `AZURE_SIGNING_ENDPOINT` | Artifact Signing endpoint URL (e.g., `https://wus2.codesigning.azure.net/`) |
| `AZURE_SIGNING_ACCOUNT` | Artifact Signing account name |
| `AZURE_CERT_PROFILE` | Certificate profile name |
---
## macOS — Apple Developer Code Signing + Notarization
**Cost**: $99/year (Apple Developer Program)
### 1. Enroll in the Apple Developer Program
Sign up at https://developer.apple.com/programs/
### 2. Create a Developer ID Certificate
1. Open **Xcode** > **Settings** > **Accounts** > select your team > **Manage Certificates**
2. Click **+** > **Developer ID Application**
3. Alternatively, skip Xcode and create the certificate in the Apple Developer portal: **Certificates, Identifiers & Profiles** > **Certificates** > **+** > **Developer ID Application**
### 3. Export the Certificate as .p12
1. Open **Keychain Access**
2. Find your **Developer ID Application** certificate
3. Right-click > **Export** > save as `.p12` with a password
4. Base64-encode it:
```bash
base64 -i certificate.p12 | tr -d '\n'
```
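For machines without the `base64` CLI, the same single-line encoding can be sketched in Python. The function names are illustrative, and the decode step is an assumption about how a CI job would turn the `APPLE_CERTIFICATE` secret back into a `.p12` file; the actual workflow may do this differently.

```python
import base64


def encode_p12(raw: bytes) -> str:
    """Base64-encode certificate bytes as one line (b64encode adds no newlines)."""
    return base64.b64encode(raw).decode("ascii")


def decode_p12(secret: str) -> bytes:
    """Decode the stored secret back to the original bytes.

    validate=True rejects stray characters such as pasted newlines,
    which is why the shell command above strips them with tr.
    """
    return base64.b64decode(secret, validate=True)
```

Reading `certificate.p12` with `open(path, "rb").read()` and passing the bytes to `encode_p12` should give the same string as the shell pipeline.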
### 4. Create an App Store Connect API Key
This is used for notarization (submitting the app to Apple for verification):
1. Go to https://appstoreconnect.apple.com/access/integrations/api
2. Click **Generate API Key**
3. Give it a name and at least the **Developer** role
4. Download the `.p8` private key file (you can only download it once)
5. Note the **Key ID** and **Issuer ID** shown on the page
### 5. Find Your Signing Identity
Your signing identity looks like:
```
Developer ID Application: Your Name (TEAMID)
```
You can find it by running:
```bash
security find-identity -v -p codesigning
```
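A quick sanity check of the identity string before storing it as a secret can be sketched as follows. The regular expression is an assumption based only on the format shown above (a `Developer ID Application:` prefix, a name, and an uppercase alphanumeric team ID in parentheses); `parse_identity` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# Assumed shape: "Developer ID Application: <name> (<TEAMID>)"
IDENTITY_RE = re.compile(
    r"^Developer ID Application: (?P<name>.+) \((?P<team>[A-Z0-9]+)\)$"
)


def parse_identity(identity: str) -> Optional[Tuple[str, str]]:
    """Return (name, team_id) if the string looks like a Developer ID identity."""
    match = IDENTITY_RE.match(identity.strip())
    if match is None:
        return None
    return match.group("name"), match.group("team")
```

A `None` result usually means the wrong certificate type was copied (for example a development certificate rather than **Developer ID Application**).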
### 6. Add Gitea Secrets
In your Gitea repository, go to **Settings** > **Actions** > **Secrets** and add:
| Secret Name | Value |
|-------------|-------|
| `APPLE_CERTIFICATE` | Base64-encoded .p12 certificate (from step 3) |
| `APPLE_CERTIFICATE_PASSWORD` | Password used when exporting the .p12 |
| `APPLE_SIGNING_IDENTITY` | Full identity string (e.g., `Developer ID Application: Your Name (TEAMID)`) |
| `APPLE_API_KEY` | App Store Connect API Key ID |
| `APPLE_API_ISSUER` | API issuer UUID |
| `APPLE_API_KEY_CONTENT` | Full contents of the `.p8` private key file |
---
## Verifying Signing Works
### Trigger a Build
Both build workflows use `workflow_dispatch`, so you can trigger them manually in Gitea:
1. Go to **Actions** > select the workflow > **Run workflow**
2. Enter the release tag (e.g., `v2.0.15`)
### Check macOS
After installing the `.dmg`, the app should open without any Gatekeeper warnings. You can also verify from the command line:
```bash
codesign -dv --verbose=4 /Applications/Local\ Transcription.app
spctl --assess --type execute --verbose /Applications/Local\ Transcription.app
```
### Check Windows
After running the `.msi` or `-setup.exe`, there should be no SmartScreen warning. The installer properties should show your organization name as the publisher.


@@ -151,14 +151,24 @@ class APIServer:
@app.post("/api/start")
async def start_transcription():
success, message = ctrl.start_transcription()
import asyncio
# Run in thread pool to avoid blocking the event loop
# (start_recording can block up to 15s waiting for Deepgram WS)
loop = asyncio.get_event_loop()
success, message = await loop.run_in_executor(
None, ctrl.start_transcription
)
if not success:
raise HTTPException(status_code=400, detail=message)
return {"status": "ok", "message": message}
@app.post("/api/stop")
async def stop_transcription():
success, message = ctrl.stop_transcription()
import asyncio
loop = asyncio.get_event_loop()
success, message = await loop.run_in_executor(
None, ctrl.stop_transcription
)
if not success:
raise HTTPException(status_code=400, detail=message)
return {"status": "ok", "message": message}
@@ -223,7 +233,11 @@ class APIServer:
@app.post("/api/reload-engine")
async def reload_engine():
success, message = ctrl.reload_engine()
import asyncio
loop = asyncio.get_event_loop()
success, message = await loop.run_in_executor(
None, ctrl.reload_engine
)
if not success:
raise HTTPException(status_code=500, detail=message)
return {"status": "ok", "message": message}
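The reason for moving these calls into `run_in_executor` can be seen in a small standalone sketch. `blocking_start` is a stand-in for a slow synchronous call such as `ctrl.start_transcription`, and `heartbeat` stands in for any other coroutine the server needs to keep running. Both names are illustrative and not part of the project.

```python
import asyncio
import time


def blocking_start():
    """Stand-in for a slow synchronous call (e.g. waiting for a WebSocket)."""
    time.sleep(0.2)
    return True, "started"


async def heartbeat(ticks):
    """Other work that should keep running while the slow call is in flight."""
    for _ in range(5):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.02)


async def main():
    ticks = []
    loop = asyncio.get_running_loop()
    hb = asyncio.create_task(heartbeat(ticks))
    # The blocking call runs in the default thread pool; the loop stays free.
    result = await loop.run_in_executor(None, blocking_start)
    ticks_during_call = len(ticks)
    await hb
    return result, ticks_during_call
```

Calling `blocking_start()` directly inside `main` would instead freeze the loop for the whole 0.2 s, and `heartbeat` would not tick until it returned.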


@@ -18,13 +18,18 @@ import sys
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
from client.config import Config
from client.device_utils import DeviceManager
from client.transcription_engine_realtime import RealtimeTranscriptionEngine, TranscriptionResult
from client.models import TranscriptionResult
from client.deepgram_transcription import DeepgramTranscriptionEngine
from client.server_sync import ServerSyncClient
from server.web_display import TranscriptionWebServer
from version import __version__
# Heavy imports (torch, RealtimeSTT, faster-whisper) are deferred so
# the cloud-only sidecar build can exclude them entirely.
# Imported lazily in _initialize_engine() when remote.mode == "local".
RealtimeTranscriptionEngine = None
DeviceManager = None
class AppState:
"""Enum-like class for application states."""
@@ -89,7 +94,24 @@ class AppController:
def __init__(self, config: Optional[Config] = None):
self.config = config or Config()
self.device_manager = DeviceManager()
# DeviceManager is only needed for local Whisper mode.
# Lazy-import to keep the cloud-only sidecar lightweight.
global DeviceManager
if DeviceManager is None:
try:
from client.device_utils import DeviceManager as _DM
DeviceManager = _DM
except ImportError:
DeviceManager = None
self.device_manager = DeviceManager() if DeviceManager else None
self.is_cloud_only = DeviceManager is None
# If this is the cloud-only sidecar and mode is still "local",
# auto-switch to "byok" so the engine doesn't try to load Whisper.
if self.is_cloud_only and self.config.get('remote.mode', 'local') == 'local':
self.config.set('remote.mode', 'byok')
# State
self._state = AppState.INITIALIZING
@@ -243,15 +265,12 @@ class AppController:
def _initialize_engine(self):
"""Initialize the transcription engine in a background thread."""
device_config = self.config.get('transcription.device', 'auto')
self.device_manager.set_device(device_config)
audio_device_str = self.config.get('audio.input_device', 'default')
audio_device = None if audio_device_str == 'default' else int(audio_device_str)
model = self.config.get('transcription.model', 'base.en')
language = self.config.get('transcription.language', 'en')
device = self.device_manager.get_device_for_whisper()
device_config = self.config.get('transcription.device', 'auto')
compute_type = self.config.get('transcription.compute_type', 'default')
self.current_model_size = model
@@ -284,6 +303,27 @@ class AppController:
self.transcription_engine.set_error_callback(self._on_remote_error)
self.transcription_engine.set_credits_low_callback(self._on_credits_low)
else:
# Lazy-import heavy local transcription dependencies
global RealtimeTranscriptionEngine
if RealtimeTranscriptionEngine is None:
try:
from client.transcription_engine_realtime import RealtimeTranscriptionEngine as _RTE
RealtimeTranscriptionEngine = _RTE
except ImportError:
# Cloud-only sidecar -- local engine not available
self._set_state(
AppState.ERROR,
"Local transcription not available in this build. "
"Please switch to Cloud (Deepgram) mode in Settings."
)
return
if self.device_manager:
self.device_manager.set_device(device_config)
device = self.device_manager.get_device_for_whisper()
else:
device = "cpu"
self.transcription_engine = RealtimeTranscriptionEngine(
model=model,
device=device,
@@ -333,7 +373,15 @@ class AppController:
self._set_state(AppState.READY, f"Ready | Device: {device_display}")
else:
self._set_state(AppState.ERROR, message)
# Cloud sidecar with no API key -- show helpful setup message
# instead of a scary error. The user needs to enter their key.
if self.is_cloud_only:
self._set_state(
AppState.READY,
"Setup needed: Open Settings > Remote Transcription > enter your Deepgram API key"
)
else:
self._set_state(AppState.ERROR, message)
# ── Transcription Control ──────────────────────────────────────
@@ -348,7 +396,14 @@ class AppController:
try:
success = self.transcription_engine.start_recording()
if not success:
return False, "Failed to start recording"
# Surface the engine's last recorded error, if it set one
err_detail = getattr(self.transcription_engine, '_last_error', '')
msg = "Failed to start recording"
if err_detail:
msg += f": {err_detail}"
print(f"ERROR: {msg}")
return False, msg
# Start server sync if enabled
if self.config.get('server_sync.enabled', False):
@@ -577,12 +632,18 @@ class AppController:
if self.config.get('server_sync.enabled', False):
self._start_server_sync()
# Check if model/device changed
# Check if model/device/remote mode changed -- any of these require
# a full engine reload since they change which engine class is used
new_model = self.config.get('transcription.model', 'base.en')
new_device = self.config.get('transcription.device', 'auto')
new_remote_mode = self.config.get('remote.mode', 'local')
current_remote_mode = 'local'
if self.transcription_engine:
current_remote_mode = getattr(self.transcription_engine, 'mode', 'local')
engine_reload_needed = (
self.current_model_size != new_model
or self.current_device_config != new_device
or current_remote_mode != new_remote_mode
)
if engine_reload_needed:
@@ -596,7 +657,7 @@ class AppController:
host = self.config.get('web_server.host', '127.0.0.1')
port = self.actual_web_port or self.config.get('web_server.port', 8080)
device_info = self.device_manager.get_device_info()
device_info = self.device_manager.get_device_info() if self.device_manager else []
remote_mode = self.config.get('remote.mode', 'local')
if remote_mode in ('managed', 'byok') and self.transcription_engine:
@@ -640,10 +701,13 @@ class AppController:
def get_compute_devices(self) -> list[dict]:
"""List available compute devices."""
device_info = self.device_manager.get_device_info()
devices = [{"id": "auto", "name": "Auto-detect"}]
for dev_id, dev_name in device_info:
devices.append({"id": dev_id, "name": dev_name})
if self.device_manager:
device_info = self.device_manager.get_device_info()
for dev_id, dev_name in device_info:
devices.append({"id": dev_id, "name": dev_name})
else:
devices.append({"id": "cloud", "name": "Cloud (Deepgram)"})
return devices
# ── Update Checking ────────────────────────────────────────────
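The lazy-import pattern used in this file (try the heavy module, fall back to a cloud-only mode when it is missing) can be sketched in isolation. `load_optional` and `Controller` below are simplified, hypothetical stand-ins, not the project's `AppController`.

```python
import importlib


def load_optional(module_name: str, attr: str):
    """Return module.attr, or None when the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, attr, None)


class Controller:
    """Minimal stand-in showing how the optional class drives the mode flag."""

    def __init__(self, module_name: str, attr: str):
        manager_cls = load_optional(module_name, attr)
        # Only instantiate the manager when the optional dependency exists.
        self.device_manager = manager_cls() if manager_cls else None
        self.is_cloud_only = manager_cls is None
```

Every later use of `device_manager` then needs a `None` check, which is what the `if self.device_manager:` guards in the hunks above provide.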


@@ -79,7 +79,7 @@ async def test_start_when_not_ready(api_client, controller):
@pytest.mark.asyncio
async def test_clear(api_client, controller):
from client.transcription_engine_realtime import TranscriptionResult
from client.models import TranscriptionResult
from datetime import datetime
controller.transcriptions = [


@@ -72,7 +72,7 @@ def test_double_start_rejected(controller):
def test_clear_transcriptions(controller):
"""clear_transcriptions should empty the list and return the count."""
from client.transcription_engine_realtime import TranscriptionResult
from client.models import TranscriptionResult
controller.transcriptions = [
TranscriptionResult(text="Hello", is_final=True, timestamp=datetime.now(), user_name="Alice"),
@@ -85,7 +85,7 @@ def test_clear_transcriptions(controller):
def test_get_transcriptions_text_with_timestamps(controller):
"""get_transcriptions_text should include [HH:MM:SS] prefixes when requested."""
from client.transcription_engine_realtime import TranscriptionResult
from client.models import TranscriptionResult
ts = datetime(2025, 1, 15, 10, 30, 45)
controller.transcriptions = [
@@ -125,6 +125,8 @@ def test_apply_settings_no_reload_when_same(controller):
# Ensure config returns the same values
controller.config.set("transcription.model", "base.en")
controller.config.set("transcription.device", "auto")
# Remote mode must also match (no engine means current mode is 'local')
controller.config.set("remote.mode", "local")
controller.reload_engine = MagicMock(return_value=(True, "reloaded"))
@@ -141,7 +143,7 @@ def test_apply_settings_no_reload_when_same(controller):
def test_on_final_transcription_callback_fires(controller):
"""_on_final_transcription should append and invoke on_transcription callback."""
from client.transcription_engine_realtime import TranscriptionResult
from client.models import TranscriptionResult
received = []
controller.on_transcription = lambda data: received.append(data)
@@ -166,7 +168,7 @@ def test_on_final_transcription_callback_fires(controller):
def test_on_final_transcription_ignored_when_not_transcribing(controller):
"""If the controller is not in transcribing state the callback should be a no-op."""
from client.transcription_engine_realtime import TranscriptionResult
from client.models import TranscriptionResult
controller.is_transcribing = False


@@ -17,7 +17,7 @@ from datetime import datetime
from queue import Queue, Empty
from typing import Optional, Callable
from client.transcription_engine_realtime import TranscriptionResult
from client.models import TranscriptionResult
logger = logging.getLogger(__name__)
@@ -67,7 +67,7 @@ class DeepgramTranscriptionEngine:
# Audio parameters
self.sample_rate: int = 16000
self.channels: int = 1
self.blocksize: int = 4096
self.blocksize: int = 1024 # ~64ms chunks for lower latency streaming
# Callbacks
self.realtime_callback: Optional[Callable[[TranscriptionResult], None]] = None
@@ -156,17 +156,30 @@ class DeepgramTranscriptionEngine:
return True
self._stop_event.clear()
self._ws_connected = threading.Event()
self._is_recording = True
# Start the asyncio event-loop thread (handles WS send/receive)
self._thread = threading.Thread(target=self._run_event_loop, daemon=True)
self._thread.start()
# Wait for the WebSocket to connect before starting audio capture.
# Without this, audio chunks arrive before the WS is open -> broken pipe.
if not self._ws_connected.wait(timeout=15):
logger.error("Timed out waiting for Deepgram WebSocket connection")
print("ERROR: Timed out waiting for Deepgram WebSocket connection")
self._last_error = "Timed out connecting to Deepgram"
self._is_recording = False
self._stop_event.set()
return False
# Start the audio capture stream
try:
self._start_audio_stream()
except Exception as exc:
logger.error("Failed to open audio stream: %s", exc)
print(f"ERROR: Failed to open audio stream: {exc}")
self._last_error = f"Audio stream error: {exc}"
self._is_recording = False
self._stop_event.set()
return False
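The connect-then-capture handshake in this hunk follows a common `threading.Event` pattern, sketched here on its own. `start_with_handshake` and its `connect` argument are hypothetical; in the real engine the event is set from inside the asyncio thread once the WebSocket is open.

```python
import threading
import time


def start_with_handshake(connect, timeout=15.0):
    """Run `connect` in a background thread and wait until it reports ready.

    Returns True if the connection was signalled within `timeout` seconds,
    False otherwise (the caller should then abort instead of sending audio).
    """
    ready = threading.Event()

    def worker():
        connect()     # blocking connection setup
        ready.set()   # wake up the waiting caller

    threading.Thread(target=worker, daemon=True).start()
    # Event.wait returns True when the flag is set, False on timeout.
    return ready.wait(timeout=timeout)
```

Because `wait` blocks the calling thread for up to `timeout` seconds, callers on an event loop should run it in an executor, as the API server change in this comparison does.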
@@ -283,6 +296,11 @@ class DeepgramTranscriptionEngine:
if not await self._managed_handshake():
return
# Signal that the WebSocket is connected and ready
logger.info("WebSocket connected to Deepgram")
if hasattr(self, '_ws_connected'):
self._ws_connected.set()
# Run send and receive concurrently
await asyncio.gather(
self._send_loop(),
@@ -314,6 +332,8 @@ class DeepgramTranscriptionEngine:
f"model={self.deepgram_model}"
f"&language={self.language}"
"&interim_results=true"
"&punctuate=true"
"&smart_format=true"
"&encoding=linear16"
f"&sample_rate={self.sample_rate}"
f"&channels={self.channels}"
@@ -370,10 +390,16 @@ class DeepgramTranscriptionEngine:
async def _send_loop(self):
"""Drain the audio queue and push raw PCM bytes over the WebSocket."""
loop = asyncio.get_event_loop()
while not self._stop_event.is_set():
try:
pcm_bytes = self._audio_queue.get(timeout=0.1)
except Empty:
# Use run_in_executor to avoid blocking the async event loop
# (which would stall the receive loop and delay transcriptions)
pcm_bytes = await asyncio.wait_for(
loop.run_in_executor(None, lambda: self._audio_queue.get(timeout=0.5)),
timeout=1.0,
)
except (Empty, asyncio.TimeoutError):
continue
try:

client/models.py Normal file

@@ -0,0 +1,29 @@
"""Shared data models used across transcription engines."""
from datetime import datetime
class TranscriptionResult:
"""Represents a transcription result."""
def __init__(self, text: str, is_final: bool, timestamp: datetime, user_name: str = ""):
"""
Initialize transcription result.
Args:
text: Transcribed text
is_final: Whether this is a final transcription or realtime preview
timestamp: Timestamp of transcription
user_name: Name of the user/speaker
"""
self.text = text.strip()
self.is_final = is_final
self.timestamp = timestamp
self.user_name = user_name
def __repr__(self) -> str:
time_str = self.timestamp.strftime("%H:%M:%S")
prefix = "[FINAL]" if self.is_final else "[PREVIEW]"
if self.user_name and self.user_name.strip():
return f"{prefix} [{time_str}] {self.user_name}: {self.text}"
return f"{prefix} [{time_str}] {self.text}"


@@ -8,30 +8,8 @@ from threading import Lock
import logging
class TranscriptionResult:
"""Represents a transcription result."""
def __init__(self, text: str, is_final: bool, timestamp: datetime, user_name: str = ""):
"""
Initialize transcription result.
Args:
text: Transcribed text
is_final: Whether this is a final transcription or realtime preview
timestamp: Timestamp of transcription
user_name: Name of the user/speaker
"""
self.text = text.strip()
self.is_final = is_final
self.timestamp = timestamp
self.user_name = user_name
def __repr__(self) -> str:
time_str = self.timestamp.strftime("%H:%M:%S")
prefix = "[FINAL]" if self.is_final else "[PREVIEW]"
if self.user_name and self.user_name.strip():
return f"{prefix} [{time_str}] {self.user_name}: {self.text}"
return f"{prefix} [{time_str}] {self.text}"
# Re-export TranscriptionResult from the shared models module for backward compatibility
from client.models import TranscriptionResult # noqa: F401
def to_dict(self) -> dict:
"""Convert to dictionary."""


@@ -42,7 +42,7 @@ transcription:
server_sync:
enabled: false
url: "http://localhost:3000/api/send"
url: ""
room: "default"
passphrase: ""
# Font settings are now in the display section (shared for local and server sync)
@@ -69,8 +69,8 @@ web_server:
host: "127.0.0.1"
remote:
mode: local # local | managed | byok
server_url: "" # Proxy server URL for managed mode (e.g., wss://your-proxy.com)
mode: byok # local | managed | byok
server_url: "https://transcribe.shadowdao.com" # Proxy server URL for managed mode
auth_token: "" # JWT stored after login (managed mode)
byok_api_key: "" # Deepgram API key for BYOK mode
deepgram_model: nova-2 # Deepgram model to use


@@ -0,0 +1,169 @@
# -*- mode: python ; coding: utf-8 -*-
"""PyInstaller spec file for cloud-only Local Transcription backend.
This builds a lightweight sidecar (~50MB) that only supports Deepgram
cloud transcription (managed + BYOK). No local Whisper models, no
PyTorch, no CUDA -- just audio capture and WebSocket streaming.
"""
import sys
import os
block_cipher = None
is_windows = sys.platform == 'win32'
from PyInstaller.utils.hooks import collect_submodules, collect_data_files
# Data files
datas = [
('config/default_config.yaml', 'config'),
]
# Collect sounddevice's bundled PortAudio library (_sounddevice_data)
try:
import sounddevice
sd_path = os.path.dirname(sounddevice.__file__)
sd_data = os.path.join(sd_path, '_sounddevice_data')
if os.path.exists(sd_data):
datas.append((sd_data, '_sounddevice_data'))
print(f" + Collected sounddevice PortAudio data from {sd_data}")
# Also collect the package itself
sd_datas = collect_data_files('sounddevice')
if sd_datas:
datas += sd_datas
print(f" + Collected {len(sd_datas)} sounddevice data files")
except ImportError:
print(" - Warning: sounddevice not found")
# Hidden imports -- only lightweight deps needed for Deepgram streaming
hiddenimports = [
'sounddevice',
'_sounddevice_data',
'numpy',
# FastAPI and dependencies
'fastapi',
'fastapi.routing',
'fastapi.responses',
'starlette',
'starlette.applications',
'starlette.routing',
'starlette.responses',
'starlette.websockets',
'starlette.middleware',
'starlette.middleware.cors',
'pydantic',
'pydantic.fields',
'pydantic.main',
'anyio',
'anyio._backends',
'anyio._backends._asyncio',
'sniffio',
# Uvicorn
'uvicorn',
'uvicorn.logging',
'uvicorn.loops',
'uvicorn.loops.auto',
'uvicorn.protocols',
'uvicorn.protocols.http',
'uvicorn.protocols.http.auto',
'uvicorn.protocols.http.h11_impl',
'uvicorn.protocols.websockets',
'uvicorn.protocols.websockets.auto',
'uvicorn.protocols.websockets.wsproto_impl',
'uvicorn.lifespan',
'uvicorn.lifespan.on',
'h11',
'websockets',
'websockets.legacy',
'websockets.legacy.server',
# HTTP client
'requests',
'urllib3',
'certifi',
'charset_normalizer',
]
# Collect submodules for key packages
print("Collecting submodules for cloud backend packages...")
for package in ['fastapi', 'starlette', 'pydantic', 'pydantic_core', 'anyio', 'uvicorn', 'websockets', 'h11']:
try:
submodules = collect_submodules(package)
hiddenimports += submodules
print(f" + Collected {len(submodules)} submodules from {package}")
except Exception as e:
print(f" - Warning: Could not collect {package}: {e}")
# Collect data files
for package in ['fastapi', 'starlette', 'pydantic', 'uvicorn']:
try:
data_files = collect_data_files(package)
if data_files:
datas += data_files
except Exception:
pass
# Pydantic critical deps
hiddenimports += [
'colorsys', 'decimal', 'json', 'ipaddress', 'pathlib', 'uuid',
'email.message', 'typing_extensions',
]
a = Analysis(
['backend/main_headless.py'],
pathex=[],
binaries=[],
datas=datas,
hiddenimports=hiddenimports,
hookspath=['hooks'],
hooksconfig={},
runtime_hooks=[],
excludes=[
# Exclude all heavy ML/local transcription deps
'torch', 'torchaudio', 'torchvision',
'faster_whisper', 'ctranslate2',
'RealtimeSTT', 'webrtcvad', 'webrtcvad_wheels',
'silero_vad', 'onnxruntime',
'openwakeword', 'pvporcupine', 'pyaudio',
'noisereduce', 'scipy',
# Exclude GUI frameworks
'PySide6', 'PyQt5', 'PyQt6', 'tkinter',
# Exclude other unnecessary heavy packages
'matplotlib', 'PIL', 'cv2',
],
win_no_prefer_redirects=False,
win_private_assemblies=False,
cipher=block_cipher,
noarchive=False,
)
pyz = PYZ(a.pure, a.zipped_data, cipher=block_cipher)
exe = EXE(
pyz,
a.scripts,
[],
exclude_binaries=True,
name='local-transcription-backend',
debug=False,
bootloader_ignore_signals=False,
strip=False,
upx=True,
console=True,
disable_windowed_traceback=False,
argv_emulation=False,
target_arch=None,
codesign_identity=None,
entitlements_file=None,
icon='LocalTranscription.ico' if is_windows else None,
)
coll = COLLECT(
exe,
a.binaries,
a.zipfiles,
a.datas,
strip=False,
upx=True,
upx_exclude=[],
name='local-transcription-backend',
)


@@ -38,6 +38,21 @@ datas = [
(vad_assets_path, 'faster_whisper/assets'),
] + pvporcupine_data_files
# Collect sounddevice's bundled PortAudio library (_sounddevice_data)
try:
import sounddevice
sd_path = os.path.dirname(sounddevice.__file__)
sd_data = os.path.join(sd_path, '_sounddevice_data')
if os.path.exists(sd_data):
datas.append((sd_data, '_sounddevice_data'))
print(f" + Collected sounddevice PortAudio data from {sd_data}")
sd_datas = collect_data_files('sounddevice')
if sd_datas:
datas += sd_datas
print(f" + Collected {len(sd_datas)} sounddevice data files")
except ImportError:
print(" - Warning: sounddevice not found")
# Hidden imports -- NO PySide6/Qt needed for headless backend
hiddenimports = [
# Transcription engine
@@ -46,6 +61,7 @@ hiddenimports = [
'faster_whisper.vad',
'ctranslate2',
'sounddevice',
'_sounddevice_data',
'scipy',
'scipy.signal',
'numpy',


@@ -1,7 +1,7 @@
{
"name": "local-transcription",
"private": true,
"version": "2.0.3",
"version": "2.0.16",
"type": "module",
"scripts": {
"dev": "vite dev",


@@ -1,6 +1,6 @@
[project]
name = "local-transcription"
version = "1.0.4"
version = "1.0.11"
description = "A standalone desktop application for real-time speech-to-text transcription using Whisper models"
readme = "README.md"
requires-python = ">=3.9"


@@ -703,6 +703,36 @@ app.post('/api/send', async (req, res) => {
}
});
// Create room explicitly (no transcription needed)
app.post('/api/create-room', async (req, res) => {
try {
const { room, passphrase } = req.body;
if (!room || !passphrase) {
return res.status(400).json({ error: 'Missing room or passphrase' });
}
// Check if room already exists
const existing = await loadRoom(room);
if (existing) {
const valid = await verifyPassphrase(room, passphrase);
if (!valid) {
return res.status(401).json({ error: 'Room exists with different passphrase' });
}
return res.json({ status: 'ok', room, created: false, message: 'Room already exists' });
}
// Create the room (verifyPassphrase creates it if it doesn't exist)
await verifyPassphrase(room, passphrase);
console.log(`[Room] Created room "${room}"`);
res.json({ status: 'ok', room, created: true });
} catch (err) {
console.error('Error in /api/create-room:', err);
res.status(500).json({ error: err.message });
}
});
// List transcriptions
app.get('/api/list', async (req, res) => {
try {

src-tauri/Cargo.lock generated

@@ -1881,7 +1881,7 @@ checksum = "92daf443525c4cce67b150400bc2316076100ce0b3686209eb8cf3c31612e6f0"
[[package]]
name = "local-transcription"
version = "1.4.16"
version = "2.0.12"
dependencies = [
"bytes",
"chrono",

View File

@@ -1,6 +1,6 @@
[package]
name = "local-transcription"
version = "2.0.3"
version = "2.0.16"
description = "Real-time speech-to-text transcription for streamers"
authors = ["Local Transcription Contributors"]
edition = "2021"


@@ -0,0 +1,14 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>com.apple.security.device.audio-input</key>
<true/>
<key>com.apple.security.network.client</key>
<true/>
<key>com.apple.security.network.server</key>
<true/>
<key>com.apple.security.cs.allow-unsigned-executable-memory</key>
<true/>
</dict>
</plist>

src-tauri/Info.plist Normal file

@@ -0,0 +1,8 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>NSMicrophoneUsageDescription</key>
<string>Local Transcription needs microphone access for real-time speech-to-text transcription.</string>
</dict>
</plist>


@@ -1 +1 @@
{}
{"default":{"identifier":"default","description":"Default permissions for the main window","local":true,"windows":["main"],"permissions":["core:default","core:event:default","core:event:allow-listen","core:event:allow-emit","shell:default","dialog:default","process:default"]}}


@@ -68,8 +68,31 @@ pub fn run() {
sidecar::get_sidecar_port,
sidecar::start_sidecar,
sidecar::stop_sidecar,
sidecar::reset_sidecar,
write_log,
])
.run(tauri::generate_context!())
.expect("error while running tauri application");
.build(tauri::generate_context!())
.expect("error while building tauri application")
.run(|app, event| {
match event {
tauri::RunEvent::Exit => {
if let Some(state) = app.try_state::<sidecar::ManagedSidecar>() {
if let Ok(mut mgr) = state.0.lock() {
eprintln!("[app] Stopping sidecar on exit...");
mgr.stop();
}
}
}
tauri::RunEvent::ExitRequested { .. } => {
// Also stop sidecar on exit request (Cmd+Q on macOS)
if let Some(state) = app.try_state::<sidecar::ManagedSidecar>() {
if let Ok(mut mgr) = state.0.lock() {
eprintln!("[app] Stopping sidecar on exit request...");
mgr.stop();
}
}
}
_ => {}
}
});
}


@@ -554,18 +554,27 @@ impl SidecarManager {
// -- private helpers -------------------------------------------------------
fn build_dev_command(&self) -> Result<std::process::Command, String> {
let mut cmd = std::process::Command::new("python");
cmd.args(["-u", "-m", "backend.main_headless"]); // -u = unbuffered
// Use `uv run` to ensure we use the project's venv, not system Python
let mut cmd = std::process::Command::new("uv");
cmd.args(["run", "python", "-u", "-m", "backend.main_headless"]);
// Try to find the project root (parent of src-tauri)
if let Some(dirs) = DIRS.get() {
let project_root = dirs
.resource_dir
.parent() // src-tauri
.and_then(|p| p.parent()); // project root
if let Some(root) = project_root {
cmd.current_dir(root);
}
// Find the project root: try CARGO_MANIFEST_DIR first (set at compile time),
// then fall back to resource_dir parent chain
let manifest_dir = option_env!("CARGO_MANIFEST_DIR").map(std::path::PathBuf::from);
let project_root = manifest_dir
.as_ref()
.and_then(|d| d.parent()) // src-tauri -> project root
.or_else(|| {
DIRS.get()
.and_then(|d| d.resource_dir.parent())
.and_then(|p| p.parent())
});
if let Some(root) = project_root {
eprintln!("[sidecar] Dev mode: working dir = {}", root.display());
cmd.current_dir(root);
} else {
eprintln!("[sidecar] Dev mode: WARNING - could not determine project root");
}
cmd.env("PYTHONUNBUFFERED", "1");
@@ -676,6 +685,42 @@ pub fn stop_sidecar(state: tauri::State<'_, ManagedSidecar>) -> Result<(), Strin
Ok(())
}
/// Stop the running sidecar, delete its files and version marker.
/// The next app launch will show the sidecar download prompt.
#[tauri::command]
pub fn reset_sidecar(state: tauri::State<'_, ManagedSidecar>) -> Result<(), String> {
// Stop the running sidecar first
{
let mut mgr = state
.0
.lock()
.map_err(|e| format!("Lock error: {e}"))?;
mgr.stop();
}
let data = data_dir();
// Delete the version file so check_sidecar returns false
let vf = version_file();
if vf.exists() {
std::fs::remove_file(&vf)
.map_err(|e| format!("Failed to delete version file: {e}"))?;
}
// Delete all sidecar directories
if let Ok(entries) = std::fs::read_dir(&data) {
for entry in entries.flatten() {
let name = entry.file_name().to_string_lossy().to_string();
if name.starts_with("sidecar-") && entry.path().is_dir() {
eprintln!("[sidecar] Removing {}", entry.path().display());
let _ = std::fs::remove_dir_all(entry.path());
}
}
}
Ok(())
}
// ---------------------------------------------------------------------------
// Tests
// ---------------------------------------------------------------------------


@@ -1,6 +1,6 @@
{
"productName": "Local Transcription",
"version": "2.0.3",
"version": "2.0.16",
"identifier": "net.anhonesthost.local-transcription",
"build": {
"frontendDist": "../dist",
@@ -33,7 +33,14 @@
"icons/icon.icns",
"icons/icon.ico",
"icons/icon.png"
]
],
"macOS": {
"entitlements": "Entitlements.plist",
"hardenedRuntime": true
},
"windows": {
"digestAlgorithm": "sha256"
}
},
"plugins": {
"shell": {


@@ -15,6 +15,7 @@
let sidecarState = $state<SidecarState>("checking");
let debugLog = $state("");
let availableUpdate = $state("");
let appVersion = $state("");
let obsDisplayUrl = $derived(backendStore.obsUrl);
let syncDisplayUrl = $derived(backendStore.syncUrl);
@@ -108,6 +109,14 @@
}
onMount(() => {
// Get app version from Tauri
import("@tauri-apps/api/app").then(({ getVersion }) =>
getVersion().then((v) => { appVersion = v; })
).catch(() => {
// Browser dev mode (no Tauri runtime available) -- use a fallback label
appVersion = "dev";
});
checkAndLaunchSidecar();
return () => {
@@ -201,7 +210,7 @@
<TranscriptionDisplay />
<Controls />
<div class="version-label">v{backendStore.version}</div>
<div class="version-label">v{appVersion || backendStore.version}</div>
</div>
{#if showSettings}


@@ -1,5 +1,6 @@
<script lang="ts">
import { backendStore } from "$lib/stores/backend";
import { configStore } from "$lib/stores/config";
import { transcriptionStore } from "$lib/stores/transcriptions";
let isTranscribing = $derived(backendStore.appState === "transcribing");
@@ -8,18 +9,39 @@
);
let isLoading = $state(false);
let remoteMode = $derived(configStore.config.remote.mode);
let byokApiKey = $derived(configStore.config.remote.byok_api_key);
let authToken = $derived(configStore.config.remote.auth_token);
let cloudConfigured = $derived(
remoteMode === "local" ||
(remoteMode === "byok" && byokApiKey.trim() !== "") ||
(remoteMode === "managed" && authToken.trim() !== "")
);
let errorMessage = $state("");
async function toggleTranscription() {
if (isLoading) return;
isLoading = true;
errorMessage = "";
try {
if (isTranscribing) {
await backendStore.apiPost("/api/stop");
} else {
await backendStore.apiPost("/api/start");
}
} catch (err) {
console.error("Failed to toggle transcription:", err);
} catch (err: unknown) {
const msg = err instanceof Error ? err.message : String(err);
// Ignore "Already transcribing/not transcribing" -- just sync the state
if (!msg.includes("400")) {
console.error("Failed to toggle transcription:", msg);
errorMessage = msg;
}
} finally {
// Always poll status to sync UI with actual backend state,
// even if the API call failed (e.g. "Already transcribing")
await backendStore.pollStatus();
isLoading = false;
}
}
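Review note: the new `catch` block decides whether to surface an error by checking whether the message text contains `"400"`. That would also match unrelated messages that happen to contain those digits. A hedged sketch of a status-based check follows. It assumes the API client could throw an error carrying the HTTP status. `ApiError`, `isStateMismatch` and `toUserMessage` are hypothetical names, and nothing in this diff shows what `backendStore.apiPost` actually throws.

```typescript
// Hypothetical error type: assumes the API client can attach the HTTP status.
class ApiError extends Error {
  readonly status: number;

  constructor(status: number, message: string) {
    super(message);
    this.name = "ApiError";
    this.status = status;
    // Keeps `instanceof` working when compiled to older targets.
    Object.setPrototypeOf(this, ApiError.prototype);
  }
}

// True when the backend rejected the toggle because it was already in the requested state.
function isStateMismatch(err: unknown): boolean {
  return err instanceof ApiError && err.status === 400;
}

// Message to show the user, or "" when the error should be ignored.
function toUserMessage(err: unknown): string {
  if (isStateMismatch(err)) return "";
  return err instanceof Error ? err.message : String(err);
}
```

The `finally` block in the diff would stay as is: polling the status after every attempt is what resynchronizes the button with the backend.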
@@ -83,7 +105,7 @@
<button
class={isTranscribing ? "danger" : "primary"}
onclick={toggleTranscription}
disabled={!isReady || isLoading}
disabled={!isReady || isLoading || !cloudConfigured}
>
{#if isLoading}
...
@@ -101,9 +123,43 @@
<button onclick={saveTranscriptions} disabled={!backendStore.connected}>
Save
</button>
{#if errorMessage}
<span class="error-msg">{errorMessage}</span>
{/if}
{#if !cloudConfigured && isReady}
<div class="cloud-warning">
{#if remoteMode === "byok"}
<span>API key required. Get one at
<a href="https://console.deepgram.com" target="_blank" rel="noopener">console.deepgram.com</a>,
then enter it in Settings.</span>
{:else if remoteMode === "managed"}
<span>Login required. Open Settings to log in.</span>
{/if}
</div>
{/if}
</div>
<style>
.error-msg {
color: #f44336;
font-size: 12px;
margin-left: 8px;
}
.cloud-warning {
font-size: 12px;
color: #ff9800;
margin-left: 8px;
flex: 1;
}
.cloud-warning a {
color: #4fc3f7;
text-decoration: underline;
}
.controls {
display: flex;
align-items: center;
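Review note: the `cloudConfigured` gate added to this component can be read as a pure predicate over the `remote` config block. The sketch below restates it outside Svelte so it can be unit tested. The `RemoteConfig` type and the `isCloudConfigured` name are assumptions, while the field names and mode values are taken from the diff.

```typescript
type RemoteMode = "local" | "byok" | "managed";

// Assumed shape; only the fields the predicate reads are listed.
interface RemoteConfig {
  mode: RemoteMode;
  byok_api_key: string;
  auth_token: string;
}

// Start is allowed when local mode is selected or the active cloud mode has credentials.
function isCloudConfigured(remote: RemoteConfig): boolean {
  switch (remote.mode) {
    case "local":
      return true;
    case "byok":
      return remote.byok_api_key.trim() !== "";
    case "managed":
      return remote.auth_token.trim() !== "";
    default:
      // Unknown modes are treated as not configured, as in the original expression.
      return false;
  }
}
```

Because the default mode changes to `"byok"` elsewhere in this compare, a fresh install with an empty key evaluates to `false` and the Start button stays disabled until a key is entered.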

@@ -27,6 +27,10 @@
let showTimestamps = $state(true);
let fadeSeconds = $state(10);
let maxLines = $state(100);
let fontSource = $state("System Font");
let fontFamily = $state("Courier");
let websafeFont = $state("Arial");
let googleFont = $state("Roboto");
let fontSize = $state(12);
let userColor = $state("#4CAF50");
let textColor = $state("#FFFFFF");
@@ -42,6 +46,14 @@
let managedPassword = $state("");
let autoCheckUpdates = $state(true);
let isCloudMode = $derived(remoteMode === "managed" || remoteMode === "byok");
// Room creation / join state
let shareCode = $state("");
let joinCode = $state("");
let roomCreating = $state(false);
let roomCreateMessage = $state("");
let saving = $state(false);
let saveMessage = $state("");
@@ -99,6 +111,10 @@
showTimestamps = cfg.display.show_timestamps;
fadeSeconds = cfg.display.fade_after_seconds;
maxLines = cfg.display.max_lines;
fontSource = cfg.display.font_source ?? "System Font";
fontFamily = cfg.display.font_family ?? "Courier";
websafeFont = cfg.display.websafe_font ?? "Arial";
googleFont = cfg.display.google_font ?? "Roboto";
fontSize = cfg.display.font_size;
userColor = cfg.display.user_color;
textColor = cfg.display.text_color;
@@ -174,6 +190,10 @@
show_timestamps: showTimestamps,
fade_after_seconds: fadeSeconds,
max_lines: maxLines,
font_source: fontSource,
font_family: fontFamily,
websafe_font: websafeFont,
google_font: googleFont,
font_size: fontSize,
user_color: userColor,
text_color: textColor,
@@ -187,7 +207,7 @@
},
remote: {
mode: remoteMode,
server_url: remoteServerUrl,
server_url: remoteServerUrl || MANAGED_SERVER_URL,
byok_api_key: byokApiKey,
},
updates: {
@@ -220,25 +240,133 @@
}
}
async function handleChangeSidecar() {
try {
const { invoke } = await import("@tauri-apps/api/core");
await invoke("reset_sidecar");
// Force a page reload which will re-trigger the setup flow
window.location.reload();
} catch (err) {
console.error("Failed to reset sidecar:", err);
saveMessage = `Error: ${err}`;
}
}
const MANAGED_SERVER_URL = "https://transcribe.shadowdao.com";
async function handleManagedLogin() {
try {
await backendStore.apiPost("/api/login", {
email: managedEmail,
password: managedPassword,
server_url: remoteServerUrl || MANAGED_SERVER_URL,
});
} catch (err) {
console.error("Login failed:", err);
}
}
async function handleManagedRegister() {
const CAPTION_SERVER = "https://caption.shadowdao.com";
function generateRandomName(): string {
const adjectives = ['swift', 'bright', 'cosmic', 'electric', 'turbo', 'mega', 'ultra', 'super', 'hyper', 'alpha'];
const nouns = ['phoenix', 'dragon', 'tiger', 'falcon', 'comet', 'storm', 'blaze', 'thunder', 'frost', 'nebula'];
const num = Math.floor(Math.random() * 10000);
return `${adjectives[Math.floor(Math.random() * adjectives.length)]}-${nouns[Math.floor(Math.random() * nouns.length)]}-${num}`;
}
function generateRandomPassphrase(): string {
const chars = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
let result = '';
for (let i = 0; i < 16; i++) {
result += chars.charAt(Math.floor(Math.random() * chars.length));
}
return result;
}
function encodeShareCode(url: string, room: string, passphrase: string): string {
return btoa(JSON.stringify({ url, room, passphrase }));
}
function decodeShareCode(code: string): { url: string; room: string; passphrase: string } | null {
try {
await backendStore.apiPost("/api/register", {
email: managedEmail,
password: managedPassword,
const json = JSON.parse(atob(code.trim()));
if (json.url && json.room && json.passphrase) {
return json;
}
return null;
} catch {
return null;
}
}
async function handleCreateRoom() {
roomCreating = true;
roomCreateMessage = "";
shareCode = "";
const room = generateRandomName();
const passphrase = generateRandomPassphrase();
const serverSendUrl = `${CAPTION_SERVER}/api/send`;
try {
const resp = await fetch(`${CAPTION_SERVER}/api/create-room`, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ room, passphrase }),
});
if (!resp.ok) {
const err = await resp.json().catch(() => ({ error: "Request failed" }));
roomCreateMessage = `Error: ${err.error || resp.statusText}`;
return;
}
syncUrl = serverSendUrl;
syncRoom = room;
syncPassphrase = passphrase;
syncEnabled = true;
shareCode = encodeShareCode(serverSendUrl, room, passphrase);
roomCreateMessage = "Room created! Share the code below with others.";
} catch (err) {
console.error("Register failed:", err);
roomCreateMessage = `Error: ${err instanceof Error ? err.message : String(err)}`;
} finally {
roomCreating = false;
}
}
function handleJoinRoom() {
const decoded = decodeShareCode(joinCode);
if (!decoded) {
roomCreateMessage = "Invalid share code. Please check and try again.";
return;
}
syncUrl = decoded.url;
syncRoom = decoded.room;
syncPassphrase = decoded.passphrase;
syncEnabled = true;
joinCode = "";
roomCreateMessage = "Room joined! Fields have been auto-filled.";
}
async function handleShareCurrentRoom() {
const code = encodeShareCode(syncUrl, syncRoom, syncPassphrase);
shareCode = code;
try {
await navigator.clipboard.writeText(code);
roomCreateMessage = "Share code copied to clipboard!";
} catch {
roomCreateMessage = "Share code generated. Copy it from the field below.";
}
}
async function copyShareCode() {
try {
await navigator.clipboard.writeText(shareCode);
roomCreateMessage = "Share code copied to clipboard!";
} catch {
roomCreateMessage = "Failed to copy. Please select and copy manually.";
}
}
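Review note: `encodeShareCode` passes `JSON.stringify(...)` straight to `btoa`, which throws for characters outside Latin-1. Generated room names are ASCII, but "Share Current Room" encodes whatever the user typed. The sketch below is one possible Unicode-safe variant, not the code in this diff. It encodes to UTF-8 bytes first and validates field types on decode. The `ShareCode` interface name is an assumption.

```typescript
interface ShareCode {
  url: string;
  room: string;
  passphrase: string;
}

// Base64 of the UTF-8 bytes of the JSON payload (identical to the original for ASCII input).
function encodeShareCode(code: ShareCode): string {
  const json = JSON.stringify({ url: code.url, room: code.room, passphrase: code.passphrase });
  const bytes = new TextEncoder().encode(json);
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Returns null for anything that is not a well-formed share code.
function decodeShareCode(text: string): ShareCode | null {
  try {
    const binary = atob(text.trim());
    const bytes = new Uint8Array(binary.length);
    for (let i = 0; i < binary.length; i++) {
      bytes[i] = binary.charCodeAt(i);
    }
    const parsed: unknown = JSON.parse(new TextDecoder().decode(bytes));
    if (typeof parsed !== "object" || parsed === null) return null;
    const { url, room, passphrase } = parsed as Record<string, unknown>;
    if (typeof url !== "string" || typeof room !== "string" || typeof passphrase !== "string") {
      return null;
    }
    if (!url || !room || !passphrase) return null;
    return { url, room, passphrase };
  } catch {
    return null;
  }
}
```

Base64 is an encoding, not encryption: anyone holding the share code can read the passphrase, so the code should be treated like the passphrase itself.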
@@ -303,7 +431,83 @@
</div>
</section>
<!-- Transcription Settings -->
<!-- Remote Transcription (moved up for cloud-first UX) -->
<section class="settings-section">
<h3>Transcription Mode</h3>
<div class="radio-group">
<label>
<input
type="radio"
name="remote-mode"
value="byok"
bind:group={remoteMode}
/>
Cloud (Deepgram)
</label>
<label>
<input
type="radio"
name="remote-mode"
value="managed"
bind:group={remoteMode}
/>
Managed Service
</label>
<label>
<input
type="radio"
name="remote-mode"
value="local"
bind:group={remoteMode}
/>
Local (Whisper)
</label>
</div>
{#if remoteMode === "byok"}
<div class="field">
<label for="byok-key">Deepgram API Key</label>
<input
id="byok-key"
type="password"
bind:value={byokApiKey}
placeholder="Enter your Deepgram API key"
/>
<p style="font-size: 11px; color: var(--text-muted); margin-top: 4px;">
Get a key at <a href="https://console.deepgram.com" target="_blank" rel="noopener" style="color: var(--accent-blue);">console.deepgram.com</a>
</p>
</div>
{/if}
{#if remoteMode === "managed"}
<div class="managed-auth">
<div class="field">
<label for="managed-email">Email</label>
<input
id="managed-email"
type="email"
bind:value={managedEmail}
placeholder="email@example.com"
/>
</div>
<div class="field">
<label for="managed-password">Password</label>
<input
id="managed-password"
type="password"
bind:value={managedPassword}
/>
</div>
<div class="auth-buttons">
<button onclick={handleManagedLogin}>Login</button>
</div>
<p style="font-size: 11px; color: var(--text-muted); margin-top: 8px;">
Don't have an account? <a href="https://transcribe.shadowdao.com/register.html" target="_blank" rel="noopener" style="color: var(--accent-blue);">Sign up here</a>
</p>
</div>
{/if}
</section>
{#if !isCloudMode}
<!-- Transcription Settings (local Whisper only) -->
<section class="settings-section">
<h3>Transcription Settings</h3>
<div class="field">
@@ -449,6 +653,7 @@
/>
</div>
</section>
{/if}
<!-- Display Settings -->
<section class="settings-section">
@@ -485,6 +690,95 @@
bind:value={maxLines}
/>
</div>
<div class="field">
<label for="font-source">Font Source</label>
<select id="font-source" bind:value={fontSource}>
<option value="System Font">System Font</option>
<option value="Web-Safe">Web-Safe</option>
<option value="Google Font">Google Font</option>
</select>
</div>
{#if fontSource === "System Font"}
<div class="field">
<label for="font-family">System Font Family</label>
<input id="font-family" type="text" bind:value={fontFamily} placeholder="Courier" />
</div>
{/if}
{#if fontSource === "Web-Safe"}
<div class="field">
<label for="websafe-font">Web-Safe Font</label>
<select id="websafe-font" bind:value={websafeFont}>
<option value="Arial">Arial</option>
<option value="Arial Black">Arial Black</option>
<option value="Comic Sans MS">Comic Sans MS</option>
<option value="Courier New">Courier New</option>
<option value="Georgia">Georgia</option>
<option value="Impact">Impact</option>
<option value="Lucida Console">Lucida Console</option>
<option value="Lucida Sans Unicode">Lucida Sans Unicode</option>
<option value="Palatino Linotype">Palatino Linotype</option>
<option value="Tahoma">Tahoma</option>
<option value="Times New Roman">Times New Roman</option>
<option value="Trebuchet MS">Trebuchet MS</option>
<option value="Verdana">Verdana</option>
</select>
</div>
{/if}
{#if fontSource === "Google Font"}
<div class="field">
<label for="google-font">Google Font</label>
<select id="google-font" bind:value={googleFont}>
<optgroup label="Sans Serif">
<option value="Roboto">Roboto</option>
<option value="Open Sans">Open Sans</option>
<option value="Lato">Lato</option>
<option value="Montserrat">Montserrat</option>
<option value="Poppins">Poppins</option>
<option value="Nunito">Nunito</option>
<option value="Raleway">Raleway</option>
<option value="Ubuntu">Ubuntu</option>
<option value="Rubik">Rubik</option>
<option value="Work Sans">Work Sans</option>
<option value="Inter">Inter</option>
<option value="Outfit">Outfit</option>
<option value="Quicksand">Quicksand</option>
<option value="Comfortaa">Comfortaa</option>
<option value="Varela Round">Varela Round</option>
</optgroup>
<optgroup label="Serif">
<option value="Playfair Display">Playfair Display</option>
<option value="Merriweather">Merriweather</option>
<option value="Lora">Lora</option>
<option value="PT Serif">PT Serif</option>
<option value="Crimson Text">Crimson Text</option>
</optgroup>
<optgroup label="Monospace">
<option value="Roboto Mono">Roboto Mono</option>
<option value="Source Code Pro">Source Code Pro</option>
<option value="Fira Code">Fira Code</option>
<option value="JetBrains Mono">JetBrains Mono</option>
<option value="IBM Plex Mono">IBM Plex Mono</option>
</optgroup>
<optgroup label="Display">
<option value="Bebas Neue">Bebas Neue</option>
<option value="Oswald">Oswald</option>
<option value="Righteous">Righteous</option>
<option value="Bangers">Bangers</option>
<option value="Permanent Marker">Permanent Marker</option>
</optgroup>
<optgroup label="Handwriting">
<option value="Pacifico">Pacifico</option>
<option value="Lobster">Lobster</option>
<option value="Dancing Script">Dancing Script</option>
<option value="Caveat">Caveat</option>
<option value="Satisfy">Satisfy</option>
</optgroup>
</select>
<p style="font-size: 11px; color: var(--text-muted); margin-top: 4px;">
Browse more at <a href="https://fonts.google.com" target="_blank" rel="noopener" style="color: var(--accent-blue);">fonts.google.com</a>
</p>
</div>
{/if}
<div class="field">
<label for="font-size">Font Size: {fontSize}px</label>
<input
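Review note: the settings now store a font source plus one font name per source. This diff does not show how the display picks between them, so the following is only a sketch of one plausible resolution rule. `FontSettings` and `effectiveFontFamily` are hypothetical names; the option values and defaults are the ones used in the hunks above.

```typescript
type FontSource = "System Font" | "Web-Safe" | "Google Font";

// Assumed subset of the display config; every field may be absent in older configs.
interface FontSettings {
  font_source?: FontSource;
  font_family?: string;
  websafe_font?: string;
  google_font?: string;
}

// Picks the font name that belongs to the selected source, with the same defaults as the form.
function effectiveFontFamily(display: FontSettings): string {
  const source = display.font_source ?? "System Font";
  switch (source) {
    case "Web-Safe":
      return display.websafe_font ?? "Arial";
    case "Google Font":
      return display.google_font ?? "Roboto";
    default:
      return display.font_family ?? "Courier";
  }
}
```

Keeping the three names side by side means switching the source back and forth does not lose the previous selection.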
@@ -515,11 +809,11 @@
</div>
</section>
<!-- Server Sync -->
<!-- Server Sync (Shared Captions) -->
<section class="settings-section">
<h3>Server Sync</h3>
<h3>Shared Captions</h3>
<div class="field-row">
<label for="sync-enabled">Enable Server Sync</label>
<label for="sync-enabled">Enable Shared Captions</label>
<input
id="sync-enabled"
type="checkbox"
@@ -527,13 +821,57 @@
/>
</div>
{#if syncEnabled}
<div class="room-actions">
<div class="room-buttons-row">
<button
onclick={handleCreateRoom}
disabled={roomCreating}
class="secondary"
>
{roomCreating ? "Creating..." : "Create Room"}
</button>
<button
onclick={handleShareCurrentRoom}
disabled={!syncUrl.trim() || !syncRoom.trim() || !syncPassphrase.trim()}
class="secondary"
>
Share Current Room
</button>
</div>
<div class="join-row">
<input
type="text"
bind:value={joinCode}
placeholder="Paste share code to join"
class="join-input"
/>
<button onclick={handleJoinRoom} disabled={!joinCode.trim()} class="secondary">
Join
</button>
</div>
</div>
{#if roomCreateMessage}
<p class="room-message" class:error={roomCreateMessage.startsWith("Error")}>{roomCreateMessage}</p>
{/if}
{#if shareCode}
<div class="share-code-box">
<label>Share Code</label>
<div class="share-code-row">
<input type="text" value={shareCode} readonly class="share-code-input" />
<button onclick={copyShareCode} class="secondary">Copy</button>
</div>
</div>
{/if}
<div class="field">
<label for="sync-url">Server URL</label>
<input
id="sync-url"
type="url"
bind:value={syncUrl}
placeholder="http://localhost:3000/api/send"
placeholder="https://caption.shadowdao.com/api/send"
/>
</div>
<div class="field">
@@ -551,90 +889,6 @@
{/if}
</section>
<!-- Remote Transcription -->
<section class="settings-section">
<h3>Remote Transcription</h3>
<div class="radio-group">
<label>
<input
type="radio"
name="remote-mode"
value="local"
bind:group={remoteMode}
/>
Local
</label>
<label>
<input
type="radio"
name="remote-mode"
value="managed"
bind:group={remoteMode}
/>
Managed
</label>
<label>
<input
type="radio"
name="remote-mode"
value="byok"
bind:group={remoteMode}
/>
BYOK (Bring Your Own Key)
</label>
</div>
{#if remoteMode === "managed"}
<div class="field">
<label for="remote-url">Server URL</label>
<input
id="remote-url"
type="url"
bind:value={remoteServerUrl}
placeholder="wss://your-proxy.com"
/>
</div>
{/if}
{#if remoteMode === "byok"}
<div class="field">
<label for="byok-key">Deepgram API Key</label>
<input
id="byok-key"
type="password"
bind:value={byokApiKey}
placeholder="Enter your Deepgram API key"
/>
<p style="font-size: 11px; color: var(--text-muted); margin-top: 4px;">
Get a key at <a href="https://console.deepgram.com" target="_blank" rel="noopener" style="color: var(--accent-blue);">console.deepgram.com</a>
</p>
</div>
{/if}
{#if remoteMode === "managed"}
<div class="managed-auth">
<div class="field">
<label for="managed-email">Email</label>
<input
id="managed-email"
type="email"
bind:value={managedEmail}
placeholder="email@example.com"
/>
</div>
<div class="field">
<label for="managed-password">Password</label>
<input
id="managed-password"
type="password"
bind:value={managedPassword}
/>
</div>
<div class="auth-buttons">
<button onclick={handleManagedLogin}>Login</button>
<button onclick={handleManagedRegister}>Register</button>
</div>
</div>
{/if}
</section>
<!-- Updates -->
<section class="settings-section">
<h3>Updates</h3>
@@ -648,6 +902,17 @@
</div>
<button onclick={handleCheckUpdates}>Check Now</button>
</section>
<!-- Transcription Engine -->
<section class="settings-section">
<h3>Transcription Engine</h3>
<p style="font-size: 12px; color: var(--text-secondary); margin-bottom: 12px;">
Switch between local (Whisper) and cloud (Deepgram) transcription engines.
This will stop the current engine, remove the downloaded files, and restart
with the new engine selection.
</p>
<button class="danger-btn" onclick={handleChangeSidecar}>Change Transcription Engine</button>
</section>
</div>
<div class="settings-footer">
@@ -818,4 +1083,90 @@
.save-message.error {
color: #f44336;
}
.room-actions {
display: flex;
flex-direction: column;
gap: 8px;
margin-bottom: 12px;
}
.room-buttons-row {
display: flex;
gap: 8px;
}
.join-row {
display: flex;
gap: 8px;
}
.join-input {
flex: 1;
}
.room-message {
font-size: 12px;
color: #4CAF50;
margin-bottom: 8px;
}
.room-message.error {
color: #f44336;
}
.share-code-box {
margin-bottom: 12px;
}
.share-code-box label {
display: block;
margin-bottom: 4px;
font-size: 12px;
color: var(--text-secondary);
}
.share-code-row {
display: flex;
gap: 8px;
}
.share-code-input {
flex: 1;
font-size: 11px;
font-family: monospace;
}
.secondary {
background: transparent;
border: 1px solid var(--border-color);
color: var(--text-primary);
padding: 6px 12px;
border-radius: 6px;
cursor: pointer;
font-size: 13px;
}
.secondary:hover {
background: var(--bg-tertiary);
}
.secondary:disabled {
opacity: 0.5;
cursor: not-allowed;
}
.danger-btn {
background: transparent;
border: 1px solid var(--accent-red, #f44336);
color: var(--accent-red, #f44336);
padding: 8px 16px;
border-radius: 6px;
cursor: pointer;
font-size: 13px;
}
.danger-btn:hover {
background: rgba(244, 67, 54, 0.1);
}
</style>
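Review note: `generateRandomPassphrase` in this file draws characters with `Math.random()`, which is not a cryptographically secure generator. Since the passphrase protects the shared room, a variant built on `crypto.getRandomValues` may be worth considering. The sketch below is an assumption, not the code in this diff. It uses the same 62-character alphabet and rejection sampling to avoid modulo bias, and the `randomBytes` parameter exists only so the function can be tested deterministically.

```typescript
const PASSPHRASE_CHARS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

// Source of random bytes; injectable for tests. The default uses the Web Crypto API.
type RandomBytes = (count: number) => Uint8Array;

function generatePassphrase(
  length = 16,
  randomBytes: RandomBytes = (count) => crypto.getRandomValues(new Uint8Array(count)),
): string {
  const n = PASSPHRASE_CHARS.length;
  // Bytes at or above this limit are discarded so every character is equally likely.
  const limit = 256 - (256 % n);
  let out = "";
  while (out.length < length) {
    randomBytes(length * 2).forEach((b) => {
      if (b < limit && out.length < length) {
        out += PASSPHRASE_CHARS.charAt(b % n);
      }
    });
  }
  return out;
}
```

In the app, `generatePassphrase()` with no arguments would be the drop-in call; the two-argument form is intended for tests only.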

@@ -36,11 +36,12 @@
try {
// Listen for progress events from the Tauri backend
unlisten = await listen<{ progress: number; message: string }>(
unlisten = await listen<{ downloaded: number; total: number; phase: string; message: string }>(
"sidecar-download-progress",
(event) => {
progress = event.payload.progress;
progressMessage = event.payload.message;
const { downloaded, total, message } = event.payload;
progress = total > 0 ? (downloaded / total) * 100 : 0;
progressMessage = message;
}
);
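Review note: the progress event payload changes from a precomputed `progress` value to `downloaded` and `total` counts, and the percentage is now derived in the listener. A small sketch of that calculation with clamping added follows. The clamp to 100 and the guard against non-numeric values are assumptions beyond what the diff does; `DownloadProgress` and `toPercent` are hypothetical names.

```typescript
// Payload shape as declared in the new listener signature.
interface DownloadProgress {
  downloaded: number;
  total: number;
  phase: string;
  message: string;
}

// Percentage in the range 0..100; returns 0 while the total is unknown.
function toPercent(p: Pick<DownloadProgress, "downloaded" | "total">): number {
  // The negated comparisons also reject NaN.
  if (!(p.total > 0) || !(p.downloaded >= 0)) return 0;
  return Math.min(100, (p.downloaded / p.total) * 100);
}
```

The `total > 0` guard matters at the start of a download, when the size may not have been reported yet and a plain division would yield `NaN` or `Infinity`.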
@@ -84,11 +85,29 @@
{#if setupState === "choose"}
<p class="setup-description">
The app needs to download its transcription engine before you can start.
Choose the version that best fits your hardware.
Choose a transcription engine. You can change this later in Settings.
</p>
<div class="variant-options">
<label class="variant-option" class:selected={variant === "cloud"}>
<input
type="radio"
name="variant"
value="cloud"
bind:group={variant}
/>
<div class="variant-info">
<span class="variant-name">Cloud (Deepgram)</span>
<span class="variant-desc">~50 MB download</span>
<span class="variant-detail">
Fast, accurate streaming transcription via Deepgram's servers.
Requires internet and a Deepgram API key.
Best for most users — low resource usage, works on any hardware.
</span>
<span class="variant-tag recommended">Recommended</span>
</div>
</label>
<label class="variant-option" class:selected={variant === "cpu"}>
<input
type="radio"
@@ -97,23 +116,16 @@
bind:group={variant}
/>
<div class="variant-info">
<span class="variant-name">Standard (CPU)</span>
<span class="variant-desc">Works on all computers (~500 MB download)</span>
<span class="variant-name">Local - CPU</span>
<span class="variant-desc">~500 MB download</span>
<span class="variant-detail">
Runs Whisper AI models locally on your CPU. No internet needed
after download. Good for privacy or offline use, but slower and
uses more system resources than cloud.
</span>
</div>
</label>
<label class="variant-option" class:selected={variant === "cuda"}>
<input
type="radio"
name="variant"
value="cuda"
bind:group={variant}
/>
<div class="variant-info">
<span class="variant-name">GPU Accelerated (CUDA)</span>
<span class="variant-desc">Faster transcription with NVIDIA GPU (~2 GB download)</span>
</div>
</label>
</div>
<button class="download-btn" onclick={startDownload}>
@@ -260,6 +272,30 @@
color: #888;
}
.variant-detail {
font-size: 11px;
color: #666;
line-height: 1.4;
margin-top: 2px;
}
.variant-tag {
display: inline-block;
font-size: 10px;
font-weight: 600;
text-transform: uppercase;
letter-spacing: 0.5px;
padding: 2px 6px;
border-radius: 3px;
margin-top: 4px;
width: fit-content;
}
.variant-tag.recommended {
background: rgba(76, 175, 80, 0.15);
color: #4CAF50;
}
.download-btn {
display: block;
width: 100%;

@@ -302,6 +302,7 @@ export const backendStore = {
setPort,
connect: connectWebSocket,
disconnect,
pollStatus,
apiUrl,
apiFetch,
apiGet,

@@ -107,7 +107,7 @@ function getDefaultConfig(): AppConfig {
},
server_sync: {
enabled: false,
url: "http://localhost:3000/api/send",
url: "",
room: "default",
passphrase: "",
},
@@ -128,7 +128,7 @@ function getDefaultConfig(): AppConfig {
},
web_server: { port: 8080, host: "127.0.0.1" },
remote: {
mode: "local",
mode: "byok",
server_url: "",
auth_token: "",
byok_api_key: "",
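Review note: `server_url` still defaults to an empty string here, and the settings component substitutes the managed URL with `remoteServerUrl || MANAGED_SERVER_URL` at save and login time. A sketch of that fallback as one shared helper follows. `resolveServerUrl` is a hypothetical name, and trimming whitespace is an assumption, since the `||` in the diff would pass a whitespace-only value through unchanged.

```typescript
// Constant value taken from the settings component in this compare.
const MANAGED_SERVER_URL = "https://transcribe.shadowdao.com";

// Returns the configured URL, or the managed default when nothing usable is stored.
function resolveServerUrl(configured: string | null | undefined): string {
  const trimmed = (configured ?? "").trim();
  return trimmed !== "" ? trimmed : MANAGED_SERVER_URL;
}
```

Routing both the save path and the login path through one helper would keep the two fallbacks from drifting apart.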

@@ -1,7 +1,7 @@
"""Version information for Local Transcription."""
__version__ = "2.0.3"
__version_info__ = (2, 0, 3)
__version__ = "2.0.16"
__version_info__ = (2, 0, 16)
# Version history:
# 1.4.0 - Auto-update feature: