Update README and CLAUDE.md for Tauri rewrite

Update both docs to reflect the new architecture:
- Tauri v2 + Svelte 5 frontend replacing PySide6/Qt
- Headless Python backend with FastAPI control API
- Cross-platform support (Windows, macOS, Linux)
- Deepgram remote transcription (managed/BYOK)
- Gitea CI/CD workflows for automated builds
- New project structure with backend/, src/, src-tauri/
- Updated development commands and build instructions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Developer, 2026-04-06 13:34:10 -07:00
parent 25d2a55efb
commit 47ca74e75d
2 changed files with 342 additions and 295 deletions

README.md
# Local Transcription
A real-time speech-to-text desktop application for streamers. Runs locally on your machine with GPU or CPU, displays transcriptions via OBS browser source, and optionally syncs with other users through a multi-user server.
**Version 1.4.0**
## Features
- **Real-Time Transcription**: Live speech-to-text using Whisper models with minimal latency
- **Cross-Platform**: Native desktop app for Windows, macOS, and Linux via [Tauri](https://tauri.app/)
- **Dual Transcription Modes**: Local (Whisper) or cloud (Deepgram) with managed billing or BYOK
- **CPU & GPU Support**: Automatic detection of CUDA (NVIDIA), MPS (Apple Silicon), or CPU fallback
- **Advanced Voice Detection**: Dual-layer VAD (WebRTC + Silero) for accurate speech detection
- **OBS Integration**: Built-in web server for browser source capture at `http://localhost:8080`
- **Customizable Colors**: User-configurable colors for name, text, and background
- **Noise Suppression**: Built-in audio preprocessing to reduce background noise
- **Auto-Updates**: Automatic update checking with release notes display
## Architecture
The application uses a two-process architecture:
1. **Tauri Shell** (Svelte 5 frontend) — lightweight native window (~50MB) rendering the UI
2. **Python Backend** (sidecar) — headless process running transcription, audio capture, and the OBS web server
The Tauri frontend communicates with the Python backend via REST API and WebSocket, following the same pattern as [voice-to-notes](https://repo.anhonesthost.net/MacroPad/voice-to-notes).
```
Tauri App (user launches this)
└─ Spawns Python backend as sidecar
├─ FastAPI REST API (control endpoints)
├─ WebSocket /ws/control (real-time state + transcriptions)
├─ OBS web display at http://localhost:8080
└─ Transcription engine (Whisper or Deepgram)
```
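The `/ws/control` channel carries both engine state changes and finished transcriptions. The exact message schema isn't documented here, so the field names below (`type`, `speaker`, `text`, `state`) are illustrative assumptions — `backend/api_server.py` is authoritative. A minimal client-side dispatcher might look like:

```python
import json

def handle_control_message(raw: str) -> str:
    """Dispatch a hypothetical /ws/control frame by its "type" field.

    The schema here is a sketch only -- check backend/api_server.py
    for the real message shapes.
    """
    msg = json.loads(raw)
    if msg.get("type") == "transcription":
        return f'{msg["speaker"]}: {msg["text"]}'
    if msg.get("type") == "state":
        return f'engine state: {msg["state"]}'
    return "ignored"

# Example frame a backend following this shape might emit:
frame = json.dumps({"type": "transcription", "speaker": "Dev", "text": "hello chat"})
print(handle_control_message(frame))  # Dev: hello chat
```

The same dispatch pattern applies whether the consumer is the Svelte frontend or a custom script listening alongside it.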
> **Legacy GUI**: The original PySide6/Qt desktop GUI (`main.py`) still works alongside the new Tauri frontend during the transition period.
## Quick Start
### Running from Source
```bash
# Install Python dependencies
uv sync
# Run the Tauri app (frontend + backend)
npm install
npm run tauri dev
# Or run just the headless backend (for development)
uv run python -m backend.main_headless
# Or run the legacy PySide6 GUI
uv run python main.py
```
### Using Pre-Built Executables
Download the latest release from the [releases page](https://repo.anhonesthost.net/streamer-tools/local-transcription/releases):
- **App installer** (Tauri shell): `.msi` (Windows), `.dmg` (macOS), `.deb`/`.rpm`/`.AppImage` (Linux)
- **Sidecar** (Python backend): Download the matching `sidecar-*` zip for your platform (CUDA or CPU)
### Building from Source
```bash
# Build the Tauri app
npm install
npm run tauri build
# Output: src-tauri/target/release/bundle/

# Build the Python sidecar (headless, no Qt)
uv sync
uv run pyinstaller local-transcription-headless.spec
# Output: dist/local-transcription-backend/

# Build the legacy PySide6 app (Linux)
./build.sh

# Build the legacy PySide6 app (Windows)
build.bat
# Output: dist\LocalTranscription\LocalTranscription.exe
```
For detailed build instructions, see [BUILD.md](BUILD.md).
1. Launch the application
2. Select your microphone from the audio device dropdown
3. Choose a Whisper model (smaller = faster, larger = more accurate):
   - `tiny.en` / `tiny` - Fastest, good for quick captions
   - `base.en` / `base` - Balanced speed and accuracy
   - `small.en` / `small` - Better accuracy
   - `medium.en` / `medium` - High accuracy
   - `large-v3` - Best accuracy (requires more resources)
4. Click **Start** to begin transcription
5. Transcriptions appear in the main window and at `http://localhost:8080`
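Since settings and control actions are also exposed over the backend's REST API, the **Start** step above can be driven programmatically. The port, path, and payload below are assumptions for illustration — the real endpoints live in `backend/api_server.py`:

```python
import json
import urllib.request

# Hypothetical control call: port, path, and payload are guesses,
# not the documented API surface.
req = urllib.request.Request(
    "http://127.0.0.1:8000/api/start",
    data=json.dumps({"model": "base.en"}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would actually send it; here we only build it.
print(req.method, req.full_url)  # POST http://127.0.0.1:8000/api/start
```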
### Remote Transcription (Deepgram)
Instead of local Whisper models, you can use cloud-based transcription:
- **Managed mode**: Sign up via the transcription proxy for metered billing
- **BYOK mode**: Bring your own Deepgram API key for direct access
Configure in Settings > Remote Transcription.
### OBS Browser Source Setup
1. Start the Local Transcription app
## Configuration
Settings are stored at `~/.local-transcription/config.yaml` and can be modified through the GUI settings panel or the REST API.
### Key Settings
| Setting | Description | Default |
|---------|-------------|---------|
| `transcription.silero_sensitivity` | VAD sensitivity (0-1, lower = more sensitive) | `0.4` |
| `transcription.post_speech_silence_duration` | Silence before finalizing (seconds) | `0.3` |
| `transcription.continuous_mode` | Fast speaker mode for quick talkers | `false` |
| `remote.mode` | Transcription mode (local/managed/byok) | `local` |
| `display.show_timestamps` | Show timestamps with transcriptions | `true` |
| `display.fade_after_seconds` | Fade out time (0 = never) | `10` |
| `display.font_source` | Font type (System Font/Web-Safe/Google Font/Custom File) | `System Font` |
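Put together, a `config.yaml` that overrides the defaults above might look like this. The nesting is inferred from the dotted setting paths in the table; see [config/default_config.yaml](config/default_config.yaml) for the authoritative layout:

```yaml
transcription:
  silero_sensitivity: 0.4
  post_speech_silence_duration: 0.3
  continuous_mode: false
remote:
  mode: local            # local | managed | byok
display:
  show_timestamps: true
  fade_after_seconds: 10 # 0 = never fade
  font_source: System Font
```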
```
local-transcription/
├── src/                              # Svelte 5 frontend (Tauri UI)
│   ├── App.svelte                    # Main app shell
│   ├── lib/components/               # UI components
│   │   ├── Header.svelte
│   │   ├── StatusBar.svelte
│   │   ├── Controls.svelte
│   │   ├── TranscriptionDisplay.svelte
│   │   └── Settings.svelte
│   └── lib/stores/                   # Reactive state management
│       ├── backend.ts                # WebSocket + REST API client
│       ├── config.ts                 # App configuration
│       └── transcriptions.ts         # Transcription data
├── src-tauri/                        # Tauri v2 Rust shell
│   ├── src/main.rs
│   └── tauri.conf.json
├── backend/                          # Headless Python backend (sidecar)
│   ├── app_controller.py             # Orchestration logic (engine, sync, config)
│   ├── api_server.py                 # FastAPI REST + WebSocket control API
│   └── main_headless.py              # Headless entry point
├── client/                           # Core transcription modules
│   ├── audio_capture.py              # Audio input handling
│   ├── transcription_engine_realtime.py  # RealtimeSTT / Whisper
│   ├── deepgram_transcription.py     # Deepgram cloud transcription
│   ├── noise_suppression.py          # VAD and noise reduction
│   ├── device_utils.py               # CPU/GPU/MPS detection
│   ├── config.py                     # Configuration management
│   ├── server_sync.py                # Multi-user server client
│   └── update_checker.py             # Auto-update functionality
├── gui/                              # Legacy PySide6/Qt GUI
│   ├── main_window_qt.py
│   ├── settings_dialog_qt.py
│   └── transcription_display_qt.py
├── server/                           # Web servers
│   ├── web_display.py                # Local FastAPI server for OBS
│   └── nodejs/                       # Multi-user sync server
├── .gitea/workflows/                 # CI/CD
│   ├── release.yml                   # Tauri app builds (all platforms)
│   └── build-sidecar.yml             # Python sidecar builds (CUDA + CPU)
├── config/
│   └── default_config.yaml           # Default settings template
├── main.py                           # Legacy GUI entry point
├── main_cli.py                       # CLI version (for testing)
├── local-transcription.spec          # PyInstaller config (legacy, with PySide6)
├── local-transcription-headless.spec # PyInstaller config (headless sidecar)
├── pyproject.toml                    # Python dependencies
└── package.json                      # Node.js / Tauri dependencies
```
## Technology Stack
### Frontend (Tauri)
- **Tauri v2** — Native cross-platform shell (Rust)
- **Svelte 5** — Reactive UI framework (TypeScript)
- **Vite** — Frontend build tool
### Backend (Python Sidecar)
- **Python 3.9+**
- **FastAPI + Uvicorn** — REST API and WebSocket server
- **RealtimeSTT** — Real-time speech-to-text with advanced VAD
- **faster-whisper** — Optimized Whisper model inference (CTranslate2)
- **PyTorch** — ML framework (CUDA-enabled builds available)
- **sounddevice** — Cross-platform audio capture
- **webrtcvad + silero_vad** — Voice activity detection
### Multi-User Server (Optional)
- **Node.js + Express + WebSocket** — Real-time sync server
### Build & CI/CD
- **PyInstaller** — Python sidecar packaging
- **Tauri CLI** — App bundling (.msi, .dmg, .deb, .rpm, .AppImage)
- **Gitea Actions** — Automated cross-platform builds
- **uv** — Fast Python package manager
## CI/CD
Two Gitea Actions workflows in `.gitea/workflows/`:
| Workflow | Trigger | Produces |
|----------|---------|----------|
| `release.yml` | Push to `main` | Tauri app installers for all platforms |
| `build-sidecar.yml` | Changes to `client/`, `server/`, `backend/`, or `pyproject.toml` | Python sidecar zips (CUDA + CPU) |
Both workflows require a `BUILD_TOKEN` secret in the repo settings (Gitea API token with release write access).
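As a sketch of the trigger wiring described in the table (the files in `.gitea/workflows/` are authoritative; this excerpt only illustrates the path filter for the sidecar workflow):

```yaml
# .gitea/workflows/build-sidecar.yml — illustrative excerpt, not the real file
on:
  push:
    paths:
      - "client/**"
      - "server/**"
      - "backend/**"
      - "pyproject.toml"
```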
### Release Artifacts
| Platform | App Installer | Sidecar (CUDA) | Sidecar (CPU) |
|----------|--------------|----------------|---------------|
| Linux x86_64 | `.deb`, `.rpm`, `.AppImage` | `sidecar-linux-x86_64-cuda.zip` | `sidecar-linux-x86_64-cpu.zip` |
| Windows x86_64 | `.msi`, `-setup.exe` | `sidecar-windows-x86_64-cuda.zip` | `sidecar-windows-x86_64-cpu.zip` |
| macOS ARM64 | `.dmg` | — | `sidecar-macos-aarch64-cpu.zip` |
## System Requirements
### Minimum
- Python 3.9+
- 4GB RAM
- Any modern CPU
### Recommended (for local real-time transcription)
- 8GB+ RAM
- NVIDIA GPU with CUDA support (for GPU acceleration)
- FFmpeg (installed automatically with dependencies)
### For Building
- **Tauri app**: Node.js 20+, Rust stable, platform SDK (see [Tauri prerequisites](https://tauri.app/start/prerequisites/))
- **Python sidecar**: Python 3.9+, uv, PyInstaller
- **Linux**: `libgtk-3-dev`, `libwebkit2gtk-4.1-dev`, `libappindicator3-dev`, `librsvg2-dev`, `patchelf`
- **Windows**: Visual Studio Build Tools, WebView2
- **macOS**: Xcode Command Line Tools
## Troubleshooting
# List available audio devices
uv run python main_cli.py --list-devices
```
- Ensure microphone permissions are granted (especially on macOS)
- Try different device indices in settings
### GPU Not Detected
# Check CUDA availability
uv run python -c "import torch; print(torch.cuda.is_available())"
```
- Install NVIDIA drivers (CUDA toolkit is bundled in CUDA sidecar builds)
- The app automatically falls back to CPU if no GPU is available
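The fallback order documented in the Features section (CUDA, then MPS on Apple Silicon, then CPU) can be sketched as a tiny helper. The function name and boolean flags are illustrative, not the app's actual API — see `client/device_utils.py` for the real detection logic:

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Mirror the documented fallback order: CUDA > MPS > CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"

# In the real app the flags would come from PyTorch, e.g.
#   torch.cuda.is_available() and torch.backends.mps.is_available()
print(pick_device(False, True))  # mps
```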
### Web Server Port Conflicts
- Default port is 8080; the app tries ports 8080-8084 automatically
- Change in settings or edit config file
- Check for conflicts: `lsof -i :8080` (Linux/macOS) or `netstat -ano | findstr :8080` (Windows)
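If you want to predict which port the display server will land on, the 8080-8084 scan can be reproduced with the standard library. This is a sketch of the same idea, not the app's actual implementation:

```python
import socket
from typing import Optional

def first_free_port(start: int = 8080, end: int = 8084) -> Optional[int]:
    """Return the first port in [start, end] that accepts a local bind, else None."""
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(("127.0.0.1", port))
            except OSError:
                continue  # port in use, try the next one
            return port
    return None

print(first_free_port())
```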
## Use Cases
- [OpenAI Whisper](https://github.com/openai/whisper) for the speech recognition model
- [RealtimeSTT](https://github.com/KoljaB/RealtimeSTT) for real-time transcription capabilities
- [faster-whisper](https://github.com/guillaumekln/faster-whisper) for optimized inference
- [Tauri](https://tauri.app/) for the cross-platform desktop framework
- [Deepgram](https://deepgram.com/) for cloud transcription API