Add unified per-speaker font support and remote transcription service
Font changes:
- Consolidate font settings into single Display Settings section
- Support Web-Safe, Google Fonts, and Custom File uploads for both displays
- Fix Google Fonts URL encoding (use + instead of %2B for spaces)
- Fix per-speaker font inline style quote escaping in Node.js display
- Add font debug logging to help diagnose font issues
- Update web server to sync all font settings on settings change
- Remove deprecated PHP server documentation files

New features:
- Add remote transcription service for GPU offloading
- Add instance lock to prevent multiple app instances
- Add version tracking

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
server/transcription-service/README.md: 173 lines added (new file)
# Remote Transcription Service

A standalone GPU-accelerated transcription service that accepts audio streams over WebSocket and returns transcriptions. Designed for offloading transcription processing from client machines to a GPU-equipped server.
## Features

- WebSocket-based audio streaming
- API key authentication
- GPU acceleration (CUDA)
- Multiple simultaneous clients
- Health check endpoints
## Requirements

- Python 3.10+
- NVIDIA GPU with CUDA support (recommended)
- 4GB+ VRAM for base model, 8GB+ for large models
## Installation

```bash
cd server/transcription-service

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Linux/Mac
# or: venv\Scripts\activate  # Windows

# Install dependencies
pip install -r requirements.txt

# For GPU support, install CUDA version of PyTorch
pip install torch --index-url https://download.pytorch.org/whl/cu121
```
## Configuration

Set environment variables before starting:

```bash
# Required: API key(s) for authentication
export TRANSCRIPTION_API_KEY="your-secret-key"

# Or multiple keys (comma-separated)
export TRANSCRIPTION_API_KEYS="key1,key2,key3"

# Optional: Model selection (default: base.en)
export TRANSCRIPTION_MODEL="base.en"
```
## Running

```bash
# Start the service
python server.py --host 0.0.0.0 --port 8765

# Or with custom model
python server.py --host 0.0.0.0 --port 8765 --model medium.en
```
## API Endpoints

### Health Check

```
GET /
GET /health
```

### WebSocket Transcription

```
WS /ws/transcribe
```
## WebSocket Protocol

1. **Authentication**

```json
// Client sends
{"type": "auth", "api_key": "your-key"}

// Server responds
{"type": "auth_result", "success": true, "message": "..."}
```

2. **Send Audio**

```json
// Client sends (audio as base64-encoded float32 numpy array)
{"type": "audio", "data": "base64...", "sample_rate": 16000}

// Server responds
{"type": "transcription", "text": "Hello world", "is_preview": false, "timestamp": "..."}
```

3. **Keep-alive**

```json
// Client sends
{"type": "ping"}

// Server responds
{"type": "pong"}
```

4. **Disconnect**

```json
// Client sends
{"type": "end"}
```
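The exchange above can be sketched as a minimal Python client. This is an illustration, not the app's bundled client: `audio_message` and `transcribe_once` are hypothetical helper names, and the third-party `websockets` package is an assumed dependency.

```python
import asyncio
import base64
import json
import struct


def audio_message(samples, sample_rate=16000):
    # Pack float32 samples (floats in [-1.0, 1.0]) into the "audio"
    # message shape from step 2. Hypothetical helper for illustration.
    raw = struct.pack(f"<{len(samples)}f", *samples)
    return json.dumps({
        "type": "audio",
        "data": base64.b64encode(raw).decode("ascii"),
        "sample_rate": sample_rate,
    })


async def transcribe_once(samples, url, api_key):
    # Requires the third-party `websockets` package (assumed; the app's
    # real client may use a different library).
    import websockets
    async with websockets.connect(url) as ws:
        # Step 1: authenticate
        await ws.send(json.dumps({"type": "auth", "api_key": api_key}))
        auth = json.loads(await ws.recv())
        if not auth.get("success"):
            raise RuntimeError(auth.get("message", "authentication failed"))
        # Step 2: send one audio chunk and read the transcription
        await ws.send(audio_message(samples))
        reply = json.loads(await ws.recv())
        # Step 4: disconnect cleanly
        await ws.send(json.dumps({"type": "end"}))
        return reply.get("text", "")
```

Usage: `asyncio.run(transcribe_once(samples, "ws://your-server:8765/ws/transcribe", "your-key"))`.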
## Client Integration

The Local Transcription app includes a remote transcription client. Configure in Settings:

1. Enable "Remote Processing"
2. Set Server URL: `ws://your-server:8765/ws/transcribe`
3. Enter your API key
## Deployment

### Docker

```dockerfile
FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY server.py .

ENV TRANSCRIPTION_MODEL=base.en
EXPOSE 8765

CMD ["python", "server.py", "--host", "0.0.0.0", "--port", "8765"]
```
### Systemd Service

```ini
[Unit]
Description=Remote Transcription Service
After=network.target

[Service]
Type=simple
User=transcription
WorkingDirectory=/opt/transcription-service
Environment=TRANSCRIPTION_API_KEY=your-key
Environment=TRANSCRIPTION_MODEL=base.en
ExecStart=/opt/transcription-service/venv/bin/python server.py
Restart=always

[Install]
WantedBy=multi-user.target
```
## Models

Available Whisper models (larger = better quality, slower):

| Model | Parameters | VRAM | Speed |
|-------|-----------|------|-------|
| tiny.en | 39M | ~1GB | Fastest |
| base.en | 74M | ~1GB | Fast |
| small.en | 244M | ~2GB | Moderate |
| medium.en | 769M | ~5GB | Slow |
| large-v3 | 1550M | ~10GB | Slowest |
## Security Notes

- Always use API key authentication in production
- Use HTTPS/WSS in production (via reverse proxy)
- Rate limit connections if needed
- Monitor GPU usage to prevent overload
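For the WSS recommendation above, a reverse proxy in front of the service can terminate TLS. A minimal nginx sketch, where the hostname and certificate paths are placeholders; the `Upgrade`/`Connection` headers are what make WebSocket proxying work:

```nginx
server {
    listen 443 ssl;
    server_name transcribe.example.com;

    ssl_certificate     /etc/ssl/certs/transcribe.pem;
    ssl_certificate_key /etc/ssl/private/transcribe.key;

    location / {
        proxy_pass http://127.0.0.1:8765;
        # Required for the WebSocket upgrade handshake
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        # Transcription sessions can be long-lived
        proxy_read_timeout 300s;
    }
}
```

Clients would then connect to `wss://transcribe.example.com/ws/transcribe` instead of the plain `ws://` URL.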