# Remote Transcription Service
A standalone GPU-accelerated transcription service that accepts audio streams over WebSocket and returns transcriptions. Designed for offloading transcription processing from client machines to a GPU-equipped server.
## Features
- WebSocket-based audio streaming
- API key authentication
- GPU acceleration (CUDA)
- Multiple simultaneous clients
- Health check endpoints
## Requirements
- Python 3.10+
- NVIDIA GPU with CUDA support (recommended)
- 4GB+ VRAM for base model, 8GB+ for large models
## Installation

```bash
cd server/transcription-service

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Linux/Mac
# or: venv\Scripts\activate  # Windows

# Install dependencies
pip install -r requirements.txt

# For GPU support, install the CUDA build of PyTorch
pip install torch --index-url https://download.pytorch.org/whl/cu121
```
## Configuration

Set environment variables before starting:

```bash
# Required: API key(s) for authentication
export TRANSCRIPTION_API_KEY="your-secret-key"

# Or multiple keys (comma-separated)
export TRANSCRIPTION_API_KEYS="key1,key2,key3"

# Optional: Model selection (default: base.en)
export TRANSCRIPTION_MODEL="base.en"
```
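Since keys may arrive via either variable, the service presumably merges them into one set. A minimal sketch of that lookup (the function name `load_api_keys` is illustrative, not taken from the actual `server.py`):

```python
import os


def load_api_keys() -> set[str]:
    """Collect API keys from TRANSCRIPTION_API_KEY and TRANSCRIPTION_API_KEYS.

    Illustrative only -- the real server.py may resolve keys differently.
    """
    keys = set()
    single = os.environ.get("TRANSCRIPTION_API_KEY", "")
    if single.strip():
        keys.add(single.strip())
    multi = os.environ.get("TRANSCRIPTION_API_KEYS", "")
    for key in multi.split(","):
        key = key.strip()
        if key:  # skip empty entries from stray commas
            keys.add(key)
    return keys
```

Stripping whitespace around each entry makes `"key1, key2"` behave the same as `"key1,key2"`.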
## Running

```bash
# Start the service
python server.py --host 0.0.0.0 --port 8765

# Or with custom model
python server.py --host 0.0.0.0 --port 8765 --model medium.en
```
## API Endpoints

### Health Check

```
GET /
GET /health
```

### WebSocket Transcription

```
WS /ws/transcribe
```
### WebSocket Protocol

1. **Authentication**

   ```json
   // Client sends
   {"type": "auth", "api_key": "your-key"}
   // Server responds
   {"type": "auth_result", "success": true, "message": "..."}
   ```

2. **Send Audio**

   ```json
   // Client sends (audio as base64-encoded float32 numpy array)
   {"type": "audio", "data": "base64...", "sample_rate": 16000}
   // Server responds
   {"type": "transcription", "text": "Hello world", "is_preview": false, "timestamp": "..."}
   ```

3. **Keep-alive**

   ```json
   // Client sends
   {"type": "ping"}
   // Server responds
   {"type": "pong"}
   ```

4. **Disconnect**

   ```json
   // Client sends
   {"type": "end"}
   ```
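On the client side, the audio payload has to be packed as base64 before it goes into the JSON message. A sketch of building the two main messages (the helper names are hypothetical, and the raw little-endian float32 byte layout is an assumption consistent with the "base64-encoded float32 numpy array" description above):

```python
import base64
import json

import numpy as np


def auth_message(api_key: str) -> str:
    # First message after connecting: authenticate with the API key.
    return json.dumps({"type": "auth", "api_key": api_key})


def audio_message(samples: np.ndarray, sample_rate: int = 16000) -> str:
    # Audio goes over the wire as base64 of the raw float32 sample bytes.
    data = base64.b64encode(samples.astype(np.float32).tobytes()).decode("ascii")
    return json.dumps({"type": "audio", "data": data, "sample_rate": sample_rate})


# Round-trip check: one second of silence at 16 kHz survives encode/decode
msg = json.loads(audio_message(np.zeros(16000, dtype=np.float32)))
decoded = np.frombuffer(base64.b64decode(msg["data"]), dtype=np.float32)
print(msg["type"], len(decoded))
```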
## Client Integration

The Local Transcription app includes a remote transcription client. Configure it in Settings:

- Enable "Remote Processing"
- Set the Server URL: `ws://your-server:8765/ws/transcribe`
- Enter your API key
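The app's built-in client handles the handshake automatically, but it is easy to implement by hand for other integrations. A sketch of the auth step with the transport injected as `send`/`recv` coroutines, so it works with any WebSocket library (names and structure are illustrative, not from the app's client code):

```python
import json


async def authenticate(send, recv, api_key: str) -> bool:
    """Run the auth handshake over an already-open WebSocket connection.

    send/recv are coroutines provided by the transport layer.
    """
    await send(json.dumps({"type": "auth", "api_key": api_key}))
    reply = json.loads(await recv())
    return reply.get("type") == "auth_result" and reply.get("success", False)
```

With the `websockets` package, `send`/`recv` would simply be the connection's `send` and `recv` methods.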
## Deployment

### Docker

```dockerfile
FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY server.py .

ENV TRANSCRIPTION_MODEL=base.en
EXPOSE 8765

CMD ["python", "server.py", "--host", "0.0.0.0", "--port", "8765"]
```
### Systemd Service

```ini
[Unit]
Description=Remote Transcription Service
After=network.target

[Service]
Type=simple
User=transcription
WorkingDirectory=/opt/transcription-service
Environment=TRANSCRIPTION_API_KEY=your-key
Environment=TRANSCRIPTION_MODEL=base.en
ExecStart=/opt/transcription-service/venv/bin/python server.py
Restart=always

[Install]
WantedBy=multi-user.target
```
## Models

Available Whisper models (larger = better quality, slower):
| Model | Parameters | VRAM | Speed |
|---|---|---|---|
| tiny.en | 39M | ~1GB | Fastest |
| base.en | 74M | ~1GB | Fast |
| small.en | 244M | ~2GB | Moderate |
| medium.en | 769M | ~5GB | Slow |
| large-v3 | 1550M | ~10GB | Slowest |
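As a rule of thumb, the largest model whose VRAM estimate fits the GPU can be chosen automatically. A sketch using the figures from the table above (the function name and thresholds are illustrative, not part of the service's actual startup logic):

```python
# Approximate VRAM needed per model, in GB, from the table above,
# ordered largest first so the best fitting model wins.
MODEL_VRAM_GB = [
    ("large-v3", 10),
    ("medium.en", 5),
    ("small.en", 2),
    ("base.en", 1),
    ("tiny.en", 1),
]


def pick_model(vram_gb: float) -> str:
    """Return the largest model whose VRAM estimate fits, else tiny.en."""
    for name, needed in MODEL_VRAM_GB:
        if vram_gb >= needed:
            return name
    return "tiny.en"  # CPU or very small GPU: fall back to the smallest model
```

For example, `pick_model(8)` skips `large-v3` (needs ~10 GB) and selects `medium.en`.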
## Security Notes
- Always use API key authentication in production
- Use HTTPS/WSS in production (via reverse proxy)
- Rate limit connections if needed
- Monitor GPU usage to prevent overload
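When validating API keys, avoid plain `==` string comparison, which can leak timing information about how much of a key matched; Python's `hmac.compare_digest` compares in constant time. A minimal sketch (the key set and function name are illustrative):

```python
import hmac

# Example key store -- in practice this would come from the environment.
VALID_KEYS = {"your-secret-key"}


def key_is_valid(candidate: str) -> bool:
    # Check against every key with a constant-time comparison rather
    # than returning early on the first differing character.
    return any(hmac.compare_digest(candidate, k) for k in VALID_KEYS)
```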