Add unified per-speaker font support and remote transcription service
Font changes:
- Consolidate font settings into single Display Settings section
- Support Web-Safe, Google Fonts, and Custom File uploads for both displays
- Fix Google Fonts URL encoding (use + instead of %2B for spaces)
- Fix per-speaker font inline style quote escaping in Node.js display
- Add font debug logging to help diagnose font issues
- Update web server to sync all font settings on settings change
- Remove deprecated PHP server documentation files

New features:
- Add remote transcription service for GPU offloading
- Add instance lock to prevent multiple app instances
- Add version tracking

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
server/transcription-service/README.md: 173 lines added (new file)
# Remote Transcription Service

A standalone GPU-accelerated transcription service that accepts audio streams over WebSocket and returns transcriptions. Designed for offloading transcription processing from client machines to a GPU-equipped server.
## Features

- WebSocket-based audio streaming
- API key authentication
- GPU acceleration (CUDA)
- Multiple simultaneous clients
- Health check endpoints
## Requirements

- Python 3.10+
- NVIDIA GPU with CUDA support (recommended)
- 4GB+ VRAM for base model, 8GB+ for large models
## Installation

```bash
cd server/transcription-service

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Linux/Mac
# or: venv\Scripts\activate  # Windows

# Install dependencies
pip install -r requirements.txt

# For GPU support, install CUDA version of PyTorch
pip install torch --index-url https://download.pytorch.org/whl/cu121
```
## Configuration

Set environment variables before starting:

```bash
# Required: API key(s) for authentication
export TRANSCRIPTION_API_KEY="your-secret-key"

# Or multiple keys (comma-separated)
export TRANSCRIPTION_API_KEYS="key1,key2,key3"

# Optional: Model selection (default: base.en)
export TRANSCRIPTION_MODEL="base.en"
```
## Running

```bash
# Start the service
python server.py --host 0.0.0.0 --port 8765

# Or with custom model
python server.py --host 0.0.0.0 --port 8765 --model medium.en
```
## API Endpoints

### Health Check

```
GET /
GET /health
```

### WebSocket Transcription

```
WS /ws/transcribe
```
## WebSocket Protocol

1. **Authentication**

```json
// Client sends
{"type": "auth", "api_key": "your-key"}

// Server responds
{"type": "auth_result", "success": true, "message": "..."}
```

2. **Send Audio**

```json
// Client sends (audio as base64-encoded float32 numpy array)
{"type": "audio", "data": "base64...", "sample_rate": 16000}

// Server responds
{"type": "transcription", "text": "Hello world", "is_preview": false, "timestamp": "..."}
```

3. **Keep-alive**

```json
// Client sends
{"type": "ping"}

// Server responds
{"type": "pong"}
```

4. **Disconnect**

```json
// Client sends
{"type": "end"}
```
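The exchange above can be sketched as a minimal Python client. This is an illustration, not the app's bundled client: `audio_message` and `transcribe_once` are hypothetical helper names, and the third-party `websockets` package is an assumed dependency.

```python
import asyncio
import base64
import json
import struct


def audio_message(samples, sample_rate=16000):
    # Pack float32 samples (floats in [-1.0, 1.0]) into the "audio"
    # message shape from step 2. Hypothetical helper for illustration.
    raw = struct.pack(f"<{len(samples)}f", *samples)
    return json.dumps({
        "type": "audio",
        "data": base64.b64encode(raw).decode("ascii"),
        "sample_rate": sample_rate,
    })


async def transcribe_once(samples, url, api_key):
    # Requires the third-party `websockets` package (assumed; the app's
    # real client may use a different library).
    import websockets
    async with websockets.connect(url) as ws:
        # Step 1: authenticate
        await ws.send(json.dumps({"type": "auth", "api_key": api_key}))
        auth = json.loads(await ws.recv())
        if not auth.get("success"):
            raise RuntimeError(auth.get("message", "authentication failed"))
        # Step 2: send one audio chunk and read the transcription
        await ws.send(audio_message(samples))
        reply = json.loads(await ws.recv())
        # Step 4: disconnect cleanly
        await ws.send(json.dumps({"type": "end"}))
        return reply.get("text", "")
```

Usage: `asyncio.run(transcribe_once(samples, "ws://your-server:8765/ws/transcribe", "your-key"))`.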
## Client Integration

The Local Transcription app includes a remote transcription client. Configure in Settings:

1. Enable "Remote Processing"
2. Set Server URL: `ws://your-server:8765/ws/transcribe`
3. Enter your API key
## Deployment

### Docker

```dockerfile
FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY server.py .

ENV TRANSCRIPTION_MODEL=base.en
EXPOSE 8765

CMD ["python", "server.py", "--host", "0.0.0.0", "--port", "8765"]
```
### Systemd Service

```ini
[Unit]
Description=Remote Transcription Service
After=network.target

[Service]
Type=simple
User=transcription
WorkingDirectory=/opt/transcription-service
Environment=TRANSCRIPTION_API_KEY=your-key
Environment=TRANSCRIPTION_MODEL=base.en
ExecStart=/opt/transcription-service/venv/bin/python server.py
Restart=always

[Install]
WantedBy=multi-user.target
```
## Models

Available Whisper models (larger = better quality, slower):

| Model | Parameters | VRAM | Speed |
|-------|-----------|------|-------|
| tiny.en | 39M | ~1GB | Fastest |
| base.en | 74M | ~1GB | Fast |
| small.en | 244M | ~2GB | Moderate |
| medium.en | 769M | ~5GB | Slow |
| large-v3 | 1550M | ~10GB | Slowest |
## Security Notes

- Always use API key authentication in production
- Use HTTPS/WSS in production (via reverse proxy)
- Rate limit connections if needed
- Monitor GPU usage to prevent overload
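For the WSS recommendation above, a reverse proxy in front of the service can terminate TLS. A minimal nginx sketch, where the hostname and certificate paths are placeholders; the `Upgrade`/`Connection` headers are what make WebSocket proxying work:

```nginx
server {
    listen 443 ssl;
    server_name transcribe.example.com;

    ssl_certificate     /etc/ssl/certs/transcribe.pem;
    ssl_certificate_key /etc/ssl/private/transcribe.key;

    location / {
        proxy_pass http://127.0.0.1:8765;
        # Required for the WebSocket upgrade handshake
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        # Transcription sessions can be long-lived
        proxy_read_timeout 300s;
    }
}
```

Clients would then connect to `wss://transcribe.example.com/ws/transcribe` instead of the plain `ws://` URL.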