# Remote Transcription Service

A standalone GPU-accelerated transcription service that accepts audio streams over WebSocket and returns transcriptions. It is designed to offload transcription processing from client machines to a GPU-equipped server.

## Features

- WebSocket-based audio streaming
- API key authentication
- GPU acceleration (CUDA)
- Multiple simultaneous clients
- Health check endpoints

## Requirements

- Python 3.10+
- NVIDIA GPU with CUDA support (recommended)
- 4GB+ VRAM for the base model, 8GB+ for large models

## Installation

```bash
cd server/transcription-service

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Linux/Mac
# or: venv\Scripts\activate  # Windows

# Install dependencies
pip install -r requirements.txt

# For GPU support, install the CUDA build of PyTorch
pip install torch --index-url https://download.pytorch.org/whl/cu121
```

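Before starting the service, it can be worth confirming that the CUDA build of PyTorch is actually usable. A quick, optional sanity check (not part of server.py; it degrades gracefully on CPU-only installs):

```python
# Optional sanity check: verify PyTorch can see a CUDA device.
# Works even if torch is missing or CPU-only.
try:
    import torch
    cuda_ok = torch.cuda.is_available()
    device = torch.cuda.get_device_name(0) if cuda_ok else "CPU"
except ImportError:
    cuda_ok, device = False, "CPU (torch not installed)"

print(f"CUDA available: {cuda_ok} ({device})")
```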
## Configuration

Set environment variables before starting:

```bash
# Required: API key(s) for authentication
export TRANSCRIPTION_API_KEY="your-secret-key"

# Or multiple keys (comma-separated)
export TRANSCRIPTION_API_KEYS="key1,key2,key3"

# Optional: Model selection (default: base.en)
export TRANSCRIPTION_MODEL="base.en"
```

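One plausible way the service could combine these variables — a sketch only; the actual parsing in server.py may differ, and `load_api_keys` is an illustrative name:

```python
import os

def load_api_keys() -> set[str]:
    """Collect allowed API keys from TRANSCRIPTION_API_KEY and/or
    the comma-separated TRANSCRIPTION_API_KEYS (whitespace stripped)."""
    keys = set()
    single = os.environ.get("TRANSCRIPTION_API_KEY", "").strip()
    if single:
        keys.add(single)
    multi = os.environ.get("TRANSCRIPTION_API_KEYS", "")
    keys.update(k.strip() for k in multi.split(",") if k.strip())
    return keys
```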
## Running

```bash
# Start the service
python server.py --host 0.0.0.0 --port 8765

# Or with custom model
python server.py --host 0.0.0.0 --port 8765 --model medium.en
```

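The flags above could be declared roughly as follows — a sketch of the interface, not the actual server.py parser; the defaults shown here are assumptions:

```python
import argparse

def build_arg_parser() -> argparse.ArgumentParser:
    # Flags shown in the commands above; defaults are illustrative.
    p = argparse.ArgumentParser(description="Remote Transcription Service")
    p.add_argument("--host", default="127.0.0.1", help="bind address")
    p.add_argument("--port", type=int, default=8765, help="listen port")
    p.add_argument("--model", default="base.en", help="Whisper model name")
    return p

args = build_arg_parser().parse_args(
    ["--host", "0.0.0.0", "--port", "8765", "--model", "medium.en"]
)
```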
## API Endpoints

### Health Check

```
GET /
GET /health
```

### WebSocket Transcription

```
WS /ws/transcribe
```

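A client or monitoring job can poll the health endpoint with nothing but the standard library. A minimal probe, assuming the default port; `check_health` is an illustrative helper, not part of the service:

```python
import urllib.request

def check_health(base_url: str = "http://localhost:8765",
                 timeout: float = 5.0) -> bool:
    """Return True if GET /health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP errors (URLError).
        return False
```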
## WebSocket Protocol

1. **Authentication**

   ```json
   // Client sends
   {"type": "auth", "api_key": "your-key"}

   // Server responds
   {"type": "auth_result", "success": true, "message": "..."}
   ```

2. **Send Audio**

   ```json
   // Client sends (audio as base64-encoded float32 numpy array)
   {"type": "audio", "data": "base64...", "sample_rate": 16000}

   // Server responds
   {"type": "transcription", "text": "Hello world", "is_preview": false, "timestamp": "..."}
   ```

3. **Keep-alive**

   ```json
   // Client sends
   {"type": "ping"}

   // Server responds
   {"type": "pong"}
   ```

4. **Disconnect**

   ```json
   // Client sends
   {"type": "end"}
   ```

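The frames above are plain JSON text, so the client side reduces to a few small builders. The sketch below uses only the standard library: `array('f')` produces the same little-endian float32 bytes as a float32 numpy array's `tobytes()` on typical platforms, and the transport (for example the `websockets` package) is left to the caller. Function names are illustrative:

```python
import base64
import json
from array import array

def auth_message(api_key: str) -> str:
    """Step 1: authenticate before streaming audio."""
    return json.dumps({"type": "auth", "api_key": api_key})

def audio_message(samples: list[float], sample_rate: int = 16000) -> str:
    """Step 2: pack float32 samples as base64 for the 'audio' frame."""
    raw = array("f", samples).tobytes()
    return json.dumps({
        "type": "audio",
        "data": base64.b64encode(raw).decode("ascii"),
        "sample_rate": sample_rate,
    })

def ping_message() -> str:
    """Step 3: keep-alive."""
    return json.dumps({"type": "ping"})

def end_message() -> str:
    """Step 4: graceful disconnect."""
    return json.dumps({"type": "end"})
```

Each string can be sent as a WebSocket text frame after the auth handshake succeeds.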
## Client Integration

The Local Transcription app includes a remote transcription client. Configure it in Settings:

1. Enable "Remote Processing"
2. Set Server URL: `ws://your-server:8765/ws/transcribe`
3. Enter your API key

## Deployment

### Docker

```dockerfile
FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY server.py .

ENV TRANSCRIPTION_MODEL=base.en
EXPOSE 8765

CMD ["python", "server.py", "--host", "0.0.0.0", "--port", "8765"]
```

### Systemd Service

```ini
[Unit]
Description=Remote Transcription Service
After=network.target

[Service]
Type=simple
User=transcription
WorkingDirectory=/opt/transcription-service
Environment=TRANSCRIPTION_API_KEY=your-key
Environment=TRANSCRIPTION_MODEL=base.en
ExecStart=/opt/transcription-service/venv/bin/python server.py
Restart=always

[Install]
WantedBy=multi-user.target
```

## Models

Available Whisper models (larger = better quality, slower):

| Model     | Parameters | VRAM  | Speed    |
|-----------|------------|-------|----------|
| tiny.en   | 39M        | ~1GB  | Fastest  |
| base.en   | 74M        | ~1GB  | Fast     |
| small.en  | 244M       | ~2GB  | Moderate |
| medium.en | 769M       | ~5GB  | Slow     |
| large-v3  | 1550M      | ~10GB | Slowest  |

## Security Notes

- Always use API key authentication in production
- Use HTTPS/WSS in production (via a reverse proxy)
- Rate-limit connections if needed
- Monitor GPU usage to prevent overload
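A per-client connection cap is one simple form of the rate limiting suggested above. A sketch only — server.py is not known to implement this, and the class name is illustrative:

```python
from collections import defaultdict

class ConnectionLimiter:
    """Cap concurrent connections per client address."""

    def __init__(self, max_per_client: int = 4):
        self.max_per_client = max_per_client
        self.active = defaultdict(int)  # client_ip -> open connection count

    def try_acquire(self, client_ip: str) -> bool:
        """Reserve a slot; return False if the client is at its cap."""
        if self.active[client_ip] >= self.max_per_client:
            return False
        self.active[client_ip] += 1
        return True

    def release(self, client_ip: str) -> None:
        """Free a slot when the client disconnects."""
        if self.active[client_ip] > 0:
            self.active[client_ip] -= 1
```

Call `try_acquire` when a WebSocket connects and `release` in the disconnect handler.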