# Remote Transcription Service

A standalone GPU-accelerated transcription service that accepts audio streams over WebSocket and returns transcriptions. Designed to offload transcription processing from client machines to a GPU-equipped server.

## Features

- WebSocket-based audio streaming
- API key authentication
- GPU acceleration (CUDA)
- Multiple simultaneous clients
- Health check endpoints

## Requirements

- Python 3.10+
- NVIDIA GPU with CUDA support (recommended)
- 4GB+ VRAM for the base model; 10GB+ for large-v3 (see the model table below)

## Installation

```bash
cd server/transcription-service

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Linux/Mac
# or: venv\Scripts\activate  # Windows

# Install dependencies
pip install -r requirements.txt

# For GPU support, install the CUDA build of PyTorch
pip install torch --index-url https://download.pytorch.org/whl/cu121
```

## Configuration

Set environment variables before starting:

```bash
# Required: API key(s) for authentication
export TRANSCRIPTION_API_KEY="your-secret-key"

# Or multiple keys (comma-separated)
export TRANSCRIPTION_API_KEYS="key1,key2,key3"

# Optional: Model selection (default: base.en)
export TRANSCRIPTION_MODEL="base.en"
```

## Running

```bash
# Start the service
python server.py --host 0.0.0.0 --port 8765

# Or with a custom model
python server.py --host 0.0.0.0 --port 8765 --model medium.en
```

## API Endpoints

### Health Check

```
GET /
GET /health
```

### WebSocket Transcription

```
WS /ws/transcribe
```

## WebSocket Protocol

1. **Authentication**

   ```json
   // Client sends
   {"type": "auth", "api_key": "your-key"}

   // Server responds
   {"type": "auth_result", "success": true, "message": "..."}
   ```

2. **Send audio**

   ```json
   // Client sends (audio as a base64-encoded float32 numpy array)
   {"type": "audio", "data": "base64...", "sample_rate": 16000}

   // Server responds
   {"type": "transcription", "text": "Hello world", "is_preview": false, "timestamp": "..."}
   ```

3. **Keep-alive**

   ```json
   // Client sends
   {"type": "ping"}

   // Server responds
   {"type": "pong"}
   ```

4. **Disconnect**

   ```json
   // Client sends
   {"type": "end"}
   ```

See the payload-encoding note and example client at the end of this document.

## Client Integration

The Local Transcription app includes a remote transcription client. Configure it in Settings:

1. Enable "Remote Processing"
2. Set Server URL: `ws://your-server:8765/ws/transcribe`
3. Enter your API key

## Deployment

### Docker

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY server.py .
ENV TRANSCRIPTION_MODEL=base.en
EXPOSE 8765
CMD ["python", "server.py", "--host", "0.0.0.0", "--port", "8765"]
```

### Systemd Service

```ini
[Unit]
Description=Remote Transcription Service
After=network.target

[Service]
Type=simple
User=transcription
WorkingDirectory=/opt/transcription-service
Environment=TRANSCRIPTION_API_KEY=your-key
Environment=TRANSCRIPTION_MODEL=base.en
ExecStart=/opt/transcription-service/venv/bin/python server.py --host 0.0.0.0 --port 8765
Restart=always

[Install]
WantedBy=multi-user.target
```

## Models

Available Whisper models (larger = better quality, slower):

| Model     | Parameters | VRAM  | Speed    |
|-----------|------------|-------|----------|
| tiny.en   | 39M        | ~1GB  | Fastest  |
| base.en   | 74M        | ~1GB  | Fast     |
| small.en  | 244M       | ~2GB  | Moderate |
| medium.en | 769M       | ~5GB  | Slow     |
| large-v3  | 1550M      | ~10GB | Slowest  |

## Security Notes

- Always use API key authentication in production
- Use HTTPS/WSS in production via a reverse proxy (see the example below)
- Rate-limit connections if needed
- Monitor GPU usage to prevent overload
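### Example: WSS via Reverse Proxy

A minimal sketch of terminating TLS in front of the service, assuming nginx; the hostname and certificate paths are placeholders, and your existing proxy setup may differ. The `Upgrade`/`Connection` headers are what allow the WebSocket handshake to pass through.

```nginx
# Terminate TLS and proxy WebSocket upgrades to the service on localhost.
server {
    listen 443 ssl;
    server_name transcribe.example.com;                  # placeholder hostname

    ssl_certificate     /etc/ssl/certs/transcribe.pem;   # your certificate
    ssl_certificate_key /etc/ssl/private/transcribe.key; # your key

    location /ws/transcribe {
        proxy_pass http://127.0.0.1:8765;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_read_timeout 300s;                         # keep long-lived streams open
    }
}
```

Clients then connect to `wss://transcribe.example.com/ws/transcribe` instead of the plain `ws://` URL.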
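## Appendix: Audio Payload Encoding

The `data` field of an `audio` message is raw float32 PCM, base64-encoded. A minimal sketch of the round trip, assuming `numpy` is installed; the silence buffer is purely illustrative:

```python
import base64

import numpy as np

# Client side: float32 samples -> base64 string for the "data" field
samples = np.zeros(16000, dtype=np.float32)  # one second of silence at 16 kHz
payload = base64.b64encode(samples.tobytes()).decode("ascii")

# Server side: base64 string -> float32 samples
decoded = np.frombuffer(base64.b64decode(payload), dtype=np.float32)
assert np.array_equal(samples, decoded)
```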
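## Appendix: Example Python Client

A minimal end-to-end sketch of the WebSocket protocol documented above, assuming the third-party `websockets` and `numpy` packages (`pip install websockets numpy`). The URL, API key, and audio buffer are placeholders, not part of the shipped service.

```python
import asyncio
import base64
import json

import numpy as np
import websockets


async def transcribe(url: str, api_key: str, samples: np.ndarray) -> str:
    async with websockets.connect(url) as ws:
        # 1. Authenticate
        await ws.send(json.dumps({"type": "auth", "api_key": api_key}))
        reply = json.loads(await ws.recv())
        if not reply.get("success"):
            raise RuntimeError(f"auth failed: {reply.get('message')}")

        # 2. Send one chunk of float32 PCM audio, base64-encoded
        payload = base64.b64encode(samples.astype(np.float32).tobytes()).decode("ascii")
        await ws.send(json.dumps({"type": "audio", "data": payload, "sample_rate": 16000}))

        # 3. Wait for a final (non-preview) transcription, then disconnect cleanly
        while True:
            msg = json.loads(await ws.recv())
            if msg.get("type") == "transcription" and not msg.get("is_preview"):
                await ws.send(json.dumps({"type": "end"}))
                return msg["text"]


if __name__ == "__main__":
    audio = np.zeros(16000, dtype=np.float32)  # one second of silence at 16 kHz
    text = asyncio.run(
        transcribe("ws://localhost:8765/ws/transcribe", "your-secret-key", audio)
    )
    print(text)
```

A real client would stream microphone chunks in a loop, surface `is_preview` results as live partial text, and send periodic `ping` messages to keep the connection alive; this sketch sends a single buffer and waits for the final result.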