# Remote Transcription Service

A standalone GPU-accelerated transcription service that accepts audio streams over WebSocket and returns transcriptions. It is designed to offload transcription processing from client machines to a GPU-equipped server.

## Features

- WebSocket-based audio streaming
- API key authentication
- GPU acceleration (CUDA)
- Multiple simultaneous clients
- Health check endpoints

## Requirements

- Python 3.10+
- NVIDIA GPU with CUDA support (recommended)
- 4GB+ VRAM for the base model, 8GB+ for large models

## Installation

```bash
cd server/transcription-service

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Linux/Mac
# or: venv\Scripts\activate  # Windows

# Install dependencies
pip install -r requirements.txt

# For GPU support, install the CUDA build of PyTorch
pip install torch --index-url https://download.pytorch.org/whl/cu121
```

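Before starting the service, it can be worth confirming that the CUDA build of PyTorch is actually usable. A quick, optional sanity check (not part of server.py; it degrades gracefully on CPU-only installs):

```python
# Optional sanity check: verify PyTorch can see a CUDA device.
# Works even if torch is missing or CPU-only.
try:
    import torch
    cuda_ok = torch.cuda.is_available()
    device = torch.cuda.get_device_name(0) if cuda_ok else "CPU"
except ImportError:
    cuda_ok, device = False, "CPU (torch not installed)"

print(f"CUDA available: {cuda_ok} ({device})")
```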
## Configuration

Set environment variables before starting:

```bash
# Required: API key(s) for authentication
export TRANSCRIPTION_API_KEY="your-secret-key"

# Or multiple keys (comma-separated)
export TRANSCRIPTION_API_KEYS="key1,key2,key3"

# Optional: Model selection (default: base.en)
export TRANSCRIPTION_MODEL="base.en"
```

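One plausible way the service could combine these variables — a sketch only; the actual parsing in server.py may differ, and `load_api_keys` is an illustrative name:

```python
import os

def load_api_keys() -> set[str]:
    """Collect allowed API keys from TRANSCRIPTION_API_KEY and/or
    the comma-separated TRANSCRIPTION_API_KEYS (whitespace stripped)."""
    keys = set()
    single = os.environ.get("TRANSCRIPTION_API_KEY", "").strip()
    if single:
        keys.add(single)
    multi = os.environ.get("TRANSCRIPTION_API_KEYS", "")
    keys.update(k.strip() for k in multi.split(",") if k.strip())
    return keys
```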
## Running

```bash
# Start the service
python server.py --host 0.0.0.0 --port 8765

# Or with custom model
python server.py --host 0.0.0.0 --port 8765 --model medium.en
```

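The flags above could be declared roughly as follows — a sketch of the interface, not the actual server.py parser; the defaults shown here are assumptions:

```python
import argparse

def build_arg_parser() -> argparse.ArgumentParser:
    # Flags shown in the commands above; defaults are illustrative.
    p = argparse.ArgumentParser(description="Remote Transcription Service")
    p.add_argument("--host", default="127.0.0.1", help="bind address")
    p.add_argument("--port", type=int, default=8765, help="listen port")
    p.add_argument("--model", default="base.en", help="Whisper model name")
    return p

args = build_arg_parser().parse_args(
    ["--host", "0.0.0.0", "--port", "8765", "--model", "medium.en"]
)
```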
## API Endpoints

### Health Check

```
GET /
GET /health
```

### WebSocket Transcription

```
WS /ws/transcribe
```

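A client or monitoring job can poll the health endpoint with nothing but the standard library. A minimal probe, assuming the default port; `check_health` is an illustrative helper, not part of the service:

```python
import urllib.request

def check_health(base_url: str = "http://localhost:8765",
                 timeout: float = 5.0) -> bool:
    """Return True if GET /health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP errors (URLError).
        return False
```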
## WebSocket Protocol

1. **Authentication**

   ```json
   // Client sends
   {"type": "auth", "api_key": "your-key"}

   // Server responds
   {"type": "auth_result", "success": true, "message": "..."}
   ```

2. **Send Audio**

   ```json
   // Client sends (audio as base64-encoded float32 numpy array)
   {"type": "audio", "data": "base64...", "sample_rate": 16000}

   // Server responds
   {"type": "transcription", "text": "Hello world", "is_preview": false, "timestamp": "..."}
   ```

3. **Keep-alive**

   ```json
   // Client sends
   {"type": "ping"}

   // Server responds
   {"type": "pong"}
   ```

4. **Disconnect**

   ```json
   // Client sends
   {"type": "end"}
   ```

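The frames above are plain JSON text, so the client side reduces to a few small builders. The sketch below uses only the standard library: `array('f')` produces the same little-endian float32 bytes as a float32 numpy array's `tobytes()` on typical platforms, and the transport (for example the `websockets` package) is left to the caller. Function names are illustrative:

```python
import base64
import json
from array import array

def auth_message(api_key: str) -> str:
    """Step 1: authenticate before streaming audio."""
    return json.dumps({"type": "auth", "api_key": api_key})

def audio_message(samples: list[float], sample_rate: int = 16000) -> str:
    """Step 2: pack float32 samples as base64 for the 'audio' frame."""
    raw = array("f", samples).tobytes()
    return json.dumps({
        "type": "audio",
        "data": base64.b64encode(raw).decode("ascii"),
        "sample_rate": sample_rate,
    })

def ping_message() -> str:
    """Step 3: keep-alive."""
    return json.dumps({"type": "ping"})

def end_message() -> str:
    """Step 4: graceful disconnect."""
    return json.dumps({"type": "end"})
```

Each string can be sent as a WebSocket text frame after the auth handshake succeeds.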
## Client Integration

The Local Transcription app includes a remote transcription client. Configure it in Settings:

1. Enable "Remote Processing"
2. Set Server URL: `ws://your-server:8765/ws/transcribe`
3. Enter your API key

## Deployment

### Docker

```dockerfile
FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY server.py .

ENV TRANSCRIPTION_MODEL=base.en
EXPOSE 8765

CMD ["python", "server.py", "--host", "0.0.0.0", "--port", "8765"]
```

### Systemd Service

```ini
[Unit]
Description=Remote Transcription Service
After=network.target

[Service]
Type=simple
User=transcription
WorkingDirectory=/opt/transcription-service
Environment=TRANSCRIPTION_API_KEY=your-key
Environment=TRANSCRIPTION_MODEL=base.en
ExecStart=/opt/transcription-service/venv/bin/python server.py
Restart=always

[Install]
WantedBy=multi-user.target
```

## Models

Available Whisper models (larger = better quality, slower):

| Model     | Parameters | VRAM  | Speed    |
|-----------|------------|-------|----------|
| tiny.en   | 39M        | ~1GB  | Fastest  |
| base.en   | 74M        | ~1GB  | Fast     |
| small.en  | 244M       | ~2GB  | Moderate |
| medium.en | 769M       | ~5GB  | Slow     |
| large-v3  | 1550M      | ~10GB | Slowest  |

## Security Notes

- Always use API key authentication in production
- Use HTTPS/WSS in production (via a reverse proxy)
- Rate-limit connections if needed
- Monitor GPU usage to prevent overload
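A per-client connection cap is one simple form of the rate limiting suggested above. A sketch only — server.py is not known to implement this, and the class name is illustrative:

```python
from collections import defaultdict

class ConnectionLimiter:
    """Cap concurrent connections per client address."""

    def __init__(self, max_per_client: int = 4):
        self.max_per_client = max_per_client
        self.active = defaultdict(int)  # client_ip -> open connection count

    def try_acquire(self, client_ip: str) -> bool:
        """Reserve a slot; return False if the client is at its cap."""
        if self.active[client_ip] >= self.max_per_client:
            return False
        self.active[client_ip] += 1
        return True

    def release(self, client_ip: str) -> None:
        """Free a slot when the client disconnects."""
        if self.active[client_ip] > 0:
            self.active[client_ip] -= 1
```

Call `try_acquire` when a WebSocket connects and `release` in the disconnect handler.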