Remote Transcription Service

A standalone GPU-accelerated transcription service that accepts audio streams over WebSocket and returns transcriptions. Designed for offloading transcription processing from client machines to a GPU-equipped server.

Features

  • WebSocket-based audio streaming
  • API key authentication
  • GPU acceleration (CUDA)
  • Multiple simultaneous clients
  • Health check endpoints

Requirements

  • Python 3.10+
  • NVIDIA GPU with CUDA support (recommended)
  • VRAM sized to the model: ~1GB for base.en up to ~10GB for large-v3 (see Models below)

Installation

cd server/transcription-service

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Linux/Mac
# or: venv\Scripts\activate  # Windows

# Install dependencies
pip install -r requirements.txt

# For GPU support, install CUDA version of PyTorch
pip install torch --index-url https://download.pytorch.org/whl/cu121
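
To confirm the CUDA build is working before starting the service, a quick check from inside the venv (a minimal sketch, not part of the service itself):

import torch

# Prints True when PyTorch can reach a CUDA-capable GPU; False means
# transcription will run on the CPU instead.
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))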

Configuration

Set environment variables before starting:

# Required: API key(s) for authentication
export TRANSCRIPTION_API_KEY="your-secret-key"

# Or multiple keys (comma-separated)
export TRANSCRIPTION_API_KEYS="key1,key2,key3"

# Optional: Model selection (default: base.en)
export TRANSCRIPTION_MODEL="base.en"
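
The semantics amount to something like the following sketch (illustrative only; the actual parsing lives in server.py and may differ):

import os

# A single key and/or a comma-separated list; both variables are honored.
keys = set()
if os.environ.get("TRANSCRIPTION_API_KEY"):
    keys.add(os.environ["TRANSCRIPTION_API_KEY"].strip())
keys.update(
    k.strip()
    for k in os.environ.get("TRANSCRIPTION_API_KEYS", "").split(",")
    if k.strip()
)

# Model selection falls back to base.en when unset.
model_name = os.environ.get("TRANSCRIPTION_MODEL", "base.en")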

Running

# Start the service
python server.py --host 0.0.0.0 --port 8765

# Or with custom model
python server.py --host 0.0.0.0 --port 8765 --model medium.en

API Endpoints

Health Check

GET /
GET /health
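
Either endpoint works as a liveness probe. For example, from Python (a sketch; the response body format isn't documented here, so only the status code is checked, and the host is a placeholder):

import requests

resp = requests.get("http://your-server:8765/health", timeout=5)
resp.raise_for_status()  # raises if the service is unreachable or unhealthy
print("service is up")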

WebSocket Transcription

WS /ws/transcribe

WebSocket Protocol

  1. Authentication

    // Client sends
    {"type": "auth", "api_key": "your-key"}
    
    // Server responds
    {"type": "auth_result", "success": true, "message": "..."}
    
  2. Send Audio

    // Client sends (audio as base64-encoded float32 numpy array)
    {"type": "audio", "data": "base64...", "sample_rate": 16000}
    
    // Server responds
    {"type": "transcription", "text": "Hello world", "is_preview": false, "timestamp": "..."}
    
  3. Keep-alive

    // Client sends
    {"type": "ping"}
    
    // Server responds
    {"type": "pong"}
    
  4. Disconnect

    // Client sends
    {"type": "end"}
    

Client Integration

The Local Transcription app includes a remote transcription client. Configure in Settings:

  1. Enable "Remote Processing"
  2. Set Server URL: ws://your-server:8765/ws/transcribe
  3. Enter your API key

Deployment

Docker

FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY server.py .

ENV TRANSCRIPTION_MODEL=base.en
EXPOSE 8765

CMD ["python", "server.py", "--host", "0.0.0.0", "--port", "8765"]

Systemd Service

[Unit]
Description=Remote Transcription Service
After=network.target

[Service]
Type=simple
User=transcription
WorkingDirectory=/opt/transcription-service
Environment=TRANSCRIPTION_API_KEY=your-key
Environment=TRANSCRIPTION_MODEL=base.en
ExecStart=/opt/transcription-service/venv/bin/python server.py
Restart=always

[Install]
WantedBy=multi-user.target
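
Save the unit as, e.g., /etc/systemd/system/transcription.service, then run systemctl daemon-reload followed by systemctl enable --now transcription.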

Models

Available Whisper models (larger = better quality, slower):

Model       Parameters   VRAM    Speed
tiny.en     39M          ~1GB    Fastest
base.en     74M          ~1GB    Fast
small.en    244M         ~2GB    Moderate
medium.en   769M         ~5GB    Slow
large-v3    1550M        ~10GB   Slowest

Security Notes

  • Always use API key authentication in production
  • Use HTTPS/WSS in production (via reverse proxy)
  • Rate limit connections if needed
  • Monitor GPU usage to prevent overload