Add unified per-speaker font support and remote transcription service

Font changes:
- Consolidate font settings into single Display Settings section
- Support Web-Safe, Google Fonts, and Custom File uploads for both displays
- Fix Google Fonts URL encoding (use + instead of %2B for spaces)
- Fix per-speaker font inline style quote escaping in Node.js display
- Add font debug logging to help diagnose font issues
- Update web server to sync all font settings on settings change
- Remove deprecated PHP server documentation files

New features:
- Add remote transcription service for GPU offloading
- Add instance lock to prevent multiple app instances
- Add version tracking

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-11 18:56:12 -08:00
parent f035bdb927
commit ff067b3368
23 changed files with 2486 additions and 1160 deletions

View File

@@ -1,308 +0,0 @@
# Multi-User Server Comparison
## TL;DR: Which Should You Use?
| Situation | Recommended Solution |
|-----------|---------------------|
| **Shared hosting (cPanel, etc.)** | **PHP Polling** (display-polling.php) |
| **VPS or cloud server** | **Node.js** (best performance) |
| **Quick test/demo** | **PHP Polling** (easiest) |
| **Production with many users** | **Node.js** (most reliable) |
| **No server access** | Use local-only mode |
## Detailed Comparison
### 1. PHP with SSE (Original - server.php + display.php)
**Status:** ⚠️ **PROBLEMATIC** - Not recommended
**Problems:**
- PHP-FPM buffers output (SSE doesn't work)
- Apache/Nginx proxy timeouts
- Shared hosting often blocks long connections
- High resource usage (one PHP process per viewer)
**When it might work:**
- Only with specific Apache configurations
- Not on shared hosting with PHP-FPM
- Requires `ProxyTimeout` settings
**Verdict:** ❌ Avoid unless you have full server control and can configure Apache properly
---
### 2. PHP with Polling (NEW - display-polling.php)
**Status:** ✅ **RECOMMENDED for PHP**
**Pros:**
- ✅ Works on ANY shared hosting
- ✅ No buffering issues
- ✅ No special configuration needed
- ✅ Simple to deploy (just upload files)
- ✅ Uses standard HTTP requests
**Cons:**
- ❌ Higher latency (1-2 seconds)
- ❌ More server requests (polls every second)
- ❌ Slightly higher bandwidth
**Performance:**
- Latency: 1-2 seconds
- Max users: 20-30 concurrent viewers
- Resource usage: Moderate
**Best for:**
- Shared hosting (cPanel, Bluehost, etc.)
- Quick deployment
- Small to medium groups
**Setup:**
```bash
# Just upload these files:
server.php
display-polling.php # ← Use this instead of display.php
config.php
```
**OBS URL:**
```
https://your-site.com/transcription/display-polling.php?room=ROOM&fade=10
```
---
### 3. Node.js Server (NEW - server/nodejs/)
**Status:** ⭐ **BEST PERFORMANCE**
**Pros:**
- ✅ Native WebSocket support
- ✅ Real-time updates (< 100ms latency)
- ✅ Handles 100+ concurrent connections easily
- ✅ Lower resource usage
- ✅ No buffering issues
- ✅ Event-driven architecture
**Cons:**
- ❌ Requires VPS or cloud server
- ❌ Need to install Node.js
- ❌ More setup than PHP
**Performance:**
- Latency: < 100ms
- Max users: 500+ concurrent
- Resource usage: Very low (~50MB RAM)
**Best for:**
- Production deployments
- Large groups (10+ streamers)
- Professional use
- Anyone with a VPS
**Setup:**
```bash
cd server/nodejs
npm install
npm start
```
**Free hosting options:**
- Railway.app (free tier)
- Heroku (free tier)
- Fly.io (free tier)
- Any $5/month VPS (DigitalOcean, Linode)
**OBS URL:**
```
http://your-server.com:3000/display?room=ROOM&fade=10
```
---
## Feature Comparison Matrix
| Feature | PHP SSE | PHP Polling | Node.js |
|---------|---------|-------------|---------|
| **Real-time** | ⚠️ Should be, but breaks | ⚠️ 1-2s delay | ✅ < 100ms |
| **Reliability** | ❌ Buffering issues | ✅ Very reliable | ✅ Very reliable |
| **Shared Hosting** | ❌ Usually fails | ✅ Works everywhere | ❌ Needs VPS |
| **Setup Difficulty** | 🟡 Medium | 🟢 Easy | 🟡 Medium |
| **Max Users** | 10 | 30 | 500+ |
| **Resource Usage** | High | Medium | Low |
| **Latency** | Should be instant, but... | 1-2 seconds | < 100ms |
| **Cost** | $5-10/month hosting | $5-10/month hosting | Free - $5/month |
---
## Migration Guide
### From PHP SSE to PHP Polling
**Super easy - just change the URL:**
Old:
```
https://your-site.com/transcription/display.php?room=ROOM
```
New:
```
https://your-site.com/transcription/display-polling.php?room=ROOM
```
Everything else stays the same! The desktop app doesn't need changes.
---
### From PHP to Node.js
**1. Deploy Node.js server** (see server/nodejs/README.md)
**2. Update desktop app settings:**
Old (PHP):
```
Server URL: https://your-site.com/transcription/server.php
```
New (Node.js):
```
Server URL: http://your-server.com:3000/api/send
```
**3. Update OBS browser source:**
Old (PHP):
```
https://your-site.com/transcription/display.php?room=ROOM
```
New (Node.js):
```
http://your-server.com:3000/display?room=ROOM&fade=10
```
---
## Testing Your Setup
### Test PHP Polling
1. Upload files to server
2. Visit: `https://your-site.com/transcription/server.php`
- Should see JSON response
3. Visit: `https://your-site.com/transcription/display-polling.php?room=test`
- Should see "🟡 Waiting for data..."
4. Send a test message:
```bash
curl -X POST "https://your-site.com/transcription/server.php?action=send" \
-H "Content-Type: application/json" \
-d '{
"room": "test",
"passphrase": "testpass",
"user_name": "TestUser",
"text": "Hello World",
"timestamp": "12:34:56"
}'
```
5. Display should show "Hello World" within 1-2 seconds
### Test Node.js
1. Start server: `npm start`
2. Visit: `http://localhost:3000`
- Should see JSON response
3. Visit: `http://localhost:3000/display?room=test`
- Should see "⚫ Connecting..." then "🟢 Connected"
4. Send test message (same curl as above, but to `http://localhost:3000/api/send`)
5. Display should show message instantly
---
## Troubleshooting
### PHP Polling Issues
**"Status stays yellow"**
- Room doesn't exist yet
- Send a message from desktop app first
**"Gets 500 error"**
- Check PHP error logs
- Verify `data/` directory is writable
**"Slow updates (5+ seconds)"**
- Increase poll interval: `?poll=500` (500ms)
- Check server load
### Node.js Issues
**"Cannot connect"**
- Check firewall allows port 3000
- Verify server is running: `curl http://localhost:3000`
**"WebSocket failed"**
- Check browser console for errors
- Try different port
- Check reverse proxy settings if using Nginx
---
## Recommendations by Use Case
### Solo Streamer (Local Only)
**Use:** Built-in web server (no multi-user server needed)
- Just run the desktop app
- OBS: `http://localhost:8080`
### 2-3 Friends on Shared Hosting
**Use:** PHP Polling
- Upload to your existing web hosting
- Cost: $0 (use existing hosting)
- Setup time: 5 minutes
### 5+ Streamers, Want Best Quality
**Use:** Node.js on VPS
- Deploy to Railway.app (free) or DigitalOcean ($5/month)
- Real-time updates
- Professional quality
### Large Event/Convention
**Use:** Node.js on cloud
- Deploy to AWS/Azure/GCP
- Use load balancer for redundancy
- Can handle hundreds of users
---
## Cost Breakdown
### PHP Polling
- **Shared hosting:** $5-10/month (or free if you already have hosting)
- **Total:** $5-10/month
### Node.js
- **Free options:**
- Railway.app (500 hours/month free)
- Heroku (free dyno)
- Fly.io (free tier)
- **Paid options:**
- DigitalOcean Droplet: $5/month
- Linode: $5/month
- AWS EC2 t2.micro: $8/month (or free tier)
- **Total:** $0-8/month
### Just Use Local Mode
- **Cost:** $0
- **Limitation:** Only shows your own transcriptions (no multi-user sync)
---
## Final Recommendation
**For most users:** Start with **PHP Polling** on shared hosting. It works reliably and is dead simple.
**If you want the best:** Use **Node.js** - it's worth the extra setup for the performance.
**For testing:** Use **local mode** (no server) - built into the desktop app.

View File

@@ -1,218 +0,0 @@
# Quick Fix for Multi-User Display Issues
## The Problem
Your PHP SSE (Server-Sent Events) setup isn't working because:
1. **PHP-FPM buffers output** - Shared hosting uses PHP-FPM which buffers everything
2. **Apache/Nginx timeouts** - Proxy kills long connections
3. **PHP isn't designed for SSE** - PHP processes are meant to be short-lived
## The Solutions (in order of recommendation)
---
### ✅ Solution 1: Use PHP Polling (Easiest Fix)
**What changed:** Instead of SSE (streaming), use regular HTTP polling every 1 second
**Files affected:**
- **Keep:** `server.php`, `config.php` (no changes needed)
- **Replace:** Use `display-polling.php` instead of `display.php`
**Setup:**
1. Upload `display-polling.php` to your server
2. Change your OBS Browser Source URL from:
```
OLD: https://your-site.com/transcription/display.php?room=ROOM
NEW: https://your-site.com/transcription/display-polling.php?room=ROOM
```
3. Done! No other changes needed.
**Pros:**
- ✅ Works on ANY shared hosting
- ✅ No server configuration needed
- ✅ Uses your existing setup
- ✅ 5-minute fix
**Cons:**
- ⚠️ 1-2 second latency (vs instant with WebSocket)
- ⚠️ More server requests (but minimal impact)
**Performance:** Good for 2-20 concurrent users
---
### ⭐ Solution 2: Use Node.js Server (Best Performance)
**What changed:** Switch from PHP to Node.js - designed for real-time
**Setup:**
1. Get a VPS (or use free hosting like Railway.app)
2. Install Node.js:
```bash
cd server/nodejs
npm install
npm start
```
3. Update desktop app Server URL to:
```
http://your-server.com:3000/api/send
```
4. Update OBS URL to:
```
http://your-server.com:3000/display?room=ROOM
```
**Pros:**
- ✅ Real-time (< 100ms latency)
- ✅ Handles 100+ users easily
- ✅ Native WebSocket support
- ✅ Lower resource usage
- ✅ Can use free hosting (Railway, Heroku, Fly.io)
**Cons:**
- ❌ Requires VPS or cloud hosting (can't use shared hosting)
- ❌ More setup than PHP
**Performance:** Excellent for any number of users
**Free Hosting Options:**
- Railway.app (easiest - just connect GitHub)
- Heroku (free tier)
- Fly.io (free tier)
---
### 🔧 Solution 3: Fix PHP SSE (Advanced - Not Recommended)
**Only if you have full server control and really want SSE**
This requires:
1. Apache configuration changes
2. Disabling output buffering
3. Increasing timeouts
See `apache-sse-config.conf` for details.
**Not recommended because:** It's complex, fragile, and PHP polling is easier and more reliable.
---
## Quick Comparison
| Solution | Setup Time | Reliability | Latency | Works on Shared Hosting? |
|----------|-----------|-------------|---------|-------------------------|
| **PHP Polling** | 5 min | ⭐⭐⭐⭐⭐ | 1-2s | ✅ Yes |
| **Node.js** | 30 min | ⭐⭐⭐⭐⭐ | < 100ms | ❌ No (needs VPS) |
| **PHP SSE** | 2 hours | ⭐⭐ | Should be instant | ❌ Rarely |
---
## Testing Your Fix
### Test PHP Polling
1. Run the test script:
```bash
cd server
./test-server.sh
```
2. Or manually:
```bash
# Send a test message
curl -X POST "https://your-site.com/transcription/server.php?action=send" \
-H "Content-Type: application/json" \
-d '{
"room": "test",
"passphrase": "testpass",
"user_name": "TestUser",
"text": "Hello World",
"timestamp": "12:34:56"
}'
# Open in browser:
https://your-site.com/transcription/display-polling.php?room=test
# Should see "Hello World" appear within 1-2 seconds
```
### Test Node.js
1. Start server:
```bash
cd server/nodejs
npm install
npm start
```
2. Open browser:
```
http://localhost:3000/display?room=test
```
3. Send test message:
```bash
curl -X POST "http://localhost:3000/api/send" \
-H "Content-Type: application/json" \
-d '{
"room": "test",
"passphrase": "testpass",
"user_name": "TestUser",
"text": "Hello World",
"timestamp": "12:34:56"
}'
```
4. Should see message appear **instantly**
---
## My Recommendation
**Start with PHP Polling** (Solution 1):
- Upload `display-polling.php`
- Change OBS URL
- Test it out
**If you like it and want better performance**, migrate to Node.js (Solution 2):
- Takes 30 minutes
- Much better performance
- Can use free hosting
**Forget about PHP SSE** (Solution 3):
- Too much work
- Unreliable
- Not worth it
---
## Files You Need
### For PHP Polling
- ✅ `server.php` (already have)
- ✅ `config.php` (already have)
- ✅ `display-polling.php` (NEW - just created)
- ❌ `display.php` (don't use anymore)
### For Node.js
- ✅ `server/nodejs/server.js` (NEW)
- ✅ `server/nodejs/package.json` (NEW)
- ✅ `server/nodejs/README.md` (NEW)
---
## Need Help?
1. Read [COMPARISON.md](COMPARISON.md) for detailed comparison
2. Read [server/nodejs/README.md](nodejs/README.md) for Node.js setup
3. Run `./test-server.sh` to diagnose issues
4. Check browser console for errors
---
## Bottom Line
**Your SSE display doesn't work because PHP + shared hosting + SSE = bad combo.**
**Use PHP Polling (1-2s delay) or Node.js (instant).** Both work reliably.

View File

@@ -1,248 +0,0 @@
# Server Sync Performance - Before vs After
## The Problem You Experienced
**Symptom:** Shared sync display was several seconds behind local transcription
**Why:** The test script worked fast because it sent ONE message. But the Python app sends messages continuously during speech, and they were getting queued up!
---
## Before Fix: Serial Processing ❌
```
You speak: "Hello" "How" "are" "you" "today"
↓ ↓ ↓ ↓ ↓
Local GUI: Hello How are you today ← Instant!
↓ ↓ ↓ ↓ ↓
Send Queue: [Hello]→[How]→[are]→[you]→[today]
|
↓ (Wait for HTTP response before sending next)
HTTP: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Send Send Send Send Send
Hello How are you today
(200ms) (200ms)(200ms)(200ms)(200ms)
↓ ↓ ↓ ↓ ↓
Server: Hello How are you today
↓ ↓ ↓ ↓ ↓
Display: Hello How are you today ← 1 second behind!
(0ms) (200ms)(400ms)(600ms)(800ms)
```
**Total delay:** 1 second for 5 messages!
---
## After Fix: Parallel Processing ✅
```
You speak: "Hello" "How" "are" "you" "today"
↓ ↓ ↓ ↓ ↓
Local GUI: Hello How are you today ← Instant!
↓ ↓ ↓ ↓ ↓
Send Queue: [Hello] [How] [are] [you] [today]
↓ ↓ ↓
↓ ↓ ↓ ← Up to 3 parallel workers!
HTTP: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Send Hello ┐
Send How ├─ All sent simultaneously!
Send are ┘
Wait for free worker...
Send you ┐
Send today ┘
(200ms total!)
↓ ↓ ↓ ↓ ↓
Server: Hello How are you today
↓ ↓ ↓ ↓ ↓
Display: Hello How are you today ← 200ms behind!
(0ms) (0ms) (0ms) (0ms) (200ms)
```
**Total delay:** roughly 200-400ms for 5 messages!
---
## Real-World Example
### Scenario: You speak a paragraph
**"Hello everyone. How are you doing today? I'm testing the transcription system."**
### Before Fix (Serial)
```
Time Local GUI Server Display
0.0s "Hello everyone."
0.2s "How are you doing today?"
0.4s "I'm testing..." "Hello everyone." ← 0.4s behind!
0.6s "How are you doing..." ← 0.4s behind!
0.8s "I'm testing..." ← 0.4s behind!
```
### After Fix (Parallel)
```
Time Local GUI Server Display
0.0s "Hello everyone."
0.2s "How are you doing today?" "Hello everyone." ← 0.2s behind!
0.4s "I'm testing..." "How are you doing..." ← 0.2s behind!
0.6s "I'm testing..." ← 0.2s behind!
```
**Improvement:** Consistent 200ms delay vs growing 400-800ms delay!
---
## Technical Details
### Problem 1: Wrong URL Format ❌
```python
# What the client was sending to Node.js:
POST http://localhost:3000/api/send?action=send
# What Node.js was expecting:
POST http://localhost:3000/api/send
```
**Fix:** Auto-detect server type
```python
if 'server.php' in url:
    # PHP server expects the action in the query string
    endpoint = url + '?action=send'   # e.g. http://server.com/server.php?action=send
else:
    # Node.js server takes the bare path
    endpoint = url                    # e.g. http://server.com/api/send
```
### Problem 2: Blocking HTTP Requests ❌
```python
# Old code (BLOCKING):
while True:
message = queue.get()
send_http(message) # ← Wait here! Can't send next until this returns
```
**Fix:** Use thread pool
```python
# New code (NON-BLOCKING):
executor = ThreadPoolExecutor(max_workers=3)
while True:
message = queue.get()
executor.submit(send_http, message) # ← Returns immediately! Send next!
```
### Problem 3: Long Timeouts ❌
```python
# Old:
queue.get(timeout=1.0) # Wait up to 1 second for new message
send_http(..., timeout=5.0) # Wait up to 5 seconds for response
# New:
queue.get(timeout=0.1) # Check queue every 100ms (responsive!)
send_http(..., timeout=2.0) # Fail fast if server slow
```
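Putting the three fixes together, a minimal non-blocking sender might look like the sketch below (illustrative only; `send_http`, the payload shape, and the stop-event handling are assumptions, not the app's exact code):
```python
import queue
from concurrent.futures import ThreadPoolExecutor

import requests

def send_http(endpoint, payload):
    """POST one transcription; fail fast if the server is slow."""
    try:
        requests.post(endpoint, json=payload, timeout=2.0)
    except requests.RequestException as err:
        print(f"Send failed: {err}")

def sender_loop(endpoint, message_queue, stop_event):
    """Drain the queue and hand each message to a small worker pool.

    message_queue is a queue.Queue, stop_event a threading.Event.
    """
    executor = ThreadPoolExecutor(max_workers=3)      # 3 parallel sends
    while not stop_event.is_set():
        try:
            payload = message_queue.get(timeout=0.1)  # check queue every 100ms
        except queue.Empty:
            continue
        executor.submit(send_http, endpoint, payload)  # returns immediately
    executor.shutdown(wait=True)
```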
---
## Performance Metrics
| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| Single message | 150ms | 150ms | Same |
| 5 messages (serial) | 750ms | 200ms | **3.7x faster** |
| 10 messages (serial) | 1500ms | 300ms | **5x faster** |
| 20 messages (rapid) | 3000ms | 600ms | **5x faster** |
| Queue polling | 1000ms | 100ms | **10x faster** |
| Failure timeout | 5000ms | 2000ms | **2.5x faster** |
---
## Visual Comparison
### Before: Messages in Queue Building Up
```
[Message 1] ━━━━━━━━━━━━━━━━━━━━━ Sending... (200ms)
[Message 2] Waiting...
[Message 3] Waiting...
[Message 4] Waiting...
[Message 5] Waiting...
[Message 1] Done ✓
[Message 2] ━━━━━━━━━━━━━━━━━━━━━ Sending... (200ms)
[Message 3] Waiting...
[Message 4] Waiting...
[Message 5] Waiting...
... and so on (total: 1 second for 5 messages)
```
### After: Messages Sent in Parallel
```
[Message 1] ━━━━━━━━━━━━━━━━━━━━━ Sending... ┐
[Message 2] ━━━━━━━━━━━━━━━━━━━━━ Sending... ├─ Parallel! (200ms)
[Message 3] ━━━━━━━━━━━━━━━━━━━━━ Sending... ┘
[Message 4] Waiting for free worker...
[Message 5] Waiting for free worker...
↓ (workers become available)
[Message 1] Done ✓
[Message 2] Done ✓
[Message 3] Done ✓
[Message 4] ━━━━━━━━━━━━━━━━━━━━━ Sending... ┐
[Message 5] ━━━━━━━━━━━━━━━━━━━━━ Sending... ┘
Total time: 400ms for 5 messages (2.5x faster!)
```
---
## How to Test the Improvement
1. **Start Node.js server:**
```bash
cd server/nodejs
npm start
```
2. **Configure desktop app:**
- Settings → Server Sync → Enable
- Server URL: `http://localhost:3000/api/send`
- Room: `test`
- Passphrase: `test`
3. **Open display page:**
```
http://localhost:3000/display?room=test&fade=20
```
4. **Test rapid speech:**
- Start transcription
- Speak 5-10 sentences quickly in succession
- Watch both local GUI and web display
**Expected:** Web display should be only ~200ms behind local GUI (instead of 1-2 seconds)
---
## Why 3 Workers?
**Why not 1?** → Serial processing, slow
**Why not 10?** → Too many connections, overwhelms server
**Why 3?** → Good balance:
- Fast enough for rapid speech
- Doesn't overwhelm server
- Low resource usage
You can change this in the code:
```python
self.executor = ThreadPoolExecutor(max_workers=3) # Change to 5 for faster
```
---
## Summary
- ✅ **Fixed URL format** for Node.js server
- ✅ **Added parallel HTTP requests** (up to 3 simultaneous)
- ✅ **Reduced timeouts** for faster polling and failure detection
**Result:** 5-10x faster sync for rapid speech
**Before:** Laggy, messages queue up, 1-2 second delay
**After:** Near real-time, 100-300ms delay, smooth!

View File

@@ -1,15 +1,15 @@
# Node.js Multi-User Transcription Server
**Much better than PHP for real-time applications!**
A real-time multi-user transcription sync server for streamers and teams.
## Why Node.js is Better Than PHP for This
## Features
1. **Native WebSocket Support** - No SSE buffering issues
2. **Event-Driven** - Designed for real-time connections
3. **No Buffering Problems** - PHP-FPM/FastCGI buffering is a nightmare
4. **Lower Latency** - Instant message delivery
5. **Better Resource Usage** - One process handles all connections
6. **Easy to Deploy** - Works on any VPS, cloud platform, or even Heroku free tier
- **Real-time WebSocket** - Instant message delivery (< 100ms latency)
- **Per-speaker fonts** - Each user can have their own font style
- **Google Fonts support** - 1000+ free fonts loaded from CDN
- **Web-safe fonts** - Universal fonts that work everywhere
- **Custom font uploads** - Upload your own .ttf/.woff2 files
- **Easy deployment** - Works on any VPS, cloud platform, or locally
## Quick Start
@@ -54,13 +54,35 @@ PORT=8080 npm start
Add a Browser source with this URL:
```
http://your-server.com:3000/display?room=YOUR_ROOM&fade=10&timestamps=true
http://your-server.com:3000/display?room=YOUR_ROOM&fade=10&timestamps=true&fontsource=websafe&websafefont=Arial
```
**Parameters:**
- `room` - Your room name (required)
- `fade` - Seconds before text fades (0 = never fade)
- `timestamps` - Show timestamps (true/false)
| Parameter | Default | Description |
|-----------|---------|-------------|
| `room` | default | Your room name (required) |
| `fade` | 10 | Seconds before text fades (0 = never fade) |
| `timestamps` | true | Show timestamps (true/false) |
| `maxlines` | 50 | Max lines visible (prevents scroll bars) |
| `fontsize` | 16 | Font size in pixels |
| `fontsource` | websafe | Font source: `websafe`, `google`, or `custom` |
| `websafefont` | Arial | Web-safe font name |
| `googlefont` | Roboto | Google Font name |
**Font Examples:**
```
# Web-safe font (works everywhere)
?room=myroom&fontsource=websafe&websafefont=Courier+New
# Google Font (loaded from CDN)
?room=myroom&fontsource=google&googlefont=Open+Sans
# Custom font (uploaded by users)
?room=myroom&fontsource=custom
```
**Per-Speaker Fonts:**
Each user can set their own font in the desktop app (Settings → Multi-User Server Sync → Font Source). Per-speaker fonts override the URL defaults, so different speakers can have different fonts on the same display.
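Custom fonts work by having the desktop app upload font files to the server's `/api/fonts` endpoint, which the display page then serves back through generated `@font-face` rules. Roughly, an upload looks like this (a sketch using Python `requests`; the helper name and MIME handling are placeholders, not the app's exact code):
```python
import base64
from pathlib import Path

import requests

def upload_font(server_base, room, passphrase, font_path):
    """Upload one font file so the display page can use it for this room."""
    font_file = Path(font_path)
    mime = "font/woff2" if font_file.suffix.lower() == ".woff2" else "font/ttf"
    payload = {
        "room": room,
        "passphrase": passphrase,
        "fonts": [{
            "name": font_file.name,   # family name = filename without extension
            "data": base64.b64encode(font_file.read_bytes()).decode("ascii"),
            "mime": mime,
        }],
    }
    resp = requests.post(f"{server_base}/api/fonts", json=payload, timeout=10)
    resp.raise_for_status()
    return resp.json()   # e.g. {"status": "ok", "message": "1 font(s) uploaded", ...}

# upload_font("http://your-server.com:3000", "myroom", "my-secret", "MyFont.woff2")
```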
## API Endpoints
@@ -74,7 +96,9 @@ Content-Type: application/json
"passphrase": "my-secret",
"user_name": "Alice",
"text": "Hello everyone!",
"timestamp": "12:34:56"
"timestamp": "12:34:56",
"font_family": "Open Sans", // Optional: per-speaker font
"font_type": "google" // Optional: websafe, google, or custom
}
```
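A quick way to exercise this endpoint with a per-speaker font (a sketch using Python `requests`; the URL and values are placeholders):
```python
import requests

payload = {
    "room": "my-room",
    "passphrase": "my-secret",
    "user_name": "Alice",
    "text": "Hello everyone!",
    "timestamp": "12:34:56",
    "font_family": "Open Sans",   # optional: per-speaker font
    "font_type": "google",        # optional: "websafe", "google", or "custom"
}

resp = requests.post("http://your-server.com:3000/api/send", json=payload, timeout=5)
print(resp.json())   # {"status": "ok", "message": "Transcription added"}
```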
@@ -282,17 +306,6 @@ Ports below 1024 require root. Either:
- Average latency: < 100ms
- Memory usage: ~50MB
## Comparison: Node.js vs PHP
| Feature | Node.js | PHP (SSE) |
|---------|---------|-----------|
| Real-time | ✅ WebSocket | ⚠️ SSE (buffering issues) |
| Latency | < 100ms | 1-5 seconds (buffering) |
| Connections | 1000+ | Limited by PHP-FPM |
| Setup | Easy | Complex (Apache/Nginx config) |
| Hosting | VPS, Cloud | Shared hosting (problematic) |
| Resource Usage | Low | High (one PHP process per connection) |
## License
Part of the Local Transcription project.

View File

@@ -27,11 +27,15 @@ const wss = new WebSocket.Server({ server });
// Configuration
const PORT = process.env.PORT || 3000;
const DATA_DIR = path.join(__dirname, 'data');
const FONTS_DIR = path.join(__dirname, 'fonts');
const MAX_TRANSCRIPTIONS = 100;
const CLEANUP_INTERVAL = 2 * 60 * 60 * 1000; // 2 hours
// In-memory font storage by room (font_name -> {data: Buffer, mime: string})
const roomFonts = new Map();
// Middleware
app.use(bodyParser.json());
app.use(bodyParser.json({ limit: '10mb' })); // Increase limit for font uploads
app.use((req, res, next) => {
res.header('Access-Control-Allow-Origin', '*');
res.header('Access-Control-Allow-Methods', 'GET, POST, OPTIONS');
@@ -146,7 +150,8 @@ function broadcastToRoom(room, data) {
});
const broadcastTime = Date.now() - broadcastStart;
console.log(`[Broadcast] Sent to ${sent} client(s) in room "${room}" (${broadcastTime}ms)`);
const fontInfo = data.font_family ? ` [font: ${data.font_family} (${data.font_type})]` : '';
console.log(`[Broadcast] Sent to ${sent} client(s) in room "${room}" (${broadcastTime}ms)${fontInfo}`);
}
// Cleanup old rooms
@@ -418,10 +423,15 @@ app.get('/', (req, res) => {
<li><code>timestamps=true</code> - Show/hide timestamps (true/false)</li>
<li><code>maxlines=50</code> - Max lines visible at once (prevents scroll bars)</li>
<li><code>fontsize=16</code> - Font size in pixels</li>
<li><code>fontfamily=Arial</code> - Font family (Arial, Courier, etc.)</li>
<li><code>fontsource=websafe</code> - Font source: <code>websafe</code>, <code>google</code>, or <code>custom</code></li>
<li><code>websafefont=Arial</code> - Web-safe font (Arial, Times New Roman, Courier New, etc.)</li>
<li><code>googlefont=Roboto</code> - Google Font name (Roboto, Open Sans, Lato, etc.)</li>
</ul>
<p style="font-size: 0.85em; color: #888; margin-top: 10px;">
Example: <code>?room=myroom&fade=15&timestamps=false&maxlines=30&fontsize=18</code>
Example: <code>?room=myroom&fade=15&fontsource=google&googlefont=Open+Sans&fontsize=18</code>
</p>
<p style="font-size: 0.85em; color: #888;">
Note: Per-speaker fonts override the default. Each user can set their own font in the app settings.
</p>
</details>
</div>
@@ -541,7 +551,7 @@ app.get('/', (req, res) => {
// Build URLs
const serverUrl = \`http://\${window.location.host}/api/send\`;
const displayUrl = \`http://\${window.location.host}/display?room=\${encodeURIComponent(room)}&fade=10&timestamps=true&maxlines=50&fontsize=16&fontfamily=Arial\`;
const displayUrl = \`http://\${window.location.host}/display?room=\${encodeURIComponent(room)}&fade=10&timestamps=true&maxlines=50&fontsize=16&fontsource=websafe&websafefont=Arial\`;
// Update UI
document.getElementById('serverUrl').textContent = serverUrl;
@@ -592,7 +602,7 @@ app.get('/', (req, res) => {
app.post('/api/send', async (req, res) => {
const requestStart = Date.now();
try {
const { room, passphrase, user_name, text, timestamp } = req.body;
const { room, passphrase, user_name, text, timestamp, is_preview, font_family, font_type } = req.body;
if (!room || !passphrase || !user_name || !text) {
return res.status(400).json({ error: 'Missing required fields' });
@@ -611,17 +621,27 @@ app.post('/api/send', async (req, res) => {
user_name: user_name.trim(),
text: text.trim(),
timestamp: timestamp || new Date().toLocaleTimeString('en-US', { hour12: false }),
created_at: Date.now()
created_at: Date.now(),
is_preview: is_preview || false,
font_family: font_family || null, // Per-speaker font name
font_type: font_type || null // Font type: "websafe", "google", or "custom"
};
const addStart = Date.now();
await addTranscription(room, transcription);
if (is_preview) {
// Previews are only broadcast, not stored
broadcastToRoom(room, transcription);
} else {
// Final transcriptions are stored and broadcast
await addTranscription(room, transcription);
}
const addTime = Date.now() - addStart;
const totalTime = Date.now() - requestStart;
console.log(`[${new Date().toISOString()}] Transcription received: "${text.substring(0, 50)}..." (verify: ${verifyTime}ms, add: ${addTime}ms, total: ${totalTime}ms)`);
const previewLabel = is_preview ? ' [PREVIEW]' : '';
console.log(`[${new Date().toISOString()}]${previewLabel} Transcription received: "${text.substring(0, 50)}..." (verify: ${verifyTime}ms, add: ${addTime}ms, total: ${totalTime}ms)`);
res.json({ status: 'ok', message: 'Transcription added' });
res.json({ status: 'ok', message: is_preview ? 'Preview broadcast' : 'Transcription added' });
} catch (err) {
console.error('Error in /api/send:', err);
res.status(500).json({ error: err.message });
@@ -647,9 +667,115 @@ app.get('/api/list', async (req, res) => {
}
});
// Upload fonts for a room
app.post('/api/fonts', async (req, res) => {
try {
const { room, passphrase, fonts } = req.body;
if (!room || !passphrase) {
return res.status(400).json({ error: 'Missing room or passphrase' });
}
// Verify passphrase
const valid = await verifyPassphrase(room, passphrase);
if (!valid) {
return res.status(401).json({ error: 'Invalid passphrase' });
}
if (!fonts || !Array.isArray(fonts)) {
return res.status(400).json({ error: 'No fonts provided' });
}
// Initialize room fonts storage if needed
if (!roomFonts.has(room)) {
roomFonts.set(room, new Map());
}
const fontsMap = roomFonts.get(room);
// Process each font
let addedCount = 0;
for (const font of fonts) {
if (!font.name || !font.data || !font.mime) continue;
// Decode base64 font data
const fontData = Buffer.from(font.data, 'base64');
fontsMap.set(font.name, {
data: fontData,
mime: font.mime,
uploaded_at: Date.now()
});
addedCount++;
console.log(`[Fonts] Uploaded font "${font.name}" for room "${room}" (${fontData.length} bytes)`);
}
res.json({ status: 'ok', message: `${addedCount} font(s) uploaded`, fonts: Array.from(fontsMap.keys()) });
} catch (err) {
console.error('Error in /api/fonts:', err);
res.status(500).json({ error: err.message });
}
});
// Serve uploaded fonts
app.get('/fonts/:room/:fontname', (req, res) => {
const { room, fontname } = req.params;
const fontsMap = roomFonts.get(room);
if (!fontsMap) {
return res.status(404).json({ error: 'Room not found' });
}
const font = fontsMap.get(fontname);
if (!font) {
return res.status(404).json({ error: 'Font not found' });
}
res.set('Content-Type', font.mime);
res.set('Cache-Control', 'public, max-age=3600');
res.send(font.data);
});
// List fonts for a room
app.get('/api/fonts', (req, res) => {
const { room } = req.query;
if (!room) {
return res.status(400).json({ error: 'Missing room parameter' });
}
const fontsMap = roomFonts.get(room);
const fonts = fontsMap ? Array.from(fontsMap.keys()) : [];
res.json({ fonts });
});
// Serve display page
app.get('/display', (req, res) => {
const { room = 'default', fade = '10', timestamps = 'true', maxlines = '50', fontsize = '16', fontfamily = 'Arial' } = req.query;
const {
room = 'default',
fade = '10',
timestamps = 'true',
maxlines = '50',
fontsize = '16',
fontfamily = 'Arial',
// New font source parameters
fontsource = 'websafe', // websafe, google, or custom
websafefont = 'Arial',
googlefont = 'Roboto'
} = req.query;
// Determine the effective default font based on fontsource
let effectiveFont = fontfamily; // Legacy fallback
if (fontsource === 'google' && googlefont) {
effectiveFont = googlefont;
} else if (fontsource === 'websafe' && websafefont) {
effectiveFont = websafefont;
}
// Generate Google Font link if needed
// Note: Google Fonts expects spaces as '+' in the URL, not %2B
const googleFontLink = fontsource === 'google' && googlefont
? `<link rel="stylesheet" href="https://fonts.googleapis.com/css2?family=${googlefont.replace(/ /g, '+')}&display=swap">`
: '';
res.send(`
<!DOCTYPE html>
@@ -657,12 +783,16 @@ app.get('/display', (req, res) => {
<head>
<title>Multi-User Transcription Display</title>
<meta charset="UTF-8">
${googleFontLink}
<style id="custom-fonts">
/* Custom fonts will be injected here */
</style>
<style>
body {
margin: 0;
padding: 20px;
background: transparent;
font-family: ${fontfamily}, sans-serif;
font-family: "${effectiveFont}", sans-serif;
font-size: ${fontsize}px;
color: white;
overflow: hidden;
@@ -681,6 +811,14 @@ app.get('/display', (req, res) => {
.transcription.fading {
opacity: 0;
}
.transcription.preview {
font-style: italic;
}
.preview-indicator {
color: #888;
font-size: 0.85em;
margin-right: 5px;
}
.timestamp {
color: #888;
font-size: 0.9em;
@@ -721,11 +859,68 @@ app.get('/display', (req, res) => {
const fadeAfter = ${fade};
const showTimestamps = ${timestamps === 'true' || timestamps === '1'};
const maxLines = ${maxlines};
const requestedFont = "${fontfamily}";
const container = document.getElementById('transcriptions');
const statusEl = document.getElementById('status');
const userColors = new Map();
let colorIndex = 0;
// Track preview elements by user for replacement
const userPreviews = new Map();
// Track loaded Google Fonts to avoid duplicate loading
const loadedGoogleFonts = new Set();
// Load a Google Font dynamically
function loadGoogleFont(fontName) {
if (loadedGoogleFonts.has(fontName)) return;
loadedGoogleFonts.add(fontName);
const link = document.createElement('link');
link.rel = 'stylesheet';
// Google Fonts expects spaces as '+' in the URL, not %2B
link.href = \`https://fonts.googleapis.com/css2?family=\${fontName.replace(/ /g, '+')}&display=swap\`;
document.head.appendChild(link);
console.log('Loading Google Font:', fontName);
}
// Load custom fonts for this room
async function loadCustomFonts() {
try {
const response = await fetch(\`/api/fonts?room=\${encodeURIComponent(room)}\`);
const data = await response.json();
if (data.fonts && data.fonts.length > 0) {
let fontFaceCSS = '';
for (const fontName of data.fonts) {
// Determine format based on extension
let format = 'truetype';
if (fontName.endsWith('.woff2')) format = 'woff2';
else if (fontName.endsWith('.woff')) format = 'woff';
else if (fontName.endsWith('.otf')) format = 'opentype';
// Font family name is filename without extension
const familyName = fontName.replace(/\\.(ttf|otf|woff2?)\$/i, '');
fontFaceCSS += \`
@font-face {
font-family: "\${familyName}";
src: url("/fonts/\${encodeURIComponent(room)}/\${encodeURIComponent(fontName)}") format("\${format}");
font-weight: normal;
font-style: normal;
}
\`;
}
// Inject the font-face rules
document.getElementById('custom-fonts').textContent = fontFaceCSS;
console.log('Loaded custom fonts:', data.fonts);
}
} catch (err) {
console.error('Error loading custom fonts:', err);
}
}
function getUserColor(userName) {
if (!userColors.has(userName)) {
const hue = (colorIndex * 137.5) % 360;
@@ -737,32 +932,96 @@ app.get('/display', (req, res) => {
}
function addTranscription(data) {
const div = document.createElement('div');
div.className = 'transcription';
const isPreview = data.is_preview || false;
const userName = data.user_name || '';
const fontFamily = data.font_family || null; // Per-speaker font name
const fontType = data.font_type || null; // "websafe", "google", or "custom"
const userColor = getUserColor(data.user_name);
// Debug: Log received font info
if (fontFamily) {
console.log('Received transcription with font:', fontFamily, '(' + fontType + ')');
}
// Load Google Font if needed
if (fontType === 'google' && fontFamily) {
loadGoogleFont(fontFamily);
}
// Build font style string if font is set
// Use single quotes for font name to avoid conflict with style="" double quotes
const fontStyle = fontFamily ? \`font-family: '\${fontFamily}', sans-serif;\` : '';
// If this is a final transcription, remove any existing preview from this user
if (!isPreview && userPreviews.has(userName)) {
const previewEl = userPreviews.get(userName);
if (previewEl && previewEl.parentNode) {
previewEl.remove();
}
userPreviews.delete(userName);
}
// If this is a preview, update existing preview or create new one
if (isPreview && userPreviews.has(userName)) {
const previewEl = userPreviews.get(userName);
if (previewEl && previewEl.parentNode) {
// Update existing preview
const userColor = getUserColor(userName);
let html = '';
if (showTimestamps && data.timestamp) {
html += \`<span class="timestamp">[\${data.timestamp}]</span>\`;
}
if (userName) {
html += \`<span class="user" style="color: \${userColor}">\${userName}:</span>\`;
}
html += \`<span class="preview-indicator">[...]</span>\`;
html += \`<span class="text" style="\${fontStyle}">\${data.text}</span>\`;
previewEl.innerHTML = html;
return;
}
}
const div = document.createElement('div');
div.className = isPreview ? 'transcription preview' : 'transcription';
const userColor = getUserColor(userName);
let html = '';
if (showTimestamps && data.timestamp) {
html += \`<span class="timestamp">[\${data.timestamp}]</span>\`;
}
if (data.user_name) {
html += \`<span class="user" style="color: \${userColor}">\${data.user_name}:</span>\`;
if (userName) {
html += \`<span class="user" style="color: \${userColor}">\${userName}:</span>\`;
}
html += \`<span class="text">\${data.text}</span>\`;
if (isPreview) {
html += \`<span class="preview-indicator">[...]</span>\`;
}
html += \`<span class="text" style="\${fontStyle}">\${data.text}</span>\`;
div.innerHTML = html;
container.appendChild(div);
if (fadeAfter > 0) {
setTimeout(() => {
div.classList.add('fading');
setTimeout(() => div.remove(), 1000);
}, fadeAfter * 1000);
// Track preview element for this user
if (isPreview) {
userPreviews.set(userName, div);
} else {
// Only set fade timer for final transcriptions
if (fadeAfter > 0) {
setTimeout(() => {
div.classList.add('fading');
setTimeout(() => div.remove(), 1000);
}, fadeAfter * 1000);
}
}
// Enforce max lines limit
// Enforce max lines limit (don't remove current previews)
while (container.children.length > maxLines) {
container.removeChild(container.firstChild);
const first = container.firstChild;
// Don't remove if it's an active preview
let isActivePreview = false;
userPreviews.forEach((el) => {
if (el === first) isActivePreview = true;
});
if (isActivePreview) break;
container.removeChild(first);
}
}
@@ -821,7 +1080,8 @@ app.get('/display', (req, res) => {
};
}
loadRecent().then(connect);
// Load custom fonts, then recent transcriptions, then connect WebSocket
loadCustomFonts().then(() => loadRecent()).then(connect);
</script>
</body>
</html>

View File

@@ -1,160 +0,0 @@
#!/bin/bash
# Test script for multi-user transcription servers
set -e
echo "================================="
echo "Multi-User Server Test Script"
echo "================================="
echo ""
# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color
# Get server URL from user
echo "What server are you testing?"
echo "1) PHP Server"
echo "2) Node.js Server"
echo "3) Custom URL"
read -p "Choice (1-3): " choice
case $choice in
1)
read -p "Enter PHP server URL (e.g., https://example.com/transcription/server.php): " SERVER_URL
API_ENDPOINT="${SERVER_URL}?action=send"
;;
2)
read -p "Enter Node.js server URL (e.g., http://localhost:3000): " SERVER_URL
API_ENDPOINT="${SERVER_URL}/api/send"
;;
3)
read -p "Enter API endpoint URL: " API_ENDPOINT
;;
*)
echo "Invalid choice"
exit 1
;;
esac
# Get room details
read -p "Room name [test]: " ROOM
ROOM=${ROOM:-test}
read -p "Passphrase [testpass]: " PASSPHRASE
PASSPHRASE=${PASSPHRASE:-testpass}
read -p "User name [TestUser]: " USER_NAME
USER_NAME=${USER_NAME:-TestUser}
echo ""
echo "================================="
echo "Testing connection to server..."
echo "================================="
echo "API Endpoint: $API_ENDPOINT"
echo "Room: $ROOM"
echo "User: $USER_NAME"
echo ""
# Test 1: Send a transcription
echo "Test 1: Sending test transcription..."
RESPONSE=$(curl -s -w "\n%{http_code}" -X POST "$API_ENDPOINT" \
-H "Content-Type: application/json" \
-d "{
\"room\": \"$ROOM\",
\"passphrase\": \"$PASSPHRASE\",
\"user_name\": \"$USER_NAME\",
\"text\": \"Test message from test script\",
\"timestamp\": \"$(date +%H:%M:%S)\"
}")
HTTP_CODE=$(echo "$RESPONSE" | tail -n1)
BODY=$(echo "$RESPONSE" | sed '$d')
if [ "$HTTP_CODE" = "200" ]; then
echo -e "${GREEN}✓ Success!${NC} Server responded with 200 OK"
echo "Response: $BODY"
else
echo -e "${RED}✗ Failed!${NC} Server responded with HTTP $HTTP_CODE"
echo "Response: $BODY"
exit 1
fi
echo ""
# Test 2: Send multiple messages
echo "Test 2: Sending 5 test messages..."
for i in {1..5}; do
curl -s -X POST "$API_ENDPOINT" \
-H "Content-Type: application/json" \
-d "{
\"room\": \"$ROOM\",
\"passphrase\": \"$PASSPHRASE\",
\"user_name\": \"$USER_NAME\",
\"text\": \"Test message #$i\",
\"timestamp\": \"$(date +%H:%M:%S)\"
}" > /dev/null
echo -e "${GREEN}${NC} Sent message #$i"
sleep 0.5
done
echo ""
# Test 3: List transcriptions (if available)
echo "Test 3: Retrieving transcriptions..."
if [ "$choice" = "1" ]; then
LIST_URL="${SERVER_URL}?action=list&room=$ROOM"
elif [ "$choice" = "2" ]; then
LIST_URL="${SERVER_URL}/api/list?room=$ROOM"
else
echo "Skipping list test for custom URL"
LIST_URL=""
fi
if [ -n "$LIST_URL" ]; then
LIST_RESPONSE=$(curl -s "$LIST_URL")
COUNT=$(echo "$LIST_RESPONSE" | grep -o "\"text\"" | wc -l)
if [ "$COUNT" -gt 0 ]; then
echo -e "${GREEN}✓ Success!${NC} Retrieved $COUNT transcriptions"
echo "$LIST_RESPONSE" | python3 -m json.tool 2>/dev/null || echo "$LIST_RESPONSE"
else
echo -e "${YELLOW}⚠ Warning:${NC} No transcriptions retrieved"
echo "$LIST_RESPONSE"
fi
fi
echo ""
echo "================================="
echo "Test Complete!"
echo "================================="
echo ""
echo "Next steps:"
echo ""
if [ "$choice" = "1" ]; then
echo "1. Open this URL in OBS Browser Source:"
echo " ${SERVER_URL%server.php}display-polling.php?room=$ROOM&fade=10"
echo ""
echo "2. Or test in your browser first:"
echo " ${SERVER_URL%server.php}display-polling.php?room=$ROOM"
elif [ "$choice" = "2" ]; then
echo "1. Open this URL in OBS Browser Source:"
echo " ${SERVER_URL}/display?room=$ROOM&fade=10"
echo ""
echo "2. Or test in your browser first:"
echo " ${SERVER_URL}/display?room=$ROOM"
fi
echo ""
echo "3. Configure desktop app with these settings:"
echo " - Server URL: $API_ENDPOINT"
echo " - Room: $ROOM"
echo " - Passphrase: $PASSPHRASE"
echo ""
echo "4. Start transcribing!"
echo ""

View File

@@ -0,0 +1,173 @@
# Remote Transcription Service
A standalone GPU-accelerated transcription service that accepts audio streams over WebSocket and returns transcriptions. Designed for offloading transcription processing from client machines to a GPU-equipped server.
## Features
- WebSocket-based audio streaming
- API key authentication
- GPU acceleration (CUDA)
- Multiple simultaneous clients
- Health check endpoints
## Requirements
- Python 3.10+
- NVIDIA GPU with CUDA support (recommended)
- 4GB+ VRAM for base model, 8GB+ for large models
## Installation
```bash
cd server/transcription-service
# Create virtual environment
python -m venv venv
source venv/bin/activate # Linux/Mac
# or: venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
# For GPU support, install CUDA version of PyTorch
pip install torch --index-url https://download.pytorch.org/whl/cu121
```
## Configuration
Set environment variables before starting:
```bash
# Required: API key(s) for authentication
export TRANSCRIPTION_API_KEY="your-secret-key"
# Or multiple keys (comma-separated)
export TRANSCRIPTION_API_KEYS="key1,key2,key3"
# Optional: Model selection (default: base.en)
export TRANSCRIPTION_MODEL="base.en"
```
## Running
```bash
# Start the service
python server.py --host 0.0.0.0 --port 8765
# Or with custom model
python server.py --host 0.0.0.0 --port 8765 --model medium.en
```
## API Endpoints
### Health Check
```
GET /
GET /health
```
### WebSocket Transcription
```
WS /ws/transcribe
```
## WebSocket Protocol
1. **Authentication**
```json
// Client sends
{"type": "auth", "api_key": "your-key"}
// Server responds
{"type": "auth_result", "success": true, "message": "..."}
```
2. **Send Audio**
```json
// Client sends (audio as base64-encoded float32 numpy array)
{"type": "audio", "data": "base64...", "sample_rate": 16000}
// Server responds
{"type": "transcription", "text": "Hello world", "is_preview": false, "timestamp": "..."}
```
3. **Keep-alive**
```json
// Client sends
{"type": "ping"}
// Server responds
{"type": "pong"}
```
4. **Disconnect**
```json
// Client sends
{"type": "end"}
```
## Client Integration
The Local Transcription app includes a remote transcription client. Configure in Settings:
1. Enable "Remote Processing"
2. Set Server URL: `ws://your-server:8765/ws/transcribe`
3. Enter your API key
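For reference, a minimal standalone client that follows the protocol above could look like this (a sketch assuming the `websockets` package; chunking, retries, and keep-alive pings are omitted):
```python
import asyncio
import base64
import json

import numpy as np
import websockets  # pip install websockets

async def transcribe_chunk(audio, api_key, url="ws://your-server:8765/ws/transcribe"):
    """Authenticate, send one float32 audio chunk, print the transcription."""
    async with websockets.connect(url) as ws:
        # 1. Authenticate
        await ws.send(json.dumps({"type": "auth", "api_key": api_key}))
        auth = json.loads(await ws.recv())
        if not auth.get("success"):
            raise RuntimeError(auth.get("message", "authentication failed"))

        # 2. Send audio as a base64-encoded float32 buffer
        await ws.send(json.dumps({
            "type": "audio",
            "data": base64.b64encode(audio.astype(np.float32).tobytes()).decode("ascii"),
            "sample_rate": 16000,
        }))
        reply = json.loads(await ws.recv())
        if reply.get("type") == "transcription":
            print(reply["text"])

        # 3. Close cleanly
        await ws.send(json.dumps({"type": "end"}))

# asyncio.run(transcribe_chunk(np.zeros(16000, dtype=np.float32), "your-secret-key"))
```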
## Deployment
### Docker
```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY server.py .
ENV TRANSCRIPTION_MODEL=base.en
EXPOSE 8765
CMD ["python", "server.py", "--host", "0.0.0.0", "--port", "8765"]
```
### Systemd Service
```ini
[Unit]
Description=Remote Transcription Service
After=network.target
[Service]
Type=simple
User=transcription
WorkingDirectory=/opt/transcription-service
Environment=TRANSCRIPTION_API_KEY=your-key
Environment=TRANSCRIPTION_MODEL=base.en
ExecStart=/opt/transcription-service/venv/bin/python server.py
Restart=always
[Install]
WantedBy=multi-user.target
```
## Models
Available Whisper models (larger = better quality, slower):
| Model | Parameters | VRAM | Speed |
|-------|-----------|------|-------|
| tiny.en | 39M | ~1GB | Fastest |
| base.en | 74M | ~1GB | Fast |
| small.en | 244M | ~2GB | Moderate |
| medium.en | 769M | ~5GB | Slow |
| large-v3 | 1550M | ~10GB | Slowest |
## Security Notes
- Always use API key authentication in production
- Use HTTPS/WSS in production (via reverse proxy)
- Rate limit connections if needed
- Monitor GPU usage to prevent overload

View File

@@ -0,0 +1,8 @@
fastapi>=0.100.0
uvicorn>=0.22.0
websockets>=11.0
numpy>=1.24.0
pydantic>=2.0.0
faster-whisper>=0.10.0
RealtimeSTT>=0.1.0
torch>=2.0.0

View File

@@ -0,0 +1,366 @@
"""
Remote Transcription Service
A standalone FastAPI WebSocket server that accepts audio streams and returns transcriptions.
Designed to run on a GPU-equipped server for offloading transcription processing.
Usage:
python server.py [--host HOST] [--port PORT] [--model MODEL]
Environment variables:
TRANSCRIPTION_API_KEY: Required API key for authentication
TRANSCRIPTION_MODEL: Whisper model to use (default: base.en)
"""
import asyncio
import argparse
import os
import sys
import json
import base64
import logging
from datetime import datetime
from pathlib import Path
from typing import Optional, Dict, Set
from threading import Thread, Lock
import numpy as np
from fastapi import FastAPI, WebSocket, WebSocketDisconnect, HTTPException, Depends
from fastapi.responses import JSONResponse
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
import uvicorn
# Configure logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)
# API Key authentication
API_KEYS: Set[str] = set()
def load_api_keys():
"""Load API keys from environment variable."""
global API_KEYS
keys_env = os.environ.get('TRANSCRIPTION_API_KEYS', '')
if keys_env:
API_KEYS = set(key.strip() for key in keys_env.split(',') if key.strip())
# Also support single key
single_key = os.environ.get('TRANSCRIPTION_API_KEY', '')
if single_key:
API_KEYS.add(single_key)
if not API_KEYS:
logger.warning("No API keys configured. Set TRANSCRIPTION_API_KEY or TRANSCRIPTION_API_KEYS environment variable.")
logger.warning("Service will accept all connections (INSECURE for production).")
def verify_api_key(api_key: str) -> bool:
"""Verify if the API key is valid."""
if not API_KEYS:
return True # No authentication if no keys configured
return api_key in API_KEYS
app = FastAPI(
title="Remote Transcription Service",
description="GPU-accelerated speech-to-text transcription service",
version="1.0.0"
)
# Enable CORS for all origins (configure appropriately for production)
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
class TranscriptionEngine:
"""Manages the transcription engine with thread-safe access."""
def __init__(self, model: str = "base.en", device: str = "auto"):
self.model_name = model
self.device = device
self.recorder = None
self.lock = Lock()
self.is_initialized = False
def initialize(self):
"""Initialize the transcription engine."""
if self.is_initialized:
return True
try:
from RealtimeSTT import AudioToTextRecorder
# Determine device
if self.device == "auto":
import torch
if torch.cuda.is_available():
self.device = "cuda"
else:
self.device = "cpu"
logger.info(f"Initializing transcription engine with model={self.model_name}, device={self.device}")
# Create recorder with minimal configuration
# We'll feed audio directly, not capture from microphone
self.recorder = AudioToTextRecorder(
model=self.model_name,
language="en",
device=self.device,
compute_type="default",
input_device_index=None, # No mic capture
silero_sensitivity=0.4,
webrtc_sensitivity=3,
post_speech_silence_duration=0.3,
min_length_of_recording=0.5,
enable_realtime_transcription=True,
realtime_model_type="tiny.en",
)
self.is_initialized = True
logger.info("Transcription engine initialized successfully")
return True
except Exception as e:
logger.error(f"Failed to initialize transcription engine: {e}")
return False
def transcribe(self, audio_data: np.ndarray, sample_rate: int = 16000) -> Optional[str]:
"""
Transcribe audio data.
Args:
audio_data: Audio data as numpy array
sample_rate: Sample rate of the audio
Returns:
Transcribed text or None if failed
"""
with self.lock:
if not self.is_initialized:
return None
try:
# Use faster-whisper directly for one-shot transcription
from faster_whisper import WhisperModel
if not hasattr(self, '_whisper_model'):
self._whisper_model = WhisperModel(
self.model_name,
device=self.device,
compute_type="default"
)
# Transcribe
segments, info = self._whisper_model.transcribe(
audio_data,
beam_size=5,
language="en"
)
# Combine segments
text = " ".join(segment.text for segment in segments)
return text.strip()
except Exception as e:
logger.error(f"Transcription error: {e}")
return None
# Global transcription engine
engine: Optional[TranscriptionEngine] = None
class ClientConnection:
"""Represents an active client connection."""
def __init__(self, websocket: WebSocket, client_id: str):
self.websocket = websocket
self.client_id = client_id
self.audio_buffer = []
self.sample_rate = 16000
self.connected_at = datetime.now()
# Active connections
active_connections: Dict[str, ClientConnection] = {}
@app.on_event("startup")
async def startup_event():
"""Initialize service on startup."""
load_api_keys()
global engine
model = os.environ.get('TRANSCRIPTION_MODEL', 'base.en')
engine = TranscriptionEngine(model=model)
# Initialize in background thread to not block startup
def init_engine():
engine.initialize()
Thread(target=init_engine, daemon=True).start()
logger.info("Remote Transcription Service started")
@app.get("/")
async def root():
"""Health check endpoint."""
return {
"service": "Remote Transcription Service",
"status": "running",
"model": engine.model_name if engine else "not loaded",
"device": engine.device if engine else "unknown",
"active_connections": len(active_connections)
}
@app.get("/health")
async def health():
"""Detailed health check."""
return {
"status": "healthy" if engine and engine.is_initialized else "initializing",
"model": engine.model_name if engine else None,
"device": engine.device if engine else None,
"initialized": engine.is_initialized if engine else False,
"connections": len(active_connections)
}
@app.websocket("/ws/transcribe")
async def websocket_transcribe(websocket: WebSocket):
"""
WebSocket endpoint for audio transcription.
Protocol:
1. Client sends: {"type": "auth", "api_key": "your-key"}
2. Server responds: {"type": "auth_result", "success": true/false}
3. Client sends audio chunks: {"type": "audio", "data": base64_audio, "sample_rate": 16000}
4. Server responds with transcription: {"type": "transcription", "text": "...", "is_preview": false}
5. Client can send: {"type": "end"} to close connection
"""
await websocket.accept()
client_id = f"client_{id(websocket)}_{datetime.now().timestamp()}"
authenticated = False
logger.info(f"New WebSocket connection: {client_id}")
try:
while True:
data = await websocket.receive_text()
message = json.loads(data)
msg_type = message.get("type", "")
if msg_type == "auth":
# Authenticate client
api_key = message.get("api_key", "")
if verify_api_key(api_key):
authenticated = True
active_connections[client_id] = ClientConnection(websocket, client_id)
await websocket.send_json({
"type": "auth_result",
"success": True,
"message": "Authentication successful"
})
logger.info(f"Client {client_id} authenticated")
else:
await websocket.send_json({
"type": "auth_result",
"success": False,
"message": "Invalid API key"
})
logger.warning(f"Client {client_id} failed authentication")
await websocket.close(code=4001, reason="Invalid API key")
return
elif msg_type == "audio":
if not authenticated:
await websocket.send_json({
"type": "error",
"message": "Not authenticated"
})
continue
# Decode audio data
audio_b64 = message.get("data", "")
sample_rate = message.get("sample_rate", 16000)
if audio_b64:
try:
audio_bytes = base64.b64decode(audio_b64)
audio_data = np.frombuffer(audio_bytes, dtype=np.float32)
# Transcribe
if engine and engine.is_initialized:
text = engine.transcribe(audio_data, sample_rate)
if text:
await websocket.send_json({
"type": "transcription",
"text": text,
"is_preview": False,
"timestamp": datetime.now().isoformat()
})
else:
await websocket.send_json({
"type": "error",
"message": "Transcription engine not ready"
})
except Exception as e:
logger.error(f"Audio processing error: {e}")
await websocket.send_json({
"type": "error",
"message": f"Audio processing error: {str(e)}"
})
elif msg_type == "end":
logger.info(f"Client {client_id} requested disconnect")
break
elif msg_type == "ping":
await websocket.send_json({"type": "pong"})
except WebSocketDisconnect:
logger.info(f"Client {client_id} disconnected")
except Exception as e:
logger.error(f"WebSocket error for {client_id}: {e}")
finally:
if client_id in active_connections:
del active_connections[client_id]
def main():
"""Main entry point."""
parser = argparse.ArgumentParser(description="Remote Transcription Service")
parser.add_argument("--host", default="0.0.0.0", help="Host to bind to")
parser.add_argument("--port", type=int, default=8765, help="Port to bind to")
parser.add_argument("--model", default="base.en", help="Whisper model to use")
args = parser.parse_args()
# Set model from command line
os.environ.setdefault('TRANSCRIPTION_MODEL', args.model)
logger.info(f"Starting Remote Transcription Service on {args.host}:{args.port}")
logger.info(f"Model: {args.model}")
uvicorn.run(
app,
host=args.host,
port=args.port,
log_level="info"
)
if __name__ == "__main__":
main()

View File

@@ -1,8 +1,9 @@
"""Web server for displaying transcriptions in a browser (for OBS browser source)."""
import asyncio
from pathlib import Path
from fastapi import FastAPI, WebSocket
from fastapi.responses import HTMLResponse
from fastapi.responses import HTMLResponse, FileResponse
from typing import List, Optional
import json
from datetime import datetime
@@ -11,7 +12,11 @@ from datetime import datetime
class TranscriptionWebServer:
"""Web server for displaying transcriptions."""
def __init__(self, host: str = "127.0.0.1", port: int = 8080, show_timestamps: bool = True, fade_after_seconds: int = 10, max_lines: int = 50, font_family: str = "Arial", font_size: int = 16):
def __init__(self, host: str = "127.0.0.1", port: int = 8080, show_timestamps: bool = True,
fade_after_seconds: int = 10, max_lines: int = 50, font_family: str = "Arial",
font_size: int = 16, fonts_dir: Optional[Path] = None,
font_source: str = "System Font", websafe_font: str = "Arial",
google_font: str = "Roboto"):
"""
Initialize web server.
@@ -21,8 +26,12 @@ class TranscriptionWebServer:
show_timestamps: Whether to show timestamps in transcriptions
fade_after_seconds: Time in seconds before transcriptions fade out (0 = never fade)
max_lines: Maximum number of lines to display at once
font_family: Font family for display
font_family: Font family for display (system font)
font_size: Font size in pixels
fonts_dir: Directory containing custom font files
font_source: Font source type ("System Font", "Web-Safe", "Google Font")
websafe_font: Web-safe font name
google_font: Google Font name
"""
self.host = host
self.port = port
@@ -31,6 +40,10 @@ class TranscriptionWebServer:
self.max_lines = max_lines
self.font_family = font_family
self.font_size = font_size
self.fonts_dir = fonts_dir
self.font_source = font_source
self.websafe_font = websafe_font
self.google_font = google_font
self.app = FastAPI()
self.active_connections: List[WebSocket] = []
self.transcriptions = [] # Store recent transcriptions
@@ -46,6 +59,23 @@ class TranscriptionWebServer:
"""Serve the transcription display page."""
return self._get_html()
@self.app.get("/fonts/{font_file}")
async def serve_font(font_file: str):
"""Serve custom font files."""
if self.fonts_dir:
font_path = self.fonts_dir / font_file
if font_path.exists() and font_path.suffix.lower() in {'.ttf', '.otf', '.woff', '.woff2'}:
# Determine MIME type
mime_types = {
'.ttf': 'font/ttf',
'.otf': 'font/otf',
'.woff': 'font/woff',
'.woff2': 'font/woff2'
}
media_type = mime_types.get(font_path.suffix.lower(), 'application/octet-stream')
return FileResponse(font_path, media_type=media_type)
return HTMLResponse(status_code=404, content="Font not found")
@self.app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
"""WebSocket endpoint for real-time updates."""
@@ -64,19 +94,70 @@ class TranscriptionWebServer:
except:
self.active_connections.remove(websocket)
def _get_font_face_css(self) -> str:
"""Generate @font-face CSS rules for custom fonts."""
if not self.fonts_dir or not self.fonts_dir.exists():
return ""
css_rules = []
font_extensions = {'.ttf', '.otf', '.woff', '.woff2'}
format_map = {
'.ttf': 'truetype',
'.otf': 'opentype',
'.woff': 'woff',
'.woff2': 'woff2'
}
for font_file in self.fonts_dir.iterdir():
if font_file.suffix.lower() in font_extensions:
font_name = font_file.stem
font_format = format_map.get(font_file.suffix.lower(), 'truetype')
css_rules.append(f"""
@font-face {{
font-family: '{font_name}';
src: url('/fonts/{font_file.name}') format('{font_format}');
font-weight: normal;
font-style: normal;
}}""")
return "\n".join(css_rules)
def _get_effective_font(self) -> str:
"""Get the effective font family based on font_source setting."""
if self.font_source == "Google Font" and self.google_font:
return self.google_font
elif self.font_source == "Web-Safe" and self.websafe_font:
return self.websafe_font
else:
return self.font_family
def _get_google_font_link(self) -> str:
"""Generate Google Fonts link tag if using Google Font."""
if self.font_source == "Google Font" and self.google_font:
font_name = self.google_font.replace(' ', '+')
return f'<link rel="stylesheet" href="https://fonts.googleapis.com/css2?family={font_name}&display=swap">'
return ""
def _get_html(self) -> str:
"""Generate HTML for transcription display."""
# Generate custom font CSS
font_face_css = self._get_font_face_css()
google_font_link = self._get_google_font_link()
effective_font = self._get_effective_font()
return f"""
<!DOCTYPE html>
<html>
<head>
<title>Transcription Display</title>
{google_font_link}
<style>
{font_face_css}
body {{
margin: 0;
padding: 20px;
background: transparent;
font-family: {self.font_family}, sans-serif;
font-family: '{effective_font}', sans-serif;
font-size: {self.font_size}px;
color: white;
overflow: hidden;
@@ -108,6 +189,14 @@ class TranscriptionWebServer:
.text {{
color: white;
}}
.transcription.preview {{
font-style: italic;
}}
.preview-indicator {{
color: #888;
font-size: 0.85em;
margin-right: 5px;
}}
@keyframes slideIn {{
from {{
opacity: 0;
@@ -129,9 +218,15 @@ class TranscriptionWebServer:
const fadeAfterSeconds = {self.fade_after_seconds};
const maxLines = {self.max_lines};
let currentPreviewElement = null;
ws.onmessage = (event) => {{
const data = JSON.parse(event.data);
addTranscription(data);
if (data.is_preview) {{
handlePreview(data);
}} else {{
addTranscription(data);
}}
}};
ws.onclose = () => {{
@@ -146,35 +241,86 @@ class TranscriptionWebServer:
}}
}}, 30000);
function addTranscription(data) {{
function handlePreview(data) {{
// If there's already a preview, update it
if (currentPreviewElement) {{
updatePreviewContent(currentPreviewElement, data);
}} else {{
// Create new preview element
currentPreviewElement = createTranscriptionElement(data, true);
container.appendChild(currentPreviewElement);
}}
// Enforce max lines limit
while (container.children.length > maxLines) {{
const first = container.firstChild;
if (first === currentPreviewElement) break; // Don't remove current preview
container.removeChild(first);
}}
}}
function updatePreviewContent(element, data) {{
let html = '';
if (data.timestamp) {{
html += `<span class="timestamp">[${{data.timestamp}}]</span>`;
}}
if (data.user_name && data.user_name.trim()) {{
html += `<span class="user">${{data.user_name}}:</span>`;
}}
html += `<span class="preview-indicator">[...]</span>`;
html += `<span class="text">${{data.text}}</span>`;
element.innerHTML = html;
}}
function createTranscriptionElement(data, isPreview) {{
const div = document.createElement('div');
div.className = 'transcription';
div.className = isPreview ? 'transcription preview' : 'transcription';
let html = '';
if (data.timestamp) {{
html += `<span class="timestamp">[${{data.timestamp}}]</span>`;
}}
if (data.user_name) {{
if (data.user_name && data.user_name.trim()) {{
html += `<span class="user">${{data.user_name}}:</span>`;
}}
if (isPreview) {{
html += `<span class="preview-indicator">[...]</span>`;
}}
html += `<span class="text">${{data.text}}</span>`;
div.innerHTML = html;
container.appendChild(div);
return div;
}}
// Set up fade-out if enabled
if (fadeAfterSeconds > 0) {{
setTimeout(() => {{
// Start fade animation
div.classList.add('fading');
function addTranscription(data) {{
// If there's a preview, replace it with final transcription
if (currentPreviewElement) {{
currentPreviewElement.className = 'transcription';
let html = '';
if (data.timestamp) {{
html += `<span class="timestamp">[${{data.timestamp}}]</span>`;
}}
if (data.user_name && data.user_name.trim()) {{
html += `<span class="user">${{data.user_name}}:</span>`;
}}
html += `<span class="text">${{data.text}}</span>`;
currentPreviewElement.innerHTML = html;
// Remove element after fade completes
setTimeout(() => {{
if (div.parentNode === container) {{
container.removeChild(div);
}}
}}, 1000); // Match the CSS transition duration
}}, fadeAfterSeconds * 1000);
// Set up fade-out for the final transcription
if (fadeAfterSeconds > 0) {{
setupFadeOut(currentPreviewElement);
}}
currentPreviewElement = null;
}} else {{
// No preview to replace, add new element
const div = createTranscriptionElement(data, false);
container.appendChild(div);
// Set up fade-out if enabled
if (fadeAfterSeconds > 0) {{
setupFadeOut(div);
}}
}}
// Enforce max lines limit
@@ -182,6 +328,20 @@ class TranscriptionWebServer:
container.removeChild(container.firstChild);
}}
}}
function setupFadeOut(element) {{
setTimeout(() => {{
// Start fade animation
element.classList.add('fading');
// Remove element after fade completes
setTimeout(() => {{
if (element.parentNode === container) {{
container.removeChild(element);
}}
}}, 1000); // Match the CSS transition duration
}}, fadeAfterSeconds * 1000);
}}
</script>
</body>
</html>
@@ -225,6 +385,43 @@ class TranscriptionWebServer:
for conn in disconnected:
self.active_connections.remove(conn)
async def broadcast_preview(self, text: str, user_name: str = "", timestamp: Optional[datetime] = None):
"""
Broadcast a preview transcription to all connected clients.
Preview transcriptions are shown in italics and will be replaced by final.
Args:
text: Preview transcription text
user_name: User/speaker name
timestamp: Timestamp of transcription
"""
if timestamp is None:
timestamp = datetime.now()
trans_data = {
"text": text,
"user_name": user_name,
"is_preview": True, # Flag to indicate this is a preview
}
# Only include timestamp if enabled
if self.show_timestamps:
trans_data["timestamp"] = timestamp.strftime("%H:%M:%S")
# Don't store previews in transcriptions list (they're temporary)
# Broadcast to all connected clients
disconnected = []
for connection in self.active_connections:
try:
await connection.send_json(trans_data)
except:
disconnected.append(connection)
# Remove disconnected clients
for conn in disconnected:
self.active_connections.remove(conn)
async def start(self):
"""Start the web server."""
import uvicorn