local-transcription/DEEPGRAM_PROXY_PLAN.md
Developer 9ff883e2e3 Phase 6: Add Deepgram remote transcription (managed + BYOK modes)
New files:
- client/deepgram_transcription.py — DeepgramTranscriptionEngine with
  managed mode (proxy) and BYOK mode (direct Deepgram). Sends raw binary
  PCM audio over WebSocket, handles both proxy and Deepgram response formats.

Modified files:
- config/default_config.yaml — Replace remote_processing with new remote
  section (mode, server_url, auth_token, byok_api_key, deepgram_model, language)
- client/config.py — Add migration from old remote_processing config
- gui/settings_dialog_qt.py — Replace Remote Processing group with
  Transcription Mode section (Local/Managed/BYOK radio buttons, login/register
  dialogs, balance display, model selector)
- gui/main_window_qt.py — Select engine based on remote.mode config,
  add error and credits_low handlers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 11:45:30 -07:00


# Deepgram Proxy Service — Build Plan
## Project Overview
Build a standalone hosted service that acts as a Deepgram proxy for the Local Transcription
desktop app. Users can either provide their own Deepgram API key (BYOK) or use the managed
service with prepaid credits purchased via Stripe.
This is a **separate repository** from `local-transcription`. The desktop app will be updated
in a second phase to support both modes.
---
## Repository Structure
```
transcription-proxy/
├── src/
│   ├── server.js              # Express app entry point
│   ├── config.js              # Environment config loader
│   ├── db/
│   │   ├── index.js           # node-postgres pool setup
│   │   └── migrations/        # SQL migration files (numbered)
│   │       ├── 001_users.sql
│   │       ├── 002_credits.sql
│   │       ├── 003_sessions.sql
│   │       └── 004_usage_ledger.sql
│   ├── middleware/
│   │   ├── auth.js            # JWT verification middleware
│   │   └── rateLimit.js       # Per-user rate limiting
│   ├── routes/
│   │   ├── auth.js            # POST /auth/register, /auth/login, /auth/refresh
│   │   ├── billing.js         # POST /billing/checkout, GET /billing/balance
│   │   └── account.js         # GET /account/me, GET /account/usage
│   ├── websocket/
│   │   └── proxy.js           # WebSocket proxy handler (core feature)
│   └── webhooks/
│       └── stripe.js          # POST /webhooks/stripe
├── web/                       # Simple frontend dashboard
│   ├── index.html             # Landing / login page
│   ├── dashboard.html         # Balance, usage history, buy credits
│   └── assets/
│       ├── app.js
│       └── style.css
├── .env.example
├── package.json
├── docker-compose.yml         # Postgres + app for local dev
└── CLAUDE.md                  # This file (after renaming)
```
---
## Technology Stack
- **Runtime**: Node.js 20+
- **Framework**: Express 4
- **WebSocket**: `ws` library (not socket.io — keep it lean)
- **Database**: PostgreSQL 15+ via `pg` (node-postgres)
- **Auth**: JWT via `jsonwebtoken`, passwords hashed with `bcrypt`
- **Payments**: Stripe Node SDK (`stripe`)
- **Environment**: `dotenv`
- **Dev tooling**: `nodemon` for dev, no TypeScript (keep it simple)
---
## Database Schema
Run migrations in order. Use a simple `schema_migrations` table to track applied migrations.
### 001_users.sql
```sql
-- IF NOT EXISTS lets the migration runner bootstrap this table
-- before 001 has been recorded as applied.
CREATE TABLE IF NOT EXISTS schema_migrations (
    version INTEGER PRIMARY KEY,
    applied_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email TEXT UNIQUE NOT NULL,
    password_hash TEXT NOT NULL,
    stripe_customer_id TEXT UNIQUE,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);
```
### 002_credits.sql
```sql
CREATE TABLE credit_balance (
    user_id UUID PRIMARY KEY REFERENCES users(id) ON DELETE CASCADE,
    seconds_remaining INTEGER NOT NULL DEFAULT 0,
    updated_at TIMESTAMPTZ DEFAULT NOW()
);
```
### 003_sessions.sql
```sql
CREATE TABLE transcription_sessions (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES users(id),
    mode TEXT NOT NULL CHECK (mode IN ('managed', 'byok')),
    started_at TIMESTAMPTZ DEFAULT NOW(),
    ended_at TIMESTAMPTZ,
    seconds_used INTEGER NOT NULL DEFAULT 0,
    deepgram_model TEXT,
    status TEXT NOT NULL DEFAULT 'active'
        CHECK (status IN ('active', 'completed', 'terminated'))
);

CREATE INDEX idx_sessions_user_id ON transcription_sessions(user_id);
CREATE INDEX idx_sessions_started_at ON transcription_sessions(started_at);
```
### 004_usage_ledger.sql
```sql
CREATE TABLE usage_ledger (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES users(id),
    session_id UUID REFERENCES transcription_sessions(id),
    recorded_at TIMESTAMPTZ DEFAULT NOW(),
    seconds INTEGER NOT NULL,
    description TEXT -- e.g. 'session_usage', 'credit_purchase', 'manual_adjustment'
);

CREATE INDEX idx_ledger_user_id ON usage_ledger(user_id);
```
---
## Environment Variables (.env.example)
```env
# Server
PORT=3000
NODE_ENV=development
# Database
DATABASE_URL=postgresql://user:password@localhost:5432/transcription_proxy
# Auth
JWT_SECRET=changeme_use_long_random_string
JWT_EXPIRY=7d
# Stripe
STRIPE_SECRET_KEY=sk_test_...
STRIPE_WEBHOOK_SECRET=whsec_...
# Deepgram
DEEPGRAM_API_KEY=your_deepgram_key_here
# Pricing: seconds of credit granted per dollar (tune to cover costs + margin)
# Note: 1000 s per $1 charges ~$0.06/min against Deepgram's ~$0.006/min cost,
# but the example packages in Phase 2 grant ~10,000 s per $1; reconcile before launch.
CREDITS_PER_DOLLAR=1000
```
---
## Phase 1 — Core Server & Auth
### Goals
- Working Express app with Postgres connection
- Migration runner
- User registration and login
- JWT middleware
### Tasks
1. **Scaffold project**
- `npm init`, install dependencies: `express ws pg jsonwebtoken bcrypt stripe dotenv`
- Dev dependencies: `nodemon`
- Add `start` and `dev` scripts to package.json
2. **Database connection** (`src/db/index.js`)
- Export a `pg.Pool` instance using `DATABASE_URL`
- Export a `migrate()` function that reads `src/db/migrations/*.sql` in order,
checks `schema_migrations` table, and applies unapplied ones
- Call `migrate()` on server startup before listening
3. **Auth routes** (`src/routes/auth.js`)
- `POST /auth/register` — validate email/password, hash password with bcrypt (cost 12),
insert user, insert empty credit_balance row, return JWT
- `POST /auth/login` — verify credentials, return JWT + refresh token
- `POST /auth/refresh` — validate refresh token, return new JWT
- Passwords: minimum 8 characters, validate email format
4. **JWT middleware** (`src/middleware/auth.js`)
- Verify `Authorization: Bearer <token>` header
- Attach `req.user = { id, email }` on success
- Return 401 on failure
- Export as `requireAuth` middleware
5. **Basic health check**
- `GET /health` returns `{ status: 'ok', db: 'connected' }`
---
## Phase 2 — Billing & Credits
### Goals
- Stripe Checkout session creation for credit purchases
- Webhook handler to fulfill purchases
- Balance endpoint
### Payment Methods
Use **Stripe Dynamic Payment Methods** — do NOT hardcode `payment_method_types` in the
Checkout Session. Instead, leave it unset and manage everything from the Stripe Dashboard.
Enable the following in the Stripe Dashboard under Settings → Payment Methods:
- **Cards** (Visa, Mastercard, Amex, Discover) — on by default
- **PayPal** — enable manually
- **Apple Pay** — on by default, shows automatically on Safari/iOS
- **Google Pay** — enable manually (one toggle)
- **Cash App Pay** — enable manually (popular with streaming audiences)
- **Link** — Stripe's saved payment network, on by default
Stripe will automatically show the most relevant methods to each user based on their
location and device. No code changes are needed to add or remove methods in future —
it's all dashboard config.
### Credit Packages
Define these as constants in `src/config.js`:
```javascript
CREDIT_PACKAGES: [
  { id: 'pack_500', label: '500 minutes', seconds: 30000, price_cents: 300 },
  { id: 'pack_1200', label: '1200 minutes', seconds: 72000, price_cents: 600 },
  { id: 'pack_3000', label: '3000 minutes', seconds: 180000, price_cents: 1200 },
]
```
Adjust pricing to cover Deepgram costs ($0.006/min = $0.0001/sec) plus margin and
Stripe fees (~2.9% + $0.30). As listed, the example packages do not: 500 minutes
costs $3.00 in Deepgram usage alone, so `pack_500` is break-even before fees and
the two larger packs are below cost. Treat these numbers as placeholders.
### Tasks
1. **Stripe customer creation**
- On user registration, create a Stripe customer and store `stripe_customer_id`
- Do this asynchronously (don't block registration response)
2. **Billing routes** (`src/routes/billing.js`)
- `GET /billing/packages` — return credit package list (no auth required)
- `POST /billing/checkout` — requires auth, accepts `{ package_id }`,
creates Stripe Checkout Session using dynamic payment methods (do NOT pass
`payment_method_types` — omitting it enables dynamic methods automatically),
include `payment_intent_data.metadata` containing `user_id` and `package_id`,
returns `{ checkout_url }`
- `GET /billing/balance` — requires auth, returns `{ seconds_remaining, minutes_remaining }`
3. **Stripe webhook** (`src/webhooks/stripe.js`)
- Mount at `POST /webhooks/stripe` with raw body (use `express.raw()` for this route only)
- Verify signature with `stripe.webhooks.constructEvent()`
- Handle `checkout.session.completed`:
- Extract `user_id` and `package_id` from metadata
- Add seconds to `credit_balance`
- Insert row into `usage_ledger` with description `'credit_purchase'`
- Handle `payment_intent.payment_failed`: log it (no action needed for prepaid)
4. **Success/cancel pages**
- Stripe Checkout redirects to `GET /billing/success?session_id=...` and `/billing/cancel`
- These can be simple HTML responses or redirects to the web dashboard
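To make the checkout and fulfillment flow concrete, here is a hedged sketch. `createCheckout` takes the initialized Stripe SDK client as a parameter; `fulfill`, `baseUrl`, and the inline `price_data` shape are illustrative assumptions (pre-created Stripe Prices would work equally well):

```javascript
// Sketch of checkout creation and webhook fulfillment. `stripe` is the
// initialized Stripe Node SDK client; `fulfill` stands in for the DB logic
// (add seconds to credit_balance, append a usage_ledger row).
async function createCheckout(stripe, user, pkg, baseUrl) {
  const session = await stripe.checkout.sessions.create({
    mode: 'payment',
    customer: user.stripe_customer_id,
    // payment_method_types is deliberately omitted -> dynamic payment methods
    line_items: [{
      price_data: {
        currency: 'usd',
        product_data: { name: pkg.label },
        unit_amount: pkg.price_cents,
      },
      quantity: 1,
    }],
    // Session-level metadata is what checkout.session.completed carries;
    // payment_intent_data.metadata duplicates it onto the PaymentIntent.
    metadata: { user_id: user.id, package_id: pkg.id },
    payment_intent_data: { metadata: { user_id: user.id, package_id: pkg.id } },
    success_url: `${baseUrl}/billing/success?session_id={CHECKOUT_SESSION_ID}`,
    cancel_url: `${baseUrl}/billing/cancel`,
  });
  return session.url;
}

// Dispatch after stripe.webhooks.constructEvent() has verified the signature
// (the route itself must be mounted with express.raw(), as noted above).
function handleStripeEvent(event, fulfill) {
  if (event.type === 'checkout.session.completed') {
    const { user_id, package_id } = event.data.object.metadata;
    fulfill(user_id, package_id);
  }
  // payment_intent.payment_failed: nothing to reverse on prepaid; just log.
}
```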
---
## Phase 3 — WebSocket Proxy (Core Feature)
This is the most critical component. The proxy sits between the desktop client and Deepgram,
forwarding audio while tracking usage in real time.
### Connection Flow
```
Client connects → validate JWT → check credit balance → open Deepgram upstream
Audio chunks arrive → forward to Deepgram → record usage every 5 seconds
Transcription arrives from Deepgram → forward to client
Client disconnects (or credits exhausted) → close upstream → finalize session
```
### WebSocket Protocol
**Client connects to**: `wss://your-domain/ws/transcribe`
**Client sends as first message** (JSON):
```json
{
  "type": "auth",
  "token": "<JWT>",
  "config": {
    "model": "nova-2",
    "language": "en-US",
    "interim_results": true,
    "endpointing": 300
  }
}
```
**After auth success, client sends**: raw audio binary frames (PCM 16kHz mono)
**Server sends to client**:
```json
{ "type": "ready" }
{ "type": "transcript", "text": "...", "is_final": true, "confidence": 0.98 }
{ "type": "error", "code": "insufficient_credits", "message": "..." }
{ "type": "credits_low", "seconds_remaining": 300 }
{ "type": "session_end", "seconds_used": 120 }
```
### Tasks (`src/websocket/proxy.js`)
1. **Upgrade handler**
- Attach to the HTTP server using `ws.Server({ noServer: true })`
- In `server.on('upgrade', ...)`, route `/ws/transcribe` to this handler
2. **Auth handshake**
- First message must be `{ type: 'auth', token: '...' }` — received within 5 seconds
or connection is terminated
- Verify JWT, load user's credit balance from DB
- If balance is 0 or negative, send `insufficient_credits` error and close
3. **Deepgram upstream connection**
- Open a WebSocket to Deepgram's streaming API:
`wss://api.deepgram.com/v1/listen?model=nova-2&language=en-US&interim_results=true`
- Auth header: `Authorization: Token <DEEPGRAM_API_KEY>`
- Use query params from client's `config` object (whitelist allowed params)
4. **Audio forwarding**
- All binary messages from client → forward directly to Deepgram upstream
- All messages from Deepgram → parse JSON, reformat, forward to client
5. **Usage tracking**
- Create a `transcription_sessions` row on connection
- Maintain an in-memory `secondsUsed` counter per connection
- Deepgram sends `{ type: 'Results', duration: X }` in responses — use this for
accurate second counting
- Every 10 seconds (or on disconnect), write current `secondsUsed` to DB:
- Update `transcription_sessions.seconds_used`
- Decrement `credit_balance.seconds_remaining`
- Insert into `usage_ledger`
- If `seconds_remaining` hits 0: send `insufficient_credits`, close connection
6. **Cleanup on disconnect**
- Mark session as `completed`, set `ended_at`
- Do final usage flush to DB
- Close Deepgram upstream if still open
7. **Error handling**
- If Deepgram upstream closes unexpectedly, notify client and close
- If client sends malformed data, log and continue (don't crash)
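Tasks 2–6 can be condensed into a per-connection sketch. The `deps` helpers (`verifyJwt`, `loadBalance`, `openDeepgram`, `flushUsage`) are stand-ins for the real auth, DB, and upstream code; the transcript field paths follow Deepgram's documented `Results` shape, and counting only final-result durations is one plausible accounting choice:

```javascript
// Per-connection handler sketch for src/websocket/proxy.js. All `deps`
// helpers are hypothetical seams for the logic described in the tasks above.
function handleConnection(clientWs, deps) {
  const { verifyJwt, loadBalance, openDeepgram, flushUsage } = deps;
  let upstream = null;
  let secondsUsed = 0;
  let authed = false;

  // Task 2: terminate if no auth message arrives within 5 seconds.
  const authTimer = setTimeout(() => { if (!authed) clientWs.close(); }, 5000);

  clientWs.on('message', async (data, isBinary) => {
    if (!authed) {
      const msg = JSON.parse(data);
      if (msg.type !== 'auth') return clientWs.close();
      const user = await verifyJwt(msg.token);
      if ((await loadBalance(user.id)) <= 0) {
        clientWs.send(JSON.stringify({ type: 'error', code: 'insufficient_credits' }));
        return clientWs.close();
      }
      clearTimeout(authTimer);
      authed = true;
      upstream = openDeepgram(msg.config); // task 3: whitelist query params
      upstream.on('message', (raw) => {
        const dg = JSON.parse(raw);
        if (dg.type !== 'Results') return;
        // Summing final-result durations is one way to count billed seconds.
        if (dg.is_final) secondsUsed += dg.duration || 0;
        const alt = dg.channel.alternatives[0];
        clientWs.send(JSON.stringify({
          type: 'transcript',
          text: alt.transcript,
          is_final: dg.is_final,
          confidence: alt.confidence,
        }));
      });
      clientWs.send(JSON.stringify({ type: 'ready' }));
      return;
    }
    if (isBinary && upstream) upstream.send(data); // task 4: raw PCM passthrough
  });

  clientWs.on('close', () => {
    clearTimeout(authTimer);
    if (upstream) upstream.close(); // task 6: close Deepgram upstream
    flushUsage(secondsUsed);        // final usage write, mark session completed
  });
}
```

The real handler also needs the periodic (10-second) usage flush and the mid-session credit-exhaustion check from task 5, which this sketch omits for brevity.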
---
## Phase 4 — Account Routes & Rate Limiting
### Tasks
1. **Account routes** (`src/routes/account.js`)
- `GET /account/me` — returns `{ email, credits: { seconds_remaining, minutes_remaining }, created_at }`
- `GET /account/usage` — returns last 30 days of `usage_ledger` entries grouped by day,
plus list of last 10 sessions with duration
2. **Rate limiting** (`src/middleware/rateLimit.js`)
- Use in-memory rate limiting (no Redis needed at this scale)
- Auth endpoints: max 10 requests per minute per IP
- WebSocket connections: max 2 concurrent connections per user
(store active connections in a `Map<userId, Set<ws>>`)
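The per-user concurrent-connection cap can be sketched like this; `tryRegister` is a hypothetical helper called right after the WebSocket auth handshake succeeds:

```javascript
// In-memory concurrent-connection cap: Map<userId, Set<ws>>, as above.
const MAX_CONCURRENT = 2;
const activeConnections = new Map();

function tryRegister(userId, ws) {
  let set = activeConnections.get(userId);
  if (!set) {
    set = new Set();
    activeConnections.set(userId, set);
  }
  if (set.size >= MAX_CONCURRENT) return false; // reject third connection
  set.add(ws);
  ws.on('close', () => {
    set.delete(ws); // free the slot when the socket closes
    if (set.size === 0) activeConnections.delete(userId);
  });
  return true;
}
```

Because the map lives in one Node process, this (like the in-memory rate limiter) assumes a single-process deployment; moving to multiple processes would require shared state.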
---
## Phase 5 — Web Dashboard
A simple, functional HTML/CSS/JS dashboard. No framework — vanilla JS is fine.
This is a developer-friendly streamer tool, not a consumer SaaS, so clean and
functional beats flashy.
### Pages
**`/` (Landing / Login)**
- Brief product description (what this is, why it exists)
- Login form and link to register
- Link to GitHub/Gitea repo
**`/dashboard` (Post-login)**
- Current credit balance (minutes remaining, prominently displayed)
- "Buy Credits" section showing the three packages with Stripe Checkout buttons
- Usage chart: last 30 days bar chart (vanilla canvas or a small CDN chart lib)
- Recent sessions table: date, duration, status
**`/register`**
- Registration form
### Implementation Notes
- Store JWT in `localStorage`, attach as `Authorization` header on API calls
- Redirect to `/` if JWT missing or expired
- Keep CSS minimal but readable — this is a utility dashboard
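The JWT-attachment and redirect notes above might reduce to one small helper in `web/assets/app.js`; the name `api` and the 401-redirect behavior are illustrative:

```javascript
// Fetch wrapper for the dashboard: attaches the stored JWT and bounces to
// the login page when the token is missing or rejected.
async function api(path, options = {}) {
  const token = localStorage.getItem('jwt');
  if (!token) {
    window.location.href = '/'; // not logged in
    return;
  }
  const res = await fetch(path, {
    ...options,
    headers: { ...options.headers, Authorization: `Bearer ${token}` },
  });
  if (res.status === 401) {
    localStorage.removeItem('jwt'); // expired or invalid token
    window.location.href = '/';
    return;
  }
  return res.json();
}

// Usage: const balance = await api('/billing/balance');
```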
---
## Phase 6 — Desktop App Integration
Changes needed in the `local-transcription` Python repo.
### New file: `client/remote_transcription.py`
This module replaces `transcription_engine_realtime.py` when remote mode is active.
```python
# Pseudocode / spec for Claude Code to implement
class RemoteTranscriptionEngine:
    """
    Connects to the transcription proxy WebSocket and streams audio.
    Provides the same callback interface as the local engine so the
    rest of the app doesn't need to change.
    """

    def __init__(self, config, on_transcript_callback):
        # config contains: server_url, auth_token (or byok_api_key), model
        ...

    def start(self):
        # Open WebSocket connection
        # Send auth message
        # Start audio capture thread (reuse existing audio_capture.py)
        ...

    def stop(self):
        # Close WebSocket gracefully
        ...

    def _on_audio_chunk(self, audio_data):
        # Called by audio_capture.py with raw PCM data
        # Send as binary WebSocket frame
        ...

    def _on_server_message(self, message):
        # Parse JSON from server
        # On type='transcript': call on_transcript_callback
        # On type='credits_low': trigger UI warning
        # On type='error': surface to user
        ...
```
### BYOK Mode
When user provides their own Deepgram key, connect directly to Deepgram instead of the proxy:
- Endpoint: `wss://api.deepgram.com/v1/listen?...`
- Auth: `Authorization: Token <user_key>`
- No session tracking (Deepgram handles billing directly to the user)
- Same `RemoteTranscriptionEngine` class, just different URL and auth header
### Settings Changes (`gui/settings_dialog_qt.py`)
Add a new "Transcription Mode" section:
```
Transcription Mode:
  ○ Local (Whisper)      [existing behavior]
  ○ Remote - Managed     [requires login]
  ○ Remote - BYOK        [requires Deepgram API key]

[If Managed selected]:
  Server URL: [____________]
  [Login / Register]   [View Balance: 420 min remaining]

[If BYOK selected]:
  Deepgram API Key: [____________]
  Model: [nova-2 ▼]
```
### Config additions (`config/default_config.yaml`)
```yaml
remote:
  mode: local          # local | managed | byok
  server_url: ""       # proxy server URL for managed mode
  auth_token: ""       # JWT stored after login
  byok_api_key: ""     # Deepgram key for BYOK mode
  deepgram_model: nova-2
  language: en-US
```
---
## Build & Deployment Notes
### Docker Compose (local dev)
```yaml
version: '3.8'
services:
  db:
    image: postgres:15
    environment:
      POSTGRES_DB: transcription_proxy
      POSTGRES_USER: user
      POSTGRES_PASSWORD: password
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data
  app:
    build: .
    ports:
      - "3000:3000"
    environment:
      DATABASE_URL: postgresql://user:password@db:5432/transcription_proxy
    depends_on:
      - db
    volumes:
      - .:/app
      - /app/node_modules
volumes:
  pgdata:
```
### Production Deployment
This service is a good fit for deployment on AnHonestHost WHP as a containerized app,
or on a small DigitalOcean/Linode VPS. Requirements are light:
- 512MB RAM is sufficient
- Postgres can be the same instance as other services or managed (e.g., Supabase free tier)
- Needs a public domain with SSL for WebSocket (`wss://`) to work from desktop clients
Reverse proxy config (Nginx or HAProxy) should:
- Proxy HTTP → `localhost:3000`
- Pass `Upgrade` and `Connection` headers for WebSocket support
- Set `proxy_read_timeout 3600` (sessions can be long)
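As a sketch, the Nginx server block implied by these requirements might look like the following; the server name is a placeholder and TLS certificate directives are elided:

```nginx
# Illustrative only; adjust server_name, certificates, and paths.
server {
    listen 443 ssl;
    server_name transcribe.example.com;

    location / {
        proxy_pass http://localhost:3000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;   # WebSocket upgrade passthrough
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_read_timeout 3600;                  # long transcription sessions
    }
}
```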
---
## Implementation Order
Build and test in this sequence:
1. Project scaffold + DB connection + migrations
2. Auth (register/login/JWT) — test with curl
3. Stripe billing + webhook — test with Stripe CLI (`stripe listen`)
4. WebSocket proxy — test with a simple browser WebSocket client first
5. Usage tracking and credit decrement
6. Account/usage routes
7. Web dashboard
8. Desktop app integration (separate PR in local-transcription repo)
---
## Key Decisions & Rationale
| Decision | Choice | Reason |
|---|---|---|
| Credits model | Prepaid | No surprise charges, simpler billing, better for irregular streamer usage |
| WebSocket library | `ws` | Lightweight, no abstraction overhead, plays well with raw binary audio |
| Auth | JWT (stateless) | Desktop app holds token locally; no session store needed |
| DB driver | `node-postgres` (pg) | No ORM overhead; schema is simple enough for raw SQL |
| Migrations | Raw SQL files | No dependency on Knex/Prisma; easy to inspect and reason about |
| Rate limiting | In-memory | Redis is overkill for this scale; single-process Node is fine initially |
| Frontend | Vanilla JS | Dashboard is simple utility UI; no framework justified |
---
## What This Plan Does NOT Cover (Future Work)
- OAuth / social login
- Admin panel for managing users
- Refund / credit adjustment tooling
- Email verification
- Password reset flow
- Multi-language support beyond Deepgram's defaults
- Analytics / aggregated usage reporting
- Self-hosted Whisper inference as a third backend option