
Deepgram Proxy Service — Build Plan

Project Overview

Build a standalone hosted service that acts as a Deepgram proxy for the Local Transcription desktop app. Users can either provide their own Deepgram API key (BYOK) or use the managed service with prepaid credits purchased via Stripe.

This is a separate repository from local-transcription. The desktop app will be updated in a second phase to support both modes.


Repository Structure

transcription-proxy/
├── src/
│   ├── server.js              # Express app entry point
│   ├── config.js              # Environment config loader
│   ├── db/
│   │   ├── index.js           # node-postgres pool setup
│   │   └── migrations/        # SQL migration files (numbered)
│   │       ├── 001_users.sql
│   │       ├── 002_credits.sql
│   │       ├── 003_sessions.sql
│   │       └── 004_usage_ledger.sql
│   ├── middleware/
│   │   ├── auth.js            # JWT verification middleware
│   │   └── rateLimit.js       # Per-user rate limiting
│   ├── routes/
│   │   ├── auth.js            # POST /auth/register, /auth/login, /auth/refresh
│   │   ├── billing.js         # POST /billing/checkout, GET /billing/balance
│   │   └── account.js         # GET /account/me, GET /account/usage
│   ├── websocket/
│   │   └── proxy.js           # WebSocket proxy handler (core feature)
│   └── webhooks/
│       └── stripe.js          # POST /webhooks/stripe
├── web/                       # Simple frontend dashboard
│   ├── index.html             # Landing / login page
│   ├── dashboard.html         # Balance, usage history, buy credits
│   └── assets/
│       ├── app.js
│       └── style.css
├── .env.example
├── package.json
├── docker-compose.yml         # Postgres + app for local dev
└── CLAUDE.md                  # This file (after renaming)

Technology Stack

  • Runtime: Node.js 20+
  • Framework: Express 4
  • WebSocket: ws library (not socket.io — keep it lean)
  • Database: PostgreSQL 15+ via pg (node-postgres)
  • Auth: JWT via jsonwebtoken, passwords hashed with bcrypt
  • Payments: Stripe Node SDK (stripe)
  • Environment: dotenv
  • Dev tooling: nodemon for dev, no TypeScript (keep it simple)

Database Schema

Run migrations in order. Use a simple schema_migrations table to track applied migrations.

001_users.sql

CREATE TABLE schema_migrations (
  version INTEGER PRIMARY KEY,
  applied_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE users (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  email TEXT UNIQUE NOT NULL,
  password_hash TEXT NOT NULL,
  stripe_customer_id TEXT UNIQUE,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);

002_credits.sql

CREATE TABLE credit_balance (
  user_id UUID PRIMARY KEY REFERENCES users(id) ON DELETE CASCADE,
  seconds_remaining INTEGER NOT NULL DEFAULT 0,
  updated_at TIMESTAMPTZ DEFAULT NOW()
);

003_sessions.sql

CREATE TABLE transcription_sessions (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL REFERENCES users(id),
  mode TEXT NOT NULL CHECK (mode IN ('managed', 'byok')),
  started_at TIMESTAMPTZ DEFAULT NOW(),
  ended_at TIMESTAMPTZ,
  seconds_used INTEGER NOT NULL DEFAULT 0,
  deepgram_model TEXT,
  status TEXT NOT NULL DEFAULT 'active' CHECK (status IN ('active', 'completed', 'terminated'))
);

CREATE INDEX idx_sessions_user_id ON transcription_sessions(user_id);
CREATE INDEX idx_sessions_started_at ON transcription_sessions(started_at);

004_usage_ledger.sql

CREATE TABLE usage_ledger (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL REFERENCES users(id),
  session_id UUID REFERENCES transcription_sessions(id),
  recorded_at TIMESTAMPTZ DEFAULT NOW(),
  seconds INTEGER NOT NULL,
  description TEXT  -- e.g. 'session_usage', 'credit_purchase', 'manual_adjustment'
);

CREATE INDEX idx_ledger_user_id ON usage_ledger(user_id);

Environment Variables (.env.example)

# Server
PORT=3000
NODE_ENV=development

# Database
DATABASE_URL=postgresql://user:password@localhost:5432/transcription_proxy

# Auth
JWT_SECRET=changeme_use_long_random_string
JWT_EXPIRY=7d

# Stripe
STRIPE_SECRET_KEY=sk_test_...
STRIPE_WEBHOOK_SECRET=whsec_...

# Deepgram
DEEPGRAM_API_KEY=your_deepgram_key_here

# Pricing: seconds of credit granted per dollar (tune for margin)
# Default: 1000 seconds per $1. At Deepgram's ~$0.006/min ($0.0001/sec),
# 1000 seconds costs ~$0.10, leaving room for margin and Stripe fees.
CREDITS_PER_DOLLAR=1000
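A possible shape for src/config.js, consuming the variables above (the loader function and field names are illustrative, not a fixed API):

```javascript
// src/config.js -- central environment config with fail-fast validation.
// Assumes dotenv has already been loaded by the entry point (server.js).

function loadConfig(env = process.env) {
  const get = (name, fallback) => {
    const value = env[name] ?? fallback;
    if (value === undefined) {
      throw new Error(`Missing required environment variable: ${name}`);
    }
    return value;
  };

  return {
    port: Number(get('PORT', '3000')),
    nodeEnv: get('NODE_ENV', 'development'),
    databaseUrl: get('DATABASE_URL'),      // no fallback: required
    jwtSecret: get('JWT_SECRET'),
    jwtExpiry: get('JWT_EXPIRY', '7d'),
    stripeSecretKey: get('STRIPE_SECRET_KEY'),
    stripeWebhookSecret: get('STRIPE_WEBHOOK_SECRET'),
    deepgramApiKey: get('DEEPGRAM_API_KEY'),
    creditsPerDollar: Number(get('CREDITS_PER_DOLLAR', '1000')),
  };
}

module.exports = { loadConfig };
```

Failing at startup on a missing variable beats discovering it mid-session when the first Deepgram connection is attempted.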

Phase 1 — Core Server & Auth

Goals

  • Working Express app with Postgres connection
  • Migration runner
  • User registration and login
  • JWT middleware

Tasks

  1. Scaffold project

    • npm init, install dependencies: express ws pg jsonwebtoken bcrypt stripe dotenv
    • Dev dependencies: nodemon
    • Add start and dev scripts to package.json
  2. Database connection (src/db/index.js)

    • Export a pg.Pool instance using DATABASE_URL
    • Export a migrate() function that reads src/db/migrations/*.sql in order, checks schema_migrations table, and applies unapplied ones
    • Call migrate() on server startup before listening
  3. Auth routes (src/routes/auth.js)

    • POST /auth/register — validate email/password, hash password with bcrypt (cost 12), insert user, insert empty credit_balance row, return JWT
    • POST /auth/login — verify credentials, return JWT + refresh token
    • POST /auth/refresh — validate refresh token, return new JWT
    • Passwords: minimum 8 characters, validate email format
  4. JWT middleware (src/middleware/auth.js)

    • Verify Authorization: Bearer <token> header
    • Attach req.user = { id, email } on success
    • Return 401 on failure
    • Export as requireAuth middleware
  5. Basic health check

    • GET /health returns { status: 'ok', db: 'connected' }

Phase 2 — Billing & Credits

Goals

  • Stripe Checkout session creation for credit purchases
  • Webhook handler to fulfill purchases
  • Balance endpoint

Payment Methods

Use Stripe Dynamic Payment Methods — do NOT hardcode payment_method_types in the Checkout Session. Instead, leave it unset and manage everything from the Stripe Dashboard.

Enable the following in the Stripe Dashboard under Settings → Payment Methods:

  • Cards (Visa, Mastercard, Amex, Discover) — on by default
  • PayPal — enable manually
  • Apple Pay — on by default, shows automatically on Safari/iOS
  • Google Pay — enable manually (one toggle)
  • Cash App Pay — enable manually (popular with streaming audiences)
  • Link — Stripe's saved payment network, on by default

Stripe automatically shows each user the most relevant methods based on their location and device. No code changes are needed to add or remove methods in the future; it's all dashboard configuration.

Credit Packages

Define these as constants in src/config.js:

CREDIT_PACKAGES: [
  { id: 'pack_500',  label: '500 minutes',  seconds: 30000,  price_cents: 300  },
  { id: 'pack_1200', label: '1200 minutes', seconds: 72000,  price_cents: 600  },
  { id: 'pack_3000', label: '3000 minutes', seconds: 180000, price_cents: 1200 },
]

Adjust pricing to cover Deepgram costs ($0.006/min, i.e. $0.0001/sec) plus margin and Stripe fees (~2.9% + $0.30 per transaction). Note that the example prices above sit at or below raw Deepgram cost (e.g. 3000 minutes costs $18.00 in Deepgram fees but is priced at $12.00), so raise them before launch.
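A back-of-envelope check of the package economics, using the Deepgram rate and Stripe fee figures discussed here (fee handling is approximate, not accounting):

```javascript
// Rough unit economics per package: sale price minus Deepgram cost and Stripe fee.
const DEEPGRAM_CENTS_PER_SECOND = 0.01; // $0.0001/sec = 0.01 cents/sec

function packageMarginCents({ seconds, price_cents }) {
  const deepgramCost = seconds * DEEPGRAM_CENTS_PER_SECOND;
  const stripeFee = price_cents * 0.029 + 30; // ~2.9% + 30 cents
  return Math.round(price_cents - deepgramCost - stripeFee);
}

// pack_500:  300 - 300  - 38.7 = about -39 cents  (break-even before fees)
// pack_3000: 1200 - 1800 - 64.8 = about -665 cents (well below cost)
```

Running the three packages through this shows every one is negative, which is why the prices need adjusting upward.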

Tasks

  1. Stripe customer creation

    • On user registration, create a Stripe customer and store stripe_customer_id
    • Do this asynchronously (don't block registration response)
  2. Billing routes (src/routes/billing.js)

    • GET /billing/packages — return credit package list (no auth required)
    • POST /billing/checkout — requires auth, accepts { package_id }, creates a Stripe Checkout Session with dynamic payment methods (omit payment_method_types entirely), sets metadata: { user_id, package_id } on the session itself, and returns { checkout_url }. Use session-level metadata, not payment_intent_data.metadata: the latter lands on the PaymentIntent and does not appear on the checkout.session.completed event the webhook reads.
    • GET /billing/balance — requires auth, returns { seconds_remaining, minutes_remaining }
  3. Stripe webhook (src/webhooks/stripe.js)

    • Mount at POST /webhooks/stripe with raw body (use express.raw() for this route only)
    • Verify signature with stripe.webhooks.constructEvent()
    • Handle checkout.session.completed:
      • Extract user_id and package_id from metadata
      • Add seconds to credit_balance
      • Insert row into usage_ledger with description 'credit_purchase'
    • Handle payment_intent.payment_failed: log it (no action needed for prepaid)
  4. Success/cancel pages

    • Stripe Checkout redirects to GET /billing/success?session_id=... and /billing/cancel
    • These can be simple HTML responses or redirects to the web dashboard
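The fulfillment step from task 3 can be kept separate from the Express and Stripe plumbing, which makes it testable without network access; the db helper names here are assumptions:

```javascript
// src/webhooks/stripe.js -- fulfillment sketch, decoupled from transport.

// Pure-ish fulfillment: grant credits and append an audit row.
async function fulfillCheckout(session, db) {
  const { user_id, package_id } = session.metadata || {};
  if (!user_id || !package_id) throw new Error('missing metadata');
  const pkg = db.packages.find((p) => p.id === package_id);
  if (!pkg) throw new Error(`unknown package: ${package_id}`);
  await db.addCredits(user_id, pkg.seconds);                      // credit_balance += seconds
  await db.recordLedger(user_id, pkg.seconds, 'credit_purchase'); // usage_ledger row
  return pkg.seconds;
}

// Hypothetical Express wiring -- note the raw body for signature verification:
// router.post('/webhooks/stripe', express.raw({ type: 'application/json' }),
//   async (req, res) => {
//     const event = stripe.webhooks.constructEvent(
//       req.body, req.headers['stripe-signature'], config.stripeWebhookSecret);
//     if (event.type === 'checkout.session.completed') {
//       await fulfillCheckout(event.data.object, db);
//     }
//     res.json({ received: true });
//   });

module.exports = { fulfillCheckout };
```

Keeping fulfillment pure also makes it reusable for future manual credit adjustments.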

Phase 3 — WebSocket Proxy (Core Feature)

This is the most critical component. The proxy sits between the desktop client and Deepgram, forwarding audio while tracking usage in real time.

Connection Flow

Client connects → validate JWT → check credit balance → open Deepgram upstream
     ↓
Audio chunks arrive → forward to Deepgram → record usage (flushed to DB every 10 seconds)
     ↓
Transcription arrives from Deepgram → forward to client
     ↓
Client disconnects (or credits exhausted) → close upstream → finalize session

WebSocket Protocol

Client connects to: wss://your-domain/ws/transcribe

Client sends as first message (JSON):

{
  "type": "auth",
  "token": "<JWT>",
  "config": {
    "model": "nova-2",
    "language": "en-US",
    "interim_results": true,
    "endpointing": 300
  }
}

After auth success, client sends: raw audio binary frames (PCM 16kHz mono)

Server sends to client:

{ "type": "ready" }
{ "type": "transcript", "text": "...", "is_final": true, "confidence": 0.98 }
{ "type": "error", "code": "insufficient_credits", "message": "..." }
{ "type": "credits_low", "seconds_remaining": 300 }
{ "type": "session_end", "seconds_used": 120 }
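The message shapes above can be captured in two small client-side helpers (function names are illustrative, not part of the protocol):

```javascript
// Build the first (and only) JSON message the client sends before raw audio.
function buildAuthMessage(token, config = {}) {
  return JSON.stringify({
    type: 'auth',
    token,
    config: {
      model: config.model ?? 'nova-2',
      language: config.language ?? 'en-US',
      interim_results: config.interim_results ?? true,
      endpointing: config.endpointing ?? 300,
    },
  });
}

// Dispatch one server-to-client message to the matching handler.
function handleServerMessage(raw, handlers) {
  const msg = JSON.parse(raw);
  switch (msg.type) {
    case 'ready':       handlers.onReady?.(); break;
    case 'transcript':  handlers.onTranscript?.(msg.text, msg.is_final, msg.confidence); break;
    case 'credits_low': handlers.onCreditsLow?.(msg.seconds_remaining); break;
    case 'error':       handlers.onError?.(msg.code, msg.message); break;
    case 'session_end': handlers.onSessionEnd?.(msg.seconds_used); break;
    default:            break; // ignore unknown types for forward compatibility
  }
  return msg;
}
```

Ignoring unknown message types lets the server add new notifications later without breaking older clients.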

Tasks (src/websocket/proxy.js)

  1. Upgrade handler

    • Attach to the HTTP server using ws.Server({ noServer: true })
    • In server.on('upgrade', ...), route /ws/transcribe to this handler
  2. Auth handshake

    • First message must be { type: 'auth', token: '...' } — received within 5 seconds or connection is terminated
    • Verify JWT, load user's credit balance from DB
    • If balance is 0 or negative, send insufficient_credits error and close
  3. Deepgram upstream connection

    • Open a WebSocket to Deepgram's streaming API: wss://api.deepgram.com/v1/listen?model=nova-2&language=en-US&interim_results=true
    • Auth header: Authorization: Token <DEEPGRAM_API_KEY>
    • Use query params from client's config object (whitelist allowed params)
  4. Audio forwarding

    • All binary messages from client → forward directly to Deepgram upstream
    • All messages from Deepgram → parse JSON, reformat, forward to client
  5. Usage tracking

    • Create a transcription_sessions row on connection
    • Maintain an in-memory secondsUsed counter per connection
    • Deepgram sends { type: 'Results', duration: X } in responses — use this for accurate second counting
    • Every 10 seconds (or on disconnect), write current secondsUsed to DB:
      • Update transcription_sessions.seconds_used
      • Decrement credit_balance.seconds_remaining
      • Insert into usage_ledger
    • If seconds_remaining hits 0: send insufficient_credits, close connection
  6. Cleanup on disconnect

    • Mark session as completed, set ended_at
    • Do final usage flush to DB
    • Close Deepgram upstream if still open
  7. Error handling

    • If Deepgram upstream closes unexpectedly, notify client and close
    • If client sends malformed data, log and continue (don't crash)
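One way to structure the usage tracking from step 5; the class name and the injected flushFn callback are assumptions, not a fixed API:

```javascript
// Per-connection usage tracker: accumulate Deepgram-reported durations in
// memory, flush to the DB on an interval and again on disconnect.
class UsageTracker {
  constructor({ flushSeconds = 10, flushFn, onExhausted }) {
    this.secondsUsed = 0;   // session total
    this.unflushed = 0;     // seconds not yet written to DB
    this.flushFn = flushFn; // async (seconds) => remaining balance after decrement
    this.onExhausted = onExhausted;
    this.timer = setInterval(() => { this.flush().catch(() => {}); },
                             flushSeconds * 1000);
  }

  // Called per Deepgram Results message with its audio duration in seconds.
  addDuration(seconds) {
    this.secondsUsed += seconds;
    this.unflushed += seconds;
  }

  // Persist pending seconds: update session row, decrement balance, append ledger.
  async flush() {
    const pending = Math.round(this.unflushed);
    if (pending <= 0) return;
    this.unflushed -= pending;
    const remaining = await this.flushFn(pending);
    if (remaining <= 0) this.onExhausted?.(); // caller closes both sockets
  }

  // Cleanup on disconnect: stop the timer, do the final flush.
  async end() {
    clearInterval(this.timer);
    await this.flush();
  }
}
```

Handing flushFn in from outside keeps the tracker free of SQL, so the DB write (session update, balance decrement, ledger insert) can live in one transaction elsewhere.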

Phase 4 — Account Routes & Rate Limiting

Tasks

  1. Account routes (src/routes/account.js)

    • GET /account/me — returns { email, credits: { seconds_remaining, minutes_remaining }, created_at }
    • GET /account/usage — returns last 30 days of usage_ledger entries grouped by day, plus list of last 10 sessions with duration
  2. Rate limiting (src/middleware/rateLimit.js)

    • Use in-memory rate limiting (no Redis needed at this scale)
    • Auth endpoints: max 10 requests per minute per IP
    • WebSocket connections: max 2 concurrent connections per user (store active connections in a Map<userId, Set<ws>>)
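Both limits fit in a few lines of in-memory state; a sketch using the window and cap values from the bullets above:

```javascript
// Fixed-window IP rate limiter for auth endpoints: max N hits per window.
function makeRateLimiter({ max = 10, windowMs = 60_000, now = Date.now } = {}) {
  const hits = new Map(); // ip -> { count, windowStart }
  return (ip) => {
    const t = now();
    const entry = hits.get(ip);
    if (!entry || t - entry.windowStart >= windowMs) {
      hits.set(ip, { count: 1, windowStart: t });
      return true; // allowed
    }
    entry.count += 1;
    return entry.count <= max;
  };
}

// Concurrent WebSocket cap: at most 2 live connections per user.
const activeConnections = new Map(); // userId -> Set<ws>
function tryRegister(userId, ws, maxConcurrent = 2) {
  const set = activeConnections.get(userId) ?? new Set();
  if (set.size >= maxConcurrent) return false;
  set.add(ws);
  activeConnections.set(userId, set);
  ws.on?.('close', () => set.delete(ws)); // optional chaining for test doubles
  return true;
}
```

Injecting `now` makes the limiter deterministic under test; in production the defaults apply.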

Phase 5 — Web Dashboard

A simple, functional HTML/CSS/JS dashboard. No framework — vanilla JS is fine. This is a developer-friendly streamer tool, not a consumer SaaS, so clean and functional beats flashy.

Pages

/ (Landing / Login)

  • Brief product description (what this is, why it exists)
  • Login form and link to register
  • Link to GitHub/Gitea repo

/dashboard (Post-login)

  • Current credit balance (minutes remaining, prominently displayed)
  • "Buy Credits" section showing the three packages with Stripe Checkout buttons
  • Usage chart: last 30 days bar chart (vanilla canvas or a small CDN chart lib)
  • Recent sessions table: date, duration, status

/register

  • Registration form

Implementation Notes

  • Store JWT in localStorage, attach as Authorization header on API calls
  • Redirect to / if JWT missing or expired
  • Keep CSS minimal but readable — this is a utility dashboard
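The expiry check can be done client-side by decoding the JWT payload; no signature verification is needed just to read exp (helper name is illustrative; atob is available in browsers and Node 16+):

```javascript
// Decode a JWT payload (base64url) without verifying -- enough to decide
// whether to bounce the user to the login page before an API call 401s.
function isTokenExpired(jwt, nowSeconds = Math.floor(Date.now() / 1000)) {
  try {
    const payloadB64 = jwt.split('.')[1]
      .replace(/-/g, '+').replace(/_/g, '/'); // base64url -> base64
    const payload = JSON.parse(atob(payloadB64));
    return typeof payload.exp !== 'number' || payload.exp <= nowSeconds;
  } catch {
    return true; // missing or malformed token counts as expired
  }
}
```

The server still verifies the signature on every request; this is purely a UX shortcut so the dashboard redirects instead of showing failed calls.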

Phase 6 — Desktop App Integration

Changes needed in the local-transcription Python repo.

New file: client/remote_transcription.py

This module replaces transcription_engine_realtime.py when remote mode is active.

# Pseudocode / spec for Claude Code to implement

class RemoteTranscriptionEngine:
    """
    Connects to the transcription proxy WebSocket and streams audio.
    Provides the same callback interface as the local engine so the
    rest of the app doesn't need to change.
    """

    def __init__(self, config, on_transcript_callback):
        # config contains: server_url, auth_token (or byok_api_key), model
        ...

    def start(self):
        # Open WebSocket connection
        # Send auth message
        # Start audio capture thread (reuse existing audio_capture.py)
        ...

    def stop(self):
        # Close WebSocket gracefully
        ...

    def _on_audio_chunk(self, audio_data):
        # Called by audio_capture.py with raw PCM data
        # Send as binary WebSocket frame
        ...

    def _on_server_message(self, message):
        # Parse JSON from server
        # On type='transcript': call on_transcript_callback
        # On type='credits_low': trigger UI warning
        # On type='error': surface to user
        ...

BYOK Mode

When user provides their own Deepgram key, connect directly to Deepgram instead of the proxy:

  • Endpoint: wss://api.deepgram.com/v1/listen?...
  • Auth: Authorization: Token <user_key>
  • No session tracking (Deepgram handles billing directly to the user)
  • Same RemoteTranscriptionEngine class, just different URL and auth header
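Whether the upstream is reached through the proxy or directly in BYOK mode, the Deepgram URL is assembled the same way. A sketch of the parameter whitelisting mentioned in Phase 3, in the service's JavaScript (the Python client would mirror it; the exact whitelist contents are an assumption):

```javascript
// Build the Deepgram streaming URL from a client config, passing through
// only known-safe query parameters.
const ALLOWED_PARAMS = ['model', 'language', 'interim_results', 'endpointing',
                        'punctuate', 'smart_format']; // extend deliberately

function buildDeepgramUrl(config = {}) {
  const url = new URL('wss://api.deepgram.com/v1/listen');
  for (const key of ALLOWED_PARAMS) {
    if (config[key] !== undefined) url.searchParams.set(key, String(config[key]));
  }
  if (!url.searchParams.has('model')) url.searchParams.set('model', 'nova-2');
  return url.toString();
}
```

Whitelisting matters most in managed mode: the proxy's Deepgram key pays for whatever parameters a client manages to smuggle in, so only vetted ones should pass through.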

Settings Changes (gui/settings_dialog_qt.py)

Add a new "Transcription Mode" section:

Transcription Mode:
  ○ Local (Whisper)          [existing behavior]
  ○ Remote - Managed         [requires login]
  ○ Remote - BYOK            [requires Deepgram API key]

[If Managed selected]:
  Server URL: [____________]
  [Login / Register]  [View Balance: 420 min remaining]

[If BYOK selected]:
  Deepgram API Key: [____________]
  Model: [nova-2 ▼]

Config additions (config/default_config.yaml)

remote:
  mode: local           # local | managed | byok
  server_url: ""        # proxy server URL for managed mode
  auth_token: ""        # JWT stored after login
  byok_api_key: ""      # Deepgram key for BYOK mode
  deepgram_model: nova-2
  language: en-US

Build & Deployment Notes

Docker Compose (local dev)

version: '3.8'
services:
  db:
    image: postgres:15
    environment:
      POSTGRES_DB: transcription_proxy
      POSTGRES_USER: user
      POSTGRES_PASSWORD: password
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data

  app:
    build: .
    ports:
      - "3000:3000"
    environment:
      DATABASE_URL: postgresql://user:password@db:5432/transcription_proxy
    depends_on:
      - db
    volumes:
      - .:/app
      - /app/node_modules

volumes:
  pgdata:

Production Deployment

This service is a good fit for deployment on AnHonestHost WHP as a containerized app, or on a small DigitalOcean/Linode VPS. Requirements are light:

  • 512MB RAM is sufficient
  • Postgres can be the same instance as other services or managed (e.g., Supabase free tier)
  • Needs a public domain with SSL for WebSocket (wss://) to work from desktop clients

Reverse proxy config (Nginx or HAProxy) should:

  • Proxy HTTP → localhost:3000
  • Pass Upgrade and Connection headers for WebSocket support
  • Set proxy_read_timeout 3600 (sessions can be long)
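Assuming Nginx, the relevant server block looks roughly like this (domain and certificate paths are placeholders):

```nginx
server {
    listen 443 ssl;
    server_name proxy.example.com;  # placeholder domain
    ssl_certificate     /etc/letsencrypt/live/proxy.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/proxy.example.com/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_http_version 1.1;                  # required for WebSocket
        proxy_set_header Upgrade $http_upgrade;  # pass the upgrade handshake
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_read_timeout 3600;                 # long transcription sessions
    }
}
```

Without the Upgrade/Connection headers the WebSocket handshake silently fails with a plain HTTP response, which desktop clients surface as an immediate disconnect.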

Implementation Order

Build and test in this sequence:

  1. Project scaffold + DB connection + migrations
  2. Auth (register/login/JWT) — test with curl
  3. Stripe billing + webhook — test with Stripe CLI (stripe listen)
  4. WebSocket proxy — test with a simple browser WebSocket client first
  5. Usage tracking and credit decrement
  6. Account/usage routes
  7. Web dashboard
  8. Desktop app integration (separate PR in local-transcription repo)

Key Decisions & Rationale

Decision           Choice              Reason
Credits model      Prepaid             No surprise charges, simpler billing, better for irregular streamer usage
WebSocket library  ws                  Lightweight, no abstraction overhead, plays well with raw binary audio
Auth               JWT (stateless)     Desktop app holds token locally; no session store needed
DB driver          node-postgres (pg)  No ORM overhead; schema is simple enough for raw SQL
Migrations         Raw SQL files       No dependency on Knex/Prisma; easy to inspect and reason about
Rate limiting      In-memory           Redis is overkill for this scale; single-process Node is fine initially
Frontend           Vanilla JS          Dashboard is simple utility UI; no framework justified

What This Plan Does NOT Cover (Future Work)

  • OAuth / social login
  • Admin panel for managing users
  • Refund / credit adjustment tooling
  • Email verification
  • Password reset flow
  • Multi-language support beyond Deepgram's defaults
  • Analytics / aggregated usage reporting
  • Self-hosted Whisper inference as a third backend option