
Deepgram Proxy Service — Build Plan

Project Overview

Build a standalone hosted service that acts as a Deepgram proxy for the Local Transcription desktop app. Users can either provide their own Deepgram API key (BYOK) or use the managed service with prepaid credits purchased via Stripe.

This is a separate repository from local-transcription. The desktop app will be updated in a second phase to support both modes.


Repository Structure

transcription-proxy/
├── src/
│   ├── server.js              # Express app entry point
│   ├── config.js              # Environment config loader
│   ├── db/
│   │   ├── index.js           # node-postgres pool setup
│   │   └── migrations/        # SQL migration files (numbered)
│   │       ├── 001_users.sql
│   │       ├── 002_credits.sql
│   │       ├── 003_sessions.sql
│   │       └── 004_usage_ledger.sql
│   ├── middleware/
│   │   ├── auth.js            # JWT verification middleware
│   │   └── rateLimit.js       # Per-user rate limiting
│   ├── routes/
│   │   ├── auth.js            # POST /auth/register, /auth/login, /auth/refresh
│   │   ├── billing.js         # POST /billing/checkout, GET /billing/balance
│   │   └── account.js         # GET /account/me, GET /account/usage
│   ├── websocket/
│   │   └── proxy.js           # WebSocket proxy handler (core feature)
│   └── webhooks/
│       └── stripe.js          # POST /webhooks/stripe
├── web/                       # Simple frontend dashboard
│   ├── index.html             # Landing / login page
│   ├── dashboard.html         # Balance, usage history, buy credits
│   └── assets/
│       ├── app.js
│       └── style.css
├── .env.example
├── package.json
├── docker-compose.yml         # Postgres + app for local dev
└── CLAUDE.md                  # This file (after renaming)

Technology Stack

  • Runtime: Node.js 20+
  • Framework: Express 4
  • WebSocket: ws library (not socket.io — keep it lean)
  • Database: PostgreSQL 15+ via pg (node-postgres)
  • Auth: JWT via jsonwebtoken, passwords hashed with bcrypt
  • Payments: Stripe Node SDK (stripe)
  • Environment: dotenv
  • Dev tooling: nodemon for dev, no TypeScript (keep it simple)

Database Schema

Run migrations in order. Use a simple schema_migrations table to track applied migrations.

001_users.sql

CREATE TABLE schema_migrations (
  version INTEGER PRIMARY KEY,
  applied_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE users (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  email TEXT UNIQUE NOT NULL,
  password_hash TEXT NOT NULL,
  stripe_customer_id TEXT UNIQUE,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);

002_credits.sql

CREATE TABLE credit_balance (
  user_id UUID PRIMARY KEY REFERENCES users(id) ON DELETE CASCADE,
  seconds_remaining INTEGER NOT NULL DEFAULT 0,
  updated_at TIMESTAMPTZ DEFAULT NOW()
);

003_sessions.sql

CREATE TABLE transcription_sessions (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL REFERENCES users(id),
  mode TEXT NOT NULL CHECK (mode IN ('managed', 'byok')),
  started_at TIMESTAMPTZ DEFAULT NOW(),
  ended_at TIMESTAMPTZ,
  seconds_used INTEGER NOT NULL DEFAULT 0,
  deepgram_model TEXT,
  status TEXT NOT NULL DEFAULT 'active' CHECK (status IN ('active', 'completed', 'terminated'))
);

CREATE INDEX idx_sessions_user_id ON transcription_sessions(user_id);
CREATE INDEX idx_sessions_started_at ON transcription_sessions(started_at);

004_usage_ledger.sql

CREATE TABLE usage_ledger (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL REFERENCES users(id),
  session_id UUID REFERENCES transcription_sessions(id),
  recorded_at TIMESTAMPTZ DEFAULT NOW(),
  seconds INTEGER NOT NULL,
  description TEXT  -- e.g. 'session_usage', 'credit_purchase', 'manual_adjustment'
);

CREATE INDEX idx_ledger_user_id ON usage_ledger(user_id);

Environment Variables (.env.example)

# Server
PORT=3000
NODE_ENV=development

# Database
DATABASE_URL=postgresql://user:password@localhost:5432/transcription_proxy

# Auth
JWT_SECRET=changeme_use_long_random_string
JWT_EXPIRY=7d

# Stripe
STRIPE_SECRET_KEY=sk_test_...
STRIPE_WEBHOOK_SECRET=whsec_...

# Deepgram
DEEPGRAM_API_KEY=your_deepgram_key_here

# Pricing: seconds of credit granted per dollar (tune for margin)
# Default: 1000 seconds per $1. At Deepgram's ~$0.006/min ($0.0001/sec),
# 1000 seconds costs ~$0.10, leaving room for margin and Stripe fees.
CREDITS_PER_DOLLAR=1000
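A possible shape for src/config.js, consuming the variables above (the loader function and field names are illustrative, not a fixed API):

```javascript
// src/config.js -- central environment config with fail-fast validation.
// Assumes dotenv has already been loaded by the entry point (server.js).

function loadConfig(env = process.env) {
  const get = (name, fallback) => {
    const value = env[name] ?? fallback;
    if (value === undefined) {
      throw new Error(`Missing required environment variable: ${name}`);
    }
    return value;
  };

  return {
    port: Number(get('PORT', '3000')),
    nodeEnv: get('NODE_ENV', 'development'),
    databaseUrl: get('DATABASE_URL'),      // no fallback: required
    jwtSecret: get('JWT_SECRET'),
    jwtExpiry: get('JWT_EXPIRY', '7d'),
    stripeSecretKey: get('STRIPE_SECRET_KEY'),
    stripeWebhookSecret: get('STRIPE_WEBHOOK_SECRET'),
    deepgramApiKey: get('DEEPGRAM_API_KEY'),
    creditsPerDollar: Number(get('CREDITS_PER_DOLLAR', '1000')),
  };
}

module.exports = { loadConfig };
```

Failing at startup on a missing variable beats discovering it mid-session when the first Deepgram connection is attempted.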

Phase 1 — Core Server & Auth

Goals

  • Working Express app with Postgres connection
  • Migration runner
  • User registration and login
  • JWT middleware

Tasks

  1. Scaffold project

    • npm init, install dependencies: express ws pg jsonwebtoken bcrypt stripe dotenv
    • Dev dependencies: nodemon
    • Add start and dev scripts to package.json
  2. Database connection (src/db/index.js)

    • Export a pg.Pool instance using DATABASE_URL
    • Export a migrate() function that reads src/db/migrations/*.sql in order, checks schema_migrations table, and applies unapplied ones
    • Call migrate() on server startup before listening
  3. Auth routes (src/routes/auth.js)

    • POST /auth/register — validate email/password, hash password with bcrypt (cost 12), insert user, insert empty credit_balance row, return JWT
    • POST /auth/login — verify credentials, return JWT + refresh token
    • POST /auth/refresh — validate refresh token, return new JWT
    • Passwords: minimum 8 characters, validate email format
  4. JWT middleware (src/middleware/auth.js)

    • Verify Authorization: Bearer <token> header
    • Attach req.user = { id, email } on success
    • Return 401 on failure
    • Export as requireAuth middleware
  5. Basic health check

    • GET /health returns { status: 'ok', db: 'connected' }

Phase 2 — Billing & Credits

Goals

  • Stripe Checkout session creation for credit purchases
  • Webhook handler to fulfill purchases
  • Balance endpoint

Payment Methods

Use Stripe Dynamic Payment Methods — do NOT hardcode payment_method_types in the Checkout Session. Instead, leave it unset and manage everything from the Stripe Dashboard.

Enable the following in the Stripe Dashboard under Settings → Payment Methods:

  • Cards (Visa, Mastercard, Amex, Discover) — on by default
  • PayPal — enable manually
  • Apple Pay — on by default, shows automatically on Safari/iOS
  • Google Pay — enable manually (one toggle)
  • Cash App Pay — enable manually (popular with streaming audiences)
  • Link — Stripe's saved payment network, on by default

Stripe automatically shows each user the most relevant methods based on their location and device. No code changes are needed to add or remove methods in the future; it's all dashboard configuration.

Credit Packages

Define these as constants in src/config.js:

CREDIT_PACKAGES: [
  { id: 'pack_500',  label: '500 minutes',  seconds: 30000,  price_cents: 300  },
  { id: 'pack_1200', label: '1200 minutes', seconds: 72000,  price_cents: 600  },
  { id: 'pack_3000', label: '3000 minutes', seconds: 180000, price_cents: 1200 },
]

Adjust pricing to cover Deepgram costs ($0.006/min, i.e. $0.0001/sec) plus margin and Stripe fees (~2.9% + $0.30 per transaction). Note that the example prices above sit at or below raw Deepgram cost (e.g. 3000 minutes costs $18.00 in Deepgram fees but is priced at $12.00), so raise them before launch.
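A back-of-envelope check of the package economics, using the Deepgram rate and Stripe fee figures discussed here (fee handling is approximate, not accounting):

```javascript
// Rough unit economics per package: sale price minus Deepgram cost and Stripe fee.
const DEEPGRAM_CENTS_PER_SECOND = 0.01; // $0.0001/sec = 0.01 cents/sec

function packageMarginCents({ seconds, price_cents }) {
  const deepgramCost = seconds * DEEPGRAM_CENTS_PER_SECOND;
  const stripeFee = price_cents * 0.029 + 30; // ~2.9% + 30 cents
  return Math.round(price_cents - deepgramCost - stripeFee);
}

// pack_500:  300 - 300  - 38.7 = about -39 cents  (break-even before fees)
// pack_3000: 1200 - 1800 - 64.8 = about -665 cents (well below cost)
```

Running the three packages through this shows every one is negative, which is why the prices need adjusting upward.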

Tasks

  1. Stripe customer creation

    • On user registration, create a Stripe customer and store stripe_customer_id
    • Do this asynchronously (don't block registration response)
  2. Billing routes (src/routes/billing.js)

    • GET /billing/packages — return credit package list (no auth required)
    • POST /billing/checkout — requires auth, accepts { package_id }, creates a Stripe Checkout Session with dynamic payment methods (omit payment_method_types entirely), sets metadata: { user_id, package_id } on the session itself, and returns { checkout_url }. Use session-level metadata, not payment_intent_data.metadata: the latter lands on the PaymentIntent and does not appear on the checkout.session.completed event the webhook reads.
    • GET /billing/balance — requires auth, returns { seconds_remaining, minutes_remaining }
  3. Stripe webhook (src/webhooks/stripe.js)

    • Mount at POST /webhooks/stripe with raw body (use express.raw() for this route only)
    • Verify signature with stripe.webhooks.constructEvent()
    • Handle checkout.session.completed:
      • Extract user_id and package_id from metadata
      • Add seconds to credit_balance
      • Insert row into usage_ledger with description 'credit_purchase'
    • Handle payment_intent.payment_failed: log it (no action needed for prepaid)
  4. Success/cancel pages

    • Stripe Checkout redirects to GET /billing/success?session_id=... and /billing/cancel
    • These can be simple HTML responses or redirects to the web dashboard
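The fulfillment step from task 3 can be kept separate from the Express and Stripe plumbing, which makes it testable without network access; the db helper names here are assumptions:

```javascript
// src/webhooks/stripe.js -- fulfillment sketch, decoupled from transport.

// Pure-ish fulfillment: grant credits and append an audit row.
async function fulfillCheckout(session, db) {
  const { user_id, package_id } = session.metadata || {};
  if (!user_id || !package_id) throw new Error('missing metadata');
  const pkg = db.packages.find((p) => p.id === package_id);
  if (!pkg) throw new Error(`unknown package: ${package_id}`);
  await db.addCredits(user_id, pkg.seconds);                      // credit_balance += seconds
  await db.recordLedger(user_id, pkg.seconds, 'credit_purchase'); // usage_ledger row
  return pkg.seconds;
}

// Hypothetical Express wiring -- note the raw body for signature verification:
// router.post('/webhooks/stripe', express.raw({ type: 'application/json' }),
//   async (req, res) => {
//     const event = stripe.webhooks.constructEvent(
//       req.body, req.headers['stripe-signature'], config.stripeWebhookSecret);
//     if (event.type === 'checkout.session.completed') {
//       await fulfillCheckout(event.data.object, db);
//     }
//     res.json({ received: true });
//   });

module.exports = { fulfillCheckout };
```

Keeping fulfillment pure also makes it reusable for future manual credit adjustments.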

Phase 3 — WebSocket Proxy (Core Feature)

This is the most critical component. The proxy sits between the desktop client and Deepgram, forwarding audio while tracking usage in real time.

Connection Flow

Client connects → validate JWT → check credit balance → open Deepgram upstream
     ↓
Audio chunks arrive → forward to Deepgram → record usage (flushed to DB every 10 seconds)
     ↓
Transcription arrives from Deepgram → forward to client
     ↓
Client disconnects (or credits exhausted) → close upstream → finalize session

WebSocket Protocol

Client connects to: wss://your-domain/ws/transcribe

Client sends as first message (JSON):

{
  "type": "auth",
  "token": "<JWT>",
  "config": {
    "model": "nova-2",
    "language": "en-US",
    "interim_results": true,
    "endpointing": 300
  }
}

After auth success, client sends: raw audio binary frames (PCM 16kHz mono)

Server sends to client:

{ "type": "ready" }
{ "type": "transcript", "text": "...", "is_final": true, "confidence": 0.98 }
{ "type": "error", "code": "insufficient_credits", "message": "..." }
{ "type": "credits_low", "seconds_remaining": 300 }
{ "type": "session_end", "seconds_used": 120 }
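The message shapes above can be captured in two small client-side helpers (function names are illustrative, not part of the protocol):

```javascript
// Build the first (and only) JSON message the client sends before raw audio.
function buildAuthMessage(token, config = {}) {
  return JSON.stringify({
    type: 'auth',
    token,
    config: {
      model: config.model ?? 'nova-2',
      language: config.language ?? 'en-US',
      interim_results: config.interim_results ?? true,
      endpointing: config.endpointing ?? 300,
    },
  });
}

// Dispatch one server-to-client message to the matching handler.
function handleServerMessage(raw, handlers) {
  const msg = JSON.parse(raw);
  switch (msg.type) {
    case 'ready':       handlers.onReady?.(); break;
    case 'transcript':  handlers.onTranscript?.(msg.text, msg.is_final, msg.confidence); break;
    case 'credits_low': handlers.onCreditsLow?.(msg.seconds_remaining); break;
    case 'error':       handlers.onError?.(msg.code, msg.message); break;
    case 'session_end': handlers.onSessionEnd?.(msg.seconds_used); break;
    default:            break; // ignore unknown types for forward compatibility
  }
  return msg;
}
```

Ignoring unknown message types lets the server add new notifications later without breaking older clients.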

Tasks (src/websocket/proxy.js)

  1. Upgrade handler

    • Attach to the HTTP server using ws.Server({ noServer: true })
    • In server.on('upgrade', ...), route /ws/transcribe to this handler
  2. Auth handshake

    • First message must be { type: 'auth', token: '...' } — received within 5 seconds or connection is terminated
    • Verify JWT, load user's credit balance from DB
    • If balance is 0 or negative, send insufficient_credits error and close
  3. Deepgram upstream connection

    • Open a WebSocket to Deepgram's streaming API: wss://api.deepgram.com/v1/listen?model=nova-2&language=en-US&interim_results=true
    • Auth header: Authorization: Token <DEEPGRAM_API_KEY>
    • Use query params from client's config object (whitelist allowed params)
  4. Audio forwarding

    • All binary messages from client → forward directly to Deepgram upstream
    • All messages from Deepgram → parse JSON, reformat, forward to client
  5. Usage tracking

    • Create a transcription_sessions row on connection
    • Maintain an in-memory secondsUsed counter per connection
    • Deepgram sends { type: 'Results', duration: X } in responses — use this for accurate second counting
    • Every 10 seconds (or on disconnect), write current secondsUsed to DB:
      • Update transcription_sessions.seconds_used
      • Decrement credit_balance.seconds_remaining
      • Insert into usage_ledger
    • If seconds_remaining hits 0: send insufficient_credits, close connection
  6. Cleanup on disconnect

    • Mark session as completed, set ended_at
    • Do final usage flush to DB
    • Close Deepgram upstream if still open
  7. Error handling

    • If Deepgram upstream closes unexpectedly, notify client and close
    • If client sends malformed data, log and continue (don't crash)
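One way to structure the usage tracking from step 5; the class name and the injected flushFn callback are assumptions, not a fixed API:

```javascript
// Per-connection usage tracker: accumulate Deepgram-reported durations in
// memory, flush to the DB on an interval and again on disconnect.
class UsageTracker {
  constructor({ flushSeconds = 10, flushFn, onExhausted }) {
    this.secondsUsed = 0;   // session total
    this.unflushed = 0;     // seconds not yet written to DB
    this.flushFn = flushFn; // async (seconds) => remaining balance after decrement
    this.onExhausted = onExhausted;
    this.timer = setInterval(() => { this.flush().catch(() => {}); },
                             flushSeconds * 1000);
  }

  // Called per Deepgram Results message with its audio duration in seconds.
  addDuration(seconds) {
    this.secondsUsed += seconds;
    this.unflushed += seconds;
  }

  // Persist pending seconds: update session row, decrement balance, append ledger.
  async flush() {
    const pending = Math.round(this.unflushed);
    if (pending <= 0) return;
    this.unflushed -= pending;
    const remaining = await this.flushFn(pending);
    if (remaining <= 0) this.onExhausted?.(); // caller closes both sockets
  }

  // Cleanup on disconnect: stop the timer, do the final flush.
  async end() {
    clearInterval(this.timer);
    await this.flush();
  }
}
```

Handing flushFn in from outside keeps the tracker free of SQL, so the DB write (session update, balance decrement, ledger insert) can live in one transaction elsewhere.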

Phase 4 — Account Routes & Rate Limiting

Tasks

  1. Account routes (src/routes/account.js)

    • GET /account/me — returns { email, credits: { seconds_remaining, minutes_remaining }, created_at }
    • GET /account/usage — returns last 30 days of usage_ledger entries grouped by day, plus list of last 10 sessions with duration
  2. Rate limiting (src/middleware/rateLimit.js)

    • Use in-memory rate limiting (no Redis needed at this scale)
    • Auth endpoints: max 10 requests per minute per IP
    • WebSocket connections: max 2 concurrent connections per user (store active connections in a Map<userId, Set<ws>>)
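Both limits fit in a few lines of in-memory state; a sketch using the window and cap values from the bullets above:

```javascript
// Fixed-window IP rate limiter for auth endpoints: max N hits per window.
function makeRateLimiter({ max = 10, windowMs = 60_000, now = Date.now } = {}) {
  const hits = new Map(); // ip -> { count, windowStart }
  return (ip) => {
    const t = now();
    const entry = hits.get(ip);
    if (!entry || t - entry.windowStart >= windowMs) {
      hits.set(ip, { count: 1, windowStart: t });
      return true; // allowed
    }
    entry.count += 1;
    return entry.count <= max;
  };
}

// Concurrent WebSocket cap: at most 2 live connections per user.
const activeConnections = new Map(); // userId -> Set<ws>
function tryRegister(userId, ws, maxConcurrent = 2) {
  const set = activeConnections.get(userId) ?? new Set();
  if (set.size >= maxConcurrent) return false;
  set.add(ws);
  activeConnections.set(userId, set);
  ws.on?.('close', () => set.delete(ws)); // optional chaining for test doubles
  return true;
}
```

Injecting `now` makes the limiter deterministic under test; in production the defaults apply.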

Phase 5 — Web Dashboard

A simple, functional HTML/CSS/JS dashboard. No framework — vanilla JS is fine. This is a developer-friendly streamer tool, not a consumer SaaS, so clean and functional beats flashy.

Pages

/ (Landing / Login)

  • Brief product description (what this is, why it exists)
  • Login form and link to register
  • Link to GitHub/Gitea repo

/dashboard (Post-login)

  • Current credit balance (minutes remaining, prominently displayed)
  • "Buy Credits" section showing the three packages with Stripe Checkout buttons
  • Usage chart: last 30 days bar chart (vanilla canvas or a small CDN chart lib)
  • Recent sessions table: date, duration, status

/register

  • Registration form

Implementation Notes

  • Store JWT in localStorage, attach as Authorization header on API calls
  • Redirect to / if JWT missing or expired
  • Keep CSS minimal but readable — this is a utility dashboard
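The expiry check can be done client-side by decoding the JWT payload; no signature verification is needed just to read exp (helper name is illustrative; atob is available in browsers and Node 16+):

```javascript
// Decode a JWT payload (base64url) without verifying -- enough to decide
// whether to bounce the user to the login page before an API call 401s.
function isTokenExpired(jwt, nowSeconds = Math.floor(Date.now() / 1000)) {
  try {
    const payloadB64 = jwt.split('.')[1]
      .replace(/-/g, '+').replace(/_/g, '/'); // base64url -> base64
    const payload = JSON.parse(atob(payloadB64));
    return typeof payload.exp !== 'number' || payload.exp <= nowSeconds;
  } catch {
    return true; // missing or malformed token counts as expired
  }
}
```

The server still verifies the signature on every request; this is purely a UX shortcut so the dashboard redirects instead of showing failed calls.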

Phase 6 — Desktop App Integration

Changes needed in the local-transcription Python repo.

New file: client/remote_transcription.py

This module replaces transcription_engine_realtime.py when remote mode is active.

# Pseudocode / spec for Claude Code to implement

class RemoteTranscriptionEngine:
    """
    Connects to the transcription proxy WebSocket and streams audio.
    Provides the same callback interface as the local engine so the
    rest of the app doesn't need to change.
    """

    def __init__(self, config, on_transcript_callback):
        # config contains: server_url, auth_token (or byok_api_key), model
        ...

    def start(self):
        # Open WebSocket connection
        # Send auth message
        # Start audio capture thread (reuse existing audio_capture.py)
        ...

    def stop(self):
        # Close WebSocket gracefully
        ...

    def _on_audio_chunk(self, audio_data):
        # Called by audio_capture.py with raw PCM data
        # Send as binary WebSocket frame
        ...

    def _on_server_message(self, message):
        # Parse JSON from server
        # On type='transcript': call on_transcript_callback
        # On type='credits_low': trigger UI warning
        # On type='error': surface to user
        ...

BYOK Mode

When user provides their own Deepgram key, connect directly to Deepgram instead of the proxy:

  • Endpoint: wss://api.deepgram.com/v1/listen?...
  • Auth: Authorization: Token <user_key>
  • No session tracking (Deepgram handles billing directly to the user)
  • Same RemoteTranscriptionEngine class, just different URL and auth header
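Whether the upstream is reached through the proxy or directly in BYOK mode, the Deepgram URL is assembled the same way. A sketch of the parameter whitelisting mentioned in Phase 3, in the service's JavaScript (the Python client would mirror it; the exact whitelist contents are an assumption):

```javascript
// Build the Deepgram streaming URL from a client config, passing through
// only known-safe query parameters.
const ALLOWED_PARAMS = ['model', 'language', 'interim_results', 'endpointing',
                        'punctuate', 'smart_format']; // extend deliberately

function buildDeepgramUrl(config = {}) {
  const url = new URL('wss://api.deepgram.com/v1/listen');
  for (const key of ALLOWED_PARAMS) {
    if (config[key] !== undefined) url.searchParams.set(key, String(config[key]));
  }
  if (!url.searchParams.has('model')) url.searchParams.set('model', 'nova-2');
  return url.toString();
}
```

Whitelisting matters most in managed mode: the proxy's Deepgram key pays for whatever parameters a client manages to smuggle in, so only vetted ones should pass through.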

Settings Changes (gui/settings_dialog_qt.py)

Add a new "Transcription Mode" section:

Transcription Mode:
  ○ Local (Whisper)          [existing behavior]
  ○ Remote - Managed         [requires login]
  ○ Remote - BYOK            [requires Deepgram API key]

[If Managed selected]:
  Server URL: [____________]
  [Login / Register]  [View Balance: 420 min remaining]

[If BYOK selected]:
  Deepgram API Key: [____________]
  Model: [nova-2 ▼]

Config additions (config/default_config.yaml)

remote:
  mode: local           # local | managed | byok
  server_url: ""        # proxy server URL for managed mode
  auth_token: ""        # JWT stored after login
  byok_api_key: ""      # Deepgram key for BYOK mode
  deepgram_model: nova-2
  language: en-US

Build & Deployment Notes

Docker Compose (local dev)

version: '3.8'
services:
  db:
    image: postgres:15
    environment:
      POSTGRES_DB: transcription_proxy
      POSTGRES_USER: user
      POSTGRES_PASSWORD: password
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data

  app:
    build: .
    ports:
      - "3000:3000"
    environment:
      DATABASE_URL: postgresql://user:password@db:5432/transcription_proxy
    depends_on:
      - db
    volumes:
      - .:/app
      - /app/node_modules

volumes:
  pgdata:

Production Deployment

This service is a good fit for deployment on AnHonestHost WHP as a containerized app, or on a small DigitalOcean/Linode VPS. Requirements are light:

  • 512MB RAM is sufficient
  • Postgres can be the same instance as other services or managed (e.g., Supabase free tier)
  • Needs a public domain with SSL for WebSocket (wss://) to work from desktop clients

Reverse proxy config (Nginx or HAProxy) should:

  • Proxy HTTP → localhost:3000
  • Pass Upgrade and Connection headers for WebSocket support
  • Set proxy_read_timeout 3600 (sessions can be long)
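Assuming Nginx, the relevant server block looks roughly like this (domain and certificate paths are placeholders):

```nginx
server {
    listen 443 ssl;
    server_name proxy.example.com;  # placeholder domain
    ssl_certificate     /etc/letsencrypt/live/proxy.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/proxy.example.com/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_http_version 1.1;                  # required for WebSocket
        proxy_set_header Upgrade $http_upgrade;  # pass the upgrade handshake
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_read_timeout 3600;                 # long transcription sessions
    }
}
```

Without the Upgrade/Connection headers the WebSocket handshake silently fails with a plain HTTP response, which desktop clients surface as an immediate disconnect.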

Implementation Order

Build and test in this sequence:

  1. Project scaffold + DB connection + migrations
  2. Auth (register/login/JWT) — test with curl
  3. Stripe billing + webhook — test with Stripe CLI (stripe listen)
  4. WebSocket proxy — test with a simple browser WebSocket client first
  5. Usage tracking and credit decrement
  6. Account/usage routes
  7. Web dashboard
  8. Desktop app integration (separate PR in local-transcription repo)

Key Decisions & Rationale

Decision           Choice              Reason
Credits model      Prepaid             No surprise charges, simpler billing, better for irregular streamer usage
WebSocket library  ws                  Lightweight, no abstraction overhead, plays well with raw binary audio
Auth               JWT (stateless)     Desktop app holds token locally; no session store needed
DB driver          node-postgres (pg)  No ORM overhead; schema is simple enough for raw SQL
Migrations         Raw SQL files       No dependency on Knex/Prisma; easy to inspect and reason about
Rate limiting      In-memory           Redis is overkill for this scale; single-process Node is fine initially
Frontend           Vanilla JS          Dashboard is simple utility UI; no framework justified

What This Plan Does NOT Cover (Future Work)

  • OAuth / social login
  • Admin panel for managing users
  • Refund / credit adjustment tooling
  • Email verification
  • Password reset flow
  • Multi-language support beyond Deepgram's defaults
  • Analytics / aggregated usage reporting
  • Self-hosted Whisper inference as a third backend option