# Deepgram Proxy Service: Build Plan

## Project Overview

Build a standalone hosted service that acts as a Deepgram proxy for the Local Transcription desktop app. Users can either provide their own Deepgram API key (BYOK) or use the managed service with prepaid credits purchased via Stripe.

This is a **separate repository** from `local-transcription`. The desktop app will be updated in a second phase to support both modes.

---

## Repository Structure

```
transcription-proxy/
├── src/
│   ├── server.js              # Express app entry point
│   ├── config.js              # Environment config loader
│   ├── db/
│   │   ├── index.js           # node-postgres pool setup
│   │   └── migrations/        # SQL migration files (numbered)
│   │       ├── 001_users.sql
│   │       ├── 002_credits.sql
│   │       ├── 003_sessions.sql
│   │       └── 004_usage_ledger.sql
│   ├── middleware/
│   │   ├── auth.js            # JWT verification middleware
│   │   └── rateLimit.js       # Per-user rate limiting
│   ├── routes/
│   │   ├── auth.js            # POST /auth/register, /auth/login, /auth/refresh
│   │   ├── billing.js         # POST /billing/checkout, GET /billing/balance
│   │   └── account.js         # GET /account/me, GET /account/usage
│   ├── websocket/
│   │   └── proxy.js           # WebSocket proxy handler (core feature)
│   └── webhooks/
│       └── stripe.js          # POST /webhooks/stripe
├── web/                       # Simple frontend dashboard
│   ├── index.html             # Landing / login page
│   ├── dashboard.html         # Balance, usage history, buy credits
│   └── assets/
│       ├── app.js
│       └── style.css
├── .env.example
├── package.json
├── docker-compose.yml         # Postgres + app for local dev
└── CLAUDE.md                  # This file (after renaming)
```

---

## Technology Stack

- **Runtime**: Node.js 20+
- **Framework**: Express 4
- **WebSocket**: `ws` library (not socket.io, to keep it lean)
- **Database**: PostgreSQL 15+ via `pg` (node-postgres)
- **Auth**: JWT via `jsonwebtoken`, passwords hashed with `bcrypt`
- **Payments**: Stripe Node SDK (`stripe`)
- **Environment**: `dotenv`
- **Dev tooling**: `nodemon` for dev, no TypeScript (keep it simple)

---

## Database Schema

Run migrations in order.
Use a simple `schema_migrations` table to track applied migrations.

### 001_users.sql

```sql
CREATE TABLE schema_migrations (
  version INTEGER PRIMARY KEY,
  applied_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE users (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  email TEXT UNIQUE NOT NULL,
  password_hash TEXT NOT NULL,
  stripe_customer_id TEXT UNIQUE,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);
```

### 002_credits.sql

```sql
CREATE TABLE credit_balance (
  user_id UUID PRIMARY KEY REFERENCES users(id) ON DELETE CASCADE,
  seconds_remaining INTEGER NOT NULL DEFAULT 0,
  updated_at TIMESTAMPTZ DEFAULT NOW()
);
```

### 003_sessions.sql

```sql
CREATE TABLE transcription_sessions (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL REFERENCES users(id),
  mode TEXT NOT NULL CHECK (mode IN ('managed', 'byok')),
  started_at TIMESTAMPTZ DEFAULT NOW(),
  ended_at TIMESTAMPTZ,
  seconds_used INTEGER NOT NULL DEFAULT 0,
  deepgram_model TEXT,
  status TEXT NOT NULL DEFAULT 'active' CHECK (status IN ('active', 'completed', 'terminated'))
);

CREATE INDEX idx_sessions_user_id ON transcription_sessions(user_id);
CREATE INDEX idx_sessions_started_at ON transcription_sessions(started_at);
```

### 004_usage_ledger.sql

```sql
CREATE TABLE usage_ledger (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL REFERENCES users(id),
  session_id UUID REFERENCES transcription_sessions(id),
  recorded_at TIMESTAMPTZ DEFAULT NOW(),
  seconds INTEGER NOT NULL,
  description TEXT  -- e.g. 'session_usage', 'credit_purchase', 'manual_adjustment'
);

CREATE INDEX idx_ledger_user_id ON usage_ledger(user_id);
```

---

## Environment Variables (.env.example)

```env
# Server
PORT=3000
NODE_ENV=development

# Database
DATABASE_URL=postgresql://user:password@localhost:5432/transcription_proxy

# Auth
JWT_SECRET=changeme_use_long_random_string
JWT_EXPIRY=7d

# Stripe
STRIPE_SECRET_KEY=sk_test_...
STRIPE_WEBHOOK_SECRET=whsec_...

# Deepgram
DEEPGRAM_API_KEY=your_deepgram_key_here

# Pricing (seconds per dollar; adjust for your margin)
# Default: 1000 seconds per $1 ($0.06/min), which covers the ~$0.006/min Deepgram cost plus margin
CREDITS_PER_DOLLAR=1000
```

---

## Phase 1: Core Server & Auth

### Goals

- Working Express app with Postgres connection
- Migration runner
- User registration and login
- JWT middleware

### Tasks

1. **Scaffold project**
   - `npm init`, install dependencies: `express ws pg jsonwebtoken bcrypt stripe dotenv`
   - Dev dependencies: `nodemon`
   - Add `start` and `dev` scripts to package.json

2. **Database connection** (`src/db/index.js`)
   - Export a `pg.Pool` instance using `DATABASE_URL`
   - Export a `migrate()` function that reads `src/db/migrations/*.sql` in order, checks the `schema_migrations` table, and applies unapplied ones
   - Call `migrate()` on server startup before listening

3. **Auth routes** (`src/routes/auth.js`)
   - `POST /auth/register`: validate email/password, hash password with bcrypt (cost 12), insert user, insert empty `credit_balance` row, return JWT
   - `POST /auth/login`: verify credentials, return JWT + refresh token
   - `POST /auth/refresh`: validate refresh token, return new JWT
   - Passwords: minimum 8 characters, validate email format

4. **JWT middleware** (`src/middleware/auth.js`)
   - Verify `Authorization: Bearer <token>` header
   - Attach `req.user = { id, email }` on success
   - Return 401 on failure
   - Export as `requireAuth` middleware

5. **Basic health check**
   - `GET /health` returns `{ status: 'ok', db: 'connected' }`

---

## Phase 2: Billing & Credits

### Goals

- Stripe Checkout session creation for credit purchases
- Webhook handler to fulfill purchases
- Balance endpoint

### Payment Methods

Use **Stripe Dynamic Payment Methods**: do NOT hardcode `payment_method_types` in the Checkout Session. Instead, leave it unset and manage everything from the Stripe Dashboard.
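
A minimal sketch of what leaving `payment_method_types` unset looks like in practice. `buildCheckoutParams` is a hypothetical helper, not part of the Stripe SDK; the USD currency, the inline `price_data` line item, and the `baseUrl` argument are assumptions. The package shape follows `CREDIT_PACKAGES` in `src/config.js`. It only builds the object that would be handed to `stripe.checkout.sessions.create()`.

```javascript
// Hypothetical helper: builds Checkout Session params for a one-time credit purchase.
// Assumptions: USD pricing, inline price_data, and a baseUrl supplied by the caller.
function buildCheckoutParams({ user, pkg, baseUrl }) {
  const metadata = { user_id: user.id, package_id: pkg.id };
  return {
    mode: 'payment',
    // Attach the Stripe customer only if one has been created for this user.
    ...(user.stripe_customer_id ? { customer: user.stripe_customer_id } : {}),
    line_items: [
      {
        quantity: 1,
        price_data: {
          currency: 'usd',
          unit_amount: pkg.price_cents,
          product_data: { name: `Transcription credits: ${pkg.label}` },
        },
      },
    ],
    // Deliberately no payment_method_types key: Stripe then offers whatever
    // methods are enabled in the Dashboard.
    metadata, // present on the session in checkout.session.completed
    payment_intent_data: { metadata }, // also copied onto the PaymentIntent
    success_url: `${baseUrl}/billing/success?session_id={CHECKOUT_SESSION_ID}`,
    cancel_url: `${baseUrl}/billing/cancel`,
  };
}

// Example input values (illustrative only).
const params = buildCheckoutParams({
  user: { id: 'user-123', stripe_customer_id: 'cus_example' },
  pkg: { id: 'pack_500', label: '500 minutes', seconds: 30000, price_cents: 300 },
  baseUrl: 'https://example.com',
});
```

In the route handler, the result would be passed to `stripe.checkout.sessions.create(params)`, and the returned session's `url` sent back as `checkout_url`.
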

Enable the following in the Stripe Dashboard under Settings → Payment Methods:

- **Cards** (Visa, Mastercard, Amex, Discover): on by default
- **PayPal**: enable manually
- **Apple Pay**: on by default, shows automatically on Safari/iOS
- **Google Pay**: enable manually (one toggle)
- **Cash App Pay**: enable manually (popular with streaming audiences)
- **Link**: Stripe's saved payment network, on by default

Stripe will automatically show the most relevant methods to each user based on their location and device. No code changes are needed to add or remove methods in future; it's all dashboard config.

### Credit Packages

Define these as constants in `src/config.js`:

```javascript
CREDIT_PACKAGES: [
  { id: 'pack_500',  label: '500 minutes',  seconds: 30000,  price_cents: 300 },
  { id: 'pack_1200', label: '1200 minutes', seconds: 72000,  price_cents: 600 },
  { id: 'pack_3000', label: '3000 minutes', seconds: 180000, price_cents: 1200 },
]
```

Adjust pricing to cover Deepgram costs ($0.006/min = $0.0001/sec) plus margin and Stripe fees (~2.9% + $0.30). As listed, the packages work out to $0.006, $0.005, and $0.004 per minute, which is at or below the Deepgram cost and does not match `CREDITS_PER_DOLLAR=1000` ($0.06/min), so treat these prices as placeholders.

### Tasks

1. **Stripe customer creation**
   - On user registration, create a Stripe customer and store `stripe_customer_id`
   - Do this asynchronously (don't block the registration response)

2. **Billing routes** (`src/routes/billing.js`)
   - `GET /billing/packages`: return credit package list (no auth required)
   - `POST /billing/checkout`: requires auth, accepts `{ package_id }`, creates a Stripe Checkout Session using dynamic payment methods (do NOT pass `payment_method_types`; omitting it enables dynamic methods automatically), sets `metadata` containing `user_id` and `package_id` on the session itself (the `checkout.session.completed` event carries the session's `metadata`, not `payment_intent_data.metadata`), returns `{ checkout_url }`
   - `GET /billing/balance`: requires auth, returns `{ seconds_remaining, minutes_remaining }`

3. **Stripe webhook** (`src/webhooks/stripe.js`)
   - Mount at `POST /webhooks/stripe` with raw body (use `express.raw()` for this route only)
   - Verify signature with `stripe.webhooks.constructEvent()`
   - Handle `checkout.session.completed`:
     - Extract `user_id` and `package_id` from the session metadata
     - Add seconds to `credit_balance`
     - Insert row into `usage_ledger` with description `'credit_purchase'`
   - Handle `payment_intent.payment_failed`: log it (no action needed for prepaid)

4. **Success/cancel pages**
   - Stripe Checkout redirects to `GET /billing/success?session_id=...` and `/billing/cancel`
   - These can be simple HTML responses or redirects to the web dashboard

---

## Phase 3: WebSocket Proxy (Core Feature)

This is the most critical component. The proxy sits between the desktop client and Deepgram, forwarding audio while tracking usage in real time.

### Connection Flow

```
Client connects → validate JWT → check credit balance → open Deepgram upstream
        ↓
Audio chunks arrive → forward to Deepgram → record usage every 10 seconds
        ↓
Transcription arrives from Deepgram → forward to client
        ↓
Client disconnects (or credits exhausted) → close upstream → finalize session
```

### WebSocket Protocol

**Client connects to**: `wss://your-domain/ws/transcribe`

**Client sends as first message** (JSON):

```json
{
  "type": "auth",
  "token": "<jwt>",
  "config": {
    "model": "nova-2",
    "language": "en-US",
    "interim_results": true,
    "endpointing": 300
  }
}
```

**After auth success, client sends**: raw audio binary frames (PCM 16kHz mono)

**Server sends to client**:

```json
{ "type": "ready" }
{ "type": "transcript", "text": "...", "is_final": true, "confidence": 0.98 }
{ "type": "error", "code": "insufficient_credits", "message": "..." }
{ "type": "credits_low", "seconds_remaining": 300 }
{ "type": "session_end", "seconds_used": 120 }
```

### Tasks (`src/websocket/proxy.js`)

1. **Upgrade handler**
   - Attach to the HTTP server using `ws.Server({ noServer: true })`
   - In `server.on('upgrade', ...)`, route `/ws/transcribe` to this handler

2. **Auth handshake**
   - First message must be `{ type: 'auth', token: '...' }`, received within 5 seconds or the connection is terminated
   - Verify JWT, load user's credit balance from DB
   - If balance is 0 or negative, send `insufficient_credits` error and close

3. **Deepgram upstream connection**
   - Open a WebSocket to Deepgram's streaming API: `wss://api.deepgram.com/v1/listen?model=nova-2&language=en-US&interim_results=true`
   - Auth header: `Authorization: Token <DEEPGRAM_API_KEY>`
   - Use query params from client's `config` object (whitelist allowed params)

4. **Audio forwarding**
   - All binary messages from client → forward directly to Deepgram upstream
   - All messages from Deepgram → parse JSON, reformat, forward to client

5. **Usage tracking**
   - Create a `transcription_sessions` row on connection
   - Maintain an in-memory `secondsUsed` counter per connection
   - Deepgram sends `{ type: 'Results', duration: X }` in responses; use this for accurate second counting (count only results with `is_final: true`, since interim results cover the same audio again)
   - Every 10 seconds (or on disconnect), write current `secondsUsed` to DB:
     - Update `transcription_sessions.seconds_used`
     - Decrement `credit_balance.seconds_remaining`
     - Insert into `usage_ledger`
   - If `seconds_remaining` hits 0: send `insufficient_credits`, close connection

6. **Cleanup on disconnect**
   - Mark session as `completed`, set `ended_at`
   - Do final usage flush to DB
   - Close Deepgram upstream if still open

7. **Error handling**
   - If Deepgram upstream closes unexpectedly, notify client and close
   - If client sends malformed data, log and continue (don't crash)

---

## Phase 4: Account Routes & Rate Limiting

### Tasks

1. **Account routes** (`src/routes/account.js`)
   - `GET /account/me`: returns `{ email, credits: { seconds_remaining, minutes_remaining }, created_at }`
   - `GET /account/usage`: returns last 30 days of `usage_ledger` entries grouped by day, plus list of last 10 sessions with duration

2. **Rate limiting** (`src/middleware/rateLimit.js`)
   - Use in-memory rate limiting (no Redis needed at this scale)
   - Auth endpoints: max 10 requests per minute per IP
   - WebSocket connections: max 2 concurrent connections per user (store active connections in a `Map` from user ID to that user's set of open connections)

---

## Phase 5: Web Dashboard

A simple, functional HTML/CSS/JS dashboard. No framework; vanilla JS is fine. This is a developer-friendly streamer tool, not a consumer SaaS, so clean and functional beats flashy.

### Pages

**`/` (Landing / Login)**

- Brief product description (what this is, why it exists)
- Login form and link to register
- Link to GitHub/Gitea repo

**`/dashboard` (Post-login)**

- Current credit balance (minutes remaining, prominently displayed)
- "Buy Credits" section showing the three packages with Stripe Checkout buttons
- Usage chart: last 30 days bar chart (vanilla canvas or a small CDN chart lib)
- Recent sessions table: date, duration, status

**`/register`**

- Registration form

### Implementation Notes

- Store JWT in `localStorage`, attach as `Authorization` header on API calls
- Redirect to `/` if JWT missing or expired
- Keep CSS minimal but readable; this is a utility dashboard

---

## Phase 6: Desktop App Integration

Changes needed in the `local-transcription` Python repo.

### New file: `client/remote_transcription.py`

This module replaces `transcription_engine_realtime.py` when remote mode is active.

```python
# Pseudocode / spec for Claude Code to implement

class RemoteTranscriptionEngine:
    """
    Connects to the transcription proxy WebSocket and streams audio.
    Provides the same callback interface as the local engine so the
    rest of the app doesn't need to change.
    """

    def __init__(self, config, on_transcript_callback):
        # config contains: server_url, auth_token (or byok_api_key), model
        ...

    def start(self):
        # Open WebSocket connection
        # Send auth message
        # Start audio capture thread (reuse existing audio_capture.py)
        ...

    def stop(self):
        # Close WebSocket gracefully
        ...

    def _on_audio_chunk(self, audio_data):
        # Called by audio_capture.py with raw PCM data
        # Send as binary WebSocket frame
        ...

    def _on_server_message(self, message):
        # Parse JSON from server
        # On type='transcript': call on_transcript_callback
        # On type='credits_low': trigger UI warning
        # On type='error': surface to user
        ...
```

### BYOK Mode

When the user provides their own Deepgram key, connect directly to Deepgram instead of the proxy:

- Endpoint: `wss://api.deepgram.com/v1/listen?...`
- Auth: `Authorization: Token <byok_api_key>`
- No session tracking (Deepgram handles billing directly to the user)
- Same `RemoteTranscriptionEngine` class, just different URL and auth header

### Settings Changes (`gui/settings_dialog_qt.py`)

Add a new "Transcription Mode" section:

```
Transcription Mode:
  ○ Local (Whisper)      [existing behavior]
  ○ Remote - Managed     [requires login]
  ○ Remote - BYOK        [requires Deepgram API key]

[If Managed selected]:
  Server URL: [____________]
  [Login / Register]  [View Balance: 420 min remaining]

[If BYOK selected]:
  Deepgram API Key: [____________]
  Model: [nova-2 ▼]
```

### Config additions (`config/default_config.yaml`)

```yaml
remote:
  mode: local            # local | managed | byok
  server_url: ""         # proxy server URL for managed mode
  auth_token: ""         # JWT stored after login
  byok_api_key: ""       # Deepgram key for BYOK mode
  deepgram_model: nova-2
  language: en-US
```

---

## Build & Deployment Notes

### Docker Compose (local dev)

```yaml
version: '3.8'
services:
  db:
    image: postgres:15
    environment:
      POSTGRES_DB: transcription_proxy
      POSTGRES_USER: user
      POSTGRES_PASSWORD: password
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data

  app:
    build: .
    ports:
      - "3000:3000"
    environment:
      DATABASE_URL: postgresql://user:password@db:5432/transcription_proxy
    depends_on:
      - db
    volumes:
      - .:/app
      - /app/node_modules

volumes:
  pgdata:
```

### Production Deployment

This service is a good fit for deployment on AnHonestHost WHP as a containerized app, or on a small DigitalOcean/Linode VPS. Requirements are light:

- 512MB RAM is sufficient
- Postgres can be the same instance as other services or managed (e.g., Supabase free tier)
- Needs a public domain with SSL for WebSocket (`wss://`) to work from desktop clients

Reverse proxy config (Nginx or HAProxy) should:

- Proxy HTTP → `localhost:3000`
- Pass `Upgrade` and `Connection` headers for WebSocket support
- Set `proxy_read_timeout 3600` (sessions can be long)

---

## Implementation Order

Build and test in this sequence:

1. Project scaffold + DB connection + migrations
2. Auth (register/login/JWT): test with curl
3. Stripe billing + webhook: test with Stripe CLI (`stripe listen`)
4. WebSocket proxy: test with a simple browser WebSocket client first
5. Usage tracking and credit decrement
6. Account/usage routes
7. Web dashboard
8. Desktop app integration (separate PR in local-transcription repo)

---

## Key Decisions & Rationale

| Decision | Choice | Reason |
|---|---|---|
| Credits model | Prepaid | No surprise charges, simpler billing, better for irregular streamer usage |
| WebSocket library | `ws` | Lightweight, no abstraction overhead, plays well with raw binary audio |
| Auth | JWT (stateless) | Desktop app holds token locally; no session store needed |
| DB driver | `node-postgres` (pg) | No ORM overhead; schema is simple enough for raw SQL |
| Migrations | Raw SQL files | No dependency on Knex/Prisma; easy to inspect and reason about |
| Rate limiting | In-memory | Redis is overkill for this scale; single-process Node is fine initially |
| Frontend | Vanilla JS | Dashboard is simple utility UI; no framework justified |

---

## What This Plan Does NOT Cover (Future Work)

- OAuth / social login
- Admin panel for managing users
- Refund / credit adjustment tooling
- Email verification
- Password reset flow
- Multi-language support beyond Deepgram's defaults
- Analytics / aggregated usage reporting
- Self-hosted Whisper inference as a third backend option
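
The pricing guidance in Phase 2 (Deepgram at $0.006/min, Stripe at roughly 2.9% + $0.30 per charge) can be checked against the listed credit packages with a few lines of arithmetic. The sketch below is illustrative only: `packageMargin` is a hypothetical helper, and the constants are simply the figures quoted in this plan, not live rates.

```javascript
// Figures quoted in Phase 2 of this plan (assumed, not fetched from any API).
const DEEPGRAM_CENTS_PER_MIN = 0.6; // $0.006 per minute
const STRIPE_PCT = 0.029;           // ~2.9% of the charge
const STRIPE_FIXED_CENTS = 30;      // plus $0.30 per charge

// Hypothetical helper: per-package price per minute and net margin in cents.
function packageMargin(pkg) {
  const minutes = pkg.seconds / 60;
  const stripeFeeCents = pkg.price_cents * STRIPE_PCT + STRIPE_FIXED_CENTS;
  const deepgramCostCents = minutes * DEEPGRAM_CENTS_PER_MIN;
  const netCents = pkg.price_cents - stripeFeeCents - deepgramCostCents;
  return {
    id: pkg.id,
    pricePerMinuteCents: pkg.price_cents / minutes,
    netCents: Math.round(netCents * 10) / 10, // rounded to a tenth of a cent
  };
}

// The packages exactly as listed under "Credit Packages".
const PACKAGES = [
  { id: 'pack_500', label: '500 minutes', seconds: 30000, price_cents: 300 },
  { id: 'pack_1200', label: '1200 minutes', seconds: 72000, price_cents: 600 },
  { id: 'pack_3000', label: '3000 minutes', seconds: 180000, price_cents: 1200 },
];

const margins = PACKAGES.map(packageMargin);
console.log(margins);
```

With the figures as listed, `netCents` is negative for all three packages, so the prices need raising (or the minutes lowering) before launch. Re-running the check after each pricing change is a cheap guard against selling credits below cost.
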