# Deepgram Proxy Service — Build Plan

## Project Overview

Build a standalone hosted service that acts as a Deepgram proxy for the Local Transcription
desktop app. Users can either provide their own Deepgram API key (BYOK) or use the managed
service with prepaid credits purchased via Stripe.

This is a **separate repository** from `local-transcription`. The desktop app will be updated
in a second phase to support both modes.

---

## Repository Structure

```
transcription-proxy/
├── src/
│   ├── server.js              # Express app entry point
│   ├── config.js              # Environment config loader
│   ├── db/
│   │   ├── index.js           # node-postgres pool setup
│   │   └── migrations/        # SQL migration files (numbered)
│   │       ├── 001_users.sql
│   │       ├── 002_credits.sql
│   │       ├── 003_sessions.sql
│   │       └── 004_usage_ledger.sql
│   ├── middleware/
│   │   ├── auth.js            # JWT verification middleware
│   │   └── rateLimit.js       # Per-user rate limiting
│   ├── routes/
│   │   ├── auth.js            # POST /auth/register, /auth/login, /auth/refresh
│   │   ├── billing.js         # POST /billing/checkout, GET /billing/balance
│   │   └── account.js         # GET /account/me, GET /account/usage
│   ├── websocket/
│   │   └── proxy.js           # WebSocket proxy handler (core feature)
│   └── webhooks/
│       └── stripe.js          # POST /webhooks/stripe
├── web/                       # Simple frontend dashboard
│   ├── index.html             # Landing / login page
│   ├── dashboard.html         # Balance, usage history, buy credits
│   └── assets/
│       ├── app.js
│       └── style.css
├── .env.example
├── package.json
├── docker-compose.yml         # Postgres + app for local dev
└── CLAUDE.md                  # This file (after renaming)
```

---

## Technology Stack

- **Runtime**: Node.js 20+
- **Framework**: Express 4
- **WebSocket**: `ws` library (not socket.io — keep it lean)
- **Database**: PostgreSQL 15+ via `pg` (node-postgres)
- **Auth**: JWT via `jsonwebtoken`, passwords hashed with `bcrypt`
- **Payments**: Stripe Node SDK (`stripe`)
- **Environment**: `dotenv`
- **Dev tooling**: `nodemon` for dev, no TypeScript (keep it simple)

---

## Database Schema

Run migrations in order. Use a simple `schema_migrations` table to track applied migrations.
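
As a rough illustration of "in order", the sketch below picks the migration files that still need to run. The helper names `migrationVersion` and `pendingMigrations` are hypothetical, and it assumes the version is the numeric filename prefix (`003_sessions.sql` is version 3). Sorting is numeric, so a later `010_...` file would not sort before `002_...`.

```javascript
const path = require('node:path');

// Extract the numeric prefix from names like "003_sessions.sql".
// Returns null for files that do not look like migrations.
function migrationVersion(filename) {
  const match = /^(\d+)_.*\.sql$/.exec(path.basename(filename));
  return match ? Number.parseInt(match[1], 10) : null;
}

// Return the unapplied migration files, sorted by version.
// appliedVersions is a Set of integers read from schema_migrations.
function pendingMigrations(filenames, appliedVersions) {
  return filenames
    .map((name) => ({ name, version: migrationVersion(name) }))
    .filter((m) => m.version !== null && !appliedVersions.has(m.version))
    .sort((a, b) => a.version - b.version)
    .map((m) => m.name);
}

const files = [
  '002_credits.sql',
  '001_users.sql',
  'README.md',
  '004_usage_ledger.sql',
  '003_sessions.sql',
];
console.log(pendingMigrations(files, new Set([1, 2])));
```

Each pending file would then presumably be executed in its own transaction, together with an `INSERT INTO schema_migrations (version) VALUES ($1)`, so a failed migration leaves no partial record.
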

### 001_users.sql
```sql
CREATE TABLE schema_migrations (
  version     INTEGER PRIMARY KEY,
  applied_at  TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE users (
  id                  UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  email               TEXT UNIQUE NOT NULL,
  password_hash       TEXT NOT NULL,
  stripe_customer_id  TEXT UNIQUE,
  created_at          TIMESTAMPTZ DEFAULT NOW(),
  updated_at          TIMESTAMPTZ DEFAULT NOW()
);
```

### 002_credits.sql
```sql
CREATE TABLE credit_balance (
  user_id            UUID PRIMARY KEY REFERENCES users(id) ON DELETE CASCADE,
  seconds_remaining  INTEGER NOT NULL DEFAULT 0,
  updated_at         TIMESTAMPTZ DEFAULT NOW()
);
```

### 003_sessions.sql
```sql
CREATE TABLE transcription_sessions (
  id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id         UUID NOT NULL REFERENCES users(id),
  mode            TEXT NOT NULL CHECK (mode IN ('managed', 'byok')),
  started_at      TIMESTAMPTZ DEFAULT NOW(),
  ended_at        TIMESTAMPTZ,
  seconds_used    INTEGER NOT NULL DEFAULT 0,
  deepgram_model  TEXT,
  status          TEXT NOT NULL DEFAULT 'active' CHECK (status IN ('active', 'completed', 'terminated'))
);

CREATE INDEX idx_sessions_user_id ON transcription_sessions(user_id);
CREATE INDEX idx_sessions_started_at ON transcription_sessions(started_at);
```

### 004_usage_ledger.sql
```sql
CREATE TABLE usage_ledger (
  id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id      UUID NOT NULL REFERENCES users(id),
  session_id   UUID REFERENCES transcription_sessions(id),
  recorded_at  TIMESTAMPTZ DEFAULT NOW(),
  seconds      INTEGER NOT NULL,
  description  TEXT  -- e.g. 'session_usage', 'credit_purchase', 'manual_adjustment'
);

CREATE INDEX idx_ledger_user_id ON usage_ledger(user_id);
```

---

## Environment Variables (.env.example)

```env
# Server
PORT=3000
NODE_ENV=development

# Database
DATABASE_URL=postgresql://user:password@localhost:5432/transcription_proxy

# Auth
JWT_SECRET=changeme_use_long_random_string
JWT_EXPIRY=7d

# Stripe
STRIPE_SECRET_KEY=sk_test_...
STRIPE_WEBHOOK_SECRET=whsec_...

# Deepgram
DEEPGRAM_API_KEY=your_deepgram_key_here

# Pricing (seconds per dollar — adjust for your margin)
# Default: 1000 seconds per $1 ($0.06/min charged), which covers the
# ~$0.006/min Deepgram cost plus margin
CREDITS_PER_DOLLAR=1000
```

---

## Phase 1 — Core Server & Auth

### Goals
- Working Express app with Postgres connection
- Migration runner
- User registration and login
- JWT middleware

### Tasks

1. **Scaffold project**
   - `npm init`, install dependencies: `express ws pg jsonwebtoken bcrypt stripe dotenv`
   - Dev dependencies: `nodemon`
   - Add `start` and `dev` scripts to package.json

2. **Database connection** (`src/db/index.js`)
   - Export a `pg.Pool` instance using `DATABASE_URL`
   - Export a `migrate()` function that reads `src/db/migrations/*.sql` in order,
     checks the `schema_migrations` table, and applies unapplied ones. On a fresh
     database the table does not exist yet (`001_users.sql` creates it), so treat a
     missing table as "nothing applied"
   - Call `migrate()` on server startup before listening

3. **Auth routes** (`src/routes/auth.js`)
   - `POST /auth/register` — validate email/password, hash password with bcrypt (cost 12),
     insert user, insert empty credit_balance row, return JWT
   - `POST /auth/login` — verify credentials, return JWT + refresh token
   - `POST /auth/refresh` — validate refresh token, return new JWT
   - Passwords: minimum 8 characters, validate email format

4. **JWT middleware** (`src/middleware/auth.js`)
   - Verify `Authorization: Bearer <token>` header
   - Attach `req.user = { id, email }` on success
   - Return 401 on failure
   - Export as `requireAuth` middleware

5. **Basic health check**
   - `GET /health` returns `{ status: 'ok', db: 'connected' }`
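
The JWT middleware in task 4 could look roughly like the sketch below. It is written as a factory (`makeRequireAuth`, a hypothetical name) so the verifier can be injected; in the real service the verifier would presumably be `(token) => jwt.verify(token, process.env.JWT_SECRET)` from `jsonwebtoken`, and `requireAuth` would be mounted on protected routes. The `sub` and `email` claim names and the error body shape are assumptions, not something the plan specifies.

```javascript
// Factory so the token verifier can be swapped out (and faked in tests).
function makeRequireAuth(verifyToken) {
  return function requireAuth(req, res, next) {
    const header = req.headers.authorization || '';
    const [scheme, token] = header.split(' ');
    if (scheme !== 'Bearer' || !token) {
      return res.status(401).json({ error: 'missing_token' });
    }
    try {
      // verifyToken is expected to throw on a bad or expired token.
      const payload = verifyToken(token);
      // Assumption: the JWT carries the user id in `sub` and the email in `email`.
      req.user = { id: payload.sub, email: payload.email };
      return next();
    } catch (err) {
      return res.status(401).json({ error: 'invalid_token' });
    }
  };
}
```
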

---

## Phase 2 — Billing & Credits

### Goals
- Stripe Checkout session creation for credit purchases
- Webhook handler to fulfill purchases
- Balance endpoint

### Payment Methods

Use **Stripe Dynamic Payment Methods** — do NOT hardcode `payment_method_types` in the
Checkout Session. Instead, leave it unset and manage everything from the Stripe Dashboard.

Enable the following in the Stripe Dashboard under Settings → Payment Methods:
- **Cards** (Visa, Mastercard, Amex, Discover) — on by default
- **PayPal** — enable manually
- **Apple Pay** — on by default, shows automatically on Safari/iOS
- **Google Pay** — enable manually (one toggle)
- **Cash App Pay** — enable manually (popular with streaming audiences)
- **Link** — Stripe's saved payment network, on by default

Stripe will automatically show the most relevant methods to each user based on their
location and device. No code changes are needed to add or remove methods in future —
it's all dashboard config.
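
A minimal sketch of the parameters a checkout route might pass to `stripe.checkout.sessions.create()`, with no `payment_method_types` key at all. The helper name `buildCheckoutParams`, the `usd` currency, the product name, and the use of inline `price_data` are assumptions for illustration. The metadata is set both on the session and on the PaymentIntent, because the session's own `metadata` is what arrives with a `checkout.session.completed` event.

```javascript
// Build the argument for stripe.checkout.sessions.create().
// There is deliberately no payment_method_types key: leaving it out lets
// the Dashboard's payment-method settings decide what is offered.
function buildCheckoutParams({ user, pkg, baseUrl }) {
  const metadata = { user_id: user.id, package_id: pkg.id };
  return {
    mode: 'payment',
    // The Stripe customer is created asynchronously, so it may not exist yet.
    ...(user.stripe_customer_id ? { customer: user.stripe_customer_id } : {}),
    line_items: [
      {
        quantity: 1,
        price_data: {
          currency: 'usd', // assumption: the plan does not name a currency
          unit_amount: pkg.price_cents,
          product_data: { name: `Transcription credits: ${pkg.label}` },
        },
      },
    ],
    metadata,
    payment_intent_data: { metadata },
    // Stripe substitutes {CHECKOUT_SESSION_ID} when it redirects.
    success_url: `${baseUrl}/billing/success?session_id={CHECKOUT_SESSION_ID}`,
    cancel_url: `${baseUrl}/billing/cancel`,
  };
}
```

The route would then return the created session's `url` field as `checkout_url`.
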

### Credit Packages

Define these as constants in `src/config.js`:

```javascript
CREDIT_PACKAGES: [
  { id: 'pack_500',  label: '500 minutes',  seconds: 30000,  price_cents: 300 },
  { id: 'pack_1200', label: '1200 minutes', seconds: 72000,  price_cents: 600 },
  { id: 'pack_3000', label: '3000 minutes', seconds: 180000, price_cents: 1200 },
]
```

Adjust pricing to cover Deepgram costs ($0.006/min = $0.0001/sec) plus margin and
Stripe fees (~2.9% + $0.30). As listed, the three packages work out to $0.006, $0.005,
and $0.004 per minute, which is at or below the Deepgram cost even before fees, so treat
these prices as placeholders.
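
To make the margin check concrete, the sketch below computes the per-minute price and the net result of each package after the Deepgram cost and Stripe fee quoted above. The function name `packageEconomics` is hypothetical; the rates are the ones stated in this section.

```javascript
const CREDIT_PACKAGES = [
  { id: 'pack_500', label: '500 minutes', seconds: 30000, price_cents: 300 },
  { id: 'pack_1200', label: '1200 minutes', seconds: 72000, price_cents: 600 },
  { id: 'pack_3000', label: '3000 minutes', seconds: 180000, price_cents: 1200 },
];

// Net cents left per sale after Deepgram usage and Stripe fees,
// assuming the buyer uses every second in the package.
function packageEconomics(pkg) {
  const minutes = pkg.seconds / 60;
  const deepgramCostCents = pkg.seconds / 100; // $0.0001/sec = 1 cent per 100 s
  const stripeFeeCents = (pkg.price_cents * 29) / 1000 + 30; // ~2.9% + $0.30
  return {
    id: pkg.id,
    centsPerMinute: pkg.price_cents / minutes,
    netCents: pkg.price_cents - stripeFeeCents - deepgramCostCents,
  };
}

for (const pkg of CREDIT_PACKAGES) {
  const e = packageEconomics(pkg);
  console.log(`${e.id}: ${e.centsPerMinute.toFixed(2)} cents/min, net ${e.netCents.toFixed(1)} cents`);
}
```

A negative `netCents` means the package loses money when fully used, which is the case for all three packages at the listed prices.
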

### Tasks

1. **Stripe customer creation**
   - On user registration, create a Stripe customer and store `stripe_customer_id`
   - Do this asynchronously (don't block registration response)

2. **Billing routes** (`src/routes/billing.js`)
   - `GET /billing/packages` — return credit package list (no auth required)
   - `POST /billing/checkout` — requires auth, accepts `{ package_id }`,
     creates Stripe Checkout Session using dynamic payment methods (do NOT pass
     `payment_method_types` — omitting it enables dynamic methods automatically),
     include session-level `metadata` containing `user_id` and `package_id`
     (`payment_intent_data.metadata` is attached only to the PaymentIntent and does
     not appear on the session object the webhook receives),
     returns `{ checkout_url }`
   - `GET /billing/balance` — requires auth, returns `{ seconds_remaining, minutes_remaining }`

3. **Stripe webhook** (`src/webhooks/stripe.js`)
   - Mount at `POST /webhooks/stripe` with raw body (use `express.raw()` for this route only)
   - Verify signature with `stripe.webhooks.constructEvent()`
   - Handle `checkout.session.completed`:
     - Extract `user_id` and `package_id` from the session's `metadata`
     - Add seconds to `credit_balance`
     - Insert row into `usage_ledger` with description `'credit_purchase'`
   - Handle `payment_intent.payment_failed`: log it (no action needed for prepaid)

4. **Success/cancel pages**
   - Stripe Checkout redirects to `GET /billing/success?session_id=...` and `/billing/cancel`
   - These can be simple HTML responses or redirects to the web dashboard
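
The fulfillment step of the webhook could be sketched as below. `fulfillCheckout` is a hypothetical helper; `client` is anything with an async `query(text, params)` method (for example a client checked out of the `pg` pool), and `session` is `event.data.object` from an event already verified with `stripe.webhooks.constructEvent()`. The balance update and ledger insert share one transaction so a purchase is never half recorded. Note that this sketch does not deduplicate redelivered events; recording the processed session id somewhere would be one way to handle that.

```javascript
// Credit a user's balance for a completed Checkout Session.
// Returns the number of seconds added.
async function fulfillCheckout(client, session, packages) {
  const { user_id: userId, package_id: packageId } = session.metadata || {};
  const pkg = packages.find((p) => p.id === packageId);
  if (!userId || !pkg) {
    throw new Error(`unfulfillable checkout session ${session.id}`);
  }

  await client.query('BEGIN');
  try {
    await client.query(
      `UPDATE credit_balance
          SET seconds_remaining = seconds_remaining + $1, updated_at = NOW()
        WHERE user_id = $2`,
      [pkg.seconds, userId]
    );
    await client.query(
      `INSERT INTO usage_ledger (user_id, seconds, description)
       VALUES ($1, $2, 'credit_purchase')`,
      [userId, pkg.seconds]
    );
    await client.query('COMMIT');
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  }
  return pkg.seconds;
}
```
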

---

## Phase 3 — WebSocket Proxy (Core Feature)

This is the most critical component. The proxy sits between the desktop client and Deepgram,
forwarding audio while tracking usage in real time.

### Connection Flow

```
Client connects → validate JWT → check credit balance → open Deepgram upstream
         ↓
Audio chunks arrive → forward to Deepgram → record usage every 10 seconds
         ↓
Transcription arrives from Deepgram → forward to client
         ↓
Client disconnects (or credits exhausted) → close upstream → finalize session
```

### WebSocket Protocol

**Client connects to**: `wss://your-domain/ws/transcribe`

**Client sends as first message** (JSON):
```json
{
  "type": "auth",
  "token": "<JWT>",
  "config": {
    "model": "nova-2",
    "language": "en-US",
    "interim_results": true,
    "endpointing": 300
  }
}
```
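
As an illustration of how the server might handle this first message, the sketch below validates it and turns the `config` object into a Deepgram URL using a whitelist. The helper names and the whitelist contents (the four keys shown in the example above) are assumptions; any key not on the list is silently dropped.

```javascript
// Only these client-supplied keys are forwarded to Deepgram.
const ALLOWED_PARAMS = ['model', 'language', 'interim_results', 'endpointing'];

// Parse the first WebSocket message. Returns { token, config } or null
// if the message is not a well-formed auth message.
function parseAuthMessage(raw) {
  let msg;
  try {
    msg = JSON.parse(raw.toString());
  } catch {
    return null;
  }
  if (!msg || msg.type !== 'auth' || typeof msg.token !== 'string') return null;
  const config = msg.config && typeof msg.config === 'object' ? msg.config : {};
  return { token: msg.token, config };
}

// Build the upstream URL from whitelisted, scalar config values only.
function buildDeepgramUrl(config) {
  const url = new URL('wss://api.deepgram.com/v1/listen');
  for (const key of ALLOWED_PARAMS) {
    const value = config[key];
    if (['string', 'number', 'boolean'].includes(typeof value)) {
      url.searchParams.set(key, String(value));
    }
  }
  return url.toString();
}
```
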

**After auth success, client sends**: raw audio binary frames (PCM 16kHz mono)

**Server sends to client**:
```json
{ "type": "ready" }
{ "type": "transcript", "text": "...", "is_final": true, "confidence": 0.98 }
{ "type": "error", "code": "insufficient_credits", "message": "..." }
{ "type": "credits_low", "seconds_remaining": 300 }
{ "type": "session_end", "seconds_used": 120 }
```
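
A sketch of how an upstream Deepgram message might be reduced to the `transcript` shape above. It assumes Deepgram's streaming `Results` layout, where the text and confidence live under `channel.alternatives[0]`; the function name `toClientMessage` and the choice to drop empty transcripts are assumptions.

```javascript
// Convert one raw Deepgram message into the client-facing transcript
// message, or return null when there is nothing worth forwarding.
function toClientMessage(rawUpstream) {
  let dg;
  try {
    dg = JSON.parse(rawUpstream.toString());
  } catch {
    return null; // not JSON: ignore rather than crash the proxy
  }
  if (!dg || dg.type !== 'Results') return null; // e.g. metadata messages
  const alt = dg.channel?.alternatives?.[0];
  if (!alt || !alt.transcript) return null; // silence yields empty transcripts
  return {
    type: 'transcript',
    text: alt.transcript,
    is_final: Boolean(dg.is_final),
    confidence: alt.confidence,
  };
}
```

The proxy would then send `JSON.stringify(result)` to the client whenever the result is not null.
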

### Tasks (`src/websocket/proxy.js`)

1. **Upgrade handler**
   - Attach to the HTTP server using `ws.Server({ noServer: true })`
   - In `server.on('upgrade', ...)`, route `/ws/transcribe` to this handler

2. **Auth handshake**
   - First message must be `{ type: 'auth', token: '...' }` — received within 5 seconds
     or connection is terminated
   - Verify JWT, load user's credit balance from DB
   - If balance is 0 or negative, send `insufficient_credits` error and close

3. **Deepgram upstream connection**
   - Open a WebSocket to Deepgram's streaming API:
     `wss://api.deepgram.com/v1/listen?model=nova-2&language=en-US&interim_results=true`
   - Because the client streams raw PCM with no container, also pass
     `encoding=linear16&sample_rate=16000` so Deepgram can decode the audio
   - Auth header: `Authorization: Token <DEEPGRAM_API_KEY>`
   - Use query params from client's `config` object (whitelist allowed params)

4. **Audio forwarding**
   - All binary messages from client → forward directly to Deepgram upstream
   - All messages from Deepgram → parse JSON, reformat, forward to client

5. **Usage tracking**
   - Create a `transcription_sessions` row on connection
   - Maintain an in-memory `secondsUsed` counter per connection
   - Deepgram sends `{ type: 'Results', duration: X }` in responses — use this for
     accurate second counting. Count only results with `is_final: true`: with
     `interim_results` enabled, interim messages cover the same audio again and
     would be double counted
   - Every 10 seconds (or on disconnect), write current `secondsUsed` to DB:
     - Update `transcription_sessions.seconds_used`
     - Decrement `credit_balance.seconds_remaining`
     - Insert into `usage_ledger`
   - If `seconds_remaining` hits 0: send `insufficient_credits`, close connection

6. **Cleanup on disconnect**
   - Mark session as `completed`, set `ended_at`
   - Do final usage flush to DB
   - Close Deepgram upstream if still open

7. **Error handling**
   - If Deepgram upstream closes unexpectedly, notify client and close
   - If client sends malformed data, log and continue (don't crash)
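
The usage-tracking step (task 5) can be sketched as a small per-connection meter. `createUsageMeter` and `DECREMENT_SQL` are hypothetical names. The meter counts only final results, on the assumption that interim results re-cover audio that a later final result also covers, and it hands out whole seconds because the schema stores `INTEGER` seconds. How to round the leftover fraction at disconnect is left open here.

```javascript
// Per-connection usage meter fed with parsed Deepgram messages.
function createUsageMeter() {
  let finalizedSeconds = 0; // audio covered by final results so far
  let flushedSeconds = 0; // whole seconds already handed out for billing

  return {
    record(dg) {
      if (dg.type === 'Results' && dg.is_final && typeof dg.duration === 'number') {
        finalizedSeconds += dg.duration;
      }
    },
    // Whole seconds accumulated since the previous call.
    takeDelta() {
      const delta = Math.floor(finalizedSeconds) - flushedSeconds;
      flushedSeconds += delta;
      return delta;
    },
  };
}

// Statement a periodic flush might run with [delta, userId].
// GREATEST keeps the balance from going negative; a returned
// seconds_remaining of 0 is the signal to close the connection.
const DECREMENT_SQL = `
  UPDATE credit_balance
     SET seconds_remaining = GREATEST(seconds_remaining - $1, 0),
         updated_at = NOW()
   WHERE user_id = $2
  RETURNING seconds_remaining`;

const meter = createUsageMeter();
meter.record({ type: 'Results', is_final: false, duration: 1.2 }); // ignored
meter.record({ type: 'Results', is_final: true, duration: 2.5 });
console.log(meter.takeDelta());
```
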

---

## Phase 4 — Account Routes & Rate Limiting

### Tasks

1. **Account routes** (`src/routes/account.js`)
   - `GET /account/me` — returns `{ email, credits: { seconds_remaining, minutes_remaining }, created_at }`
   - `GET /account/usage` — returns last 30 days of `usage_ledger` entries grouped by day,
     plus list of last 10 sessions with duration

2. **Rate limiting** (`src/middleware/rateLimit.js`)
   - Use in-memory rate limiting (no Redis needed at this scale)
   - Auth endpoints: max 10 requests per minute per IP
   - WebSocket connections: max 2 concurrent connections per user
     (store active connections in a `Map<userId, Set<ws>>`)
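
Both limits can be sketched with plain in-memory maps, as below. The factory names are hypothetical, and the request limiter uses a simple fixed window, which is one possible reading of "10 requests per minute". Entries are never evicted in this sketch, so a long-running process would want periodic cleanup.

```javascript
// Tracks open sockets per user: Map<userId, Set<ws>>.
function createConnectionLimiter(maxPerUser = 2) {
  const active = new Map();
  return {
    // Returns false (and registers nothing) when the user is at the cap.
    tryAdd(userId, ws) {
      const set = active.get(userId) || new Set();
      if (set.size >= maxPerUser) return false;
      set.add(ws);
      active.set(userId, set);
      return true;
    },
    // Call from the socket's close handler.
    remove(userId, ws) {
      const set = active.get(userId);
      if (!set) return;
      set.delete(ws);
      if (set.size === 0) active.delete(userId);
    },
    count(userId) {
      return active.get(userId)?.size ?? 0;
    },
  };
}

// Fixed-window request limiter keyed by IP. `now` is injectable for tests.
function createRateLimiter({ limit = 10, windowMs = 60000, now = Date.now } = {}) {
  const hits = new Map(); // Map<ip, { count, windowStart }>
  return function allow(ip) {
    const t = now();
    const entry = hits.get(ip);
    if (!entry || t - entry.windowStart >= windowMs) {
      hits.set(ip, { count: 1, windowStart: t });
      return true;
    }
    entry.count += 1;
    return entry.count <= limit;
  };
}
```
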

---

## Phase 5 — Web Dashboard

A simple, functional HTML/CSS/JS dashboard. No framework — vanilla JS is fine.
This is a developer-friendly streamer tool, not a consumer SaaS, so clean and
functional beats flashy.

### Pages

**`/` (Landing / Login)**
- Brief product description (what this is, why it exists)
- Login form and link to register
- Link to GitHub/Gitea repo

**`/dashboard` (Post-login)**
- Current credit balance (minutes remaining, prominently displayed)
- "Buy Credits" section showing the three packages with Stripe Checkout buttons
- Usage chart: last 30 days bar chart (vanilla canvas or a small CDN chart lib)
- Recent sessions table: date, duration, status

**`/register`**
- Registration form

### Implementation Notes
- Store JWT in `localStorage`, attach as `Authorization` header on API calls
- Redirect to `/` if JWT missing or expired
- Keep CSS minimal but readable — this is a utility dashboard

---

## Phase 6 — Desktop App Integration

Changes needed in the `local-transcription` Python repo.

### New file: `client/remote_transcription.py`

This module replaces `transcription_engine_realtime.py` when remote mode is active.

```python
# Pseudocode / spec for Claude Code to implement

class RemoteTranscriptionEngine:
    """
    Connects to the transcription proxy WebSocket and streams audio.
    Provides the same callback interface as the local engine so the
    rest of the app doesn't need to change.
    """

    def __init__(self, config, on_transcript_callback):
        # config contains: server_url, auth_token (or byok_api_key), model
        ...

    def start(self):
        # Open WebSocket connection
        # Send auth message
        # Start audio capture thread (reuse existing audio_capture.py)
        ...

    def stop(self):
        # Close WebSocket gracefully
        ...

    def _on_audio_chunk(self, audio_data):
        # Called by audio_capture.py with raw PCM data
        # Send as binary WebSocket frame
        ...

    def _on_server_message(self, message):
        # Parse JSON from server
        # On type='transcript': call on_transcript_callback
        # On type='credits_low': trigger UI warning
        # On type='error': surface to user
        ...
```

### BYOK Mode

When user provides their own Deepgram key, connect directly to Deepgram instead of the proxy:
- Endpoint: `wss://api.deepgram.com/v1/listen?...`
- Auth: `Authorization: Token <user_key>`
- No session tracking (Deepgram handles billing directly to the user)
- Same `RemoteTranscriptionEngine` class, just different URL and auth header

### Settings Changes (`gui/settings_dialog_qt.py`)

Add a new "Transcription Mode" section:

```
Transcription Mode:
  ○ Local (Whisper)          [existing behavior]
  ○ Remote - Managed         [requires login]
  ○ Remote - BYOK            [requires Deepgram API key]

[If Managed selected]:
  Server URL: [____________]
  [Login / Register]  [View Balance: 420 min remaining]

[If BYOK selected]:
  Deepgram API Key: [____________]
  Model: [nova-2 ▼]
```

### Config additions (`config/default_config.yaml`)

```yaml
remote:
  mode: local              # local | managed | byok
  server_url: ""           # proxy server URL for managed mode
  auth_token: ""           # JWT stored after login
  byok_api_key: ""         # Deepgram key for BYOK mode
  deepgram_model: nova-2
  language: en-US
```

---

## Build & Deployment Notes

### Docker Compose (local dev)

```yaml
version: '3.8'
services:
  db:
    image: postgres:15
    environment:
      POSTGRES_DB: transcription_proxy
      POSTGRES_USER: user
      POSTGRES_PASSWORD: password
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data

  app:
    build: .
    ports:
      - "3000:3000"
    environment:
      DATABASE_URL: postgresql://user:password@db:5432/transcription_proxy
    depends_on:
      - db
    volumes:
      - .:/app
      - /app/node_modules

volumes:
  pgdata:
```

### Production Deployment

This service is a good fit for deployment on AnHonestHost WHP as a containerized app,
or on a small DigitalOcean/Linode VPS. Requirements are light:
- 512MB RAM is sufficient
- Postgres can be the same instance as other services or managed (e.g., Supabase free tier)
- Needs a public domain with SSL for WebSocket (`wss://`) to work from desktop clients

Reverse proxy config (Nginx or HAProxy) should:
- Proxy HTTP → `localhost:3000`
- Pass `Upgrade` and `Connection` headers for WebSocket support
- Set `proxy_read_timeout 3600` (sessions can be long)

---

## Implementation Order

Build and test in this sequence:

1. Project scaffold + DB connection + migrations
2. Auth (register/login/JWT) — test with curl
3. Stripe billing + webhook — test with Stripe CLI (`stripe listen`)
4. WebSocket proxy — test with a simple browser WebSocket client first
5. Usage tracking and credit decrement
6. Account/usage routes
7. Web dashboard
8. Desktop app integration (separate PR in local-transcription repo)

---

## Key Decisions & Rationale

| Decision | Choice | Reason |
|---|---|---|
| Credits model | Prepaid | No surprise charges, simpler billing, better for irregular streamer usage |
| WebSocket library | `ws` | Lightweight, no abstraction overhead, plays well with raw binary audio |
| Auth | JWT (stateless) | Desktop app holds token locally; no session store needed |
| DB driver | `node-postgres` (pg) | No ORM overhead; schema is simple enough for raw SQL |
| Migrations | Raw SQL files | No dependency on Knex/Prisma; easy to inspect and reason about |
| Rate limiting | In-memory | Redis is overkill for this scale; single-process Node is fine initially |
| Frontend | Vanilla JS | Dashboard is simple utility UI; no framework justified |

---

## What This Plan Does NOT Cover (Future Work)

- OAuth / social login
- Admin panel for managing users
- Refund / credit adjustment tooling
- Email verification
- Password reset flow
- Multi-language support beyond Deepgram's defaults
- Analytics / aggregated usage reporting
- Self-hosted Whisper inference as a third backend option