Changes:
- Set PyTorch CUDA index (cu121) as default for all builds
- CUDA builds support both GPU and CPU (auto-fallback)
- Fixes uv run reinstalling CPU-only PyTorch
- Updated dependency-groups syntax (fixes deprecation warning)
Benefits:
- Simpler build process - no CPU vs CUDA distinction needed
- uv sync and uv run now get CUDA-enabled PyTorch automatically
- Builds work on systems with or without NVIDIA GPUs
- Fixes issue where uv run check_cuda.py was getting CPU version
Index: https://download.pytorch.org/whl/cu121 (PyTorch 2.5.1+cu121)
- Checks PyTorch installation and version
- Detects CUDA availability and GPU info
- Tests CUDA with simple tensor operation
- Shows device manager detection results
- Provides troubleshooting hints for CPU-only builds
Usage: python check_cuda.py or uv run check_cuda.py
- Web server is always-running (not optional) for OBS integration
- Users no longer need to manually install fastapi and uvicorn
- Previously required: uv pip install "fastapi[standard]" uvicorn
- Now auto-installed with: uv sync
Fixes: Missing FastAPI/uvicorn dependencies on fresh Windows installs
- uv pip uninstall doesn't support the -y flag (auto-confirm)
- uv uninstalls without confirmation by default
- Suppressed error output if torch not installed (2>/dev/null on Linux, 2>nul on Windows)
- Added || true on Linux to prevent script exit if torch not found
Fixes: error: unexpected argument '-y' found in CUDA build scripts
Research findings:
- collect_all() has design flaws and poor performance with pydantic
- Pydantic uses compiled cpython extensions that prevent module discovery
- collect_submodules() is the recommended approach per PyInstaller docs
Changes:
- Replaced collect_all() with collect_submodules() for better reliability
- Now collects 105 pydantic submodules (vs unreliable collect_all)
- Added collect_data_files() for packages requiring data files
- Added explicit pydantic dependencies: colorsys, decimal, json, etc.
- Applies to both Windows AND Linux (no longer platform-specific)
Results:
✓ Collected 52 submodules from fastapi
✓ Collected 34 submodules from starlette
✓ Collected 105 submodules from pydantic
✓ Collected 3 submodules from pydantic_core
✓ Plus uvicorn, websockets, h11, anyio
Fixes: ModuleNotFoundError: No module named 'fastapi' on Windows
Based on: https://github.com/pyinstaller/pyinstaller/issues/5359
- On Windows, PyInstaller wasn't properly bundling FastAPI dependencies
- Added platform-specific collection using PyInstaller.utils.hooks.collect_all
- Only applies aggressive collection on Windows to keep Linux builds stable
- Collects all submodules and data files for: fastapi, starlette, pydantic,
pydantic_core, anyio, uvicorn, websockets, h11
- Linux builds remain unchanged and continue to work as before
Fixes: ModuleNotFoundError: No module named 'fastapi' on Windows executable
Fixed PyInstaller build error on Windows:
"ModuleNotFoundError: No module named 'fastapi'"
Added to hiddenimports:
- FastAPI and its core modules
- Starlette (FastAPI framework base)
- Pydantic (data validation)
- anyio, sniffio (async libraries)
- h11, websockets (protocol implementations)
- requests and dependencies (for server sync client)
This ensures all web server dependencies are bundled in the executable.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Created a beautiful landing page with random room/passphrase generation
and updated security model for read-only access.
New Files:
- server/php/index.html: Landing page with URL generator
Features:
- Random room name generation (e.g., "swift-phoenix-1234")
- Random passphrase generation (16 chars, URL-safe)
- Copy-to-clipboard functionality
- Responsive design with gradient header
- Step-by-step usage instructions
- FAQ section
Security Model Changes:
- WRITE (send transcriptions): Requires room + passphrase
- READ (view display): Only requires room name
Updated Files:
- server.php:
* handleStream(): Passphrase optional (read-only)
* handleList(): Passphrase optional (read-only)
* Added roomExists() helper function
- display.php:
* Removed passphrase from URL parameters
* Removed passphrase from SSE connection
* Removed passphrase from list endpoint
Benefits:
- Display URL is safer (no passphrase in OBS browser source)
- Simpler setup (only room name needed for viewing)
- Better security model (write-protected, read-open)
- Anyone with room name can watch, only authorized can send
Example URLs:
- Client: server.php (with room + passphrase in app settings)
- Display: display.php?room=swift-phoenix-1234&fade=10×tamps=true
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Replaced fixed 6-color palette with dynamic HSL color generation
using the golden ratio for optimal color distribution.
Changes:
- Removed static CSS color classes (user-0 through user-5)
- Added getUserColor() function using golden ratio
- Colors generated as HSL(hue, 85%, 65%)
- Each user gets a unique, visually distinct color
- Supports unlimited users (20+)
How it works:
- Golden ratio (φ ≈ 0.618) distributes hues evenly across color wheel
- User 1: hue 0°
- User 2: hue 222° (0.618 * 360)
- User 3: hue 85° ((0.618 * 2 * 360) % 360)
- etc.
Benefits:
- No color repetition for any number of users
- Maximum visual distinction between consecutive users
- Consistent brightness/saturation for readability
- Colors are vibrant and stand out on dark backgrounds
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Phase 2 implementation: Multiple streamers can now merge their captions
into a single stream using a PHP server.
PHP Server (server/php/):
- server.php: API endpoint for sending/streaming transcriptions
- display.php: Web page for viewing merged captions in OBS
- config.php: Server configuration
- .htaccess: Security settings
- README.md: Comprehensive deployment guide
Features:
- Room-based isolation (multiple groups on same server)
- Passphrase authentication per room
- Real-time streaming via Server-Sent Events (SSE)
- Different colors for each user
- File-based storage (no database required)
- Auto-cleanup of old rooms
- Works on standard PHP hosting
Client-Side:
- client/server_sync.py: HTTP client for sending to PHP server
- Settings dialog updated with server sync options
- Config updated with server_sync section
Server Configuration:
- URL: Server endpoint (e.g., http://example.com/transcription/server.php)
- Room: Unique room name for your group
- Passphrase: Shared secret for authentication
OBS Integration:
Display URL format:
http://example.com/transcription/display.php?room=ROOM&passphrase=PASS&fade=10×tamps=true
NOTE: Main window integration pending (client sends transcriptions)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Added a clickable link in the status bar that opens the web display
in the default browser. This makes it easy for users to access the
OBS browser source without manually typing the URL.
Features:
- Shows "🌐 Open Web Display" link in green
- Tooltip shows the full URL
- Opens in default browser when clicked
- Reads host/port from config automatically
Location: Status bar, after user name
URL format: http://127.0.0.1:8080 (or configured host:port)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Changes:
1. Changed UI text from "Recording" to "Transcribing" for clarity
2. Implemented overlapping audio chunks to prevent word cutoff
Audio Overlap Feature:
- Added overlap_duration parameter (default: 0.5 seconds)
- Audio chunks now overlap by 0.5s to capture words at boundaries
- Prevents missed words when chunks are processed separately
- Configurable via audio.overlap_duration in config.yaml
How it works:
- Each 3-second chunk includes 0.5s from the previous chunk
- Buffer advances by (chunk_size - overlap_size) instead of full chunk
- Ensures words at chunk boundaries are captured in at least one chunk
- No duplicate transcription due to Whisper's context handling
Example with 3s chunks and 0.5s overlap:
Chunk 1: [0.0s - 3.0s]
Chunk 2: [2.5s - 5.5s] <- 0.5s overlap
Chunk 3: [5.0s - 8.0s] <- 0.5s overlap
Files modified:
- client/audio_capture.py: Implemented overlapping buffer logic
- config/default_config.yaml: Added overlap_duration setting
- gui/main_window_qt.py: Updated UI text, passed overlap param
- main_cli.py: Passed overlap param
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Fixed PyInstaller build error where the Voice Activity Detection (VAD)
model was missing from the compiled executable.
Changes:
- Added faster_whisper/assets folder to PyInstaller datas
- Includes silero_vad_v6.onnx (1.2MB) in the build
- Resolves ONNXRuntimeError on transcription start
Error fixed:
[ONNXRuntimeError] : 3 : NO_SUCHFILE : Load model from
.../faster_whisper/assets/silero_vad_v6.onnx failed: File doesn't exist
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Phase 1 Complete - Standalone Desktop Application
Features:
- Real-time speech-to-text with Whisper (faster-whisper)
- PySide6 desktop GUI with settings dialog
- Web server for OBS browser source integration
- Audio capture with automatic sample rate detection and resampling
- Noise suppression with Voice Activity Detection (VAD)
- Configurable display settings (font, timestamps, fade duration)
- Settings apply without restart (with automatic model reloading)
- Auto-fade for web display transcriptions
- CPU/GPU support with automatic device detection
- Standalone executable builds (PyInstaller)
- CUDA build support (works on systems without CUDA hardware)
Components:
- Audio capture with sounddevice
- Noise reduction with noisereduce + webrtcvad
- Transcription with faster-whisper
- GUI with PySide6
- Web server with FastAPI + WebSocket
- Configuration system with YAML
Build System:
- Standard builds (CPU-only): build.sh / build.bat
- CUDA builds (universal): build-cuda.sh / build-cuda.bat
- Comprehensive BUILD.md documentation
- Cross-platform support (Linux, Windows)
Documentation:
- README.md with project overview and quick start
- BUILD.md with detailed build instructions
- NEXT_STEPS.md with future enhancement roadmap
- INSTALL.md with setup instructions
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>