Commit Graph

27 Commits

Author SHA1 Message Date
bd0e84c5e7 Fix model switching crash and improve error handling
**Model Reload Fixes:**
- Properly disconnect signals before reconnecting to prevent duplicate connections
- Wait for previous model loader thread to finish before starting new one
- Add garbage collection after unloading model to free memory
- Improve error handling in model reload callback

**Settings Dialog:**
- Remove duplicate success message (callback handles it)
- Only show message if no callback is defined

**Transcription Engine:**
- Explicitly delete model reference before setting to None
- Force garbage collection to ensure memory is freed

This prevents crashes when switching models, especially when done
multiple times in succession or while the app is under load.
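
The unload step described under "Transcription Engine" can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: `TranscriptionEngineSketch` and the `factory` argument are hypothetical names, and the Qt signal handling and loader-thread wait are left out.

```python
import gc


class TranscriptionEngineSketch:
    """Hypothetical stand-in for the app's transcription engine."""

    def __init__(self):
        self.model = None

    def load_model(self, factory):
        # `factory` is any zero-argument callable that returns a model object
        # (assumption: the real app constructs a Whisper model here).
        self.unload_model()
        self.model = factory()

    def unload_model(self):
        if self.model is not None:
            # Explicitly delete the reference, then reset the attribute to None.
            del self.model
            self.model = None
        # Force a collection pass so reference cycles are freed before reloading.
        gc.collect()
```

Calling `load_model` repeatedly always goes through `unload_model` first, so at most one model is referenced by the engine at a time.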

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-27 06:28:40 -08:00
146a8c8beb Enhance display customization and remove PHP server
Major improvements to display configuration and server architecture:

**Display Enhancements:**
- Add URL parameters for display customization (timestamps, maxlines, fontsize, fontfamily)
- Fix max lines enforcement to prevent scroll bars in OBS
- Apply font family and size settings to both local and sync displays
- Remove auto-scroll, enforce overflow:hidden for clean OBS integration

**Node.js Server:**
- Add timestamps toggle: timestamps=true/false
- Add max lines limit: maxlines=50
- Add font configuration: fontsize=16, fontfamily=Arial
- Update index page with URL parameters documentation
- Improve display URLs in room generation
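
As an illustration of how these URL parameters could be interpreted, here is a small Python sketch. The real server is Node.js, so this is only a re-expression of the idea; `parse_display_options`, `visible_lines`, the example host and path, and the use of the example values above as defaults are all assumptions.

```python
from urllib.parse import urlparse, parse_qs

# Assumption: the example values from the commit message double as defaults.
DEFAULTS = {"timestamps": True, "maxlines": 50, "fontsize": 16, "fontfamily": "Arial"}


def parse_display_options(url):
    """Read display options from a URL query string, falling back to DEFAULTS."""
    query = parse_qs(urlparse(url).query)
    opts = dict(DEFAULTS)
    if "timestamps" in query:
        opts["timestamps"] = query["timestamps"][0].lower() == "true"
    for key in ("maxlines", "fontsize"):
        if key in query:
            try:
                value = int(query[key][0])
            except ValueError:
                continue  # ignore non-numeric input and keep the default
            if value > 0:
                opts[key] = value
    if "fontfamily" in query:
        opts["fontfamily"] = query["fontfamily"][0]
    return opts


def visible_lines(lines, maxlines):
    """Keep only the newest `maxlines` entries so the display never scrolls."""
    return lines[-maxlines:]


if __name__ == "__main__":
    # Hypothetical display URL; only the parameter names come from the commit.
    print(parse_display_options(
        "http://127.0.0.1:3000/display?room=demo&timestamps=false&maxlines=3"
    ))
```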

**Local Web Server:**
- Add max_lines, font_family, font_size configuration
- Respect settings from GUI configuration
- Apply changes immediately without restart

**Architecture:**
- Remove PHP server implementation (Node.js recommended)
- Update all documentation to reference Node.js server
- Update default config URLs to Node.js endpoints
- Clean up 1700+ lines of PHP code

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-27 06:15:55 -08:00
e831dadd24 Fix app stability: graceful model switching and web server improvements
- Add comprehensive error handling to prevent crashes during model reload
- Implement automatic port fallback (8080-8084) for web server conflicts
- Configure uvicorn to work properly with PyInstaller console=False builds
- Add proper web server shutdown on app close to release ports
- Improve error reporting with full tracebacks for debugging

Fixes:
- App crashing when switching models
- Web server not starting after app crash (port conflict)
- Web server failing silently in compiled builds without console
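
The port fallback could look roughly like the sketch below. `find_free_port` is a hypothetical helper, not the app's function; it probes each candidate by binding a throwaway socket, so another process could still claim the port between the probe and the real server start.

```python
import socket


def find_free_port(host="127.0.0.1", candidates=range(8080, 8085)):
    """Return the first port in `candidates` that can currently be bound."""
    for port in candidates:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                continue  # port is in use or not permitted; try the next one
            return port
    raise RuntimeError(f"no free port among {list(candidates)}")


if __name__ == "__main__":
    print(find_free_port())
```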

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-26 17:50:37 -08:00
478146c58d Improve UX: hide console window and fade connection status
- Hide console window on compiled desktop app (console=False in spec)
- Add 20-second auto-fade to "Connected" status in OBS display
- Keep "Disconnected" status visible until reconnection
- Add PM2 deployment configuration and documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-26 17:04:28 -08:00
64c864b0f0 Fix multi-user server sync performance and integration
Major fixes:
- Integrated ServerSyncClient into GUI for actual multi-user sync
- Fixed CUDA device display to show actual hardware used
- Optimized server sync with parallel HTTP requests (5x faster)
- Fixed 2-second DNS delay by using 127.0.0.1 instead of localhost
- Added comprehensive debugging and performance logging

Performance improvements:
- HTTP requests: 2045ms → 52ms (97% faster)
- Multi-user sync lag: ~4s → ~100ms (97% faster)
- Parallel request processing with ThreadPoolExecutor (3 workers)
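
A rough sketch of the parallel-request idea with a 3-worker `ThreadPoolExecutor`. `send_all` and the `send` callable are hypothetical; in the app, `send` would presumably be an HTTP POST to the sync server, which is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor


def send_all(payloads, send, max_workers=3):
    """Run `send(payload)` for every payload on a small thread pool.

    Results come back in the same order as `payloads`, even though the
    calls overlap in time.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send, payloads))
```

With three workers, three slow requests take about as long as one, instead of running back to back.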

New features:
- Room generator with one-click copy on Node.js landing page
- Auto-detection of PHP vs Node.js server types
- Localhost warning banner for WSL2 users
- Comprehensive debug logging throughout sync pipeline

Files modified:
- gui/main_window_qt.py - Server sync integration, device display fix
- client/server_sync.py - Parallel HTTP, server type detection
- server/nodejs/server.js - Room generator, warnings, debug logs

Documentation added:
- PERFORMANCE_FIX.md - Server sync optimization details
- FIX_2_SECOND_HTTP_DELAY.md - DNS/localhost issue solution
- LATENCY_GUIDE.md - Audio chunk duration tuning guide
- DEBUG_4_SECOND_LAG.md - Comprehensive debugging guide
- SESSION_SUMMARY.md - Complete session summary

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-26 16:44:55 -08:00
c28679acb6 Update to support sync captions
2025-12-26 16:15:52 -08:00
2870d45bdc Add Apache ProxyTimeout configuration for SSE support
Created apache-sse-config.conf with required Apache settings to support
long-running SSE connections. Apache's mod_proxy_fcgi has a default
timeout of 30-60 seconds which kills SSE connections prematurely.

The configuration sets ProxyTimeout to 21600 seconds (6 hours) to match
HAProxy's timeout and allow long streaming sessions.

Added note to .htaccess explaining this requirement, as ProxyTimeout
cannot be set in .htaccess and must be configured in the virtual host.

To fix 504 Gateway Timeout errors:
1. Add ProxyTimeout directive to Apache virtual host config
2. Reload Apache
3. Test SSE connection

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-26 15:29:56 -08:00
9feb17b734 Fix SSE connection closing prematurely for non-existent rooms
The server was calling exit() immediately when a room didn't exist,
which caused the SSE connection to open and then close right away.
This triggered EventSource to reconnect in a loop.

Now the server keeps the connection open and sends keepalives even
for rooms that don't exist yet. This is the correct SSE behavior -
maintain the connection and stream data when it becomes available.
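
The keep-the-connection-open behavior can be sketched in Python as a generator (the fixed server itself was PHP). `sse_stream`, `poll`, and `max_ticks` are hypothetical names; `max_ticks` exists only so the sketch terminates, whereas a real handler would loop until the client disconnects.

```python
import json
import time


def sse_stream(poll, max_ticks, sleep=time.sleep, interval=1.0):
    """Yield SSE frames: data when available, otherwise a keepalive comment.

    `poll()` returns a list of new messages; an empty list covers both
    "no new data" and "room does not exist yet", so the stream never exits
    early just because the room is missing.
    """
    for _ in range(max_ticks):
        messages = poll()
        if messages:
            for message in messages:
                yield f"data: {json.dumps(message)}\n\n"
        else:
            # Lines starting with ':' are SSE comments, ignored by EventSource.
            yield ": keepalive\n\n"
        sleep(interval)
```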

Fixes the "connection established then immediately errors" issue
seen in diagnostic tests.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-26 14:03:06 -08:00
50deaeae96 Add comprehensive console debugging to SSE diagnostic test
Enhanced JavaScript debugging with:
- Detailed connection state logging (CONNECTING/OPEN/CLOSED)
- EventSource object inspection
- Real-time readyState monitoring
- Verbose error information (type, target, readyState)
- URL and location details for troubleshooting
- Console grouping for organized output

This will help diagnose SSE connection issues by providing
detailed information in the browser console.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-26 13:57:28 -08:00
54910a5df3 Fix misleading keepalive comment in PHP server
The comment said keepalives were sent every 15 seconds, but the code
uses sleep(1), so they're actually sent every 1 second. This is correct
for SSE connections - frequent keepalives prevent proxy timeouts.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-26 13:12:09 -08:00
af1c1231b6 Fix PHP-FPM compatibility and add server diagnostics
Changes to .htaccess:
- Removed php_flag and php_value directives (don't work with php-fpm)
- Simplified DirectoryMatch to FilesMatch for .json files
- Added note about configuring PHP settings in php.ini/pool config
- More compatible with user directories

Added diagnostic.php:
- Tests PHP version, extensions, and configuration
- Checks storage directory permissions
- Tests Server-Sent Events (SSE) connection
- Shows server API type (php-fpm vs mod_php)
- Provides troubleshooting hints for common issues
- Live SSE connection test with detailed logging

Added data/index.php:
- Blocks direct access to data directory
- Returns 403 Forbidden

Fixes:
- php-fpm environments not respecting .htaccess PHP settings
- DirectoryMatch issues in user directories
- Hard-to-diagnose connection problems

Usage: Navigate to diagnostic.php to troubleshoot server issues
2025-12-26 12:45:23 -08:00
1acdb065c5 Fix uv index: Use explicit=true for PyTorch index
- Added explicit=true to pytorch-cu121 index
- Only torch, torchvision, torchaudio use PyTorch index
- All other packages (requests, fastapi, etc.) use PyPI
- Fixes: requests version conflict (PyTorch index has 2.28.1, we need >=2.31.0)

How explicit=true works:
- PyTorch index only checked for packages listed in tool.uv.sources
- Prevents dependency confusion and version conflicts
- Best practice for supplemental package indexes
2025-12-26 12:16:08 -08:00
a5556c475d Fix uv index configuration: Use PyTorch CUDA as additional index
- Changed from 'default' to named additional index
- Added tool.uv.sources to specify torch comes from pytorch-cu121 index
- Other packages (fastapi, uvicorn, etc.) still come from PyPI
- Fixes: 'fastapi was not found in the package registry' error

How it works:
- PyPI remains the default index for most packages
- torch package explicitly uses pytorch-cu121 index
- Best of both worlds: CUDA PyTorch + all other packages from PyPI
2025-12-26 12:13:40 -08:00
0bcd8e8d21 Configure uv to always use PyTorch CUDA index
Changes:
- Set PyTorch CUDA index (cu121) as default for all builds
- CUDA builds support both GPU and CPU (auto-fallback)
- Fixes uv run reinstalling CPU-only PyTorch
- Updated dependency-groups syntax (fixes deprecation warning)

Benefits:
- Simpler build process - no CPU vs CUDA distinction needed
- uv sync and uv run now get CUDA-enabled PyTorch automatically
- Builds work on systems with or without NVIDIA GPUs
- Fixes issue where uv run check_cuda.py was getting CPU version

Index: https://download.pytorch.org/whl/cu121 (PyTorch 2.5.1+cu121)
2025-12-26 12:08:42 -08:00
8604662262 Add CUDA diagnostic script for troubleshooting GPU detection
- Checks PyTorch installation and version
- Detects CUDA availability and GPU info
- Tests CUDA with simple tensor operation
- Shows device manager detection results
- Provides troubleshooting hints for CPU-only builds

Usage: python check_cuda.py or uv run check_cuda.py
2025-12-26 12:00:37 -08:00
d51b24e2e5 Move FastAPI and uvicorn to main dependencies
- Web server is always running (not optional) for OBS integration
- Users no longer need to manually install fastapi and uvicorn
- Previously required: uv pip install "fastapi[standard]" uvicorn
- Now auto-installed with: uv sync

Fixes: Missing FastAPI/uvicorn dependencies on fresh Windows installs
2025-12-26 11:57:50 -08:00
e0c8241607 Fix CUDA build scripts: Remove unsupported -y flag from uv pip uninstall
- uv pip uninstall doesn't support the -y flag (auto-confirm)
- uv uninstalls without confirmation by default
- Suppressed error output if torch not installed (2>/dev/null on Linux, 2>nul on Windows)
- Added || true on Linux to prevent script exit if torch not found

Fixes: error: unexpected argument '-y' found in CUDA build scripts
2025-12-26 11:43:46 -08:00
6ec350af69 Fix Windows FastAPI import: Replace collect_all with collect_submodules
Research findings:
- collect_all() has design flaws and poor performance with pydantic
- Pydantic uses compiled CPython extensions that prevent module discovery
- collect_submodules() is the recommended approach per PyInstaller docs

Changes:
- Replaced collect_all() with collect_submodules() for better reliability
- Now collects 105 pydantic submodules (vs unreliable collect_all)
- Added collect_data_files() for packages requiring data files
- Added explicit pydantic dependencies: colorsys, decimal, json, etc.
- Applies to both Windows AND Linux (no longer platform-specific)

Results:
✓ Collected 52 submodules from fastapi
✓ Collected 34 submodules from starlette
✓ Collected 105 submodules from pydantic
✓ Collected 3 submodules from pydantic_core
✓ Plus uvicorn, websockets, h11, anyio

Fixes: ModuleNotFoundError: No module named 'fastapi' on Windows
Based on: https://github.com/pyinstaller/pyinstaller/issues/5359
2025-12-26 11:30:29 -08:00
926910177d Fix Windows build: Use collect_all for FastAPI packages
- On Windows, PyInstaller wasn't properly bundling FastAPI dependencies
- Added platform-specific collection using PyInstaller.utils.hooks.collect_all
- Only applies aggressive collection on Windows to keep Linux builds stable
- Collects all submodules and data files for: fastapi, starlette, pydantic,
  pydantic_core, anyio, uvicorn, websockets, h11
- Linux builds remain unchanged and continue to work as before

Fixes: ModuleNotFoundError: No module named 'fastapi' on Windows executable
2025-12-26 11:01:43 -08:00
0ee3f1003e Fix Windows build: Add FastAPI and dependencies to hiddenimports
Fixed PyInstaller build error on Windows:
"ModuleNotFoundError: No module named 'fastapi'"

Added to hiddenimports:
- FastAPI and its core modules
- Starlette (FastAPI framework base)
- Pydantic (data validation)
- anyio, sniffio (async libraries)
- h11, websockets (protocol implementations)
- requests and dependencies (for server sync client)

This ensures all web server dependencies are bundled in the executable.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-26 10:34:11 -08:00
2c341e8cea Add index page with URL generator and remove passphrase from display
Created a beautiful landing page with random room/passphrase generation
and updated security model for read-only access.

New Files:
- server/php/index.html: Landing page with URL generator

Features:
- Random room name generation (e.g., "swift-phoenix-1234")
- Random passphrase generation (16 chars, URL-safe)
- Copy-to-clipboard functionality
- Responsive design with gradient header
- Step-by-step usage instructions
- FAQ section
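
The room and passphrase generation might be sketched as below. The landing page presumably does this in browser JavaScript, so this Python version only illustrates the format; the word lists, function names, and the exact passphrase alphabet are assumptions.

```python
import secrets
import string

# Hypothetical word lists; the page's real vocabulary is not shown in the commit.
ADJECTIVES = ("swift", "quiet", "bright", "lucky")
NOUNS = ("phoenix", "river", "falcon", "ember")

# Letters, digits, '-' and '_' need no escaping in a URL (assumed alphabet).
URL_SAFE_ALPHABET = string.ascii_letters + string.digits + "-_"


def generate_room_name():
    """Build a name shaped like 'swift-phoenix-1234'."""
    number = secrets.randbelow(9000) + 1000  # always four digits
    return f"{secrets.choice(ADJECTIVES)}-{secrets.choice(NOUNS)}-{number}"


def generate_passphrase(length=16):
    """Return a random passphrase of `length` URL-safe characters."""
    return "".join(secrets.choice(URL_SAFE_ALPHABET) for _ in range(length))


if __name__ == "__main__":
    print(generate_room_name(), generate_passphrase())
```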

Security Model Changes:
- WRITE (send transcriptions): Requires room + passphrase
- READ (view display): Only requires room name

Updated Files:
- server.php:
  * handleStream(): Passphrase optional (read-only)
  * handleList(): Passphrase optional (read-only)
  * Added roomExists() helper function

- display.php:
  * Removed passphrase from URL parameters
  * Removed passphrase from SSE connection
  * Removed passphrase from list endpoint

Benefits:
- Display URL is safer (no passphrase in OBS browser source)
- Simpler setup (only room name needed for viewing)
- Better security model (write-protected, read-open)
- Anyone with the room name can watch; only authorized users can send

Example URLs:
- Client: server.php (with room + passphrase in app settings)
- Display: display.php?room=swift-phoenix-1234&fade=10&timestamps=true

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-26 10:18:40 -08:00
fec44f9757 Support unlimited users with dynamic color generation
Replaced fixed 6-color palette with dynamic HSL color generation
using the golden ratio for optimal color distribution.

Changes:
- Removed static CSS color classes (user-0 through user-5)
- Added getUserColor() function using golden ratio
- Colors generated as HSL(hue, 85%, 65%)
- Each user gets a unique, visually distinct color
- Supports unlimited users (20+)

How it works:
- Golden ratio conjugate (1/φ ≈ 0.618) distributes hues evenly across color wheel
- User 1: hue 0°
- User 2: hue 222° (0.618 * 360)
- User 3: hue 85° ((0.618 * 2 * 360) % 360)
- etc.
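
The hue arithmetic above can be reproduced with a short Python sketch. The committed `getUserColor()` is JavaScript and its exact code is not shown, so `get_user_color` here is an assumed equivalent that uses a zero-based index (index 0 is "User 1") and rounds the hue to whole degrees.

```python
GOLDEN_RATIO_CONJUGATE = 0.618  # the constant quoted in the commit message


def get_user_color(index):
    """Map a zero-based user index to an HSL color string."""
    hue = (index * GOLDEN_RATIO_CONJUGATE * 360) % 360
    return f"hsl({round(hue)}, 85%, 65%)"


if __name__ == "__main__":
    for i in range(3):
        print(i, get_user_color(i))
    # 0 hsl(0, 85%, 65%)
    # 1 hsl(222, 85%, 65%)
    # 2 hsl(85, 85%, 65%)
```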

Benefits:
- No fixed palette to exhaust, so colors do not cycle as users are added
- Maximum visual distinction between consecutive users
- Consistent brightness/saturation for readability
- Colors are vibrant and stand out on dark backgrounds

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-26 10:11:13 -08:00
9c3a0d7678 Add multi-user server sync (PHP server + client)
Phase 2 implementation: Multiple streamers can now merge their captions
into a single stream using a PHP server.

PHP Server (server/php/):
- server.php: API endpoint for sending/streaming transcriptions
- display.php: Web page for viewing merged captions in OBS
- config.php: Server configuration
- .htaccess: Security settings
- README.md: Comprehensive deployment guide

Features:
- Room-based isolation (multiple groups on same server)
- Passphrase authentication per room
- Real-time streaming via Server-Sent Events (SSE)
- Different colors for each user
- File-based storage (no database required)
- Auto-cleanup of old rooms
- Works on standard PHP hosting

Client-Side:
- client/server_sync.py: HTTP client for sending to PHP server
- Settings dialog updated with server sync options
- Config updated with server_sync section

Server Configuration:
- URL: Server endpoint (e.g., http://example.com/transcription/server.php)
- Room: Unique room name for your group
- Passphrase: Shared secret for authentication

OBS Integration:
Display URL format:
http://example.com/transcription/display.php?room=ROOM&passphrase=PASS&fade=10&timestamps=true

NOTE: Main window integration is still pending (the GUI does not yet send transcriptions to the server)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-26 10:09:12 -08:00
aa8c294fdc Add clickable web display link to main UI
Added a clickable link in the status bar that opens the web display
in the default browser. This makes it easy for users to access the
OBS browser source without manually typing the URL.

Features:
- Shows "🌐 Open Web Display" link in green
- Tooltip shows the full URL
- Opens in default browser when clicked
- Reads host/port from config automatically

Location: Status bar, after user name
URL format: http://127.0.0.1:8080 (or configured host:port)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-26 08:58:24 -08:00
0ba84e6ddd Improve transcription accuracy with overlapping audio chunks
Changes:
1. Changed UI text from "Recording" to "Transcribing" for clarity
2. Implemented overlapping audio chunks to prevent word cutoff

Audio Overlap Feature:
- Added overlap_duration parameter (default: 0.5 seconds)
- Audio chunks now overlap by 0.5s to capture words at boundaries
- Prevents missed words when chunks are processed separately
- Configurable via audio.overlap_duration in config.yaml

How it works:
- Each 3-second chunk includes 0.5s from the previous chunk
- Buffer advances by (chunk_size - overlap_size) instead of full chunk
- Ensures words at chunk boundaries are captured in at least one chunk
- No duplicate transcription due to Whisper's context handling

Example with 3s chunks and 0.5s overlap:
  Chunk 1: [0.0s - 3.0s]
  Chunk 2: [2.5s - 5.5s]  <- 0.5s overlap
  Chunk 3: [5.0s - 8.0s]  <- 0.5s overlap
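
The chunk boundaries in this example follow from advancing by `chunk - overlap` each time. A minimal sketch, working in seconds rather than sample counts; `chunk_bounds` is a hypothetical helper, not the function in client/audio_capture.py.

```python
def chunk_bounds(total, chunk=3.0, overlap=0.5):
    """Return (start, end) times of overlapping chunks within `total` seconds."""
    if not 0 <= overlap < chunk:
        raise ValueError("overlap must be non-negative and smaller than chunk")
    step = chunk - overlap  # the buffer advances by less than a full chunk
    bounds = []
    start = 0.0
    while start + chunk <= total:
        bounds.append((start, start + chunk))
        start += step
    return bounds


if __name__ == "__main__":
    print(chunk_bounds(8.0))
    # [(0.0, 3.0), (2.5, 5.5), (5.0, 8.0)]
```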

Files modified:
- client/audio_capture.py: Implemented overlapping buffer logic
- config/default_config.yaml: Added overlap_duration setting
- gui/main_window_qt.py: Updated UI text, passed overlap param
- main_cli.py: Passed overlap param

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-26 08:47:19 -08:00
003c27c8d5 Fix: Bundle Silero VAD model with PyInstaller
Fixed PyInstaller build error where the Voice Activity Detection (VAD)
model was missing from the compiled executable.

Changes:
- Added faster_whisper/assets folder to PyInstaller datas
- Includes silero_vad_v6.onnx (1.2MB) in the build
- Resolves ONNXRuntimeError on transcription start

Error fixed:
[ONNXRuntimeError] : 3 : NO_SUCHFILE : Load model from
.../faster_whisper/assets/silero_vad_v6.onnx failed: File doesn't exist

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-26 08:26:58 -08:00
472233aec4 Initial commit: Local Transcription App v1.0
Phase 1 Complete - Standalone Desktop Application

Features:
- Real-time speech-to-text with Whisper (faster-whisper)
- PySide6 desktop GUI with settings dialog
- Web server for OBS browser source integration
- Audio capture with automatic sample rate detection and resampling
- Noise suppression with Voice Activity Detection (VAD)
- Configurable display settings (font, timestamps, fade duration)
- Settings apply without restart (with automatic model reloading)
- Auto-fade for web display transcriptions
- CPU/GPU support with automatic device detection
- Standalone executable builds (PyInstaller)
- CUDA build support (works on systems without CUDA hardware)

Components:
- Audio capture with sounddevice
- Noise reduction with noisereduce + webrtcvad
- Transcription with faster-whisper
- GUI with PySide6
- Web server with FastAPI + WebSocket
- Configuration system with YAML

Build System:
- Standard builds (CPU-only): build.sh / build.bat
- CUDA builds (universal): build-cuda.sh / build-cuda.bat
- Comprehensive BUILD.md documentation
- Cross-platform support (Linux, Windows)

Documentation:
- README.md with project overview and quick start
- BUILD.md with detailed build instructions
- NEXT_STEPS.md with future enhancement roadmap
- INSTALL.md with setup instructions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-25 18:48:23 -08:00