Commit Graph

4 Commits

Author SHA1 Message Date
64c864b0f0 Fix multi-user server sync performance and integration
Major fixes:
- Integrated ServerSyncClient into GUI for actual multi-user sync
- Fixed CUDA device display to show actual hardware used
- Optimized server sync with parallel HTTP requests (5x faster)
- Fixed 2-second DNS delay by using 127.0.0.1 instead of localhost
- Added comprehensive debugging and performance logging

Performance improvements:
- HTTP requests: 2045ms → 52ms (97% faster)
- Multi-user sync lag: ~4s → ~100ms (97% faster)
- Parallel request processing with ThreadPoolExecutor (3 workers)

New features:
- Room generator with one-click copy on Node.js landing page
- Auto-detection of PHP vs Node.js server types
- Localhost warning banner for WSL2 users
- Comprehensive debug logging throughout sync pipeline

Files modified:
- gui/main_window_qt.py - Server sync integration, device display fix
- client/server_sync.py - Parallel HTTP, server type detection
- server/nodejs/server.js - Room generator, warnings, debug logs

Documentation added:
- PERFORMANCE_FIX.md - Server sync optimization details
- FIX_2_SECOND_HTTP_DELAY.md - DNS/localhost issue solution
- LATENCY_GUIDE.md - Audio chunk duration tuning guide
- DEBUG_4_SECOND_LAG.md - Comprehensive debugging guide
- SESSION_SUMMARY.md - Complete session summary

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-26 16:44:55 -08:00
9c3a0d7678 Add multi-user server sync (PHP server + client)
Phase 2 implementation: Multiple streamers can now merge their captions
into a single stream using a PHP server.

PHP Server (server/php/):
- server.php: API endpoint for sending/streaming transcriptions
- display.php: Web page for viewing merged captions in OBS
- config.php: Server configuration
- .htaccess: Security settings
- README.md: Comprehensive deployment guide

Features:
- Room-based isolation (multiple groups on same server)
- Passphrase authentication per room
- Real-time streaming via Server-Sent Events (SSE)
- Different colors for each user
- File-based storage (no database required)
- Auto-cleanup of old rooms
- Works on standard PHP hosting

Client-Side:
- client/server_sync.py: HTTP client for sending to PHP server
- Settings dialog updated with server sync options
- Config updated with server_sync section

Server Configuration:
- URL: Server endpoint (e.g., http://example.com/transcription/server.php)
- Room: Unique room name for your group
- Passphrase: Shared secret for authentication

OBS Integration:
Display URL format:
http://example.com/transcription/display.php?room=ROOM&passphrase=PASS&fade=10&timestamps=true

NOTE: Main window integration pending (client sends transcriptions)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-26 10:09:12 -08:00
0ba84e6ddd Improve transcription accuracy with overlapping audio chunks
Changes:
1. Changed UI text from "Recording" to "Transcribing" for clarity
2. Implemented overlapping audio chunks to prevent word cutoff

Audio Overlap Feature:
- Added overlap_duration parameter (default: 0.5 seconds)
- Audio chunks now overlap by 0.5s to capture words at boundaries
- Prevents missed words when chunks are processed separately
- Configurable via audio.overlap_duration in config.yaml

How it works:
- Each 3-second chunk includes 0.5s from the previous chunk
- Buffer advances by (chunk_size - overlap_size) instead of full chunk
- Ensures words at chunk boundaries are captured in at least one chunk
- No duplicate transcription due to Whisper's context handling

Example with 3s chunks and 0.5s overlap:
  Chunk 1: [0.0s - 3.0s]
  Chunk 2: [2.5s - 5.5s]  <- 0.5s overlap
  Chunk 3: [5.0s - 8.0s]  <- 0.5s overlap

Files modified:
- client/audio_capture.py: Implemented overlapping buffer logic
- config/default_config.yaml: Added overlap_duration setting
- gui/main_window_qt.py: Updated UI text, passed overlap param
- main_cli.py: Passed overlap param

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-26 08:47:19 -08:00
472233aec4 Initial commit: Local Transcription App v1.0
Phase 1 Complete - Standalone Desktop Application

Features:
- Real-time speech-to-text with Whisper (faster-whisper)
- PySide6 desktop GUI with settings dialog
- Web server for OBS browser source integration
- Audio capture with automatic sample rate detection and resampling
- Noise suppression with Voice Activity Detection (VAD)
- Configurable display settings (font, timestamps, fade duration)
- Settings apply without restart (with automatic model reloading)
- Auto-fade for web display transcriptions
- CPU/GPU support with automatic device detection
- Standalone executable builds (PyInstaller)
- CUDA build support (works on systems without CUDA hardware)

Components:
- Audio capture with sounddevice
- Noise reduction with noisereduce + webrtcvad
- Transcription with faster-whisper
- GUI with PySide6
- Web server with FastAPI + WebSocket
- Configuration system with YAML

Build System:
- Standard builds (CPU-only): build.sh / build.bat
- CUDA builds (universal): build-cuda.sh / build-cuda.bat
- Comprehensive BUILD.md documentation
- Cross-platform support (Linux, Windows)

Documentation:
- README.md with project overview and quick start
- BUILD.md with detailed build instructions
- NEXT_STEPS.md with future enhancement roadmap
- INSTALL.md with setup instructions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-25 18:48:23 -08:00