local-transcription

Archived

Author	SHA1	Message	Date
shadowdao	64c864b0f0	Fix multi-user server sync performance and integration Major fixes: - Integrated ServerSyncClient into GUI for actual multi-user sync - Fixed CUDA device display to show actual hardware used - Optimized server sync with parallel HTTP requests (5x faster) - Fixed 2-second DNS delay by using 127.0.0.1 instead of localhost - Added comprehensive debugging and performance logging Performance improvements: - HTTP requests: 2045ms → 52ms (97% faster) - Multi-user sync lag: ~4s → ~100ms (97% faster) - Parallel request processing with ThreadPoolExecutor (3 workers) New features: - Room generator with one-click copy on Node.js landing page - Auto-detection of PHP vs Node.js server types - Localhost warning banner for WSL2 users - Comprehensive debug logging throughout sync pipeline Files modified: - gui/main_window_qt.py - Server sync integration, device display fix - client/server_sync.py - Parallel HTTP, server type detection - server/nodejs/server.js - Room generator, warnings, debug logs Documentation added: - PERFORMANCE_FIX.md - Server sync optimization details - FIX_2_SECOND_HTTP_DELAY.md - DNS/localhost issue solution - LATENCY_GUIDE.md - Audio chunk duration tuning guide - DEBUG_4_SECOND_LAG.md - Comprehensive debugging guide - SESSION_SUMMARY.md - Complete session summary 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-26 16:44:55 -08:00
shadowdao	9c3a0d7678	Add multi-user server sync (PHP server + client) Phase 2 implementation: Multiple streamers can now merge their captions into a single stream using a PHP server. PHP Server (server/php/): - server.php: API endpoint for sending/streaming transcriptions - display.php: Web page for viewing merged captions in OBS - config.php: Server configuration - .htaccess: Security settings - README.md: Comprehensive deployment guide Features: - Room-based isolation (multiple groups on same server) - Passphrase authentication per room - Real-time streaming via Server-Sent Events (SSE) - Different colors for each user - File-based storage (no database required) - Auto-cleanup of old rooms - Works on standard PHP hosting Client-Side: - client/server_sync.py: HTTP client for sending to PHP server - Settings dialog updated with server sync options - Config updated with server_sync section Server Configuration: - URL: Server endpoint (e.g., http://example.com/transcription/server.php) - Room: Unique room name for your group - Passphrase: Shared secret for authentication OBS Integration: Display URL format: http://example.com/transcription/display.php?room=ROOM&passphrase=PASS&fade=10&timestamps=true NOTE: Main window integration pending (client sends transcriptions) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-26 10:09:12 -08:00
shadowdao	0ba84e6ddd	Improve transcription accuracy with overlapping audio chunks Changes: 1. Changed UI text from "Recording" to "Transcribing" for clarity 2. Implemented overlapping audio chunks to prevent word cutoff Audio Overlap Feature: - Added overlap_duration parameter (default: 0.5 seconds) - Audio chunks now overlap by 0.5s to capture words at boundaries - Prevents missed words when chunks are processed separately - Configurable via audio.overlap_duration in config.yaml How it works: - Each 3-second chunk includes 0.5s from the previous chunk - Buffer advances by (chunk_size - overlap_size) instead of full chunk - Ensures words at chunk boundaries are captured in at least one chunk - No duplicate transcription due to Whisper's context handling Example with 3s chunks and 0.5s overlap: Chunk 1: [0.0s - 3.0s] Chunk 2: [2.5s - 5.5s] <- 0.5s overlap Chunk 3: [5.0s - 8.0s] <- 0.5s overlap Files modified: - client/audio_capture.py: Implemented overlapping buffer logic - config/default_config.yaml: Added overlap_duration setting - gui/main_window_qt.py: Updated UI text, passed overlap param - main_cli.py: Passed overlap param 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-26 08:47:19 -08:00
shadowdao	472233aec4	Initial commit: Local Transcription App v1.0 Phase 1 Complete - Standalone Desktop Application Features: - Real-time speech-to-text with Whisper (faster-whisper) - PySide6 desktop GUI with settings dialog - Web server for OBS browser source integration - Audio capture with automatic sample rate detection and resampling - Noise suppression with Voice Activity Detection (VAD) - Configurable display settings (font, timestamps, fade duration) - Settings apply without restart (with automatic model reloading) - Auto-fade for web display transcriptions - CPU/GPU support with automatic device detection - Standalone executable builds (PyInstaller) - CUDA build support (works on systems without CUDA hardware) Components: - Audio capture with sounddevice - Noise reduction with noisereduce + webrtcvad - Transcription with faster-whisper - GUI with PySide6 - Web server with FastAPI + WebSocket - Configuration system with YAML Build System: - Standard builds (CPU-only): build.sh / build.bat - CUDA builds (universal): build-cuda.sh / build-cuda.bat - Comprehensive BUILD.md documentation - Cross-platform support (Linux, Windows) Documentation: - README.md with project overview and quick start - BUILD.md with detailed build instructions - NEXT_STEPS.md with future enhancement roadmap - INSTALL.md with setup instructions 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-25 18:48:23 -08:00
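
The overlap arithmetic described in commit 0ba84e6ddd (3-second chunks, 0.5-second overlap, buffer advancing by chunk size minus overlap) can be illustrated with a small sketch. This is not the code in client/audio_capture.py, which is not shown here; the function name `chunk_windows` and its parameters are hypothetical and work in seconds rather than on a real sample buffer.

```python
def chunk_windows(total_duration, chunk_duration=3.0, overlap_duration=0.5):
    """Return (start, end) times in seconds for overlapping audio chunks.

    Hypothetical helper for illustration only; the defaults mirror the
    values quoted in the commit message.
    """
    if not 0 <= overlap_duration < chunk_duration:
        raise ValueError("overlap_duration must be >= 0 and < chunk_duration")

    # Each new chunk starts (chunk - overlap) seconds after the previous one.
    step = chunk_duration - overlap_duration

    windows = []
    index = 0
    # Only emit chunks that fit entirely inside the available audio.
    while index * step + chunk_duration <= total_duration:
        start = index * step
        windows.append((start, start + chunk_duration))
        index += 1
    return windows


if __name__ == "__main__":
    # Reproduces the example from the commit message:
    # [(0.0, 3.0), (2.5, 5.5), (5.0, 8.0)]
    print(chunk_windows(8.0))
```

With 8 seconds of audio this yields the same three windows listed in the commit message, each sharing its first 0.5 seconds with the end of the previous chunk.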

4 Commits
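
Commit 64c864b0f0 mentions parallel request processing with ThreadPoolExecutor (3 workers). A minimal sketch of that pattern, under stated assumptions, might look like the following. The names `send_all` and `send_one` are hypothetical and are not taken from client/server_sync.py; the sender is passed in as a plain callable so the example runs without a network.

```python
from concurrent.futures import ThreadPoolExecutor


def send_all(payloads, send_one, max_workers=3):
    """Call send_one(payload) for every payload using a small thread pool.

    Hypothetical helper: send_one stands in for whatever function performs
    the actual HTTP request. Results come back in the same order as the
    input payloads, because Executor.map preserves input order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send_one, payloads))


if __name__ == "__main__":
    # Stand-in sender that just echoes the text in upper case.
    def fake_send(payload):
        return payload["text"].upper()

    captions = [{"text": "hello"}, {"text": "world"}, {"text": "again"}]
    print(send_all(captions, fake_send))
```

A thread pool suits this kind of work because each request spends most of its time waiting on I/O, so a few workers can overlap that waiting. The same commit also reports that targeting 127.0.0.1 instead of localhost removed a roughly 2-second delay, which is a separate change from the pooling shown here.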