Initial commit: Local Transcription App v1.0
Phase 1 Complete - Standalone Desktop Application

Features:
- Real-time speech-to-text with Whisper (faster-whisper)
- PySide6 desktop GUI with settings dialog
- Web server for OBS browser source integration
- Audio capture with automatic sample rate detection and resampling
- Noise suppression with Voice Activity Detection (VAD)
- Configurable display settings (font, timestamps, fade duration)
- Settings apply without restart (with automatic model reloading)
- Auto-fade for web display transcriptions
- CPU/GPU support with automatic device detection
- Standalone executable builds (PyInstaller)
- CUDA build support (works on systems without CUDA hardware)

Components:
- Audio capture with sounddevice
- Noise reduction with noisereduce + webrtcvad
- Transcription with faster-whisper
- GUI with PySide6
- Web server with FastAPI + WebSocket
- Configuration system with YAML

Build System:
- Standard builds (CPU-only): build.sh / build.bat
- CUDA builds (universal): build-cuda.sh / build-cuda.bat
- Comprehensive BUILD.md documentation
- Cross-platform support (Linux, Windows)

Documentation:
- README.md with project overview and quick start
- BUILD.md with detailed build instructions
- NEXT_STEPS.md with future enhancement roadmap
- INSTALL.md with setup instructions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
.gitignore (vendored, new file, 56 lines)
@@ -0,0 +1,56 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# Virtual environments
venv/
env/
ENV/
.venv/
.venv

# uv
uv.lock
.python-version

# IDE
.vscode/
.idea/
*.swp
*.swo
*~

# OS
.DS_Store
Thumbs.db

# Application specific
*.log
config/*.yaml
!config/default_config.yaml
.local-transcription/

# Model cache
models/
.cache/

# PyInstaller
*.spec.lock
BUILD.md (new file, 259 lines)
@@ -0,0 +1,259 @@
# Building Local Transcription

This guide explains how to build standalone executables for Linux and Windows.

## Prerequisites

1. **Python 3.9+** installed on your system
2. **uv** package manager (install from https://docs.astral.sh/uv/)
3. All project dependencies installed (`uv sync`)

## Building for Linux

### Standard Build (CPU-only):

```bash
# Make the build script executable (first time only)
chmod +x build.sh

# Run the build script
./build.sh
```

### CUDA Build (GPU Support):

Build with CUDA support even without NVIDIA hardware:

```bash
# Make the build script executable (first time only)
chmod +x build-cuda.sh

# Run the CUDA build script
./build-cuda.sh
```

This will:
- Install PyTorch with CUDA 12.1 support
- Bundle CUDA runtime libraries (~600MB extra)
- Create an executable that works on both GPU and CPU systems
- Automatically fall back to CPU if no CUDA GPU is available

The executable will be created in `dist/LocalTranscription/LocalTranscription`.

### Manual build:
```bash
# Clean previous builds
rm -rf build dist

# Build with PyInstaller
uv run pyinstaller local-transcription.spec
```

### Distribution:
```bash
cd dist
tar -czf LocalTranscription-Linux.tar.gz LocalTranscription/
```

## Building for Windows

### Standard Build (CPU-only):

```cmd
REM Run the build script
build.bat
```

### CUDA Build (GPU Support):

Build with CUDA support even without NVIDIA hardware:

```cmd
REM Run the CUDA build script
build-cuda.bat
```

This will:
- Install PyTorch with CUDA 12.1 support
- Bundle CUDA runtime libraries (~600MB extra)
- Create an executable that works on both GPU and CPU systems
- Automatically fall back to CPU if no CUDA GPU is available

The executable will be created in `dist\LocalTranscription\LocalTranscription.exe`.

### Manual build:
```cmd
REM Clean previous builds
rmdir /s /q build
rmdir /s /q dist

REM Build with PyInstaller
uv run pyinstaller local-transcription.spec
```

### Distribution:
- Compress the `dist\LocalTranscription` folder to a ZIP file
- Or use an installer creator like NSIS or Inno Setup

## Important Notes

### Cross-Platform Building

**You cannot cross-compile!**
- Linux executables must be built on Linux
- Windows executables must be built on Windows
- Mac executables must be built on macOS

### First Run

On the first run, the application will:
1. Create a config directory at `~/.local-transcription/` (Linux) or `%USERPROFILE%\.local-transcription\` (Windows)
2. Download the Whisper model (if not already present)
3. Cache the model in `~/.cache/huggingface/` by default

### Executable Size

The built executable will be large (300MB - 2GB+) because it includes:
- Python runtime
- PySide6 (Qt framework)
- PyTorch/faster-whisper
- NumPy, SciPy, and other dependencies

### Console Window

By default, the console window is visible (for debugging). To hide it:

1. Edit `local-transcription.spec`
2. Change `console=True` to `console=False` in the `EXE` section
3. Rebuild
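
The relevant spec change looks roughly like this (a sketch of a typical PyInstaller `EXE` section; the exact contents of `local-transcription.spec` may differ):

```python
# Excerpt of a typical PyInstaller spec EXE section (illustrative).
exe = EXE(
    pyz,
    a.scripts,
    exclude_binaries=True,
    name='LocalTranscription',
    upx=True,
    console=False,  # was True; False hides the console window
    icon=None,
)
```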

### GPU Support

#### Building with CUDA (Recommended for Distribution)

**Yes, you CAN build with CUDA support on systems without NVIDIA GPUs!**

PyTorch provides CUDA-enabled builds that bundle the CUDA runtime libraries. This means:

1. **You don't need NVIDIA hardware** to create CUDA-enabled builds
2. **The executable will work everywhere** - on systems with or without NVIDIA GPUs
3. **Automatic fallback** - the app detects available hardware and uses GPU if available, CPU otherwise
4. **Larger file size** - adds ~600MB-1GB to the executable size

**How it works:**
```bash
# Linux
./build-cuda.sh

# Windows
build-cuda.bat
```

The build script will:
- Install PyTorch with bundled CUDA 12.1 runtime
- Package all CUDA libraries into the executable
- Create a universal build that runs on any system

**When users run the executable:**
- If they have an NVIDIA GPU with drivers: uses GPU acceleration
- If they don't have an NVIDIA GPU: automatically uses the CPU
- No configuration needed - it just works!
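
The fallback itself is only a few lines. A minimal sketch of how such detection can work (illustrative; the app's actual logic may differ):

```python
# Illustrative device-selection fallback; not the app's exact code.
def pick_device() -> str:
    """Return "cuda" when a usable NVIDIA GPU is present, else "cpu"."""
    try:
        import torch
        if torch.cuda.is_available():
            return "cuda"
    except Exception:
        pass  # torch missing or CUDA runtime broken: use the CPU
    return "cpu"

# faster-whisper accepts the chosen device directly:
# model = WhisperModel("base", device=pick_device())
```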

#### Alternative: CPU-Only Builds

If you only want CPU support (smaller file size):
```bash
# Linux
./build.sh

# Windows
build.bat
```

#### AMD GPU Support

- **ROCm**: Requires special PyTorch builds from AMD
- Not recommended for general distribution
- Better to use the CUDA build (works on all systems) or the CPU build

### Optimizations

To reduce size:

1. **Remove unused model sizes**: The app downloads models on demand, so you don't need to bundle them
2. **Use UPX compression**: Already enabled in the spec file
3. **Exclude dev dependencies**: Only build dependencies are needed

## Testing the Build

After building, test the executable:

### Linux:
```bash
cd dist/LocalTranscription
./LocalTranscription
```

### Windows:
```cmd
cd dist\LocalTranscription
LocalTranscription.exe
```

## Troubleshooting

### Missing modules error
If you get "No module named X" errors, add the missing module to the `hiddenimports` list in `local-transcription.spec`.
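
For example, if the build failed to find `scipy.signal` at runtime, the fix would look like this in the spec's `Analysis` section (module names here are illustrative):

```python
# Excerpt from local-transcription.spec (illustrative): list any module
# that PyInstaller's static analysis misses under hiddenimports.
a = Analysis(
    ['main.py'],
    hiddenimports=[
        'scipy.signal',  # example: imported dynamically at runtime
        'webrtcvad',
    ],
)
```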

### DLL errors (Windows)
Make sure the Visual C++ Redistributable is installed on the target system:
https://aka.ms/vs/17/release/vc_redist.x64.exe

### Audio device errors
The application needs access to audio devices. Ensure:
- Microphone permissions are granted
- Audio drivers are installed
- PulseAudio (Linux) or Windows Audio is running

### Model download fails
Ensure an internet connection on first run. Models are downloaded from:
https://huggingface.co/guillaumekln/faster-whisper-base

## Advanced: Adding an Icon

1. Create or obtain an `.ico` file (Windows) or `.png` file (Linux)
2. Edit `local-transcription.spec`
3. Change `icon=None` to `icon='path/to/your/icon.ico'`
4. Rebuild

## Advanced: Creating an Installer

### Windows (using Inno Setup):

1. Install Inno Setup: https://jrsoftware.org/isinfo.php
2. Create an `.iss` script file
3. Build the installer

### Linux (using AppImage):

```bash
# Install appimagetool
wget https://github.com/AppImage/AppImageKit/releases/download/continuous/appimagetool-x86_64.AppImage
chmod +x appimagetool-x86_64.AppImage

# Create AppDir structure
mkdir -p LocalTranscription.AppDir/usr/bin
cp -r dist/LocalTranscription/* LocalTranscription.AppDir/usr/bin/

# Create a .desktop file and icon inside the AppDir as needed

# Build AppImage
./appimagetool-x86_64.AppImage LocalTranscription.AppDir
```

## Support

For build issues, check:
1. PyInstaller documentation: https://pyinstaller.org/
2. Project issues: https://github.com/anthropics/claude-code/issues
INSTALL.md (new file, 194 lines)
@@ -0,0 +1,194 @@
# Installation Guide

## Prerequisites

- **Python 3.9 or higher**
- **uv** (Python package installer)
- **FFmpeg** (required by faster-whisper)
- **CUDA-capable GPU** (optional, for GPU acceleration)

### Installing uv

If you don't have `uv` installed:

```bash
# On macOS and Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# On Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

# Or with pip
pip install uv
```

### Installing FFmpeg

#### On Ubuntu/Debian:
```bash
sudo apt update
sudo apt install ffmpeg
```

#### On macOS (with Homebrew):
```bash
brew install ffmpeg
```

#### On Windows:
Download from [ffmpeg.org](https://ffmpeg.org/download.html) and add to PATH.

## Installation Steps

### 1. Navigate to Project Directory

```bash
cd /home/jknapp/code/local-transcription
```

### 2. Install Dependencies with uv

```bash
# uv will automatically create a virtual environment and install dependencies
uv sync
```

This single command will:
- Create a virtual environment (`.venv/`)
- Install all dependencies from `pyproject.toml`
- Lock dependencies for reproducibility

**Note for CUDA users:** If you have an NVIDIA GPU, install PyTorch with CUDA support:

```bash
# For CUDA 11.8
uv pip install torch --index-url https://download.pytorch.org/whl/cu118

# For CUDA 12.1
uv pip install torch --index-url https://download.pytorch.org/whl/cu121
```

### 3. Run the Application

```bash
# Option 1: Using uv run (automatically uses the venv)
uv run python main.py

# Option 2: Activate venv manually
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
python main.py
```

On first run, the application will:
- Download the Whisper model (this may take a few minutes)
- Create a configuration file at `~/.local-transcription/config.yaml`

## Quick Start Commands

```bash
# Install everything
uv sync

# Run the application
uv run python main.py

# Install with server dependencies (for Phase 2+)
uv sync --extra server

# Update dependencies
uv sync --upgrade
```

## Configuration

Settings can be changed through the GUI (Settings button) or by editing:
```
~/.local-transcription/config.yaml
```
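
As an illustration, a config of this kind typically looks like the following; the key names here are hypothetical, and the authoritative defaults live in `config/default_config.yaml`:

```yaml
# Hypothetical example of ~/.local-transcription/config.yaml;
# actual keys are defined by config/default_config.yaml.
user:
  display_name: "Streamer"
transcription:
  model_size: base     # tiny | base | small | medium | large
  device: auto         # auto | cpu | cuda
  language: en
audio:
  input_device: default
  chunk_duration: 3.0  # seconds; smaller = lower latency
  vad_enabled: true
web:
  port: 8080
display:
  show_timestamps: true
  fade_duration: 5.0   # seconds before web captions fade
```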

## Troubleshooting

### Audio Device Issues

If no audio devices are detected:
```bash
uv run python -c "import sounddevice as sd; print(sd.query_devices())"
```

### GPU Not Detected

Check if CUDA is available:
```bash
uv run python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
```

### Model Download Fails

Models are downloaded to `~/.cache/huggingface/`. If the download fails:
- Check your internet connection
- Ensure sufficient disk space (~1-3 GB depending on model size)

### uv Command Not Found

Make sure uv is in your PATH:
```bash
# Add to ~/.bashrc or ~/.zshrc
export PATH="$HOME/.cargo/bin:$PATH"
```

## Performance Tips

For best real-time performance:

1. **Use GPU if available** - 5-10x faster than CPU
2. **Start with smaller models**:
   - `tiny`: Fastest, ~39M parameters, 1-2s latency
   - `base`: Good balance, ~74M parameters, 2-3s latency
   - `small`: Better accuracy, ~244M parameters, 3-5s latency
3. **Enable VAD** (Voice Activity Detection) to skip silent audio
4. **Adjust chunk duration**: Smaller = lower latency, larger = better accuracy

## System Requirements

### Minimum:
- CPU: Dual-core 2GHz+
- RAM: 4GB
- Model: tiny or base

### Recommended:
- CPU: Quad-core 3GHz+ or GPU (NVIDIA GTX 1060+)
- RAM: 8GB
- Model: base or small

### For Best Performance:
- GPU: NVIDIA RTX 2060 or better
- RAM: 16GB
- Model: small or medium

## Development

### Install development dependencies:
```bash
uv sync --extra dev
```

### Run tests:
```bash
uv run pytest
```

### Format code:
```bash
uv run black .
uv run ruff check .
```

## Why uv?

`uv` is significantly faster than pip:
- **10-100x faster** dependency resolution
- **Automatic virtual environment** management
- **Reproducible builds** with a lockfile
- **Drop-in replacement** for pip commands

Learn more at [astral.sh/uv](https://astral.sh/uv)
NEXT_STEPS.md (new file, 440 lines)
@@ -0,0 +1,440 @@
# Next Steps for Local Transcription

This document outlines potential future enhancements and features for the Local Transcription application.

## Current Status: Phase 1 Complete ✅

The application currently has:
- ✅ Desktop GUI with PySide6
- ✅ Real-time transcription with Whisper (faster-whisper)
- ✅ Audio capture with automatic sample rate detection and resampling
- ✅ Noise suppression with Voice Activity Detection (VAD)
- ✅ Web server for OBS browser source integration
- ✅ Configurable display settings (font, timestamps, fade duration)
- ✅ Settings apply without restart
- ✅ Auto-fade for web display
- ✅ Standalone executable builds for Linux and Windows
- ✅ CUDA support (with automatic CPU fallback)

## Phase 2: Multi-User Server Architecture (Optional)

If you want to enable multiple users to sync their transcriptions to a shared display:

### Server Components

1. **WebSocket Server** (see the sketch after this list)
   - Accept connections from multiple clients
   - Aggregate transcriptions from all connected users
   - Broadcast to web display clients
   - Handle user authentication/authorization
   - Rate limiting and abuse prevention

2. **Database/Storage** (Optional)
   - Store transcription history
   - User management
   - Session logs for later review
   - Consider: SQLite, PostgreSQL, or Redis

3. **Web Admin Interface**
   - Monitor connected clients
   - View active sessions
   - Manage users and permissions
   - Export transcription logs
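
A minimal sketch of the aggregation hub, assuming FastAPI (already used for the web display); the endpoint paths and message shape here are assumptions:

```python
# Illustrative aggregation hub; paths and message format are assumptions.
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()
display_clients: set[WebSocket] = set()  # connected OBS browser sources

@app.websocket("/ws/ingest")
async def ingest(ws: WebSocket):
    """Each transcription client connects here and streams JSON chunks."""
    await ws.accept()
    try:
        while True:
            chunk = await ws.receive_json()  # e.g. {"user": ..., "text": ...}
            # Fan the chunk out to every connected display client.
            for client in list(display_clients):
                try:
                    await client.send_json(chunk)
                except Exception:
                    display_clients.discard(client)
    except WebSocketDisconnect:
        pass

@app.websocket("/ws/display")
async def display(ws: WebSocket):
    """Web display pages (OBS browser sources) subscribe here."""
    await ws.accept()
    display_clients.add(ws)
    try:
        while True:
            await ws.receive_text()  # keep the connection alive
    except WebSocketDisconnect:
        display_clients.discard(ws)
```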

### Client Updates

1. **Server Sync Toggle**
   - Enable/disable server sync in Settings
   - Server URL configuration
   - API key/authentication setup
   - Connection status indicator

2. **Network Handling** (see the reconnect sketch after this list)
   - Auto-reconnect on connection loss
   - Queue transcriptions when offline
   - Sync when connection restored
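
A sketch of the reconnect-with-offline-queue behavior using the `websockets` client library named elsewhere in these docs; the URL and payload shape are assumptions:

```python
# Illustrative client-side sync loop; URL and payload shape are assumptions.
import asyncio
import collections
import json

import websockets

outbox: collections.deque = collections.deque(maxlen=1000)  # offline queue

def submit(user: str, text: str) -> None:
    """Called by the transcription engine for each finished chunk."""
    outbox.append({"user": user, "text": text})

async def sync_loop(url: str = "ws://localhost:8080/ws/ingest") -> None:
    while True:
        try:
            async with websockets.connect(url) as ws:
                while True:
                    # Drain anything queued while the server was unreachable.
                    while outbox:
                        await ws.send(json.dumps(outbox[0]))
                        outbox.popleft()  # drop only after a successful send
                    await asyncio.sleep(0.1)
        except (OSError, websockets.exceptions.WebSocketException):
            await asyncio.sleep(2)  # connection lost: back off, then retry
```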

### Implementation Technologies

- **Server Framework**: FastAPI (already used for web display)
- **WebSocket**: Already integrated
- **Database**: SQLAlchemy + SQLite/PostgreSQL
- **Deployment**: Docker container for easy deployment

**Estimated Effort**: 2-3 weeks for full implementation

---

## Phase 3: Enhanced Features

### Transcription Improvements

1. **Multi-Language Support**
   - Automatic language detection
   - Real-time language switching
   - Translation between languages
   - Per-user language settings

2. **Speaker Diarization**
   - Detect and label different speakers
   - Use pyannote.audio or similar
   - Automatically assign speaker IDs

3. **Custom Vocabulary**
   - Add gaming terms, streamer names
   - Technical jargon support
   - Proper noun correction

4. **Punctuation & Formatting**
   - Automatic punctuation insertion
   - Sentence capitalization
   - Better text formatting

### Display Enhancements

1. **Theme System**
   - Light/dark themes
   - Custom color schemes
   - User-created themes (JSON/YAML)
   - Per-element styling

2. **Animation Options**
   - Different fade effects
   - Slide in/out animations
   - Configurable transition speeds
   - Particle effects (optional)

3. **Layout Modes**
   - Karaoke-style (word highlighting)
   - Ticker tape (scrolling bottom)
   - Multi-column for multiple users
   - Picture-in-picture mode

4. **Web Display Customization**
   - CSS customization interface
   - Live preview in settings
   - Save/load custom styles
   - Community theme sharing

### Audio Processing

1. **Advanced Noise Reduction**
   - RNNoise integration
   - Custom noise profiles
   - Adaptive filtering
   - Echo cancellation

2. **Audio Effects**
   - Equalization presets
   - Compression/normalization
   - Voice enhancement filters

3. **Multi-Input Support**
   - Multiple microphones simultaneously
   - Virtual audio cable integration
   - Audio routing/mixing

---

## Phase 4: Integration & Automation

### OBS Integration

1. **OBS Plugin** (Advanced)
   - Native OBS plugin instead of browser source
   - Lower resource usage
   - Better performance
   - Tighter integration

2. **Scene Integration**
   - Auto-show/hide based on speech
   - Integrate with OBS scene switcher
   - Hotkey support

### Streaming Platform Integration

1. **Twitch Integration**
   - Send captions to Twitch chat
   - Twitch API integration
   - Custom Twitch bot

2. **YouTube Integration**
   - Live caption upload
   - YouTube API integration

3. **Discord Integration** (see the webhook sketch after this list)
   - Send transcriptions to a Discord webhook
   - Discord bot for voice chat transcription
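
The webhook half is nearly trivial. A sketch, assuming a standard Discord webhook (the URL below is a placeholder):

```python
# Illustrative Discord webhook post; the URL is a placeholder.
import requests

WEBHOOK_URL = "https://discord.com/api/webhooks/<id>/<token>"

def post_transcription(user: str, text: str) -> None:
    # Discord webhooks accept a JSON body with a "content" field.
    requests.post(WEBHOOK_URL, json={"content": f"**{user}**: {text}"}, timeout=5)
```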

### Automation

1. **Hotkey Support**
   - Global hotkeys for start/stop
   - Toggle display visibility
   - Quick settings access

2. **Voice Commands**
   - "Hey Transcription, start/stop"
   - Command detection in audio stream
   - Configurable wake words

3. **Auto-Start Options**
   - Start with OBS
   - Start on system boot
   - Auto-detect streaming software

---

## Phase 5: Advanced Features

### AI Enhancements

1. **Summarization**
   - Real-time conversation summaries
   - Key point extraction
   - Topic detection

2. **Sentiment Analysis**
   - Detect tone/emotion
   - Highlight important moments
   - Filter profanity (optional)

3. **Context Awareness**
   - Remember conversation context
   - Better transcription accuracy
   - Adaptive vocabulary

### Analytics & Insights

1. **Usage Statistics**
   - Words per minute
   - Speaking time per user
   - Most common words/phrases
   - Accuracy metrics

2. **Export Options**
   - Export to SRT/VTT for video captions
   - PDF/Word document export
   - CSV for data analysis
   - JSON API for custom tools

3. **Search & Filter**
   - Search transcription history
   - Filter by user, date, keyword
   - Highlight search results

### Accessibility

1. **Screen Reader Support**
   - Full NVDA/JAWS compatibility
   - Keyboard navigation
   - Voice feedback

2. **High Contrast Modes**
   - Enhanced visibility options
   - Color blind friendly palettes

3. **Text-to-Speech**
   - Read back transcriptions
   - Multiple voice options
   - Speed control

---

## Performance Optimizations

### Current Considerations

1. **Model Optimization** (see the quantization example after this list)
   - Quantization (int8, int4)
   - Smaller model variants
   - TensorRT optimization (NVIDIA)
   - ONNX Runtime support

2. **Caching**
   - Cache common phrases
   - Model warm-up on startup
   - Preload frequently used resources

3. **Resource Management**
   - Dynamic batch sizing
   - Memory pooling
   - Thread pool optimization
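
Quantization is already a one-line change with faster-whisper:

```python
# int8 quantization with faster-whisper: lower memory use and faster
# CPU inference, at a small accuracy cost.
from faster_whisper import WhisperModel

model = WhisperModel("base", device="cpu", compute_type="int8")
```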

### Future Optimizations

1. **Distributed Processing**
   - Offload to cloud GPU
   - Share processing across multiple machines
   - Load balancing

2. **Edge Computing**
   - Run on edge devices (Raspberry Pi)
   - Mobile app support
   - Embedded systems

---

## Community Features

### Sharing & Collaboration

1. **Theme Marketplace**
   - Share custom themes
   - Download community themes
   - Rating system

2. **Plugin System**
   - Allow community plugins
   - Custom audio filters
   - Display widgets
   - Integration modules

3. **Documentation**
   - Video tutorials
   - Wiki/knowledge base
   - API documentation
   - Developer guides

### User Support

1. **In-App Help**
   - Contextual help tooltips
   - Getting started wizard
   - Troubleshooting guide

2. **Community Forum**
   - GitHub Discussions
   - Discord server
   - Reddit community

---

## Technical Debt & Maintenance

### Code Quality

1. **Testing**
   - Unit tests for core modules
   - Integration tests
   - End-to-end tests
   - Performance benchmarks

2. **Documentation**
   - API documentation
   - Code comments
   - Architecture diagrams
   - Developer setup guide

3. **CI/CD**
   - Automated builds
   - Automated testing
   - Release automation
   - Cross-platform testing

### Security

1. **Security Audits**
   - Dependency scanning
   - Vulnerability assessment
   - Code security review

2. **Data Privacy**
   - Local-first by default
   - Optional cloud features
   - GDPR compliance (if applicable)
   - Clear privacy policy

---

## Immediate Quick Wins

These are small enhancements that could be implemented quickly:

### Easy (< 1 day)

- [ ] Add application icon
- [ ] Add "About" dialog with version info
- [ ] Add keyboard shortcuts (Ctrl+S for settings, etc.)
- [ ] Add system tray icon
- [ ] Save window position/size
- [ ] Add "Check for Updates" feature
- [ ] Export transcriptions to text file

### Medium (1-3 days)

- [ ] Add profanity filter (optional)
- [ ] Add confidence score display
- [ ] Add audio level meter
- [ ] Multiple language support in UI
- [ ] Dark/light theme toggle
- [ ] Backup/restore settings
- [ ] Recent transcriptions history

### Larger (1+ weeks)

- [ ] Cloud sync for settings
- [ ] Mobile companion app
- [ ] Browser extension
- [ ] API server mode
- [ ] Plugin architecture
- [ ] Advanced audio visualization

---

## Resources & References

### Documentation
- [Faster-Whisper](https://github.com/guillaumekln/faster-whisper)
- [PySide6 Documentation](https://doc.qt.io/qtforpython/)
- [FastAPI Documentation](https://fastapi.tiangolo.com/)
- [PyInstaller Manual](https://pyinstaller.org/en/stable/)

### Similar Projects
- [whisper.cpp](https://github.com/ggerganov/whisper.cpp) - C++ implementation
- [Buzz](https://github.com/chidiwilliams/buzz) - Desktop transcription tool
- [OpenAI Whisper](https://github.com/openai/whisper) - Original implementation

### Community
- Create GitHub Discussions for feature requests
- Set up issue templates
- Contributing guidelines
- Code of conduct

---

## Decision Log

Track major architectural decisions here:

### 2025-12-25: PyInstaller for Distribution
- **Decision**: Use PyInstaller for creating standalone executables
- **Rationale**: Good PySide6 support, active development, cross-platform
- **Alternatives Considered**: cx_Freeze, Nuitka, py2exe
- **Impact**: Users can run the app without a Python installation

### 2025-12-25: CUDA Build Strategy
- **Decision**: Provide CUDA-enabled builds that bundle the CUDA runtime
- **Rationale**: Universal builds work everywhere, automatic GPU detection
- **Trade-off**: Larger file size (~600MB extra) for better UX
- **Impact**: Single build for both GPU and CPU users

### 2025-12-25: Web Server Always Running
- **Decision**: Remove the enable/disable toggle; always run the web server
- **Rationale**: Simplifies UX, no configuration needed for OBS
- **Impact**: Uses one local port (8080 by default), minimal overhead

---

## Contact & Contribution

When this project is public:
- **Issues**: Report bugs and request features on GitHub Issues
- **Pull Requests**: Contributions welcome! See CONTRIBUTING.md
- **Discussions**: Join GitHub Discussions for questions and ideas
- **License**: [To be determined - consider MIT or Apache 2.0]

---

*Last Updated: 2025-12-25*
*Version: 1.0.0 (Phase 1 Complete)*
README.md (new file, 494 lines)
@@ -0,0 +1,494 @@
# Local Transcription for Streamers

A local speech-to-text application designed for streamers that provides real-time transcription using Whisper or similar models. Multiple users can run the application locally and sync their transcriptions to a centralized web stream that can be easily captured in OBS or other streaming software.

## Features

- **Standalone Desktop Application**: Use locally with built-in GUI display - no server required
- **Local Transcription**: Run Whisper (or compatible models) locally on your machine
- **CPU/GPU Support**: Choose between CPU or GPU processing based on your hardware
- **Real-time Processing**: Live audio transcription with minimal latency
- **Noise Suppression**: Built-in audio preprocessing to reduce background noise
- **User Configuration**: Set your display name and preferences through the GUI
- **Optional Multi-user Sync**: Connect to a server to sync transcriptions with other users
- **OBS Integration**: Web-based output designed for easy browser source capture
- **Privacy-First**: All processing happens locally; only transcription text is shared
- **Customizable**: Configure model size, language, and streaming settings

## Quick Start

### Running from Source

```bash
# Install dependencies
uv sync

# Run the application
uv run python main.py
```

### Building Standalone Executables

To create standalone executables for distribution:

**Linux:**
```bash
./build.sh
```

**Windows:**
```cmd
build.bat
```

For detailed build instructions, see [BUILD.md](BUILD.md).

## Architecture Overview

The application can run in two modes:

### Standalone Mode (No Server Required):
1. **Desktop Application**: Captures audio, performs speech-to-text, and displays transcriptions locally in a GUI window

### Multi-user Sync Mode (Optional):
1. **Local Transcription Client**: Captures audio, performs speech-to-text, and sends results to the web server
2. **Centralized Web Server**: Aggregates transcriptions from multiple clients and serves a web stream
3. **Web Stream Interface**: Browser-accessible page displaying synchronized transcriptions (for OBS capture)

## Use Cases

- **Multi-language Streams**: Multiple translators transcribing in different languages
- **Accessibility**: Provide real-time captions for viewers
- **Collaborative Podcasts**: Multiple hosts with separate transcriptions
- **Gaming Commentary**: Track who said what in multiplayer sessions

---

## Implementation Plan

### Phase 1: Standalone Desktop Application

**Objective**: Build a fully functional standalone transcription app with GUI that works without any server

#### Components:
1. **Audio Capture Module** (see the pipeline sketch after this list)
   - Capture system audio or microphone input
   - Support multiple audio sources (virtual audio cables, physical devices)
   - Real-time audio buffering with configurable chunk sizes
   - **Noise Suppression**: Preprocess audio to reduce background noise
   - Libraries: `pyaudio`, `sounddevice`, `noisereduce`, `webrtcvad`

2. **Noise Suppression Engine**
   - Real-time noise reduction using RNNoise or noisereduce
   - Adjustable noise reduction strength
   - Optional VAD (Voice Activity Detection) to skip silent segments
   - Libraries: `noisereduce`, `rnnoise-python`, `webrtcvad`

3. **Transcription Engine**
   - Integrate OpenAI Whisper (or alternatives: faster-whisper, whisper.cpp)
   - Support multiple model sizes (tiny, base, small, medium, large)
   - CPU and GPU inference options
   - Model management and automatic downloading
   - Libraries: `openai-whisper`, `faster-whisper`, `torch`

4. **Device Selection**
   - Auto-detect available compute devices (CPU, CUDA, MPS for Mac)
   - Allow user to specify preferred device via GUI
   - Graceful fallback if GPU unavailable
   - Display device status and performance metrics

5. **Desktop GUI Application**
   - Cross-platform GUI using PyQt6, Tkinter, or CustomTkinter
   - Main transcription display window (scrolling text area)
   - Settings panel for configuration:
     - User name input field
     - Audio input device selector
     - Model size selector
     - CPU/GPU toggle
   - Start/Stop transcription button
   - Optional: System tray integration
   - Libraries: `PyQt6`, `customtkinter`, or `tkinter`

6. **Local Display**
   - Real-time transcription display in GUI window
   - Scrolling text with timestamps
   - User name/label shown with transcriptions
   - Copy transcription to clipboard
   - Optional: Save transcription to file (TXT, SRT, VTT)
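
A condensed sketch of how the first three components fit together, assuming the libraries listed above (sounddevice, webrtcvad, faster-whisper); chunk length and model size are example values:

```python
# Illustrative capture -> VAD -> transcribe pipeline; parameter values
# are examples, not the app's actual configuration.
import numpy as np
import sounddevice as sd
import webrtcvad
from faster_whisper import WhisperModel

SAMPLE_RATE = 16000      # Whisper expects 16 kHz mono audio
CHUNK_SECONDS = 3
vad = webrtcvad.Vad(2)   # aggressiveness 0-3
model = WhisperModel("base", device="auto")

def has_speech(chunk: np.ndarray) -> bool:
    """Run 30 ms frames through WebRTC VAD; True if any frame has speech."""
    pcm = (chunk * 32767).astype(np.int16).tobytes()
    frame = int(SAMPLE_RATE * 0.03) * 2  # bytes per 30 ms of int16 audio
    return any(
        vad.is_speech(pcm[i:i + frame], SAMPLE_RATE)
        for i in range(0, len(pcm) - frame, frame)
    )

while True:
    audio = sd.rec(SAMPLE_RATE * CHUNK_SECONDS, samplerate=SAMPLE_RATE,
                   channels=1, dtype="float32")
    sd.wait()
    chunk = audio.flatten()
    if not has_speech(chunk):
        continue  # VAD: skip silent chunks entirely
    segments, _ = model.transcribe(chunk, language="en")
    for seg in segments:
        print(f"[{seg.start:.1f}s] {seg.text.strip()}")
```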

#### Tasks:
- [ ] Set up project structure and dependencies
- [ ] Implement audio capture with device selection
- [ ] Add noise suppression and VAD preprocessing
- [ ] Integrate Whisper model loading and inference
- [ ] Add CPU/GPU device detection and selection logic
- [ ] Create real-time audio buffer processing pipeline
- [ ] Design and implement GUI layout (main window)
- [ ] Add settings panel with user name configuration
- [ ] Implement local transcription display area
- [ ] Add start/stop controls and status indicators
- [ ] Test transcription accuracy and latency
- [ ] Test noise suppression effectiveness

---

### Phase 2: Web Server and Sync System

**Objective**: Create a centralized server to aggregate and serve transcriptions

#### Components:
1. **Web Server**
   - FastAPI or Flask-based REST API
   - WebSocket support for real-time updates
   - User/client registration and management
   - Libraries: `fastapi`, `uvicorn`, `websockets`

2. **Transcription Aggregator**
   - Receive transcription chunks from multiple clients
   - Associate transcriptions with user IDs/names
   - Timestamp management and synchronization
   - Buffer management for smooth streaming

3. **Database/Storage** (Optional)
   - Store transcription history (SQLite for simplicity)
   - Session management
   - Export functionality (SRT, VTT, TXT formats)

#### API Endpoints (a skeleton sketch follows this list):
- `POST /api/register` - Register a new client
- `POST /api/transcription` - Submit transcription chunk
- `WS /api/stream` - WebSocket for real-time transcription stream
- `GET /stream` - Web page for OBS browser source
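
A skeleton of those four endpoints in FastAPI; the request/response models are assumptions, not the project's actual schema:

```python
# Skeleton of the four endpoints above; models are illustrative.
import uuid
from fastapi import FastAPI, WebSocket
from fastapi.responses import HTMLResponse
from pydantic import BaseModel

app = FastAPI()

class Registration(BaseModel):
    display_name: str

class Chunk(BaseModel):
    client_id: str
    text: str
    timestamp: float

@app.post("/api/register")
async def register(reg: Registration):
    return {"client_id": str(uuid.uuid4()), "display_name": reg.display_name}

@app.post("/api/transcription")
async def submit(chunk: Chunk):
    # Aggregation/broadcast logic would go here.
    return {"accepted": True}

@app.websocket("/api/stream")
async def stream(ws: WebSocket):
    await ws.accept()
    # Aggregated transcriptions would be pushed to this client here,
    # e.g. await ws.send_json(chunk) as chunks arrive.

@app.get("/stream", response_class=HTMLResponse)
async def page():
    return "<html><body><!-- OBS browser source page --></body></html>"
```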

#### Tasks:
- [ ] Set up FastAPI server with CORS support
- [ ] Implement WebSocket handler for real-time streaming
- [ ] Create client registration system
- [ ] Build transcription aggregation logic
- [ ] Add timestamp synchronization
- [ ] Create data models for clients and transcriptions

---

### Phase 3: Client-Server Communication (Optional Multi-user Mode)

**Objective**: Add optional server connectivity to enable multi-user transcription sync

#### Components:
1. **HTTP/WebSocket Client**
   - Register client with server on startup
   - Send transcription chunks as they're generated
   - Handle connection drops and reconnection
   - Libraries: `requests`, `websockets`

2. **Configuration System**
   - Config file for server URL, API keys, user settings
   - Model preferences (size, language)
   - Audio input settings
   - Format: YAML or JSON

3. **Status Monitoring**
   - Connection status indicator
   - Transcription queue health
   - Error handling and logging

#### Tasks:
- [ ] Add "Enable Server Sync" toggle to GUI
- [ ] Add server URL configuration field in settings
- [ ] Implement WebSocket client for sending transcriptions
- [ ] Add configuration file support (YAML/JSON)
- [ ] Create connection management with auto-reconnect
- [ ] Add local logging and error handling
- [ ] Add server connection status indicator to GUI
- [ ] Allow app to function normally if server is unavailable

---

### Phase 4: Web Stream Interface (OBS Integration)

**Objective**: Create a web page that displays synchronized transcriptions for OBS

#### Components:
1. **Web Frontend**
   - HTML/CSS/JavaScript page for displaying transcriptions
   - Responsive design with customizable styling
   - Auto-scroll with configurable retention window
   - Libraries: Vanilla JS or a lightweight framework (Alpine.js, htmx)

2. **Styling Options**
   - Customizable fonts, colors, sizes
   - Background transparency for OBS chroma key
   - User name/ID display options
   - Timestamp display (optional)

3. **Display Modes**
   - Scrolling captions (like live TV captions)
   - Multi-user panel view (separate sections per user)
   - Overlay mode (minimal UI for transparency)

#### Tasks:
- [ ] Create HTML template for transcription display
- [ ] Implement WebSocket client in JavaScript
- [ ] Add CSS styling with OBS-friendly transparency
- [ ] Create customization controls (URL parameters or UI)
- [ ] Test with OBS browser source
- [ ] Add configurable retention/scroll behavior

---

### Phase 5: Advanced Features

**Objective**: Enhance functionality and user experience

#### Features:
1. **Language Detection**
   - Auto-detect spoken language
   - Multi-language support in a single stream
   - Language selector in GUI

2. **Speaker Diarization** (Optional)
   - Identify different speakers
   - Label transcriptions by speaker
   - Useful for multi-host streams

3. **Profanity Filtering**
   - Optional word filtering/replacement
   - Customizable filter lists
   - Toggle in GUI settings

4. **Advanced Noise Profiles**
   - Save and load custom noise profiles
   - Adaptive noise suppression
   - Different profiles for different environments

5. **Export Functionality**
   - Save transcriptions in multiple formats (TXT, SRT, VTT, JSON)
   - Export button in GUI
   - Automatic session saving

6. **Hotkey Support**
   - Global hotkeys to start/stop transcription
   - Mute/unmute hotkey
   - Quick save hotkey

7. **Docker Support**
   - Containerized server deployment
   - Docker Compose for easy multi-component setup
   - Pre-built images for easy deployment

8. **Themes and Customization**
   - Dark/light theme toggle
   - Customizable font sizes and colors for display
   - OBS-friendly transparent overlay mode

#### Tasks:
- [ ] Add language detection and multi-language support
- [ ] Implement speaker diarization
- [ ] Create optional profanity filter
- [ ] Add export functionality (SRT, VTT, plain text, JSON)
- [ ] Implement global hotkey support
- [ ] Create Docker containers for server component
- [ ] Add theme customization options
- [ ] Create advanced noise profile management

---

## Technology Stack

### Local Client:
- **Python 3.9+**
- **GUI**: PyQt6 / CustomTkinter / tkinter
- **Audio**: PyAudio / sounddevice
- **Noise Suppression**: noisereduce / rnnoise-python
- **VAD**: webrtcvad
- **ML Framework**: PyTorch (for Whisper)
- **Transcription**: openai-whisper / faster-whisper
- **Networking**: websockets, requests (optional, for server sync)
- **Config**: PyYAML / json

### Server:
- **Backend**: FastAPI / Flask
- **WebSocket**: python-websockets / FastAPI WebSockets
- **Server**: Uvicorn / Gunicorn
- **Database** (optional): SQLite / PostgreSQL
- **CORS**: fastapi-cors

### Web Interface:
- **Frontend**: HTML5, CSS3, JavaScript (ES6+)
- **Real-time**: WebSocket API
- **Styling**: CSS Grid/Flexbox for layout

---

## Project Structure

```
local-transcription/