# Local Transcription for Streamers
A local speech-to-text application designed for streamers that provides real-time transcription using Whisper or similar models. Multiple users can run the application locally and sync their transcriptions to a centralized web stream that can be easily captured in OBS or other streaming software.
## Features
- **Standalone Desktop Application**: Use locally with built-in GUI display - no server required
- **Local Transcription**: Run Whisper (or compatible models) locally on your machine
- **CPU/GPU Support**: Choose between CPU or GPU processing based on your hardware
- **Real-time Processing**: Live audio transcription with minimal latency
- **Noise Suppression**: Built-in audio preprocessing to reduce background noise
- **User Configuration**: Set your display name and preferences through the GUI
- **Optional Multi-user Sync**: Connect to a server to sync transcriptions with other users
- **OBS Integration**: Web-based output designed for easy browser source capture
- **Privacy-First**: All processing happens locally; only transcription text is shared
- **Customizable**: Configure model size, language, and streaming settings
## Quick Start
### Running from Source
```bash
# Install dependencies
uv sync
# Run the application
uv run python main.py
```
### Building Standalone Executables
To create standalone executables for distribution:
**Linux:**
```bash
./build.sh
```
**Windows:**
```cmd
build.bat
```
For detailed build instructions, see [BUILD.md](BUILD.md).
## Architecture Overview
The application can run in two modes:
### Standalone Mode (No Server Required):
1. **Desktop Application**: Captures audio, performs speech-to-text, and displays transcriptions locally in a GUI window
### Multi-user Sync Mode (Optional):
1. **Local Transcription Client**: Captures audio, performs speech-to-text, and sends results to the web server
2. **Centralized Web Server**: Aggregates transcriptions from multiple clients and serves a web stream
3. **Web Stream Interface**: Browser-accessible page displaying synchronized transcriptions (for OBS capture)
## Use Cases
- **Multi-language Streams**: Multiple translators transcribing in different languages
- **Accessibility**: Provide real-time captions for viewers
- **Collaborative Podcasts**: Multiple hosts with separate transcriptions
- **Gaming Commentary**: Track who said what in multiplayer sessions
---
## Implementation Plan
### Phase 1: Standalone Desktop Application
**Objective**: Build a fully functional standalone transcription app with GUI that works without any server
#### Components:
1. **Audio Capture Module**
- Capture system audio or microphone input
- Support multiple audio sources (virtual audio cables, physical devices)
- Real-time audio buffering with configurable chunk sizes
- **Noise Suppression**: Preprocess audio to reduce background noise
- Libraries: `pyaudio`, `sounddevice`, `noisereduce`, `webrtcvad`
- A minimal capture-to-transcription pipeline is sketched after this component list
2. **Noise Suppression Engine**
- Real-time noise reduction using RNNoise or noisereduce
- Adjustable noise reduction strength
- Optional VAD (Voice Activity Detection) to skip silent segments
- Libraries: `noisereduce`, `rnnoise-python`, `webrtcvad`
3. **Transcription Engine**
- Integrate OpenAI Whisper (or alternatives: faster-whisper, whisper.cpp)
- Support multiple model sizes (tiny, base, small, medium, large)
- CPU and GPU inference options
- Model management and automatic downloading
- Libraries: `openai-whisper`, `faster-whisper`, `torch`
4. **Device Selection**
- Auto-detect available compute devices (CPU, CUDA, MPS for Mac)
- Allow user to specify preferred device via GUI
- Graceful fallback if GPU unavailable
- Display device status and performance metrics
5. **Desktop GUI Application**
- Cross-platform GUI using PyQt6, Tkinter, or CustomTkinter
- Main transcription display window (scrolling text area)
- Settings panel for configuration
- User name input field
- Audio input device selector
- Model size selector
- CPU/GPU toggle
- Start/Stop transcription button
- Optional: System tray integration
- Libraries: `PyQt6`, `customtkinter`, or `tkinter`
6. **Local Display**
- Real-time transcription display in GUI window
- Scrolling text with timestamps
- User name/label shown with transcriptions
- Copy transcription to clipboard
- Optional: Save transcription to file (TXT, SRT, VTT)
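To make these components concrete, below is a minimal sketch of the capture → VAD → transcription loop using `sounddevice`, `webrtcvad`, and `faster-whisper` (all listed above). It is illustrative only: buffering strategy, noise suppression, the GUI, and error handling are omitted, and the constants simply mirror the planned configuration defaults.
```python
# pipeline_sketch.py - minimal capture -> VAD -> transcription loop (Phase 1 sketch).
# Assumes: pip install numpy sounddevice webrtcvad faster-whisper torch
import numpy as np
import sounddevice as sd
import torch
import webrtcvad
from faster_whisper import WhisperModel

SAMPLE_RATE = 16000      # Whisper expects 16 kHz mono audio
FRAME_MS = 30            # webrtcvad accepts only 10/20/30 ms frames
FRAME_SAMPLES = SAMPLE_RATE * FRAME_MS // 1000
CHUNK_SECONDS = 2.0      # mirrors chunk_duration in the planned config

# faster-whisper runs on CTranslate2, which supports CPU and CUDA (not MPS);
# the MPS path would apply to the stock openai-whisper backend instead.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = WhisperModel("base", device=device,
                     compute_type="float16" if device == "cuda" else "int8")
vad = webrtcvad.Vad(2)   # aggressiveness 0 (least) .. 3 (most)

def transcribe_loop() -> None:
    frames_per_chunk = int(CHUNK_SECONDS * 1000 / FRAME_MS)
    with sd.InputStream(samplerate=SAMPLE_RATE, channels=1,
                        dtype="int16", blocksize=FRAME_SAMPLES) as stream:
        while True:
            voiced = []
            for _ in range(frames_per_chunk):
                frame, _overflowed = stream.read(FRAME_SAMPLES)
                samples = frame[:, 0]                  # mono int16 samples
                if vad.is_speech(samples.tobytes(), SAMPLE_RATE):
                    voiced.append(samples)
            if not voiced:
                continue  # the whole chunk was silence; skip inference
            # Convert int16 PCM to the float32 [-1, 1] array Whisper expects
            audio = np.concatenate(voiced).astype(np.float32) / 32768.0
            segments, _info = model.transcribe(audio, language="en")
            for seg in segments:
                print(f"[{seg.start:.1f}s] {seg.text.strip()}")

if __name__ == "__main__":
    transcribe_loop()
```
Fixed two-second chunks keep latency low but can cut words at chunk boundaries; the real pipeline will likely need overlapping windows or VAD-driven segmentation.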
#### Tasks:
- [ ] Set up project structure and dependencies
- [ ] Implement audio capture with device selection
- [ ] Add noise suppression and VAD preprocessing
- [ ] Integrate Whisper model loading and inference
- [ ] Add CPU/GPU device detection and selection logic
- [ ] Create real-time audio buffer processing pipeline
- [ ] Design and implement GUI layout (main window)
- [ ] Add settings panel with user name configuration
- [ ] Implement local transcription display area
- [ ] Add start/stop controls and status indicators
- [ ] Test transcription accuracy and latency
- [ ] Test noise suppression effectiveness
---
### Phase 2: Web Server and Sync System
**Objective**: Create a centralized server to aggregate and serve transcriptions
#### Components:
1. **Web Server**
- FastAPI or Flask-based REST API
- WebSocket support for real-time updates
- User/client registration and management
- Libraries: `fastapi`, `uvicorn`, `websockets`
2. **Transcription Aggregator**
- Receive transcription chunks from multiple clients
- Associate transcriptions with user IDs/names
- Timestamp management and synchronization
- Buffer management for smooth streaming
3. **Database/Storage** (Optional)
- Store transcription history (SQLite for simplicity)
- Session management
- Export functionality (SRT, VTT, TXT formats)
#### API Endpoints:
- `POST /api/register` - Register a new client
- `POST /api/transcription` - Submit transcription chunk
- `WS /api/stream` - WebSocket for real-time transcription stream
- `GET /stream` - Web page for OBS browser source
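A minimal FastAPI sketch of these endpoints follows; the payload shapes and in-memory storage are assumptions, not a final design.
```python
# server_sketch.py - hedged sketch of the Phase 2 endpoints.
# Run with: uvicorn server_sketch:app --host 0.0.0.0 --port 8000
import uuid
from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from fastapi.responses import HTMLResponse
from pydantic import BaseModel

app = FastAPI()
clients: dict[str, str] = {}    # client_id -> display name
viewers: list[WebSocket] = []   # sockets subscribed to /api/stream

class Registration(BaseModel):
    name: str

class Chunk(BaseModel):
    client_id: str
    text: str
    timestamp: float

@app.post("/api/register")
async def register(reg: Registration):
    client_id = str(uuid.uuid4())
    clients[client_id] = reg.name
    return {"client_id": client_id}

@app.post("/api/transcription")
async def transcription(chunk: Chunk):
    message = {"user": clients.get(chunk.client_id, "unknown"),
               "text": chunk.text, "timestamp": chunk.timestamp}
    for ws in list(viewers):          # fan out to every connected viewer
        try:
            await ws.send_json(message)
        except Exception:
            viewers.remove(ws)        # drop viewers that went away
    return {"ok": True}

@app.websocket("/api/stream")
async def stream(ws: WebSocket):
    await ws.accept()
    viewers.append(ws)
    try:
        while True:
            await ws.receive_text()   # keep the socket open; ignore input
    except WebSocketDisconnect:
        viewers.remove(ws)

@app.get("/stream", response_class=HTMLResponse)
async def stream_page():
    return "<html><!-- OBS browser-source page (Phase 4) goes here --></html>"
```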
#### Tasks:
- [ ] Set up FastAPI server with CORS support
- [ ] Implement WebSocket handler for real-time streaming
- [ ] Create client registration system
- [ ] Build transcription aggregation logic
- [ ] Add timestamp synchronization
- [ ] Create data models for clients and transcriptions
---
### Phase 3: Client-Server Communication (Optional Multi-user Mode)
**Objective**: Add optional server connectivity to enable multi-user transcription sync
#### Components:
1. **HTTP/WebSocket Client**
- Register client with server on startup
- Send transcription chunks as they're generated
- Handle connection drops and reconnection (a reconnecting sender is sketched after this list)
- Libraries: `requests`, `websockets`
2. **Configuration System**
- Config file for server URL, API keys, user settings
- Model preferences (size, language)
- Audio input settings
- Format: YAML or JSON
3. **Status Monitoring**
- Connection status indicator
- Transcription queue health
- Error handling and logging
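A reconnecting sender can stay very small, as in the sketch below; the endpoint path and message shape follow the Phase 2 sketch and are likewise assumptions.
```python
# sync_client_sketch.py - optional server-sync sender with auto-reconnect.
# Assumes: pip install websockets
import asyncio
import json
import websockets

async def sync_transcriptions(queue: asyncio.Queue,
                              url: str = "ws://localhost:8000/api/stream") -> None:
    """Drain transcription chunks from `queue` and push them to the server,
    retrying with a fixed backoff whenever the connection drops."""
    while True:
        try:
            async with websockets.connect(url) as ws:
                while True:
                    chunk = await queue.get()   # e.g. {"user": ..., "text": ..., "timestamp": ...}
                    await ws.send(json.dumps(chunk))
        except (OSError, websockets.ConnectionClosed):
            # Server unreachable or dropped; losing the in-flight chunk is
            # acceptable for a sketch. Transcription continues locally.
            await asyncio.sleep(3)
```
Because the queue decouples transcription from networking, the GUI and transcription loop keep running even while the server is unreachable, which is the behavior the Phase 3 task list requires.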
#### Tasks:
- [ ] Add "Enable Server Sync" toggle to GUI
- [ ] Add server URL configuration field in settings
- [ ] Implement WebSocket client for sending transcriptions
- [ ] Add configuration file support (YAML/JSON)
- [ ] Create connection management with auto-reconnect
- [ ] Add local logging and error handling
- [ ] Add server connection status indicator to GUI
- [ ] Allow app to function normally if server is unavailable
---
### Phase 4: Web Stream Interface (OBS Integration)
**Objective**: Create a web page that displays synchronized transcriptions for OBS
#### Components:
1. **Web Frontend**
- HTML/CSS/JavaScript page for displaying transcriptions
- Responsive design with customizable styling
- Auto-scroll with configurable retention window
- Libraries: Vanilla JS or lightweight framework (Alpine.js, htmx)
2. **Styling Options**
- Customizable fonts, colors, sizes
- Background transparency for OBS chroma key
- User name/ID display options
- Timestamp display (optional; an example URL follows this list)
3. **Display Modes**
- Scrolling captions (like live TV captions)
- Multi-user panel view (separate sections per user)
- Overlay mode (minimal UI for transparency)
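As one possible customization scheme (these parameter names are hypothetical; nothing is implemented yet), styling could be driven by query parameters on the browser-source URL:
```
http://<server-ip>:8000/stream?mode=scroll&font_size=32&bg=transparent&show_names=1&timestamps=0
```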
#### Tasks:
- [ ] Create HTML template for transcription display
- [ ] Implement WebSocket client in JavaScript
- [ ] Add CSS styling with OBS-friendly transparency
- [ ] Create customization controls (URL parameters or UI)
- [ ] Test with OBS browser source
- [ ] Add configurable retention/scroll behavior
---
### Phase 5: Advanced Features
**Objective**: Enhance functionality and user experience
#### Features:
1. **Language Detection**
- Auto-detect spoken language
- Multi-language support in single stream
- Language selector in GUI
2. **Speaker Diarization** (Optional)
- Identify different speakers
- Label transcriptions by speaker
- Useful for multi-host streams
3. **Profanity Filtering**
- Optional word filtering/replacement
- Customizable filter lists
- Toggle in GUI settings
4. **Advanced Noise Profiles**
- Save and load custom noise profiles
- Adaptive noise suppression
- Different profiles for different environments
5. **Export Functionality**
- Save transcriptions in multiple formats (TXT, SRT, VTT, JSON)
- Export button in GUI
- Automatic session saving (an SRT writer is sketched after this list)
6. **Hotkey Support**
- Global hotkeys to start/stop transcription
- Mute/unmute hotkey
- Quick save hotkey
7. **Docker Support**
- Containerized server deployment
- Docker Compose for easy multi-component setup
- Pre-built images for easy deployment
8. **Themes and Customization**
- Dark/light theme toggle
- Customizable font sizes and colors for display
- OBS-friendly transparent overlay mode
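As an illustration of the SRT target format (the record shape here is an assumption), a minimal writer could look like this:
```python
# srt_export_sketch.py - minimal SRT writer for the export feature.
def to_srt_time(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def write_srt(entries, path: str) -> None:
    """entries: iterable of (start_sec, end_sec, text) tuples."""
    with open(path, "w", encoding="utf-8") as f:
        for i, (start, end, text) in enumerate(entries, 1):
            f.write(f"{i}\n{to_srt_time(start)} --> {to_srt_time(end)}\n{text}\n\n")

write_srt([(0.0, 2.5, "Hello stream!"), (2.5, 5.0, "Welcome back.")], "session.srt")
```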
#### Tasks:
- [ ] Add language detection and multi-language support
- [ ] Implement speaker diarization
- [ ] Create optional profanity filter
- [ ] Add export functionality (SRT, VTT, plain text, JSON)
- [ ] Implement global hotkey support
- [ ] Create Docker containers for server component
- [ ] Add theme customization options
- [ ] Create advanced noise profile management
---
## Technology Stack
### Local Client:
- **Python 3.9+**
- **GUI**: PyQt6 / CustomTkinter / tkinter
- **Audio**: PyAudio / sounddevice
- **Noise Suppression**: noisereduce / rnnoise-python
- **VAD**: webrtcvad
- **ML Framework**: PyTorch (for Whisper)
- **Transcription**: openai-whisper / faster-whisper
- **Networking**: websockets, requests (optional for server sync)
- **Config**: PyYAML / json
### Server:
- **Backend**: FastAPI / Flask
- **WebSocket**: python-websockets / FastAPI WebSockets
- **Server**: Uvicorn / Gunicorn
- **Database** (optional): SQLite / PostgreSQL
- **CORS**: FastAPI's built-in `CORSMiddleware`
### Web Interface:
- **Frontend**: HTML5, CSS3, JavaScript (ES6+)
- **Real-time**: WebSocket API
- **Styling**: CSS Grid/Flexbox for layout
---
## Project Structure
```
local-transcription/
├── client/                        # Local transcription client
│   ├── __init__.py
│   ├── audio_capture.py           # Audio input handling
│   ├── transcription_engine.py    # Whisper integration
│   ├── network_client.py          # Server communication
│   ├── config.py                  # Configuration management
│   └── main.py                    # Client entry point
├── server/                        # Centralized web server
│   ├── __init__.py
│   ├── api.py                     # FastAPI routes
│   ├── websocket_handler.py       # WebSocket management
│   ├── models.py                  # Data models
│   ├── database.py                # Optional DB layer
│   └── main.py                    # Server entry point
├── web/                           # Web stream interface
│   ├── index.html                 # OBS browser source page
│   ├── styles.css                 # Customizable styling
│   └── app.js                     # WebSocket client & UI logic
├── config/
│   ├── client_config.example.yaml
│   └── server_config.example.yaml
├── tests/
│   ├── test_audio.py
│   ├── test_transcription.py
│   └── test_server.py
├── requirements.txt               # Python dependencies
├── README.md
└── main.py                        # Combined launcher (optional)
```
---
## Installation (Planned)
### Prerequisites:
- Python 3.9 or higher
- CUDA-capable GPU (optional, for GPU acceleration)
- FFmpeg (required by Whisper)
### Steps:
1. **Clone the repository**
```bash
git clone <repository-url>
cd local-transcription
```
2. **Install dependencies**
```bash
pip install -r requirements.txt
```
3. **Download Whisper models**
```bash
# Models will be auto-downloaded on first run
# Or manually download:
python -c "import whisper; whisper.load_model('base')"
```
4. **Configure client**
```bash
cp config/client_config.example.yaml config/client_config.yaml
# Edit config/client_config.yaml with your settings
```
5. **Run the server** (one instance)
```bash
python server/main.py
```
6. **Run the client** (on each user's machine)
```bash
python client/main.py
```
7. **Add to OBS**
- Add a Browser Source
- URL: `http://<server-ip>:8000/stream`
- Set width/height as needed
- Check "Shutdown source when not visible" for performance
---
## Configuration (Planned)
### Client Configuration:
```yaml
user:
  name: "Streamer1"              # Display name for transcriptions
  id: "unique-user-id"           # Optional unique identifier

audio:
  input_device: "default"        # or specific device index
  sample_rate: 16000
  chunk_duration: 2.0            # seconds

noise_suppression:
  enabled: true                  # Enable/disable noise reduction
  strength: 0.7                  # 0.0 to 1.0 - reduction strength
  method: "noisereduce"          # "noisereduce" or "rnnoise"

transcription:
  model: "base"                  # tiny, base, small, medium, large
  device: "cuda"                 # cpu, cuda, mps
  language: "en"                 # or "auto" for detection
  task: "transcribe"             # or "translate"

processing:
  use_vad: true                  # Voice Activity Detection
  min_confidence: 0.5            # Minimum transcription confidence

server_sync:
  enabled: false                 # Enable multi-user server sync
  url: "ws://localhost:8000"     # Server URL (when enabled)
  api_key: ""                    # Optional API key

display:
  show_timestamps: true          # Show timestamps in local display
  max_lines: 100                 # Maximum lines to keep in display
  font_size: 12                  # GUI font size
```
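Reading this file with PyYAML (as listed in the tech stack) is straightforward; a minimal sketch:
```python
# Load the client config; assumes PyYAML and the layout shown above.
import yaml

with open("config/client_config.yaml", encoding="utf-8") as f:
    config = yaml.safe_load(f)

model_size = config["transcription"]["model"]      # e.g. "base"
noise_on = config["noise_suppression"]["enabled"]  # True / False
```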
### Server Configuration:
```yaml
server:
  host: "0.0.0.0"
  port: 8000
  api_key_required: false

stream:
  max_clients: 10
  buffer_size: 100        # messages to buffer
  retention_time: 300     # seconds

database:
  enabled: false
  path: "transcriptions.db"
```
---
## Roadmap
- [x] Project planning and architecture design
- [ ] Phase 1: Standalone desktop application with GUI
- [ ] Phase 2: Web server and sync system (optional multi-user mode)
- [ ] Phase 3: Client-server communication (optional)
- [ ] Phase 4: Web stream interface for OBS (optional)
- [ ] Phase 5: Advanced features (hotkeys, themes, Docker, etc.)
---
## Contributing
Contributions are welcome! Please feel free to submit issues or pull requests.
---
## License
[Choose appropriate license - MIT, Apache 2.0, etc.]
---
## Acknowledgments
- OpenAI Whisper for the excellent speech recognition model
- The streaming community for inspiration and use cases