Initial commit: Local Transcription App v1.0
Phase 1 Complete - Standalone Desktop Application

Features:
- Real-time speech-to-text with Whisper (faster-whisper)
- PySide6 desktop GUI with settings dialog
- Web server for OBS browser source integration
- Audio capture with automatic sample rate detection and resampling
- Noise suppression with Voice Activity Detection (VAD)
- Configurable display settings (font, timestamps, fade duration)
- Settings apply without restart (with automatic model reloading)
- Auto-fade for web display transcriptions
- CPU/GPU support with automatic device detection
- Standalone executable builds (PyInstaller)
- CUDA build support (works on systems without CUDA hardware)

Components:
- Audio capture with sounddevice
- Noise reduction with noisereduce + webrtcvad
- Transcription with faster-whisper
- GUI with PySide6
- Web server with FastAPI + WebSocket
- Configuration system with YAML

Build System:
- Standard builds (CPU-only): build.sh / build.bat
- CUDA builds (universal): build-cuda.sh / build-cuda.bat
- Comprehensive BUILD.md documentation
- Cross-platform support (Linux, Windows)

Documentation:
- README.md with project overview and quick start
- BUILD.md with detailed build instructions
- NEXT_STEPS.md with future enhancement roadmap
- INSTALL.md with setup instructions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
56 .gitignore vendored Normal file
@@ -0,0 +1,56 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# Virtual environments
venv/
env/
ENV/
.venv/
.venv

# uv
uv.lock
.python-version

# IDE
.vscode/
.idea/
*.swp
*.swo
*~

# OS
.DS_Store
Thumbs.db

# Application specific
*.log
config/*.yaml
!config/default_config.yaml
.local-transcription/

# Model cache
models/
.cache/

# PyInstaller
*.spec.lock
259 BUILD.md Normal file
@@ -0,0 +1,259 @@
# Building Local Transcription

This guide explains how to build standalone executables for Linux and Windows.

## Prerequisites

1. **Python 3.8+** installed on your system
2. **uv** package manager (install from https://docs.astral.sh/uv/)
3. All project dependencies installed (`uv sync`)

## Building for Linux

### Standard Build (CPU-only):

```bash
# Make the build script executable (first time only)
chmod +x build.sh

# Run the build script
./build.sh
```

### CUDA Build (GPU Support):

Build with CUDA support even without NVIDIA hardware:

```bash
# Make the build script executable (first time only)
chmod +x build-cuda.sh

# Run the CUDA build script
./build-cuda.sh
```

This will:
- Install PyTorch with CUDA 12.1 support
- Bundle CUDA runtime libraries (~600MB extra)
- Create an executable that works on both GPU and CPU systems
- Automatically fall back to CPU if no CUDA GPU is available

The executable will be created in `dist/LocalTranscription/LocalTranscription`
### Manual build:
```bash
# Clean previous builds
rm -rf build dist

# Build with PyInstaller
uv run pyinstaller local-transcription.spec
```

### Distribution:
```bash
cd dist
tar -czf LocalTranscription-Linux.tar.gz LocalTranscription/
```

## Building for Windows

### Standard Build (CPU-only):

```cmd
REM Run the build script
build.bat
```

### CUDA Build (GPU Support):

Build with CUDA support even without NVIDIA hardware:

```cmd
REM Run the CUDA build script
build-cuda.bat
```

This will:
- Install PyTorch with CUDA 12.1 support
- Bundle CUDA runtime libraries (~600MB extra)
- Create an executable that works on both GPU and CPU systems
- Automatically fall back to CPU if no CUDA GPU is available

The executable will be created in `dist\LocalTranscription\LocalTranscription.exe`

### Manual build:
```cmd
REM Clean previous builds
rmdir /s /q build
rmdir /s /q dist

REM Build with PyInstaller
uv run pyinstaller local-transcription.spec
```

### Distribution:
- Compress the `dist\LocalTranscription` folder to a ZIP file
- Or use an installer creator like NSIS or Inno Setup

## Important Notes

### Cross-Platform Building

**You cannot cross-compile!**
- Linux executables must be built on Linux
- Windows executables must be built on Windows
- Mac executables must be built on macOS

### First Run

On the first run, the application will:
1. Create a config directory at `~/.local-transcription/` (Linux) or `%USERPROFILE%\.local-transcription\` (Windows)
2. Download the Whisper model (if not already present)
3. Cache the model in `~/.cache/huggingface/` by default

### Executable Size

The built executable will be large (300MB - 2GB+) because it includes:
- Python runtime
- PySide6 (Qt framework)
- PyTorch/faster-whisper
- NumPy, SciPy, and other dependencies

### Console Window

By default, the console window is visible (for debugging). To hide it:

1. Edit `local-transcription.spec`
2. Change `console=True` to `console=False` in the `EXE` section
3. Rebuild

### GPU Support

#### Building with CUDA (Recommended for Distribution)

**Yes, you CAN build with CUDA support on systems without NVIDIA GPUs!**

PyTorch provides CUDA-enabled builds that bundle the CUDA runtime libraries. This means:

1. **You don't need NVIDIA hardware** to create CUDA-enabled builds
2. **The executable will work everywhere** - on systems with or without NVIDIA GPUs
3. **Automatic fallback** - the app detects available hardware and uses GPU if available, CPU otherwise
4. **Larger file size** - adds ~600MB-1GB to the executable size

**How it works:**
```bash
# Linux
./build-cuda.sh

# Windows
build-cuda.bat
```

The build script will:
- Install PyTorch with bundled CUDA 12.1 runtime
- Package all CUDA libraries into the executable
- Create a universal build that runs on any system

**When users run the executable:**
- If they have an NVIDIA GPU with drivers: uses GPU acceleration
- If they don't have an NVIDIA GPU: automatically uses CPU
- No configuration needed - it just works!
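The automatic fallback described above can be sketched in a few lines (the function name is illustrative, not existing project code; a real check would go through `torch.cuda.is_available()` exactly as shown):

```python
def detect_device() -> str:
    """Pick 'cuda' when a usable NVIDIA GPU is present, else fall back to 'cpu'.

    torch is treated as optional here so the sketch also runs on
    CPU-only systems where PyTorch is not installed.
    """
    try:
        import torch  # bundled (with CUDA runtime) in the CUDA build
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass
    return "cpu"
```

Calling this at startup is enough to make a single build serve both GPU and CPU users.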
#### Alternative: CPU-Only Builds

If you only want CPU support (smaller file size):
```bash
# Linux
./build.sh

# Windows
build.bat
```

#### AMD GPU Support

- **ROCm**: Requires special PyTorch builds from AMD
- Not recommended for general distribution
- Better to use the CUDA build (works on all systems) or the CPU build

### Optimizations

To reduce size:

1. **Remove unused model sizes**: The app downloads models on demand, so you don't need to bundle them
2. **Use UPX compression**: Already enabled in the spec file
3. **Exclude dev dependencies**: Only build dependencies are needed

## Testing the Build

After building, test the executable:

### Linux:
```bash
cd dist/LocalTranscription
./LocalTranscription
```

### Windows:
```cmd
cd dist\LocalTranscription
LocalTranscription.exe
```

## Troubleshooting

### Missing modules error
If you get "No module named X" errors, add the module to the `hiddenimports` list in `local-transcription.spec`.
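As an illustration, the relevant portion of a spec file might look like this (the module names listed are assumptions for illustration; add whichever module PyInstaller failed to find):

```python
# Excerpt from local-transcription.spec (module names illustrative)
a = Analysis(
    ['main.py'],
    hiddenimports=[
        'sounddevice',
        'webrtcvad',
        'noisereduce',
    ],
)
```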
### DLL errors (Windows)
Make sure the Visual C++ Redistributable is installed on the target system:
https://aka.ms/vs/17/release/vc_redist.x64.exe

### Audio device errors
The application needs access to audio devices. Ensure:
- Microphone permissions are granted
- Audio drivers are installed
- PulseAudio (Linux) or Windows Audio is running

### Model download fails
Ensure an internet connection on first run. Models are downloaded from:
https://huggingface.co/guillaumekln/faster-whisper-base

## Advanced: Adding an Icon

1. Create or obtain an `.ico` file (Windows) or `.png` file (Linux)
2. Edit `local-transcription.spec`
3. Change `icon=None` to `icon='path/to/your/icon.ico'`
4. Rebuild

## Advanced: Creating an Installer

### Windows (using Inno Setup):

1. Install Inno Setup: https://jrsoftware.org/isinfo.php
2. Create an `.iss` script file
3. Build the installer

### Linux (using AppImage):

```bash
# Install appimagetool
wget https://github.com/AppImage/AppImageKit/releases/download/continuous/appimagetool-x86_64.AppImage
chmod +x appimagetool-x86_64.AppImage

# Create AppDir structure
mkdir -p LocalTranscription.AppDir/usr/bin
cp -r dist/LocalTranscription/* LocalTranscription.AppDir/usr/bin/

# Create a .desktop file and icon as needed

# Build AppImage
./appimagetool-x86_64.AppImage LocalTranscription.AppDir
```

## Support

For build issues, check:

1. PyInstaller documentation: https://pyinstaller.org/
2. Project issues: https://github.com/anthropics/claude-code/issues
194 INSTALL.md Normal file
@@ -0,0 +1,194 @@
# Installation Guide

## Prerequisites

- **Python 3.9 or higher**
- **uv** (Python package installer)
- **FFmpeg** (required by faster-whisper)
- **CUDA-capable GPU** (optional, for GPU acceleration)

### Installing uv

If you don't have `uv` installed:

```bash
# On macOS and Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# On Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

# Or with pip
pip install uv
```

### Installing FFmpeg

#### On Ubuntu/Debian:
```bash
sudo apt update
sudo apt install ffmpeg
```

#### On macOS (with Homebrew):
```bash
brew install ffmpeg
```

#### On Windows:
Download from [ffmpeg.org](https://ffmpeg.org/download.html) and add to PATH.

## Installation Steps

### 1. Navigate to Project Directory

```bash
cd /home/jknapp/code/local-transcription
```

### 2. Install Dependencies with uv

```bash
# uv will automatically create a virtual environment and install dependencies
uv sync
```

This single command will:
- Create a virtual environment (`.venv/`)
- Install all dependencies from `pyproject.toml`
- Lock dependencies for reproducibility

**Note for CUDA users:** If you have an NVIDIA GPU, install PyTorch with CUDA support:

```bash
# For CUDA 11.8
uv pip install torch --index-url https://download.pytorch.org/whl/cu118

# For CUDA 12.1
uv pip install torch --index-url https://download.pytorch.org/whl/cu121
```

### 3. Run the Application

```bash
# Option 1: Using uv run (automatically uses the venv)
uv run python main.py

# Option 2: Activate the venv manually
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
python main.py
```

On first run, the application will:
- Download the Whisper model (this may take a few minutes)
- Create a configuration file at `~/.local-transcription/config.yaml`

## Quick Start Commands

```bash
# Install everything
uv sync

# Run the application
uv run python main.py

# Install with server dependencies (for Phase 2+)
uv sync --extra server

# Update dependencies
uv sync --upgrade
```

## Configuration

Settings can be changed through the GUI (Settings button) or by editing:
```
~/.local-transcription/config.yaml
```
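As a rough orientation, such a file might look like the sketch below. Every key name and default here is an assumption for illustration (based on the settings the GUI exposes), not the application's actual schema - check the generated file for the real keys:

```yaml
# Hypothetical ~/.local-transcription/config.yaml - key names are assumptions
transcription:
  model_size: base      # tiny / base / small / medium
  device: auto          # auto / cpu / cuda
display:
  font_size: 24
  show_timestamps: true
  fade_duration: 5.0    # seconds before web-display text fades
server:
  port: 8080            # web server port for the OBS browser source
```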
## Troubleshooting

### Audio Device Issues

If no audio devices are detected:
```bash
uv run python -c "import sounddevice as sd; print(sd.query_devices())"
```

### GPU Not Detected

Check if CUDA is available:
```bash
uv run python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
```

### Model Download Fails

Models are downloaded to `~/.cache/huggingface/`. If the download fails:
- Check your internet connection
- Ensure sufficient disk space (~1-3 GB depending on model size)

### uv Command Not Found

Make sure uv is in your PATH:
```bash
# Add to ~/.bashrc or ~/.zshrc
export PATH="$HOME/.cargo/bin:$PATH"
```

## Performance Tips

For best real-time performance:

1. **Use GPU if available** - 5-10x faster than CPU
2. **Start with smaller models**:
   - `tiny`: Fastest, ~39M parameters, 1-2s latency
   - `base`: Good balance, ~74M parameters, 2-3s latency
   - `small`: Better accuracy, ~244M parameters, 3-5s latency
3. **Enable VAD** (Voice Activity Detection) to skip silent audio
4. **Adjust chunk duration**: Smaller = lower latency, larger = better accuracy
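The trade-off in tip 4 comes down to how many audio samples are buffered before each transcription pass. A minimal sketch (the function name is illustrative; 16 kHz is the rate Whisper models expect):

```python
def chunk_samples(chunk_duration_s: float, sample_rate: int = 16000) -> int:
    """Number of audio samples buffered per transcription pass.

    Shorter chunks reach the model sooner (lower latency) but give
    Whisper less context; longer chunks improve accuracy at the cost
    of a longer wait before text appears.
    """
    return int(chunk_duration_s * sample_rate)
```

For example, a 2-second chunk at 16 kHz buffers 32,000 samples before each pass.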
## System Requirements

### Minimum:
- CPU: Dual-core 2GHz+
- RAM: 4GB
- Model: tiny or base

### Recommended:
- CPU: Quad-core 3GHz+ or GPU (NVIDIA GTX 1060+)
- RAM: 8GB
- Model: base or small

### For Best Performance:
- GPU: NVIDIA RTX 2060 or better
- RAM: 16GB
- Model: small or medium

## Development

### Install development dependencies:
```bash
uv sync --extra dev
```

### Run tests:
```bash
uv run pytest
```

### Format code:
```bash
uv run black .
uv run ruff check .
```

## Why uv?

`uv` is significantly faster than pip:
- **10-100x faster** dependency resolution
- **Automatic virtual environment** management
- **Reproducible builds** with lockfile
- **Drop-in replacement** for pip commands

Learn more at [astral.sh/uv](https://astral.sh/uv)
440 NEXT_STEPS.md Normal file
@@ -0,0 +1,440 @@
# Next Steps for Local Transcription

This document outlines potential future enhancements and features for the Local Transcription application.

## Current Status: Phase 1 Complete ✅

The application currently has:
- ✅ Desktop GUI with PySide6
- ✅ Real-time transcription with Whisper (faster-whisper)
- ✅ Audio capture with automatic sample rate detection and resampling
- ✅ Noise suppression with Voice Activity Detection (VAD)
- ✅ Web server for OBS browser source integration
- ✅ Configurable display settings (font, timestamps, fade duration)
- ✅ Settings apply without restart
- ✅ Auto-fade for web display
- ✅ Standalone executable builds for Linux and Windows
- ✅ CUDA support (with automatic CPU fallback)

## Phase 2: Multi-User Server Architecture (Optional)

If you want to enable multiple users to sync their transcriptions to a shared display:

### Server Components

1. **WebSocket Server**
   - Accept connections from multiple clients
   - Aggregate transcriptions from all connected users
   - Broadcast to web display clients
   - Handle user authentication/authorization
   - Rate limiting and abuse prevention

2. **Database/Storage** (Optional)
   - Store transcription history
   - User management
   - Session logs for later review
   - Consider: SQLite, PostgreSQL, or Redis

3. **Web Admin Interface**
   - Monitor connected clients
   - View active sessions
   - Manage users and permissions
   - Export transcription logs
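The aggregate-and-broadcast core of such a server can be sketched without any web framework: each connected display gets its own queue, and every incoming transcription is fanned out to all of them. All names below are illustrative, not existing project code:

```python
import asyncio


class TranscriptionHub:
    """Fan-out hub: transcriptions from any user reach every display client."""

    def __init__(self) -> None:
        self.displays: set[asyncio.Queue] = set()

    def register_display(self) -> asyncio.Queue:
        """Called when a web display connects; returns its private queue."""
        queue: asyncio.Queue = asyncio.Queue()
        self.displays.add(queue)
        return queue

    def unregister_display(self, queue: asyncio.Queue) -> None:
        self.displays.discard(queue)

    async def broadcast(self, user: str, text: str) -> None:
        """Deliver one transcription line to every connected display."""
        for queue in self.displays:
            await queue.put({"user": user, "text": text})
```

In a FastAPI server, each WebSocket handler would register a queue on connect, forward items from it to its client, and unregister on disconnect.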
### Client Updates

1. **Server Sync Toggle**
   - Enable/disable server sync in Settings
   - Server URL configuration
   - API key/authentication setup
   - Connection status indicator

2. **Network Handling**
   - Auto-reconnect on connection loss
   - Queue transcriptions when offline
   - Sync when connection restored

### Implementation Technologies

- **Server Framework**: FastAPI (already used for web display)
- **WebSocket**: Already integrated
- **Database**: SQLAlchemy + SQLite/PostgreSQL
- **Deployment**: Docker container for easy deployment

**Estimated Effort**: 2-3 weeks for full implementation

---

## Phase 3: Enhanced Features

### Transcription Improvements

1. **Multi-Language Support**
   - Automatic language detection
   - Real-time language switching
   - Translation between languages
   - Per-user language settings

2. **Speaker Diarization**
   - Detect and label different speakers
   - Use pyannote.audio or similar
   - Automatically assign speaker IDs

3. **Custom Vocabulary**
   - Add gaming terms, streamer names
   - Technical jargon support
   - Proper noun correction

4. **Punctuation & Formatting**
   - Automatic punctuation insertion
   - Sentence capitalization
   - Better text formatting

### Display Enhancements

1. **Theme System**
   - Light/dark themes
   - Custom color schemes
   - User-created themes (JSON/YAML)
   - Per-element styling

2. **Animation Options**
   - Different fade effects
   - Slide in/out animations
   - Configurable transition speeds
   - Particle effects (optional)

3. **Layout Modes**
   - Karaoke-style (word highlighting)
   - Ticker tape (scrolling bottom)
   - Multi-column for multiple users
   - Picture-in-picture mode

4. **Web Display Customization**
   - CSS customization interface
   - Live preview in settings
   - Save/load custom styles
   - Community theme sharing

### Audio Processing

1. **Advanced Noise Reduction**
   - RNNoise integration
   - Custom noise profiles
   - Adaptive filtering
   - Echo cancellation

2. **Audio Effects**
   - Equalization presets
   - Compression/normalization
   - Voice enhancement filters

3. **Multi-Input Support**
   - Multiple microphones simultaneously
   - Virtual audio cable integration
   - Audio routing/mixing

---

## Phase 4: Integration & Automation

### OBS Integration

1. **OBS Plugin** (Advanced)
   - Native OBS plugin instead of browser source
   - Lower resource usage
   - Better performance
   - Tighter integration

2. **Scene Integration**
   - Auto-show/hide based on speech
   - Integrate with OBS scene switcher
   - Hotkey support

### Streaming Platform Integration

1. **Twitch Integration**
   - Send captions to Twitch chat
   - Twitch API integration
   - Custom Twitch bot

2. **YouTube Integration**
   - Live caption upload
   - YouTube API integration

3. **Discord Integration**
   - Send transcriptions to Discord webhook
   - Discord bot for voice chat transcription

### Automation

1. **Hotkey Support**
   - Global hotkeys for start/stop
   - Toggle display visibility
   - Quick settings access

2. **Voice Commands**
   - "Hey Transcription, start/stop"
   - Command detection in audio stream
   - Configurable wake words

3. **Auto-Start Options**
   - Start with OBS
   - Start on system boot
   - Auto-detect streaming software

---

## Phase 5: Advanced Features

### AI Enhancements

1. **Summarization**
   - Real-time conversation summaries
   - Key point extraction
   - Topic detection

2. **Sentiment Analysis**
   - Detect tone/emotion
   - Highlight important moments
   - Filter profanity (optional)

3. **Context Awareness**
   - Remember conversation context
   - Better transcription accuracy
   - Adaptive vocabulary

### Analytics & Insights

1. **Usage Statistics**
   - Words per minute
   - Speaking time per user
   - Most common words/phrases
   - Accuracy metrics

2. **Export Options**
   - Export to SRT/VTT for video captions
   - PDF/Word document export
   - CSV for data analysis
   - JSON API for custom tools

3. **Search & Filter**
   - Search transcription history
   - Filter by user, date, keyword
   - Highlight search results
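Among the export options above, SRT is simple enough to sketch directly - numbered cues separated by blank lines, with `HH:MM:SS,mmm` timestamps. The function and segment shape are illustrative, not existing project code:

```python
def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Render (start_s, end_s, text) segments as SRT caption text."""
    def timestamp(t: float) -> str:
        hours, rem = divmod(int(t), 3600)
        minutes, seconds = divmod(rem, 60)
        millis = int(round((t - int(t)) * 1000))
        return f"{hours:02}:{minutes:02}:{seconds:02},{millis:03}"

    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(f"{index}\n{timestamp(start)} --> {timestamp(end)}\n{text}\n")
    return "\n".join(cues)
```

VTT is nearly the same format with a `WEBVTT` header and `.` instead of `,` in the millisecond separator.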
|
||||||
|
|
||||||
|
### Accessibility
|
||||||
|
|
||||||
|
1. **Screen Reader Support**
|
||||||
|
- Full NVDA/JAWS compatibility
|
||||||
|
- Keyboard navigation
|
||||||
|
- Voice feedback
|
||||||
|
|
||||||
|
2. **High Contrast Modes**
|
||||||
|
- Enhanced visibility options
|
||||||
|
- Color blind friendly palettes
|
||||||
|
|
||||||
|
3. **Text-to-Speech**
|
||||||
|
- Read back transcriptions
|
||||||
|
- Multiple voice options
|
||||||
|
- Speed control
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Performance Optimizations
|
||||||
|
|
||||||
|
### Current Considerations
|
||||||
|
|
||||||
|
1. **Model Optimization**
|
||||||
|
- Quantization (int8, int4)
|
||||||
|
- Smaller model variants
|
||||||
|
- TensorRT optimization (NVIDIA)
|
||||||
|
- ONNX Runtime support
|
||||||
|
|
||||||
|
2. **Caching**
|
||||||
|
- Cache common phrases
|
||||||
|
- Model warm-up on startup
|
||||||
|
- Preload frequently used resources
|
||||||
|
|
||||||
|
3. **Resource Management**
|
||||||
|
- Dynamic batch sizing
|
||||||
|
- Memory pooling
|
||||||
|
- Thread pool optimization
|
||||||
|
|
||||||
|
### Future Optimizations
|
||||||
|
|
||||||
|
1. **Distributed Processing**
|
||||||
|
- Offload to cloud GPU
|
||||||
|
- Share processing across multiple machines
|
||||||
|
- Load balancing
|
||||||
|
|
||||||
|
2. **Edge Computing**
|
||||||
|
- Run on edge devices (Raspberry Pi)
|
||||||
|
- Mobile app support
|
||||||
|
- Embedded systems
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Community Features
|
||||||
|
|
||||||
|
### Sharing & Collaboration
|
||||||
|
|
||||||
|
1. **Theme Marketplace**
|
||||||
|
- Share custom themes
|
||||||
|
- Download community themes
|
||||||
|
- Rating system
|
||||||
|
|
||||||
|
2. **Plugin System**
|
||||||
|
- Allow community plugins
|
||||||
|
- Custom audio filters
|
||||||
|
- Display widgets
|
||||||
|
- Integration modules
|
||||||
|
|
||||||
|
3. **Documentation**
|
||||||
|
- Video tutorials
|
||||||
|
- Wiki/knowledge base
|
||||||
|
- API documentation
|
||||||
|
- Developer guides
|
||||||
|
|
||||||
|
### User Support
|
||||||
|
|
||||||
|
1. **In-App Help**
|
||||||
|
- Contextual help tooltips
|
||||||
|
- Getting started wizard
|
||||||
|
- Troubleshooting guide
|
||||||
|
|
||||||
|
2. **Community Forum**
|
||||||
|
- GitHub Discussions
|
||||||
|
- Discord server
|
||||||
|
- Reddit community
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Technical Debt & Maintenance
|
||||||
|
|
||||||
|
### Code Quality
|
||||||
|
|
||||||
|
1. **Testing**
|
||||||
|
- Unit tests for core modules
|
||||||
|
- Integration tests
|
||||||
|
- End-to-end tests
|
||||||
|
- Performance benchmarks
|
||||||
|
|
||||||
|
2. **Documentation**
|
||||||
|
- API documentation
|
||||||
|
- Code comments
|
||||||
|
- Architecture diagrams
|
||||||
|
- Developer setup guide
|
||||||
|
|
||||||
|
3. **CI/CD**
|
||||||
|
- Automated builds
|
||||||
|
- Automated testing
|
||||||
|
- Release automation
|
||||||
|
- Cross-platform testing
|
||||||
|
|
||||||
|
### Security
|
||||||
|
|
||||||
|
1. **Security Audits**
|
||||||
|
- Dependency scanning
|
||||||
|
- Vulnerability assessment
|
||||||
|
- Code security review
|
||||||
|
|
||||||
|
2. **Data Privacy**
|
||||||
|
- Local-first by default
|
||||||
|
- Optional cloud features
|
||||||
|
- GDPR compliance (if applicable)
|
||||||
|
- Clear privacy policy
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Immediate Quick Wins
|
||||||
|
|
||||||
|
These are small enhancements that could be implemented quickly:
|
||||||
|
|
||||||
|
### Easy (< 1 day)
|
||||||
|
|
||||||
|
- [ ] Add application icon
|
||||||
|
- [ ] Add "About" dialog with version info
|
||||||
|
- [ ] Add keyboard shortcuts (Ctrl+S for settings, etc.)
|
||||||
|
- [ ] Add system tray icon
|
||||||
|
- [ ] Save window position/size
|
||||||
|
- [ ] Add "Check for Updates" feature
|
||||||
|
- [ ] Export transcriptions to text file
|
||||||
|
|
||||||
|
### Medium (1-3 days)
|
||||||
|
|
||||||
|
- [ ] Add profanity filter (optional)
|
||||||
|
- [ ] Add confidence score display
|
||||||
|
- [ ] Add audio level meter
|
||||||
|
- [ ] Multiple language support in UI
|
||||||
|
- [ ] Dark/light theme toggle
|
||||||
|
- [ ] Backup/restore settings
|
||||||
|
- [ ] Recent transcriptions history
|
||||||
|
|
||||||
|
### Larger (1+ weeks)
|
||||||
|
|
||||||
|
- [ ] Cloud sync for settings
|
||||||
|
- [ ] Mobile companion app
|
||||||
|
- [ ] Browser extension
|
||||||
|
- [ ] API server mode
|
||||||
|
- [ ] Plugin architecture
|
||||||
|
- [ ] Advanced audio visualization
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Resources & References
|
||||||
|
|
||||||
|
### Documentation
|
||||||
|
- [Faster-Whisper](https://github.com/guillaumekln/faster-whisper)
|
||||||
|
- [PySide6 Documentation](https://doc.qt.io/qtforpython/)
|
||||||
|
- [FastAPI Documentation](https://fastapi.tiangolo.com/)
|
||||||
|
- [PyInstaller Manual](https://pyinstaller.org/en/stable/)
|
||||||
|
|
||||||
|
### Similar Projects
|
||||||
|
- [whisper.cpp](https://github.com/ggerganov/whisper.cpp) - C++ implementation
|
||||||
|
- [Buzz](https://github.com/chidiwilliams/buzz) - Desktop transcription tool
|
||||||
|
- [OpenAI Whisper](https://github.com/openai/whisper) - Original implementation
|
||||||
|
|
||||||
|
### Community
|
||||||
|
- Create GitHub Discussions for feature requests
|
||||||
|
- Set up issue templates
|
||||||
|
- Contributing guidelines
|
||||||
|
- Code of conduct
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Decision Log

Track major architectural decisions here:

### 2025-12-25: PyInstaller for Distribution

- **Decision**: Use PyInstaller for creating standalone executables
- **Rationale**: Good PySide6 support, active development, cross-platform
- **Alternatives Considered**: cx_Freeze, Nuitka, py2exe
- **Impact**: Users can run the app without a Python installation

### 2025-12-25: CUDA Build Strategy

- **Decision**: Provide CUDA-enabled builds that bundle the CUDA runtime
- **Rationale**: Universal builds work everywhere, with automatic GPU detection
- **Trade-off**: Larger file size (~600 MB extra) for better UX
- **Impact**: A single build serves both GPU and CPU users

### 2025-12-25: Web Server Always Running

- **Decision**: Remove the enable/disable toggle; always run the web server
- **Rationale**: Simplifies UX; no configuration needed for OBS
- **Impact**: Uses one local port (8080 by default) with minimal overhead

---

## Contact & Contribution

When this project is public:

- **Issues**: Report bugs and request features on GitHub Issues
- **Pull Requests**: Contributions welcome! See CONTRIBUTING.md
- **Discussions**: Join GitHub Discussions for questions and ideas
- **License**: [To be determined - consider MIT or Apache 2.0]

---

*Last Updated: 2025-12-25*

*Version: 1.0.0 (Phase 1 Complete)*
README.md (new file, 494 lines)
@@ -0,0 +1,494 @@
# Local Transcription for Streamers

A local speech-to-text application designed for streamers that provides real-time transcription using Whisper or similar models. Multiple users can run the application locally and sync their transcriptions to a centralized web stream that can be easily captured in OBS or other streaming software.

## Features

- **Standalone Desktop Application**: Use locally with the built-in GUI display - no server required
- **Local Transcription**: Run Whisper (or compatible models) locally on your machine
- **CPU/GPU Support**: Choose between CPU or GPU processing based on your hardware
- **Real-time Processing**: Live audio transcription with minimal latency
- **Noise Suppression**: Built-in audio preprocessing to reduce background noise
- **User Configuration**: Set your display name and preferences through the GUI
- **Optional Multi-user Sync**: Connect to a server to sync transcriptions with other users
- **OBS Integration**: Web-based output designed for easy browser source capture
- **Privacy-First**: All processing happens locally; only transcription text is shared
- **Customizable**: Configure model size, language, and streaming settings

## Quick Start

### Running from Source

```bash
# Install dependencies
uv sync

# Run the application
uv run python main.py
```

### Building Standalone Executables

To create standalone executables for distribution:

**Linux:**
```bash
./build.sh
```

**Windows:**
```cmd
build.bat
```

For detailed build instructions, see [BUILD.md](BUILD.md).

## Architecture Overview

The application can run in two modes:

### Standalone Mode (No Server Required):
1. **Desktop Application**: Captures audio, performs speech-to-text, and displays transcriptions locally in a GUI window

### Multi-user Sync Mode (Optional):
1. **Local Transcription Client**: Captures audio, performs speech-to-text, and sends results to the web server
2. **Centralized Web Server**: Aggregates transcriptions from multiple clients and serves a web stream
3. **Web Stream Interface**: Browser-accessible page displaying synchronized transcriptions (for OBS capture)

## Use Cases

- **Multi-language Streams**: Multiple translators transcribing in different languages
- **Accessibility**: Provide real-time captions for viewers
- **Collaborative Podcasts**: Multiple hosts with separate transcriptions
- **Gaming Commentary**: Track who said what in multiplayer sessions

---

## Implementation Plan

### Phase 1: Standalone Desktop Application

**Objective**: Build a fully functional standalone transcription app with a GUI that works without any server

#### Components:

1. **Audio Capture Module**
   - Capture system audio or microphone input
   - Support multiple audio sources (virtual audio cables, physical devices)
   - Real-time audio buffering with configurable chunk sizes
   - **Noise Suppression**: Preprocess audio to reduce background noise
   - Libraries: `pyaudio`, `sounddevice`, `noisereduce`, `webrtcvad`

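Devices often capture at their native rate (commonly 44.1 or 48 kHz) while Whisper expects 16 kHz mono, so the capture module has to resample. A minimal NumPy sketch of that step — the function name and the linear-interpolation approach are illustrative, not the app's actual implementation:

```python
import numpy as np

def resample_to_16k(audio: np.ndarray, src_rate: int, dst_rate: int = 16000) -> np.ndarray:
    """Linearly interpolate mono float32 audio down/up to the Whisper input rate."""
    if src_rate == dst_rate:
        return audio
    n_out = int(round(len(audio) * dst_rate / src_rate))
    # Fractional sample positions in the source signal for each output sample.
    src_positions = np.linspace(0, len(audio) - 1, n_out)
    return np.interp(src_positions, np.arange(len(audio)), audio).astype(np.float32)

# A one-second chunk captured at 48 kHz becomes 16000 samples.
chunk = np.zeros(48000, dtype=np.float32)
print(len(resample_to_16k(chunk, 48000)))  # 16000
```

Linear interpolation is the crudest usable resampler; a real pipeline might prefer `scipy.signal.resample_poly` for better anti-aliasing.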
2. **Noise Suppression Engine**
   - Real-time noise reduction using RNNoise or noisereduce
   - Adjustable noise reduction strength
   - Optional VAD (Voice Activity Detection) to skip silent segments
   - Libraries: `noisereduce`, `rnnoise-python`, `webrtcvad`

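webrtcvad classifies short (10/20/30 ms) frames as speech or silence using a trained model. As a dependency-free illustration of the same idea, a crude energy-gate VAD over 30 ms frames might look like this (the threshold and helper names are hypothetical):

```python
import numpy as np

def is_speech(frame: np.ndarray, threshold: float = 0.01) -> bool:
    """Crude energy gate: treat a frame as speech if its RMS exceeds a threshold."""
    rms = float(np.sqrt(np.mean(frame.astype(np.float64) ** 2)))
    return rms > threshold

def drop_silence(audio: np.ndarray, rate: int = 16000, frame_ms: int = 30) -> np.ndarray:
    """Keep only the 30 ms frames whose energy clears the gate.

    webrtcvad makes the same keep/drop decision per frame, just with a
    trained classifier instead of a fixed RMS threshold.
    """
    frame_len = rate * frame_ms // 1000
    frames = [audio[i:i + frame_len]
              for i in range(0, len(audio) - frame_len + 1, frame_len)]
    voiced = [f for f in frames if is_speech(f)]
    return np.concatenate(voiced) if voiced else np.empty(0, dtype=audio.dtype)

silence = np.zeros(16000, dtype=np.float32)                       # 1 s of silence
tone = 0.1 * np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)   # 1 s of 220 Hz
print(len(drop_silence(silence)), len(drop_silence(tone.astype(np.float32))))
```

Skipping silent frames before inference is what keeps the transcription queue from filling up with empty audio.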
3. **Transcription Engine**
   - Integrate OpenAI Whisper (or alternatives: faster-whisper, whisper.cpp)
   - Support multiple model sizes (tiny, base, small, medium, large)
   - CPU and GPU inference options
   - Model management and automatic downloading
   - Libraries: `openai-whisper`, `faster-whisper`, `torch`

4. **Device Selection**
   - Auto-detect available compute devices (CPU, CUDA, MPS for Mac)
   - Allow user to specify preferred device via GUI
   - Graceful fallback if GPU unavailable
   - Display device status and performance metrics

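The detect-then-fall-back logic can be sketched as below. `pick_device` is a hypothetical helper; the probes shown (`torch.cuda.is_available()`, `torch.backends.mps.is_available()`) are the standard PyTorch calls, and the function degrades to CPU when torch is not installed at all:

```python
def pick_device(preferred: str = "auto") -> str:
    """Resolve the compute device, falling back to CPU when no GPU is usable."""
    if preferred != "auto":
        return preferred  # honor an explicit GUI choice
    try:
        import torch  # optional dependency; absence simply means CPU
        if torch.cuda.is_available():
            return "cuda"
        mps = getattr(torch.backends, "mps", None)  # absent on old torch builds
        if mps is not None and mps.is_available():
            return "mps"
    except ImportError:
        pass
    return "cpu"

print(pick_device("cpu"))  # cpu
```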
5. **Desktop GUI Application**
   - Cross-platform GUI using PyQt6, Tkinter, or CustomTkinter
   - Main transcription display window (scrolling text area)
   - Settings panel for configuration
     - User name input field
     - Audio input device selector
     - Model size selector
     - CPU/GPU toggle
   - Start/Stop transcription button
   - Optional: System tray integration
   - Libraries: `PyQt6`, `customtkinter`, or `tkinter`

6. **Local Display**
   - Real-time transcription display in GUI window
   - Scrolling text with timestamps
   - User name/label shown with transcriptions
   - Copy transcription to clipboard
   - Optional: Save transcription to file (TXT, SRT, VTT)

#### Tasks:
- [ ] Set up project structure and dependencies
- [ ] Implement audio capture with device selection
- [ ] Add noise suppression and VAD preprocessing
- [ ] Integrate Whisper model loading and inference
- [ ] Add CPU/GPU device detection and selection logic
- [ ] Create real-time audio buffer processing pipeline
- [ ] Design and implement GUI layout (main window)
- [ ] Add settings panel with user name configuration
- [ ] Implement local transcription display area
- [ ] Add start/stop controls and status indicators
- [ ] Test transcription accuracy and latency
- [ ] Test noise suppression effectiveness

---

### Phase 2: Web Server and Sync System

**Objective**: Create a centralized server to aggregate and serve transcriptions

#### Components:
1. **Web Server**
   - FastAPI or Flask-based REST API
   - WebSocket support for real-time updates
   - User/client registration and management
   - Libraries: `fastapi`, `uvicorn`, `websockets`

2. **Transcription Aggregator**
   - Receive transcription chunks from multiple clients
   - Associate transcriptions with user IDs/names
   - Timestamp management and synchronization
   - Buffer management for smooth streaming

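The aggregation step above amounts to collecting (timestamp, user, text) chunks from many clients and serving them in time order. A minimal stdlib sketch — the class and field names are illustrative, not the server's real data model:

```python
from dataclasses import dataclass, field

@dataclass(order=True)
class Chunk:
    timestamp: float                      # client-side capture time (epoch seconds)
    user: str = field(compare=False)      # sort only by timestamp
    text: str = field(compare=False)

class Aggregator:
    """Collect chunks from many clients and serve them in timestamp order."""

    def __init__(self) -> None:
        self._chunks: list[Chunk] = []

    def submit(self, user: str, text: str, timestamp: float) -> None:
        self._chunks.append(Chunk(timestamp, user, text))

    def stream(self) -> list[str]:
        return [f"{c.user}: {c.text}" for c in sorted(self._chunks)]

agg = Aggregator()
agg.submit("alice", "hello", 2.0)
agg.submit("bob", "hi there", 1.0)
print(agg.stream())  # ['bob: hi there', 'alice: hello']
```

A production version would buffer per-connection and evict old chunks rather than keep an unbounded list.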
3. **Database/Storage** (Optional)
   - Store transcription history (SQLite for simplicity)
   - Session management
   - Export functionality (SRT, VTT, TXT formats)

#### API Endpoints:
- `POST /api/register` - Register a new client
- `POST /api/transcription` - Submit transcription chunk
- `WS /api/stream` - WebSocket for real-time transcription stream
- `GET /stream` - Web page for OBS browser source

#### Tasks:
- [ ] Set up FastAPI server with CORS support
- [ ] Implement WebSocket handler for real-time streaming
- [ ] Create client registration system
- [ ] Build transcription aggregation logic
- [ ] Add timestamp synchronization
- [ ] Create data models for clients and transcriptions

---

### Phase 3: Client-Server Communication (Optional Multi-user Mode)

**Objective**: Add optional server connectivity to enable multi-user transcription sync

#### Components:
1. **HTTP/WebSocket Client**
   - Register client with server on startup
   - Send transcription chunks as they're generated
   - Handle connection drops and reconnection
   - Libraries: `requests`, `websockets`

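Reconnection after a connection drop is usually handled with capped exponential backoff, so a flaky server isn't hammered with retries. A sketch of the delay schedule (the numbers are illustrative defaults, not the app's settings):

```python
def backoff_delays(base: float = 1.0, cap: float = 30.0, attempts: int = 6) -> list[float]:
    """Exponential backoff schedule (seconds) for reconnection attempts,
    doubling each try but never exceeding the cap."""
    return [min(base * (2 ** n), cap) for n in range(attempts)]

# The reconnect loop would sleep for each delay in turn before retrying,
# resetting the schedule once a connection succeeds.
print(backoff_delays())  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```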
2. **Configuration System**
   - Config file for server URL, API keys, user settings
   - Model preferences (size, language)
   - Audio input settings
   - Format: YAML or JSON

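A YAML file along these lines could cover the settings listed above; every field name here is hypothetical, sketching a possible schema rather than the app's actual one:

```yaml
# config.yaml - illustrative layout; all field names are hypothetical
user:
  display_name: "Streamer"
transcription:
  model_size: base        # tiny | base | small | medium | large
  language: auto
  device: auto            # auto | cpu | cuda
audio:
  input_device: default
  chunk_seconds: 2
server:
  sync_enabled: false
  url: "ws://localhost:8080/api/stream"
```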
3. **Status Monitoring**
   - Connection status indicator
   - Transcription queue health
   - Error handling and logging

#### Tasks:
- [ ] Add "Enable Server Sync" toggle to GUI
- [ ] Add server URL configuration field in settings
- [ ] Implement WebSocket client for sending transcriptions
- [ ] Add configuration file support (YAML/JSON)
- [ ] Create connection management with auto-reconnect
- [ ] Add local logging and error handling
- [ ] Add server connection status indicator to GUI
- [ ] Allow app to function normally if server is unavailable

---

### Phase 4: Web Stream Interface (OBS Integration)

**Objective**: Create a web page that displays synchronized transcriptions for OBS

#### Components:
1. **Web Frontend**
   - HTML/CSS/JavaScript page for displaying transcriptions
   - Responsive design with customizable styling
   - Auto-scroll with configurable retention window
   - Libraries: Vanilla JS or lightweight framework (Alpine.js, htmx)

2. **Styling Options**
   - Customizable fonts, colors, sizes
   - Background transparency for OBS chroma key
   - User name/ID display options
   - Timestamp display (optional)

3. **Display Modes**
   - Scrolling captions (like live TV captions)
   - Multi-user panel view (separate sections per user)
   - Overlay mode (minimal UI for transparency)

#### Tasks:
- [ ] Create HTML template for transcription display
- [ ] Implement WebSocket client in JavaScript
- [ ] Add CSS styling with OBS-friendly transparency
- [ ] Create customization controls (URL parameters or UI)
- [ ] Test with OBS browser source
- [ ] Add configurable retention/scroll behavior

---

### Phase 5: Advanced Features

**Objective**: Enhance functionality and user experience

#### Features:
1. **Language Detection**
   - Auto-detect spoken language
   - Multi-language support in single stream
   - Language selector in GUI

2. **Speaker Diarization** (Optional)
   - Identify different speakers
   - Label transcriptions by speaker
   - Useful for multi-host streams

3. **Profanity Filtering**
   - Optional word filtering/replacement
   - Customizable filter lists
   - Toggle in GUI settings

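A simple word filter can be built from one compiled regex over the banned list. This sketch (function name and mask string are illustrative) does case-insensitive, whole-word replacement:

```python
import re

def filter_words(text: str, banned: set[str], mask: str = "***") -> str:
    """Replace banned words (case-insensitive, whole words only) with a mask."""
    if not banned:
        return text
    # \b anchors keep "class" safe when "ass" is banned.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, banned)) + r")\b",
                         re.IGNORECASE)
    return pattern.sub(mask, text)

print(filter_words("Darn, that was a darn shame", {"darn"}))
# ***, that was a *** shame
```

`re.escape` matters here so user-supplied filter entries can't inject regex syntax.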
4. **Advanced Noise Profiles**
   - Save and load custom noise profiles
   - Adaptive noise suppression
   - Different profiles for different environments

5. **Export Functionality**
   - Save transcriptions in multiple formats (TXT, SRT, VTT, JSON)
   - Export button in GUI
   - Automatic session saving

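SRT export mostly comes down to formatting each segment's start/end times as `HH:MM:SS,mmm`. A minimal sketch (helper names are illustrative):

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def srt_entry(index: int, start: float, end: float, text: str) -> str:
    """One numbered SRT cue: index, time range, then the caption text."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"

print(srt_entry(1, 0.0, 2.5, "Hello stream"))
```

VTT differs only in the header and in using `.` instead of `,` before the milliseconds, so both formats can share this helper.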
6. **Hotkey Support**
   - Global hotkeys to start/stop transcription
   - Mute/unmute hotkey
   - Quick save hotkey

7. **Docker Support**
   - Containerized server deployment
   - Docker Compose for easy multi-component setup
   - Pre-built images for easy deployment

8. **Themes and Customization**
   - Dark/light theme toggle
   - Customizable font sizes and colors for display
   - OBS-friendly transparent overlay mode

#### Tasks:
- [ ] Add language detection and multi-language support
- [ ] Implement speaker diarization
- [ ] Create optional profanity filter
- [ ] Add export functionality (SRT, VTT, plain text, JSON)
- [ ] Implement global hotkey support
- [ ] Create Docker containers for server component
- [ ] Add theme customization options
- [ ] Create advanced noise profile management

---

## Technology Stack

### Local Client:
- **Python 3.9+**
- **GUI**: PyQt6 / CustomTkinter / tkinter
- **Audio**: PyAudio / sounddevice
- **Noise Suppression**: noisereduce / rnnoise-python
- **VAD**: webrtcvad
- **ML Framework**: PyTorch (for Whisper)
- **Transcription**: openai-whisper / faster-whisper
- **Networking**: websockets, requests (optional, for server sync)
- **Config**: PyYAML / json

### Server:
- **Backend**: FastAPI / Flask
- **WebSocket**: python-websockets / FastAPI WebSockets
- **Server**: Uvicorn / Gunicorn
- **Database** (optional): SQLite / PostgreSQL
- **CORS**: fastapi-cors

### Web Interface:
- **Frontend**: HTML5, CSS3, JavaScript (ES6+)
- **Real-time**: WebSocket API
- **Styling**: CSS Grid/Flexbox for layout

---

## Project Structure

```
local-transcription/