Initial commit: Local Transcription App v1.0

Phase 1 Complete - Standalone Desktop Application

Features:
- Real-time speech-to-text with Whisper (faster-whisper)
- PySide6 desktop GUI with settings dialog
- Web server for OBS browser source integration
- Audio capture with automatic sample rate detection and resampling
- Noise suppression and Voice Activity Detection (VAD)
- Configurable display settings (font, timestamps, fade duration)
- Settings apply without restart (with automatic model reloading)
- Auto-fade for web display transcriptions
- CPU/GPU support with automatic device detection
- Standalone executable builds (PyInstaller)
- CUDA build support (the CUDA build also runs on systems without CUDA hardware)
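
A minimal sketch of the automatic device detection listed above, assuming it is based on torch (which appears in the project dependencies). The function name `detect_device` and the compute-type values are illustrative assumptions, not taken from the application code:

```python
import importlib.util


def detect_device(prefer_gpu=True):
    """Return a (device, compute_type) pair for a Whisper backend.

    Hypothetical helper: falls back to CPU when torch is missing
    or when no CUDA device is visible.
    """
    if prefer_gpu and importlib.util.find_spec("torch") is not None:
        import torch  # imported lazily so CPU-only setups still work

        if torch.cuda.is_available():
            # float16 is a common choice on GPU (assumption)
            return "cuda", "float16"
    # int8 is a common choice on CPU (assumption)
    return "cpu", "int8"


if __name__ == "__main__":
    device, compute_type = detect_device()
    print(f"device={device} compute_type={compute_type}")
```

The returned pair could then be passed to the transcription engine, for example as the `device` and `compute_type` arguments of faster-whisper's `WhisperModel`.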

Components:
- Audio capture with sounddevice
- Noise reduction with noisereduce + webrtcvad
- Transcription with faster-whisper
- GUI with PySide6
- Web server with FastAPI + WebSocket
- Configuration system with YAML

Build System:
- Standard builds (CPU-only): build.sh / build.bat
- CUDA builds (universal): build-cuda.sh / build-cuda.bat
- Comprehensive BUILD.md documentation
- Cross-platform support (Linux, Windows)

Documentation:
- README.md with project overview and quick start
- BUILD.md with detailed build instructions
- NEXT_STEPS.md with future enhancement roadmap
- INSTALL.md with setup instructions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-25 18:48:23 -08:00
commit 472233aec4
31 changed files with 5116 additions and 0 deletions

pyproject.toml

@@ -0,0 +1,59 @@
[project]
name = "local-transcription"
version = "0.1.0"
description = "A standalone desktop application for real-time speech-to-text transcription using Whisper models"
readme = "README.md"
requires-python = ">=3.9"
license = {text = "MIT"}
authors = [
    {name = "Your Name", email = "your.email@example.com"}
]
keywords = ["transcription", "speech-to-text", "whisper", "streaming", "obs"]
dependencies = [
    "numpy>=1.24.0",
    "pyyaml>=6.0",
    "sounddevice>=0.4.6",
    "scipy>=1.10.0",
    "noisereduce>=3.0.0",
    "webrtcvad>=2.0.10",
    "faster-whisper>=0.10.0",
    "torch>=2.0.0",
    "PySide6>=6.6.0",
]

[project.optional-dependencies]
server = [
    "fastapi>=0.104.0",
    "uvicorn>=0.24.0",
    "websockets>=12.0",
    "requests>=2.31.0",
]

dev = [
    "pytest>=7.4.0",
    "black>=23.0.0",
    "ruff>=0.1.0",
]

[project.scripts]
local-transcription = "main:main"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.hatch.build.targets.wheel]
packages = ["client", "gui"]

[tool.uv]
dev-dependencies = [
    "pyinstaller>=6.17.0",
]

[tool.ruff]
line-length = 100
target-version = "py39"

[tool.black]
line-length = 100
target-version = ["py39"]
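
Because the web server packages sit in the optional `server` extra rather than in the core dependencies, application code may want to check for them before starting the OBS web server. The following is a rough sketch of such a guard; the names `SERVER_MODULES`, `missing_modules`, and `server_extra_available` are hypothetical and do not come from this repository:

```python
import importlib.util

# Import names of the packages listed under the "server" extra above.
SERVER_MODULES = ("fastapi", "uvicorn", "websockets", "requests")


def missing_modules(names=SERVER_MODULES):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def server_extra_available():
    """True when every module of the optional server extra is installed."""
    return not missing_modules()


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Web server disabled; missing modules:", ", ".join(missing))
    else:
        print("Web server dependencies found.")
```

From a source checkout, an extra like this is typically installed with `pip install -e ".[server]"`.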