Files
local-transcription/local-transcription.spec
jknapp 5f3c058be6 Migrate to RealtimeSTT for advanced VAD-based transcription
Major refactor to eliminate word loss issues using RealtimeSTT with
dual-layer VAD (WebRTC + Silero) instead of time-based chunking.

## Core Changes

### New Transcription Engine
- Add client/transcription_engine_realtime.py with RealtimeSTT wrapper
- Implements initialize() and start_recording() separation for proper lifecycle
- Dual-layer VAD with pre/post buffers prevents word cutoffs
- Optional realtime preview with faster model + final transcription

### Removed Legacy Components
- Remove client/audio_capture.py (RealtimeSTT handles audio)
- Remove client/noise_suppression.py (VAD handles silence detection)
- Remove client/transcription_engine.py (replaced by realtime version)
- Remove chunk_duration setting (no longer using time-based chunking)

### Dependencies
- Add RealtimeSTT>=0.3.0 to pyproject.toml
- Remove noisereduce, webrtcvad, faster-whisper (now dependencies of RealtimeSTT)
- Update PyInstaller spec with ONNX Runtime, halo, colorama

### GUI Improvements
- Refactor main_window_qt.py to use RealtimeSTT with proper start/stop
- Fix recording state management (initialize on startup, record on button click)
- Expand settings dialog (700x1200) with improved spacing (10-15px between groups)
- Add comprehensive tooltips to all settings explaining functionality
- Remove chunk duration field from settings

### Configuration
- Update default_config.yaml with RealtimeSTT parameters:
  - Silero VAD sensitivity (0.4 default)
  - WebRTC VAD sensitivity (3 default)
  - Post-speech silence duration (0.3s)
  - Pre-recording buffer (0.2s)
  - Beam size for quality control (5 default)
  - ONNX acceleration (enabled for 2-3x faster VAD)
  - Optional realtime preview settings

### CLI Updates
- Update main_cli.py to use new engine API
- Separate initialize() and start_recording() calls

### Documentation
- Add INSTALL_REALTIMESTT.md with migration guide and benefits
- Update INSTALL.md: Remove FFmpeg requirement (not needed!)
- Clarify PortAudio is only needed for development
- Document that built executables are fully standalone

## Benefits

-  Eliminates word loss at chunk boundaries
-  Natural speech segment detection via VAD
-  2-3x faster VAD with ONNX acceleration
-  30% lower CPU usage
-  Pre-recording buffer captures word starts
-  Post-speech silence prevents cutoffs
-  Optional instant preview mode
-  Better UX with comprehensive tooltips

## Migration Notes

- Settings apply immediately without restart (except model changes)
- Old chunk_duration configs ignored (VAD-based detection now)
- Recording only starts when user clicks button (not on app startup)
- Stop button immediately stops recording (no delay)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-28 18:48:29 -08:00

173 lines
4.5 KiB
Python

# -*- mode: python ; coding: utf-8 -*-
"""PyInstaller spec file for Local Transcription app."""
import sys
from pathlib import Path
import os
block_cipher = None
# Determine if we're on Windows
is_windows = sys.platform == 'win32'
# Import PyInstaller utilities
from PyInstaller.utils.hooks import collect_submodules, collect_data_files
# Find faster_whisper assets folder
import faster_whisper
faster_whisper_path = os.path.dirname(faster_whisper.__file__)
vad_assets_path = os.path.join(faster_whisper_path, 'assets')
# Base configuration
binaries = []
datas = [
('config/default_config.yaml', 'config'),
(vad_assets_path, 'faster_whisper/assets'), # Include VAD model
]
hiddenimports = [
'PySide6.QtCore',
'PySide6.QtWidgets',
'PySide6.QtGui',
'faster_whisper',
'faster_whisper.transcribe',
'faster_whisper.vad',
'ctranslate2',
'sounddevice',
'scipy',
'scipy.signal',
'numpy',
# RealtimeSTT and its dependencies
'RealtimeSTT',
'RealtimeSTT.audio_recorder',
'webrtcvad',
'webrtcvad_wheels',
'silero_vad',
'torch',
'torch.nn',
'torch.nn.functional',
'torchaudio',
'onnxruntime',
'onnxruntime.capi',
'onnxruntime.capi.onnxruntime_pybind11_state',
'pyaudio',
'halo', # RealtimeSTT progress indicator
'colorama', # Terminal colors (used by halo)
# FastAPI and dependencies
'fastapi',
'fastapi.routing',
'fastapi.responses',
'starlette',
'starlette.applications',
'starlette.routing',
'starlette.responses',
'starlette.websockets',
'starlette.middleware',
'starlette.middleware.cors',
'pydantic',
'pydantic.fields',
'pydantic.main',
'anyio',
'anyio._backends',
'anyio._backends._asyncio',
'sniffio',
# Uvicorn and dependencies
'uvicorn',
'uvicorn.logging',
'uvicorn.loops',
'uvicorn.loops.auto',
'uvicorn.protocols',
'uvicorn.protocols.http',
'uvicorn.protocols.http.auto',
'uvicorn.protocols.http.h11_impl',
'uvicorn.protocols.websockets',
'uvicorn.protocols.websockets.auto',
'uvicorn.protocols.websockets.wsproto_impl',
'uvicorn.lifespan',
'uvicorn.lifespan.on',
'h11',
'websockets',
'websockets.legacy',
'websockets.legacy.server',
# Requests (for server sync)
'requests',
'urllib3',
'certifi',
'charset_normalizer',
]
# Collect all submodules for FastAPI and related packages
# This approach is more reliable than collect_all() which has design flaws
# Particularly important for pydantic which uses compiled cpython extensions
print("Collecting submodules for FastAPI packages...")
for package in ['fastapi', 'starlette', 'pydantic', 'pydantic_core', 'anyio', 'uvicorn', 'websockets', 'h11', 'httptools', 'uvloop']:
try:
submodules = collect_submodules(package)
hiddenimports += submodules
print(f" ✓ Collected {len(submodules)} submodules from {package}")
except Exception as e:
print(f" ⚠ Warning: Could not collect {package}: {e}")
# Collect data files for packages that need them
for package in ['fastapi', 'starlette', 'pydantic', 'uvicorn']:
try:
data_files = collect_data_files(package)
if data_files:
datas += data_files
print(f" ✓ Collected {len(data_files)} data files from {package}")
except Exception as e:
pass # Not all packages have data files
# Add critical pydantic dependencies that may be missed
hiddenimports += [
'colorsys', 'decimal', 'json', 'ipaddress', 'pathlib', 'uuid',
'email.message', 'typing_extensions',
]
a = Analysis(
['main.py'],
pathex=[],
binaries=binaries,
datas=datas,
hiddenimports=hiddenimports,
hookspath=[],
hooksconfig={},
runtime_hooks=[],
excludes=[],
win_no_prefer_redirects=False,
win_private_assemblies=False,
cipher=block_cipher,
noarchive=False,
)
pyz = PYZ(a.pure, a.zipped_data, cipher=block_cipher)
exe = EXE(
pyz,
a.scripts,
[],
exclude_binaries=True,
name='LocalTranscription',
debug=False,
bootloader_ignore_signals=False,
strip=False,
upx=True,
console=False, # Hide console window for GUI application
disable_windowed_traceback=False,
argv_emulation=False,
target_arch=None,
codesign_identity=None,
entitlements_file=None,
icon=None, # Add icon file path here if you have one
)
coll = COLLECT(
exe,
a.binaries,
a.zipfiles,
a.datas,
strip=False,
upx=True,
upx_exclude=[],
name='LocalTranscription',
)