# Voice to Notes — Project Guidelines

## Project Overview

Desktop app for transcribing audio/video with speaker identification. Runs entirely locally on the user's machine. See `docs/ARCHITECTURE.md` for the full architecture.

## Tech Stack
- **Desktop shell:** Tauri v2 (Rust backend + Svelte/TypeScript frontend)
- **ML pipeline:** Python sidecar process (faster-whisper, pyannote.audio, wav2vec2)
- **Database:** SQLite (via rusqlite in Rust)
- **Local AI:** Bundled llama-server (llama.cpp) — default, no install needed
- **Cloud AI providers:** OpenAI, Anthropic, OpenAI-compatible endpoints (optional, user-configured)
- **Caption export:** pysubs2 (Python)
- **Audio UI:** wavesurfer.js
- **Transcript editor:** TipTap (ProseMirror)

## Key Architecture Decisions
- Python sidecar communicates with Rust via JSON-line IPC (stdin/stdout)
- All ML models must work on CPU; GPU (CUDA) is optional acceleration
- Cloud AI providers are optional; bundled llama-server (llama.cpp) is the default local AI — no separate install needed
- Rust backend manages llama-server lifecycle (start/stop/port allocation)
- Project is open source (MIT license)
- SQLite database is per-project, stored alongside media files
- Word-level timestamps are required for click-to-seek playback sync
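
A minimal sketch of the JSON-line IPC loop on the Python sidecar side, using the `id`/`type`/`payload` fields from the Conventions section. The `handle` callback and the `.result` reply-type suffix are illustrative assumptions, not the actual protocol:

```python
import json
import sys


def handle_line(line: str, handle) -> str:
    """Turn one JSON-line request into one JSON-line reply.
    Every message carries `id`, `type`, and `payload`."""
    msg = json.loads(line)
    return json.dumps({
        "id": msg["id"],                  # echo the request id for correlation
        "type": msg["type"] + ".result",  # illustrative reply-type convention
        "payload": handle(msg["type"], msg["payload"]),
    })


def serve(handle) -> None:
    """Read requests from stdin, write replies to stdout, one per line."""
    for line in sys.stdin:
        if line.strip():
            sys.stdout.write(handle_line(line, handle) + "\n")
            sys.stdout.flush()  # flush each line so Rust sees progress events immediately
```

Flushing after every line is what makes real-time progress delivery possible over a pipe; a block-buffered stdout would batch events until the buffer fills.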
## Directory Structure

```
src/              # Svelte frontend source
src-tauri/        # Rust backend source
python/           # Python sidecar source
voice_to_notes/   # Python package
tests/            # Python tests
docs/             # Architecture and design documents
```

## Conventions
- Rust: follow standard Rust conventions, use `cargo fmt` and `cargo clippy`
- Python: Python 3.11+, use type hints, follow PEP 8, use `ruff` for linting
- TypeScript: strict mode, prefer Svelte stores for state management
- IPC messages: JSON-line format, each message has `id`, `type`, `payload` fields
- Database: UUIDs as primary keys (TEXT type in SQLite)
- All timestamps in milliseconds (integer) relative to media file start
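
The key and timestamp conventions above can be sketched as follows; the function and column names here are illustrative, not the actual schema:

```python
import uuid


def new_segment_row(start_s: float, end_s: float, text: str) -> dict:
    """Build a row per the conventions above: a UUID stored as TEXT for the
    primary key, and integer milliseconds relative to the media file start.
    Column names are illustrative, not the real schema."""
    return {
        "id": str(uuid.uuid4()),            # UUID primary key, TEXT in SQLite
        "start_ms": round(start_s * 1000),  # float seconds -> integer ms
        "end_ms": round(end_s * 1000),
        "text": text,
    }
```

Converting at the boundary (ML models emit float seconds, the database stores integer ms) keeps comparisons and click-to-seek math free of floating-point drift.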
## Distribution
- Python sidecar is frozen via PyInstaller into a standalone binary for distribution
- Tauri bundles the sidecar via `externalBin` — no Python required for end users
- CI/CD builds on Gitea Actions (Linux, Windows, macOS ARM)
- Dev mode uses system Python (`VOICE_TO_NOTES_DEV=1` or debug builds)
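
One way the frozen/dev split could be detected at runtime. The env var comes from the list above; `sys.frozen` is the flag PyInstaller sets on bundled executables. The exact check in the codebase may differ:

```python
import os
import sys


def is_dev_mode() -> bool:
    """True when the sidecar should run from system Python rather than the
    frozen PyInstaller binary. A sketch; the real check may differ."""
    if os.environ.get("VOICE_TO_NOTES_DEV") == "1":
        return True
    # PyInstaller sets sys.frozen = True on the bundled executable;
    # plain CPython has no such attribute.
    return not getattr(sys, "frozen", False)
```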
## Platform Targets
- Linux x86_64 (primary development target)
- Windows x86_64
- macOS aarch64 (Apple Silicon)