Voice to Notes — Project Guidelines

Project Overview

Desktop app for transcribing audio/video with speaker identification. Runs locally on the user's computer. See docs/ARCHITECTURE.md for the full architecture.

Tech Stack

  • Desktop shell: Tauri v2 (Rust backend + Svelte/TypeScript frontend)
  • ML pipeline: Python sidecar process (faster-whisper, pyannote.audio, wav2vec2)
  • Database: SQLite (via rusqlite in Rust)
  • Local AI: Bundled llama-server (llama.cpp) — default, no install needed
  • Cloud AI providers: OpenAI, Anthropic, OpenAI-compatible endpoints (optional, user-configured)
  • Caption export: pysubs2 (Python)
  • Audio UI: wavesurfer.js
  • Transcript editor: TipTap (ProseMirror)

Key Architecture Decisions

  • Python sidecar communicates with Rust via JSON-line IPC (stdin/stdout)
  • All ML models must run on CPU; GPU (CUDA) acceleration is optional.
  • AI cloud providers are optional. Bundled llama-server (llama.cpp) is the default local AI — no separate install needed.
  • Rust backend manages llama-server lifecycle (start/stop/port allocation).
  • Project is open source (MIT license).
  • SQLite database is per-project, stored alongside media files.
  • Word-level timestamps are required for click-to-seek playback sync.
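To make the JSON-line IPC decision concrete, here is a minimal sketch of the sidecar's read loop. The handler body and message type names are illustrative, not the project's actual dispatch logic; only the one-JSON-object-per-line framing and the id/type/payload shape come from these guidelines.

```python
import json
import sys


def handle(msg: dict) -> dict:
    """Dispatch one request. Stubbed: echoes the payload back for illustration."""
    return {"id": msg["id"], "type": "result", "payload": msg.get("payload", {})}


def main() -> None:
    # One JSON object per line on stdin; one JSON object per line on stdout.
    for line in sys.stdin:
        line = line.strip()
        if not line:
            continue
        request = json.loads(line)
        response = handle(request)
        sys.stdout.write(json.dumps(response) + "\n")
        sys.stdout.flush()  # flush per message so the Rust side sees it immediately

# In the real sidecar, main() would be the process entry point.
```

Flushing after every line matters: without it, stdout buffering would delay progress events until the buffer fills.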

Directory Structure

src/                    # Svelte frontend source
src-tauri/              # Rust backend source
python/                 # Python sidecar source
  voice_to_notes/       # Python package
  tests/                # Python tests
docs/                   # Architecture and design documents

Conventions

  • Rust: follow standard Rust conventions, use cargo fmt and cargo clippy
  • Python: Python 3.11+, use type hints, follow PEP 8, use ruff for linting
  • TypeScript: strict mode, prefer Svelte stores for state management
  • IPC messages: JSON-line format; each message has id, type, and payload fields
  • Database: UUIDs as primary keys (TEXT type in SQLite)
  • All timestamps are integer milliseconds relative to the start of the media file
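The message and timestamp conventions above can be sketched as a typed helper. The field names inside the payload (start_ms, end_ms, words) are hypothetical examples; only the id/type/payload envelope, UUID-string ids, and integer-millisecond timestamps are mandated by these guidelines.

```python
import json
import uuid
from typing import Any, TypedDict


class IpcMessage(TypedDict):
    id: str                  # UUID string, matching the TEXT primary-key convention
    type: str                # e.g. "result" or "progress" (names illustrative)
    payload: dict[str, Any]


def make_message(msg_type: str, payload: dict[str, Any]) -> IpcMessage:
    """Build one IPC message with a fresh UUID id."""
    return {"id": str(uuid.uuid4()), "type": msg_type, "payload": payload}


# A word-level segment: timestamps are integer milliseconds from media start,
# which is what click-to-seek playback sync relies on.
msg = make_message("result", {"words": [{"text": "hello", "start_ms": 1200, "end_ms": 1580}]})
line = json.dumps(msg)  # serialized as one line over stdin/stdout
```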

Distribution

  • Python sidecar is frozen via PyInstaller into a standalone binary for distribution
  • Tauri bundles the sidecar via externalBin — no Python required for end users
  • CI/CD builds on Gitea Actions (Linux, Windows, macOS ARM)
  • Dev mode uses system Python (VOICE_TO_NOTES_DEV=1 or debug builds)
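Frozen-vs-dev resolution inside the sidecar can be sketched as below. This is not the project's actual resolver; it only illustrates the standard PyInstaller mechanism (sys.frozen is set in frozen builds, and one-file bundles unpack resources under sys._MEIPASS), with the dev-mode fallback path chosen here for illustration.

```python
import sys
from pathlib import Path


def is_frozen() -> bool:
    """True when running as a PyInstaller-frozen binary."""
    return bool(getattr(sys, "frozen", False))


def resource_path(name: str) -> Path:
    """Resolve a bundled resource (e.g. an ffmpeg binary) next to the sidecar."""
    if is_frozen():
        # One-file bundles extract bundled data under sys._MEIPASS at startup.
        return Path(getattr(sys, "_MEIPASS")) / name
    # Dev mode: resolve relative to the source file (illustrative choice).
    return Path(__file__).resolve().parent / name
```

Keeping both branches in one function means calling code never needs to know whether it is running from a frozen bundle or from the source tree.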

Platform Targets

  • Linux x86_64 (primary development target)
  • Windows x86_64
  • macOS aarch64 (Apple Silicon)