Voice to Notes — Project Guidelines

Project Overview

Desktop app for transcribing audio/video with speaker identification. Runs locally on the user's computer. See docs/ARCHITECTURE.md for the full architecture.

Tech Stack

  • Desktop shell: Tauri v2 (Rust backend + Svelte/TypeScript frontend)
  • ML pipeline: Python sidecar process (faster-whisper, pyannote.audio, wav2vec2)
  • Database: SQLite (via rusqlite in Rust)
  • Local AI: Bundled llama-server (llama.cpp) — default, no install needed
  • Cloud AI providers: OpenAI, Anthropic, OpenAI-compatible endpoints (optional, user-configured)
  • Caption export: pysubs2 (Python)
  • Audio UI: wavesurfer.js
  • Transcript editor: TipTap (ProseMirror)

Key Architecture Decisions

  • Python sidecar communicates with Rust via JSON-line IPC (stdin/stdout)
  • All ML models must run on CPU; GPU (CUDA) acceleration is optional.
  • AI cloud providers are optional. Bundled llama-server (llama.cpp) is the default local AI — no separate install needed.
  • Rust backend manages llama-server lifecycle (start/stop/port allocation).
  • Project is open source (MIT license).
  • SQLite database is per-project, stored alongside media files.
  • Word-level timestamps are required for click-to-seek playback sync.
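To make the JSON-line IPC decision concrete, here is a minimal sketch of the sidecar's read loop. The handler body and message type names are illustrative, not the project's actual dispatch logic; only the one-JSON-object-per-line framing and the id/type/payload shape come from these guidelines.

```python
import json
import sys


def handle(msg: dict) -> dict:
    """Dispatch one request. Stubbed: echoes the payload back for illustration."""
    return {"id": msg["id"], "type": "result", "payload": msg.get("payload", {})}


def main() -> None:
    # One JSON object per line on stdin; one JSON object per line on stdout.
    for line in sys.stdin:
        line = line.strip()
        if not line:
            continue
        request = json.loads(line)
        response = handle(request)
        sys.stdout.write(json.dumps(response) + "\n")
        sys.stdout.flush()  # flush per message so the Rust side sees it immediately

# In the real sidecar, main() would be the process entry point.
```

Flushing after every line matters: without it, stdout buffering would delay progress events until the buffer fills.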

Directory Structure

src/                    # Svelte frontend source
src-tauri/              # Rust backend source
python/                 # Python sidecar source
  voice_to_notes/       # Python package
  tests/                # Python tests
docs/                   # Architecture and design documents

Conventions

  • Rust: follow standard Rust conventions, use cargo fmt and cargo clippy
  • Python: Python 3.11+, use type hints, follow PEP 8, use ruff for linting
  • TypeScript: strict mode, prefer Svelte stores for state management
  • IPC messages: JSON-line format; each message has id, type, and payload fields
  • Database: UUIDs as primary keys (TEXT type in SQLite)
  • All timestamps are integer milliseconds relative to the start of the media file
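The message and timestamp conventions above can be sketched as a typed helper. The field names inside the payload (start_ms, end_ms, words) are hypothetical examples; only the id/type/payload envelope, UUID-string ids, and integer-millisecond timestamps are mandated by these guidelines.

```python
import json
import uuid
from typing import Any, TypedDict


class IpcMessage(TypedDict):
    id: str                  # UUID string, matching the TEXT primary-key convention
    type: str                # e.g. "result" or "progress" (names illustrative)
    payload: dict[str, Any]


def make_message(msg_type: str, payload: dict[str, Any]) -> IpcMessage:
    """Build one IPC message with a fresh UUID id."""
    return {"id": str(uuid.uuid4()), "type": msg_type, "payload": payload}


# A word-level segment: timestamps are integer milliseconds from media start,
# which is what click-to-seek playback sync relies on.
msg = make_message("result", {"words": [{"text": "hello", "start_ms": 1200, "end_ms": 1580}]})
line = json.dumps(msg)  # serialized as one line over stdin/stdout
```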

Distribution

  • Python sidecar is frozen via PyInstaller into a standalone binary for distribution
  • Tauri bundles the sidecar via externalBin — no Python required for end users
  • CI/CD builds on Gitea Actions (Linux, Windows, macOS ARM)
  • Dev mode uses system Python (VOICE_TO_NOTES_DEV=1 or debug builds)
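Frozen-vs-dev resolution inside the sidecar can be sketched as below. This is not the project's actual resolver; it only illustrates the standard PyInstaller mechanism (sys.frozen is set in frozen builds, and one-file bundles unpack resources under sys._MEIPASS), with the dev-mode fallback path chosen here for illustration.

```python
import sys
from pathlib import Path


def is_frozen() -> bool:
    """True when running as a PyInstaller-frozen binary."""
    return bool(getattr(sys, "frozen", False))


def resource_path(name: str) -> Path:
    """Resolve a bundled resource (e.g. an ffmpeg binary) next to the sidecar."""
    if is_frozen():
        # One-file bundles extract bundled data under sys._MEIPASS at startup.
        return Path(getattr(sys, "_MEIPASS")) / name
    # Dev mode: resolve relative to the source file (illustrative choice).
    return Path(__file__).resolve().parent / name
```

Keeping both branches in one function means calling code never needs to know whether it is running from a frozen bundle or from the source tree.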

Platform Targets

  • Linux x86_64 (primary development target)
  • Windows x86_64
  • macOS aarch64 (Apple Silicon)