voice-to-notes/CLAUDE.md
Josh Knapp 0edb06a913 Add architecture document and project guidelines
Detailed architecture covering Tauri + Svelte frontend, Rust backend,
Python sidecar for ML (faster-whisper, pyannote.audio), IPC protocol,
SQLite schema, AI provider system, export formats, and phased
implementation plan with agent work breakdown.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 08:37:45 -08:00


Voice to Notes — Project Guidelines

Project Overview

Desktop app for transcribing audio/video with speaker identification. It runs locally on the user's computer. See docs/ARCHITECTURE.md for the full architecture.

Tech Stack

  • Desktop shell: Tauri v2 (Rust backend + Svelte/TypeScript frontend)
  • ML pipeline: Python sidecar process (faster-whisper, pyannote.audio, wav2vec2)
  • Database: SQLite (via rusqlite in Rust)
  • AI providers: LiteLLM, OpenAI, Anthropic, Ollama (local)
  • Caption export: pysubs2 (Python)
  • Audio UI: wavesurfer.js
  • Transcript editor: TipTap (ProseMirror)

Key Architecture Decisions

  • The Python sidecar communicates with the Rust backend via JSON-line IPC over stdin/stdout.
  • All ML models must work on CPU. GPU (CUDA) acceleration is optional.
  • AI cloud providers are optional. Local models (Ollama) are a first-class option.
  • SQLite database is per-project, stored alongside media files.
  • Word-level timestamps are required for click-to-seek playback sync.

Directory Structure

src/                    # Svelte frontend source
src-tauri/              # Rust backend source
python/                 # Python sidecar source
  voice_to_notes/       # Python package
  tests/                # Python tests
docs/                   # Architecture and design documents

Conventions

  • Rust: follow standard Rust conventions; format with cargo fmt and lint with cargo clippy
  • Python: Python 3.11+, use type hints, follow PEP 8, use ruff for linting
  • TypeScript: strict mode, prefer Svelte stores for state management
  • IPC messages: JSON-line format (one JSON object per line); each message has id, type, and payload fields
  • Database: UUIDs as primary keys (TEXT type in SQLite)
  • Timestamps: integer milliseconds relative to the start of the media file

Platform Targets

  • Linux (primary development target)
  • Windows (must work, tested before release)
  • macOS (future, not yet targeted)