Voice to Notes — Project Guidelines
Project Overview
Desktop app for transcribing audio/video with speaker identification. Runs locally on the user's computer. See docs/ARCHITECTURE.md for the full architecture.
Tech Stack
- Desktop shell: Tauri v2 (Rust backend + Svelte/TypeScript frontend)
- ML pipeline: Python sidecar process (faster-whisper, pyannote.audio, wav2vec2)
- Database: SQLite (via rusqlite in Rust)
- Local AI: Bundled llama-server (llama.cpp) — default, no install needed
- Cloud AI providers: LiteLLM, OpenAI, Anthropic (optional, user-configured)
- Caption export: pysubs2 (Python)
- Audio UI: wavesurfer.js
- Transcript editor: TipTap (ProseMirror)
Key Architecture Decisions
- Python sidecar communicates with Rust via JSON-line IPC (stdin/stdout)
- All ML models must work on CPU. GPU (CUDA) acceleration is optional.
- AI cloud providers are optional. Bundled llama-server (llama.cpp) is the default local AI — no separate install needed.
- Rust backend manages llama-server lifecycle (start/stop/port allocation).
- Project is open source (MIT license).
- SQLite database is per-project, stored alongside media files.
- Word-level timestamps are required for click-to-seek playback sync.
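The port allocation mentioned above is commonly done by asking the OS for an ephemeral port: bind a socket to port 0, read back the assigned port, and pass it to llama-server. In the app, the Rust backend owns this step. The Python below is only a minimal sketch of the technique. The `build_llama_server_cmd` helper and the model path are hypothetical. `--model`, `--host`, and `--port` are llama-server options, but they should be checked against the bundled build.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        # getsockname() returns (host, port); the port is the one the OS assigned
        return sock.getsockname()[1]


def build_llama_server_cmd(binary: str, model_path: str, port: int) -> list[str]:
    """Hypothetical helper: argv for launching llama-server on localhost only."""
    return [binary, "--model", model_path, "--host", "127.0.0.1", "--port", str(port)]
```

The probe socket closes before llama-server binds the port, so another process could grab it in between. Retrying with a fresh port when the server fails to start covers that race.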
Directory Structure
- `src/`: Svelte frontend source
- `src-tauri/`: Rust backend source
- `python/`: Python sidecar source
  - `voice_to_notes/`: Python package
  - `tests/`: Python tests
- `docs/`: Architecture and design documents
Conventions
- Rust: follow standard Rust conventions, use `cargo fmt` and `cargo clippy`
- Python: Python 3.11+, use type hints, follow PEP 8, use `ruff` for linting
- TypeScript: strict mode, prefer Svelte stores for state management
- IPC messages: JSON-line format, each message has `id`, `type`, and `payload` fields
- Database: UUIDs as primary keys (TEXT type in SQLite)
- All timestamps in milliseconds (integer) relative to media file start
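This sketch shows the sidecar side of the JSON-line protocol. It assumes each request carries the `id`, `type`, and `payload` fields listed above, and that each reply reuses the request's `id` so the Rust side can match it. The `serve` and `example_handler` functions, the `ping`/`pong`/`error` message types, and the reply-reuses-id rule are all assumptions for illustration, not the project's actual protocol.

```python
import json
import sys
from typing import Any, Callable, TextIO

Handler = Callable[[str, dict[str, Any]], tuple[str, dict[str, Any]]]


def make_message(msg_id: str, msg_type: str, payload: dict[str, Any]) -> str:
    """Serialize one message as a single JSON line (json.dumps emits no raw newlines)."""
    return json.dumps({"id": msg_id, "type": msg_type, "payload": payload}) + "\n"


def serve(handle: Handler, stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    """Read one JSON request per line and write one reply line carrying the same id."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        request = json.loads(line)
        try:
            reply_type, reply_payload = handle(request["type"], request["payload"])
        except Exception as exc:  # report failures instead of crashing the sidecar
            reply_type, reply_payload = "error", {"message": str(exc)}
        stdout.write(make_message(request["id"], reply_type, reply_payload))
        stdout.flush()  # the Rust side reads line by line; don't leave replies buffered


def example_handler(msg_type: str, payload: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Hypothetical dispatcher with a single "ping" message type."""
    if msg_type == "ping":
        return "pong", {}
    raise ValueError(f"unknown message type: {msg_type}")
```

A real sidecar entry point would call `serve(...)` with its own dispatcher. Logging should go to stderr, because anything else written to stdout would corrupt the message stream.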
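The database conventions (UUID keys stored as TEXT, integer millisecond timestamps) can be illustrated with Python's built-in `sqlite3` module, standing in for rusqlite. The `words` table, its columns, and the helper functions are hypothetical and do not reflect the project's schema.

```python
import sqlite3
import uuid

# Hypothetical table; the real schema is owned by the Rust backend (rusqlite).
SCHEMA = """
CREATE TABLE IF NOT EXISTS words (
    -- UUID as canonical text; NOT NULL because SQLite otherwise allows NULL
    -- in non-INTEGER primary keys
    id       TEXT PRIMARY KEY NOT NULL,
    text     TEXT NOT NULL,
    start_ms INTEGER NOT NULL,  -- milliseconds from media start
    end_ms   INTEGER NOT NULL
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database and create the sketch schema (the app would use a per-project file)."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def insert_word(conn: sqlite3.Connection, text: str, start_ms: int, end_ms: int) -> str:
    """Insert a word row keyed by a fresh random UUID and return that UUID."""
    word_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO words (id, text, start_ms, end_ms) VALUES (?, ?, ?, ?)",
        (word_id, text, start_ms, end_ms),
    )
    return word_id
```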
Platform Targets
- Linux (primary development target)
- Windows (must work, tested before release)
- macOS (future, not yet targeted)