From c450ef3c0cd9ea60859fa8153dc6b1d2e942d397 Mon Sep 17 00:00:00 2001 From: Josh Knapp Date: Thu, 26 Feb 2026 09:00:47 -0800 Subject: [PATCH] Switch local AI from Ollama to bundled llama-server, add MIT license - Replace Ollama dependency with bundled llama-server (llama.cpp) so users need no separate install for local AI inference - Rust backend manages llama-server lifecycle (spawn, port, shutdown) - Add MIT license for open source release - Update architecture doc, CLAUDE.md, and README accordingly Co-Authored-By: Claude Opus 4.6 --- CLAUDE.md | 7 +++++-- LICENSE | 21 +++++++++++++++++++++ README.md | 2 +- docs/ARCHITECTURE.md | 44 ++++++++++++++++++++++++++++++++++---------- 4 files changed, 61 insertions(+), 13 deletions(-) create mode 100644 LICENSE diff --git a/CLAUDE.md b/CLAUDE.md index b248506..6e55905 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -7,7 +7,8 @@ Desktop app for transcribing audio/video with speaker identification. Runs local - **Desktop shell:** Tauri v2 (Rust backend + Svelte/TypeScript frontend) - **ML pipeline:** Python sidecar process (faster-whisper, pyannote.audio, wav2vec2) - **Database:** SQLite (via rusqlite in Rust) -- **AI providers:** LiteLLM, OpenAI, Anthropic, Ollama (local) +- **Local AI:** Bundled llama-server (llama.cpp) — default, no install needed +- **Cloud AI providers:** LiteLLM, OpenAI, Anthropic (optional, user-configured) - **Caption export:** pysubs2 (Python) - **Audio UI:** wavesurfer.js - **Transcript editor:** TipTap (ProseMirror) @@ -15,7 +16,9 @@ Desktop app for transcribing audio/video with speaker identification. Runs local ## Key Architecture Decisions - Python sidecar communicates with Rust via JSON-line IPC (stdin/stdout) - All ML models must work on CPU. GPU (CUDA) is optional acceleration. -- AI cloud providers are optional. Local models (Ollama) are a first-class option. +- AI cloud providers are optional. Bundled llama-server (llama.cpp) is the default local AI — no separate install needed. 
+- Rust backend manages llama-server lifecycle (start/stop/port allocation). +- Project is open source (MIT license). - SQLite database is per-project, stored alongside media files. - Word-level timestamps are required for click-to-seek playback sync. diff --git a/LICENSE b/LICENSE new file mode 100644 index 0000000..db9356a --- /dev/null +++ b/LICENSE @@ -0,0 +1,21 @@ +MIT License + +Copyright (c) 2026 Voice to Notes Contributors + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. diff --git a/README.md b/README.md index c87b17a..740f612 100644 --- a/README.md +++ b/README.md @@ -27,4 +27,4 @@ A desktop application that transcribes audio/video recordings with speaker ident ## License -TBD +MIT diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md index dc49a05..f6480e8 100644 --- a/docs/ARCHITECTURE.md +++ b/docs/ARCHITECTURE.md @@ -162,11 +162,12 @@ src/ The Rust layer is intentionally thin. It handles: -1. **Process Management** — Spawn, monitor, and kill the Python sidecar +1. 
**Process Management** — Spawn, monitor, and kill the Python sidecar and llama-server 2. **IPC Relay** — Forward messages between frontend and Python process 3. **File Operations** — Read/write project files, manage media 4. **SQLite** — All database operations via rusqlite 5. **System Info** — Detect GPU, RAM, CPU for hardware recommendations +6. **llama-server Lifecycle** — Start/stop bundled llama-server, manage port allocation ``` src-tauri/ @@ -179,6 +180,7 @@ src-tauri/ ai.rs # AI provider commands settings.rs # App settings and preferences system.rs # Hardware detection + llama_server.rs # llama-server process lifecycle db/ mod.rs # SQLite connection pool schema.rs # Table definitions and migrations @@ -215,7 +217,7 @@ python/ litellm_provider.py # LiteLLM (multi-provider gateway) openai_provider.py # Direct OpenAI SDK anthropic_provider.py # Direct Anthropic SDK - ollama_provider.py # Local Ollama models + local_provider.py # Bundled llama-server (OpenAI-compatible API) hardware/ __init__.py detect.py # GPU/CPU detection, VRAM estimation @@ -399,12 +401,33 @@ class AIProvider(ABC): ### Supported Providers -| Provider | Package | Use Case | -|----------|---------|----------| +| Provider | Package / Binary | Use Case | +|----------|-----------------|----------| +| **llama-server** (bundled) | llama.cpp binary | Default local AI — bundled with app, no install needed. OpenAI-compatible API on localhost. | | **LiteLLM** | `litellm` | Gateway to 100+ providers via unified API | | **OpenAI** | `openai` | Direct OpenAI API (GPT-4o, etc.) | | **Anthropic** | `anthropic` | Direct Anthropic API (Claude) | -| **Ollama** | HTTP to localhost:11434 | Local models (Llama, Mistral, Phi, etc.) | + +#### Local AI via llama-server (llama.cpp) + +The app bundles `llama-server` from the llama.cpp project (MIT license). This is the default AI provider — it runs entirely on the user's machine with no internet connection or separate install required. + +**How it works:** +1. 
Rust backend spawns `llama-server` as a managed subprocess on app launch (or on first AI use) +2. llama-server exposes an OpenAI-compatible REST API on `localhost:{dynamic_port}` +3. Python sidecar talks to it using the same OpenAI SDK interface as cloud providers +4. On app exit, Rust backend cleanly shuts down the llama-server process + +**Model management:** +- Models stored in `~/.voicetonotes/models/` (GGUF format) +- First-run setup downloads a recommended small model (e.g., Phi-3-mini, Llama-3-8B Q4) +- Users can download additional models or point to their own GGUF files +- Model selection in Settings UI with size/quality tradeoffs shown + +**Hardware utilization:** +- CPU: Works on any machine, uses all available cores +- NVIDIA GPU: CUDA acceleration when available +- The same CPU/GPU auto-detection used for Whisper applies here ### Context Window Strategy @@ -417,14 +440,14 @@ class AIProvider(ABC): ### Configuration -Users configure AI providers in Settings. API keys stored in OS keychain (libsecret on Linux, Windows Credential Manager). Local models (Ollama) require no keys. +Users configure AI providers in Settings. API keys for cloud providers stored in OS keychain (libsecret on Linux, Windows Credential Manager). The bundled llama-server requires no keys or internet. ```json { "ai": { - "default_provider": "ollama", + "default_provider": "local", "providers": { - "ollama": { "base_url": "http://localhost:11434", "model": "llama3:8b" }, + "local": { "model": "phi-3-mini-Q4_K_M.gguf", "gpu_layers": "auto" }, "openai": { "model": "gpt-4o" }, "anthropic": { "model": "claude-sonnet-4-20250514" }, "litellm": { "model": "gpt-4o" } @@ -530,7 +553,8 @@ Add AI provider support for Q&A and summarization. 
**Deliverables:** - Provider configuration UI with API key management -- Ollama local model support +- Bundled llama-server for local AI (default, no internet required) +- Model download manager for local GGUF models - OpenAI and Anthropic direct SDK support - LiteLLM gateway support - Chat panel for asking questions about the transcript @@ -563,6 +587,6 @@ For parallel development, the codebase splits into these independent workstreams | **Agent 5: Diarization Pipeline** | pyannote.audio integration, speaker-word alignment, combined pipeline | Agent 4 (transcription) | | **Agent 6: Audio Player + Transcript UI** | wavesurfer.js integration, TipTap transcript editor, playback-transcript sync | Agent 1 (shell), Agent 3 (DB) | | **Agent 7: Export System** | pysubs2 caption export, text formatters, export UI | Agent 2 (IPC), Agent 3 (DB) | -| **Agent 8: AI Provider System** | Provider abstraction, LiteLLM/OpenAI/Anthropic/Ollama adapters, chat UI | Agent 2 (IPC), Agent 1 (shell) | +| **Agent 8: AI Provider System** | Provider abstraction, bundled llama-server, LiteLLM/OpenAI/Anthropic adapters, chat UI | Agent 2 (IPC), Agent 1 (shell) | Agents 1, 2, and 3 can start immediately in parallel. Agents 4-8 follow once their dependencies are in place.
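
---

As an illustrative sketch of the lifecycle design described above (not part of the patch): the Rust backend allocates a dynamic localhost port and spawns `llama-server` against it. The idea is shown here in Python for brevity; the real implementation lives in `llama_server.rs`. The flag names follow llama.cpp's server CLI (`--model`, `--port`, `--n-gpu-layers`) but should be verified against the bundled binary's `--help`, and `llama_server_args` is a hypothetical helper name.

```python
import socket

def allocate_port() -> int:
    """Ask the OS for a free TCP port by binding to port 0, then release it.

    The port could in principle be reclaimed by another process before
    llama-server binds it, so a real implementation would retry on failure.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]

def llama_server_args(binary: str, model_path: str, port: int,
                      gpu_layers: int = 0) -> list:
    """Build an argument vector for spawning llama-server as a subprocess."""
    return [
        binary,
        "--model", model_path,          # GGUF model file
        "--port", str(port),            # dynamically allocated above
        "--n-gpu-layers", str(gpu_layers),  # 0 = pure CPU; >0 offloads to GPU
    ]

port = allocate_port()
args = llama_server_args("llama-server", "phi-3-mini-Q4_K_M.gguf", port)
```

On shutdown, the backend would terminate the child process it spawned with these arguments, mirroring the "cleanly shuts down the llama-server process" step in the patch.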
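And a sketch of step 3 of "How it works" (again illustrative, not part of the patch): because llama-server exposes the OpenAI chat-completions convention, the sidecar's `local_provider.py` can build the same request shape it uses for cloud providers, just pointed at localhost. This uses only the standard library; `build_chat_request` is a hypothetical helper, and the base URL/port are assumptions.

```python
import json
import urllib.request

def build_chat_request(base_url: str, model: str,
                       messages: list) -> urllib.request.Request:
    """Build a POST against the local server's OpenAI-compatible endpoint."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",   # OpenAI-compatible path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request(
    "http://localhost:8080",                 # assumed dynamic port
    "phi-3-mini-Q4_K_M.gguf",
    [{"role": "user", "content": "Summarize this transcript."}],
)
# req is ready for urllib.request.urlopen(req) once llama-server is running
```

Because the request shape is identical to the cloud providers', no API key is attached for the local case, matching the "requires no keys or internet" note in the patch.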