Switch local AI from Ollama to bundled llama-server, add MIT license

- Replace Ollama dependency with bundled llama-server (llama.cpp)
  so users need no separate install for local AI inference
- Rust backend manages llama-server lifecycle (spawn, port, shutdown)
- Add MIT license for open source release
- Update architecture doc, CLAUDE.md, and README accordingly

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 09:00:47 -08:00
parent 0edb06a913
commit c450ef3c0c
4 changed files with 61 additions and 13 deletions


@@ -7,7 +7,8 @@ Desktop app for transcribing audio/video with speaker identification. Runs local
 - **Desktop shell:** Tauri v2 (Rust backend + Svelte/TypeScript frontend)
 - **ML pipeline:** Python sidecar process (faster-whisper, pyannote.audio, wav2vec2)
 - **Database:** SQLite (via rusqlite in Rust)
-- **AI providers:** LiteLLM, OpenAI, Anthropic, Ollama (local)
+- **Local AI:** Bundled llama-server (llama.cpp) — default, no install needed
+- **Cloud AI providers:** LiteLLM, OpenAI, Anthropic (optional, user-configured)
 - **Caption export:** pysubs2 (Python)
 - **Audio UI:** wavesurfer.js
 - **Transcript editor:** TipTap (ProseMirror)
@@ -15,7 +16,9 @@ Desktop app for transcribing audio/video with speaker identification. Runs local
 ## Key Architecture Decisions
 - Python sidecar communicates with Rust via JSON-line IPC (stdin/stdout)
 - All ML models must work on CPU. GPU (CUDA) is optional acceleration.
-- AI cloud providers are optional. Local models (Ollama) are a first-class option.
+- AI cloud providers are optional. Bundled llama-server (llama.cpp) is the default local AI — no separate install needed.
+- Rust backend manages llama-server lifecycle (start/stop/port allocation).
+- Project is open source (MIT license).
 - SQLite database is per-project, stored alongside media files.
 - Word-level timestamps are required for click-to-seek playback sync.

LICENSE

@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 Voice to Notes Contributors
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.


@@ -27,4 +27,4 @@ A desktop application that transcribes audio/video recordings with speaker ident
 ## License
-TBD
+MIT


@@ -162,11 +162,12 @@ src/
 The Rust layer is intentionally thin. It handles:
-1. **Process Management** — Spawn, monitor, and kill the Python sidecar
+1. **Process Management** — Spawn, monitor, and kill the Python sidecar and llama-server
 2. **IPC Relay** — Forward messages between frontend and Python process
 3. **File Operations** — Read/write project files, manage media
 4. **SQLite** — All database operations via rusqlite
 5. **System Info** — Detect GPU, RAM, CPU for hardware recommendations
+6. **llama-server Lifecycle** — Start/stop bundled llama-server, manage port allocation
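Item 6's start-then-clean-shutdown pattern can be sketched as below. This is an illustrative Python sketch only (the real implementation lives in Rust, in the new `llama_server.rs`), and the demo uses a stand-in sleep command rather than the actual llama-server binary:

```python
import atexit
import subprocess
import sys


class LlamaServerManager:
    """Illustrative lifecycle manager: spawn, track, and shut down a child process."""

    def __init__(self, command):
        self.command = command
        self.process = None

    def start(self):
        # Spawn the server as a subprocess we own.
        self.process = subprocess.Popen(self.command)
        # Best-effort cleanup if the app exits without calling stop().
        atexit.register(self.stop)

    def is_running(self):
        return self.process is not None and self.process.poll() is None

    def stop(self):
        if self.is_running():
            self.process.terminate()  # ask politely first
            try:
                self.process.wait(timeout=5)
            except subprocess.TimeoutExpired:
                self.process.kill()   # then force-kill if it hangs


# Demo with a harmless stand-in command instead of the real llama-server binary.
manager = LlamaServerManager([sys.executable, "-c", "import time; time.sleep(30)"])
manager.start()
running_before = manager.is_running()
manager.stop()
running_after = manager.is_running()
```

The terminate-then-kill escalation mirrors what a supervising Rust backend would do so a crashed UI never leaves an orphaned inference server behind.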
 ```
 src-tauri/
@@ -179,6 +180,7 @@ src-tauri/
 ai.rs # AI provider commands
 settings.rs # App settings and preferences
 system.rs # Hardware detection
+llama_server.rs # llama-server process lifecycle
 db/
 mod.rs # SQLite connection pool
 schema.rs # Table definitions and migrations
@@ -215,7 +217,7 @@ python/
 litellm_provider.py # LiteLLM (multi-provider gateway)
 openai_provider.py # Direct OpenAI SDK
 anthropic_provider.py # Direct Anthropic SDK
-ollama_provider.py # Local Ollama models
+local_provider.py # Bundled llama-server (OpenAI-compatible API)
 hardware/
 __init__.py
 detect.py # GPU/CPU detection, VRAM estimation
@@ -399,12 +401,33 @@ class AIProvider(ABC):
 ### Supported Providers
-| Provider | Package | Use Case |
-|----------|---------|----------|
+| Provider | Package / Binary | Use Case |
+|----------|-----------------|----------|
+| **llama-server** (bundled) | llama.cpp binary | Default local AI — bundled with app, no install needed. OpenAI-compatible API on localhost. |
 | **LiteLLM** | `litellm` | Gateway to 100+ providers via unified API |
 | **OpenAI** | `openai` | Direct OpenAI API (GPT-4o, etc.) |
 | **Anthropic** | `anthropic` | Direct Anthropic API (Claude) |
-| **Ollama** | HTTP to localhost:11434 | Local models (Llama, Mistral, Phi, etc.) |
+
+#### Local AI via llama-server (llama.cpp)
+
+The app bundles `llama-server` from the llama.cpp project (MIT license). This is the default AI provider — it runs entirely on the user's machine with no internet connection or separate install required.
+
+**How it works:**
+
+1. Rust backend spawns `llama-server` as a managed subprocess on app launch (or on first AI use)
+2. llama-server exposes an OpenAI-compatible REST API on `localhost:{dynamic_port}`
+3. Python sidecar talks to it using the same OpenAI SDK interface as cloud providers
+4. On app exit, Rust backend cleanly shuts down the llama-server process
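The dynamic-port part of steps 1-2 can be sketched as follows. The port-0 trick is standard; the llama-server flag names (`--model`, `--host`, `--port`) are taken from llama.cpp's CLI but should be verified against the bundled binary's `--help` before being relied on:

```python
import socket


def pick_free_port() -> int:
    """Ask the OS for a currently free TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def build_server_command(binary: str, model_path: str, port: int) -> list[str]:
    # Assumed flag names, modeled on llama.cpp's llama-server CLI.
    return [binary, "--model", model_path, "--host", "127.0.0.1", "--port", str(port)]


port = pick_free_port()
cmd = build_server_command("llama-server", "/path/to/model.gguf", port)
```

Note the small race window: the socket is closed before llama-server binds the port, so the supervisor should retry with a fresh port if startup fails with an address-in-use error.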
+
+**Model management:**
+
+- Models stored in `~/.voicetonotes/models/` (GGUF format)
+- First-run setup downloads a recommended small model (e.g., Phi-3-mini, Llama-3-8B Q4)
+- Users can download additional models or point to their own GGUF files
+- Model selection in Settings UI with size/quality tradeoffs shown
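Discovering installed models for that Settings list reduces to scanning the models directory for GGUF files. A minimal sketch (the demo uses a throwaway temp directory rather than the real `~/.voicetonotes/models/`):

```python
import tempfile
from pathlib import Path


def list_local_models(models_dir: Path) -> list[str]:
    """Return the GGUF model files available to the local provider, sorted by name."""
    if not models_dir.is_dir():
        return []  # first run: directory may not exist yet
    return sorted(p.name for p in models_dir.glob("*.gguf"))


# Demo against a temporary directory with one model file and one stray file.
tmp = Path(tempfile.mkdtemp())
(tmp / "phi-3-mini-Q4_K_M.gguf").touch()
(tmp / "notes.txt").touch()
models = list_local_models(tmp)
```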
+
+**Hardware utilization:**
+
+- CPU: Works on any machine, uses all available cores
+- NVIDIA GPU: CUDA acceleration when available
+- The same CPU/GPU auto-detection used for Whisper applies here
 ### Context Window Strategy
@@ -417,14 +440,14 @@ class AIProvider(ABC):
 ### Configuration
-Users configure AI providers in Settings. API keys stored in OS keychain (libsecret on Linux, Windows Credential Manager). Local models (Ollama) require no keys.
+Users configure AI providers in Settings. API keys for cloud providers stored in OS keychain (libsecret on Linux, Windows Credential Manager). The bundled llama-server requires no keys or internet.
 ```json
 {
   "ai": {
-    "default_provider": "ollama",
+    "default_provider": "local",
     "providers": {
-      "ollama": { "base_url": "http://localhost:11434", "model": "llama3:8b" },
+      "local": { "model": "phi-3-mini-Q4_K_M.gguf", "gpu_layers": "auto" },
       "openai": { "model": "gpt-4o" },
       "anthropic": { "model": "claude-sonnet-4-20250514" },
       "litellm": { "model": "gpt-4o" }
     }
   }
 }
 ```
@@ -530,7 +553,8 @@ Add AI provider support for Q&A and summarization.
 **Deliverables:**
 - Provider configuration UI with API key management
-- Ollama local model support
+- Bundled llama-server for local AI (default, no internet required)
+- Model download manager for local GGUF models
 - OpenAI and Anthropic direct SDK support
 - LiteLLM gateway support
 - Chat panel for asking questions about the transcript
@@ -563,6 +587,6 @@ For parallel development, the codebase splits into these independent workstreams
 | **Agent 5: Diarization Pipeline** | pyannote.audio integration, speaker-word alignment, combined pipeline | Agent 4 (transcription) |
 | **Agent 6: Audio Player + Transcript UI** | wavesurfer.js integration, TipTap transcript editor, playback-transcript sync | Agent 1 (shell), Agent 3 (DB) |
 | **Agent 7: Export System** | pysubs2 caption export, text formatters, export UI | Agent 2 (IPC), Agent 3 (DB) |
-| **Agent 8: AI Provider System** | Provider abstraction, LiteLLM/OpenAI/Anthropic/Ollama adapters, chat UI | Agent 2 (IPC), Agent 1 (shell) |
+| **Agent 8: AI Provider System** | Provider abstraction, bundled llama-server, LiteLLM/OpenAI/Anthropic adapters, chat UI | Agent 2 (IPC), Agent 1 (shell) |
 Agents 1, 2, and 3 can start immediately in parallel. Agents 4-8 follow once their dependencies are in place.