Switch local AI from Ollama to bundled llama-server, add MIT license

- Replace Ollama dependency with bundled llama-server (llama.cpp)
  so users need no separate install for local AI inference
- Rust backend manages llama-server lifecycle (spawn, port, shutdown)
- Add MIT license for open source release
- Update architecture doc, CLAUDE.md, and README accordingly

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 09:00:47 -08:00
parent 0edb06a913
commit c450ef3c0c
4 changed files with 61 additions and 13 deletions


@@ -7,7 +7,8 @@ Desktop app for transcribing audio/video with speaker identification. Runs local
 - **Desktop shell:** Tauri v2 (Rust backend + Svelte/TypeScript frontend)
 - **ML pipeline:** Python sidecar process (faster-whisper, pyannote.audio, wav2vec2)
 - **Database:** SQLite (via rusqlite in Rust)
-- **AI providers:** LiteLLM, OpenAI, Anthropic, Ollama (local)
+- **Local AI:** Bundled llama-server (llama.cpp) — default, no install needed
+- **Cloud AI providers:** LiteLLM, OpenAI, Anthropic (optional, user-configured)
 - **Caption export:** pysubs2 (Python)
 - **Audio UI:** wavesurfer.js
 - **Transcript editor:** TipTap (ProseMirror)
@@ -15,7 +16,9 @@ Desktop app for transcribing audio/video with speaker identification. Runs local
 ## Key Architecture Decisions
 - Python sidecar communicates with Rust via JSON-line IPC (stdin/stdout)
 - All ML models must work on CPU. GPU (CUDA) is optional acceleration.
-- AI cloud providers are optional. Local models (Ollama) are a first-class option.
+- AI cloud providers are optional. Bundled llama-server (llama.cpp) is the default local AI — no separate install needed.
+- Rust backend manages llama-server lifecycle (start/stop/port allocation).
+- Project is open source (MIT license).
 - SQLite database is per-project, stored alongside media files.
 - Word-level timestamps are required for click-to-seek playback sync.

LICENSE

@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 Voice to Notes Contributors
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.


@@ -27,4 +27,4 @@ A desktop application that transcribes audio/video recordings with speaker ident
 ## License
-TBD
+MIT


@@ -162,11 +162,12 @@ src/
 The Rust layer is intentionally thin. It handles:
-1. **Process Management** — Spawn, monitor, and kill the Python sidecar
+1. **Process Management** — Spawn, monitor, and kill the Python sidecar and llama-server
 2. **IPC Relay** — Forward messages between frontend and Python process
 3. **File Operations** — Read/write project files, manage media
 4. **SQLite** — All database operations via rusqlite
 5. **System Info** — Detect GPU, RAM, CPU for hardware recommendations
+6. **llama-server Lifecycle** — Start/stop bundled llama-server, manage port allocation
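Item 6's start-then-clean-shutdown pattern can be sketched as below. This is an illustrative Python sketch only (the real implementation lives in Rust, in the new `llama_server.rs`), and the demo uses a stand-in sleep command rather than the actual llama-server binary:

```python
import atexit
import subprocess
import sys


class LlamaServerManager:
    """Illustrative lifecycle manager: spawn, track, and shut down a child process."""

    def __init__(self, command):
        self.command = command
        self.process = None

    def start(self):
        # Spawn the server as a subprocess we own.
        self.process = subprocess.Popen(self.command)
        # Best-effort cleanup if the app exits without calling stop().
        atexit.register(self.stop)

    def is_running(self):
        return self.process is not None and self.process.poll() is None

    def stop(self):
        if self.is_running():
            self.process.terminate()  # ask politely first
            try:
                self.process.wait(timeout=5)
            except subprocess.TimeoutExpired:
                self.process.kill()   # then force-kill if it hangs


# Demo with a harmless stand-in command instead of the real llama-server binary.
manager = LlamaServerManager([sys.executable, "-c", "import time; time.sleep(30)"])
manager.start()
running_before = manager.is_running()
manager.stop()
running_after = manager.is_running()
```

The terminate-then-kill escalation mirrors what a supervising Rust backend would do so a crashed UI never leaves an orphaned inference server behind.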
 ```
 src-tauri/
@@ -179,6 +180,7 @@ src-tauri/
 ai.rs # AI provider commands
 settings.rs # App settings and preferences
 system.rs # Hardware detection
+llama_server.rs # llama-server process lifecycle
 db/
 mod.rs # SQLite connection pool
 schema.rs # Table definitions and migrations
@@ -215,7 +217,7 @@ python/
 litellm_provider.py # LiteLLM (multi-provider gateway)
 openai_provider.py # Direct OpenAI SDK
 anthropic_provider.py # Direct Anthropic SDK
-ollama_provider.py # Local Ollama models
+local_provider.py # Bundled llama-server (OpenAI-compatible API)
 hardware/
 __init__.py
 detect.py # GPU/CPU detection, VRAM estimation
@@ -399,12 +401,33 @@ class AIProvider(ABC):
 ### Supported Providers
-| Provider | Package | Use Case |
-|----------|---------|----------|
+| Provider | Package / Binary | Use Case |
+|----------|-----------------|----------|
+| **llama-server** (bundled) | llama.cpp binary | Default local AI — bundled with app, no install needed. OpenAI-compatible API on localhost. |
 | **LiteLLM** | `litellm` | Gateway to 100+ providers via unified API |
 | **OpenAI** | `openai` | Direct OpenAI API (GPT-4o, etc.) |
 | **Anthropic** | `anthropic` | Direct Anthropic API (Claude) |
-| **Ollama** | HTTP to localhost:11434 | Local models (Llama, Mistral, Phi, etc.) |
+
+#### Local AI via llama-server (llama.cpp)
+
+The app bundles `llama-server` from the llama.cpp project (MIT license). This is the default AI provider — it runs entirely on the user's machine with no internet connection or separate install required.
+
+**How it works:**
+
+1. Rust backend spawns `llama-server` as a managed subprocess on app launch (or on first AI use)
+2. llama-server exposes an OpenAI-compatible REST API on `localhost:{dynamic_port}`
+3. Python sidecar talks to it using the same OpenAI SDK interface as cloud providers
+4. On app exit, Rust backend cleanly shuts down the llama-server process
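The dynamic-port part of steps 1-2 can be sketched as follows. The port-0 trick is standard; the llama-server flag names (`--model`, `--host`, `--port`) are taken from llama.cpp's CLI but should be verified against the bundled binary's `--help` before being relied on:

```python
import socket


def pick_free_port() -> int:
    """Ask the OS for a currently free TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def build_server_command(binary: str, model_path: str, port: int) -> list[str]:
    # Assumed flag names, modeled on llama.cpp's llama-server CLI.
    return [binary, "--model", model_path, "--host", "127.0.0.1", "--port", str(port)]


port = pick_free_port()
cmd = build_server_command("llama-server", "/path/to/model.gguf", port)
```

Note the small race window: the socket is closed before llama-server binds the port, so the supervisor should retry with a fresh port if startup fails with an address-in-use error.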
+
+**Model management:**
+
+- Models stored in `~/.voicetonotes/models/` (GGUF format)
+- First-run setup downloads a recommended small model (e.g., Phi-3-mini, Llama-3-8B Q4)
+- Users can download additional models or point to their own GGUF files
+- Model selection in Settings UI with size/quality tradeoffs shown
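Discovering installed models for that Settings list reduces to scanning the models directory for GGUF files. A minimal sketch (the demo uses a throwaway temp directory rather than the real `~/.voicetonotes/models/`):

```python
import tempfile
from pathlib import Path


def list_local_models(models_dir: Path) -> list[str]:
    """Return the GGUF model files available to the local provider, sorted by name."""
    if not models_dir.is_dir():
        return []  # first run: directory may not exist yet
    return sorted(p.name for p in models_dir.glob("*.gguf"))


# Demo against a temporary directory with one model file and one stray file.
tmp = Path(tempfile.mkdtemp())
(tmp / "phi-3-mini-Q4_K_M.gguf").touch()
(tmp / "notes.txt").touch()
models = list_local_models(tmp)
```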
+
+**Hardware utilization:**
+
+- CPU: Works on any machine, uses all available cores
+- NVIDIA GPU: CUDA acceleration when available
+- The same CPU/GPU auto-detection used for Whisper applies here
 ### Context Window Strategy
@@ -417,14 +440,14 @@ class AIProvider(ABC):
 ### Configuration
-Users configure AI providers in Settings. API keys stored in OS keychain (libsecret on Linux, Windows Credential Manager). Local models (Ollama) require no keys.
+Users configure AI providers in Settings. API keys for cloud providers stored in OS keychain (libsecret on Linux, Windows Credential Manager). The bundled llama-server requires no keys or internet.
 ```json
 {
   "ai": {
-    "default_provider": "ollama",
+    "default_provider": "local",
     "providers": {
-      "ollama": { "base_url": "http://localhost:11434", "model": "llama3:8b" },
+      "local": { "model": "phi-3-mini-Q4_K_M.gguf", "gpu_layers": "auto" },
       "openai": { "model": "gpt-4o" },
       "anthropic": { "model": "claude-sonnet-4-20250514" },
       "litellm": { "model": "gpt-4o" }
     }
   }
 }
 ```
@@ -530,7 +553,8 @@ Add AI provider support for Q&A and summarization.
 **Deliverables:**
 - Provider configuration UI with API key management
-- Ollama local model support
+- Bundled llama-server for local AI (default, no internet required)
+- Model download manager for local GGUF models
 - OpenAI and Anthropic direct SDK support
 - LiteLLM gateway support
 - Chat panel for asking questions about the transcript
@@ -563,6 +587,6 @@ For parallel development, the codebase splits into these independent workstreams
 | **Agent 5: Diarization Pipeline** | pyannote.audio integration, speaker-word alignment, combined pipeline | Agent 4 (transcription) |
 | **Agent 6: Audio Player + Transcript UI** | wavesurfer.js integration, TipTap transcript editor, playback-transcript sync | Agent 1 (shell), Agent 3 (DB) |
 | **Agent 7: Export System** | pysubs2 caption export, text formatters, export UI | Agent 2 (IPC), Agent 3 (DB) |
-| **Agent 8: AI Provider System** | Provider abstraction, LiteLLM/OpenAI/Anthropic/Ollama adapters, chat UI | Agent 2 (IPC), Agent 1 (shell) |
+| **Agent 8: AI Provider System** | Provider abstraction, bundled llama-server, LiteLLM/OpenAI/Anthropic adapters, chat UI | Agent 2 (IPC), Agent 1 (shell) |
 Agents 1, 2, and 3 can start immediately in parallel. Agents 4-8 follow once their dependencies are in place.