Switch local AI from Ollama to bundled llama-server, add MIT license
- Replace Ollama dependency with bundled llama-server (llama.cpp) so users need no separate install for local AI inference
- Rust backend manages llama-server lifecycle (spawn, port, shutdown)
- Add MIT license for open source release
- Update architecture doc, CLAUDE.md, and README accordingly

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@@ -7,7 +7,8 @@ Desktop app for transcribing audio/video with speaker identification. Runs local
 - **Desktop shell:** Tauri v2 (Rust backend + Svelte/TypeScript frontend)
 - **ML pipeline:** Python sidecar process (faster-whisper, pyannote.audio, wav2vec2)
 - **Database:** SQLite (via rusqlite in Rust)
-- **AI providers:** LiteLLM, OpenAI, Anthropic, Ollama (local)
+- **Local AI:** Bundled llama-server (llama.cpp) — default, no install needed
+- **Cloud AI providers:** LiteLLM, OpenAI, Anthropic (optional, user-configured)
 - **Caption export:** pysubs2 (Python)
 - **Audio UI:** wavesurfer.js
 - **Transcript editor:** TipTap (ProseMirror)
@@ -15,7 +16,9 @@ Desktop app for transcribing audio/video with speaker identification. Runs local
 ## Key Architecture Decisions
 
 - Python sidecar communicates with Rust via JSON-line IPC (stdin/stdout)
 - All ML models must work on CPU. GPU (CUDA) is optional acceleration.
-- AI cloud providers are optional. Local models (Ollama) are a first-class option.
+- AI cloud providers are optional. Bundled llama-server (llama.cpp) is the default local AI — no separate install needed.
+- Rust backend manages llama-server lifecycle (start/stop/port allocation).
+- Project is open source (MIT license).
 - SQLite database is per-project, stored alongside media files.
 - Word-level timestamps are required for click-to-seek playback sync.
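The lifecycle decision above (start/stop/port allocation) can be sketched as follows. This is an illustrative Python sketch, not the actual Rust implementation: the binary path and wrapper class are assumptions, though `-m` and `--port` are real llama-server flags.

```python
import socket
import subprocess

def find_free_port() -> int:
    """Ask the OS for an unused TCP port by binding to port 0 and reading back the assignment."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]

class LlamaServerHandle:
    """Hypothetical wrapper: spawn the bundled llama-server, remember its port, stop it on exit."""

    def __init__(self, binary: str, model_path: str):
        self.port = find_free_port()
        # Flag set is illustrative; the real spawn logic would live in the Rust backend.
        self.proc = subprocess.Popen([binary, "-m", model_path, "--port", str(self.port)])

    def shutdown(self) -> None:
        self.proc.terminate()          # ask the process to exit cleanly
        try:
            self.proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            self.proc.kill()           # force-kill if it doesn't comply

port = find_free_port()
```

Asking the OS for port 0 avoids collisions with a fixed default (e.g. 8080) when other software on the machine already runs its own llama-server.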
LICENSE (new file, 21 lines)
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 Voice to Notes Contributors
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
@@ -27,4 +27,4 @@ A desktop application that transcribes audio/video recordings with speaker ident
 
 ## License
 
-TBD
+MIT
@@ -162,11 +162,12 @@ src/
 
 The Rust layer is intentionally thin. It handles:
 
-1. **Process Management** — Spawn, monitor, and kill the Python sidecar
+1. **Process Management** — Spawn, monitor, and kill the Python sidecar and llama-server
 2. **IPC Relay** — Forward messages between frontend and Python process
 3. **File Operations** — Read/write project files, manage media
 4. **SQLite** — All database operations via rusqlite
 5. **System Info** — Detect GPU, RAM, CPU for hardware recommendations
+6. **llama-server Lifecycle** — Start/stop bundled llama-server, manage port allocation
 
 ```
 src-tauri/
@@ -179,6 +180,7 @@ src-tauri/
   ai.rs            # AI provider commands
   settings.rs      # App settings and preferences
   system.rs        # Hardware detection
+  llama_server.rs  # llama-server process lifecycle
   db/
     mod.rs         # SQLite connection pool
     schema.rs      # Table definitions and migrations
@@ -215,7 +217,7 @@ python/
     litellm_provider.py    # LiteLLM (multi-provider gateway)
     openai_provider.py     # Direct OpenAI SDK
     anthropic_provider.py  # Direct Anthropic SDK
-    ollama_provider.py     # Local Ollama models
+    local_provider.py      # Bundled llama-server (OpenAI-compatible API)
   hardware/
     __init__.py
     detect.py              # GPU/CPU detection, VRAM estimation
@@ -399,12 +401,33 @@ class AIProvider(ABC):
 
 ### Supported Providers
 
-| Provider | Package | Use Case |
-|----------|---------|----------|
+| Provider | Package / Binary | Use Case |
+|----------|-----------------|----------|
+| **llama-server** (bundled) | llama.cpp binary | Default local AI — bundled with app, no install needed. OpenAI-compatible API on localhost. |
 | **LiteLLM** | `litellm` | Gateway to 100+ providers via unified API |
 | **OpenAI** | `openai` | Direct OpenAI API (GPT-4o, etc.) |
 | **Anthropic** | `anthropic` | Direct Anthropic API (Claude) |
-| **Ollama** | HTTP to localhost:11434 | Local models (Llama, Mistral, Phi, etc.) |
 
+#### Local AI via llama-server (llama.cpp)
+
+The app bundles `llama-server` from the llama.cpp project (MIT license). This is the default AI provider — it runs entirely on the user's machine with no internet connection or separate install required.
+
+**How it works:**
+
+1. Rust backend spawns `llama-server` as a managed subprocess on app launch (or on first AI use)
+2. llama-server exposes an OpenAI-compatible REST API on `localhost:{dynamic_port}`
+3. Python sidecar talks to it using the same OpenAI SDK interface as cloud providers
+4. On app exit, Rust backend cleanly shuts down the llama-server process
+
+**Model management:**
+
+- Models stored in `~/.voicetonotes/models/` (GGUF format)
+- First-run setup downloads a recommended small model (e.g., Phi-3-mini, Llama-3-8B Q4)
+- Users can download additional models or point to their own GGUF files
+- Model selection in Settings UI with size/quality tradeoffs shown
+
+**Hardware utilization:**
+
+- CPU: Works on any machine, uses all available cores
+- NVIDIA GPU: CUDA acceleration when available
+- The same CPU/GPU auto-detection used for Whisper applies here
 
 ### Context Window Strategy
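Because llama-server speaks the OpenAI chat-completions protocol, the sidecar's request differs from a cloud call only in its base URL. A minimal sketch of the request shape (the helper name, port, and model are illustrative; the real sidecar would reuse its OpenAI SDK client with `base_url` pointed at localhost):

```python
import json

def build_chat_request(port: int, model: str, messages: list) -> tuple:
    """Build the URL and JSON body for llama-server's OpenAI-compatible endpoint."""
    url = f"http://127.0.0.1:{port}/v1/chat/completions"
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return url, body

# Example: the payload shape is identical to what a cloud provider would receive.
url, body = build_chat_request(
    8080, "phi-3-mini", [{"role": "user", "content": "Summarize this transcript."}]
)
```

This is why step 3 above works: the provider abstraction never needs to know whether the endpoint is local or remote.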
@@ -417,14 +440,14 @@ class AIProvider(ABC):
 
 ### Configuration
 
-Users configure AI providers in Settings. API keys stored in OS keychain (libsecret on Linux, Windows Credential Manager). Local models (Ollama) require no keys.
+Users configure AI providers in Settings. API keys for cloud providers stored in OS keychain (libsecret on Linux, Windows Credential Manager). The bundled llama-server requires no keys or internet.
 
 ```json
 {
   "ai": {
-    "default_provider": "ollama",
+    "default_provider": "local",
     "providers": {
-      "ollama": { "base_url": "http://localhost:11434", "model": "llama3:8b" },
+      "local": { "model": "phi-3-mini-Q4_K_M.gguf", "gpu_layers": "auto" },
       "openai": { "model": "gpt-4o" },
       "anthropic": { "model": "claude-sonnet-4-20250514" },
       "litellm": { "model": "gpt-4o" }
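The settings shape above implies a small dispatch step in the sidecar: read `default_provider`, fetch its options, and fall back to the bundled `local` provider when the configured entry is absent. A sketch under those assumptions (the function name and fallback rule are not documented behavior):

```python
def resolve_provider(settings: dict) -> tuple:
    """Pick the configured default provider; fall back to the bundled local one."""
    ai = settings.get("ai", {})
    name = ai.get("default_provider", "local")
    providers = ai.get("providers", {})
    if name not in providers:
        name = "local"  # assumed fallback: the bundled provider always exists
    return name, providers.get(name, {})

# Mirrors the JSON above.
settings = {
    "ai": {
        "default_provider": "local",
        "providers": {
            "local": {"model": "phi-3-mini-Q4_K_M.gguf", "gpu_layers": "auto"},
            "openai": {"model": "gpt-4o"},
        },
    }
}
name, opts = resolve_provider(settings)
```

Falling back to `local` rather than erroring keeps AI features usable even if a cloud provider entry is removed or misconfigured.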
@@ -530,7 +553,8 @@ Add AI provider support for Q&A and summarization.
 
 **Deliverables:**
 - Provider configuration UI with API key management
-- Ollama local model support
+- Bundled llama-server for local AI (default, no internet required)
+- Model download manager for local GGUF models
 - OpenAI and Anthropic direct SDK support
 - LiteLLM gateway support
 - Chat panel for asking questions about the transcript
@@ -563,6 +587,6 @@ For parallel development, the codebase splits into these independent workstreams
 | **Agent 5: Diarization Pipeline** | pyannote.audio integration, speaker-word alignment, combined pipeline | Agent 4 (transcription) |
 | **Agent 6: Audio Player + Transcript UI** | wavesurfer.js integration, TipTap transcript editor, playback-transcript sync | Agent 1 (shell), Agent 3 (DB) |
 | **Agent 7: Export System** | pysubs2 caption export, text formatters, export UI | Agent 2 (IPC), Agent 3 (DB) |
-| **Agent 8: AI Provider System** | Provider abstraction, LiteLLM/OpenAI/Anthropic/Ollama adapters, chat UI | Agent 2 (IPC), Agent 1 (shell) |
+| **Agent 8: AI Provider System** | Provider abstraction, bundled llama-server, LiteLLM/OpenAI/Anthropic adapters, chat UI | Agent 2 (IPC), Agent 1 (shell) |
 
 Agents 1, 2, and 3 can start immediately in parallel. Agents 4-8 follow once their dependencies are in place.