Commit Graph

12 Commits

Author SHA1 Message Date
d00281f0c7 Fix critical integration issues for end-to-end functionality
- Rewrite SidecarManager as singleton with OnceLock, reusing one Python
  process across all commands instead of spawning per call
- Separate stdin/stdout ownership with dedicated BufReader to prevent
  data corruption between wait_for_ready and send_and_receive
- Add ensure_running() for auto-start on first command
- Fix asset protocol URL: use convertFileSrc() instead of manual
  encodeURIComponent which broke file paths with slashes
- Add +layout.svelte with global dark theme, CSS reset, and custom
  scrollbar styling to prevent white flash on startup
- Register AppState with Tauri .manage(), initialize SQLite database
  on app startup at ~/.voicetonotes/voice_to_notes.db
- Wire project commands (create/get/list) to real database queries
  instead of placeholder stubs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 16:50:14 -08:00
d3c2954c5e Add STT and diarization research report
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 16:44:58 -08:00
97a1a15755 Phase 6: Llama-server manager, settings UI, packaging, and polish
- Implement LlamaManager in Rust for llama-server lifecycle: spawn with
  port allocation, health check, clean shutdown on Drop, model listing
- Add llama_start/stop/status/list_models Tauri commands
- Add load_settings/save_settings commands with JSON persistence
- Build SettingsModal with tabs for Transcription, AI Provider, Local AI
  settings (model size, device, language, API keys, provider selection)
- Wire settings into pipeline calls (model, device, language, skip diarization)
- Configure Tauri packaging: asset protocol for local audio files,
  CSP policy, bundle metadata, Linux .deb/.AppImage and Windows .msi config
- Add keyboard shortcuts: Space (play/pause), Ctrl+O (import),
  Ctrl+, (settings), Escape (close menus/modals)
- Close export dropdown on outside click
- Tests: 30 Python, 6 Rust, 0 Svelte errors

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 16:38:23 -08:00
d67625cd5a Phase 5: AI provider system with local and cloud support
- Implement AIProvider base interface with chat() and is_available()
- Add LocalProvider connecting to bundled llama-server via OpenAI SDK
- Add OpenAIProvider for direct OpenAI API access
- Add AnthropicProvider for Anthropic Claude API
- Add LiteLLMProvider for multi-provider gateway
- Build AIProviderService with provider routing, auto-selection,
  and transcript context injection
- Add ai.chat IPC handler supporting chat, list_providers, set_provider,
  and configure actions
- Add ai_chat, ai_list_providers, ai_configure Tauri commands
- Build interactive AIChatPanel with message history, quick actions
  (Summarize, Action Items), and transcript context awareness
- Tests: 30 Python, 6 Rust, 0 Svelte errors

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 16:25:10 -08:00
415a648a2b Phase 4: Export to SRT, WebVTT, ASS, plain text, and Markdown
- Implement ExportService using pysubs2 for caption formats (SRT, VTT, ASS)
  and custom formatters for plain text and Markdown
- SRT exports with [Speaker]: prefix, WebVTT with <v Speaker> voice tags,
  ASS with color-coded speaker styles
- Plain text groups by speaker with labels, Markdown adds timestamps
- Add export.start IPC handler and export_transcript Tauri command
- Add export dropdown menu in header (appears after transcription)
- Uses native save dialog for output file selection
- Add pysubs2 dependency
- Tests: 30 Python (6 export tests), 6 Rust, 0 Svelte errors

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 16:18:54 -08:00
44480906a4 Phase 3: Speaker diarization and full transcription pipeline
- Implement DiarizeService with pyannote.audio speaker detection
- Build PipelineService combining transcribe → diarize → merge with
  overlap-based speaker assignment per segment
- Add pipeline.start and diarize.start IPC handlers
- Add run_pipeline Tauri command for full pipeline execution
- Wire frontend to use pipeline: speakers auto-created with colors,
  segments assigned to detected speakers
- Build SpeakerManager with rename support (double-click or edit button)
- Add speaker color coding throughout transcript display
- Add pyannote.audio dependency
- Tests: 24 Python (including merge logic), 6 Rust, 0 Svelte errors

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 16:09:48 -08:00
842f8d5f90 Add auto-scroll, file dialog, and transcript editing
- Auto-scroll transcript to active segment during playback with smart
  pause when user manually scrolls (resumes after 3s)
- Replace prompt() with native Tauri file dialog for audio/video import
  with file type filters
- Add inline transcript editing via double-click with Enter to save,
  Esc to cancel, preserving original text for change tracking
- Show "edited" badge on modified segments

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 16:02:27 -08:00
48fe41b064 Phase 2: Core transcription pipeline and audio playback
- Implement faster-whisper TranscribeService with word-level timestamps,
  progress reporting, and hardware auto-detection
- Wire up Rust SidecarManager for Python process lifecycle (spawn, IPC, shutdown)
- Add transcribe_file Tauri command bridging frontend to Python sidecar
- Integrate wavesurfer.js WaveformPlayer with play/pause, skip, seek controls
- Build TranscriptEditor with word-level click-to-seek and active highlighting
- Connect file import flow: prompt → asset load → transcribe → display
- Add typed tauri-bridge service with TranscriptionResult interface
- Add Python tests for hardware detection and transcription result formatting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 15:53:09 -08:00
503cc6c0cf Phase 1 foundation: Tauri shell, Python sidecar, SQLite database
Tauri v2 + Svelte + TypeScript frontend:
- App shell with workspace layout (waveform, transcript, speakers, AI chat)
- Placeholder components for all major UI areas
- Typed stores (project, transcript, playback, AI)
- TypeScript interfaces matching the database schema
- Tauri bridge service with typed invoke wrappers
- svelte-check passes with 0 errors

Rust backend:
- Tauri v2 app entry point with command registration
- SQLite database layer (rusqlite with bundled SQLite)
  - Full schema: projects, media_files, speakers, segments, words,
    ai_outputs, annotations (with indexes)
  - Model structs with serde serialization
  - CRUD queries for projects, speakers, segments, words
  - Segment text editing preserves original text
  - Schema versioning for future migrations
  - 6 tests passing
- Command stubs for project, transcribe, export, AI, settings, system
- App state management

Python sidecar:
- JSON-line IPC protocol (stdin/stdout)
- Message types: IPCMessage, progress, error, ready
- Handler registry with routing and error handling
- Ping/pong handler for connectivity testing
- Service stubs: transcribe, diarize, pipeline, AI, export
- Provider stubs: local (llama-server), OpenAI, Anthropic, LiteLLM
- Hardware detection stubs
- 14 tests passing, ruff clean

Also adds:
- Testing strategy document (docs/TESTING.md)
- Validation script (scripts/validate.sh)
- Updated .gitignore for Svelte, Rust, Python artifacts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 15:16:06 -08:00
c450ef3c0c Switch local AI from Ollama to bundled llama-server, add MIT license
- Replace Ollama dependency with bundled llama-server (llama.cpp)
  so users need no separate install for local AI inference
- Rust backend manages llama-server lifecycle (spawn, port, shutdown)
- Add MIT license for open source release
- Update architecture doc, CLAUDE.md, and README accordingly

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 09:00:47 -08:00
0edb06a913 Add architecture document and project guidelines
Detailed architecture covering Tauri + Svelte frontend, Rust backend,
Python sidecar for ML (faster-whisper, pyannote.audio), IPC protocol,
SQLite schema, AI provider system, export formats, and phased
implementation plan with agent work breakdown.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 08:37:45 -08:00
d2bdbe3315 Initial project setup with README and gitignore
Establish the voice-to-notes project with documentation covering
goals, platform targets, and planned feature set.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 08:11:57 -08:00