# Voice to Notes
A desktop application that transcribes audio/video recordings with speaker identification, producing editable transcriptions with synchronized audio playback.
## Goals
- **Speech-to-Text Transcription** — Accurately convert spoken audio from recordings into text
- **Speaker Identification (Diarization)** — Detect and distinguish between different speakers in a conversation
- **Speaker Naming** — Assign and persist speaker names/IDs across the transcription
- **Synchronized Playback** — Click any transcribed text segment to play back the corresponding audio for review and correction
- **Export Formats**
  - Closed captioning files (SRT, VTT) for video
  - Plain text documents with speaker labels
- **AI Integration** — Connect to AI providers to ask questions about the conversation and generate condensed notes/summaries
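To make the goals above concrete, here is a minimal sketch of what a diarized transcript segment and its SRT export might look like. The `Segment` struct and `to_srt_cue` function are hypothetical names for illustration, not the project's actual code; they show how a segment carrying a speaker label and millisecond timing (the same timing that would drive synchronized playback) could be rendered as a speaker-labelled SRT cue.

```rust
// Hypothetical sketch: the core unit a diarizing transcriber might emit.
// Names and fields are illustrative, not the project's actual API.
#[derive(Debug, Clone)]
struct Segment {
    speaker: String, // persisted speaker name/ID
    start_ms: u64,   // segment start; also used to seek audio playback
    end_ms: u64,
    text: String,
}

/// Render one segment as an SRT cue: index, timing line, speaker-labelled text.
fn to_srt_cue(index: usize, s: &Segment) -> String {
    // SRT timestamps use the form HH:MM:SS,mmm.
    fn ts(ms: u64) -> String {
        format!(
            "{:02}:{:02}:{:02},{:03}",
            ms / 3_600_000,
            ms / 60_000 % 60,
            ms / 1_000 % 60,
            ms % 1_000
        )
    }
    format!(
        "{}\n{} --> {}\n[{}] {}\n",
        index,
        ts(s.start_ms),
        ts(s.end_ms),
        s.speaker,
        s.text
    )
}

fn main() {
    let seg = Segment {
        speaker: "Speaker 1".into(),
        start_ms: 1_000,
        end_ms: 4_250,
        text: "Hello, welcome.".into(),
    };
    print!("{}", to_srt_cue(1, &seg));
    // Prints:
    // 1
    // 00:00:01,000 --> 00:00:04,250
    // [Speaker 1] Hello, welcome.
}
```

The plain-text export with speaker labels would be a simpler rendering of the same struct, which is why a single segment type can back both export formats and the click-to-play feature.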
## Platform Support
| Platform | Status |
|----------|--------|
| Linux | Planned (initial target) |
| Windows | Planned (initial target) |
| macOS | Future (pending hardware) |
## Project Status
**Early planning phase** — Architecture and technology decisions in progress.
## License
MIT