# Voice to Notes
|
||
|
|
|
||
|
|
A desktop application that transcribes audio/video recordings with speaker identification, producing editable transcriptions with synchronized audio playback.
|
||
|
|
|
||
|
|
## Goals
- **Speech-to-Text Transcription**: Accurately convert spoken audio from recordings into text
- **Speaker Identification (Diarization)**: Detect and distinguish between different speakers in a conversation
- **Speaker Naming**: Assign and persist speaker names/IDs across the transcription
- **Synchronized Playback**: Click any transcribed text segment to play back the corresponding audio for review and correction
- **Export Formats**
  - Closed captioning files (SRT, VTT) for video
  - Plain text documents with speaker labels
- **AI Integration**: Connect to AI providers to ask questions about the conversation and generate condensed notes/summaries
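
To make the export goals concrete, here is a minimal Python sketch of how speaker-labelled segments could be rendered as SRT, VTT, and plain text. No language or architecture has been chosen for the project yet, so everything here is an assumption made for illustration: the `Segment` data model, its field names, and the helpers `format_timestamp`, `to_srt`, `to_vtt`, and `to_plain_text` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed utterance (hypothetical data model)."""
    start: float   # seconds from the start of the recording
    end: float     # seconds from the start of the recording
    speaker: str   # user-assigned name or a generated speaker ID
    text: str


def format_timestamp(seconds: float, sep: str = ",") -> str:
    """Render seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (VTT, sep='.')."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Build SRT text: numbered cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}\n"
            f"{seg.speaker}: {seg.text}\n"
        )
    return "\n".join(blocks)


def to_vtt(segments: list[Segment]) -> str:
    """Build WebVTT text, using voice spans (<v Name>) for speaker labels."""
    cues = []
    for seg in segments:
        cues.append(
            f"{format_timestamp(seg.start, sep='.')} --> "
            f"{format_timestamp(seg.end, sep='.')}\n"
            f"<v {seg.speaker}>{seg.text}\n"
        )
    return "WEBVTT\n\n" + "\n".join(cues)


def to_plain_text(segments: list[Segment]) -> str:
    """Build a plain text transcript with one 'Speaker: text' line per segment."""
    return "\n".join(f"{seg.speaker}: {seg.text}" for seg in segments)
```

Keeping start and end times on every segment would also serve the synchronized playback goal, since a clicked segment already carries the audio range to play.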
## Platform Support

| Platform | Status |
|----------|--------|
| Linux | Planned (initial target) |
| Windows | Planned (initial target) |
| macOS | Future (pending hardware) |
## Project Status

**Early planning phase**: architecture and technology decisions are in progress.
## License

TBD