Voice to Notes

A desktop application that transcribes audio and video recordings with speaker identification, producing editable transcripts with synchronized audio playback.

Features

  • Speech-to-Text Transcription — Accurate transcription via faster-whisper (Whisper models) with word-level timestamps (see the sketch after this list)
  • Speaker Identification (Diarization) — Detect and distinguish between speakers using pyannote.audio
  • Synchronized Playback — Click any word to seek to that point in the audio (Web Audio API for instant playback)
  • AI Integration — Ask questions about your transcript via OpenAI, Anthropic, or any OpenAI-compatible API (LiteLLM proxies, Ollama, vLLM)
  • Export Formats — SRT, WebVTT, ASS captions, plain text, and Markdown with speaker labels
  • Cross-Platform — Builds for Linux, Windows, and macOS (Apple Silicon)
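
The transcription and click-to-seek features rely on word-level timestamps from faster-whisper. A minimal sketch of that call, assuming faster-whisper's Python API; the model size, device, and file path are placeholders, and the app's actual sidecar code may differ:

from faster_whisper import WhisperModel

# Placeholder model/device; the app selects and ships its own models.
model = WhisperModel("small", device="cpu", compute_type="int8")

# word_timestamps=True attaches per-word start/end times to every segment.
segments, info = model.transcribe("recording.wav", word_timestamps=True)

for segment in segments:
    for word in segment.words:
        # These per-word timings are what make click-to-seek playback possible.
        print(f"{word.start:6.2f}-{word.end:6.2f}  {word.word}")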

Platform Support

Platform    Architecture           Status
Linux       x86_64                 Supported
Windows     x86_64                 Supported
macOS       ARM (Apple Silicon)    Supported

Tech Stack

  • Desktop shell: Tauri v2 (Rust backend + Svelte 5 / TypeScript frontend)
  • ML pipeline: Python sidecar (faster-whisper, pyannote.audio) — frozen via PyInstaller for distribution
  • Audio playback: wavesurfer.js with Web Audio API backend
  • AI providers: OpenAI, Anthropic, OpenAI-compatible endpoints (local or remote; see the sketch after this list)
  • Local AI: Bundled llama-server (llama.cpp)
  • Caption export: pysubs2
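
OpenAI-compatible endpoints, including the bundled llama-server, all speak the same chat-completions protocol. A minimal sketch of such a request using the openai Python client; the endpoint, port, and model name are placeholders, and the app's real client code may differ:

from openai import OpenAI

# Placeholder endpoint: llama-server defaults to port 8080, but the app may
# launch it elsewhere; any OpenAI-compatible endpoint works the same way.
client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="local-model",  # placeholder; llama-server serves whatever model it loaded
    messages=[
        {"role": "system", "content": "You answer questions about a transcript."},
        {"role": "user", "content": "Summarize the main decisions discussed."},
    ],
)
print(response.choices[0].message.content)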

Development

Prerequisites

  • Node.js 20+
  • Rust (stable)
  • Python 3.11+ with ML dependencies
  • System: libgtk-3-dev, libwebkit2gtk-4.1-dev (Linux)

Getting Started

# Install frontend dependencies
npm install

# Install Python sidecar dependencies
cd python && pip install -e . && cd ..

# Run in dev mode (uses system Python for the sidecar)
npm run tauri:dev

Building for Distribution

# Build the frozen Python sidecar
npm run sidecar:build

# Build the Tauri app (requires sidecar in src-tauri/binaries/)
npm run tauri build

CI/CD

Gitea Actions workflows are in .gitea/workflows/. The build pipeline:

  1. Build sidecar — PyInstaller-frozen Python binary per platform (CPU-only PyTorch); a rough sketch of this step follows the list
  2. Build Tauri app — Bundles the sidecar via externalBin, produces .deb/.AppImage (Linux), .msi (Windows), .dmg (macOS)
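
Step 1 boils down to a PyInstaller freeze driven by python/build_sidecar.py and voice_to_notes.spec. A minimal sketch of that freeze, assuming the spec file encodes the platform details; the real build script also has to name the binary the way Tauri's externalBin lookup expects:

import PyInstaller.__main__

# Build from the spec file; --noconfirm overwrites any previous dist/ output.
# Platform specifics (CPU-only PyTorch, hidden imports) are assumed to live in
# the spec, and the resulting binary still needs the target-triple suffix that
# Tauri's externalBin lookup expects.
PyInstaller.__main__.run([
    "voice_to_notes.spec",
    "--noconfirm",
])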

Required Secrets

Secret                       Purpose                       Required?
TAURI_SIGNING_PRIVATE_KEY    Signs Tauri update bundles    Optional (for auto-updates)

No other secrets are needed for building. AI provider API keys and HuggingFace tokens are configured by end users in the app's Settings.

Project Structure

src/                    # Svelte 5 frontend
src-tauri/              # Rust backend (Tauri commands, sidecar manager, SQLite)
python/                 # Python sidecar (transcription, diarization, AI)
  voice_to_notes/       # Python package
  build_sidecar.py      # PyInstaller build script
  voice_to_notes.spec   # PyInstaller spec
.gitea/workflows/       # Gitea Actions CI/CD

License

MIT
