MacroPad/voice-to-notes

Fork 0

T

Claude db770c341d

Build Sidecars / Bump sidecar version and tag (push) Successful in 9s

Details

Release / Bump version and tag (push) Successful in 5s

Details

Build Sidecars / Build Sidecar (macOS) (push) Successful in 3m37s

Details

Release / Build App (macOS) (push) Successful in 1m16s

Details

Build Sidecars / Build Sidecar (Linux) (push) Successful in 14m3s

Details

Release / Build App (Linux) (push) Successful in 4m45s

Details

Build Sidecars / Build Sidecar (Windows) (push) Successful in 24m32s

Details

Release / Build App (Windows) (push) Successful in 3m12s

Details

Fix CSP blocking IPC/assets + fix pyannote AudioDecoder crash

CSP: Add connect-src for ipc.localhost and asset.localhost so Tauri IPC
commands and local file loading (waveform, audio playback) work.

pyannote: Block torchcodec in sys.modules at startup so pyannote.audio
falls back to torchaudio for audio decoding. pyannote has a bug where
it uses AudioDecoder unconditionally even when torchcodec import fails.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-22 09:54:21 -07:00

.claude

Merge perf/stream-segments: streaming partial transcript segments and speaker updates

2026-03-20 13:51:51 -07:00

.gitea/workflows

Fix workflow race condition and sidecar path filter

2026-03-22 08:46:34 -07:00

docs

Phase 1 foundation: Tauri shell, Python sidecar, SQLite database

2026-02-26 15:16:06 -08:00

python

Fix CSP blocking IPC/assets + fix pyannote AudioDecoder crash

2026-03-22 09:54:21 -07:00

scripts

Phase 1 foundation: Tauri shell, Python sidecar, SQLite database

2026-02-26 15:16:06 -08:00

src

Download sidecar on first launch instead of bundling

2026-03-22 07:09:10 -07:00

src-tauri

Fix CSP blocking IPC/assets + fix pyannote AudioDecoder crash

2026-03-22 09:54:21 -07:00

static

Phase 1 foundation: Tauri shell, Python sidecar, SQLite database

2026-02-26 15:16:06 -08:00

.gitignore

Fix sidecar.zip not bundled: move resources config into tauri.conf.json

2026-03-21 07:33:02 -07:00

CLAUDE.md

Cross-platform distribution, UI improvements, and performance optimizations

2026-03-20 21:33:43 -07:00

LICENSE

Switch local AI from Ollama to bundled llama-server, add MIT license

2026-02-26 09:00:47 -08:00

package-lock.json

Download sidecar on first launch instead of bundling

2026-03-22 07:09:10 -07:00

package.json

chore: bump version to 0.2.16 [skip ci]

2026-03-22 16:27:25 +00:00

README.md

Cross-platform distribution, UI improvements, and performance optimizations

2026-03-20 21:33:43 -07:00

RESEARCH_REPORT.md

Add STT and diarization research report

2026-02-26 16:44:58 -08:00

svelte.config.js

Phase 1 foundation: Tauri shell, Python sidecar, SQLite database

2026-02-26 15:16:06 -08:00

tsconfig.json

Phase 1 foundation: Tauri shell, Python sidecar, SQLite database

2026-02-26 15:16:06 -08:00

vite.config.js

Phase 1 foundation: Tauri shell, Python sidecar, SQLite database

2026-02-26 15:16:06 -08:00

README.md

Voice to Notes

A desktop application that transcribes audio/video recordings with speaker identification, producing editable transcriptions with synchronized audio playback.

Features

Speech-to-Text Transcription — Accurate transcription via faster-whisper (Whisper models) with word-level timestamps
Speaker Identification (Diarization) — Detect and distinguish between speakers using pyannote.audio
Synchronized Playback — Click any word to seek to that point in the audio (Web Audio API for instant playback)
AI Integration — Ask questions about your transcript via OpenAI, Anthropic, or any OpenAI-compatible API (LiteLLM proxies, Ollama, vLLM)
Export Formats — SRT, WebVTT, ASS captions, plain text, and Markdown with speaker labels
Cross-Platform — Builds for Linux, Windows, and macOS (Apple Silicon)

Platform Support

Platform	Architecture	Status
Linux	x86_64	Supported
Windows	x86_64	Supported
macOS	ARM (Apple Silicon)	Supported

Tech Stack

Desktop shell: Tauri v2 (Rust backend + Svelte 5 / TypeScript frontend)
ML pipeline: Python sidecar (faster-whisper, pyannote.audio) — frozen via PyInstaller for distribution
Audio playback: wavesurfer.js with Web Audio API backend
AI providers: OpenAI, Anthropic, OpenAI-compatible endpoints (local or remote)
Local AI: Bundled llama-server (llama.cpp)
Caption export: pysubs2

Development

Prerequisites

Node.js 20+
Rust (stable)
Python 3.11+ with ML dependencies
System: libgtk-3-dev, libwebkit2gtk-4.1-dev (Linux)

Getting Started

# Install frontend dependencies
npm install

# Install Python sidecar dependencies
cd python && pip install -e . && cd ..

# Run in dev mode (uses system Python for the sidecar)
npm run tauri:dev

Building for Distribution

# Build the frozen Python sidecar
npm run sidecar:build

# Build the Tauri app (requires sidecar in src-tauri/binaries/)
npm run tauri build

CI/CD

Gitea Actions workflows are in .gitea/workflows/. The build pipeline:

Build sidecar — PyInstaller-frozen Python binary per platform (CPU-only PyTorch)
Build Tauri app — Bundles the sidecar via externalBin, produces .deb/.AppImage (Linux), .msi (Windows), .dmg (macOS)

Required Secrets

Secret	Purpose	Required?
`TAURI_SIGNING_PRIVATE_KEY`	Signs Tauri update bundles	Optional (for auto-updates)

No other secrets are needed for building. AI provider API keys and HuggingFace tokens are configured by end users in the app's Settings.

Project Structure

src/                    # Svelte 5 frontend
src-tauri/              # Rust backend (Tauri commands, sidecar manager, SQLite)
python/                 # Python sidecar (transcription, diarization, AI)
  voice_to_notes/       # Python package
  build_sidecar.py      # PyInstaller build script
  voice_to_notes.spec   # PyInstaller spec
.gitea/workflows/       # Gitea Actions CI/CD

License

MIT

Releases 10

Voice to Notes v0.2.46 Latest

2026-03-24 02:04:26 +00:00

Languages

Python 36.6%

Svelte 30.3%

Rust 29.6%

TypeScript 2.2%

Shell 0.5%

Other 0.8%