MacroPad/voice-to-notes

Fork 0

T

Claude f9226ee4d0

Build Sidecars / Bump sidecar version and tag (push) Successful in 3s

Details

Release / Bump version and tag (push) Successful in 3s

Details

Build Sidecars / Build Sidecar (macOS) (push) Successful in 3m58s

Details

Release / Build App (macOS) (push) Successful in 1m20s

Details

Release / Build App (Linux) (push) Has been cancelled

Details

Release / Build App (Windows) (push) Has been cancelled

Details

Build Sidecars / Build Sidecar (Linux) (push) Successful in 13m41s

Details

Build Sidecars / Build Sidecar (Windows) (push) Successful in 34m33s

Details

Fix diarization: use soundfile instead of torchaudio for audio loading

torchaudio 2.10 unconditionally delegates load() to torchcodec, ignoring
the backend parameter. Since torchcodec is excluded from PyInstaller,
this broke our pyannote Audio monkey-patch.

Fix: replace torchaudio.load() with soundfile.read() + torch.from_numpy().
soundfile handles WAV natively (audio is pre-converted to WAV), has no
torchcodec dependency, and is already a transitive dependency.

Also added soundfile to PyInstaller hiddenimports.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-22 11:49:39 -07:00

.claude

Merge perf/stream-segments: streaming partial transcript segments and speaker updates

2026-03-20 13:51:51 -07:00

.gitea/workflows

Fix workflow race condition and sidecar path filter

2026-03-22 08:46:34 -07:00

docs

Phase 1 foundation: Tauri shell, Python sidecar, SQLite database

2026-02-26 15:16:06 -08:00

python

Fix diarization: use soundfile instead of torchaudio for audio loading

2026-03-22 11:49:39 -07:00

scripts

Phase 1 foundation: Tauri shell, Python sidecar, SQLite database

2026-02-26 15:16:06 -08:00

src

Make DevTools a toggle in Settings > Developer tab

2026-03-22 10:55:50 -07:00

src-tauri

chore: bump version to 0.2.19 [skip ci]

2026-03-22 18:00:07 +00:00

static

Phase 1 foundation: Tauri shell, Python sidecar, SQLite database

2026-02-26 15:16:06 -08:00

.gitignore

Fix sidecar.zip not bundled: move resources config into tauri.conf.json

2026-03-21 07:33:02 -07:00

CLAUDE.md

Cross-platform distribution, UI improvements, and performance optimizations

2026-03-20 21:33:43 -07:00

LICENSE

Switch local AI from Ollama to bundled llama-server, add MIT license

2026-02-26 09:00:47 -08:00

package-lock.json

Download sidecar on first launch instead of bundling

2026-03-22 07:09:10 -07:00

package.json

chore: bump version to 0.2.19 [skip ci]

2026-03-22 18:00:07 +00:00

README.md

Cross-platform distribution, UI improvements, and performance optimizations

2026-03-20 21:33:43 -07:00

RESEARCH_REPORT.md

Add STT and diarization research report

2026-02-26 16:44:58 -08:00

svelte.config.js

Phase 1 foundation: Tauri shell, Python sidecar, SQLite database

2026-02-26 15:16:06 -08:00

tsconfig.json

Phase 1 foundation: Tauri shell, Python sidecar, SQLite database

2026-02-26 15:16:06 -08:00

vite.config.js

Phase 1 foundation: Tauri shell, Python sidecar, SQLite database

2026-02-26 15:16:06 -08:00

README.md

Voice to Notes

A desktop application that transcribes audio/video recordings with speaker identification, producing editable transcriptions with synchronized audio playback.

Features

Speech-to-Text Transcription — Accurate transcription via faster-whisper (Whisper models) with word-level timestamps
Speaker Identification (Diarization) — Detect and distinguish between speakers using pyannote.audio
Synchronized Playback — Click any word to seek to that point in the audio (Web Audio API for instant playback)
AI Integration — Ask questions about your transcript via OpenAI, Anthropic, or any OpenAI-compatible API (LiteLLM proxies, Ollama, vLLM)
Export Formats — SRT, WebVTT, ASS captions, plain text, and Markdown with speaker labels
Cross-Platform — Builds for Linux, Windows, and macOS (Apple Silicon)

Platform Support

Platform	Architecture	Status
Linux	x86_64	Supported
Windows	x86_64	Supported
macOS	ARM (Apple Silicon)	Supported

Tech Stack

Desktop shell: Tauri v2 (Rust backend + Svelte 5 / TypeScript frontend)
ML pipeline: Python sidecar (faster-whisper, pyannote.audio) — frozen via PyInstaller for distribution
Audio playback: wavesurfer.js with Web Audio API backend
AI providers: OpenAI, Anthropic, OpenAI-compatible endpoints (local or remote)
Local AI: Bundled llama-server (llama.cpp)
Caption export: pysubs2

Development

Prerequisites

Node.js 20+
Rust (stable)
Python 3.11+ with ML dependencies
System: libgtk-3-dev, libwebkit2gtk-4.1-dev (Linux)

Getting Started

# Install frontend dependencies
npm install

# Install Python sidecar dependencies
cd python && pip install -e . && cd ..

# Run in dev mode (uses system Python for the sidecar)
npm run tauri:dev

Building for Distribution

# Build the frozen Python sidecar
npm run sidecar:build

# Build the Tauri app (requires sidecar in src-tauri/binaries/)
npm run tauri build

CI/CD

Gitea Actions workflows are in .gitea/workflows/. The build pipeline:

Build sidecar — PyInstaller-frozen Python binary per platform (CPU-only PyTorch)
Build Tauri app — Bundles the sidecar via externalBin, produces .deb/.AppImage (Linux), .msi (Windows), .dmg (macOS)

Required Secrets

Secret	Purpose	Required?
`TAURI_SIGNING_PRIVATE_KEY`	Signs Tauri update bundles	Optional (for auto-updates)

No other secrets are needed for building. AI provider API keys and HuggingFace tokens are configured by end users in the app's Settings.

Project Structure

src/                    # Svelte 5 frontend
src-tauri/              # Rust backend (Tauri commands, sidecar manager, SQLite)
python/                 # Python sidecar (transcription, diarization, AI)
  voice_to_notes/       # Python package
  build_sidecar.py      # PyInstaller build script
  voice_to_notes.spec   # PyInstaller spec
.gitea/workflows/       # Gitea Actions CI/CD

License

MIT

Releases 10

Voice to Notes v0.2.46 Latest

2026-03-24 02:04:26 +00:00

Languages

Python 36.6%

Svelte 30.3%

Rust 29.6%

TypeScript 2.2%

Shell 0.5%

Other 0.8%