MacroPad/voice-to-notes

Claude 908762073f

Release / Bump version and tag (push) Successful in 3s

Release / Build App (macOS) (push) Successful in 1m31s

Release / Build App (Windows) (push) Successful in 3m25s

Release / Build App (Linux) (push) Successful in 3m28s

Fix ffmpeg permission denied on Linux

The bundled ffmpeg in the sidecar extract dir lacked execute permissions.
Now sets chmod 755 on Unix when find_ffmpeg locates the bundled binary.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-23 13:18:51 -07:00
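The fix in this commit lives in the Rust backend under src-tauri. The Python sketch below only illustrates the same idea, assuming a POSIX system: `ensure_executable` is a hypothetical helper name, and the real `find_ffmpeg` may apply the mode differently.

```python
import os
import stat
import tempfile
from pathlib import Path


def ensure_executable(binary: Path) -> bool:
    """Set mode 755 on a bundled binary on Unix; return True if the mode changed.

    Hypothetical helper that illustrates the fix, not the project's actual code.
    """
    if os.name != "posix":
        # Windows has no execute bit, so there is nothing to change.
        return False
    current = stat.S_IMODE(binary.stat().st_mode)
    if current == 0o755:
        return False
    binary.chmod(0o755)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as extract_dir:
        ffmpeg = Path(extract_dir) / "ffmpeg"
        ffmpeg.write_bytes(b"")
        ffmpeg.chmod(0o644)  # simulate an archive extracted without the execute bit
        print(ensure_executable(ffmpeg), oct(stat.S_IMODE(ffmpeg.stat().st_mode)))
```

Checking the current mode first keeps the call idempotent, so it is safe to run every time the binary is located.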

.claude

Merge perf/stream-segments: streaming partial transcript segments and speaker updates

2026-03-20 13:51:51 -07:00

.gitea/workflows

Fix diarization tensor mismatch + fix sidecar build triggers

2026-03-22 18:30:43 -07:00

docs

Extract audio from video files before loading

2026-03-22 20:04:10 -07:00

python

chore: bump sidecar version to 1.0.13 [skip ci]

2026-03-23 14:58:07 +00:00

scripts

Phase 1 foundation: Tauri shell, Python sidecar, SQLite database

2026-02-26 15:16:06 -08:00

src

Save As: use save dialog so user can type a new project name

2026-03-23 10:25:00 -07:00

src-tauri

Fix ffmpeg permission denied on Linux

2026-03-23 13:18:51 -07:00

static

Phase 1 foundation: Tauri shell, Python sidecar, SQLite database

2026-02-26 15:16:06 -08:00

.gitignore

Fix sidecar.zip not bundled: move resources config into tauri.conf.json

2026-03-21 07:33:02 -07:00

CLAUDE.md

Cross-platform distribution, UI improvements, and performance optimizations

2026-03-20 21:33:43 -07:00

CONTRIBUTING.md

Update README, add User Guide and Contributing docs

2026-03-22 12:06:13 -07:00

LICENSE

Switch local AI from Ollama to bundled llama-server, add MIT license

2026-02-26 09:00:47 -08:00

package-lock.json

Download sidecar on first launch instead of bundling

2026-03-22 07:09:10 -07:00

package.json

chore: bump version to 0.2.41 [skip ci]

2026-03-23 17:25:07 +00:00

README.md

Bundle soundfile with native libs in PyInstaller, link LICENSE in README

2026-03-22 15:27:12 -07:00

RESEARCH_REPORT.md

Add STT and diarization research report

2026-02-26 16:44:58 -08:00

svelte.config.js

Phase 1 foundation: Tauri shell, Python sidecar, SQLite database

2026-02-26 15:16:06 -08:00

tsconfig.json

Phase 1 foundation: Tauri shell, Python sidecar, SQLite database

2026-02-26 15:16:06 -08:00

vite.config.js

Phase 1 foundation: Tauri shell, Python sidecar, SQLite database

2026-02-26 15:16:06 -08:00

README.md

Voice to Notes

A desktop application that transcribes audio and video recordings with speaker identification, synchronized playback, and AI-powered analysis. Export to SRT, WebVTT, ASS captions, plain text, or Markdown.

Features

Speech-to-Text — Accurate transcription via faster-whisper with word-level timestamps. Supports 99 languages.
Speaker Identification — Detect and label speakers using pyannote.audio. Rename speakers for clean exports.
GPU Acceleration — CUDA support for NVIDIA GPUs (Windows/Linux). Falls back to CPU automatically.
Synchronized Playback — Click any word to seek. Waveform visualization via wavesurfer.js.
AI Chat — Ask questions about your transcript. Works with Ollama (local), OpenAI, Anthropic, or any OpenAI-compatible API.
Export — SRT, WebVTT, ASS, plain text, Markdown — all with speaker labels.
Cross-Platform — Linux, Windows, macOS (Apple Silicon).
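As a rough illustration of the export feature, the sketch below renders timestamped segments with speaker labels as SRT cues. It uses only the standard library, whereas the project lists pysubs2 for caption export; `Segment`, `srt_timestamp`, and `to_srt` are hypothetical names, not the project's API.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording
    speaker: str
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered SRT cues, prefixing each line with its speaker."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.speaker}: {seg.text}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)


if __name__ == "__main__":
    print(to_srt([
        Segment(0.0, 2.5, "Alice", "Hello."),
        Segment(2.5, 5.0, "Bob", "Hi there."),
    ]))
```

Renaming a speaker before export then only requires changing the `speaker` field on the affected segments.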

Quick Start

1. Download the installer from Releases.
2. On first launch, choose the CPU or CUDA sidecar. This AI engine downloads separately (~500 MB for CPU, ~2 GB for CUDA).
3. Import an audio or video file and click Transcribe.

See the full User Guide for detailed setup and usage instructions.

Platform Support

| Platform | Architecture | Installers |
| --- | --- | --- |
| Linux | x86_64 | .deb, .rpm |
| Windows | x86_64 | .msi, .exe (NSIS) |
| macOS | ARM (Apple Silicon) | .dmg |

Architecture

The app is split into two independently versioned components:

App (v0.2.x) — Tauri desktop shell with Svelte frontend. Small installer (~50MB).
Sidecar (v1.x) — Python ML engine (faster-whisper, pyannote.audio). Downloaded on first launch. CPU (~500MB) or CUDA (~2GB) variants.

This separation means app UI updates don't require re-downloading the sidecar, and sidecar updates don't require reinstalling the app.

Tech Stack

| Component | Technology |
| --- | --- |
| Desktop shell | Tauri v2 (Rust + Svelte 5 / TypeScript) |
| Transcription | faster-whisper (CTranslate2) |
| Speaker ID | pyannote.audio 3.1 |
| Audio UI | wavesurfer.js |
| Transcript editor | TipTap (ProseMirror) |
| AI (local) | Ollama (any model) |
| AI (cloud) | OpenAI, Anthropic, OpenAI-compatible |
| Caption export | pysubs2 |
| Database | SQLite (rusqlite) |

Development

Prerequisites

Node.js 20+
Rust (stable)
Python 3.11+ with uv or pip
Linux: libgtk-3-dev, libwebkit2gtk-4.1-dev, libappindicator3-dev, librsvg2-dev

Getting Started

# Install frontend dependencies
npm install

# Install Python sidecar dependencies
cd python && pip install -e ".[dev]" && cd ..

# Run in dev mode (uses system Python for the sidecar)
npm run tauri:dev

Building

# Build the frozen Python sidecar (CPU-only)
cd python && python build_sidecar.py --cpu-only && cd ..

# Build with CUDA support
cd python && python build_sidecar.py --with-cuda && cd ..

# Build the Tauri app
npm run tauri build

CI/CD

Two Gitea Actions workflows in .gitea/workflows/:

release.yml — Triggers on push to main:

Bumps app version (patch), creates git tag and Gitea release
Builds lightweight app installers for all platforms (no sidecar bundled)

build-sidecar.yml — Triggers on changes to python/ or manual dispatch:

Bumps sidecar version, creates sidecar-v* tag and release
Builds CPU + CUDA variants for Linux/Windows, CPU for macOS
Uploads as separate release assets
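The version-bump step both workflows perform can be sketched as follows. This is a minimal illustration, assuming plain `major.minor.patch` version strings; the workflows may use different tooling. `bump_patch` and `release_tag` are hypothetical names, and the app's `v` tag prefix is an assumption (only the `sidecar-v*` prefix is stated above).

```python
def bump_patch(version: str) -> str:
    """Increment the patch component of a 'major.minor.patch' version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


def release_tag(version: str, component: str) -> str:
    """Build a tag name: 'sidecar-v' follows the README, the app's 'v' prefix is assumed."""
    prefix = "sidecar-v" if component == "sidecar" else "v"
    return f"{prefix}{version}"


if __name__ == "__main__":
    print(release_tag(bump_patch("0.2.41"), "app"))
    print(release_tag(bump_patch("1.0.13"), "sidecar"))
```

Keeping separate tag prefixes lets each workflow find its own latest release without inspecting the other component's tags.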

Required Secrets

| Secret | Purpose |
| --- | --- |
| `BUILD_TOKEN` | Gitea API token for creating releases and pushing tags |

Project Structure

src/                        # Svelte 5 frontend
  lib/components/           # UI components (waveform, transcript editor, settings, etc.)
  lib/stores/               # Svelte stores (settings, transcript state)
  routes/                   # SvelteKit pages
src-tauri/                  # Rust backend
  src/sidecar/              # Sidecar process manager (download, extract, IPC)
  src/commands/             # Tauri command handlers
  nsis-hooks.nsh            # Windows uninstall cleanup
python/                     # Python sidecar
  voice_to_notes/           # Python package (transcription, diarization, AI, export)
  build_sidecar.py          # PyInstaller build script
  voice_to_notes.spec       # PyInstaller spec
.gitea/workflows/           # CI/CD (release.yml, build-sidecar.yml)
docs/                       # Documentation

License

MIT

Releases 10

Voice to Notes v0.2.46 Latest

2026-03-24 02:04:26 +00:00

Languages

Python 36.6%

Svelte 30.3%

Rust 29.6%

TypeScript 2.2%

Shell 0.5%

Other 0.8%
