MacroPad/voice-to-notes

Claude 425e3c2b7c

Fix Ollama connection: remove double /v1 in URL

base_url was being set to 'http://localhost:11434/v1' by the frontend,
then LocalProvider appended another '/v1', resulting in '/v1/v1'.
Now the provider uses base_url directly (frontend already appends /v1).
Also fixed health check to hit Ollama root instead of /health.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-22 17:41:46 -07:00
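
The URL bug described in this commit message can be illustrated with a short sketch. The helpers `chat_completions_url` and `health_url` are hypothetical names, not the project's LocalProvider code. The sketch assumes the frontend passes a base URL that already ends in /v1 and that the provider calls the OpenAI-compatible /chat/completions endpoint.

```python
from urllib.parse import urlsplit, urlunsplit


def chat_completions_url(base_url: str) -> str:
    """Build the chat endpoint from a base URL that already ends in /v1 (assumed)."""
    # Use base_url as given. Appending another "/v1" here is what produced "/v1/v1".
    return base_url.rstrip("/") + "/chat/completions"


def health_url(base_url: str) -> str:
    """Return the server root (scheme and host), dropping any path such as /v1."""
    parts = urlsplit(base_url)
    return urlunsplit((parts.scheme, parts.netloc, "/", "", ""))
```

With `base_url = "http://localhost:11434/v1"`, the first helper keeps a single /v1 segment and the second targets the Ollama root rather than a /health path.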

Path	Last commit	Date
.claude	Merge perf/stream-segments: streaming partial transcript segments and speaker updates	2026-03-20 13:51:51 -07:00
.gitea/workflows	Fix workflow race condition and sidecar path filter	2026-03-22 08:46:34 -07:00
docs	Update README, add User Guide and Contributing docs	2026-03-22 12:06:13 -07:00
python	Fix Ollama connection: remove double /v1 in URL	2026-03-22 17:41:46 -07:00
scripts	Phase 1 foundation: Tauri shell, Python sidecar, SQLite database	2026-02-26 15:16:06 -08:00
src	Fix word wrap in transcript editor	2026-03-22 11:59:15 -07:00
src-tauri	chore: bump version to 0.2.25 [skip ci]	2026-03-23 00:38:11 +00:00
static	Phase 1 foundation: Tauri shell, Python sidecar, SQLite database	2026-02-26 15:16:06 -08:00
.gitignore	Fix sidecar.zip not bundled: move resources config into tauri.conf.json	2026-03-21 07:33:02 -07:00
CLAUDE.md	Cross-platform distribution, UI improvements, and performance optimizations	2026-03-20 21:33:43 -07:00
CONTRIBUTING.md	Update README, add User Guide and Contributing docs	2026-03-22 12:06:13 -07:00
LICENSE	Switch local AI from Ollama to bundled llama-server, add MIT license	2026-02-26 09:00:47 -08:00
package-lock.json	Download sidecar on first launch instead of bundling	2026-03-22 07:09:10 -07:00
package.json	chore: bump version to 0.2.25 [skip ci]	2026-03-23 00:38:11 +00:00
README.md	Bundle soundfile with native libs in PyInstaller, link LICENSE in README	2026-03-22 15:27:12 -07:00
RESEARCH_REPORT.md	Add STT and diarization research report	2026-02-26 16:44:58 -08:00
svelte.config.js	Phase 1 foundation: Tauri shell, Python sidecar, SQLite database	2026-02-26 15:16:06 -08:00
tsconfig.json	Phase 1 foundation: Tauri shell, Python sidecar, SQLite database	2026-02-26 15:16:06 -08:00
vite.config.js	Phase 1 foundation: Tauri shell, Python sidecar, SQLite database	2026-02-26 15:16:06 -08:00

README.md

Voice to Notes

A desktop application that transcribes audio and video recordings with speaker identification, synchronized playback, and AI-powered analysis. Export to SRT, WebVTT, ASS captions, plain text, or Markdown.

Features

Speech-to-Text — Accurate transcription via faster-whisper with word-level timestamps. Supports 99 languages.
Speaker Identification — Detect and label speakers using pyannote.audio. Rename speakers for clean exports.
GPU Acceleration — CUDA support for NVIDIA GPUs (Windows/Linux). Falls back to CPU automatically.
Synchronized Playback — Click any word to seek. Waveform visualization via wavesurfer.js.
AI Chat — Ask questions about your transcript. Works with Ollama (local), OpenAI, Anthropic, or any OpenAI-compatible API.
Export — SRT, WebVTT, ASS, plain text, Markdown — all with speaker labels.
Cross-Platform — Linux, Windows, macOS (Apple Silicon).
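
To make the export feature concrete, here is a minimal sketch of how speaker-labeled segments could be rendered as SRT cues. It is a hand-rolled illustration only: the Tech Stack lists pysubs2 for caption export, and the segment keys (`start`, `end`, `speaker`, `text`) and function names are assumptions, not the project's data model.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[dict]) -> str:
    """Render segments as numbered SRT cues, prefixing each line with its speaker."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['speaker']}: {seg['text']}\n"
        )
    # Cues are separated by a blank line, as the SRT format expects.
    return "\n".join(cues)
```

Renaming a speaker before export then amounts to changing the `speaker` value on the affected segments.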

Quick Start

1. Download the installer from Releases.
2. On first launch, choose the CPU or CUDA sidecar. The ML engine is downloaded separately (~500MB for CPU, ~2GB for CUDA).
3. Import an audio or video file and click Transcribe.

See the full User Guide for detailed setup and usage instructions.

Platform Support

Platform	Architecture	Installers
Linux	x86_64	.deb, .rpm
Windows	x86_64	.msi, .exe (NSIS)
macOS	ARM (Apple Silicon)	.dmg

Architecture

The app is split into two independently versioned components:

App (v0.2.x) — Tauri desktop shell with Svelte frontend. Small installer (~50MB).
Sidecar (v1.x) — Python ML engine (faster-whisper, pyannote.audio). Downloaded on first launch. CPU (~500MB) or CUDA (~2GB) variants.

This separation means app UI updates don't require re-downloading the sidecar, and sidecar updates don't require reinstalling the app.
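
The CPU and CUDA variants, together with the automatic CPU fallback listed under Features, suggest device selection along the following lines. This is a sketch under assumptions: `pick_device` and `cuda_device_count` are hypothetical names, the compute types are common faster-whisper choices rather than the project's confirmed settings, and the probe relies on CTranslate2's `get_cuda_device_count()`.

```python
from typing import Callable


def cuda_device_count() -> int:
    """Count CUDA devices via CTranslate2; report 0 if it is missing or fails."""
    try:
        import ctranslate2  # backend used by faster-whisper

        return ctranslate2.get_cuda_device_count()
    except Exception:
        return 0


def pick_device(probe: Callable[[], int] = cuda_device_count) -> tuple[str, str]:
    """Return (device, compute_type), preferring CUDA and falling back to CPU."""
    if probe() > 0:
        return "cuda", "float16"
    return "cpu", "int8"
```

Passing the probe as a parameter keeps the fallback logic testable on machines without a GPU.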

Tech Stack

Component	Technology
Desktop shell	Tauri v2 (Rust + Svelte 5 / TypeScript)
Transcription	faster-whisper (CTranslate2)
Speaker ID	pyannote.audio 3.1
Audio UI	wavesurfer.js
Transcript editor	TipTap (ProseMirror)
AI (local)	Ollama (any model)
AI (cloud)	OpenAI, Anthropic, OpenAI-compatible
Caption export	pysubs2
Database	SQLite (rusqlite)
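
For orientation, word-level timestamps can be obtained from faster-whisper roughly as follows. This is an illustrative sketch, not the sidecar's code: `flatten_words` and `transcribe_words` are hypothetical helpers, and the model size, device, and compute type are placeholder assumptions.

```python
def flatten_words(segments) -> list[dict]:
    """Collect word-level timestamps from an iterable of transcribed segments."""
    rows = []
    for segment in segments:
        # segment.words is None unless word timestamps were requested.
        for word in segment.words or []:
            rows.append({"word": word.word.strip(), "start": word.start, "end": word.end})
    return rows


def transcribe_words(path: str, model_size: str = "small") -> list[dict]:
    """Transcribe an audio file on CPU and return its words with timestamps."""
    from faster_whisper import WhisperModel  # imported lazily; heavy dependency

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(path, word_timestamps=True)
    return flatten_words(segments)
```

A flat list of words with start and end times is the kind of structure a click-to-seek transcript view would need.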

Development

Prerequisites

Node.js 20+
Rust (stable)
Python 3.11+ with uv or pip
Linux: libgtk-3-dev, libwebkit2gtk-4.1-dev, libappindicator3-dev, librsvg2-dev

Getting Started

# Install frontend dependencies
npm install

# Install Python sidecar dependencies
cd python && pip install -e ".[dev]" && cd ..

# Run in dev mode (uses system Python for the sidecar)
npm run tauri:dev

Building

# Build the frozen Python sidecar (CPU-only)
cd python && python build_sidecar.py --cpu-only && cd ..

# Build with CUDA support
cd python && python build_sidecar.py --with-cuda && cd ..

# Build the Tauri app
npm run tauri build

CI/CD

Two Gitea Actions workflows live in .gitea/workflows/:

release.yml — Triggers on push to main:

Bumps app version (patch), creates git tag and Gitea release
Builds lightweight app installers for all platforms (no sidecar bundled)
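
The patch bump in the first step can be sketched as below. The function names are hypothetical, and the `v` tag prefix is an assumption based on the release title format; the actual workflow may compute versions differently.

```python
def bump_patch(version: str) -> str:
    """Increment the patch component of a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


def app_tag(version: str) -> str:
    """Build the git tag for an app release (the 'v' prefix is an assumption)."""
    return f"v{version}"
```

For example, `bump_patch("0.2.25")` yields `"0.2.26"`.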

build-sidecar.yml — Triggers on changes to python/ or manual dispatch:

Bumps sidecar version, creates sidecar-v* tag and release
Builds CPU + CUDA variants for Linux/Windows, CPU for macOS
Uploads as separate release assets
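
The path-based trigger for the sidecar workflow amounts to a check like the one below. This is a simplified sketch: `touches_sidecar` is a hypothetical helper, and the real filter is whatever the workflow file declares.

```python
def touches_sidecar(changed_paths: list[str]) -> bool:
    """Return True if any changed path lies under the python/ directory."""
    return any(path.startswith("python/") for path in changed_paths)
```

A push that only edits files under src/ would therefore rebuild the app but leave the sidecar version untouched.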

Required Secrets

Secret	Purpose
`BUILD_TOKEN`	Gitea API token for creating releases and pushing tags

Project Structure

src/                        # Svelte 5 frontend
  lib/components/           # UI components (waveform, transcript editor, settings, etc.)
  lib/stores/               # Svelte stores (settings, transcript state)
  routes/                   # SvelteKit pages
src-tauri/                  # Rust backend
  src/sidecar/              # Sidecar process manager (download, extract, IPC)
  src/commands/             # Tauri command handlers
  nsis-hooks.nsh            # Windows uninstall cleanup
python/                     # Python sidecar
  voice_to_notes/           # Python package (transcription, diarization, AI, export)
  build_sidecar.py          # PyInstaller build script
  voice_to_notes.spec       # PyInstaller spec
.gitea/workflows/           # CI/CD (release.yml, build-sidecar.yml)
docs/                       # Documentation

License

MIT

Releases 10

Voice to Notes v0.2.46 Latest

2026-03-24 02:04:26 +00:00

Languages

Python 36.6%

Svelte 30.3%

Rust 29.6%

TypeScript 2.2%

Shell 0.5%

Other 0.8%
