Open-Source Wake Word Alternatives
Vosk (Recommended ✅)
Best open-source option for Android
Pros:
- ✅ Fully open source (Apache 2.0)
- ✅ Actively maintained
- ✅ Excellent Android support
- ✅ Small models (20-50MB)
- ✅ Fast, on-device processing
- ✅ No API keys, no accounts
- ✅ Works offline
- ✅ Can do continuous keyword spotting
Cons:
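- ❌ Not as battery-optimized as Porcupine (Picovoice's dedicated wake word engine)
- ❌ Larger model size than a dedicated wake word engine
- ❌ More CPU intensive, since it runs full speech recognition rather than keyword-only detection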
Implementation:
// Add to the module's build.gradle.kts
dependencies {
    implementation("com.alphacephei:vosk-android:0.3.47")
}
// Download the small English model (~40MB) from
// https://alphacephei.com/vosk/models
// (vosk-model-small-en-us-0.15.zip)
How it works:
- Vosk runs continuous speech recognition on the microphone stream
- The app listens for "alfred" or "hey alfred" in the recognized text
- When a wake phrase is detected, trigger voice input
- It can even extract what the user said after the wake word!
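The matching step above can be sketched in plain Kotlin, independent of any Vosk API. This is a minimal sketch under assumptions: `WakeWordMatch`, `WAKE_PHRASES`, and `findWakeWord` are hypothetical names invented for illustration, and the input is assumed to be the recognized text already pulled out of the recognizer's result (Vosk reports results as JSON, so the text field would need to be extracted first).

```kotlin
// Hypothetical helper types and names, for illustration only (not part of Vosk).
data class WakeWordMatch(val wakeWord: String, val command: String)

// Longest phrase first so "hey alfred" wins over the bare "alfred".
val WAKE_PHRASES = listOf("hey alfred", "alfred")

// Returns the wake phrase found in the transcript plus any text spoken after it,
// or null if no wake phrase is present.
fun findWakeWord(transcript: String): WakeWordMatch? {
    // Normalize: lowercase, replace punctuation with spaces, collapse whitespace.
    val normalized = transcript.lowercase()
        .replace(Regex("[^a-z0-9' ]"), " ")
        .replace(Regex("\\s+"), " ")
        .trim()
    for (phrase in WAKE_PHRASES) {
        // Word boundaries keep "alfredo" from matching "alfred".
        val match = Regex("\\b" + Regex.escape(phrase) + "\\b").find(normalized) ?: continue
        val command = normalized.substring(match.range.last + 1).trim()
        return WakeWordMatch(phrase, command)
    }
    return null
}
```

With this sketch, `findWakeWord("Hey Alfred, what's the weather?")` yields the wake phrase `hey alfred` and the command `what's the weather`. An empty command would mean the user said only the wake word, in which case the app could fall back to the normal voice input step.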
Battery Impact:
- Moderate (~2-3% of battery per hour of listening)
- Can be optimized with shorter recognition windows
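To put the hourly figure in context, the estimate can be scaled to a full day of listening. The sketch below assumes drain is linear in listening time, which is a simplification; actual drain varies by device and usage. `drainRangePercent` is a hypothetical helper name, and the 2% and 3% defaults are simply the rough estimate quoted above.

```kotlin
// Hypothetical helper. Assumes battery drain is linear in listening time,
// using the rough 2-3% per hour estimate as default bounds.
fun drainRangePercent(
    hoursListening: Double,
    lowPerHour: Double = 2.0,
    highPerHour: Double = 3.0,
): ClosedFloatingPointRange<Double> {
    require(hoursListening >= 0.0) { "hoursListening must be non-negative" }
    return (hoursListening * lowPerHour)..(hoursListening * highPerHour)
}
```

Under these assumptions, `drainRangePercent(16.0)` gives a range of 32.0 to 48.0, so always-on listening across a 16-hour day could cost roughly a third to a half of the battery. That is why shorter recognition windows are worth considering.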
Pocketsphinx
The OG open-source speech recognition
Pros:
- ✅ Fully open source (BSD license)
- ✅ Mature, proven technology (CMU)
- ✅ Android library available
- ✅ Very customizable
- ✅ No external dependencies
Cons:
- ❌ Lower accuracy than modern solutions
- ❌ Older API, less documentation
- ❌ Harder to set up
- ❌ Higher battery usage
Implementation:
// Add to build.gradle.kts
implementation("edu.cmu.pocketsphinx:pocketsphinx-android:5prealpha-SNAPSHOT")
Android AlwaysOnHotwordDetector
Built into Android (5.0+, API level 21)
Pros:
- ✅ Zero dependencies
- ✅ System-level battery optimization
- ✅ Built into Android
Cons:
- ❌ Only works with system wake words ("Ok Google", etc.)
- ❌ Can't train a custom "Alfred" wake word
- ❌ Requires privileged system permissions that ordinary third-party apps cannot obtain
- ❌ Limited control over detection behavior
Not recommended for custom wake words.
TensorFlow Lite + Custom Model
Roll your own
Pros:
- ✅ Complete control
- ✅ Open source
- ✅ Can be very efficient if done right
Cons:
- ❌ Need to train your own model
- ❌ Need training data (recordings of "Alfred")
- ❌ Complex implementation
- ❌ High development time (weeks)
Not recommended unless you want a fun project.
Recommendation: Vosk
Why Vosk is the best choice:
- True Open Source
  - No vendor lock-in
  - Apache 2.0 license
  - Active community
- Good Balance
  - Decent battery life (not as good as Porcupine, but acceptable)
  - Good accuracy
  - Easy to implement
  - Well-documented
- Bonus Features
  - Can transcribe what the user said AFTER "Alfred"
  - So "Hey Alfred, what's the weather" could yield "what's the weather" directly
  - This could skip the separate voice input step entirely!
- No Account/API Key Required
  - Just download the model
  - Bundle it with the app
  - Done!
Implementation Complexity
Vosk:
- Setup: ~30 minutes (download model, add dependency)
- Code: ~1-2 hours
- Total: ~2-3 hours
Pocketsphinx:
- Setup: ~1 hour (configure, download models)
- Code: ~3-4 hours (harder API)
- Total: ~4-5 hours
My Recommendation
Go with Vosk.
It's the best balance of:
- Open source ethos ✅
- Easy implementation ✅
- Good accuracy ✅
- Reasonable battery usage ✅
- Active development ✅
And the bonus feature of potentially extracting the full command ("Hey Alfred, what's the weather?") in one pass means we could make the UX even better than it would be with Porcupine!
Want me to implement Vosk wake word detection?