Initial commit: Alfred Mobile - AI Assistant Android App

- OAuth authentication via Authentik
- WebSocket connection to OpenClaw gateway
- Configurable gateway URL with first-run setup
- User preferences sync across devices
- Multi-user support with custom assistant names
- ElevenLabs TTS integration (local + remote)
- FCM push notifications for alarms
- Voice input via Google Speech API
- No hardcoded secrets or internal IPs in tracked files

161
WAKE_WORD_ALTERNATIVES.md
Normal file
@@ -0,0 +1,161 @@
# Open-Source Wake Word Alternatives

## Vosk (Recommended ✅)

**Best open-source option for Android**

**Pros:**
- ✅ Fully open source (Apache 2.0)
- ✅ Actively maintained
- ✅ Excellent Android support
- ✅ Small models (20-50MB)
- ✅ Fast, on-device processing
- ✅ No API keys, no accounts
- ✅ Works offline
- ✅ Can do continuous keyword spotting

**Cons:**
- ❌ Not as battery-optimized as Porcupine
- ❌ Slightly larger model size than Porcupine
- ❌ More CPU intensive than Porcupine

**Implementation:**
```kotlin
// Add to the module-level build.gradle.kts
dependencies {
    implementation("com.alphacephei:vosk-android:0.3.47")
}

// Download the small model (~40MB) and bundle it with the app
// https://alphacephei.com/vosk/models
// vosk-model-small-en-us-0.15.zip
```

**How it works:**
- Vosk runs continuous speech recognition on the microphone audio
- The app watches the transcript for "alfred" or "hey alfred"
- When detected, trigger voice input
- Can even extract what the user said after the wake word!

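
The matching step above can be sketched in plain Kotlin. This is a minimal sketch, not the app's implementation: `WakeMatch`, `WAKE_PHRASES`, and `findWakeWord` are hypothetical names, and the function assumes the transcript string has already been pulled out of the JSON result that Vosk delivers. It matches whole words only, so "alfredo" does not trigger it.

```kotlin
// Hypothetical helper types and names, for illustration only.
data class WakeMatch(val phrase: String, val command: String?)

// Longest phrase first so "hey alfred" wins over the bare "alfred".
val WAKE_PHRASES = listOf("hey alfred", "alfred")

fun findWakeWord(transcript: String): WakeMatch? {
    // Normalize: lowercase, turn punctuation into spaces, split into words.
    val words = transcript.lowercase()
        .replace(Regex("[^a-z0-9' ]"), " ")
        .split(" ")
        .filter { it.isNotEmpty() }

    for (phrase in WAKE_PHRASES) {
        val phraseWords = phrase.split(" ")
        // Slide a window over the transcript and compare whole words.
        val start = words.windowed(phraseWords.size).indexOfFirst { it == phraseWords }
        if (start >= 0) {
            // Everything after the wake phrase is a candidate command.
            val rest = words.drop(start + phraseWords.size).joinToString(" ")
            return WakeMatch(phrase, rest.ifEmpty { null })
        }
    }
    return null
}
```

With this sketch, `findWakeWord("Hey Alfred, what's the weather?")` yields a match on "hey alfred" with the command "what's the weather", while a bare "alfred" yields a match with a `null` command.
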

**Battery Impact:**
- Moderate (~2-3% per hour)
- Can be optimized with shorter recognition windows

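
To put the per-hour figure in context, here is a rough back-of-the-envelope sketch. It assumes drain scales linearly with listening time, which real devices only approximate, and it models "shorter recognition windows" as a duty cycle (the fraction of time the recognizer is actually running). Both the function name and the `dutyCycle` parameter are assumptions made for illustration.

```kotlin
// Hypothetical estimate: assumes battery drain is linear in listening time.
fun estimateDrainPercent(
    ratePerHour: Double,     // e.g. 2.0 to 3.0, from the estimate above
    hours: Double,           // how long wake word detection stays enabled
    dutyCycle: Double = 1.0  // assumed fraction of time the recognizer runs
): Double {
    require(dutyCycle in 0.0..1.0) { "dutyCycle must be between 0 and 1" }
    return ratePerHour * hours * dutyCycle
}

fun main() {
    // A 16-hour day of continuous listening at 2-3% per hour.
    println(estimateDrainPercent(2.0, 16.0)) // lower bound
    println(estimateDrainPercent(3.0, 16.0)) // upper bound
    // The same day with the recognizer active half of the time.
    println(estimateDrainPercent(3.0, 16.0, dutyCycle = 0.5))
}
```

Under these assumptions a full day of always-on listening costs roughly a third to a half of the battery, which is why shortening the active windows matters.
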

---

## Pocketsphinx

**The OG open-source speech recognition**

**Pros:**
- ✅ Fully open source (BSD license)
- ✅ Mature, proven technology (CMU)
- ✅ Android library available
- ✅ Very customizable
- ✅ No external dependencies

**Cons:**
- ❌ Lower accuracy than modern solutions
- ❌ Older API, less documentation
- ❌ Harder to set up
- ❌ Higher battery usage

**Implementation:**
```kotlin
// Add to the module-level build.gradle.kts
dependencies {
    implementation("edu.cmu.pocketsphinx:pocketsphinx-android:5prealpha-SNAPSHOT")
}
```

---

## Android AlwaysOnHotwordDetector

**Built into Android (API level 21+)**

**Pros:**
- ✅ Zero dependencies
- ✅ System-level battery optimization
- ✅ Built into Android

**Cons:**
- ❌ Only works with system wake words ("Ok Google", etc.)
- ❌ Can't train custom "Alfred" wake word
- ❌ Requires special permissions
- ❌ Limited control

**Not recommended** for custom wake words.

---

## TensorFlow Lite + Custom Model

**Roll your own**

**Pros:**
- ✅ Complete control
- ✅ Open source
- ✅ Can be very efficient if done right

**Cons:**
- ❌ Need to train your own model
- ❌ Need training data (recordings of "Alfred")
- ❌ Complex implementation
- ❌ High development time (weeks)

**Not recommended** unless you want a fun project.

---

## Recommendation: Vosk

**Why Vosk is the best choice:**

1. **True Open Source**
   - No vendor lock-in
   - Apache 2.0 license
   - Active community

2. **Good Balance**
   - Decent battery life (not as good as Porcupine, but acceptable)
   - Good accuracy
   - Easy to implement
   - Well-documented

3. **Bonus Features**
   - Can transcribe what the user said AFTER "Alfred"
   - So "Hey Alfred, what's the weather" could extract "what's the weather" directly
   - This could skip the voice input step entirely!

4. **No Account/API Key Required**
   - Just download the model
   - Bundle it with the app
   - Done!

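
The "skip the voice input step" idea from item 3 comes down to a small routing decision once the text after the wake word is known. The sketch below is one possible shape for it, not existing app code: `WakeAction`, `nextStep`, and the `minWords` threshold are all hypothetical. The threshold is an assumption meant to avoid sending a clipped one-word fragment as a command.

```kotlin
// Hypothetical result type describing what the app should do after a wake word.
sealed class WakeAction {
    data class SendCommand(val text: String) : WakeAction()
    object StartVoiceInput : WakeAction()
}

// trailingText is whatever was transcribed after the wake word, or null if nothing was.
fun nextStep(trailingText: String?, minWords: Int = 2): WakeAction {
    val words = trailingText.orEmpty().trim().split(" ").filter { it.isNotEmpty() }
    return if (words.size >= minWords) {
        // Enough text to treat as a complete command: skip the voice input step.
        WakeAction.SendCommand(words.joinToString(" "))
    } else {
        // Nothing usable followed the wake word: fall back to normal voice input.
        WakeAction.StartVoiceInput
    }
}
```

For example, `nextStep("what's the weather")` produces a `SendCommand`, while `nextStep(null)` falls back to `StartVoiceInput`. Lowering `minWords` to 1 would let single-word commands such as "stop" through, at the cost of more false starts.
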

---

## Implementation Complexity

**Vosk:**
- Setup: ~30 minutes (download model, add dependency)
- Code: ~1-2 hours
- Total: ~2-3 hours

**Pocketsphinx:**
- Setup: ~1 hour (configure, download models)
- Code: ~3-4 hours (harder API)
- Total: ~4-5 hours

---

## My Recommendation

**Go with Vosk.**

It's the best balance of:
- Open source ethos ✅
- Easy implementation ✅
- Good accuracy ✅
- Reasonable battery usage ✅
- Active development ✅

And the bonus feature of potentially extracting the full command ("Hey Alfred, what's the weather?") means we could make the UX even better than with Porcupine!

Want me to implement Vosk wake word detection?