Voice → Embedding → Action
100% in-browser · no server · no LLM · < 50 ms after warm-up
Intent classification with a small embedding model (MiniLM-L6-v2, 23 MB, WASM): each command is matched to an intent by cosine similarity, not by a generative language model
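As a rough illustration of the cosine-similarity step, here is a minimal sketch. It assumes each intent is represented by a single prototype vector and that a command falls back to "no intent" below a threshold. The names `classifyIntent`, `IntentPrototype`, the `threshold` value, and the intent labels are hypothetical and not taken from the demo. The 3-dimensional vectors are toy stand-ins for the 384-dimensional embeddings MiniLM-L6-v2 produces.

```typescript
// Cosine similarity between two equal-length vectors.
// Returns 0 when either vector has zero norm, to avoid dividing by zero.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector lengths differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Hypothetical shapes: one prototype embedding per intent.
interface IntentPrototype {
  name: string;
  vector: number[];
}

interface Classification {
  intent: string | null; // null when nothing clears the threshold
  confidence: number; // best cosine similarity seen
  scores: Record<string, number>; // similarity per intent
}

// Score the query against every prototype and keep the best match.
// The threshold of 0.5 is an assumed value, not one stated by the demo.
function classifyIntent(
  query: number[],
  prototypes: IntentPrototype[],
  threshold = 0.5
): Classification {
  const scores: Record<string, number> = {};
  let best: string | null = null;
  let bestScore = 0;
  for (const p of prototypes) {
    const s = cosineSimilarity(query, p.vector);
    scores[p.name] = s;
    if (best === null || s > bestScore) {
      best = p.name;
      bestScore = s;
    }
  }
  const intent = bestScore >= threshold ? best : null;
  return { intent, confidence: bestScore, scores };
}

// Toy vectors standing in for real sentence embeddings.
const prototypes: IntentPrototype[] = [
  { name: "add_item", vector: [1, 0, 0] },
  { name: "remove_item", vector: [0, 1, 0] },
];

const result = classifyIntent([0.9, 0.1, 0], prototypes);
// result.intent is "add_item"; result.scores holds one similarity per intent.
```

Because the decision is a handful of dot products over precomputed vectors, the comparison itself is cheap. In a setup like this, most of the time after warm-up would go to computing the embedding of the new transcript.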
Click to speak
Transcript
🛒 Shopping list
- Say "add milk" or "remove bread"…
⚡ Custom actions
Intent: —
Confidence: —
Latency: —
Example commands — click to trigger with this text
Cosine similarity per intent
Waiting for first command…
Local pipeline · no server · no LLM
Web Speech API → Transcript → MiniLM embedding (WASM) → Cosine similarity → DOM action
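The stages from transcript to action can be sketched roughly as follows. This is an illustration under several assumptions, not the demo's actual code. The transcript is passed in as a plain string (in the browser it would come from a Web Speech API recognition result). `toyEmbed` is a hypothetical word-counting stand-in for the MiniLM model, and a real WASM embedding call would likely be asynchronous. The "DOM action" is replaced by edits to an in-memory array. All names here (`handleTranscript`, `buildIntents`, `itemOf`, the intent labels, the 0.3 threshold) are invented for the example.

```typescript
type Vec = number[];
type Embed = (text: string) => Vec;

// Cosine similarity; 0 when either vector has zero norm.
function cosine(a: Vec, b: Vec): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Hypothetical stand-in for the embedding model: counts a few known verbs.
const VOCAB = ["add", "put", "buy", "remove", "delete", "drop"];
const toyEmbed: Embed = (text) => {
  const words = text.toLowerCase().split(/\s+/);
  return VOCAB.map((v) => words.filter((w) => w === v).length);
};

// Assumption: the item is everything after the first word ("add milk" -> "milk").
const itemOf = (t: string): string => t.trim().split(/\s+/).slice(1).join(" ");

interface Intent {
  name: string;
  vector: Vec; // embedding of example phrasing for this intent
  run: (transcript: string, list: string[]) => void; // stands in for the DOM action
}

function buildIntents(embed: Embed): Intent[] {
  return [
    {
      name: "add_item",
      vector: embed("add put buy"),
      run: (t, list) => {
        list.push(itemOf(t));
      },
    },
    {
      name: "remove_item",
      vector: embed("remove delete drop"),
      run: (t, list) => {
        const i = list.indexOf(itemOf(t));
        if (i >= 0) list.splice(i, 1);
      },
    },
  ];
}

interface Outcome {
  intent: string | null;
  confidence: number;
  latencyMs: number;
}

// Transcript -> embedding -> cosine similarity -> action.
function handleTranscript(
  transcript: string,
  embed: Embed,
  intents: Intent[],
  list: string[],
  threshold = 0.3 // assumed cut-off
): Outcome {
  const start = Date.now(); // performance.now() would be finer-grained in a browser
  const query = embed(transcript);
  let best: Intent | null = null;
  let bestScore = 0;
  for (const intent of intents) {
    const s = cosine(query, intent.vector);
    if (s > bestScore) {
      best = intent;
      bestScore = s;
    }
  }
  if (best !== null && bestScore >= threshold) {
    best.run(transcript, list);
    return { intent: best.name, confidence: bestScore, latencyMs: Date.now() - start };
  }
  return { intent: null, confidence: bestScore, latencyMs: Date.now() - start };
}

const shoppingList: string[] = [];
const intents = buildIntents(toyEmbed);
const outcome = handleTranscript("add milk", toyEmbed, intents, shoppingList);
// shoppingList is now ["milk"] and outcome.intent is "add_item".
```

Passing the embedding function in as a parameter keeps the matching and dispatch logic independent of the model, so the toy embedder here could be swapped for a real one without touching `handleTranscript` (apart from making it `async` if the model call returns a promise).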