Voice → Embedding → Action

100% in-browser · no server · no LLM · < 50 ms after warm-up

Intent classification with a tiny embedding model (MiniLM-L6-v2, 23 MB, WASM): the transcript is matched to an intent by cosine similarity, with no large language model involved
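As a rough illustration of the matching step, the sketch below scores a query vector against one reference vector per intent and keeps the intent with the highest cosine similarity. The intent names, the toy 3-dimensional vectors, and the 0.5 threshold are assumptions made for this example; real MiniLM-L6-v2 embeddings have 384 dimensions, and the demo's own reference phrases and cutoff are not shown on this page.

```typescript
// Toy vectors stand in for real MiniLM-L6-v2 embeddings (384 dimensions).
type Vec = number[];

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for a zero vector.
function cosineSimilarity(a: Vec, b: Vec): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

interface IntentMatch {
  intent: string;
  score: number;
}

// Returns the best-scoring intent, or null if nothing reaches the threshold.
// The default threshold of 0.5 is an assumption, not a value from the demo.
function classifyIntent(
  query: Vec,
  intents: Record<string, Vec>,
  threshold = 0.5
): IntentMatch | null {
  let best: IntentMatch | null = null;
  for (const [intent, reference] of Object.entries(intents)) {
    const score = cosineSimilarity(query, reference);
    if (best === null || score > best.score) {
      best = { intent, score };
    }
  }
  return best !== null && best.score >= threshold ? best : null;
}

// Hypothetical intents with hand-made reference vectors.
const intents: Record<string, Vec> = {
  add_item: [1, 0, 0],
  remove_item: [0, 1, 0],
  clear_list: [0, 0, 1],
};

const match = classifyIntent([0.9, 0.1, 0], intents);
console.log(match?.intent, match?.score.toFixed(3));
```

Returning `null` below the threshold lets the page ignore speech that resembles no known command instead of firing the nearest action.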

Click to speak

Transcript

🛒 Shopping list

⚡ Custom actions

Intent

—

Confidence

—

Latency

—

Example commands — click to trigger with this text

Cosine similarity per intent

Waiting for first command…

Local pipeline · no server · no LLM

Web Speech API → Transcript → MiniLM embedding (WASM) → Cosine similarity → DOM action
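The stages after the transcript can be wired together roughly as follows. This is a minimal sketch, not the page's actual code: `handleTranscript`, the `Embedder`, `Classifier`, and `Actions` types, and the injected clock are all hypothetical names introduced here. The embedding and classification steps are passed in as functions so the sketch stays independent of any particular model loader, and the latency figure is assumed to cover embedding plus classification only.

```typescript
// Hypothetical stage signatures; the demo's real interfaces are not shown.
type Embedder = (text: string) => Promise<number[]>;
type Classifier = (
  embedding: number[]
) => { intent: string; score: number } | null;
type Actions = Record<string, (transcript: string) => void>;

interface PipelineResult {
  intent: string | null;
  confidence: number;
  latencyMs: number;
}

// Runs transcript -> embedding -> classification -> action.
// `now` is injectable; in the browser, () => performance.now() gives
// sub-millisecond resolution.
async function handleTranscript(
  transcript: string,
  embed: Embedder,
  classify: Classifier,
  actions: Actions,
  now: () => number = () => Date.now()
): Promise<PipelineResult> {
  const start = now();
  const embedding = await embed(transcript);
  const match = classify(embedding);
  // Assumption: latency covers embedding and classification, not the action.
  const latencyMs = now() - start;

  if (match === null) {
    return { intent: null, confidence: 0, latencyMs };
  }

  // Dispatch to the handler registered for the matched intent, if any.
  const action = actions[match.intent];
  if (action) {
    action(transcript);
  }
  return { intent: match.intent, confidence: match.score, latencyMs };
}
```

In a browser, the transcript would typically come from a speech recognition result event, and each action handler would update the DOM, for example by appending an entry to the shopping list.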