Adarsh Kant

How We Built a Voice AI That Takes Real DOM Actions

Voice is the new click. But most voice AI today? It's just chatbots with a microphone.

The Chatbot Trap

Traditional voice assistants follow a familiar pattern: listen → transcribe → think → respond. The problem? They're passive. They can tell you how to do something, but they can't actually do it.

When a user says "book me a flight to New York," they don't want a list of booking sites. They want a confirmation email.

Voice-First Architecture

We built Anve differently. Instead of only generating text responses, our AI decides which DOM actions to take on the user's behalf and then performs them.

The flow looks like this:

  1. Intent Recognition - What does the user want?
  2. Action Planning - What DOM operations achieve this?
  3. Execution - Actually click, type, and submit
  4. Confirmation - Verify the action succeeded
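
To make the four stages concrete, here is a minimal sketch of how they could chain together. It is not Anve's implementation: `recognizeIntent`, `planActions`, `runPipeline`, and the injected `page` object (with `perform` and `verify` methods) are hypothetical names, and the keyword matcher is a toy stand-in for a real intent model. The selectors mirror the booking example used in this post.

```javascript
// Hypothetical sketch of the four-stage flow; none of these names come from Anve.

// 1. Intent recognition: map a transcript to a structured intent (toy keyword matcher).
function recognizeIntent(transcript) {
  const match = /book (?:me )?a flight to (.+)/i.exec(transcript);
  return match
    ? { name: 'book_flight', destination: match[1].trim() }
    : { name: 'unknown' };
}

// 2. Action planning: turn the intent into an ordered list of DOM operations.
function planActions(intent) {
  if (intent.name !== 'book_flight') return [];
  return [
    { type: 'click', selector: '#book-flight' },
    { type: 'type', selector: '#destination', value: intent.destination },
    { type: 'click', selector: '#search' },
  ];
}

// 3 and 4. Execution and confirmation: run each step, then verify the outcome.
// `page` is an assumed adapter exposing perform(action) and verify(intent).
function runPipeline(transcript, page) {
  const intent = recognizeIntent(transcript);
  const actions = planActions(intent);
  for (const action of actions) page.perform(action);
  return { intent, actions, confirmed: actions.length > 0 && page.verify(intent) };
}
```

Keeping the page behind a small adapter means the planning logic can be exercised without a browser, and an unrecognized request produces an empty plan instead of a guess.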

The DOM Action Engine

This is where it gets interesting. Our AI doesn't just see your website—it can interact with it.

// The AI generates action sequences like this one:
const actionSequence = [
  { type: 'click', selector: '#book-flight' },
  { type: 'type', selector: '#destination', value: 'New York' },
  { type: 'click', selector: '#search' }
];
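
As a rough illustration of what running such a sequence might involve, the sketch below walks the list and applies each step to whatever `querySelector` returns. `executeActions` and its result shape are assumptions made for this example, not Anve's engine. Only standard DOM calls are used (`querySelector`, `click`, `dispatchEvent`), and `root` would typically be `document` in a browser.

```javascript
// Hypothetical executor; `root` is anything with querySelector (e.g. `document`).
function executeActions(actions, root) {
  const results = [];
  for (const action of actions) {
    const el = root.querySelector(action.selector);
    if (!el) {
      // Stop at the first missing element so later steps don't run on a broken state.
      results.push({ action, ok: false, error: `No element matches ${action.selector}` });
      break;
    }
    if (action.type === 'click') {
      el.click();
    } else if (action.type === 'type') {
      el.value = action.value;
      // Notify listeners that the field changed, as a real keystroke would.
      el.dispatchEvent(new Event('input', { bubbles: true }));
    } else {
      results.push({ action, ok: false, error: `Unsupported action type: ${action.type}` });
      break;
    }
    results.push({ action, ok: true });
  }
  return results;
}
```

Returning a per-step result list, instead of throwing, gives the confirmation stage something to inspect. Note that frameworks with controlled inputs may need more than setting `value` and firing an `input` event, so treat the `type` branch as a starting point.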

We use a custom-trained model that understands:

  • CSS selectors and element hierarchies
  • Form validation states
  • Async operation timing
  • Error recovery patterns
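
Two of these concerns, async timing and error recovery, can be approximated outside a model with plain polling and fallback selectors. The sketch below is only an illustration under that assumption: `waitFor` and `findWithFallback` are hypothetical helpers, and the timeout and interval defaults are arbitrary.

```javascript
// Hypothetical helper: poll `probe` until it returns a truthy value or time runs out.
function waitFor(probe, { timeoutMs = 2000, intervalMs = 50 } = {}) {
  return new Promise((resolve, reject) => {
    const deadline = Date.now() + timeoutMs;
    const tick = () => {
      let value;
      try {
        value = probe();
      } catch (err) {
        reject(err);
        return;
      }
      if (value) {
        resolve(value);
      } else if (Date.now() >= deadline) {
        reject(new Error(`Timed out after ${timeoutMs} ms`));
      } else {
        setTimeout(tick, intervalMs);
      }
    };
    tick();
  });
}

// Hypothetical recovery step: try selectors in order and use the first that appears.
async function findWithFallback(root, selectors, options) {
  for (const selector of selectors) {
    try {
      return await waitFor(() => root.querySelector(selector), options);
    } catch {
      // This selector never appeared; fall through to the next candidate.
    }
  }
  throw new Error(`None of the selectors matched: ${selectors.join(', ')}`);
}
```

A call such as `findWithFallback(document, ['#search', '[data-test=search]'])` would wait for the primary selector first and only then try the alternative. In a browser, a `MutationObserver` could replace the polling loop to react as soon as the element is added.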

5-Minute Integration

Getting started is surprisingly simple:

React/Next.js:

import { AnveVoice } from '@anve/voice-react';

function App() {
  return (
    <AnveVoice 
      apiKey={process.env.ANVE_API_KEY}
      actions={['navigate', 'click', 'type', 'submit']}
    />
  );
}

Vanilla JS:

<script src="https://cdn.anvevoice.app/voice.min.js"></script>
<script>
  Anve.init({
    apiKey: 'your-api-key',
    container: '#voice-widget'
  });
</script>

Live Demo

See it in action: anvevoice.app/demo

The demo shows a real booking flow completed entirely through voice commands. No pre-scripted responses, just actual DOM manipulation happening in real time.


Want to add voice to your app? We're opening early access for developers who want to build voice-first experiences. Join the waitlist.