
Configure Symbl.ai to Listen for Action Items in a WebSocket Call

Symbl.ai, a conversation intelligence platform that lets developers create new experiences around conversation data, empowers developers to extend beyond mere automated speech recognition to contextual insights. Contextual insights are the results of Symbl.ai's Natural Language Processing algorithms recognizing speaker intentions. Among the most common intentions speakers express in speech are follow-ups, questions, and action items.

In short, action items are a conversational entity recognized by Symbl.ai's platform that reflect a speaker's call to action at any point during a conversation.
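For example, if a speaker says "I will send the report to the team by Friday," Symbl.ai can surface that sentence as an action item. Below is an illustrative, simplified sketch of the insight_response event handled later in this post; only the fields the code below reads are shown, so it is not a verbatim API payload.

// Illustrative, simplified shape of an insight_response event
// (not a verbatim API payload; only the fields used later in this post are shown).
const exampleInsightResponse = {
  type: 'insight_response',
  insights: [
    {
      type: 'action_item',
      payload: {
        content: 'I will send the report to the team by Friday.'
      }
    }
  ]
};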

First Step

The first step to accessing action items as contextual insights is to sign up. Register for an account at Symbl.ai (i.e., platform.symbl.ai). Grab both your appId and your appSecret. With those two credentials you generate your x-api-key.

If you would like to sign your JSON Web Token with your appId and appSecret in a cURL command executed in the terminal, here is a code snippet:

curl -k -X POST "https://api.symbl.ai/oauth2/token:generate" \
     -H "accept: application/json" \
     -H "Content-Type: application/json" \
     -d "{ \"type\": \"application\", \"appId\": \"<appId>\", \"appSecret\": \"<appSecret>\"}"
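If you would rather generate the token from JavaScript than from the terminal, here is a minimal sketch of the same request using fetch. It assumes the response body exposes an accessToken field, which you then paste into the accessToken constant used later in this post.

// Minimal sketch: the same token request with fetch instead of cURL.
// Assumes the response JSON exposes an `accessToken` field.
const tokenResponse = await fetch('https://api.symbl.ai/oauth2/token:generate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    type: 'application',
    appId: '<appId>',
    appSecret: '<appSecret>'
  })
});
const { accessToken } = await tokenResponse.json();
console.log('accessToken', accessToken);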

If you would prefer to authenticate into Symbl.ai's developer platform by signing the JWTs in Postman, the added benefit is that Symbl.ai's public workspace includes almost all of its APIs, including a section called "Labs" for its most experimental developer APIs.

Second Step

The second step is to familiarize yourself with Symbl.ai's documentation on its Streaming API. To review, walk through each of the following items from Symbl.ai's live speech-to-text tutorial: 1) creating a WebSocket, 2) setting its listeners, 3) creating an audio stream, and 4) handling the audio stream. If you would rather skip the review, paste the following code directly into your console:

/**
 * The JWT token you get after authenticating with our API.
 * Check the Authentication section of the documentation for more details.
 */
const accessToken = ""
const uniqueMeetingId = btoa("user@example.com")
const symblEndpoint = `wss://api.symbl.ai/v1/realtime/insights/${uniqueMeetingId}?access_token=${accessToken}`;

const ws = new WebSocket(symblEndpoint);

// Fired when a message is received from the WebSocket server
ws.onmessage = (event) => {
  // You can find the conversationId in data.message.data.conversationId
  const data = JSON.parse(event.data);
  if (data.type === 'message' && data.message.hasOwnProperty('data')) {
    console.log('conversationId', data.message.data.conversationId);
  }
  if (data.type === 'message_response') {
    for (let message of data.messages) {
      console.log('Transcript (more accurate): ', message.payload.content);
    }
  }
  if (data.type === 'message' && data.message.hasOwnProperty('punctuated')) {
    console.log('Live transcript (less accurate): ', data.message.punctuated.transcript)
  }
  console.log(`Response type: ${data.type}. Object: `, data);
};

// Fired when the WebSocket closes unexpectedly due to an error or lost connection
ws.onerror  = (err) => {
  console.error(err);
};

// Fired when the WebSocket connection has been closed
ws.onclose = (event) => {
  console.info('Connection to websocket closed');
};

// Fired when the connection succeeds.
ws.onopen = (event) => {
  ws.send(JSON.stringify({
    type: 'start_request',
    meetingTitle: 'Websockets How-to', // Conversation name
    insightTypes: ['question', 'action_item'], // Will enable insight generation
    config: {
      confidenceThreshold: 0.5,
      languageCode: 'en-US',
      speechRecognition: {
        encoding: 'LINEAR16',
        sampleRateHertz: 44100,
      }
    },
    speaker: {
      userId: 'example@symbl.ai',
      name: 'Example Sample',
    }
  }));
};

const stream = await navigator.mediaDevices.getUserMedia({ audio: true, video: false });

/**
 * The callback function which fires after a user gives the browser permission to use
 * the computer's microphone. Starts a recording session which sends the audio stream to
 * the WebSocket endpoint for processing.
 */
const handleSuccess = (stream) => {
  const AudioContext = window.AudioContext;
  const context = new AudioContext();
  const source = context.createMediaStreamSource(stream);
  const processor = context.createScriptProcessor(1024, 1, 1);
  const gainNode = context.createGain();
  source.connect(gainNode);
  gainNode.connect(processor);
  processor.connect(context.destination);
  processor.onaudioprocess = (e) => {
    // convert to 16-bit payload
    const inputData = e.inputBuffer.getChannelData(0);
    const targetBuffer = new Int16Array(inputData.length);
    for (let index = 0; index < inputData.length; index++) {
      targetBuffer[index] = 32767 * Math.min(1, inputData[index]);
    }
    // Send audio stream to websocket.
    if (ws.readyState === WebSocket.OPEN) {
      ws.send(targetBuffer.buffer);
    }
  };
};


handleSuccess(stream);

Third Step

The third step is to run Symbl.ai's WebSocket in your browser. Open an instance of Chrome, open the console, and paste the above code directly into it. After you hit enter, your WebSocket's messages start to pile up.
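When you have finished speaking, you can end the session rather than leaving the socket open. A minimal sketch, assuming the Streaming API accepts a stop_request message symmetrical to start_request:

// Minimal sketch: end the session cleanly (assumes a `stop_request` message type).
if (ws.readyState === WebSocket.OPEN) {
  ws.send(JSON.stringify({ type: 'stop_request' }));
  ws.close();
}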

Fourth Step

After running the Streaming API in the browser, you receive Symbl.ai's transcription from automated speech recognition in real time. However, Symbl.ai enables you as a developer to extend far beyond mere automated speech recognition to contextual insights. In the code you ran in the browser, you now configure the WebSocket's event listener to capture contextual insights in real time.

Navigate to the event listener ws.onmessage. The ws.onmessage handler lets you track the events sent and received over the WebSocket, giving you access to the raw data stream that follows the WebSocket's protocol. Add the following checks for topics and insights to the handler:

  if (data.type === 'topic_response') {
    for (let topic of data.topics) {
      console.log('Topic detected: ', topic.phrases)
    }
  }
  if (data.type === 'insight_response') {
    for (let insight of data.insights) {
      console.log('Insight detected: ', insight.payload.content);
      if (insight.type === "action_item") {
        console.log("Insight detected is an Action Item!!!")
      }
    }
  }

After adding the new logging to your WebSocket's ws.onmessage handler, the full code is as follows:

/**
 * The JWT token you get after authenticating with our API.
 * Check the Authentication section of the documentation for more details.
 */
const accessToken = ""
const uniqueMeetingId = btoa("user@example.com")
const symblEndpoint = `wss://api.symbl.ai/v1/realtime/insights/${uniqueMeetingId}?access_token=${accessToken}`;

const ws = new WebSocket(symblEndpoint);

// Fired when a message is received from the WebSocket server
ws.onmessage = (event) => {
  // You can find the conversationId in data.message.data.conversationId
  const data = JSON.parse(event.data);
  if (data.type === 'message' && data.message.hasOwnProperty('data')) {
    console.log('conversationId', data.message.data.conversationId);
  }
  if (data.type === 'message_response') {
    for (let message of data.messages) {
      console.log('Transcript (more accurate): ', message.payload.content);
    }
  }
  if (data.type === 'topic_response') {
    for (let topic of data.topics) {
      console.log('Topic detected: ', topic.phrases)
    }
  }
  if (data.type === 'insight_response') {
    for (let insight of data.insights) {
      console.log('Insight detected: ', insight.payload.content);
      if (insight.type === "action_item") {
        console.log("Insight detected is an Action Item!!!")
      }
    }
  }
  if (data.type === 'message' && data.message.hasOwnProperty('punctuated')) {
    console.log('Live transcript (less accurate): ', data.message.punctuated.transcript)
  }
  console.log(`Response type: ${data.type}. Object: `, data);
};

// Fired when the WebSocket closes unexpectedly due to an error or lost connection
ws.onerror  = (err) => {
  console.error(err);
};

// Fired when the WebSocket connection has been closed
ws.onclose = (event) => {
  console.info('Connection to websocket closed');
};

// Fired when the connection succeeds.
ws.onopen = (event) => {
  ws.send(JSON.stringify({
    type: 'start_request',
    meetingTitle: 'Websockets How-to', // Conversation name
    insightTypes: ['question', 'action_item'], // Will enable insight generation
    config: {
      confidenceThreshold: 0.5,
      languageCode: 'en-US',
      speechRecognition: {
        encoding: 'LINEAR16',
        sampleRateHertz: 44100,
      }
    },
    speaker: {
      userId: 'example@symbl.ai',
      name: 'Example Sample',
    }
  }));
};

const stream = await navigator.mediaDevices.getUserMedia({ audio: true, video: false });

/**
 * The callback function which fires after a user gives the browser permission to use
 * the computer's microphone. Starts a recording session which sends the audio stream to
 * the WebSocket endpoint for processing.
 */
const handleSuccess = (stream) => {
  const AudioContext = window.AudioContext;
  const context = new AudioContext();
  const source = context.createMediaStreamSource(stream);
  const processor = context.createScriptProcessor(1024, 1, 1);
  const gainNode = context.createGain();
  source.connect(gainNode);
  gainNode.connect(processor);
  processor.connect(context.destination);
  processor.onaudioprocess = (e) => {
    // convert to 16-bit payload
    const inputData = e.inputBuffer.getChannelData(0);
    const targetBuffer = new Int16Array(inputData.length);
    for (let index = 0; index < inputData.length; index++) {
      targetBuffer[index] = 32767 * Math.min(1, inputData[index]);
    }
    // Send audio stream to websocket.
    if (ws.readyState === WebSocket.OPEN) {
      ws.send(targetBuffer.buffer);
    }
  };
};


handleSuccess(stream);

What's Next?

If you would like to add a listener for real-time sentiment analysis to your Symbl.ai configuration, Symbl.ai provides the ability to listen to polarity scores on sentiments from messages in real time. A basic knowledge of WebSockets is the first step. After logging sentiments, the next step is to create a way to capture the data in real time. If you would like to skip those steps, feel free to download the code from Symbl.ai's GitHub, where you can find the real-time sentiment analysis repo with instructions.
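As a rough sketch of what that capture step might look like, the snippet below polls the Message API for sentiment once you have a conversationId from the WebSocket. The sentiment=true query parameter and the sentiment.polarity.score field are assumptions to verify against Symbl.ai's documentation.

// Rough sketch: fetch per-message sentiment for a conversation.
// The `sentiment=true` parameter and `sentiment.polarity.score` field are assumptions
// to verify against Symbl.ai's documentation.
const fetchSentiments = async (conversationId, accessToken) => {
  const res = await fetch(
    `https://api.symbl.ai/v1/conversations/${conversationId}/messages?sentiment=true`,
    { headers: { Authorization: `Bearer ${accessToken}` } }
  );
  const { messages } = await res.json();
  for (const message of messages) {
    console.log('Sentiment polarity: ', message.sentiment.polarity.score);
  }
};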

Sentiment analysis is only one way to handle the conversation data from Symbl.ai-enabled voice. Another way is to connect an external API. In particular, action_items enable developers to create automated workflows from detected insights so that those insights appear in third-party SaaS dashboards in real time.

Imagine, for instance, creating JIRA tickets in real time through a POST request after a Symbl.ai action_item insight is detected in a live conversation. With Symbl.ai, you as a developer are empowered to connect, transform, or visualize conversations in ways not yet imagined or implemented, since Symbl.ai's core product extends far beyond mere automated speech recognition.
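As a sketch only, you could forward each detected action item from the insight_response loop in ws.onmessage to an external workflow endpoint. The URL and token below are hypothetical placeholders, not a real JIRA integration.

// Hedged sketch: forward a detected action item to an external workflow endpoint.
// Call forwardActionItem(insight) from the insight_response loop in ws.onmessage.
// The URL and EXTERNAL_API_TOKEN are hypothetical placeholders, not a real JIRA integration.
const EXTERNAL_API_TOKEN = '<your-external-api-token>';
const forwardActionItem = (insight) => {
  fetch('https://example.com/api/tickets', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${EXTERNAL_API_TOKEN}`,
    },
    body: JSON.stringify({ summary: insight.payload.content }),
  }).catch(console.error);
};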

Join Our Community

Symbl.ai invites developers to reach out to us via email at developer@symbl.ai, join our Slack channels, participate in our hackathons, fork our Postman public workspace, or git clone our repos at Symbl.ai's GitHub.
