DEV Community

Goran Vuksic for Stratiteq

Posted on

What does the Computer Vision see? Analyse a local image with JavaScript

Every week here at Stratiteq we have tech talks called "Brown bag". The idea is to grab your lunch (brown) bag and join a session where we watch a presentation on a tech topic and discuss it afterwards. Last week our session was about Azure Computer Vision.

Computer Vision is an AI service that analyses content in images. The documentation has several examples of how to use it from different programming languages. In this post you'll also see one example that is not in the official documentation: how to analyse a local image with JavaScript.

In order to set up Computer Vision you should log in to the Azure Portal, click "Create a resource", select "AI + Machine learning" and "Computer Vision".

Computer Vision

Define the resource name, select the subscription, location, pricing tier and resource group, and create the resource. In the resource overview, click "Keys and Endpoint" to see the keys and endpoint needed to access the Cognitive Services API. You'll need these values later in the code we'll write.

Key and Endpoint

A sketch of the HTML page we will create is shown in the image below. We'll show the camera feed on the page, take a screenshot of the feed every 5 seconds, analyse that screenshot with Computer Vision and display the description under it.

Page sketch

For the page setup we'll use the following HTML code. Please note that jQuery is included in the page head.

<!DOCTYPE html>
<html>
<head>
    <title>Brown Bag - Computer Vision</title>
    <script src=""></script>
</head>
<body>
    <h2>What does AI see?</h2>
    <table class="mainTable">
        <tr>
            <td><video id="video" width="640" height="480" autoplay></video></td>
            <td><canvas id="canvas" width="640" height="480"></canvas><br /><h3 id="AIresponse"></h3></td>
        </tr>
    </table>
</body>
</html>

We'll use a simple CSS style to align content to the top of our table cells and set the colour of the result heading.

table td, table td * {
    vertical-align: top;
}
h3 {
    color: #990000;
}

Inside the document.ready function we'll define our elements, check for camera availability and start the camera feed.

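$(document).ready(function () {

    var video = document.getElementById("video");
    var canvas = document.getElementById("canvas");
    var context = canvas.getContext("2d");

    // Only ask for the camera if the browser supports the mediaDevices API.
    if(navigator.mediaDevices && navigator.mediaDevices.getUserMedia) {
        navigator.mediaDevices.getUserMedia({ video: true }).then(function(stream) {
            // Show the live feed; the video element has autoplay set.
            video.srcObject = stream;
        });
    }

    // The screenshot timer from the next step also belongs inside this
    // handler, so that it can reach video and context.
});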

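The snippet above does not say what happens when the camera cannot be started. As a minimal sketch, assuming you want to show a short message in the "AIresponse" heading in that case, the rejection from getUserMedia could be handled like this. The helper name describeCameraError and the message texts are made up for illustration; the error names are the ones browsers use when rejecting getUserMedia.

```javascript
// Hypothetical helper: turn a getUserMedia rejection into a readable message.
function describeCameraError(err) {
    switch (err && err.name) {
        case "NotAllowedError":
            return "Camera access was denied.";
        case "NotFoundError":
            return "No camera was found.";
        case "NotReadableError":
            return "The camera is already in use or cannot be read.";
        default:
            return "Could not start the camera.";
    }
}

// Browser-only usage; skipped when there is no DOM (for example under Node).
if (typeof document !== "undefined" && navigator.mediaDevices) {
    navigator.mediaDevices.getUserMedia({ video: true })
        .then(function (stream) {
            document.getElementById("video").srcObject = stream;
        })
        .catch(function (err) {
            // Show the reason in the result heading instead of failing silently.
            document.getElementById("AIresponse").textContent = describeCameraError(err);
        });
}
```

Keeping the message mapping in its own function means it can be checked without a camera or a browser.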

You can check browser compatibility of mediaDevices at the following link:

Every 5 seconds we'll take a screenshot of our camera feed and send a blob of it to the Computer Vision API.

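window.setInterval(function() {
    // Copy the current video frame onto the canvas.
    context.drawImage(video, 0, 0, 640, 480);

    // Read the canvas back as a PNG data URL and convert it to a blob.
    fetch(canvas.toDataURL("image/png"))
        .then(res => res.blob())
        .then(blob => processImage(blob));
}, 5000);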
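With a fixed 5 second timer, a slow response from the API could let requests pile up. One possible guard, sketched here under the assumption that the capture step returns a promise, is to skip a tick while the previous one is still running. The name skipWhileBusy is hypothetical and not part of the original code.

```javascript
// Hypothetical wrapper: returns a function that runs task() only when the
// previous run has finished. It returns true if the task was started and
// false if the tick was skipped.
function skipWhileBusy(task) {
    var busy = false;
    return function () {
        if (busy) {
            return false;
        }
        busy = true;
        Promise.resolve()
            .then(task)
            .catch(function (err) { console.error(err); })
            .then(function () { busy = false; });
        return true;
    };
}
```

To use it, you would pass the wrapped function to window.setInterval in place of the plain callback, and have the capture callback return its promise chain so the wrapper knows when the request has completed.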

Result processing is done in the processImage function, where you need to enter your subscription key and endpoint to make it work. Those values are available under "Keys and Endpoint" in the Azure Computer Vision resource, as mentioned earlier.

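function processImage(blobImage) {
    var subscriptionKey = "COMPUTER_VISION_SUBSCRIPTION_KEY";
    // The endpoint is expected to end with a slash.
    var endpoint = "COMPUTER_VISION_ENDPOINT";
    var uriBase = endpoint + "vision/v3.0/analyze";

    var params = {
        "visualFeatures": "Categories,Description,Color",
        "details": "",
        "language": "en",
    };

    $.ajax({
        url: uriBase + "?" + $.param(params),
        beforeSend: function(xhrObj){
            // The request body is raw image bytes, not form data.
            xhrObj.setRequestHeader("Content-Type", "application/octet-stream");
            xhrObj.setRequestHeader("Ocp-Apim-Subscription-Key", subscriptionKey);
        },
        type: "POST",
        cache: false,
        // Send the blob as-is instead of letting jQuery serialise it.
        processData: false,
        data: blobImage
    })
        .done(function(data) {
            // data is the parsed JSON response; the next step shows how to display it.
        });
}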

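The request URL is built by joining the endpoint, the API path and the query parameters. As a rough sketch of the same step without jQuery's $.param, the standard URLSearchParams class can be used. The function name buildAnalyzeUrl and the example hostname are assumptions for illustration. The sketch also accepts an endpoint with or without a trailing slash and leaves out empty parameters such as "details", on the assumption that the service treats them as optional.

```javascript
// Hypothetical helper: build the analyze URL from an endpoint and a params object.
function buildAnalyzeUrl(endpoint, params) {
    // Tolerate endpoints with or without a trailing slash.
    var base = endpoint.replace(/\/+$/, "") + "/vision/v3.0/analyze";
    var query = new URLSearchParams();
    Object.keys(params).forEach(function (key) {
        // Skip empty values such as details: "".
        if (params[key] !== "" && params[key] != null) {
            query.append(key, params[key]);
        }
    });
    var qs = query.toString();
    return qs ? base + "?" + qs : base;
}

// Example with a made-up endpoint:
var exampleUrl = buildAnalyzeUrl("https://example.cognitiveservices.azure.com/", {
    visualFeatures: "Categories,Description,Color",
    details: "",
    language: "en"
});
console.log(exampleUrl);
```

Normalising the slash avoids a malformed URL if the endpoint is pasted without its trailing slash.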

The result we receive from the Computer Vision API is JSON. We'll take the description from it and add it to the h3 element with the id "AIresponse".

document.getElementById('AIresponse').innerHTML = data.description.captions[0].text;
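The line above assumes the response always contains at least one caption. A slightly more defensive sketch, assuming the response has the shape description.captions with a text and a confidence per caption, could look like this. The helper name extractCaption, the fallback text and the sample values are made up for illustration.

```javascript
// Hypothetical helper: return the most confident caption, or a fallback
// when the response has no captions at all.
function extractCaption(data, fallback) {
    var captions = data && data.description && data.description.captions;
    if (!Array.isArray(captions) || captions.length === 0) {
        return fallback;
    }
    // Pick the caption with the highest confidence score.
    var best = captions.reduce(function (a, b) {
        return (b.confidence || 0) > (a.confidence || 0) ? b : a;
    });
    return best.text + " (" + Math.round((best.confidence || 0) * 100) + "%)";
}

// Example with an illustrative response object:
var sample = { description: { captions: [{ text: "a dog running", confidence: 0.87 }] } };
console.log(extractCaption(sample, "No description"));
```

The returned string can then be assigned to the heading's textContent, which avoids interpreting the API response as HTML.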

We ran a few tests with it. Computer Vision describes images really well, and if you mess around with it you could also get a few funny results, as we did:

Arlon running in front of glass door

Thanks for reading, you can find the full code on GitHub:
