Giovanni Lima

Creating a Libras Recognizer with Artificial Intelligence and Teachable Machine

Introduction

You know that one topic that seems like a seven-headed beast? For a long time, for me, that beast was called "Artificial Intelligence" and "Computer Vision." I thought it was just for experts, something super complex and far from my reality. But my curiosity was always there, and with the AI boom, I decided it was time to dive in and understand what happens behind the curtain.
That's when I discovered Teachable Machine, a tool from Google. It allowed me to train an image recognition model in a super simple way and see, in practice, how the magic happens.
For my very first experiment, I decided to create a letter recognizer for Libras (Brazilian Sign Language). ✌️

The Motivation Behind the Project

Talking about accessibility is fundamental to me. We live in an era with incredible technologies, frameworks, and AIs, and we have all the tools at our fingertips to build applications that truly include everyone.

Libras (Língua Brasileira de Sinais) is the official language of the deaf community in Brazil. It's not just a set of gestures; it's a complete and rich language. Communication is the bridge that connects us, and technology can be a powerful tool to make that bridge stronger and more accessible. Imagine using your phone's camera to translate Libras in real time. The possibilities are enormous!

With this idea in mind, I started looking for tools for a pilot project, and Teachable Machine seemed perfect to get started.

Teachable Machine

Think of Teachable Machine as a "private tutor" for your Artificial Intelligence, but much easier to use. It's a free online platform from Google that works visually, without you needing to write a single line of code for the training.
It basically works in three steps:

Collect: You show the AI examples. If it's an image project, you take pictures. If it's sound, you record audio clips. Simple as that.

Train: You press a button. The platform analyzes all the examples you provided and learns to recognize the patterns for each category.

Export: The tool gives you a "finished" model, which you can download or use via a link in your own website or app projects.
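
That "export" step is where the code side of this post begins. Just as a taste, here is a minimal sketch of what loading an exported image model looks like with the tmImage helper (the model ID in the URL is a placeholder; use the link Teachable Machine gives you):

    // Sketch: load an exported Teachable Machine image model.
    // MODEL_BASE is a placeholder; paste in your own shareable link.
    const MODEL_BASE = "https://teachablemachine.withgoogle.com/models/YOUR_MODEL_ID/";

    async function loadModel() {
        // model.json holds the network; metadata.json holds the class names
        const model = await tmImage.load(
            MODEL_BASE + "model.json",
            MODEL_BASE + "metadata.json"
        );
        console.log("Classes in this model:", model.getTotalClasses());
        return model;
    }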

Training a Model 🏋️

To start training our first model, Teachable Machine asks which type of project we want to create: image, audio, or pose.

First, I chose an image project. Then, the fun began: I created my "classes," which are the categories I wanted the model to learn. For example: "Letter A," "Letter B."

With the webcam on, I started capturing several photos for each class. (A quick tip: Don't take all the pictures exactly the same! I moved my hand a bit, changed the angle, and moved closer to and farther from the camera. This helps the model become "smarter" and not depend on perfect conditions to work).

After capturing a good number of images, I clicked "Train Model." A few seconds later, the model was ready to be tested and exported.

Inside the Code 👨‍💻

The code that Teachable Machine generates is basically a web page (HTML + CSS + JavaScript) that comes ready to use. It accesses your webcam, loads the model you just trained, and shows what it's "seeing" in real-time.
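
For the script below to do anything, the page needs two things: the libraries and the DOM elements the script looks up by ID. Here is a minimal sketch of that HTML side (the element IDs match the script; the CSS, including the .hidden class, is omitted; the two CDN script tags are the ones Teachable Machine's export snippet suggests, so treat the exact versions as an assumption):

    <!-- Sketch of the page skeleton the script expects -->
    <div id="webcam-container"></div>
    <div id="overlay">Camera is off</div>
    <button id="startBtn">Start Webcam</button>
    <button id="stopBtn" class="hidden">Stop</button>
    <div id="predictionsPanel" class="hidden">
        <div id="predictionsGrid"></div>
    </div>
    <div id="perfect-predictions" class="hidden"></div>
    <div id="error" class="hidden"></div>

    <!-- TensorFlow.js and the Teachable Machine image helper -->
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@1.3.1/dist/tf.min.js"></script>
    <script src="https://cdn.jsdelivr.net/npm/@teachablemachine/image@0.8/dist/teachablemachine-image.min.js"></script>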

Example of the application running:

Script Used:

 <script type="text/javascript">
        // Base URL of the trained model hosted by Teachable Machine
        const URL = "https://teachablemachine.withgoogle.com/models/Ku7kMohsH/";

        let model, webcam, maxPredictions;
        let isRunning = false;

        const overlay = document.getElementById('overlay');
        const startBtn = document.getElementById('startBtn');
        const stopBtn = document.getElementById('stopBtn');
        const predictionsPanel = document.getElementById('predictionsPanel');
        const predictionsGrid = document.getElementById('predictionsGrid');
        const perfectPredictions = document.getElementById('perfect-predictions');
        const errorDiv = document.getElementById('error');

        // Load the model, turn on the webcam, and start the prediction loop
        async function init() {
            try {
                startBtn.disabled = true;
                startBtn.innerHTML = '<span class="spinner"></span>Initializing...';
                errorDiv.classList.add('hidden');

                if (typeof tmImage === 'undefined') {
                    throw new Error('Teachable Machine library not loaded. Please refresh the page.');
                }

                console.log('Loading model...');
                const modelURL = URL + "model.json";
                const metadataURL = URL + "metadata.json";

                model = await tmImage.load(modelURL, metadataURL);
                maxPredictions = model.getTotalClasses();
                console.log('Model loaded successfully. Classes:', maxPredictions);

                console.log('Setting up webcam...');
                const flip = true;
                webcam = new tmImage.Webcam(640, 480, flip);
                await webcam.setup({
                    facingMode: "user"
                });
                await webcam.play();

                console.log('Webcam ready');

                document.getElementById("webcam-container").appendChild(webcam.canvas);

                overlay.classList.add('hidden');
                startBtn.classList.add('hidden');
                stopBtn.classList.remove('hidden');
                predictionsPanel.classList.remove('hidden');

                isRunning = true;
                window.requestAnimationFrame(loop);

            } catch (error) {
                console.error('Initialization error:', error);
                errorDiv.textContent = error.message;
                errorDiv.classList.remove('hidden');
                startBtn.disabled = false;
                startBtn.textContent = 'Start Webcam';
            }
        }

        // Main loop: runs once per animation frame while the webcam is active
        async function loop() {
            if (!isRunning) return;

            webcam.update();
            await predict();
            window.requestAnimationFrame(loop);
        }

        // Run the current webcam frame through the model
        async function predict() {
            const prediction = await model.predict(webcam.canvas);
            displayPredictions(prediction);
        }

        function displayPredictions(predictions) {
            // Update predictions grid
            predictionsGrid.innerHTML = '';
            predictions.forEach(pred => {
                const item = document.createElement('div');
                item.className = 'prediction-item';
                item.style.borderLeftColor = `rgba(34, 211, 238, ${pred.probability})`;

                item.innerHTML = `
                    <span class="prediction-name">${pred.className}</span>
                    <div class="prediction-bar">
                        <div class="bar">
                            <div class="bar-fill" style="width: ${pred.probability * 100}%"></div>
                        </div>
                        <span class="prediction-value">${(pred.probability * 100).toFixed(1)}%</span>
                    </div>
                `;

                predictionsGrid.appendChild(item);
            });

            // Show perfect predictions overlay
            const perfect = predictions.filter(p => p.probability >= 0.95);
            if (perfect.length > 0) {
                perfectPredictions.innerHTML = perfect.map(p => `
                    <h2>${p.className}</h2>
                    <p>Precision: ${(p.probability * 100).toFixed(1)}%</p>
                `).join('');
                perfectPredictions.classList.remove('hidden');
            } else {
                perfectPredictions.classList.add('hidden');
            }
        }

        // Stop the loop, release the webcam, and reset the UI
        function stop() {
            isRunning = false;

            if (webcam) {
                webcam.stop();
                const canvas = document.querySelector('#webcam-container canvas');
                if (canvas) {
                    canvas.remove();
                }
            }

            overlay.classList.remove('hidden');
            startBtn.classList.remove('hidden');
            startBtn.disabled = false;
            startBtn.textContent = 'Start Webcam';
            stopBtn.classList.add('hidden');
            predictionsPanel.classList.add('hidden');
            perfectPredictions.classList.add('hidden');
        }

        startBtn.addEventListener('click', init);
        stopBtn.addEventListener('click', stop);

        window.addEventListener('load', () => {
            if (typeof tmImage === 'undefined') {
                errorDiv.textContent = 'Required libraries failed to load. Please refresh the page.';
                errorDiv.classList.remove('hidden');
                startBtn.disabled = true;
            }
        });
    </script>

Essential Technologies:

  • TensorFlow.js (tf.min.js): This is the main library, the "brain" of the operation. It allows Machine Learning models to run directly in the browser, without needing a complex server. It's what performs the calculations to analyze the image from your webcam (a quick way to check this is shown right after this list).

  • Teachable Machine Image Library (teachablemachine-image.min.js): This is a "helper" library built on top of TensorFlow.js. It enormously simplifies the process of loading your specific Teachable Machine model and connecting it to the webcam.
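
A quick way to see that TensorFlow.js really is doing the work locally is to ask it which backend it picked once the libraries have loaded (a small sketch using the public tf API):

    // Sketch: confirm TensorFlow.js is running in the browser
    tf.ready().then(() => {
        // "webgl" means it is using the GPU; "cpu" is the slower fallback
        console.log("TensorFlow.js backend:", tf.getBackend());
    });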

How the Script Works:

  • The URL constant: At the beginning of the script, the URL constant points to your model hosted on Google's servers. That's where the browser downloads model.json (the architecture and "weights" of your trained model) and metadata.json (the names of your classes, like "Letter A").

  • init() function (Initialization): When you click to start the webcam, this function is called. It:

  • Loads the model from the specified URL (tmImage.load).

  • Configures and turns on the webcam (new tmImage.Webcam).

  • Starts a continuous "loop" to make predictions.

  • loop() function (The Real-Time Heart): This function runs dozens of times per second. On each pass of the loop, it:

  • Updates the webcam image (webcam.update).

  • Calls the predict function to analyze the current image.

  • predict() function (The Prediction): This is the most important part. It takes the current frame from the webcam and sends it to the loaded model (model.predict). The model returns an array with the probability that the image belongs to each of your classes (e.g., [{className: "Letter A", probability: 0.98}, {className: "Letter B", probability: 0.02}]). The sketch right after this list shows how to pick out the single most likely class.

  • displayPredictions() function (The Display): This function takes the prediction result and updates the page's interface, showing the class names and the percentage bars, giving you real-time visual feedback.
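
As promised above, here is a small sketch (not part of the original script) of how you might reduce that prediction array to the single most likely class, which is handy if you only want to display the "winning" letter:

    // Sketch: pick the single most likely class from model.predict's output
    function topPrediction(predictions) {
        return predictions.reduce((best, p) =>
            p.probability > best.probability ? p : best
        );
    }

    // Usage: topPrediction(await model.predict(webcam.canvas))
    // -> { className: "Letter A", probability: 0.98 }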

Results, Challenges, and Reflections 🤔

For a first experience, the result was incredible! It was really cool to see the machine recognizing my gestures on the screen. But, like in any project, I also faced some challenges.

What went well? For well-defined gestures with good lighting, the accuracy was very high, often exceeding 95%. The real-time response was also fantastic, with no delays.

What were the difficulties encountered? It became clear that the model is sensitive to its environment: a change in the room's lighting or a cluttered background was enough to confuse the AI. Similar gestures were also a challenge. This goes to show how important the quality of the training data is.

This experience made me think about the next step: a model that can identify complete words in Libras. The challenge would be much greater because it would involve recognizing a sequence of moving gestures, but with the tools we have today, it seems more and more possible.

Conclusion

Creating a Libras recognizer, even a simple one, reinforced for me the giant potential that technology has to promote inclusion. It's not just about code and algorithms, but about building bridges.

If I was able to create this in one evening, what's stopping us from diving deeper into our own projects? AI is increasingly within our reach these days. For those of you who read this far, thank you very much! Any feedback or questions will be well received.
