<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Bright Etornam Sunu</title>
    <description>The latest articles on DEV Community by Bright Etornam Sunu (@_iametornam).</description>
    <link>https://dev.to/_iametornam</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F153888%2F7705f736-a847-4045-8624-d166b354e4d0.jpg</url>
      <title>DEV Community: Bright Etornam Sunu</title>
      <link>https://dev.to/_iametornam</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/_iametornam"/>
    <language>en</language>
    <item>
      <title>Bringing it to Life: The Real-Time Inference Engine (Part 3)</title>
      <dc:creator>Bright Etornam Sunu</dc:creator>
      <pubDate>Fri, 24 Apr 2026 20:05:35 +0000</pubDate>
      <link>https://dev.to/_iametornam/bringing-it-to-life-the-real-time-inference-engine-part-3-29fi</link>
      <guid>https://dev.to/_iametornam/bringing-it-to-life-the-real-time-inference-engine-part-3-29fi</guid>
      <description>&lt;p&gt;In &lt;a href="https://dev.to/_iametornam/from-pixels-to-predictions-data-pipelines-and-training-the-sequence-model-part-2-217d"&gt;Part 2&lt;/a&gt;, we successfully trained a Transformer model to map sequences of body keypoints to sign language glosses using CTC loss. However, training on pre-segmented videos is one thing; making it work in the real world—where a webcam stream is infinite and boundaries are unknown—is an entirely different beast.&lt;/p&gt;

&lt;p&gt;In this article, we tear down &lt;code&gt;inference/realtime.py&lt;/code&gt;, the beating heart of the &lt;strong&gt;asl-to-voice&lt;/strong&gt; project. We will explore how we handle infinite video streams, decode raw probabilities into words, and use Large Language Models (LLMs) to generate beautiful, spoken English on the fly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 3: The Sliding Window and CTC Decoding
&lt;/h3&gt;

&lt;p&gt;When a user turns on their webcam, we don't know when a sentence begins or ends. To solve this, we implemented a &lt;strong&gt;Sliding Window&lt;/strong&gt; architecture.&lt;/p&gt;

&lt;p&gt;As the camera captures frames, &lt;code&gt;MediaPipe&lt;/code&gt; extracts the keypoints and appends them to a &lt;code&gt;collections.deque&lt;/code&gt; (a highly efficient queue). We maintain a window of &lt;code&gt;W&lt;/code&gt; frames (e.g., 64 frames, representing about 2 seconds of video). &lt;/p&gt;

&lt;p&gt;Every &lt;code&gt;S&lt;/code&gt; frames (the "stride", e.g., 16 frames), we take the current window, convert it to a PyTorch tensor, and push it through our Transformer model. This means the model is constantly analyzing overlapping chunks of time, ensuring we never "cut off" a sign in the middle of an inference step.&lt;/p&gt;
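
&lt;p&gt;Here is a minimal sketch of that loop, assuming a &lt;code&gt;model&lt;/code&gt; that returns per-frame log-probabilities. The names (&lt;code&gt;process_frame&lt;/code&gt;, &lt;code&gt;W&lt;/code&gt;, &lt;code&gt;S&lt;/code&gt;) are illustrative, not the exact API of &lt;code&gt;inference/realtime.py&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from collections import deque

import numpy as np
import torch

W, S = 64, 16              # window size and stride, in frames
window = deque(maxlen=W)   # oldest keypoint vectors fall off automatically
frame_count = 0

def process_frame(keypoints, model, device="cpu"):
    """Append one frame's keypoint vector; run the model every S frames."""
    global frame_count
    window.append(keypoints)
    frame_count += 1
    if len(window) == W and frame_count % S == 0:
        x = torch.from_numpy(np.stack(window)).float().unsqueeze(0)  # (1, W, feat_dim)
        with torch.no_grad():
            return model(x.to(device))  # per-frame log-probabilities, (1, W, vocab)
    return None
&lt;/code&gt;&lt;/pre&gt;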

&lt;h4&gt;
  
  
  Making Sense of the Output
&lt;/h4&gt;

&lt;p&gt;The Transformer outputs a probability distribution across our entire vocabulary for every frame in that 64-frame window. How do we turn that into words? &lt;/p&gt;

&lt;p&gt;In &lt;code&gt;models/gloss_decoder.py&lt;/code&gt;, we implement CTC decoding. We offer two strategies (a greedy-decoding sketch follows the list):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Greedy Search (Default):&lt;/strong&gt; For every time step, simply pick the word with the highest probability. We then collapse consecutive duplicate words and remove the &lt;code&gt;&amp;lt;BLANK&amp;gt;&lt;/code&gt; tokens. It's incredibly fast and works well for clear, distinct signs.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Beam Search:&lt;/strong&gt; Instead of just looking at the top choice, Beam Search keeps track of the top &lt;code&gt;K&lt;/code&gt; (the beam width) most likely &lt;em&gt;paths&lt;/em&gt; through the probabilities. It's computationally heavier but significantly more accurate, especially when the model is slightly unsure.&lt;/li&gt;
&lt;/ol&gt;
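
&lt;p&gt;To make the greedy strategy concrete, here is a minimal decoding sketch; the helper name and blank index are assumptions, and the real &lt;code&gt;models/gloss_decoder.py&lt;/code&gt; may differ in the details:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import torch

def greedy_ctc_decode(log_probs, id_to_gloss, blank_id=0):
    """log_probs: (T, vocab_size) tensor of per-frame log-probabilities."""
    best_ids = log_probs.argmax(dim=-1).tolist()  # top class for every time step
    decoded, prev = [], None
    for idx in best_ids:
        if idx != prev and idx != blank_id:       # collapse repeats, drop blanks
            decoded.append(id_to_gloss[idx])
        prev = idx
    return decoded
&lt;/code&gt;&lt;/pre&gt;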

&lt;h3&gt;
  
  
  Stage 4: The LLM Translation Layer
&lt;/h3&gt;

&lt;p&gt;At this point, our decoder might output a sequence of glosses like: &lt;code&gt;["STORE", "I", "GO"]&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;To a hearing person, this sounds broken. Sign languages have their own distinct grammar and syntax. To make the system truly accessible and natural, we must translate these literal glosses into fluent English: &lt;em&gt;"I am going to the store."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This is where &lt;code&gt;models/gloss_to_text.py&lt;/code&gt; comes in. We treat the gloss-to-English translation as a standard NLP translation task, leveraging modern Large Language Models (LLMs).&lt;/p&gt;

&lt;h4&gt;
  
  
  The Fallback Chain
&lt;/h4&gt;

&lt;p&gt;Relying on a single cloud API in a real-time system is dangerous. If the API rate-limits you or goes down, the application breaks. To guarantee reliability, we built an intelligent &lt;strong&gt;fallback chain&lt;/strong&gt;.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Primary:&lt;/strong&gt; Google Gemini (&lt;code&gt;gemini-3.1-flash-lite-preview&lt;/code&gt;). It is blazingly fast, highly accurate, and extremely cost-effective for this type of few-shot translation.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Fallback 1:&lt;/strong&gt; OpenAI (&lt;code&gt;gpt-5.4-mini&lt;/code&gt;). If Gemini times out or throws a server error, the system automatically routes the exact same prompt to OpenAI.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Fallback 2:&lt;/strong&gt; Anthropic (&lt;code&gt;claude-haiku-4-5-20251001&lt;/code&gt;). Our final safety net.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We use a carefully crafted system prompt:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"You are a sign language interpreter. Convert the following sign language gloss sequence into a natural, fluent English sentence. Output only the sentence, nothing else. Preserve the original meaning exactly."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;By using these ultra-fast, lightweight LLMs, the translation usually takes less than 500 milliseconds.&lt;/p&gt;
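
&lt;p&gt;Structurally, the chain is just an ordered list of providers tried in sequence. The sketch below shows the idea with generic callables instead of the actual SDK calls, so treat the function shapes as assumptions rather than the exact &lt;code&gt;models/gloss_to_text.py&lt;/code&gt; implementation:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SYSTEM_PROMPT = (
    "You are a sign language interpreter. Convert the following sign language "
    "gloss sequence into a natural, fluent English sentence. Output only the "
    "sentence, nothing else. Preserve the original meaning exactly."
)

def translate_glosses(glosses, providers):
    """providers: ordered (name, callable) pairs, e.g. Gemini, OpenAI, Anthropic.
    Each callable takes (system_prompt, user_text) and returns a sentence."""
    user_text = " ".join(glosses)
    for name, call in providers:
        try:
            return call(SYSTEM_PROMPT, user_text)
        except Exception as exc:  # rate limit, timeout, server error...
            print(f"{name} failed ({exc}); trying the next provider")
    return user_text              # last resort: speak the raw glosses
&lt;/code&gt;&lt;/pre&gt;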

&lt;h3&gt;
  
  
  Stage 5: Text-to-Speech (Without Freezing)
&lt;/h3&gt;

&lt;p&gt;The final step is to read the translated sentence aloud. If we simply called a Text-to-Speech (TTS) function in our main &lt;code&gt;while True&lt;/code&gt; webcam loop, the entire video feed would freeze while the computer spoke.&lt;/p&gt;

&lt;p&gt;To solve this, &lt;code&gt;inference/tts.py&lt;/code&gt; implements a multi-threaded, non-blocking audio engine. &lt;/p&gt;

&lt;p&gt;When the LLM returns a sentence, the main thread pushes that string into a thread-safe &lt;code&gt;queue.Queue&lt;/code&gt;. A dedicated background worker thread constantly watches this queue. When it sees new text, it synthesizes the audio and plays it. The main webcam loop never waits, meaning the video feed stays at a buttery smooth 30 FPS.&lt;/p&gt;
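
&lt;p&gt;The pattern looks roughly like this (a simplified sketch; the real &lt;code&gt;inference/tts.py&lt;/code&gt; wraps the backends described below behind one interface):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import queue
import threading

speech_queue = queue.Queue()

def tts_worker(speak_fn):
    """Background worker: pull sentences off the queue and synthesize them."""
    while True:
        sentence = speech_queue.get()  # blocks until new text arrives
        if sentence is None:           # sentinel to shut down cleanly
            break
        speak_fn(sentence)             # e.g. an Edge TTS or pyttsx3 call
        speech_queue.task_done()

worker = threading.Thread(target=tts_worker, args=(print,), daemon=True)
worker.start()

# In the main webcam loop, this call returns immediately:
speech_queue.put("I am going to the store.")
&lt;/code&gt;&lt;/pre&gt;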

&lt;p&gt;We unified three different TTS backends behind a single interface:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Edge TTS (Primary):&lt;/strong&gt; This utilizes Microsoft Edge's internal API to access incredibly high-quality, neural text-to-speech voices for free, without requiring an API key.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;pyttsx3:&lt;/strong&gt; A fully offline fallback that uses the host OS's native voices.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;ElevenLabs:&lt;/strong&gt; For users who want ultra-realistic, premium voices (requires an API key).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The User Experience
&lt;/h3&gt;

&lt;p&gt;We wrap all of this in a sleek, real-time OpenCV window (&lt;code&gt;utils/visualize.py&lt;/code&gt;). As the user signs, the MediaPipe skeleton is drawn on their body. A clean HUD overlays the screen, showing the current raw gloss predictions in gray, and the final translated English sentence in bright green just before the computer speaks it aloud.&lt;/p&gt;

&lt;p&gt;With the core pipeline running live, what happens if you want to run this in a remote village with no internet? Or what if you want to teach it a sign language it's never seen before? &lt;/p&gt;

&lt;p&gt;In the final installment, &lt;strong&gt;Part 4&lt;/strong&gt;, we will explore the advanced features of the codebase: offline translation models, custom sign recording tools, and exporting to ONNX for massive performance gains.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;uploaded through &lt;a href="https://distroblog.etornam.dev" rel="noopener noreferrer"&gt;Distroblog&lt;/a&gt; - a platform I created specifically to post to multiple blog sites at once😅&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>transformer</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>From Pixels to Predictions: Data Pipelines and Training the Sequence Model (Part 2)</title>
      <dc:creator>Bright Etornam Sunu</dc:creator>
      <pubDate>Fri, 17 Apr 2026 23:21:40 +0000</pubDate>
      <link>https://dev.to/_iametornam/from-pixels-to-predictions-data-pipelines-and-training-the-sequence-model-part-2-217d</link>
      <guid>https://dev.to/_iametornam/from-pixels-to-predictions-data-pipelines-and-training-the-sequence-model-part-2-217d</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqs3b4hne62ykrpkt1mh9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqs3b4hne62ykrpkt1mh9.png" alt=" " width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://dev.to/_iametornam/bridging-the-silence-building-a-real-time-sign-language-translator-part-1-1b8l"&gt;Part 1&lt;/a&gt; of this series, we introduced the architecture of the &lt;strong&gt;asl-to-voice&lt;/strong&gt; translation system—a five-stage pipeline designed to turn real-time webcam video into spoken English. But a machine learning model is only as good as the data it learns from, and in the world of computer vision, raw video is often too noisy, heavy, and unstructured to be useful directly.&lt;/p&gt;

&lt;p&gt;In this article, we dive into the data layer: how we extract meaningful signals from raw video, normalize them for robust inference, and train our temporal sequence model.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Data Foundation: WLASL and Beyond
&lt;/h3&gt;

&lt;p&gt;To teach a neural network to understand sign language, we need massive amounts of annotated video. The project supports several public datasets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;WLASL (Word-Level American Sign Language):&lt;/strong&gt; Contains over 2,000 signs performed by over 100 signers. We use this as our primary baseline, often starting with a top-50 sign subset for rapid iteration.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;RWTH-PHOENIX-2014T:&lt;/strong&gt; A dataset of continuous German Sign Language with rich gloss annotations.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;How2Sign:&lt;/strong&gt; A large-scale, continuous ASL dataset.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We built custom scripts (like &lt;code&gt;scripts/download_wlasl.py&lt;/code&gt;) to scrape, organize, and format these datasets automatically, preparing them for the extraction phase.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 1: Keypoint Extraction with MediaPipe
&lt;/h3&gt;

&lt;p&gt;Passing raw RGB frames directly into a temporal model (like a 3D CNN or Vision Transformer) requires massive computational power—usually a high-end GPU. Because our goal is &lt;em&gt;real-time&lt;/em&gt; inference on consumer hardware, we take a different approach: &lt;strong&gt;Skeletonization&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Using Google's &lt;code&gt;MediaPipe Holistic&lt;/code&gt; framework, we process the video frame-by-frame, extracting the 3D coordinates (x, y, z) of specific landmarks on the human body.&lt;/p&gt;

&lt;p&gt;In &lt;code&gt;models/keypoint_extractor.py&lt;/code&gt;, we construct a dense feature vector for every frame:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Hands:&lt;/strong&gt; 21 landmarks per hand × 3 dimensions × 2 hands = 126 dims.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Pose (Body):&lt;/strong&gt; 33 landmarks × 4 dimensions (including visibility) = 132 dims.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Face:&lt;/strong&gt; The full face mesh is 468 points (1,404 dims), which is often overkill. We provide a configuration toggle to extract just the &lt;strong&gt;mouth subset&lt;/strong&gt; (~20 landmarks = ~60 dims). Mouth shapes are critical for non-manual markers in ASL.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;By default, we compress millions of pixels into a highly informative &lt;strong&gt;1,662-dimensional vector&lt;/strong&gt; per frame (including the full face mesh).&lt;/p&gt;
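
&lt;p&gt;A condensed sketch of that per-frame assembly is shown below. The real &lt;code&gt;models/keypoint_extractor.py&lt;/code&gt; adds configuration toggles (such as the mouth-only subset), so take this as illustrative:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import numpy as np

def frame_features(results):
    """Flatten MediaPipe Holistic landmarks into one 1,662-dim vector."""
    def flat(landmarks, n_points, dims=3):
        if landmarks is None:                       # missing part: zero-fill
            return np.zeros(n_points * dims, dtype=np.float32)
        vals = []
        for lm in landmarks.landmark:
            vals.extend([lm.x, lm.y, lm.z] if dims == 3
                        else [lm.x, lm.y, lm.z, lm.visibility])
        return np.asarray(vals, dtype=np.float32)

    return np.concatenate([
        flat(results.left_hand_landmarks, 21),      # 63 dims
        flat(results.right_hand_landmarks, 21),     # 63 dims
        flat(results.pose_landmarks, 33, dims=4),   # 132 dims
        flat(results.face_landmarks, 468),          # 1,404 dims
    ])                                              # 1,662 dims total
&lt;/code&gt;&lt;/pre&gt;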

&lt;h4&gt;
  
  
  The Secret Sauce: Normalization
&lt;/h4&gt;

&lt;p&gt;If the model trains on a person standing in the center of the frame, it will fail if the user stands in the bottom left corner. To solve this, we implemented &lt;strong&gt;shoulder-based normalization&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;Before the keypoints are saved, we calculate the midpoint between the left and right shoulder landmarks (Pose points 11 and 12). We then translate all other keypoints so that this shoulder midpoint becomes the origin &lt;code&gt;(0,0,0)&lt;/code&gt;. This makes our data translation-invariant—the model only cares about how the hands and face move &lt;em&gt;relative to the body&lt;/em&gt;, not where the body is in the camera frame.&lt;/p&gt;
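
&lt;p&gt;In code, the idea boils down to a couple of lines; the array layout here is an assumption for illustration:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import numpy as np

LEFT_SHOULDER, RIGHT_SHOULDER = 11, 12   # MediaPipe pose landmark indices

def normalize_keypoints(pose_xyz, other_xyz):
    """pose_xyz: (33, 3) pose coordinates; other_xyz: (N, 3) hand/face points."""
    origin = (pose_xyz[LEFT_SHOULDER] + pose_xyz[RIGHT_SHOULDER]) / 2.0
    # The shoulder midpoint becomes (0, 0, 0) for every keypoint in the frame.
    return pose_xyz - origin, other_xyz - origin
&lt;/code&gt;&lt;/pre&gt;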

&lt;h3&gt;
  
  
  Stage 2: The Temporal Sequence Model
&lt;/h3&gt;

&lt;p&gt;With our videos converted into sequences of normalized 1,662-dimensional vectors, we are ready to train. The core of this system is the &lt;strong&gt;Transformer Encoder&lt;/strong&gt; (defined in &lt;code&gt;models/sequence_model.py&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;Why a Transformer? While Recurrent Neural Networks (like our BiLSTM baseline) handle sequential data well, Transformers excel at modeling long-range dependencies and parallelize beautifully on modern hardware.&lt;/p&gt;

&lt;p&gt;Our default architecture (configured via &lt;code&gt;config.yaml&lt;/code&gt;), with a minimal PyTorch sketch after the list:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Input Projection:&lt;/strong&gt; A linear layer scales the 1,662-dim input up to the model's hidden dimension (e.g., 256).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Positional Encoding:&lt;/strong&gt; Standard sinusoidal encodings are injected so the self-attention mechanism knows the temporal order of the frames.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Encoder Blocks:&lt;/strong&gt; 6 layers of multi-head self-attention (8 heads) allow the model to look at the entire sequence of keypoints and understand the context of the sign.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;CTC Head:&lt;/strong&gt; A final linear layer projects the hidden state to our vocabulary size, followed by a log-softmax activation.&lt;/li&gt;
&lt;/ul&gt;
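
&lt;p&gt;A minimal PyTorch sketch of that stack is shown below. The hyperparameters mirror the text, the vocabulary size of 51 assumes a top-50 subset plus the blank token, and &lt;code&gt;models/sequence_model.py&lt;/code&gt; holds the full implementation:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import torch
import torch.nn as nn

class SignTransformer(nn.Module):
    def __init__(self, in_dim=1662, d_model=256, n_heads=8, n_layers=6, vocab_size=51):
        super().__init__()
        self.proj = nn.Linear(in_dim, d_model)                # input projection
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)            # CTC head (includes blank)

    def positional_encoding(self, T, d_model, device):
        pos = torch.arange(T, device=device).unsqueeze(1)
        i = torch.arange(0, d_model, 2, device=device)
        angles = pos / torch.pow(10000.0, i / d_model)
        pe = torch.zeros(T, d_model, device=device)
        pe[:, 0::2], pe[:, 1::2] = torch.sin(angles), torch.cos(angles)
        return pe

    def forward(self, x):                                     # x: (B, T, 1662)
        h = self.proj(x)
        h = h + self.positional_encoding(x.size(1), h.size(-1), x.device)
        h = self.encoder(h)
        return self.head(h).log_softmax(dim=-1)               # (B, T, vocab_size)
&lt;/code&gt;&lt;/pre&gt;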

&lt;h4&gt;
  
  
  Training with CTC Loss
&lt;/h4&gt;

&lt;p&gt;In continuous sign language, we don't know exactly &lt;em&gt;when&lt;/em&gt; a sign starts and stops in the video. We just know the video contains the glosses &lt;code&gt;["HELLO", "WORLD"]&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;To solve this alignment problem, we train the network using &lt;strong&gt;Connectionist Temporal Classification (CTC) loss&lt;/strong&gt;. CTC allows the model to predict a sequence of tokens from an unsegmented input stream by introducing a special &lt;code&gt;&amp;lt;BLANK&amp;gt;&lt;/code&gt; token. The model learns to predict blanks during the transitions between signs, and spikes the probability of a specific sign when it recognizes it.&lt;/p&gt;

&lt;p&gt;Our training script (&lt;code&gt;training/train_sequence.py&lt;/code&gt;) utilizes PyTorch's native &lt;code&gt;nn.CTCLoss(zero_infinity=True)&lt;/code&gt;, paired with an Adam optimizer, a learning rate scheduler (&lt;code&gt;ReduceLROnPlateau&lt;/code&gt;), and gradient clipping to stabilize the notoriously unstable CTC training process.&lt;/p&gt;
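
&lt;p&gt;A condensed training step, reusing the &lt;code&gt;SignTransformer&lt;/code&gt; sketch above, might look like this (the real script adds batching, validation, logging, and checkpointing):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import torch
import torch.nn as nn

model = SignTransformer()
criterion = nn.CTCLoss(blank=0, zero_infinity=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5, patience=3)

def train_step(batch):
    feats, targets, feat_lens, target_lens = batch   # padded keypoints + gloss labels
    log_probs = model(feats)                         # (B, T, vocab), log-softmaxed
    # nn.CTCLoss expects (T, B, vocab) inputs
    loss = criterion(log_probs.permute(1, 0, 2), targets, feat_lens, target_lens)
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)  # gradient clipping
    optimizer.step()
    return loss.item()

# After each validation pass: scheduler.step(validation_loss)
&lt;/code&gt;&lt;/pre&gt;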

&lt;h3&gt;
  
  
  Measuring Success
&lt;/h3&gt;

&lt;p&gt;During training, standard loss metrics aren't enough. We evaluate our models using &lt;strong&gt;Word Error Rate (WER)&lt;/strong&gt; via the &lt;code&gt;jiwer&lt;/code&gt; library. WER measures how many insertions, deletions, and substitutions are required to turn the predicted gloss sequence into the ground truth sequence. The lower the WER, the better the model.&lt;/p&gt;
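
&lt;p&gt;For example, computing the metric with &lt;code&gt;jiwer&lt;/code&gt; is a one-liner (the gloss sequences are joined into strings first):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import jiwer

reference  = "STORE I GO"
hypothesis = "STORE GO"
print(jiwer.wer(reference, hypothesis))  # 0.333...: one deletion out of three reference words
&lt;/code&gt;&lt;/pre&gt;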

&lt;h3&gt;
  
  
  Next Steps
&lt;/h3&gt;

&lt;p&gt;Now we have a trained Transformer model capable of taking a sequence of keypoints and spitting out a sequence of gloss probabilities. But how do we do this live, on a webcam, without knowing when the user starts or stops signing? &lt;/p&gt;

&lt;p&gt;In &lt;strong&gt;Part 3&lt;/strong&gt;, we will explore the real-time inference loop, the magic of sliding windows, and how we translate robotic glosses into beautiful, spoken English.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;uploaded through &lt;a href="https://distroblog.etornam.dev" rel="noopener noreferrer"&gt;Distroblog&lt;/a&gt; - a platform I created specifically to post to multiple blog sites at once😅&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>asl</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Bridging the Silence: Building a Real-Time Sign Language Translator (Part 1)</title>
      <dc:creator>Bright Etornam Sunu</dc:creator>
      <pubDate>Fri, 10 Apr 2026 11:55:49 +0000</pubDate>
      <link>https://dev.to/_iametornam/bridging-the-silence-building-a-real-time-sign-language-translator-part-1-1b8l</link>
      <guid>https://dev.to/_iametornam/bridging-the-silence-building-a-real-time-sign-language-translator-part-1-1b8l</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqs3b4hne62ykrpkt1mh9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqs3b4hne62ykrpkt1mh9.png" alt=" " width="800" height="447"&gt;&lt;/a&gt;&lt;br&gt;
Communication is the cornerstone of human connection, yet for the deaf and hard-of-hearing communities, a significant barrier exists when interacting with those who don't understand sign language. While text-to-speech and speech-to-text technologies have advanced rapidly, translating visual, spatial languages like American Sign Language (ASL) into spoken word in real-time has remained a formidable challenge.&lt;/p&gt;

&lt;p&gt;Enter the &lt;strong&gt;asl-to-voice&lt;/strong&gt; project. &lt;/p&gt;

&lt;p&gt;In this four-part technical series, we will take a deep dive into how we built an end-to-end, continuous sign language recognition (CSLR) pipeline. This system doesn't just recognize isolated gestures; it watches a person signing via a standard webcam, understands the continuous flow of movements, translates those signs into fluent English, and speaks the translation aloud—all in real time.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Complexity of Sign Language
&lt;/h3&gt;

&lt;p&gt;Before diving into the architecture, it's crucial to understand why this is hard. Sign language isn't just about hand shapes. It involves complex grammar, facial expressions, body posture, and non-manual markers. Furthermore, in natural signing, there are no clear "spaces" between words like there are in written text. This is known as continuous signing.&lt;/p&gt;

&lt;p&gt;Traditional computer vision approaches often fail here because they treat signs as static images or isolated video clips. To truly translate sign language, a system must understand spatio-temporal data (space and time) simultaneously. Furthermore, sign language has its own syntax, often represented as "glosses" (capitalized literal representations of signs, like &lt;code&gt;YOU NAME WHAT&lt;/code&gt;), which don't map one-to-one to English grammar.&lt;/p&gt;

&lt;h3&gt;
  
  
  Our Solution: The 5-Stage Pipeline
&lt;/h3&gt;

&lt;p&gt;To tackle these challenges, the &lt;code&gt;asl-to-voice&lt;/code&gt; codebase is structured around a highly modular, five-stage pipeline. By breaking the problem down, we can optimize each component independently.&lt;/p&gt;

&lt;h4&gt;
  
  
  Stage 1: Keypoint Extraction
&lt;/h4&gt;

&lt;p&gt;Processing raw video frames through a heavy neural network is computationally expensive and slow. Instead, we use Google's &lt;strong&gt;MediaPipe Holistic&lt;/strong&gt; to extract 2D and 3D landmarks from the signer's hands, body (pose), and face. This dramatically reduces the dimensionality of our data from millions of pixels down to a dense feature vector (up to 1,662 dimensions) that describes exactly how the body is moving.&lt;/p&gt;

&lt;h4&gt;
  
  
  Stage 2: Temporal Sequence Modeling
&lt;/h4&gt;

&lt;p&gt;With our stream of keypoints, we need a model that understands time. We implemented a &lt;strong&gt;Transformer encoder&lt;/strong&gt; (with a BiLSTM available as a baseline). The Transformer looks at a sliding window of keypoint frames and learns the temporal relationships between them. Because we are dealing with continuous streams without explicit word boundaries, the model is trained using Connectionist Temporal Classification (CTC) loss.&lt;/p&gt;

&lt;h4&gt;
  
  
  Stage 3: Gloss Decoding
&lt;/h4&gt;

&lt;p&gt;The output of the Transformer is a probability distribution over our vocabulary of signs. The CTC decoder (using either a fast greedy search or a more accurate beam search) collapses these probabilities into a discrete sequence of glosses.&lt;/p&gt;

&lt;h4&gt;
  
  
  Stage 4: Gloss to Natural Language
&lt;/h4&gt;

&lt;p&gt;If the system outputs &lt;code&gt;["STORE", "I", "GO"]&lt;/code&gt;, the user experience is poor. Sign language glosses need to be translated into grammatically correct spoken language. To achieve this, we route the gloss sequence through a Large Language Model (LLM). Our system uses a resilient fallback chain: it tries Google Gemini first, falls back to OpenAI if it fails, and then to Anthropic. &lt;/p&gt;

&lt;h4&gt;
  
  
  Stage 5: Text-to-Speech (TTS)
&lt;/h4&gt;

&lt;p&gt;Finally, the natural English sentence (e.g., "I am going to the store.") is sent to a TTS engine. We rely primarily on Microsoft Edge TTS for high-quality, neural voice generation. Crucially, this runs in a background thread so the video feed and inference loop never freeze while the computer is speaking.&lt;/p&gt;

&lt;h3&gt;
  
  
  A Look at the Tech Stack
&lt;/h3&gt;

&lt;p&gt;The beauty of this project lies in how these diverse technologies are stitched together:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Computer Vision:&lt;/strong&gt; &lt;code&gt;mediapipe&lt;/code&gt;, &lt;code&gt;opencv-python&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Deep Learning:&lt;/strong&gt; &lt;code&gt;torch&lt;/code&gt;, &lt;code&gt;transformers&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Metrics:&lt;/strong&gt; &lt;code&gt;jiwer&lt;/code&gt; (for Word Error Rate), &lt;code&gt;sacrebleu&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;APIs:&lt;/strong&gt; &lt;code&gt;google-generativeai&lt;/code&gt;, &lt;code&gt;openai&lt;/code&gt;, &lt;code&gt;anthropic&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Audio:&lt;/strong&gt; &lt;code&gt;edge-tts&lt;/code&gt;, &lt;code&gt;pyttsx3&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The entire system is configuration-driven. A single &lt;code&gt;config.yaml&lt;/code&gt; file controls model architectures, feature subsets, sliding window sizes, and API fallback chains, making it easy to experiment.&lt;/p&gt;
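
&lt;p&gt;For illustration, loading such a config in Python could look like the snippet below; the key names here are hypothetical, not the project's exact schema:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import yaml

with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

# Hypothetical keys, shown only to illustrate the config-driven design:
window_size = cfg["inference"]["window_size"]      # e.g. 64 frames
stride = cfg["inference"]["stride"]                 # e.g. 16 frames
llm_chain = cfg["translation"]["fallback_chain"]    # e.g. ["gemini", "openai", "anthropic"]
&lt;/code&gt;&lt;/pre&gt;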

&lt;h3&gt;
  
  
  What's Next?
&lt;/h3&gt;

&lt;p&gt;In &lt;strong&gt;Part 2&lt;/strong&gt;, we will roll up our sleeves and look at the data. We will explore how we process datasets like WLASL, how we normalize keypoints so the model works regardless of where you stand in the camera frame, and how the Transformer model is actually trained to understand sign language. &lt;/p&gt;

&lt;p&gt;Stay tuned as we move from concept to code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;uploaded through &lt;a href="https://distroblog.etornam.dev" rel="noopener noreferrer"&gt;Distroblog&lt;/a&gt; - a platform I created specifically to post to multiple blog sites at once😅&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>a11y</category>
      <category>deeplearning</category>
      <category>machinelearning</category>
      <category>ai</category>
    </item>
    <item>
      <title>Deploy your Nodejs + Auth0 REST API to Cyclic.sh under 4 minutes</title>
      <dc:creator>Bright Etornam Sunu</dc:creator>
      <pubDate>Thu, 16 Dec 2021 06:15:45 +0000</pubDate>
      <link>https://dev.to/_iametornam/deploy-your-nodejs-auth0-rest-api-to-cyclicsh-under-4-minutes-j8h</link>
      <guid>https://dev.to/_iametornam/deploy-your-nodejs-auth0-rest-api-to-cyclicsh-under-4-minutes-j8h</guid>
      <description>&lt;p&gt;Deploying APIs can sometimes be a pain in the butt when your service provider overcomplicates the deployment and setup process. This short article will demo how to deploy your Restful Nodejs application to Cyclic.sh in less than 4 minutes.&lt;/p&gt;

&lt;p&gt;Yes, you heard right: less than 4 minutes🔥😱😱.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fju588ozkufwtgfmrkkwc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fju588ozkufwtgfmrkkwc.png" alt="from cyclic.sh homepage" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Cyclic is a provider that helps you launch your API in seconds. Push your code to Github and let the CI/CD (continuous integration/continuous delivery) integration trigger and deploy your service onto a global infrastructure in seconds. No cryptic CloudFormation errors. No mysterious API Gateway errors. No YAML parse errors. No hunting for CloudWatch log groups. No wasted time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Important!&lt;/strong&gt;&lt;br&gt;
I already have my Nodejs Auth0 backend done.&lt;/p&gt;

&lt;p&gt;To follow along with this project, clone the repo from &lt;a href="https://github.com/RegNex/nodejs-auth0" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deployment Demo&lt;/strong&gt;&lt;br&gt;
To deploy your codebase, follow the following steps:&lt;br&gt;
The first thing you must do is create a repository on github.com for your project and push your code.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7l6gxq0nzuffpztuh6vn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7l6gxq0nzuffpztuh6vn.png" alt="Github repo create page" width="800" height="746"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, sign up to &lt;a href="https://cyclic.sh" rel="noopener noreferrer"&gt;Cyclic.sh&lt;/a&gt;. The signup process is seamless; all you need to do is sign up with your Github account.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg9no1gx3n3rxpblrvvjr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg9no1gx3n3rxpblrvvjr.png" alt="cyclic.sh signup page" width="800" height="945"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After a successful signup, you will see the dashboard, where all the magic happens. You can find the docs at the top right corner, just before the profile.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxfnh1tni5f6l2q58lcm9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxfnh1tni5f6l2q58lcm9.png" alt="locate docs at the top right corner" width="800" height="621"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;Now you need to deploy your code. Click the "deploy" button (green button) and select the "Link your own" tab.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fshygfa923tsbuioxb0gk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fshygfa923tsbuioxb0gk.png" alt="deploy button" width="800" height="621"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc8kbcfz87vlctfur7cow.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc8kbcfz87vlctfur7cow.png" alt="link your code from Github" width="800" height="453"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;Search for the repo you want to deploy, in this case "nodejs-auth0", select it, and connect it to your Github account.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fznur4kmxjgybngrlyi19.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fznur4kmxjgybngrlyi19.png" alt=" " width="800" height="369"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;At the prompt, confirm your Github access; after confirming, all you have to do is approve and install, and that's it 🎉&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpu5c2blnvkg5hbaznjlg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpu5c2blnvkg5hbaznjlg.png" alt=" " width="800" height="400"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;Once you approve and install, the deployment process will start. 2–3 minutes should do it 🎊🎉🎊🎉🎊🎉&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6mlj6q01uhd7u24g6o6r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6mlj6q01uhd7u24g6o6r.png" alt=" " width="800" height="491"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;The final step is to set your environment variables on the dashboard. The dashboard for your project looks like this.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0a23mtm2nt5unyp1gohk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0a23mtm2nt5unyp1gohk.png" alt=" " width="800" height="336"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;This is a ".env" file; you can also include those configurations on the dashboard by clicking on "Variables." After this configuration, everything should be up and running 🔥&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffpk47xpnf1rd6oflgsuc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffpk47xpnf1rd6oflgsuc.png" alt=" " width="800" height="499"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deploying new changes&lt;/strong&gt;&lt;br&gt;
After all the setup and configuration, deploying new changes is simple: push your code to Github, and Github actions will do the rest 😀😉&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;br&gt;
Deploying a RESTful API shouldn't be hectic, and cyclic.sh has made sure deploying your backend code to the cloud is as simple as possible.&lt;/p&gt;

&lt;p&gt;If you find any difficulty in the deployment process, you can reach out to the &lt;a href="https://cyclic.sh" rel="noopener noreferrer"&gt;cyclic.sh&lt;/a&gt; team on &lt;a href="https://discord.com/invite/huhcqxXCbE" rel="noopener noreferrer"&gt;discord&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Do well to follow me on &lt;a href="https://twitter.com/_iamEtornam" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt; and &lt;a href="https://www.linkedin.com/in/etornam-sunu/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; to connect.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;originally published on &lt;a href="https://etornam-sunu.medium.com" rel="noopener noreferrer"&gt;medium.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>node</category>
      <category>cyclic</category>
      <category>auth0</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How to limit window resize in Flutter Desktop</title>
      <dc:creator>Bright Etornam Sunu</dc:creator>
      <pubDate>Mon, 26 Oct 2020 09:27:34 +0000</pubDate>
      <link>https://dev.to/_iametornam/how-to-limit-window-resize-in-flutter-desktop-40d8</link>
      <guid>https://dev.to/_iametornam/how-to-limit-window-resize-in-flutter-desktop-40d8</guid>
      <description>&lt;p&gt;Hi there!&lt;br&gt;
It has been such a looooong time since I wrote an article or even created a YouTube ☹️ tutorial, but hey, there’s good news 🎤: I’M BACK 🔥🔥🔥.&lt;/p&gt;

&lt;p&gt;In the past few weeks, I have been working on a desktop app using Flutter (you know, one of those side projects you start, then drop, then pick up again…? yeaaaah!), and I had problems with the window resizing.&lt;br&gt;
At some point, you can resize it so far that nothing else displays, which is undesirable. Hence I started doing some research on how to limit the resize of the window.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fos2bkx200u96ju9pktsb.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fos2bkx200u96ju9pktsb.gif" alt="Alt Text" width="600" height="400"&gt;&lt;/a&gt;&lt;br&gt;
After talking to a number of developers and reading some articles, it seemed this effect could “only” be done natively, which requires editing files in the macOS, Windows, and Linux folders. That can be a bit stressful, especially if the developer is not accustomed to those platforms, so I dug deeper, looked at the source code of existing Flutter desktop applications (FVM GUI), and discovered a Google package that does it all for you 🔥🔥🔥 (flutter desktop embedding). Here is a simple example of how to limit the window size of a desktop app:&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;After applying the above code example, it was all fixed!!! Whew!!!!..&lt;br&gt;
And now, I have a functional and beautiful page.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F6pgaa6vijtzl2saf7l5v.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F6pgaa6vijtzl2saf7l5v.gif" alt="Alt Text" width="600" height="400"&gt;&lt;/a&gt;&lt;br&gt;
And that’s all for now. I will share the progress of the desktop application I’m developing in due time ⏰. And if you need any clarifications on this topic, do well to reach out to me on twitter &lt;a href="https://twitter.com/_iamEtornam" rel="noopener noreferrer"&gt;@_iamEtornam&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Akpẽ Kaka (Thank you in EʋE)&lt;/p&gt;

&lt;p&gt;&lt;em&gt;originally posted on &lt;a href="https://etornam-sunu.medium.com/how-to-limit-window-resize-in-flutter-desktop-47d6a495822e" rel="noopener noreferrer"&gt;medium.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>flutter</category>
      <category>desktop</category>
      <category>flutterdesktop</category>
    </item>
    <item>
      <title>UI Challenge 2020 (part 1)</title>
      <dc:creator>Bright Etornam Sunu</dc:creator>
      <pubDate>Thu, 02 Jan 2020 06:50:49 +0000</pubDate>
      <link>https://dev.to/_iametornam/ui-challenge-2020-part-1-371a</link>
      <guid>https://dev.to/_iametornam/ui-challenge-2020-part-1-371a</guid>
      <description>&lt;p&gt;As 2020 starts, i'm trying to sharpen my UI skills in flutter and also digging deep into Animation and Native Dart features. I started the year by implementing an Awesome mockup i saw on dribble.com by Mickael Guillaume. The first Page implemented and second page to go!&lt;br&gt;
source code here: &lt;a href="https://github.com/RegNex/InstaPic" rel="noopener noreferrer"&gt;https://github.com/RegNex/InstaPic&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Don't forget to give it a 🌟&lt;br&gt;
🎊Happy New Year 🎉&lt;/p&gt;

</description>
      <category>ui</category>
      <category>flutter</category>
      <category>design</category>
      <category>dart</category>
    </item>
  </channel>
</rss>
