<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ifeanyi Idiaye</title>
    <description>The latest articles on DEV Community by Ifeanyi Idiaye (@ifeanyi_idiaye_3f6d81ed8a).</description>
    <link>https://dev.to/ifeanyi_idiaye_3f6d81ed8a</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2032490%2Fa5e1dd44-589a-4c2f-8c55-acfb876a7e6c.jpg</url>
      <title>DEV Community: Ifeanyi Idiaye</title>
      <link>https://dev.to/ifeanyi_idiaye_3f6d81ed8a</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ifeanyi_idiaye_3f6d81ed8a"/>
    <language>en</language>
    <item>
      <title>SynthScope: Search, Visualize, Listen to Information</title>
      <dc:creator>Ifeanyi Idiaye</dc:creator>
      <pubDate>Wed, 18 Jun 2025 19:26:33 +0000</pubDate>
      <link>https://dev.to/ifeanyi_idiaye_3f6d81ed8a/synthscope-search-visualize-listen-to-information-2men</link>
      <guid>https://dev.to/ifeanyi_idiaye_3f6d81ed8a/synthscope-search-visualize-listen-to-information-2men</guid>
      <description>&lt;p&gt;In this post, I will introduce you to SynthScope, one of my latest Google Gemini-based projects that enables a user to search the web and return search results as text, image, and audio simultaneously.&lt;/p&gt;

&lt;p&gt;This post will give a high-level overview of the application. It will not discuss code implementation; just how to use the application for your daily information needs. Links to the application and codebase on GitHub are shared in this post.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is SynthScope?&lt;/strong&gt;&lt;br&gt;
SynthScope is an LLM-powered tool that can be used to retrieve information from the web. Web search results powered by Google Search are returned as text and audio, and also converted into an image generation prompt, which is used to imagine the search result. You can also set SynthScope to translate the generated text and audio into any of 15 supported languages besides English, including Tamil, Thai, Japanese, and Arabic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Features of SynthScope&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Text generation:&lt;/strong&gt; Displays the search result as text in the preferred language.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Image generation:&lt;/strong&gt; Renders the search result as an image in any of 11 available styles.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audio generation:&lt;/strong&gt; Speech capability reads out the search result.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Language translation:&lt;/strong&gt; Select the preferred language for the text and audio output.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;How to Use SynthScope&lt;/strong&gt;&lt;br&gt;
Using SynthScope is very easy. Simply type in your search query, then select the image style in which you want SynthScope to imagine the search result, the preferred language from the language dropdown menu, and the preferred voice of the reader from the voice dropdown menu.&lt;/p&gt;

&lt;p&gt;Here is a diagram summary of how to use SynthScope:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F91tiqbgvzdvmmfbauzlo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F91tiqbgvzdvmmfbauzlo.png" alt="SynthScope user flowchart" width="800" height="529"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With SynthScope, you can search for current information on the web and have it read out to you in your preferred language instead of scrolling to read text. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Technologies Power SynthScope?&lt;/strong&gt;&lt;br&gt;
Here are the technologies that were used to build SynthScope:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python for writing the application logic.&lt;/li&gt;
&lt;li&gt;Google &lt;a href="https://ai.google.dev/gemini-api/docs" rel="noopener noreferrer"&gt;Gemini&lt;/a&gt; family of models for text generation, image generation, and text-to-speech (TTS).&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.gradio.app/" rel="noopener noreferrer"&gt;Gradio&lt;/a&gt; for frontend development.&lt;/li&gt;
&lt;li&gt;CSS for styling the frontend of the Gradio application.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://huggingface.co/" rel="noopener noreferrer"&gt;Hugging Face&lt;/a&gt; for deploying the application.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;How to Access SynthScope&lt;/strong&gt;&lt;br&gt;
SynthScope is currently deployed on Hugging Face as a space. You can access it &lt;a href="https://huggingface.co/spaces/Ifeanyi/SynthScope" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Also, SynthScope is an open-source project, and that means that you can take a look at the code behind the application and even make contributions. You can access the code on &lt;a href="https://github.com/Ifeanyi55/SynthScope" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I would appreciate your support for the project with a like on Hugging Face and a star on GitHub, if possible. :)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Limitation of Using SynthScope&lt;/strong&gt;&lt;br&gt;
The principal limitation of using SynthScope is that it is subject to the rate limits of the free tier of the Google Gemini API. Here are the daily limits:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Text generation:&lt;/strong&gt; Limited to 1500 requests per day&lt;br&gt;
&lt;strong&gt;Image generation:&lt;/strong&gt; Limited to 100 requests per day&lt;br&gt;
&lt;strong&gt;Audio generation:&lt;/strong&gt; Limited to 15 requests per day&lt;/p&gt;

&lt;p&gt;Therefore, if you try to use SynthScope after the daily quota for any of the above functionalities has been exhausted, that functionality will be unavailable until the quota resets.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;br&gt;
SynthScope is a creative way to search the internet for information. It is designed to be user-friendly and language dynamic, enabling users to read, visualize, and listen to information.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gemini</category>
      <category>gradio</category>
      <category>huggingface</category>
    </item>
    <item>
      <title>Build A Real-Time Voice Assistant with Mistral AI and FastRTC</title>
      <dc:creator>Ifeanyi Idiaye</dc:creator>
      <pubDate>Sun, 09 Mar 2025 18:29:09 +0000</pubDate>
      <link>https://dev.to/ifeanyi_idiaye_3f6d81ed8a/build-a-real-time-voice-assistant-with-mistral-ai-and-fastrtc-p9b</link>
      <guid>https://dev.to/ifeanyi_idiaye_3f6d81ed8a/build-a-real-time-voice-assistant-with-mistral-ai-and-fastrtc-p9b</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F14vkzvxr0qtolf57bqm5.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F14vkzvxr0qtolf57bqm5.jpeg" alt="A robot assistant sitting at a desk" width="800" height="1028"&gt;&lt;/a&gt;&lt;br&gt;
In this post, I will show you how to build a real-time voice assistant with Mistral AI and FastRTC. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://mistral.ai/en" rel="noopener noreferrer"&gt;Mistral AI&lt;/a&gt; is one of the leading LLM providers out there, and they have made their LLM API easily accessible to developers. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://fastrtc.org/" rel="noopener noreferrer"&gt;FastRTC&lt;/a&gt;, on the other hand, is a real-time communication library for Python that enables you to quickly turn any Python function into real-time audio and video stream over WebRTC or WebSockets.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Building A Real-time Voice Assistant&lt;/strong&gt;&lt;br&gt;
First, let's install the required libraries by running the code below in your terminal&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install mistalai fastrtc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, set up your API key as an environment variable. Create a &lt;code&gt;.env&lt;/code&gt; file in your project and save your Mistral API key there&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MISTRAL_API_KEY = "&amp;lt;your-api-key&amp;gt;"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Import the libraries&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from mistralai import Mistral
from fastrtc import (ReplyOnPause, Stream, get_stt_model, get_tts_model)
from dotenv import load_dotenv
import os

load_dotenv()

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To get your Mistral API key, you will need to create an account on their &lt;a href="https://console.mistral.ai/home" rel="noopener noreferrer"&gt;website&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In the above, we imported Mistral and the specific methods we need from FastRTC, namely &lt;code&gt;ReplyOnPause()&lt;/code&gt;, &lt;code&gt;Stream()&lt;/code&gt;, &lt;code&gt;get_stt_model()&lt;/code&gt;, and &lt;code&gt;get_tts_model()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ReplyOnPause()&lt;/code&gt;: This method wraps a Python audio function. It monitors the incoming audio and, when it detects a pause, takes that as its cue to reply.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Stream()&lt;/code&gt;: This method streams the audio reply.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;get_stt_model()&lt;/code&gt;: This is used to access the speech-to-text model that is used to convert audio to text.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;get_tts_model()&lt;/code&gt;: This is used to access the text-to-speech model that is used to convert text back into audio.&lt;/p&gt;

&lt;p&gt;Now, let's activate the Mistral client with our API key stored in the &lt;code&gt;.env&lt;/code&gt; file&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;api_key = os.environ["MISTRAL_API_KEY"]
model = "mistral-large-latest"

client = Mistral(api_key=api_key)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, we are using the Mistral Large model; however, you can try out other Mistral models too.&lt;/p&gt;

&lt;p&gt;In fact, you can plug any LLM into FastRTC and get real-time voice responses.&lt;/p&gt;

&lt;p&gt;We will now build the audio function that will take a prompt and return a response&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;stt_model = get_stt_model()
tts_model = get_tts_model()


def echo(audio):
    prompt = stt_model.stt(audio)
    chat_response = client.chat.complete(
    model = model,
    messages = [
        {
            "role": "user",
            "content": f"{prompt}"
        },
      ]
    )

    for audio_chunk in tts_model.stream_tts_sync(chat_response.choices[0].message.content):
        yield audio_chunk

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Above, we wrote a function called &lt;code&gt;echo&lt;/code&gt; that takes an audio input and passes it to the speech-to-text method, which converts it into a user prompt for the LLM. The LLM's response is then passed to the text-to-speech method and streamed synchronously.&lt;/p&gt;

&lt;p&gt;Finally, we will run the application&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;stream = Stream(ReplyOnPause(echo), modality="audio", mode="send-receive")
stream.ui.launch() 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will launch the UI below at the URL: &lt;code&gt;http://127.0.0.1:7860/&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe24qe9c2353mlxc8pjne.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe24qe9c2353mlxc8pjne.png" alt="FastRTC UI" width="800" height="383"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, you can activate the microphone and say something to your assistant, which will reply immediately.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Change Voice&lt;/strong&gt;&lt;br&gt;
If you do not like the default voice, you can change that by passing an instance of &lt;code&gt;KokoroTTSOptions()&lt;/code&gt; to the text-to-speech method. &lt;/p&gt;

&lt;p&gt;First import &lt;code&gt;KokoroTTSOptions()&lt;/code&gt; from FastRTC by adding it to the import tuple&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from fastrtc import (ReplyOnPause, Stream, get_stt_model, get_tts_model, KokoroTTSOptions)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, define the options&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tts_model = get_tts_model(model="kokoro")

options = KokoroTTSOptions(
    voice="af_bella",
    speed=1.0,
    lang="en-us"
)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then pass the options to the text-to-speech method in your audio function&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for audio_chunk in tts_model.stream_tts_sync(chat_response.choices[0].message.content, options = options)
        yield audio_chunk

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For more voice options, you can check out &lt;a href="https://huggingface.co/hexgrad/Kokoro-82M/blob/main/VOICES.md" rel="noopener noreferrer"&gt;KokoroTTS documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complete Project Code&lt;/strong&gt; &lt;br&gt;
Here is the complete code that we have used to create the real-time voice assistant&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os
from fastrtc import (ReplyOnPause, Stream, get_stt_model, get_tts_model, KokoroTTSOptions)
from dotenv import load_dotenv
from mistralai import Mistral

load_dotenv()

api_key = os.environ["MISTRAL_API_KEY"]
model = "mistral-large-latest"

client = Mistral(api_key=api_key)

options = KokoroTTSOptions(
    voice="af_bella",
    speed=1.0,
    lang="en-us"
)

stt_model = get_stt_model()
tts_model = get_tts_model(model="kokoro")

def echo(audio):
    prompt = stt_model.stt(audio)
    chat_response = client.chat.complete(
    model = model,
    messages = [

        {
            "role": "user",
            "content": f"{prompt}"
        },
      ]
    )

    for audio_chunk in tts_model.stream_tts_sync(chat_response.choices[0].message.content, options=options):
        yield audio_chunk

stream = Stream(ReplyOnPause(echo), modality="audio", mode="send-receive")
stream.ui.launch()

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, instead of typing a prompt, you can give voice commands to an LLM and have it speak its response, just like a natural human conversation.&lt;/p&gt;

&lt;p&gt;I hope you found this post useful. If you did, please share it with others who might benefit from it too. Thanks for reading!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Run DeepSeek R-1 Thinking Model in Kaggle Notebook Using Ollama</title>
      <dc:creator>Ifeanyi Idiaye</dc:creator>
      <pubDate>Tue, 04 Feb 2025 16:41:15 +0000</pubDate>
      <link>https://dev.to/ifeanyi_idiaye_3f6d81ed8a/run-deepseek-r-1-thinking-model-in-kaggle-notebook-using-ollama-5195</link>
      <guid>https://dev.to/ifeanyi_idiaye_3f6d81ed8a/run-deepseek-r-1-thinking-model-in-kaggle-notebook-using-ollama-5195</guid>
      <description>&lt;p&gt;With DeepSeek being the rave of the moment right now in the world of AI, it is no wonder that every developer wants to explore the power and capabilities of their models. However, a limiting factor remains the fact that powerful large language models like DeepSeek's models require heavy compute power (GPUs) in order to run them locally.&lt;/p&gt;

&lt;p&gt;Thankfully, cloud providers like &lt;a href="https://www.kaggle.com/" rel="noopener noreferrer"&gt;Kaggle&lt;/a&gt; offer developers free GPUs, with a limited weekly runtime, for running or training compute-intensive open-source models. &lt;/p&gt;

&lt;p&gt;Now let's see the simple steps to run the DeepSeek R-1 reasoning model in a Kaggle notebook with the aid of &lt;a href="https://ollama.com/" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;STEP 1:&lt;/strong&gt; Create a new Kaggle notebook&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;STEP 2:&lt;/strong&gt; Go to &lt;strong&gt;Settings&lt;/strong&gt; &amp;gt; &lt;strong&gt;Accelerator&lt;/strong&gt; and select &lt;strong&gt;GPU T4 x2&lt;/strong&gt;. This machine is powerful enough for the task.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;STEP 3:&lt;/strong&gt; Install Ollama in your notebook&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;!curl -fsSL https://ollama.com/install.sh | sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This downloads and executes the installation script for Ollama.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;STEP 4:&lt;/strong&gt; Install Ollama Python SDK&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;!pip install ollama
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;STEP 5:&lt;/strong&gt; Start the Ollama server as a background process&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import subprocess
process = subprocess.Popen("ollama serve", shell=True)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
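
&lt;p&gt;The server may take a moment to come up before it can handle the pull in the next step. A small helper like the one below (my own addition, not part of the original notebook) polls Ollama's default port, 11434, until the server answers:&lt;/p&gt;

```python
import time
import urllib.request

def wait_for_ollama(attempts=30):
    """Poll the local Ollama server (default port 11434) once per second until it responds."""
    for _ in range(attempts):
        try:
            urllib.request.urlopen("http://127.0.0.1:11434", timeout=2)
            return True  # server answered
        except OSError:
            time.sleep(1)  # server not up yet; try again shortly
    return False
```

&lt;p&gt;Call &lt;code&gt;wait_for_ollama()&lt;/code&gt; right after the &lt;code&gt;subprocess.Popen&lt;/code&gt; line, before moving on to STEP 6.&lt;/p&gt;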



&lt;p&gt;&lt;strong&gt;STEP 6:&lt;/strong&gt; Pull the DeepSeek R-1 model from the Ollama model hub&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;!ollama run deepseek-r1:1.5b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will download the 1.5-billion-parameter DeepSeek R-1 model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;STEP 7:&lt;/strong&gt; Chat with the model&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import ollama

result = ollama.chat(
    model='deepseek-r1:1.5b',
    messages=[{
        'role': 'user',
        'content': 'What are the steps involved in baking a chocolate cake?'
    }],
)

print(result.message.content)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
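
&lt;p&gt;The &lt;code&gt;ollama.chat()&lt;/code&gt; call above returns the full reply at once. The SDK also supports streaming the reply as it is generated; here is a minimal sketch (the prompt is only an example, and the generator assumes the server from STEP 5 is still running when it is consumed):&lt;/p&gt;

```python
def stream_chat(prompt, model='deepseek-r1:1.5b'):
    """Yield the model's reply chunk by chunk as it is generated."""
    import ollama  # imported lazily so the function can be defined without a running server

    response = ollama.chat(
        model=model,
        messages=[{'role': 'user', 'content': prompt}],
        stream=True,  # ask the SDK for an iterator of partial responses
    )
    for chunk in response:
        yield chunk['message']['content']
```

&lt;p&gt;You can then print the reply incrementally: &lt;code&gt;for piece in stream_chat("Hello"): print(piece, end="")&lt;/code&gt;.&lt;/p&gt;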



&lt;p&gt;By following the above steps, you can easily run open-source models in a Kaggle notebook, including DeepSeek models, using Ollama.&lt;/p&gt;

&lt;p&gt;Here is an existing notebook that implements all the above-mentioned steps: &lt;a href="https://github.com/Ifeanyi55/OllamaRun/blob/main/ollamarun.ipynb" rel="noopener noreferrer"&gt;https://github.com/Ifeanyi55/OllamaRun/blob/main/ollamarun.ipynb&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ollama</category>
      <category>opensource</category>
      <category>deepseek</category>
      <category>llm</category>
    </item>
    <item>
      <title>Transcription &amp; Translation App Powered by Assembly AI &amp; Google Gemini</title>
      <dc:creator>Ifeanyi Idiaye</dc:creator>
      <pubDate>Tue, 19 Nov 2024 19:51:54 +0000</pubDate>
      <link>https://dev.to/ifeanyi_idiaye_3f6d81ed8a/transcription-translation-app-powered-by-assembly-ai-google-gemini-igg</link>
      <guid>https://dev.to/ifeanyi_idiaye_3f6d81ed8a/transcription-translation-app-powered-by-assembly-ai-google-gemini-igg</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/assemblyai"&gt;AssemblyAI Challenge &lt;/a&gt;: Sophisticated Speech-to-Text.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;What I Built&lt;/h2&gt;

&lt;p&gt;I built a web application that captures live audio via a web microphone, transcribes the recording, and then translates the transcript into any of 15 languages.&lt;/p&gt;

&lt;h2&gt;Demo&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://transcribe-and-translate.netlify.app/" rel="noopener noreferrer"&gt;https://transcribe-and-translate.netlify.app/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvtoug9fip7m1jbtnfy88.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvtoug9fip7m1jbtnfy88.png" alt="AudioTranscriber" width="800" height="341"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Journey&lt;/h2&gt;

&lt;p&gt;I used the API of AssemblyAI's Universal-2 speech-to-text model to transcribe the audio recording. I got the API key from my AssemblyAI account dashboard. I built an audio transcriber function that takes an audio file and passes it to AssemblyAI's transcriber (&lt;code&gt;aai.Transcriber()&lt;/code&gt;), which turns the speech into text.&lt;/p&gt;
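
&lt;p&gt;Roughly, the transcription step looks like the sketch below. It assumes the &lt;code&gt;assemblyai&lt;/code&gt; Python SDK and an &lt;code&gt;ASSEMBLYAI_API_KEY&lt;/code&gt; environment variable; the function and file names are illustrative, not the app's actual code:&lt;/p&gt;

```python
import os

def transcribe(audio_path):
    """Return the text transcript of an audio file using AssemblyAI's Universal-2 model."""
    import assemblyai as aai  # imported lazily; requires the assemblyai package
    aai.settings.api_key = os.environ["ASSEMBLYAI_API_KEY"]  # key from the account dashboard
    transcriber = aai.Transcriber()
    transcript = transcriber.transcribe(audio_path)  # accepts a local path or a URL
    return transcript.text
```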

&lt;p&gt;Along with the audio transcription, I also implemented a translation feature using Google's Gemini 1.5 Pro 002 model. This feature leverages the multimodal capability of Google Gemini models to translate the audio transcript into any of 15 languages, including Spanish, Hindi, Yoruba, and Dutch.&lt;/p&gt;
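
&lt;p&gt;The translation step can be sketched along these lines, assuming the &lt;code&gt;google-generativeai&lt;/code&gt; SDK and a &lt;code&gt;GEMINI_API_KEY&lt;/code&gt; environment variable (the helper and its prompt wording are illustrative, not the app's actual code):&lt;/p&gt;

```python
import os

def build_translation_prompt(transcript, language):
    """Compose the instruction sent to the model (illustrative wording)."""
    return f"Translate the following transcript into {language}:\n\n{transcript}"

def translate(transcript, language):
    """Translate a transcript with Gemini 1.5 Pro 002."""
    import google.generativeai as genai  # imported lazily; requires google-generativeai
    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
    model = genai.GenerativeModel("gemini-1.5-pro-002")
    return model.generate_content(build_translation_prompt(transcript, language)).text
```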

&lt;p&gt;You can find all the code on GitHub: &lt;a href="https://github.com/Ifeanyi55/Transcribe-and-Translate" rel="noopener noreferrer"&gt;https://github.com/Ifeanyi55/Transcribe-and-Translate&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>assemblyaichallenge</category>
      <category>ai</category>
      <category>api</category>
    </item>
  </channel>
</rss>
