<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Deepgram</title>
    <description>The latest articles on DEV Community by Deepgram (@deepgram).</description>
    <link>https://dev.to/deepgram</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F4019%2Fb921b72d-3c60-425e-a223-189a5fce8839.jpg</url>
      <title>DEV Community: Deepgram</title>
      <link>https://dev.to/deepgram</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/deepgram"/>
    <language>en</language>
    <item>
      <title>Introducing Deepgram Starter Apps</title>
      <dc:creator>@lukeocodes 🕹👨‍💻</dc:creator>
      <pubDate>Tue, 18 Apr 2023 11:15:06 +0000</pubDate>
      <link>https://dev.to/deepgram/introducing-deepgram-starter-apps-33m5</link>
      <guid>https://dev.to/deepgram/introducing-deepgram-starter-apps-33m5</guid>
      <description>&lt;p&gt;Deepgram aims to make world-class language AI available to any developer through just an API call. With today’s launch of Deepgram Starter Apps, it’s even easier to start building applications which take real-time and prerecorded audio data and transform them into transcripts enriched with natural language understanding metadata.&lt;/p&gt;

&lt;p&gt;Put simply, we help developers build quickly because we’ve already taken care of integrating Deepgram into Starter Apps. Whether you've been coding for decades or just passed CS50, Deepgram's Starter Apps provide all users with a seamless and efficient onboarding experience.&lt;/p&gt;

&lt;h3&gt;JavaScript Starter&lt;/h3&gt;

&lt;p&gt;The JavaScript Starter is the first release in the collection. It provides developers with a pre-built Deepgram integration on Node.js, with a React frontend, so they can quickly get started with the Deepgram platform. With the JavaScript Starter, all developers can rapidly explore what the Deepgram platform can do.&lt;/p&gt;

&lt;p&gt;Check out our &lt;a href="https://github.com/deepgram-starters/deepgram-javascript-starters" rel="noopener noreferrer"&gt;JavaScript Starter Apps on Github&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;Python Starter&lt;/h3&gt;

&lt;p&gt;The Python Starter is the follow-up in the collection and provides developers with a pre-built Flask integration with Deepgram. Presently, a static frontend is included to interact with the Flask server.&lt;/p&gt;

&lt;p&gt;Check out our &lt;a href="https://github.com/deepgram-starters/deepgram-python-starters" rel="noopener noreferrer"&gt;Python Starter Apps on Github&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;New transcription models included&lt;/h2&gt;

&lt;p&gt;Deepgram introduced two new models this week. Both are available to try out immediately in our new Starter applications.&lt;/p&gt;

&lt;h3&gt;Deepgram Nova&lt;/h3&gt;

&lt;p&gt;On April 13th, 2023, we announced Deepgram Nova, a cutting-edge Automatic Speech Recognition (ASR) system. Deepgram Nova achieves unprecedented performance, beating competitors in speed, accuracy, and efficiency.&lt;/p&gt;

&lt;h3&gt;Whisper Cloud&lt;/h3&gt;

&lt;p&gt;Alongside Deepgram Nova’s release, we announced Deepgram Whisper Cloud. OpenAI's Whisper API, released last month, has proven popular despite its limitations. We've developed our own fully managed Whisper API to address those limitations.&lt;/p&gt;

&lt;p&gt;Read more about our &lt;a href="https://blog.deepgram.com/nova-speech-to-text-whisper-api/" rel="noopener noreferrer"&gt;Deepgram Nova and Deepgram Whisper Cloud release here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;Read more&lt;/h2&gt;

&lt;p&gt;Read more about our new open-source projects in our latest post, &lt;a href="https://blog.deepgram.com/introducing-the-deepgram-starter-apps/" rel="noopener noreferrer"&gt;Introducing Deepgram Starter Apps&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>news</category>
      <category>ai</category>
      <category>python</category>
      <category>deepgram</category>
    </item>
    <item>
      <title>How to Add Speech AI Into Your Next.js App</title>
      <dc:creator>BekahHW</dc:creator>
      <pubDate>Thu, 16 Feb 2023 18:15:00 +0000</pubDate>
      <link>https://dev.to/deepgram/how-to-add-speech-ai-into-your-nextjs-app-491b</link>
      <guid>https://dev.to/deepgram/how-to-add-speech-ai-into-your-nextjs-app-491b</guid>
      <description>&lt;p&gt;I am a podcast addict. I admit it. From the moment I wake up, I have my headphones in my ears. But there are some times when it would be frowned upon to be listening to a podcast while in a room full of others. In some of those cases, it would not be frowned upon to look at my phone. So as a compromise, I could grab the transcript of my favorite podcast and read it while in those situations. I know I’m not the only one who’s done that. From the desire to get away with podcast listening, I put together a basic Next.js and Deepgram web app that allows the user to submit a link to audio, which is then transcribed using Deepgram, and output in the browser. If you too want to get away with reading your podcast, check out the tutorial below. &lt;/p&gt;

&lt;h2&gt;Getting Started with Repl.it, Next, and Deepgram&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Understanding of JavaScript and React&lt;/li&gt;
&lt;li&gt;Familiarity with &lt;a href="https://reactjs.org/docs/hooks-intro.html" rel="noopener noreferrer"&gt;hooks&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Understanding of HTML and CSS&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For this project, I worked with Repl.it, an instant IDE that runs in the browser. You can find the final project on my &lt;a href="https://replit.com/@BekahHW/Next-Deepgram" rel="noopener noreferrer"&gt;Next + DG Repl&lt;/a&gt;, and this tutorial will explain how to use Repl to create your project.&lt;/p&gt;

&lt;p&gt;To complete this project, you’ll need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;a href="https://replit.com/" rel="noopener noreferrer"&gt;Repl.it account&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;A Deepgram API Key - &lt;a href="https://console.deepgram.com/signup?jump=keys" rel="noopener noreferrer"&gt;get it here&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Getting Started with Next.js&lt;/h2&gt;

&lt;p&gt;Create a new Repl using the Next.js template.  This will give you the basic Next file structure and provide access to the Next.js built-in features. If you’d like to learn more about Next.js, check out the &lt;a href="https://nextjs.org/learn/foundations/about-nextjs" rel="noopener noreferrer"&gt;Foundations section&lt;/a&gt; of their site. &lt;/p&gt;

&lt;p&gt;For this project, we’ll be working in the &lt;code&gt;index.tsx&lt;/code&gt; file in the &lt;code&gt;pages&lt;/code&gt; folder and within the &lt;code&gt;api&lt;/code&gt; folder. &lt;/p&gt;

&lt;p&gt;We’re going to keep it simple and add our new code to the existing template. You’ll notice that I’ve made some updates in my Repl to link to Deepgram’s &lt;a href="https://developers.deepgram.com/documentation/" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;, &lt;a href="https://blog.deepgram.com/" rel="noopener noreferrer"&gt;blog&lt;/a&gt;, and &lt;a href="https://github.com/orgs/deepgram/discussions" rel="noopener noreferrer"&gt;community&lt;/a&gt;, but our code to transcribe the audio will go above that. &lt;br&gt;
Let’s get started. In the &lt;code&gt;index.tsx&lt;/code&gt; file, we’ll need to create an input for the audio file link, add some &lt;code&gt;useState&lt;/code&gt; hooks to handle the submission, transcription, and formatting, and add a transcribe function. In the &lt;code&gt;api&lt;/code&gt; folder, we’ll add the server-side code to handle the request to Deepgram. Lastly, we’ll use Repl’s Secrets feature to store our Deepgram API key.&lt;/p&gt;

&lt;p&gt;The finished app is a React application that uses the Deepgram API to transcribe audio files. The user submits a link to an audio file in a form, and the app sends a request to the API to transcribe it. The transcript is then displayed on the page with a &lt;code&gt;useEffect&lt;/code&gt; hook that splits the transcript into individual lines and maps over them, rendering them in separate paragraphs.&lt;/p&gt;

&lt;p&gt;Let’s get started with the front-end code in the &lt;code&gt;index.tsx&lt;/code&gt; file. At the top of the file, import the required dependencies that aren’t included: React's &lt;code&gt;useState&lt;/code&gt; and &lt;code&gt;useEffect&lt;/code&gt;  hooks.&lt;br&gt;
Before the resources, add a form input for the audio file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;form&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;label&lt;/span&gt; &lt;span class="nx"&gt;htmlFor&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;audio-file&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="nx"&gt;Link&lt;/span&gt; &lt;span class="nx"&gt;to&lt;/span&gt; &lt;span class="nx"&gt;Audio&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/label&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;          &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt; &lt;span class="nx"&gt;onChange&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;setFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt; &lt;span class="nx"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;audio-file&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;audio-file&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="nx"&gt;required&lt;/span&gt; &lt;span class="o"&gt;/&amp;gt;&lt;/span&gt;
          &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;button&lt;/span&gt; &lt;span class="nx"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;button&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="nx"&gt;onClick&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;transcribe&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="nx"&gt;className&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;styles&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;button&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="nx"&gt;Transcribe&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/button&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;        &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/form&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To get this to work, we need to add a hook for &lt;code&gt;setFile&lt;/code&gt;. On line 8, add &lt;code&gt;const [file, setFile] = useState(' ');&lt;/code&gt;&lt;br&gt;
This will allow our application to keep track of the URL of the audio file. Now we need to use that audio file and transcribe it. Grab your API key and add it to your Repl Secrets. &lt;/p&gt;

&lt;p&gt;We’re going to go back and forth between files here. Let’s start by creating a new file in our &lt;code&gt;api&lt;/code&gt; folder called &lt;code&gt;transcribe.tsx&lt;/code&gt;. This route will handle our server-side logic and use our API key. According to the &lt;a href="https://nextjs.org/docs/api-routes/introduction" rel="noopener noreferrer"&gt;Next.js documentation on API routes&lt;/a&gt;, “Any file inside the folder pages/api is mapped to /api/* and will be treated as an API endpoint instead of a page. They are server-side only bundles and won't increase your client-side bundle size.”&lt;/p&gt;

&lt;p&gt;We’re going to create an async function to handle the incoming request and the response sent back to the client.&lt;br&gt;
We’ll start by reading the Deepgram API key from the environment variables.&lt;/p&gt;

&lt;p&gt;Next, we’ll use destructuring to extract the url from the parsed body of the incoming request, which we find in req.body. Then we’ll use fetch to make a POST request to the Deepgram API endpoint. We await the response from the API, convert it to JSON, and send it back to the client as the response.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Get data submitted in request's body.&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;mySecret&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;DG_API_KEY&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

 &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://api.deepgram.com/v1/listen?tier=enhanced&amp;amp;punctuate=true&amp;amp;paragraphs=true&amp;amp;diarize=true&amp;amp;keywords=Bekah:2&amp;amp;keywords=Hacktoberfest:2&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Authorization&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Token &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;mySecret&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
&lt;span class="nx"&gt;url&lt;/span&gt;        &lt;span class="p"&gt;})&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;json&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; 
  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice in the API call, we include &lt;code&gt;&amp;amp;punctuate=true&amp;amp;paragraphs=true&amp;amp;diarize=true&lt;/code&gt;. Because our example podcast has more than one speaker and we want the transcript to be readable, we add these parameters. Diarization breaks the transcript down by speaker. Now that the server-side code is set up, let’s connect it to our &lt;code&gt;index.tsx&lt;/code&gt; file.&lt;/p&gt;
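&lt;p&gt;If you’d rather not hand-write the query string, the same parameters can be assembled with &lt;code&gt;URLSearchParams&lt;/code&gt;. This is just a sketch: the parameter names mirror the call above, and the keyword boosts are omitted.&lt;/p&gt;

```javascript
// Sketch: assemble the Deepgram /v1/listen query string programmatically.
// The parameter names mirror the fetch call above; add keywords as needed.
const params = new URLSearchParams({
  tier: 'enhanced',
  punctuate: 'true',
  paragraphs: 'true',
  diarize: 'true',
});
const endpoint = `https://api.deepgram.com/v1/listen?${params.toString()}`;
console.log(endpoint);
// https://api.deepgram.com/v1/listen?tier=enhanced&punctuate=true&paragraphs=true&diarize=true
```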

&lt;p&gt;Below the &lt;code&gt;setFile&lt;/code&gt; hook, let’s create a &lt;code&gt;transcribe&lt;/code&gt; function. We need to connect to our API route, which is &lt;code&gt;api/transcribe&lt;/code&gt;, send the &lt;code&gt;file&lt;/code&gt;, get the response back with our transcription, and store that transcript with a new hook that we’ll call &lt;code&gt;setTranscription&lt;/code&gt; so we can render it on the page. Here’s what that will look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;transcribe&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/api/transcribe&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
          &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;url&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;received&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
            &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;received&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;transcription&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;channels&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;alternatives&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;paragraphs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;transcript&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="nf"&gt;setTranscription&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;transcription&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we need to handle how the text is displayed on the page. To decrease confusion, we should display a new line every time the speaker changes. To do that, we’re going to create a new &lt;code&gt;useState&lt;/code&gt; hook called &lt;code&gt;lines&lt;/code&gt; and implement some logic to break the speakers into different lines. We’ll conditionally render a div if there’s a transcript. Here’s the code to handle this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;setLines&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useState&lt;/span&gt;&lt;span class="p"&gt;([]);&lt;/span&gt;

 &lt;span class="nf"&gt;useEffect&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;setLines&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;transcription&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;transcription&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
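&lt;p&gt;To see what the splitting step does in plain JavaScript, here’s a quick sketch. The sample transcript string is made up for illustration; the real one comes back from Deepgram.&lt;/p&gt;

```javascript
// Sketch: splitting a transcript on periods, as the useEffect above does.
// The sample transcript below is invented for illustration only.
const transcription = "Speaker 0: Welcome to the show. Speaker 1: Thanks for having me.";
const lines = transcription.split(".");
console.log(lines);
// [ 'Speaker 0: Welcome to the show', ' Speaker 1: Thanks for having me', '' ]
```

Note the trailing empty string: splitting on a final period leaves one empty entry at the end of the array.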



&lt;p&gt;Here’s the JSX:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;transcription&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;  &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;div&lt;/span&gt; &lt;span class="nx"&gt;className&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;styles&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;transcript&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;new-transcription&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;line&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Speaker 0:&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt; &lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;line&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/p&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt; &lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;line&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/p&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;})}&lt;/span&gt;
    &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/div&amp;gt;&lt;/span&gt;&lt;span class="err"&gt;}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you want to try it out, you can use this &lt;a href="https://www.buzzsprout.com/1976304/11410388-whisper-by-openai-everything-you-need-to-know-about-the-latest-in-open-source-speech-recognition.mp3" rel="noopener noreferrer"&gt;sample audio&lt;/a&gt;. At this point, our Next.js + Deepgram project should allow you to turn audio files into transcripts. Happy listening!&lt;/p&gt;

</description>
      <category>gratitude</category>
      <category>career</category>
      <category>productivity</category>
    </item>
    <item>
      <title>ChatGPT vs. Bard: What Can We Expect?</title>
      <dc:creator>Jose Francisco</dc:creator>
      <pubDate>Thu, 16 Feb 2023 04:23:35 +0000</pubDate>
      <link>https://dev.to/deepgram/chatgpt-vs-bard-what-can-we-expect-9cm</link>
      <guid>https://dev.to/deepgram/chatgpt-vs-bard-what-can-we-expect-9cm</guid>
      <description>&lt;p&gt;Big tech just dropped some big news: Google is soon releasing its own ChatGPT-like large language model (LLM) called &lt;a href="https://blog.google/technology/ai/bard-google-ai-search-updates/" rel="noopener noreferrer"&gt;Bard&lt;/a&gt;. And now &lt;a href="https://www.reuters.com/technology/bard-vs-chatgpt-what-do-we-know-about-googles-ai-chatbot-2023-02-07/" rel="noopener noreferrer"&gt;many are beginning to speculate&lt;/a&gt; about what exactly this AI battle is going to look like. Will the newcomer Bard dominate our reigning champion, ChatGPT? Or will the incumbent conversational AI model defend its crown? It's a real Rocky vs. Creed story. But honestly it's hard to tell which AI is the main character. To each, the other is a worthy opponent.&lt;/p&gt;

&lt;p&gt;But here's what we can expect. Here, we'll take a look at how these models were trained (pun intended) and then examine their build (again, pun intended). By the end, we might be able to determine which model is the underdog and which is the public's favorite. 🥊&lt;/p&gt;

&lt;h2&gt;Their training...&lt;/h2&gt;

&lt;p&gt;Just like boxers, AI models have to train before they can show their skills to the public. Both ChatGPT and Bard have unique training styles. Specifically, ChatGPT runs on a GPT-3.5 model while Bard runs on LaMDA.&lt;/p&gt;

&lt;p&gt;But what does that mean?&lt;/p&gt;

&lt;p&gt;Well, we can think of GPT-3.5 as ChatGPT's "brain" while LaMDA is Bard's. The main commonality between them is the fact that they are both built on Transformers. Now, transformers are quite a hefty topic, so if you'd like to do a deep-dive, read our definitive guide on transformers &lt;a href="https://blog.deepgram.com/visualizing-and-explaining-transformer-models-from-the-ground-up/" rel="noopener noreferrer"&gt;here&lt;/a&gt;. But for our purposes here, the only thing you have to know about transformers is that they allow these language models to "read" human writing, to pay &lt;a href="https://arxiv.org/abs/1706.03762" rel="noopener noreferrer"&gt;attention&lt;/a&gt; to how the words in such writing relate to one another, and to predict what words will come next based on the words that came ___.&lt;/p&gt;

&lt;p&gt;(If you can fill in the blank above, you think like a Transformer-based AI language model. Or, more properly, the model thinks like you 😉)&lt;/p&gt;

&lt;p&gt;Okay, so we know that both GPT-3.5 and LaMDA are neural net "brains" that can read. But as far as we know, that's where the commonalities end. Now come the differences. Mainly, they differ in what they read.&lt;/p&gt;

&lt;p&gt;OpenAI has been relatively secretive about what dataset GPT-3.5 was trained on. But we do know that GPT-2 and GPT-3 were both trained at least in part on &lt;a href="https://pile.eleuther.ai/" rel="noopener noreferrer"&gt;The Pile&lt;/a&gt;---a dataset that contains multiple, complete fiction and non-fiction books, texts from Github, all of Wikipedia, StackExchange, PubMed, and much more. As a result, we can assume that GPT-3.5 has at least a little bit of The Pile within its metaphorical gears as well. And &lt;a href="https://arxiv.org/pdf/2101.00027.pdf" rel="noopener noreferrer"&gt;this dataset is &lt;/a&gt;&lt;em&gt;massive&lt;/em&gt;, weighing in at a little over 825 gigabytes of raw text.&lt;/p&gt;

&lt;p&gt;Check out the contents of The Pile below (&lt;a href="https://arxiv.org/pdf/2101.00027.pdf" rel="noopener noreferrer"&gt;source&lt;/a&gt;):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fanb6mek7p5f59vpg4igl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fanb6mek7p5f59vpg4igl.png" alt=" " width="800" height="537"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But here's the rub: &lt;em&gt;Conversational language is not the same as written language&lt;/em&gt;. An author may write rhapsodically but come off as curt in one-on-one conversation. Likewise, the way that &lt;em&gt;I&lt;/em&gt; am writing in this article is different from the way I'd speak if you were in the same room as me. As a result, OpenAI couldn't just release GPT-3.5 under the alias "ChatGPT" and call it a day. Rather, &lt;a href="https://openai.com/blog/openai-baselines-ppo/" rel="noopener noreferrer"&gt;OpenAI needed to&lt;/a&gt; &lt;em&gt;fine-tune&lt;/em&gt; GPT-3.5 on conversational texts to create ChatGPT. &lt;/p&gt;

&lt;p&gt;That is, OpenAI literally had humans roleplay with themselves---acting both as an AI assistant and its user through a process known as reinforcement learning from human feedback (RLHF). Then, after enough of these conversations were constructed, they were fed to GPT-3.5. And after enough exposure to conversational dialogue, ChatGPT emerged. You can read about this fine-tuning process in more detail under the "Methods" section of &lt;a href="https://openai.com/blog/chatgpt/#:~:text=We%20trained%20an%20initial%20model%20using%20supervised%20fine%2Dtuning%3A%20human%20AI%20trainers%20provided%20conversations%20in%20which%20they%20played%20both%20sides%E2%80%94the%20user%20and%20an%20AI%20assistant." rel="noopener noreferrer"&gt;OpenAI's blog on ChatGPT&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;And this is where some may feel that Bard has an edge. See, LaMDA wasn't trained on The Pile. Rather, LaMDA specialized in reading dialogue from the get-go. It didn't read books; it was patterned on the pace and patois of conversation. As a result, Bard's brain picked up on the details that distinguish open-ended conversation from other forms of communication.&lt;/p&gt;

&lt;p&gt;For example, an article like this one focuses on a single topic across roughly 1,200-1,300 words. Meanwhile, a conversation of the same word count may shift from "How are you doing?" to "Oh my goodness, no way" to "I love that song, have you heard the album yet?" to "Wait, my boyfriend's calling. One sec."&lt;/p&gt;

&lt;p&gt;In other words, whereas ChatGPT's brain first learned to read novels, research papers, code, and Wikipedia before it learned how to have a human-like conversation, Bard's brain learned nothing but conversation. Does this mean that Bard is going to be a better conversationalist than ChatGPT? Well, we're going to have to talk to it to find out.&lt;/p&gt;

&lt;h2&gt;
  
  
  Their build 💪🧠
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://blog.google/technology/ai/bard-google-ai-search-updates/#:~:text=We%E2%80%99re%20releasing%20it%20initially%20with%20our%20lightweight%20model%20version%20of%20LaMDA." rel="noopener noreferrer"&gt;Google's CEO states&lt;/a&gt; that, at least to start, Bard is backed by a lightweight version of the LaMDA model. But does that mean Bard is going to be in a lower weight class than ChatGPT?&lt;/p&gt;

&lt;p&gt;Not necessarily.&lt;/p&gt;

&lt;p&gt;The stats would indicate that both these models are in the same weight class. Take a look:&lt;/p&gt;

&lt;h3&gt;
  
  
  🪙 Tokens 🪙
&lt;/h3&gt;

&lt;p&gt;For our purposes, think of tokens as word fragments. For instance, the word "reorganizing" can be broken down into three tokens: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;"re" → a prefix meaning "again", &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;"organiz" → the root of the word "organize"&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;"ing" → a suffix that denotes a gerund or present participle&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
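&lt;p&gt;A hypothetical greedy tokenizer over a hand-picked vocabulary shows how a word gets split into fragments like these (real tokenizers such as BPE or SentencePiece &lt;em&gt;learn&lt;/em&gt; their vocabularies from data; this sketch just hard-codes one):&lt;/p&gt;

```python
# Toy greedy tokenizer. Real vocabularies hold tens of thousands of fragments;
# this hand-picked set is only enough to split our example word.
VOCAB = {"re", "organiz", "ing", "token", "iz"}

def tokenize(word):
    tokens, i = [], 0
    while i < len(word):
        # Greedily take the longest vocabulary fragment starting at position i.
        for j in range(len(word), i, -1):
            if word[i:j] in VOCAB:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])  # unknown character: emit it as-is
            i += 1
    return tokens

print(tokenize("reorganizing"))  # ['re', 'organiz', 'ing']
```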

&lt;p&gt;Check out the image below to see more examples of tokenized word fragments:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fii2py9wi6mn9c93ep29u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fii2py9wi6mn9c93ep29u.png" alt=" " width="800" height="761"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;See, GPT-3 and GPT-3.5 were trained on somewhere on the order of &lt;a href="https://lifearchitect.ai/chatgpt/" rel="noopener noreferrer"&gt;300-400 billion tokens&lt;/a&gt;. LaMDA, on the other hand, was trained on &lt;a href="https://ai.googleblog.com/2022/01/lamda-towards-safe-grounded-and-high.html" rel="noopener noreferrer"&gt;1.56 trillion (with a T) tokens that were then broken down into 2.81 trillion SentencePiece tokens&lt;/a&gt;. And while this initial version of Bard contains a lightweight version of LaMDA, future iterations of the AI may come equipped with a full, heavyweight LaMDA. And who knows how that'll perform...&lt;/p&gt;

&lt;h3&gt;
  
  
  🤖 Parameters 🤖
&lt;/h3&gt;

&lt;p&gt;Google publicly shares that LaMDA contains up to &lt;a href="https://ai.googleblog.com/2022/01/lamda-towards-safe-grounded-and-high.html#:~:text=with%20up%20to%20137B%20model%20parameters" rel="noopener noreferrer"&gt;137B model parameters&lt;/a&gt; while OpenAI says GPT-3.5 models hover around &lt;a href="https://platform.openai.com/docs/model-index-for-researchers#:~:text=davinci-,175B,-GPT%2D3%206.7B" rel="noopener noreferrer"&gt;175B parameters&lt;/a&gt;. It seems that, unlike with tokens, the advantage leans toward GPT in this category.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff72zw5cuydl3g74sjvoo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff72zw5cuydl3g74sjvoo.png" alt=" " width="800" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  ✅ Factual Correctness ✅
&lt;/h3&gt;

&lt;p&gt;Here, &lt;a href="https://ai.googleblog.com/2022/01/lamda-towards-safe-grounded-and-high.html#:~:text=While%20people%20are,seeing%20promising%20results." rel="noopener noreferrer"&gt;Google proudly proclaims&lt;/a&gt; that they deliberately improved the factual correctness of LaMDA's responses to knowledge-based questions by going through their dataset and annotating moments where LaMDA would have to look up factual information. When such questions arise, LaMDA is trained to call an external system to retrieve this encyclopedic information. This process is creatively called "information retrieval."&lt;/p&gt;
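&lt;p&gt;The pattern itself is easy to sketch. In the hypothetical snippet below (the function names and fact store are ours for illustration; Google hasn't published LaMDA's actual toolset), a knowledge-based question gets routed to an external lookup instead of being answered from the model's parameters alone:&lt;/p&gt;

```python
# Hypothetical sketch of the "call an external system for facts" pattern.
FACT_STORE = {"capital of france": "Paris", "boiling point of water": "100 C"}

def retrieve(query):
    """Stand-in for an external information-retrieval system."""
    return FACT_STORE.get(query.lower())

def answer(question):
    # Knowledge-based questions get grounded in retrieved facts;
    # everything else falls back to an honest "don't know."
    fact = retrieve(question)
    if fact is not None:
        return f"According to my sources, the answer is {fact}."
    return "I'm not sure -- that isn't in my retrieval index."

print(answer("Capital of France"))
```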

&lt;p&gt;OpenAI, on the other hand, &lt;a href="https://openai.com/blog/chatgpt/#:~:text=ChatGPT%20sometimes%20writes,human%20demonstrator%C2%A0knows." rel="noopener noreferrer"&gt;admits its limitations&lt;/a&gt; in this particular field. If you've ever used ChatGPT, you know about these shortcomings first-hand. Warnings are posted all over the place in the ChatGPT UI. So while we love our reigning champion, we have to admit that it isn't perfect. &lt;/p&gt;

&lt;p&gt;In fact, ChatGPT---at least to date---is pretty definitive about saying "my training data ends in 2021." Bard, however, &lt;em&gt;ostensibly&lt;/em&gt; remains comfortable discussing more current events, though anyone outside the current beta-testing group doesn't know the extent to which that is the case.&lt;/p&gt;

&lt;p&gt;The edge seems to lean towards Bard here---at least when it comes to discussing current-ish events. But we'll have to wait to use it to see for ourselves.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;When it comes to conversation, Bard really seems to be a formidable competitor to ChatGPT. &lt;em&gt;With its edge in conversational training and its explicit training in factual grounding, Bard might come out as the favorite. But this by no means makes ChatGPT an underdog.&lt;/em&gt; Again, we'll have to wait and see. &lt;/p&gt;

&lt;p&gt;This battle may be the next iPhone vs. Android. Or the next Mac vs. PC. Or even the legendary &lt;code&gt;emacs&lt;/code&gt; vs. &lt;code&gt;vim&lt;/code&gt;. (Just don't get us started on tabs versus spaces.) Our headliners will indeed provide us with some interesting content. But for now, let's all rest happy, knowing that---regardless of the victor---we're all champions for being able to access, use, and analyze such incredible technology.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;🎵Rocky outro theme fades into the background 🎵&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;(This blog post was written on February 8th 2023, two days after Google AI announced Bard.)&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>chatgpt</category>
      <category>openai</category>
      <category>coding</category>
    </item>
    <item>
      <title>What do you love most about Tech?</title>
      <dc:creator>BekahHW</dc:creator>
      <pubDate>Mon, 13 Feb 2023 10:06:00 +0000</pubDate>
      <link>https://dev.to/deepgram/what-do-you-love-most-about-tech-139p</link>
      <guid>https://dev.to/deepgram/what-do-you-love-most-about-tech-139p</guid>
      <description>&lt;p&gt;Tomorrow is Valentine's Day, and we want to celebrate what you love most about tech! Share and tag the people, orgs, and communities in the comments below ❤️&lt;/p&gt;

</description>
      <category>discuss</category>
      <category>ai</category>
      <category>watercooler</category>
    </item>
    <item>
      <title>What are the biggest problems AI can solve in the future you're most interested in?</title>
      <dc:creator>BekahHW</dc:creator>
      <pubDate>Mon, 06 Feb 2023 14:24:20 +0000</pubDate>
      <link>https://dev.to/deepgram/what-are-the-biggest-problems-ai-can-solve-in-the-future-youre-most-interested-in-1g3k</link>
      <guid>https://dev.to/deepgram/what-are-the-biggest-problems-ai-can-solve-in-the-future-youre-most-interested-in-1g3k</guid>
      <description>&lt;p&gt;From medical innovation, to writing the next great novel, to solving environmental issues, there's a lot of talk around the amazing things AI can do.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What are some use cases you'd like to see implemented?&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>discuss</category>
      <category>ai</category>
      <category>watercooler</category>
    </item>
    <item>
      <title>CHALLENGE: Describe AI in five words or less.</title>
      <dc:creator>BekahHW</dc:creator>
      <pubDate>Mon, 30 Jan 2023 10:06:00 +0000</pubDate>
      <link>https://dev.to/deepgram/challenge-describe-ai-in-five-words-or-less-445c</link>
      <guid>https://dev.to/deepgram/challenge-describe-ai-in-five-words-or-less-445c</guid>
      <description>&lt;p&gt;We thought it'd be fun to end the month with a challenge. If you have the best answer, we'll send you some Deepgram swag! &lt;/p&gt;

</description>
      <category>discuss</category>
      <category>ai</category>
      <category>watercooler</category>
    </item>
    <item>
      <title>What's your favorite science fiction technology that's been turned into something real?</title>
      <dc:creator>BekahHW</dc:creator>
      <pubDate>Mon, 23 Jan 2023 15:45:43 +0000</pubDate>
      <link>https://dev.to/deepgram/whats-your-favorite-science-fiction-technology-thats-been-turned-into-something-real-1bel</link>
      <guid>https://dev.to/deepgram/whats-your-favorite-science-fiction-technology-thats-been-turned-into-something-real-1bel</guid>
<description>&lt;p&gt;In 1865, Jules Verne wrote &lt;em&gt;From the Earth to the Moon,&lt;/em&gt; in which he described spacecraft and rockets that eventually sounded a lot like Apollo 11. &lt;/p&gt;

&lt;p&gt;Start-ups today are even drawing on ideas from science fiction as inspiration for their innovation. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What are some things you've read in sci-fi that have become true-to-life?&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>discuss</category>
      <category>ai</category>
      <category>watercooler</category>
    </item>
    <item>
      <title>How Does GPT-3 Work?</title>
      <dc:creator>Michael Jolley</dc:creator>
      <pubDate>Tue, 17 Jan 2023 15:47:15 +0000</pubDate>
      <link>https://dev.to/deepgram/how-does-gpt-3-work-hco</link>
      <guid>https://dev.to/deepgram/how-does-gpt-3-work-hco</guid>
<description>&lt;p&gt;GPT-3 is a large language model (LLM), and it’s been making headlines in nearly every industry. From the release of the &lt;a href="https://time.com/6238781/chatbot-chatgpt-ai-interview/#:~:text=No%2C%20it%20would%20not%20be,awareness%20that%20a%20human%20does"&gt;seemingly self-aware&lt;/a&gt; ChatGPT to copywriting and coding AI apps, GPT-3 is spreading like wildfire, first through the tech press and now through mainstream media.  &lt;/p&gt;

&lt;p&gt;Originally released in 2020, &lt;a href="https://openai.com/blog/gpt-3-apps/"&gt;GPT-3&lt;/a&gt; was developed by OpenAI, an artificial intelligence research lab in San Francisco. It’s a pre-trained universal language model that uses deep learning transformers to generate human-like text—and it’s pretty good at it. &lt;/p&gt;

&lt;p&gt;ChatGPT and the GPT-3 API family have been used to write poetry and fiction, code websites, respond to customer reviews, suggest better grammar, translate languages, generate dialogue, find tax deductions, and automate A/B testing. The use cases are seemingly endless and its results are surprisingly high-quality. &lt;/p&gt;

&lt;p&gt;While this large language model can do some incredible, useful things, it still has its flaws. We’ll cover that later, though. First, let’s cover the basics like “What is GPT-3?” and “How does GPT-3 work?”&lt;/p&gt;

&lt;h2&gt;
  
  
  What is GPT-3?
&lt;/h2&gt;

&lt;p&gt;GPT-3 stands for Generative Pre-trained Transformer 3, the third iteration of OpenAI’s GPT architecture. It’s a transformer-based language model that can generate human-like text. &lt;/p&gt;

&lt;p&gt;This deep learning model was pre-trained with over 175 billion parameters, among the largest of large language models in production today. Because of its size, GPT-3 is much better than any previous model at producing long-form and specialized text.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is GPT-3 open source?
&lt;/h3&gt;

&lt;p&gt;GPT-3 is not open source. OpenAI reasons that GPT-3 could be misused and therefore shouldn’t be available for open-source use. Additionally, Microsoft acquired an &lt;a href="https://blogs.microsoft.com/blog/2020/09/22/microsoft-teams-up-with-openai-to-exclusively-license-gpt-3-language-model/"&gt;exclusive license&lt;/a&gt; to GPT-3 in September 2020.&lt;/p&gt;

&lt;p&gt;Microsoft is the only entity aside from OpenAI with access to the underlying GPT-3 model. Others can still use GPT-3 via the &lt;a href="https://openai.com/blog/openai-api/"&gt;public API&lt;/a&gt; and through &lt;a href="https://openai.com/blog/chatgpt/"&gt;ChatGPT&lt;/a&gt;’s testing phase. The project was backed by Microsoft, which contributed $1 billion. &lt;/p&gt;

&lt;p&gt;The closest open-source alternative to GPT-3 is GPT-JT, released by &lt;a href="https://www.together.xyz/blog/releasing-v1-of-gpt-jt-powered-by-open-source-ai"&gt;Together&lt;/a&gt; in November 2022. It was trained using a decentralized approach on 3.53 billion tokens, and they claim it can outperform other models. You can find &lt;a href="https://huggingface.co/spaces/togethercomputer/GPT-JT"&gt;GPT-JT on Hugging Face&lt;/a&gt;. &lt;/p&gt;

&lt;h3&gt;
  
  
  Why was GPT-3 created?
&lt;/h3&gt;

&lt;p&gt;Before GPT-3, the largest trained language model was Microsoft’s Turing NLG model, which had 17 billion parameters. Previously, most generative language models could only produce simple sentences or yes and no answers. &lt;/p&gt;

&lt;h3&gt;
  
  
  What is GPT-3.5?
&lt;/h3&gt;

&lt;p&gt;GPT-3 was released in 2020, and &lt;a href="https://techcrunch.com/2022/12/01/while-anticipation-builds-for-gpt-4-openai-quietly-releases-gpt-3-5/"&gt;GPT-4’s anticipated release&lt;/a&gt; could happen as soon as 2023. In the meantime, OpenAI quietly released GPT-3.5—which has a better grasp of relationships between words, parts of words, and sentences—in November 2022, with no formal announcement. &lt;/p&gt;

&lt;p&gt;After complaints that GPT-3 was generating &lt;a href="https://techcrunch.com/2020/08/07/here-are-a-few-ways-gpt-3-can-go-wrong/"&gt;toxic and biased&lt;/a&gt; text, OpenAI began experimenting. GPT-3.5 is similar to &lt;a href="https://gpt3demo.com/apps/instructgpt"&gt;InstructGPT&lt;/a&gt;, a version of GPT-3 that was re-trained to better align with users’ intentions. &lt;/p&gt;

&lt;p&gt;OpenAI trained GPT-3 on a corpus of code and text it sourced through a crawl of open web content published through 2021, so its knowledge of events and developments after 2021 is limited. GPT-3.5 is only available through OpenAI’s APIs and ChatGPT. &lt;/p&gt;

&lt;h2&gt;
  
  
  How does GPT-3 Work? 
&lt;/h2&gt;

&lt;p&gt;When a user inputs text, known as a prompt, the model analyzes the language using a text predictor and generates the most helpful result. &lt;/p&gt;

&lt;p&gt;GPT-3 uses patterns from billions of parameters gleaned from over 570GB of internet-sourced text data to predict the most useful output. It predicts the next appropriate token in a given sequence of tokens, even in sequences it hasn’t been trained on. &lt;/p&gt;

&lt;p&gt;GPT-3 is a meta learner, meaning it’s been taught to learn. It understands, to an extent, how to perform new tasks similarly to how a human would. The original baseline GPT-3 doesn’t actually know how to perform any specific task; it knows how to learn. This makes it a powerfully versatile model. &lt;/p&gt;

&lt;p&gt;Let’s dive deeper. First, GPT-3 goes through unsupervised training with its massive, internet-harvested set of parameters. Then, it’s ready to accept text inputs, a.k.a. prompts. It converts the inputted words into a vector representing the word, a list of numbers. Those vectors are used to compute predictions in transformer decoders, 96 of them to be exact. Then they’re converted back to words. &lt;/p&gt;
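&lt;p&gt;As a rough sketch of that flow (toy numbers and stand-in functions of our own; GPT-3’s real embeddings and decoder blocks are learned, not hashed), each word becomes a vector, passes through a stack of 96 decoder layers, and comes out the other side:&lt;/p&gt;

```python
import hashlib

def embed(token):
    """Map a token to a small fixed vector (stand-in for a learned embedding)."""
    digest = hashlib.sha256(token.encode()).digest()
    return [b / 255 for b in digest[:4]]  # toy 4-dim vectors

def decoder_layer(vec):
    """Stand-in for one transformer decoder block."""
    return [v * 0.9 + 0.1 for v in vec]

def forward(prompt, n_layers=96):
    vecs = [embed(tok) for tok in prompt.split()]
    for _ in range(n_layers):  # GPT-3 stacks 96 decoder layers
        vecs = [decoder_layer(v) for v in vecs]
    return vecs  # a real model would map these back to output words

vectors = forward("the quick brown fox")
print(len(vectors), len(vectors[0]))  # → 4 4 (one 4-dim vector per token)
```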

&lt;h3&gt;
  
  
  How is GPT-3 trained? 
&lt;/h3&gt;

&lt;p&gt;OpenAI used internet data to train GPT-3 to generate any type of text. It was trained in a generative, unsupervised manner. In simple terms, it was taught to transform prompts into large amounts of appropriate text without supervision. &lt;/p&gt;
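&lt;p&gt;“Without supervision” here means the text itself provides the training signal: the model is scored on how much probability it assigns to each actual next token. A toy version of that cross-entropy scoring (our illustration, not OpenAI’s code):&lt;/p&gt;

```python
import math

# Toy version of the unsupervised objective: the model's loss on a token is
# the negative log of the probability it assigned to the token that actually
# appeared next in the training text.
def next_token_loss(probs_for_next, actual_next):
    """probs_for_next: predicted distribution; actual_next: the true token."""
    return -math.log(probs_for_next[actual_next])

# A confident, correct prediction is cheap; a wrong one is expensive.
good = next_token_loss({"mat": 0.8, "dog": 0.2}, "mat")
bad = next_token_loss({"mat": 0.8, "dog": 0.2}, "dog")
print(round(good, 3), round(bad, 3))  # → 0.223 1.609
```

&lt;p&gt;Training nudges the parameters to shrink this loss across the whole corpus, with no human labels required.&lt;/p&gt;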

&lt;h3&gt;
  
  
  Is GPT-3 conscious?
&lt;/h3&gt;

&lt;p&gt;While they designed GPT-3 to sound like a human and learn, it’s not sentient. GPT-3 can reason with a level of understanding that falls short of the average adult’s. It is, however, about as close to human-like verbal output as any model yet released. Some people did &lt;a href="https://twitter.com/sonyasupposedly/status/1284188369631629312?s=20"&gt;wonder if GPT-3 was self-aware&lt;/a&gt;. In reality, unlike its predecessors, the model was trained to have a degree of common sense. &lt;/p&gt;

&lt;h3&gt;
  
  
  How to use GPT-3
&lt;/h3&gt;

&lt;p&gt;Getting started with GPT-3 is simple. Head to OpenAI’s &lt;a href="https://chat.openai.com/"&gt;ChatGPT&lt;/a&gt; to get started. If you want more control, you can mess around in the playground. Developers can incorporate GPT-3 into their applications through &lt;a href="https://openai.com/blog/openai-api/"&gt;OpenAI's API&lt;/a&gt;. &lt;/p&gt;
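&lt;p&gt;For developers, a completion request is a single HTTP call. The sketch below builds one with only the standard library (endpoint and field names as documented at the time of writing; &lt;code&gt;text-davinci-003&lt;/code&gt; is one GPT-3.5 model, and an &lt;code&gt;OPENAI_API_KEY&lt;/code&gt; environment variable is needed to actually send it):&lt;/p&gt;

```python
import json
import os
import urllib.request

# A text-completion request for OpenAI's API (fields per its documentation).
payload = {
    "model": "text-davinci-003",
    "prompt": "Explain GPT-3 in one sentence.",
    "max_tokens": 60,
}

request = urllib.request.Request(
    "https://api.openai.com/v1/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}",
    },
)

# Only send if a key is configured; the generated text comes back in the
# response JSON under choices[0]["text"].
if os.environ.get("OPENAI_API_KEY"):
    with urllib.request.urlopen(request) as resp:
        print(json.load(resp)["choices"][0]["text"])
```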

&lt;h3&gt;
  
  
  What can GPT-3 do?
&lt;/h3&gt;

&lt;p&gt;GPT-3 is used for generating realistic human text. It’s been used to write articles, essays, &lt;a href="https://www.gwern.net/docs/ai/nn/transformer/gpt/fiction/index"&gt;stories&lt;/a&gt;, &lt;a href="https://www.gwern.net/docs/ai/nn/transformer/gpt/poetry/index"&gt;poetry&lt;/a&gt;, news reports, dialogue, &lt;a href="https://www.gwern.net/docs/fiction/humor/index"&gt;humor&lt;/a&gt;, advertisements, and social media copy. It can even &lt;a href="https://www.gwern.net/docs/philosophy/mind/index"&gt;philosophize&lt;/a&gt;, albeit &lt;a href="https://dailynous.com/2020/07/30/philosophers-gpt-3/"&gt;badly&lt;/a&gt;. GPT-3 can mimic the styles of specific writers, generate memes, &lt;a href="https://blog.deepgram.com/the-what-why-and-how-of-ai-generated-recipes-using-chatgpt/"&gt;write recipes&lt;/a&gt;, and produce comic strips. &lt;/p&gt;

&lt;p&gt;GPT-3 is capable of generating code, too. It’s written working code, created mock-up websites, and designed UI prototyping from just a few sentence descriptions. Users can also create plots, charts, and excel functions with GPT-3. &lt;/p&gt;

&lt;p&gt;In addition to producing convincingly human text and code, it can generate automated conversations. GPT-3, as implemented in &lt;a href="https://openai.com/blog/chatgpt/"&gt;ChatGPT&lt;/a&gt;, responds to any text that a user types in with a new piece of contextual text. This feature is often implemented to add realistic dialogue to games and provide customer service through chatbots. &lt;/p&gt;

&lt;p&gt;GPT-3’s predecessors, aptly named GPT-1 and GPT-2, were criticized for not actually “knowing” anything and not having common sense. This generation of GPT was trained to follow a sequence of events and predict what’s next. It has at least some common sense, the ability to learn, and even the capability of philosophizing. &lt;/p&gt;

&lt;h2&gt;
  
  
  Examples of GPT-3
&lt;/h2&gt;

&lt;p&gt;Here are the top projects and applications powered by GPT-3 that you need to know. &lt;/p&gt;

&lt;h3&gt;
  
  
  ChatGPT
&lt;/h3&gt;

&lt;p&gt;From OpenAI, &lt;a href="https://thegradient.pub/gpt2-and-the-nature-of-intelligence/"&gt;ChatGPT&lt;/a&gt; is a chatbot. It answers questions, remembers what a user said earlier in a conversation, answers follow-up questions, admits mistakes, challenges incorrect premises, and rejects inappropriate requests. &lt;/p&gt;

&lt;p&gt;Since it’s based on GPT-3, it’s pretrained. &lt;a href="https://openai.com/"&gt;OpenAI&lt;/a&gt; has also fine-tuned it with supervised and reinforcement learning techniques. Its &lt;a href="https://openai.com/blog/chatgpt/"&gt;research release&lt;/a&gt; in November 2022 sparked headlines in mainstream news outlets and tech sources alike. &lt;/p&gt;

&lt;h3&gt;
  
  
  Copilot
&lt;/h3&gt;

&lt;p&gt;Available as an extension for &lt;a href="https://code.visualstudio.com/"&gt;Visual Studio Code&lt;/a&gt;, &lt;a href="https://github.com/features/copilot"&gt;Copilot&lt;/a&gt; autocompletes code snippets. It was released by &lt;a href="https://github.blog/2022-06-21-github-copilot-is-generally-available-to-all-developers/"&gt;GitHub&lt;/a&gt; and OpenAI, based on Codex—a product of GPT-3. &lt;a href="https://openai.com/blog/openai-codex/"&gt;Codex&lt;/a&gt; came out in 2021; Copilot entered technical preview the same year and became generally available in 2022. &lt;/p&gt;

&lt;h3&gt;
  
  
  Debuild
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://debuild.app/"&gt;Debuild&lt;/a&gt; creates code for web apps on-demand using GPT-3 as its base model.  It asks for a description of the user’s app and use cases. Then, it generates React components, SQL code, and helps assemble interfaces visually. Currently, Debuild is &lt;a href="https://debuild.app/signup"&gt;waitlisted&lt;/a&gt;. &lt;/p&gt;

&lt;h3&gt;
  
  
  A/BTesting  
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://abtesting.ai/"&gt;A/BTesting&lt;/a&gt; is an automated A/B testing provider. It uses GPT-3 to generate multiple versions of the title, copy, and call to action. It tests them for you on its own using a JavaScript snippet or plugin. When the test reaches statistical significance it mixes them up, mutates them, and runs another batch to ensure their customers have the highest converting copy possible. &lt;/p&gt;

&lt;h3&gt;
  
  
  Replier
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.replier.ai/"&gt;Replier&lt;/a&gt; responds to customer reviews automatically. It learns from previous review responses to create tailored, unique answers. They use GPT-3 and then clean the output using a monitoring system to detect bad behaviors and improve its replies.&lt;/p&gt;

&lt;h3&gt;
  
  
  Jasper
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.jasper.ai/"&gt;Jasper&lt;/a&gt; focuses on content writing, primarily blog posts. The service offers templates, art, content snippets in 25 languages, emails, reports, and stories. They also have a chatbot feature to help with brainstorming. &lt;/p&gt;

&lt;h3&gt;
  
  
  Lex
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://lex.page/"&gt;Lex&lt;/a&gt;, created by &lt;a href="https://every.to/"&gt;Every&lt;/a&gt;, is an AI-powered word processor. It allows writers to generate essays, articles, stories, and optimized headers with help from GPT-3. While Lex can write full articles, copy, and more, it differs from other AI copy generators by offering a standalone word processor that can also complete a partially written paragraph. Lex is in public beta with a waitlist. &lt;/p&gt;

&lt;h3&gt;
  
  
  Duolingo   
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.duolingo.com/"&gt;Duolingo&lt;/a&gt;, the gamified language learning app, uses GPT-3 APIs to provide grammar suggestions. Currently, they’re only using it for French, but this could lead to the other 35 languages they offer in the future.&lt;/p&gt;

&lt;h3&gt;
  
  
  Keeper Tax
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.keepertax.com/"&gt;Keeper Tax&lt;/a&gt; helps users file their taxes and find more deductions. They market to 1099 contractors and freelancers. GPT-3 is used at Keeper Tax to find tax-deductible expenses in bank statements. &lt;/p&gt;

&lt;h3&gt;
  
  
  Quickchat
&lt;/h3&gt;

&lt;p&gt;Another chatbot, &lt;a href="https://www.quickchat.ai/"&gt;Quickchat&lt;/a&gt; is a multilingual AI assistant that can automate customer support, online applications, and knowledge base searches. Customers can also use the widget to make automated live chat conversations based on training information uploaded by the user. Users can upload product descriptions, FAQs, and internal documentation to serve their customers automatically. &lt;/p&gt;

&lt;h3&gt;
  
  
  Algolia
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.algolia.com/"&gt;Algolia&lt;/a&gt; is a vertical search engine offering recommendations and suggested searches for websites, mobile, and voice applications. Developers can use Algolia to implement site-specific search, digital content discovery, enterprise search, SaaS application search, customized content and product discovery, and more. &lt;/p&gt;

&lt;h3&gt;
  
  
  Lexion
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.lexion.ai/"&gt;Lexion&lt;/a&gt; started as a contract management system. Now, they’re using GPT-3 to help lawyers with their contracts. Their Microsoft Word plugin suggests edits and writes text summaries. Currently, the plugin is only intended to assist lawyers and &lt;a href="https://www.geekwire.com/2022/legal-tech-startup-lexion-is-using-gpt-3-to-help-lawyers-write-summaries-and-suggest-edits/"&gt;improve efficiency&lt;/a&gt;, not to completely write contracts on its own. &lt;/p&gt;

&lt;h2&gt;
  
  
  GPT-3 is imperfect, but it is revolutionizing artificial intelligence.
&lt;/h2&gt;

&lt;p&gt;GPT-3 can power a variety of tools and services in a way that AI couldn’t before. It can speed up professional writing, software engineering, and customer service tasks. It’s making editing, language learning, and coding tools more interactive and accessible to their users. The extent to which GPT-3 has already expanded beyond its predecessors is promising, and it implies an exciting future for GPT-4 and beyond. &lt;/p&gt;

&lt;p&gt;Despite its usefulness, debates over the power and influence of generative AI are already flaring up. Samples have shown GPT-3 can sometimes regurgitate &lt;a href="https://www.technologyreview.com/2020/10/23/1011116/chatbot-gpt3-openai-facebook-google-safety-fix-racist-sexist-language-ai/"&gt;some of the racist, sexist points&lt;/a&gt; it learned from its internet training data; some are justifiably wary of the toxic language that occasionally arises in its automated answers. &lt;/p&gt;

&lt;p&gt;OpenAI has &lt;a href="https://www.protocol.com/enterprise/openai-gptinstruct"&gt;adjusted the way GPT-3 learns&lt;/a&gt; after instances of lies, sexism, and racism were found in its answers. While the newest version, GPT-3.5, contains fixes, it’s still imperfect. GPT-3 has quickly become a provocative topic that will continue to change as new versions are released and new use cases are found. &lt;/p&gt;

&lt;p&gt;Will it replace programmers, lawyers, and writers? No. So far, GPT-3 can’t produce completely error-free or convincing work independently. It can, however, be a supportive tool for people in these professions and many others.&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>gpt3</category>
    </item>
    <item>
      <title>What do you think tech will look like five years from now?</title>
      <dc:creator>BekahHW</dc:creator>
      <pubDate>Mon, 16 Jan 2023 10:06:00 +0000</pubDate>
      <link>https://dev.to/deepgram/what-do-you-think-tech-will-look-like-five-years-from-now-49e4</link>
      <guid>https://dev.to/deepgram/what-do-you-think-tech-will-look-like-five-years-from-now-49e4</guid>
      <description>&lt;p&gt;Last year we saw a lot of really cool technology released into the world. What do you think we'll see in the next five years? Who will be solving problems? What will learning look like? What new, shiny things do you think we'll be able to look forward to? Drop your response or a gif that captures your response in the comments.&lt;/p&gt;

</description>
      <category>discuss</category>
      <category>ai</category>
      <category>watercooler</category>
    </item>
    <item>
      <title>What conferences do you want to go to in 2023?</title>
      <dc:creator>BekahHW</dc:creator>
      <pubDate>Mon, 09 Jan 2023 10:06:00 +0000</pubDate>
      <link>https://dev.to/deepgram/what-conferences-do-you-want-to-go-to-in-2023-4gi</link>
      <guid>https://dev.to/deepgram/what-conferences-do-you-want-to-go-to-in-2023-4gi</guid>
      <description>&lt;p&gt;It's a new year, and that means time for a new year of conferences! What conferences--virtual or in-person--do you want to attend this year? Are you speaking, sponsoring, or going as an attendee?&lt;/p&gt;

&lt;p&gt;I'm starting the year off strong, speaking virtually at &lt;a href="https://yougotthis.io/events/broadcasting-service/" rel="noopener noreferrer"&gt;You Got This!&lt;/a&gt; on January 14th and then live at &lt;a href="https://that.us/activities/BojrAo0dd8x62Arj8s6U" rel="noopener noreferrer"&gt;THAT Conference&lt;/a&gt; on January 18th. &lt;/p&gt;

</description>
      <category>discuss</category>
      <category>conferences</category>
      <category>codenewbie</category>
      <category>watercooler</category>
    </item>
    <item>
      <title>What are your 2023 tech goals and do you want accountability?</title>
      <dc:creator>BekahHW</dc:creator>
      <pubDate>Mon, 02 Jan 2023 10:11:00 +0000</pubDate>
      <link>https://dev.to/deepgram/what-are-your-2023-tech-goals-and-do-you-want-accountability-2ho9</link>
      <guid>https://dev.to/deepgram/what-are-your-2023-tech-goals-and-do-you-want-accountability-2ho9</guid>
      <description>&lt;p&gt;We love a good New Year’s goal, and we’re setting some of our own, including starting a &lt;a href="https://docs.google.com/forms/d/1Ut4PKAv5cdYbaf6wPWlb3wx_1l9sV8-C7TlTHxcxjOk" rel="noopener noreferrer"&gt;learning cohort for freeCodeCamp’s Machine Learning with Python&lt;/a&gt; in January. What goals are you setting? Do you want accountability? &lt;/p&gt;

</description>
      <category>discuss</category>
      <category>codenewbie</category>
      <category>ai</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Identify Sales Insights from Meeting Audio</title>
      <dc:creator>Tonya Sims</dc:creator>
      <pubDate>Tue, 27 Dec 2022 17:52:41 +0000</pubDate>
      <link>https://dev.to/deepgram/identify-sales-insights-from-meeting-audio-m9e</link>
      <guid>https://dev.to/deepgram/identify-sales-insights-from-meeting-audio-m9e</guid>
      <description>&lt;p&gt;You just started your first day as a Python developer at Dunder Mifflin Paper Company, Inc. The President of Sales has an urgent request for you, to transcribe a sales meeting from speech to text with the regional manager, Michael Scott. &lt;/p&gt;

&lt;p&gt;This is not just any sales meeting. Landing this client could determine the health and future of the company. You see, Michael Scott was kind of a goofball and had a habit of joking around too much during important sales calls, so the VP of Sales, Jan, was sent to watch over him. &lt;/p&gt;

&lt;p&gt;The President of Sales could not figure out why this client didn’t sign the contract. &lt;/p&gt;

&lt;p&gt;Was there even a deal made? &lt;/p&gt;

&lt;p&gt;Did Michael not close the sale? &lt;/p&gt;

&lt;p&gt;Or did Michael scare the client away by telling his lame jokes?&lt;/p&gt;

&lt;p&gt;He needed sales insights ASAP, and the only way he could get them without being there was by using AI speech recognition and Python.&lt;/p&gt;

&lt;p&gt;You’ve probably guessed by now, but if you haven’t, this is a classic scene from the hit sitcom, The Office. &lt;/p&gt;

&lt;p&gt;If you want the full code sample of how to identify sales insights from meeting audio, skip to the bottom. If you want to know what happens next with the foolery then keep reading. &lt;/p&gt;

&lt;p&gt;In this sales call scene from The Office, Michael Scott moves the meeting to a restaurant, Chili’s, without anyone’s permission. Since this episode was released in the mid-2000s, we’re going to fast-forward to 2022. Let’s say this meeting didn’t happen in a restaurant; it occurred over everyone’s favorite: a video call. &lt;/p&gt;

&lt;p&gt;You explained to the President of Sales that the meeting could be recorded, then uploaded to be transcribed using Python and speech-to-text. You elaborate that certain features can be used to gather sales insights from the meeting audio. &lt;/p&gt;

&lt;p&gt;You ask the President what type of insights they need. They need a quick summary of the transcript, instead of reading through the whole thing, and the ability to search through the transcript to determine if Michael Scott mentioned business, deals, or jokes. &lt;/p&gt;

&lt;h2&gt;
  
  
  Conversation Intelligence and Sales Insights from Meeting Audio
&lt;/h2&gt;

&lt;p&gt;You have the perfect solution for a speech recognition provider: Deepgram. You get to coding, using their &lt;a href="https://github.com/deepgram/deepgram-python-sdk" rel="noopener noreferrer"&gt;Python SDK&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;The first thing you do is grab an API key &lt;a href="https://console.deepgram.com/signup?jump=keys" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Then create a directory with a Python file inside, and install the SDK with &lt;code&gt;pip install deepgram-sdk&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;It’s easy to use with this code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;deepgram&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Deepgram&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="n"&gt;DEEPGRAM_API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;‘&lt;/span&gt;&lt;span class="n"&gt;YOUR_API_KEY_GOES_HERE&lt;/span&gt;&lt;span class="err"&gt;’&lt;/span&gt;
&lt;span class="n"&gt;PATH_TO_FILE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;audio/the-office-meeting.mp3&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
   &lt;span class="c1"&gt;# Initializes the Deepgram SDK
&lt;/span&gt;   &lt;span class="n"&gt;deepgram&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Deepgram&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DEEPGRAM_API_KEY&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="c1"&gt;# Open the audio file
&lt;/span&gt;   &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;PATH_TO_FILE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
       &lt;span class="c1"&gt;# ...or replace mimetype as appropriate
&lt;/span&gt;       &lt;span class="n"&gt;source&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;buffer&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;mimetype&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;audio/mp3&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
       &lt;span class="n"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
           &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;summarize&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;search&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;business&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;deal&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;joke&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
       &lt;span class="p"&gt;}&lt;/span&gt;

       &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;deepgram&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transcription&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sync_prerecorded&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
       &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;indent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You’re importing the libraries at the top:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;deepgram&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Deepgram&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You copy and paste your Deepgram API key into the code and add the path to the file you want to transcribe:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;DEEPGRAM_API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;‘&lt;/span&gt;&lt;span class="n"&gt;YOUR_API_KEY_GOES_HERE&lt;/span&gt;&lt;span class="err"&gt;’&lt;/span&gt;
&lt;span class="n"&gt;PATH_TO_FILE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;audio/the-office-meeting.mp3&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Inside the &lt;code&gt;main&lt;/code&gt; function, you’re initializing the Deepgram SDK. Then you open the audio file &lt;code&gt;with open(PATH_TO_FILE, 'rb') as audio:&lt;/code&gt;. Since the file being transcribed is an MP3, that’s what you set as the &lt;code&gt;mimetype&lt;/code&gt;, while passing the audio into the Python dictionary as well: &lt;code&gt;source = {'buffer': audio, 'mimetype': 'audio/mp3'}&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;You tap into their Summary and Search features as explained &lt;a href="https://developers.deepgram.com/documentation/features/" rel="noopener noreferrer"&gt;here&lt;/a&gt;, by creating an options object with those parameters.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
           &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;summarize&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;search&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;business&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;deal&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;joke&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
       &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Lastly, the line &lt;code&gt;response = deepgram.transcription.sync_prerecorded(source, options)&lt;/code&gt; takes in the audio and the feature options and performs the transcription. The results are then printed with &lt;code&gt;print(json.dumps(response, indent=4))&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;You’ll receive a JSON response with the transcript, the summary, and the search findings. It will look something like this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Summary&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summaries&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                            &lt;span class="p"&gt;{&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Lack of one county has not been immune to the slow economic growth over the past five years. So for us, the name of the game is budget reduction.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start_word&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end_word&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;597&lt;/span&gt;
                            &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The Search&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;business&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                          &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hits&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                            &lt;span class="p"&gt;{&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;231.305&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;231.705&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;snippet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;business&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                            &lt;span class="p"&gt;},&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                         &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hits&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                            &lt;span class="p"&gt;{&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.7395834&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;86.13901&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;86.298805&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;snippet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;i&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ll&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                            &lt;span class="p"&gt;},&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;joke&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                         &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hits&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                            &lt;span class="p"&gt;{&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;82.125&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;82.284996&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;snippet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;one joke&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                            &lt;span class="p"&gt;},&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These are your insights from the sales meeting. The summary suggests the customer wants to reduce costs, and the search confidences indicate Michael Scott talked about business, didn’t discuss deals much, and told some jokes. &lt;/p&gt;
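&lt;p&gt;Rather than eyeballing the raw JSON, you could also pull these insights out with a small helper. This is a minimal sketch assuming the response shape shown above (summaries under the first alternative, search results under the channel); the &lt;code&gt;extract_insights&lt;/code&gt; name is our own, not part of the SDK.&lt;/p&gt;

```python
# Minimal sketch: walk a prerecorded response dict and collect the
# summary text plus the best search-hit confidence per query.
# Assumes the response layout shown above; adjust the paths if yours differs.

def extract_insights(response):
    channel = response["results"]["channels"][0]

    # Summaries sit under the first alternative; join multi-part summaries
    summaries = channel["alternatives"][0].get("summaries", [])
    summary_text = " ".join(s["summary"] for s in summaries)

    # Keep the highest-confidence hit for each search query
    search_hits = {
        item["query"]: max((hit["confidence"] for hit in item["hits"]), default=0.0)
        for item in channel.get("search", [])
    }
    return summary_text, search_hits
```

&lt;p&gt;A confidence near 1.0 for “joke” but a low one for “deal” would tell the President, at a glance, where the call went sideways.&lt;/p&gt;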

&lt;p&gt;You share this with the President of Sales. They now have a better understanding of what happened in the sales call, how to coach Michael Scott on closing future sales deals, and how to follow up with the customer. &lt;/p&gt;

&lt;p&gt;Moving forward, all of Dunder Mifflin’s sales meetings were recorded, transcribed, and insights were derived using Deepgram to improve performance and maximize revenue. Corny jokes were only allowed if they helped build relationships with the customer. &lt;/p&gt;

&lt;p&gt;The end.&lt;/p&gt;

&lt;p&gt;Here’s the whole code sample:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;deepgram&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Deepgram&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="n"&gt;DEEPGRAM_API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;‘&lt;/span&gt;&lt;span class="n"&gt;YOUR_API_KEY_GOES_HERE&lt;/span&gt;&lt;span class="err"&gt;’&lt;/span&gt;
&lt;span class="n"&gt;PATH_TO_FILE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;audio/the-office-meeting.mp3&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
   &lt;span class="c1"&gt;# Initializes the Deepgram SDK
&lt;/span&gt;   &lt;span class="n"&gt;deepgram&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Deepgram&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DEEPGRAM_API_KEY&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="c1"&gt;# Open the audio file
&lt;/span&gt;   &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;PATH_TO_FILE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
       &lt;span class="c1"&gt;# ...or replace mimetype as appropriate
&lt;/span&gt;       &lt;span class="n"&gt;source&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;buffer&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;mimetype&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;audio/mp3&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
       &lt;span class="n"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
           &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;summarize&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;search&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;business&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;deal&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;joke&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
       &lt;span class="p"&gt;}&lt;/span&gt;

       &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;deepgram&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transcription&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sync_prerecorded&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
       &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;indent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you have any feedback about this post, or anything else around Deepgram, we'd love to hear from you. Please let us know in our &lt;a href="https://github.com/orgs/deepgram/discussions/categories/feedback" rel="noopener noreferrer"&gt;GitHub discussions&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>python</category>
      <category>speechtotext</category>
      <category>insights</category>
      <category>sales</category>
    </item>
  </channel>
</rss>
