DEV Community

Isheanesu

Letting Supabase Users Transcribe YouTube Videos

Have you ever wanted to let your Supabase users transcribe YouTube videos using an open-source model deployed to a serverless GPU platform? Probably way too specific for a random scenario (fair enough) but it's a fun project anyway.

So, first things first: how do we transcribe a video?

We could get into the intricacies of machine learning models all day, but that's a little out of scope for this article, so we're going to use OpenAI's Whisper model. It's open source and pretty good.

We'll need a GPU to run the model (technically not 100% necessary but we want our inference speed to be really fast). The problem is that GPUs are expensive - as you'll know if you've seen NVIDIA's stock price lately...

Stock graph

Thankfully, serverless GPUs have really caught on lately, and for this article we'll be using a platform called Beam to deploy our Whisper model. They have a great article on this very topic. Our focus is what to do once the model is ready for inference by our Supabase users.

So first of all, how do we call it in Supabase? Let's use an edge function.

  • Start a Supabase project
  • Install the CLI and create a new function by calling supabase functions new transcribe-yt (transcribe-yt is our function name)

Our index.ts should look something like this:

// Follow this setup guide to integrate the Deno language server with your editor:
// https://deno.land/manual/getting_started/setup_your_environment
// This enables autocomplete, go to definition, etc.

import { serve } from "https://deno.land/std@0.168.0/http/server.ts"

console.log("Hello from Functions!")

serve(async (req) => {
  const { name } = await req.json()
  const data = {
    message: `Hello ${name}!`,
  }

  return new Response(
    JSON.stringify(data),
    { headers: { "Content-Type": "application/json" } },
  )
})

// To invoke:
// curl -i --location --request POST 'http://localhost:54321/functions/v1/' \
//   --header 'Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZS1kZW1vIiwicm9sZSI6ImFub24iLCJleHAiOjE5ODM4MTI5OTZ9.CRXP1A7WOeoJeXxjNni43kdQwgnWNReilDMblYTn_I0' \
//   --header 'Content-Type: application/json' \
//   --data '{"name":"Functions"}'


Let's change the 'name' parameter to 'video_url' first:

  const { video_url } = await req.json()
  const data = {
    message: `Hello ${video_url}!`,
  }

Now let's actually call the model on Beam:

fetch("https://apps.beam.cloud/<ID_FROM_BEAM>", {
  method: "POST",
  headers: {
    "Authorization": "[YOUR_AUTH_TOKEN]",
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    video_url: video_url
  })
})
  .then(response => response.json())
  .then(data => console.log(data))
  .catch(error => console.error(error));

We'll stick with .then today. data['pred'] holds our transcription, so we can return it to the user like this (the handler has to return this promise chain for the response to actually reach the user):

  .then(data => {
    const transcription = data['pred']

    return new Response(
      JSON.stringify({ transcription }),
      { headers: { "Content-Type": "application/json" } },
    )
  })
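
The snippet above trusts that the model's response always contains a pred field. Here is a minimal sketch of a more defensive version. The helper name buildTranscriptionResponse, the assumption that pred is a string, and the 502 status for an unexpected payload are illustrative choices of mine, not something Beam prescribes.

```typescript
// Hypothetical helper: turns the JSON returned by the model into an HTTP response.
// Assumes (as in the article) that the transcription lives under the `pred` key
// and that it is a string.
const jsonHeaders = { "Content-Type": "application/json" };

function buildTranscriptionResponse(data: unknown): Response {
  const pred = (data as { pred?: unknown } | null)?.pred;

  // Without a usable `pred`, report an upstream failure instead of
  // sending `undefined` back to the user.
  if (typeof pred !== "string" || pred.length === 0) {
    return new Response(
      JSON.stringify({ error: "Transcription service returned an unexpected payload" }),
      { status: 502, headers: jsonHeaders },
    );
  }

  return new Response(
    JSON.stringify({ transcription: pred }),
    { status: 200, headers: jsonHeaders },
  );
}
```

In the edge function this could stand in for the body of the second callback, as in .then(data => buildTranscriptionResponse(data)).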

Congratulations, you've done it!

BUT...

One more thing: anyone who logs in can blow up your bill and bankrupt your cool new startup. We need to figure out who is asking for a transcription, which we can do by 'logging in as the user' (in a sense). First we'll import our favourite SDK:

import { createClient } from 'https://esm.sh/@supabase/supabase-js@2'

Then we can get our user plan:

    // getUser() resolves to { data: { user }, error }
    const { data: { user } } = await supabaseClient.auth.getUser()
    const user_id = user.id

    // Now we can run any query to check user usage, billing plan, etc as them
    const { data, error } = await supabaseClient
      .from('user_plans')
      .select('plan')
      .eq('user_id', user_id)
      .single()

Hopefully you're already using excellent Row Level Security policies. If a user isn't on a supported plan, cancel the request and return an error instead of calling the model.
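
As a rough sketch of that plan check: the plan names in SUPPORTED_PLANS and the helper rejectUnlessSupported are hypothetical, so swap in whatever your own user_plans table actually stores.

```typescript
// Hypothetical list of plans that may use transcription; adjust to your billing setup.
const SUPPORTED_PLANS = ["pro", "team"];

// Returns an error Response when the request should be rejected,
// or null when the user may proceed to the model call.
function rejectUnlessSupported(plan: string | null | undefined): Response | null {
  const supported = SUPPORTED_PLANS.some((p) => p === plan);
  if (supported) {
    return null;
  }
  return new Response(
    JSON.stringify({ error: "Your current plan does not include transcription" }),
    { status: 403, headers: { "Content-Type": "application/json" } },
  );
}
```

Inside the handler this would run right after the user_plans query and before the call to Beam, so a rejected request never spends any GPU time: if the helper returns a Response, return it immediately.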

Here's how all the code looks together:

// Follow this setup guide to integrate the Deno language server with your editor:
// https://deno.land/manual/getting_started/setup_your_environment
// This enables autocomplete, go to definition, etc.

import { serve } from "https://deno.land/std@0.168.0/http/server.ts"
import { createClient } from "https://esm.sh/@supabase/supabase-js@2"

console.log("Hello from Functions!")

const jsonHeaders = { "Content-Type": "application/json" }

serve(async (req) => {
  const { video_url } = await req.json()

  // We use our client here, forwarding the caller's Authorization header
  // so that every query runs as that user
  const supabaseClient = createClient(
    // Use the URL and anon key (these will be in your Supabase project)
    Deno.env.get('SUPABASE_URL') ?? '',
    Deno.env.get('SUPABASE_ANON_KEY') ?? '',
    { global: { headers: { Authorization: req.headers.get('Authorization')! } } }
  )

  // getUser() resolves to { data: { user }, error }
  const { data: { user } } = await supabaseClient.auth.getUser()
  if (!user) {
    return new Response(JSON.stringify({ error: "Not authenticated" }), { status: 401, headers: jsonHeaders })
  }

  // Now we can run any query to check user usage, billing plan, etc as them
  const { data: userPlan, error } = await supabaseClient
    .from('user_plans')
    .select('plan')
    .eq('user_id', user.id)
    .single()

  // No plan row (or a failed query) means we stop before spending GPU time.
  // Add your own check on userPlan.plan here.
  if (error || !userPlan) {
    return new Response(JSON.stringify({ error: "No supported plan" }), { status: 403, headers: jsonHeaders })
  }

  // Return the promise chain so the transcription is what the user receives
  return fetch("https://apps.beam.cloud/<ID_FROM_BEAM>", {
    method: "POST",
    headers: {
      "Authorization": "[YOUR_AUTH_TOKEN]",
      "Content-Type": "application/json"
    },
    body: JSON.stringify({
      video_url: video_url
    })
  })
  .then(response => response.json())
  .then(data => {
    const transcription = data['pred']

    return new Response(
      JSON.stringify({ transcription }),
      { headers: jsonHeaders },
    )
  })
  .catch(error => {
    console.error(error)
    return new Response(JSON.stringify({ error: "Transcription failed" }), { status: 500, headers: jsonHeaders })
  })
})

// To invoke (swap the demo anon key below for a signed-in user's access token, since we call getUser()):
// curl -i --location --request POST 'http://localhost:54321/functions/v1/transcribe-yt' \
//   --header 'Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZS1kZW1vIiwicm9sZSI6ImFub24iLCJleHAiOjE5ODM4MTI5OTZ9.CRXP1A7WOeoJeXxjNni43kdQwgnWNReilDMblYTn_I0' \
//   --header 'Content-Type: application/json' \
//   --data '{"video_url":"<YOUTUBE_VIDEO_URL>"}'


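A super helpful hint to save you some research: if your users are invoking the function from a browser, you'll need to handle CORS (answer the OPTIONS preflight request and include CORS headers in your responses).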
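
Here is a minimal sketch of what that CORS handling might look like. The helper names handlePreflight and jsonWithCors are made up for illustration, and the wildcard origin is an assumption you may well want to tighten to your own site.

```typescript
// Headers commonly used for browser calls to Supabase edge functions.
// "*" allows any origin; restricting it to your site's origin is an option.
const corsHeaders = {
  "Access-Control-Allow-Origin": "*",
  "Access-Control-Allow-Headers": "authorization, x-client-info, apikey, content-type",
};

// Browsers send an OPTIONS "preflight" request before the real POST.
// Returns a Response for preflight requests, or null for everything else.
function handlePreflight(req: Request): Response | null {
  if (req.method === "OPTIONS") {
    return new Response("ok", { headers: corsHeaders });
  }
  return null;
}

// Builds a JSON response that also carries the CORS headers.
function jsonWithCors(body: unknown, status = 200): Response {
  return new Response(JSON.stringify(body), {
    status,
    headers: { ...corsHeaders, "Content-Type": "application/json" },
  });
}
```

The idea is to call handlePreflight(req) at the very top of the handler and return its result if it isn't null, then build every other response (errors included) with jsonWithCors so the browser can actually read it.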

There are more things that can be done to optimise this function (like handling those tricky catch statements), but for now this is enough to get you started on your AI adventure.
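
One possible way to tame those catch statements is a small wrapper around the handler, sketched here. The names toErrorResponse and withErrorHandling are hypothetical, and returning a 500 with the error message is an assumption; you may prefer a generic message so internals don't leak to users.

```typescript
type Handler = (req: Request) => Promise<Response>;

// Converts anything that was thrown into a JSON error response.
function toErrorResponse(error: unknown, status = 500): Response {
  const message = error instanceof Error ? error.message : String(error);
  return new Response(
    JSON.stringify({ error: message }),
    { status, headers: { "Content-Type": "application/json" } },
  );
}

// Wraps a handler so that a thrown error or rejected promise
// still produces a response for the caller.
function withErrorHandling(handler: Handler): Handler {
  return async (req) => {
    try {
      return await handler(req);
    } catch (error) {
      console.error(error);
      return toErrorResponse(error);
    }
  };
}
```

With this in place, the function passed to serve could be wrapped as serve(withErrorHandling(async (req) => { ... })), so a failed fetch or a malformed request body ends in a JSON error rather than an unhandled exception.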

If you have any questions, leave a comment and I (not AI) will answer.
