Cooking videos are great, but following along in the kitchen is a pain. You're elbow-deep in dough and suddenly need to rewind for that one ingredient you missed.
So I built a small pipeline that takes any YouTube cooking video, pulls the audio, sends it to Amazon Transcribe, and gives me a clean text file of the entire recipe.
No paid tools. No complex setup. Just AWS services and a few Python scripts.
What the Pipeline Does
YouTube Video
↓
Download Audio (yt-dlp)
↓
Upload to S3
↓
Amazon Transcribe
↓
recipe.txt
Four steps. That's it.
Step 1 — Download the Audio
I used yt-dlp to pull just the audio from the video. No need to download the full video.
yt-dlp \
  --extract-audio \
  --audio-quality 0 \
  --output "output/audio.%(ext)s" \
  "https://youtu.be/YOUR_VIDEO_ID"
One thing I ran into: ffmpeg was not installed on my machine, so the mp3 conversion failed. But Amazon Transcribe supports the WebM format natively, so I skipped the conversion entirely and uploaded the raw .webm file. Saved time.
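Before uploading, it's worth checking that the container you ended up with is something Transcribe accepts. A minimal sketch (the format list below is from the AWS docs at the time of writing; double-check it against the current documentation):

```python
from pathlib import Path

# Formats Amazon Transcribe accepts, per the AWS docs
# (verify against the current documentation before relying on this list)
SUPPORTED_FORMATS = {"amr", "flac", "m4a", "mp3", "mp4", "ogg", "wav", "webm"}


def transcribe_media_format(path: str) -> str:
    """Return the MediaFormat value for a file, or raise if unsupported."""
    ext = Path(path).suffix.lstrip(".").lower()
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(f"Transcribe does not accept .{ext} files")
    return ext


print(transcribe_media_format("output/audio.webm"))  # webm
```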
Step 2 — Create an S3 Bucket and Upload
BUCKET_NAME="recipe-transcribe-$(date +%s)"
aws s3 mb s3://$BUCKET_NAME --region us-east-1
aws s3 cp output/audio.webm s3://$BUCKET_NAME/audio.webm
Using date +%s as a suffix keeps the bucket name unique without any extra thinking.
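The same trick works in Python if you'd rather create the bucket from the script. A small sketch (the `assert` encodes S3's basic naming rules: lowercase letters, digits, hyphens, 3–63 characters):

```python
import re
import time


def unique_bucket_name(prefix: str = "recipe-transcribe") -> str:
    """Append a Unix timestamp so repeated runs never collide."""
    name = f"{prefix}-{int(time.time())}"
    # S3 bucket names: lowercase letters, digits, and hyphens, 3-63 chars
    assert re.fullmatch(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]", name), name
    return name


print(unique_bucket_name())
```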
Step 3 — Start the Transcribe Job
import boto3

BUCKET_NAME = "your-bucket-name"
JOB_NAME = "recipe-job-01"
REGION = "us-east-1"
MEDIA_URI = f"s3://{BUCKET_NAME}/audio.webm"

client = boto3.client("transcribe", region_name=REGION)

client.start_transcription_job(
    TranscriptionJobName=JOB_NAME,
    Media={"MediaFileUri": MEDIA_URI},
    MediaFormat="webm",
    LanguageCode="en-US",
    OutputBucketName=BUCKET_NAME,
    OutputKey="transcript.json",
)
Amazon Transcribe picks up the file from S3 and writes transcript.json back to the same bucket once done.
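One gotcha: job names must be unique within your account and can only contain letters, digits, hyphens, periods, and underscores, so a fixed name like recipe-job-01 fails on a second run. A small helper (hypothetical, not part of the original script) that derives a safe, unique name from any label:

```python
import re
import time


def safe_job_name(label: str) -> str:
    """Build a unique Transcribe job name from a free-form label."""
    # Replace anything outside Transcribe's allowed character set
    cleaned = re.sub(r"[^0-9a-zA-Z._-]+", "-", label).strip("-")
    # Timestamp suffix avoids "job name already exists" errors on reruns
    return f"{cleaned}-{int(time.time())}"[:200]


print(safe_job_name("Guntur Chicken Masala"))
```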
Step 4 — Poll the Job and Save the Recipe
import json
import time

import boto3

transcribe = boto3.client("transcribe", region_name=REGION)
s3 = boto3.client("s3", region_name=REGION)

while True:
    response = transcribe.get_transcription_job(TranscriptionJobName=JOB_NAME)
    status = response["TranscriptionJob"]["TranscriptionJobStatus"]
    print(f"Status: {status}")
    if status == "COMPLETED":
        break
    if status == "FAILED":
        raise RuntimeError("Job failed")
    time.sleep(15)

# Download the finished transcript and extract plain text
s3.download_file(BUCKET_NAME, "transcript.json", "output/transcript.json")

with open("output/transcript.json") as f:
    data = json.load(f)

text = data["results"]["transcripts"][0]["transcript"]

with open("output/recipe.txt", "w") as f:
    f.write(text)
The script checks every 15 seconds. For a 10-minute video, the job finished in about a minute.
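If you want the extraction step testable on its own, pull the JSON digging into a small function. The nesting below matches the shape of Transcribe's documented output:

```python
def extract_transcript(data: dict) -> str:
    """Pull the plain-text transcript out of Transcribe's JSON output."""
    return data["results"]["transcripts"][0]["transcript"]


# Minimal stand-in for a real transcript.json payload
sample = {"results": {"transcripts": [{"transcript": "Add two cups of flour."}]}}
print(extract_transcript(sample))  # Add two cups of flour.
```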
The Output
Here's what came out for a Guntur Chicken Masala video: readable, accurate, and ready to use in the kitchen.
IAM Permissions You Need
{
  "Effect": "Allow",
  "Action": [
    "s3:CreateBucket",
    "s3:PutObject",
    "s3:GetObject",
    "transcribe:StartTranscriptionJob",
    "transcribe:GetTranscriptionJob"
  ],
  "Resource": "*"
}
What I'd Build Next
- Trigger the whole pipeline on S3 upload via Lambda
- Process a full YouTube playlist at once
- Add speaker labels for videos with multiple hosts
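For the Lambda idea, the handler could look roughly like this. Everything here is a sketch under assumptions (the function and output key names are made up); the Transcribe client is passed in as a parameter so the job-starting logic stays testable without AWS:

```python
import time
from urllib.parse import unquote_plus


def start_job_for_upload(bucket: str, key: str, client) -> str:
    """Start a Transcribe job for a newly uploaded audio object."""
    job_name = f"recipe-{int(time.time())}"
    client.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={"MediaFileUri": f"s3://{bucket}/{key}"},
        MediaFormat=key.rsplit(".", 1)[-1],  # assumes a supported extension
        LanguageCode="en-US",
        OutputBucketName=bucket,
        OutputKey=f"transcripts/{job_name}.json",
    )
    return job_name


def handler(event, context):
    """S3-triggered Lambda entry point."""
    import boto3  # imported lazily so the core logic has no AWS dependency

    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = unquote_plus(record["object"]["key"])  # S3 event keys are URL-encoded
    return start_job_for_upload(bucket, key, boto3.client("transcribe"))
```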
The full code is on GitHub: https://github.com/robindeva/Extracting-a-Recipe
