Cooking videos are great, but following along in the kitchen is a pain. You're elbow-deep in dough and suddenly need to rewind for that one ingredient you missed.
So I built a small pipeline that takes any YouTube cooking video, pulls the audio, sends it to Amazon Transcribe, and gives me a clean text file of the entire recipe.
No paid tools. No complex setup. Just AWS services and a few Python scripts.
What the Pipeline Does
YouTube Video
↓
Download Audio (yt-dlp)
↓
Upload to S3
↓
Amazon Transcribe
↓
recipe.txt
Four steps. That's it.
Step 1 — Download the Audio
I used yt-dlp to pull just the audio from the video. No need to download the full video.
yt-dlp \
  --extract-audio \
  --audio-quality 0 \
  --output "output/audio.%(ext)s" \
  "https://youtu.be/YOUR_VIDEO_ID"
One thing I ran into: ffmpeg was not installed on my machine, so the mp3 conversion failed. But Amazon Transcribe supports the WebM format natively, so I skipped the conversion entirely and uploaded the raw .webm file. Saved time.
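Before uploading, it's worth checking that the container you ended up with is something Transcribe accepts. A minimal sketch (the format list below is from the AWS docs at the time of writing; double-check it against the current documentation):

```python
from pathlib import Path

# Formats Amazon Transcribe accepts, per the AWS docs
# (verify against the current documentation before relying on this list)
SUPPORTED_FORMATS = {"amr", "flac", "m4a", "mp3", "mp4", "ogg", "wav", "webm"}


def transcribe_media_format(path: str) -> str:
    """Return the MediaFormat value for a file, or raise if unsupported."""
    ext = Path(path).suffix.lstrip(".").lower()
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(f"Transcribe does not accept .{ext} files")
    return ext


print(transcribe_media_format("output/audio.webm"))  # webm
```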
Step 2 — Create an S3 Bucket and Upload
BUCKET_NAME="recipe-transcribe-$(date +%s)"
aws s3 mb s3://$BUCKET_NAME --region us-east-1
aws s3 cp output/audio.webm s3://$BUCKET_NAME/audio.webm
Using date +%s as a suffix keeps the bucket name unique without any extra thinking.
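The same trick works in Python if you'd rather create the bucket from the script. A small sketch (the `assert` encodes S3's basic naming rules: lowercase letters, digits, hyphens, 3–63 characters):

```python
import re
import time


def unique_bucket_name(prefix: str = "recipe-transcribe") -> str:
    """Append a Unix timestamp so repeated runs never collide."""
    name = f"{prefix}-{int(time.time())}"
    # S3 bucket names: lowercase letters, digits, and hyphens, 3-63 chars
    assert re.fullmatch(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]", name), name
    return name


print(unique_bucket_name())
```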
Step 3 — Start the Transcribe Job
import boto3

BUCKET_NAME = "your-bucket-name"
JOB_NAME = "recipe-job-01"
REGION = "us-east-1"
MEDIA_URI = f"s3://{BUCKET_NAME}/audio.webm"

client = boto3.client("transcribe", region_name=REGION)

client.start_transcription_job(
    TranscriptionJobName=JOB_NAME,
    Media={"MediaFileUri": MEDIA_URI},
    MediaFormat="webm",
    LanguageCode="en-US",
    OutputBucketName=BUCKET_NAME,
    OutputKey="transcript.json",
)
Amazon Transcribe picks up the file from S3 and writes transcript.json back to the same bucket once done.
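One gotcha: job names must be unique within your account and can only contain letters, digits, hyphens, periods, and underscores, so a fixed name like recipe-job-01 fails on a second run. A small helper (hypothetical, not part of the original script) that derives a safe, unique name from any label:

```python
import re
import time


def safe_job_name(label: str) -> str:
    """Build a unique Transcribe job name from a free-form label."""
    # Replace anything outside Transcribe's allowed character set
    cleaned = re.sub(r"[^0-9a-zA-Z._-]+", "-", label).strip("-")
    # Timestamp suffix avoids "job name already exists" errors on reruns
    return f"{cleaned}-{int(time.time())}"[:200]


print(safe_job_name("Guntur Chicken Masala"))
```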
Step 4 — Poll the Job and Save the Recipe
import json
import time

import boto3

transcribe = boto3.client("transcribe", region_name=REGION)
s3 = boto3.client("s3", region_name=REGION)

while True:
    response = transcribe.get_transcription_job(TranscriptionJobName=JOB_NAME)
    status = response["TranscriptionJob"]["TranscriptionJobStatus"]
    print(f"Status: {status}")
    if status == "COMPLETED":
        break
    if status == "FAILED":
        raise RuntimeError("Job failed")
    time.sleep(15)

# Download the finished transcript and extract plain text
s3.download_file(BUCKET_NAME, "transcript.json", "output/transcript.json")

with open("output/transcript.json") as f:
    data = json.load(f)

text = data["results"]["transcripts"][0]["transcript"]

with open("output/recipe.txt", "w") as f:
    f.write(text)
The script checks every 15 seconds. For a 10-minute video, the job finished in about a minute.
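If you want the extraction step testable on its own, pull the JSON digging into a small function. The nesting below matches the shape of Transcribe's documented output:

```python
def extract_transcript(data: dict) -> str:
    """Pull the plain-text transcript out of Transcribe's JSON output."""
    return data["results"]["transcripts"][0]["transcript"]


# Minimal stand-in for a real transcript.json payload
sample = {"results": {"transcripts": [{"transcript": "Add two cups of flour."}]}}
print(extract_transcript(sample))  # Add two cups of flour.
```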
The Output
Here's what came out for a Guntur Chicken Masala video: readable, accurate, and ready to use in the kitchen.
IAM Permissions You Need
{
  "Effect": "Allow",
  "Action": [
    "s3:CreateBucket",
    "s3:PutObject",
    "s3:GetObject",
    "transcribe:StartTranscriptionJob",
    "transcribe:GetTranscriptionJob"
  ],
  "Resource": "*"
}
What I'd Build Next
- Trigger the whole pipeline on S3 upload via Lambda
- Process a full YouTube playlist at once
- Add speaker labels for videos with multiple hosts
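For the Lambda idea, the handler could look roughly like this. Everything here is a sketch under assumptions (the function and output key names are made up); the Transcribe client is passed in as a parameter so the job-starting logic stays testable without AWS:

```python
import time
from urllib.parse import unquote_plus


def start_job_for_upload(bucket: str, key: str, client) -> str:
    """Start a Transcribe job for a newly uploaded audio object."""
    job_name = f"recipe-{int(time.time())}"
    client.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={"MediaFileUri": f"s3://{bucket}/{key}"},
        MediaFormat=key.rsplit(".", 1)[-1],  # assumes a supported extension
        LanguageCode="en-US",
        OutputBucketName=bucket,
        OutputKey=f"transcripts/{job_name}.json",
    )
    return job_name


def handler(event, context):
    """S3-triggered Lambda entry point."""
    import boto3  # imported lazily so the core logic has no AWS dependency

    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = unquote_plus(record["object"]["key"])  # S3 event keys are URL-encoded
    return start_job_for_upload(bucket, key, boto3.client("transcribe"))
```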
The full code is on GitHub: https://github.com/robindeva/Extracting-a-Recipe
