Building an AI-Powered YouTube Publisher with n8n, OpenAI Whisper, and Google Drive
You've got audio files sitting in Google Drive. You need them transcribed, analyzed, and published to YouTube with SEO-optimized metadata. Here's how to architect an automation that handles the entire pipeline using n8n, OpenAI's Whisper API, and the YouTube Data API.
Architecture Overview
This integration chains five API calls across three services (Google Drive, OpenAI, and YouTube):
1. Google Drive API → Retrieve audio/video files
2. OpenAI Whisper API → Transcribe audio to text
3. OpenAI GPT-4.1-mini → Generate metadata from transcript
4. Google Drive API → Download video file
5. YouTube Data API v3 → Upload with generated metadata
Why this stack? Google Drive provides centralized file storage with robust search capabilities. Whisper offers high-accuracy transcription across languages. GPT-4.1-mini balances quality and cost for metadata generation. The YouTube API handles programmatic uploads with scheduling.
Alternative considered: Zapier's YouTube integration lacks structured output parsing for AI-generated content. n8n's JSON schema validation ensures consistent metadata formatting.
API Integration Deep-Dive
Google Drive API: File Search and Download
Authentication: OAuth2 via Google Cloud Console. Create credentials at console.cloud.google.com, enable Google Drive API, configure OAuth consent screen.
Search Request:
GET https://www.googleapis.com/drive/v3/files
Headers: { "Authorization": "Bearer {access_token}" }
Query Parameters: {
"q": "'{folder_id}' in parents",
"fields": "files(id, name, mimeType)"
}
Response Structure:
{
"files": [
{
"id": "1AbC2DeF3GhI4JkL5MnO6PqR7StU8VwX9YzA",
"name": "episode-42.mp3",
"mimeType": "audio/mpeg"
}
]
}
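As a rough illustration of the search call above outside n8n, the sketch below builds the request URL and filters the response down to audio files. The helper names `build_search_url` and `audio_files` are hypothetical, and the filter on `audio/` MIME types is an assumption about which files you want to process.

```python
from urllib.parse import urlencode

DRIVE_FILES_URL = "https://www.googleapis.com/drive/v3/files"


def build_search_url(folder_id: str) -> str:
    """Build the files.list URL for everything directly inside one folder."""
    params = {
        "q": f"'{folder_id}' in parents",
        "fields": "files(id, name, mimeType)",
    }
    # urlencode percent-escapes the quotes, commas, and parentheses.
    return f"{DRIVE_FILES_URL}?{urlencode(params)}"


def audio_files(response: dict) -> list:
    """Keep only entries whose MIME type starts with audio/ (assumed filter)."""
    return [
        f for f in response.get("files", [])
        if f.get("mimeType", "").startswith("audio/")
    ]
```

The URL still needs the `Authorization: Bearer {access_token}` header when it is actually sent.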
n8n Configuration:
- Resource: File/Folder
- Operation: Search
- Filter by Folder ID (found in Drive URL)
- Return All: Enabled
Download Request:
GET https://www.googleapis.com/drive/v3/files/{fileId}?alt=media
Headers: { "Authorization": "Bearer {access_token}" }
Returns binary data stream. n8n stores this in the data binary field.
Rate Limits: 1,000 queries per 100 seconds per user. Edge case: handle 403 userRateLimitExceeded (and 429 Too Many Requests) responses by retrying with exponential backoff.
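If you call the Drive API outside n8n, exponential backoff can be sketched roughly as follows. `RateLimitError` is a hypothetical stand-in for however your HTTP client surfaces a rate-limit response, and the retry count and base delay are assumed values, not Google's recommendations.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a rate-limit response from the API."""


def with_backoff(call, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Run call(), retrying on RateLimitError with exponentially growing waits."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries, surface the error
            # Wait 1s, 2s, 4s, ... plus up to 1s of random jitter.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
```

The `sleep` parameter is injectable so the retry logic can be exercised without real waiting.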
OpenAI Whisper API: Audio Transcription
Authentication: API key from platform.openai.com. Add to request header as Authorization: Bearer {api_key}.
Request Format:
POST https://api.openai.com/v1/audio/transcriptions
Headers: {
"Authorization": "Bearer {api_key}",
"Content-Type": "multipart/form-data"
}
Body (form-data): {
"file": <binary_audio_data>,
"model": "whisper-1"
}
Response:
{
"text": "Welcome to episode 42 where we discuss API integration patterns..."
}
Critical Parameters:
- File size limit: 25 MB
- Supported formats: mp3, mp4, mpeg, mpga, m4a, wav, webm
- Cost: $0.006 per minute
n8n Configuration:
- Resource: Audio
- Operation: Transcribe a Recording
- Input Data Field Name: data
Error Handling: 413 Payload Too Large → compress audio or split files. Missing binary data → verify Google Drive download node output.
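One way to catch the 413 case before the request is sent is a pre-flight check on the file. This is a minimal sketch: `check_audio` is a hypothetical helper, and treating 25 MB as 25 × 1024 × 1024 bytes is an assumption.

```python
import math

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # 25 MB limit (binary megabytes assumed)
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}


def check_audio(name: str, size_bytes: int) -> list:
    """Return a list of problems; an empty list means the file looks sendable."""
    problems = []
    ext = name.rsplit(".", 1)[-1].lower() if "." in name else ""
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or 'none'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        # Lower bound on how many pieces are needed if the file is split evenly.
        parts = math.ceil(size_bytes / MAX_UPLOAD_BYTES)
        problems.append(f"too large: split into at least {parts} parts")
    return problems
```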
OpenAI GPT-4.1-mini: Structured Metadata Generation
Request Structure:
POST https://api.openai.com/v1/chat/completions
Headers: {
"Authorization": "Bearer {api_key}",
"Content-Type": "application/json"
}
Body: {
"model": "gpt-4.1-mini",
"messages": [
{
"role": "system",
"content": "You are a YouTube SEO expert. Generate title, description, and tags."
},
{
"role": "user",
"content": "{transcript_text}"
}
],
"response_format": { "type": "json_schema", "json_schema": {...} }
}
JSON Schema for Structured Output:
{
"type": "object",
"properties": {
"title": { "type": "string" },
"description": { "type": "string" },
"tags": { "type": "array", "items": { "type": "string" } }
},
"required": ["title", "description", "tags"]
}
Response:
{
"choices": [
{
"message": {
"content": "{\"title\":\"How to Integrate APIs with n8n\",\"description\":\"Learn API integration...\",\"tags\":[\"api\",\"n8n\",\"automation\"]}"
}
}
]
}
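Note that `content` in the response above is a JSON string, not a nested object, so it has to be parsed before the fields can be used. A minimal sketch of that step, with a hypothetical `parse_metadata` helper that also checks the keys the schema marks as required:

```python
import json

REQUIRED_KEYS = ("title", "description", "tags")


def parse_metadata(response: dict) -> dict:
    """Extract and validate the metadata object from a chat completion response."""
    content = response["choices"][0]["message"]["content"]
    metadata = json.loads(content)  # content arrives as a JSON-encoded string
    missing = [key for key in REQUIRED_KEYS if key not in metadata]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    if not isinstance(metadata["tags"], list):
        raise ValueError("tags must be a list")
    return metadata
```

In n8n the JSON Schema output parser does this for you; the sketch only matters for manual API calls.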
n8n AI Agent Setup:
- Model: gpt-4.1-mini
- Output Parser: JSON Schema
- Auto-Fix Format: Enabled
Cost Optimization: GPT-4.1-mini costs roughly $0.15 per 1M input tokens and $0.60 per 1M output tokens. Typical metadata generation uses 500-1000 tokens per video.
YouTube Data API v3: Video Upload
Authentication: OAuth2 with youtube.upload scope. Create credentials in Google Cloud Console, enable YouTube Data API v3.
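Upload Request (resumable upload). This first call opens an upload session and carries only the metadata; the API replies with a Location header holding the session URI, and the video bytes are then sent to that URI in a second PUT request: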
POST https://www.googleapis.com/upload/youtube/v3/videos?uploadType=resumable&part=snippet,status
Headers: {
"Authorization": "Bearer {access_token}",
"Content-Type": "application/json"
}
Body: {
"snippet": {
"title": "{ai_generated_title}",
"description": "{ai_generated_description}",
"tags": ["{tag1}", "{tag2}"],
"categoryId": "28"
},
"status": {
"privacyStatus": "private",
"publishAt": "2025-06-20T18:00:00Z"
}
}
Response:
{
"id": "dQw4w9WgXcQ",
"snippet": {
"title": "How to Integrate APIs with n8n",
"publishedAt": "2025-06-20T18:00:00Z"
}
}
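For manual calls, the metadata request that opens a resumable upload session could be assembled like this. It is a sketch only: `build_session_request` is a hypothetical helper, the request is built but never sent, and the follow-up step that uploads the video bytes to the returned session URI is not shown.

```python
import json
import urllib.request

UPLOAD_URL = (
    "https://www.googleapis.com/upload/youtube/v3/videos"
    "?uploadType=resumable&part=snippet,status"
)


def build_session_request(access_token: str, metadata: dict, publish_at: str):
    """Build (but do not send) the POST that starts a resumable upload."""
    body = {
        "snippet": {
            "title": metadata["title"],
            "description": metadata["description"],
            "tags": metadata["tags"],
            "categoryId": "28",  # Science & Technology, as in the request above
        },
        "status": {"privacyStatus": "private", "publishAt": publish_at},
    }
    return urllib.request.Request(
        UPLOAD_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```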
n8n Configuration:
- Resource: Video
- Operation: Upload
- Title: {{ $('AI Agent').item.json.output.title }}
- Description: {{ $('AI Agent').item.json.output.description }}
- Privacy Status: private
- Publish At: ISO 8601 datetime string
Rate Limits: 10,000 quota units per day. One upload = 1,600 units. Handle 403 quotaExceeded by queueing uploads across days.
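The quota arithmetic works out to six uploads per day (10,000 // 1,600), with 400 units left over. A small sketch of queueing a backlog across days under that budget, using hypothetical helper names:

```python
DAILY_QUOTA_UNITS = 10_000  # daily quota quoted above
UPLOAD_COST_UNITS = 1_600   # cost of one upload quoted above


def uploads_per_day(quota: int = DAILY_QUOTA_UNITS, cost: int = UPLOAD_COST_UNITS) -> int:
    """How many whole uploads fit in one day's quota."""
    return quota // cost


def queue_by_day(video_ids: list, per_day: int) -> dict:
    """Map day offset (0 = today) to the video IDs to upload that day."""
    schedule = {}
    for index, video_id in enumerate(video_ids):
        schedule.setdefault(index // per_day, []).append(video_id)
    return schedule
```

This assumes uploads are the only calls spending quota; any other YouTube API calls in the same project reduce the daily count.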
Implementation Gotchas
Missing Transcript Data: If Whisper returns empty text (silence detection), the AI agent receives no input. Add conditional logic: {{ $json.text ? $json.text : 'No audio detected' }}.
OAuth Token Expiration: Google OAuth access tokens expire after 1 hour. n8n's credential system refreshes them automatically, but manual API calls need their own refresh-token handling.
YouTube Category IDs: Category 28 = Science & Technology. An invalid category ID causes the upload to be rejected. Validate against YouTube's category list before deployment.
Binary Data Size: Large video files (>2GB) can time out. Set n8n's EXECUTIONS_TIMEOUT environment variable to 3600 seconds for long uploads.
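Scheduled Publish Failures: publishAt only takes effect when privacyStatus is private; it must be at least 6 hours in the future and use ISO 8601 format with a timezone. In n8n expressions, Luxon DateTime values such as $now need .toISO(), while native JavaScript Date objects need .toISOString().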
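A pre-upload check for the scheduling rules above might look like the following. `publish_at` is a hypothetical helper, and the 6-hour minimum is taken from this article rather than verified against the API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MIN_LEAD = timedelta(hours=6)  # lead time stated in this article (assumption)


def publish_at(when: datetime, now: Optional[datetime] = None) -> str:
    """Validate a publish time and format it as ISO 8601 in UTC."""
    if when.tzinfo is None:
        raise ValueError("publish time needs a timezone")
    now = now or datetime.now(timezone.utc)
    if when - now < MIN_LEAD:
        raise ValueError("publish time must be at least 6 hours ahead")
    # Normalize to UTC and use the trailing-Z form shown in the request body.
    return when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```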
AI Hallucination: GPT-4.1-mini occasionally generates tags unrelated to content. Add validation: check if tags exist in transcript text before uploading.
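The tag check can be approximated with simple word matching. This is a heuristic sketch, not a complete solution: `grounded_tags` is a hypothetical helper, it keeps a tag only if every word in it appears somewhere in the transcript, and it will drop legitimate tags that paraphrase the content.

```python
import re

WORD = re.compile(r"[a-z0-9]+")


def grounded_tags(tags: list, transcript: str) -> list:
    """Keep tags whose words all occur in the transcript (case-insensitive)."""
    transcript_words = set(WORD.findall(transcript.lower()))
    kept = []
    for tag in tags:
        tag_words = WORD.findall(tag.lower())
        if tag_words and all(word in transcript_words for word in tag_words):
            kept.append(tag)
    return kept
```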
Prerequisites
Required Accounts:
- n8n instance (self-hosted or cloud)
- OpenAI API account with credits
- Google Cloud project with Drive + YouTube APIs enabled
- YouTube channel with upload permissions
API Credentials Needed:
- OpenAI API key: platform.openai.com/api-keys
- Google OAuth2 credentials: console.cloud.google.com/apis/credentials
- YouTube OAuth consent configured with youtube.upload scope
Estimated Costs:
- Whisper: $0.006/minute audio
- GPT-4.1-mini: $0.15/1M input tokens, $0.60/1M output tokens
- Per video: ~$0.10-0.50 depending on audio length
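To see where a given video lands in that range, the estimate can be computed directly from the rates listed above. The helper name and the example token counts are assumptions, and the rates are the figures quoted in this article, so check them against current pricing before relying on the result.

```python
WHISPER_PER_MINUTE = 0.006      # USD, rate quoted above
GPT_INPUT_PER_MILLION = 0.15    # USD per 1M input tokens, rate quoted above
GPT_OUTPUT_PER_MILLION = 0.60   # USD per 1M output tokens, rate quoted above


def estimate_cost(audio_minutes: float, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of transcribing and tagging one video."""
    whisper = audio_minutes * WHISPER_PER_MINUTE
    gpt = (
        input_tokens / 1_000_000 * GPT_INPUT_PER_MILLION
        + output_tokens / 1_000_000 * GPT_OUTPUT_PER_MILLION
    )
    return round(whisper + gpt, 4)
```

For a 30-minute episode, transcription accounts for $0.18 of the total, so audio length is the main cost driver and the metadata call adds comparatively little.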
Official Documentation:
- Google Drive API: developers.google.com/drive/api/v3/reference
- OpenAI Whisper: platform.openai.com/docs/guides/speech-to-text
- YouTube Data API: developers.google.com/youtube/v3/docs
Get the Complete Workflow Configuration
This tutorial covers the API integration architecture and critical parameters. For the complete n8n workflow JSON file with all node configurations, system prompts, and error handling logic, check out the full implementation guide.