DEV Community

Hackceleration

Originally published at hackceleration.com

Building an AI-Powered YouTube Publisher with n8n, OpenAI Whisper, and Google Drive


You've got audio files sitting in Google Drive. You need them transcribed, analyzed, and published to YouTube with SEO-optimized metadata. Here's how to architect an automation that handles the entire pipeline using n8n, OpenAI's Whisper API, and the YouTube Data API.

Architecture Overview

This integration chains five API calls across three services (Google Drive, OpenAI, YouTube):

1. Google Drive API → Retrieve audio/video files
2. OpenAI Whisper API → Transcribe audio to text
3. OpenAI GPT-4.1-mini → Generate metadata from transcript
4. Google Drive API → Download video file
5. YouTube Data API v3 → Upload with generated metadata

Why this stack? Google Drive provides centralized file storage with robust search capabilities. Whisper offers high-accuracy transcription across languages. GPT-4.1-mini balances quality and cost for metadata generation. The YouTube API handles programmatic uploads with scheduling.

Alternative considered: Zapier's YouTube integration lacks structured output parsing for AI-generated content. n8n's JSON schema validation ensures consistent metadata formatting.

API Integration Deep-Dive

Google Drive API: File Search and Download

Authentication: OAuth2 via Google Cloud Console. At console.cloud.google.com, enable the Google Drive API, configure the OAuth consent screen, then create OAuth client credentials.

Search Request:

GET https://www.googleapis.com/drive/v3/files
Headers: { "Authorization": "Bearer {access_token}" }
Query Parameters: {
  "q": "'{folder_id}' in parents",
  "fields": "files(id, name, mimeType)"
}

Response Structure:

{
  "files": [
    {
      "id": "1AbC2DeF3GhI4JkL5MnO6PqR7StU8VwX9YzA",
      "name": "episode-42.mp3",
      "mimeType": "audio/mpeg"
    }
  ]
}

n8n Configuration:

  • Resource: File/Folder
  • Operation: Search
  • Filter by Folder ID (found in Drive URL)
  • Return All: Enabled
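With Return All enabled, n8n follows Drive's pagination for you. When calling files.list directly, you have to loop on nextPageToken yourself. The sketch below assumes a valid OAuth access token; list_folder_files and http_get_json are hypothetical helper names, the "trashed = false" filter is an addition that skips deleted files, and fetch_json is injectable so the paging logic can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

DRIVE_FILES_URL = "https://www.googleapis.com/drive/v3/files"


def http_get_json(url: str, access_token: str) -> dict:
    """Hypothetical helper: authenticated GET that decodes a JSON body."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {access_token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def list_folder_files(
    folder_id: str,
    access_token: str,
    fetch_json: Callable[[str, str], dict] = http_get_json,
) -> Iterator[dict]:
    """Yield every non-trashed file in a folder, following nextPageToken."""
    params = {
        "q": f"'{folder_id}' in parents and trashed = false",
        # With a `fields` mask, nextPageToken is only returned if listed here.
        "fields": "nextPageToken, files(id, name, mimeType)",
        "pageSize": "100",
    }
    while True:
        url = f"{DRIVE_FILES_URL}?{urllib.parse.urlencode(params)}"
        page = fetch_json(url, access_token)
        yield from page.get("files", [])
        token = page.get("nextPageToken")
        if not token:
            break
        params["pageToken"] = token  # request the next page on the following loop
```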

Download Request:

GET https://www.googleapis.com/drive/v3/files/{fileId}?alt=media
Headers: { "Authorization": "Bearer {access_token}" }

Returns the file contents as a binary stream. n8n stores them in the binary property named data, which the Whisper node reads later.
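Outside n8n, the same alt=media request can be streamed straight to disk so a large file never sits fully in memory. A sketch assuming a valid access token; build_download_request and download_drive_file are hypothetical names.

```python
import shutil
import urllib.parse
import urllib.request


def build_download_request(file_id: str, access_token: str) -> urllib.request.Request:
    """Hypothetical helper: files.get with alt=media returns the raw bytes."""
    url = (
        "https://www.googleapis.com/drive/v3/files/"
        f"{urllib.parse.quote(file_id, safe='')}?alt=media"
    )
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {access_token}"})


def download_drive_file(file_id: str, access_token: str, dest_path: str) -> None:
    """Stream the file to dest_path in 1 MiB chunks instead of one big read."""
    req = build_download_request(file_id, access_token)
    with urllib.request.urlopen(req) as resp, open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out, length=1024 * 1024)
```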

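Rate Limits: Drive enforces per-user query quotas (the figure usually quoted is 1,000 queries per 100 seconds per user; check the Quotas page of your Cloud project for current values). When a limit is hit, the API responds with 403 userRateLimitExceeded or 429; retry those with exponential backoff.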
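A library-agnostic sketch of exponential backoff with random jitter. RateLimitError is a hypothetical exception your HTTP layer would raise for 403 userRateLimitExceeded or 429 responses; the retry count and delay cap are illustrative choices, not values from the Drive documentation.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical: raise for 403 userRateLimitExceeded / 429 responses."""


def with_backoff(
    call: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 32.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on RateLimitError, doubling the wait each time (capped)."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error
            delay = min(base_delay * 2 ** attempt, max_delay)
            # Up to 1 s of random jitter keeps parallel workers from retrying in lockstep.
            sleep(delay + random.uniform(0, 1))
    raise AssertionError("unreachable")
```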

OpenAI Whisper API: Audio Transcription

Authentication: API key from platform.openai.com. Add to request header as Authorization: Bearer {api_key}.

Request Format:

POST https://api.openai.com/v1/audio/transcriptions
Headers: {
  "Authorization": "Bearer {api_key}",
  "Content-Type": "multipart/form-data"
}
Body (form-data): {
  "file": <binary_audio_data>,
  "model": "whisper-1"
}

Response:

{
  "text": "Welcome to episode 42 where we discuss API integration patterns..."
}
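When calling the endpoint yourself rather than through n8n, note that a multipart Content-Type header must carry the boundary string that separates the parts, so in practice it is generated by the HTTP client rather than typed by hand. This stdlib-only sketch builds the body explicitly; build_transcription_request and transcribe are hypothetical names, and the file part's MIME type is guessed from the filename.

```python
import json
import mimetypes
import os
import urllib.request
import uuid

WHISPER_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_transcription_request(audio: bytes, filename: str, api_key: str) -> urllib.request.Request:
    """Hypothetical helper: encode `model` and `file` as multipart/form-data."""
    boundary = uuid.uuid4().hex  # random separator, vanishingly unlikely in the audio bytes
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    body = b"".join([
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="model"\r\n\r\n'
         "whisper-1\r\n").encode(),
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         f"Content-Type: {file_type}\r\n\r\n").encode(),
        audio,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        WHISPER_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            # The boundary parameter tells the server where each part ends.
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(path: str, api_key: str) -> str:
    """Upload one audio file and return the `text` field of the response."""
    with open(path, "rb") as f:
        req = build_transcription_request(f.read(), os.path.basename(path), api_key)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["text"]
```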

Critical Parameters:

  • File size limit: 25 MB
  • Supported formats: mp3, mp4, mpeg, mpga, m4a, wav, webm
  • Cost: $0.006 per minute
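Checking a file against these limits before uploading avoids spending a request on a guaranteed rejection. A small sketch; whisper_preflight is a hypothetical helper, and reading the 25 MB limit as 25 MiB is an assumption, so leave some headroom.

```python
import os

# Assumption: the 25 MB limit is treated as 25 MiB (25 * 1024 * 1024 bytes).
MAX_WHISPER_BYTES = 25 * 1024 * 1024
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}


def whisper_preflight(filename: str, size_bytes: int) -> list[str]:
    """Hypothetical helper: return problems; an empty list means send as-is."""
    problems = []
    ext = os.path.splitext(filename)[1].lower().lstrip(".")
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: .{ext or '?'}")
    if size_bytes > MAX_WHISPER_BYTES:
        over_mib = (size_bytes - MAX_WHISPER_BYTES) / (1024 * 1024)
        problems.append(f"too large by {over_mib:.1f} MiB: compress or split first")
    return problems
```

In n8n, the same checks fit in a Code or IF node placed between the Google Drive download and the transcription step.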

n8n Configuration:

  • Resource: Audio
  • Operation: Transcribe a Recording
  • Input Data Field Name: data

Error Handling: 413 Payload Too Large → compress audio or split files. Missing binary data → verify Google Drive download node output.

OpenAI GPT-4.1-mini: Structured Metadata Generation

Request Structure:

POST https://api.openai.com/v1/chat/completions
Headers: {
  "Authorization": "Bearer {api_key}",
  "Content-Type": "application/json"
}
Body: {
  "model": "gpt-4.1-mini",
  "messages": [
    {
      "role": "system",
      "content": "You are a YouTube SEO expert. Generate title, description, and tags."
    },
    {
      "role": "user",
      "content": "{transcript_text}"
    }
  ],
  "response_format": { "type": "json_schema", "json_schema": {...} }
}

JSON Schema for Structured Output:

{
  "type": "object",
  "properties": {
    "title": { "type": "string" },
    "description": { "type": "string" },
    "tags": { "type": "array", "items": { "type": "string" } }
  },
  "required": ["title", "description", "tags"]
}
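In the raw Chat Completions API the schema is not passed bare: response_format.json_schema wraps it together with a name and, for strict mode, strict: true. Strict mode also requires "additionalProperties": false on every object and every property listed in required. A sketch of the full body under those rules; the schema name video_metadata and build_metadata_request are hypothetical.

```python
import json

METADATA_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "description": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title", "description", "tags"],
    "additionalProperties": False,  # required by strict mode
}


def build_metadata_request(transcript: str) -> dict:
    """Hypothetical helper: Chat Completions body requesting schema-conforming JSON."""
    return {
        "model": "gpt-4.1-mini",
        "messages": [
            {
                "role": "system",
                "content": "You are a YouTube SEO expert. Generate title, description, and tags.",
            },
            {"role": "user", "content": transcript},
        ],
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "video_metadata",  # hypothetical schema name
                "strict": True,
                "schema": METADATA_SCHEMA,
            },
        },
    }


payload = json.dumps(build_metadata_request("Welcome to episode 42 ..."))
```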

Response:

{
  "choices": [
    {
      "message": {
        "content": "{\"title\":\"How to Integrate APIs with n8n\",\"description\":\"Learn API integration...\",\"tags\":[\"api\",\"n8n\",\"automation\"]}"
      }
    }
  ]
}
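message.content arrives as a JSON-encoded string, so it needs a second decode. n8n's output parser handles this; when calling the API directly, a defensive parse that also checks field types is cheap. A sketch with a hypothetical parse_metadata helper.

```python
import json


def parse_metadata(completion: dict) -> dict:
    """Hypothetical helper: decode choices[0].message.content and check types."""
    content = completion["choices"][0]["message"]["content"]
    data = json.loads(content)  # a string containing JSON, not an object
    if not isinstance(data.get("title"), str) or not isinstance(data.get("description"), str):
        raise ValueError("title and description must be strings")
    tags = data.get("tags")
    if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
        raise ValueError("tags must be a list of strings")
    return {"title": data["title"], "description": data["description"], "tags": tags}
```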

n8n AI Agent Setup:

  • Model: gpt-4.1-mini
  • Output Parser: JSON Schema
  • Auto-Fix Format: Enabled

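Cost Optimization: GPT-4.1-mini is priced at $0.40 per 1M input tokens and $1.60 per 1M output tokens. The generated metadata is only a few hundred output tokens; the input cost scales with transcript length, and this step stays cheap even for long episodes.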

YouTube Data API v3: Video Upload

Authentication: OAuth2 with youtube.upload scope. Create credentials in Google Cloud Console, enable YouTube Data API v3.

Upload Request (resumable upload, step 1: create the upload session with the metadata):

POST https://www.googleapis.com/upload/youtube/v3/videos?uploadType=resumable&part=snippet,status
Headers: {
  "Authorization": "Bearer {access_token}",
  "Content-Type": "application/json"
}
Body: {
  "snippet": {
    "title": "{ai_generated_title}",
    "description": "{ai_generated_description}",
    "tags": ["{tag1}", "{tag2}"],
    "categoryId": "28"
  },
  "status": {
    "privacyStatus": "private",
    "publishAt": "2025-06-20T18:00:00Z"
  }
}

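Response (returned by the final PUT that sends the video bytes to the session URI given in the Location header of the step 1 response):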

{
  "id": "dQw4w9WgXcQ",
  "snippet": {
    "title": "How to Integrate APIs with n8n",
    "publishedAt": "2025-06-20T18:00:00Z"
  }
}
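The resumable protocol is two requests: the metadata POST returns a session URI in its Location header, and the video bytes are then sent to that URI with a PUT. A stdlib sketch of both steps, assuming an access token with the youtube.upload scope; the helper names are hypothetical, the X-Upload-Content-* headers follow Google's resumable-upload documentation, and resuming an interrupted upload is omitted.

```python
import json
import os
import urllib.request

UPLOAD_URL = (
    "https://www.googleapis.com/upload/youtube/v3/videos"
    "?uploadType=resumable&part=snippet,status"
)


def build_session_request(metadata: dict, video_size: int, access_token: str) -> urllib.request.Request:
    """Step 1 (hypothetical helper): send snippet/status and announce the video."""
    return urllib.request.Request(
        UPLOAD_URL,
        data=json.dumps(metadata).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=UTF-8",
            "X-Upload-Content-Type": "video/*",
            "X-Upload-Content-Length": str(video_size),
        },
    )


def upload_video(path: str, metadata: dict, access_token: str) -> dict:
    """Step 2 (hypothetical helper): PUT the file to the session URI."""
    size = os.path.getsize(path)
    with urllib.request.urlopen(build_session_request(metadata, size, access_token)) as resp:
        session_uri = resp.headers["Location"]
    with open(path, "rb") as f:
        put = urllib.request.Request(
            session_uri,
            data=f,  # file object is read in blocks, so Content-Length is set explicitly
            method="PUT",
            headers={
                "Authorization": f"Bearer {access_token}",
                "Content-Type": "video/*",
                "Content-Length": str(size),
            },
        )
        with urllib.request.urlopen(put) as resp:
            return json.load(resp)  # the created video resource, including its id
```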

n8n Configuration:

  • Resource: Video
  • Operation: Upload
  • Title: {{ $('AI Agent').item.json.output.title }}
  • Description: {{ $('AI Agent').item.json.output.description }}
  • Privacy Status: private
  • Publish At: ISO 8601 datetime string

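Rate Limits: The default quota is 10,000 units per day and one upload costs 1,600 units, so a project can upload at most six videos per day (10,000 / 1,600 = 6.25). Handle 403 quotaExceeded by queueing the remaining uploads for the following days.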
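Spreading a backlog across days is simple integer arithmetic. A sketch that assigns each queued video a day offset, using the 10,000-unit and 1,600-unit figures as defaults; plan_upload_days and its reserve parameter (quota held back for other API calls) are hypothetical.

```python
def plan_upload_days(
    n_videos: int,
    daily_quota: int = 10_000,
    upload_cost: int = 1_600,
    reserve: int = 0,  # hypothetical: units kept for non-upload calls
) -> list[int]:
    """Return a day offset (0 = today) for each video so no day exceeds quota."""
    per_day = (daily_quota - reserve) // upload_cost
    if per_day < 1:
        raise ValueError("quota too small for even one upload per day")
    return [i // per_day for i in range(n_videos)]
```

With the defaults, the seventh and eighth videos land on day 1. YouTube's documentation states that the daily quota resets at midnight Pacific Time, which is the natural boundary for these offsets.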

Implementation Gotchas

Missing Transcript Data: If Whisper returns empty text (for example, for a silent recording), the AI agent receives no input. Add conditional logic: {{ $json.text ? $json.text : 'No audio detected' }}.
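Instead of sending a placeholder string to the model, you can route empty transcripts away from the AI step entirely (in n8n, an IF node on {{ $json.text }}). The same check as a plain function; has_usable_transcript is hypothetical and the minimum word count is an arbitrary assumption, chosen because near-silent audio can come back as a few stray words rather than an empty string.

```python
from typing import Optional


def has_usable_transcript(text: Optional[str], min_words: int = 5) -> bool:
    """Hypothetical helper: True if there is enough text to generate metadata."""
    # Assumption: fewer than `min_words` words means silence or noise.
    return text is not None and len(text.split()) >= min_words
```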

OAuth Token Expiration: Google OAuth access tokens expire after 1 hour (expires_in is 3600 seconds). n8n's credential system refreshes them automatically, but manual API calls that carry a raw access token must exchange the refresh token themselves.
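A new access token comes from POSTing the stored refresh token to Google's OAuth 2.0 token endpoint with grant_type=refresh_token. A sketch assuming you already hold a refresh token plus the OAuth client's ID and secret; build_refresh_request and refresh_access_token are hypothetical names.

```python
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://oauth2.googleapis.com/token"


def build_refresh_request(client_id: str, client_secret: str, refresh_token: str) -> urllib.request.Request:
    """Hypothetical helper: form-encode a refresh_token grant."""
    body = urllib.parse.urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
        "grant_type": "refresh_token",
    }).encode()
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def refresh_access_token(client_id: str, client_secret: str, refresh_token: str) -> tuple[str, int]:
    """Return (access_token, expires_in_seconds) from the token endpoint."""
    req = build_refresh_request(client_id, client_secret, refresh_token)
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    return payload["access_token"], payload["expires_in"]
```

Refreshing slightly before expires_in elapses avoids a failed request mid-upload.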

YouTube Category IDs: Category 28 = Science & Technology. An invalid or non-assignable category ID causes the upload to be rejected. Validate against YouTube's category list (videoCategories.list) before deployment.
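videoCategories.list returns the categories for a region, and only entries whose snippet.assignable is true can be set on a video. A sketch that validates an ID against such a response; fetching it (for example GET https://www.googleapis.com/youtube/v3/videoCategories?part=snippet&regionCode=US) is left to your HTTP layer, and is_assignable_category is hypothetical.

```python
def is_assignable_category(category_id: str, categories_response: dict) -> bool:
    """Hypothetical helper: True if the ID is listed and marked assignable."""
    for item in categories_response.get("items", []):
        if item.get("id") == category_id:
            return bool(item.get("snippet", {}).get("assignable", False))
    return False  # unknown IDs are treated as invalid
```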

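Binary Data Size: Uploads of large video files (>2GB) can run for a long time. If your instance enforces an execution timeout through the EXECUTIONS_TIMEOUT environment variable (in seconds; the default of -1 disables it), raise it to cover long uploads, for example 3600 seconds.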

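Scheduled Publish Failures: publishAt must be at least 6 hours in the future and use ISO 8601 format with a timezone (e.g. 2025-06-20T18:00:00Z). n8n's $now is a Luxon DateTime, so convert it with .toISO() (for example {{ $now.plus({ days: 1 }).toISO() }}); a native JavaScript Date needs .toISOString() instead.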
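Generating the timestamp in UTC sidesteps timezone ambiguity. A sketch that builds a publishAt value a given number of hours ahead; publish_at is a hypothetical helper, and the 6-hour minimum is enforced here as an assumption carried over from the gotcha above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MIN_LEAD = timedelta(hours=6)  # assumption: minimum lead time from the note above


def publish_at(hours_ahead: float, now: Optional[datetime] = None) -> str:
    """Hypothetical helper: ISO 8601 UTC timestamp like 2025-06-20T18:00:00Z."""
    lead = timedelta(hours=hours_ahead)
    if lead < MIN_LEAD:
        raise ValueError("publishAt should be at least 6 hours in the future")
    now = now or datetime.now(timezone.utc)  # pass timezone-aware datetimes only
    when = (now + lead).astimezone(timezone.utc).replace(microsecond=0)
    return when.strftime("%Y-%m-%dT%H:%M:%SZ")
```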

AI Hallucination: GPT-4.1-mini occasionally generates tags unrelated to content. Add validation: check if tags exist in transcript text before uploading.
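One way to implement that check is to keep a tag only if every word in it appears somewhere in the transcript. Word-level, case-insensitive matching is an assumption, and it will also drop legitimate synonyms the speaker never said, so treat it as a filter to tune rather than a rule. grounded_tags is a hypothetical name.

```python
import re


def grounded_tags(tags: list[str], transcript: str) -> list[str]:
    """Hypothetical helper: keep tags whose words all occur in the transcript."""
    vocab = set(re.findall(r"\w+", transcript.lower()))  # \w is Unicode-aware
    kept = []
    for tag in tags:
        words = re.findall(r"\w+", tag.lower())
        if words and all(w in vocab for w in words):
            kept.append(tag)
    return kept
```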

Prerequisites

Required Accounts:

  • n8n instance (self-hosted or cloud)
  • OpenAI API account with credits
  • Google Cloud project with Drive + YouTube APIs enabled
  • YouTube channel with upload permissions

API Credentials Needed:

  • OpenAI API key: platform.openai.com/api-keys
  • Google OAuth2 credentials: console.cloud.google.com/apis/credentials
  • OAuth consent screen configured with the youtube.upload scope (https://www.googleapis.com/auth/youtube.upload)

Estimated Costs:

  • Whisper: $0.006/minute audio
  • GPT-4.1-mini: $0.40/1M input tokens, $1.60/1M output tokens
  • Per video: ~$0.10-0.50 depending on audio length (Whisper dominates: transcribing a 60-minute episode costs about $0.36)

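A quick way to sanity-check the per-video estimate. The default rates reflect published pricing at the time of writing (Whisper $0.006 per minute; GPT-4.1-mini $0.40 and $1.60 per 1M input and output tokens) and will drift, so they are parameters. The ~0.75 words-per-token conversion is OpenAI's rule of thumb for English; the speaking rate, prompt overhead, and output size are assumptions, and estimate_cost_usd is hypothetical.

```python
def estimate_cost_usd(
    audio_minutes: float,
    whisper_per_min: float = 0.006,
    input_per_m: float = 0.40,          # GPT-4.1-mini input, USD per 1M tokens
    output_per_m: float = 1.60,         # GPT-4.1-mini output, USD per 1M tokens
    words_per_min: float = 150,         # assumption: typical speaking rate
    prompt_overhead_tokens: int = 200,  # assumption: system prompt and framing
    output_tokens: int = 500,           # assumption: title + description + tags
) -> float:
    """Hypothetical helper: rough transcription + metadata cost for one video."""
    transcript_tokens = audio_minutes * words_per_min / 0.75  # ~0.75 words per token
    input_tokens = transcript_tokens + prompt_overhead_tokens
    gpt_cost = input_tokens / 1e6 * input_per_m + output_tokens / 1e6 * output_per_m
    return audio_minutes * whisper_per_min + gpt_cost
```

Under these assumptions a 60-minute episode comes to about $0.37, almost all of it Whisper; the GPT step costs well under a cent.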
Official Documentation:

  • Google Drive API: developers.google.com/drive/api/v3/reference
  • OpenAI Whisper: platform.openai.com/docs/guides/speech-to-text
  • YouTube Data API: developers.google.com/youtube/v3/docs

Get the Complete Workflow Configuration

This tutorial covers the API integration architecture and critical parameters. For the complete n8n workflow JSON file with all node configurations, system prompts, and error handling logic, check out the full implementation guide.
