Why this is cool
Give your content team a superpower: drop in any image → get a well‑formed, vertical‑specific blog post plus audio playback. We’ll combine Cloudinary AI (image captioning) with OpenAI (blog generation + TTS) inside a Vite + React app with an Express backend.
What you’ll build
- Upload an image → Cloudinary generates a caption describing it
- Send that caption to OpenAI → get a 300‑word marketing blog post tailored to the image’s vertical (auto, travel, fashion, etc.)
- Generate an MP3 narration of the post with OpenAI TTS
Demo idea: a red Ferrari image becomes a short, punchy automotive blog post with a play button for audio.
Full repo: Cloudinary-React-Image-to-Blog-AI
Prereqs
- Node 18+
- Free accounts: Cloudinary and OpenAI
- Basic React/JS/Node skills
⚠️ OpenAI billing: add a small credit ($5–$10) and a spending cap to avoid surprises.
1) Cloudinary setup
- Create/login → Settings → Product Environments
- Note your Cloud name (e.g. demo)
- API Keys: Settings → Product Environments → API Keys → Generate New API Key
Keep these handy:
CLOUDINARY_CLOUD_NAME, CLOUDINARY_API_KEY, CLOUDINARY_API_SECRET
We’ll also use the public cloud name on the client (via Vite env).
2) OpenAI setup
- Create/login at platform.openai.com
- Billing → add payment details + monthly limit
- API Keys → Create new secret key
Save your OPENAI_API_KEY in .env (server only).
3) Scaffold the project (Vite + React)
npm create vite@latest image-to-blog-ai -- --template react-swc
cd image-to-blog-ai
npm i
Install deps for client & server:
# client deps
npm i axios react-markdown @cloudinary/react @cloudinary/url-gen
# server deps
npm i express cors cloudinary multer streamifier openai dotenv
# (optional in dev)
npm i -D nodemon
Project layout (single repo, both client + server):
image-to-blog-ai/
├─ index.html
├─ src/
├─ server.js # Express API
├─ public/ # serves speech.mp3
├─ .env # server secrets
├─ vite.config.js
├─ package.json
└─ ...
4) Vite dev proxy (no CORS headaches)
vite.config.js
import { defineConfig } from 'vite'
import react from '@vitejs/plugin-react-swc' // matches the react-swc template

export default defineConfig({
  plugins: [react()],
  server: {
    port: 3000,
    proxy: {
      '/api': {
        target: 'http://localhost:6000', // Express port
        changeOrigin: true,
        secure: false,
      },
    },
  },
})
5) React UI
Create src/App.jsx. The main parts of the code are explained below; the full file is in the repo linked above.
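Before the handler, here's a minimal sketch of the imports and state the snippets in this section assume. Names like cld and shouldSubmit match what you'll see below, but treat this as orientation, not the canonical code (which lives in the repo):

import { useState } from 'react'
import axios from 'axios'
import ReactMarkdown from 'react-markdown'
import { Cloudinary } from '@cloudinary/url-gen'
import { fill } from '@cloudinary/url-gen/actions/resize'
import AudioPlayer from './AudioPlayer'

// Only the public cloud name lives on the client (see the env tip below)
const cld = new Cloudinary({
  cloud: { cloudName: import.meta.env.VITE_CLOUD_NAME },
})

function App() {
  const [image, setImage] = useState(null) // a File, later a CloudinaryImage
  const [caption, setCaption] = useState('')
  const [story, setStory] = useState('')
  const [loading, setLoading] = useState(false)
  const [error, setError] = useState('')
  const [shouldSubmit, setShouldSubmit] = useState(false)
  // ...handleSubmit and JSX, covered next...
}

export default App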
Creating the story
const handleSubmit = async () => {
  if (!image) return
  const formData = new FormData()
  formData.append('image', image)
  try {
    setLoading(true)
    const { data } = await axios.post('/api/caption', formData, {
      headers: { 'Content-Type': 'multipart/form-data' },
    })
    setCaption(data.caption)
    setStory(data.story.content)
    const cldImg = cld.image(data.public_id)
    cldImg.resize(fill().width(500).height(500))
    setImage(cldImg)
    setError('')
  } catch (err) {
    console.error(err)
    setError(err?.response?.data?.error || err.message)
  } finally {
    setShouldSubmit(false)
    setLoading(false)
  }
}
Let's break the snippet above into pieces to get a better idea of what's going on.
const { data } = await axios.post('/api/caption', formData, {
  headers: { 'Content-Type': 'multipart/form-data' },
})
We make a POST call to /api/caption with multipart/form-data. The backend is responsible for uploading the image to Cloudinary, generating a caption of that image using Cloudinary AI technologies, and generating a story via OpenAI.
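If you want to sanity-check the endpoint without the UI, a curl call through the Vite dev proxy works too (ferrari.jpg is a hypothetical local test file):

# responds with JSON: public_id, caption, story
curl -F "image=@ferrari.jpg" http://localhost:3000/api/caption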
We set the caption and the story in the state:
setCaption(data.caption)
setStory(data.story.content)
Now let's work with the image uploaded.
const cldImg = cld.image(data.public_id)
cldImg.resize(fill().width(500).height(500))
setImage(cldImg)
The code snippet above shows cld.image(publicId) which creates a CloudinaryImage instance. .resize(fill().width(500).height(500)) applies a fill resize, cropping to maintain aspect ratio while fitting the 500×500 frame. Then we overwrite the image state with the CloudinaryImage instance. This is key: image now stops being a File and becomes a Cloudinary object.
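To actually display that CloudinaryImage, the render (not shown in the snippet) would use AdvancedImage from @cloudinary/react. A minimal sketch, assuming image holds the instance set above:

import { AdvancedImage } from '@cloudinary/react'

// Inside App's JSX: render the transformed image once a caption exists
{caption && <AdvancedImage cldImg={image} alt={caption} />}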
      {story && (
        <div>
          <AudioPlayer text={story} setLoading={setLoading} />
          {!loading && <ReactMarkdown>{story}</ReactMarkdown>}
        </div>
      )}
    </div>
  )
}
The snippet above closes out App.jsx's render, and the block inside it appears only when a story exists. The AudioPlayer component receives the full story, generates audio via OpenAI's text-to-speech API, and uses setLoading to pause UI updates while processing. Meanwhile, ReactMarkdown converts the story's Markdown into formatted HTML, displaying it only when the app isn't busy (!loading). Together, these pieces let the user upload an image and immediately get a caption, hear the story read aloud, and view it as a polished blog-style post.
The AudioPlayer component
Time to create the AudioPlayer component in src/AudioPlayer.jsx. The complete code of the AudioPlayer.jsx file is in the repo. The most important function in this file is generate.
const generate = async () => {
  try {
    setLoading(true)
    const { data } = await axios.post('/api/generate-audio', { text })
    setUrl(data.audioUrl)
  } finally {
    setLoading(false)
  }
}
This function sends the story text to the backend to generate audio. When generate runs, it first sets loading to true to indicate processing. It then makes a POST request to /api/generate-audio, passing the text payload. The server responds with an audioUrl, which the component stores in state so the audio player can use it. Finally, in a finally block—ensuring it runs whether the request succeeds or fails—it resets loading to false, keeping the UI state consistent.
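Put together, a minimal sketch of the whole component could look like this. The render markup and prop names are assumptions based on how App.jsx uses the component; the repo version is canonical:

import { useState } from 'react'
import axios from 'axios'

export default function AudioPlayer({ text, setLoading }) {
  const [url, setUrl] = useState('')

  const generate = async () => {
    try {
      setLoading(true)
      const { data } = await axios.post('/api/generate-audio', { text })
      setUrl(data.audioUrl)
    } finally {
      setLoading(false)
    }
  }

  // Show a play control once the MP3 exists; otherwise offer to generate it
  return url ? (
    <audio controls src={url} />
  ) : (
    <button onClick={generate}>Generate Audio</button>
  )
}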
src/App.css (grab your own styles, e.g. a centered column, spinner, and a .custom-file-upload button). Here's the complete code of the App.css file.
Tip: Store only the cloud name on the client via VITE_CLOUD_NAME. Keep all secrets on the server.
6) Express backend (Cloudinary + OpenAI)
Create .env in project root:
VITE_CLOUD_NAME=YOUR_CLOUD_NAME
CLOUDINARY_CLOUD_NAME=YOUR_CLOUD_NAME
CLOUDINARY_API_KEY=YOUR_CLOUDINARY_API_KEY
CLOUDINARY_API_SECRET=YOUR_CLOUDINARY_API_SECRET
OPENAI_API_KEY=YOUR_OPENAI_API_KEY
Create server.js in project root:
You can find the full code of the server.js file in the repo. Now, let's dive into its different parts.
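Before the individual parts, here's a sketch of the bootstrap the snippets below assume: imports, Cloudinary config, middleware, and static serving of public/ (the exact code is in the repo). Note that with "type": "module" in package.json, __dirname doesn't exist and has to be derived from import.meta.url:

import 'dotenv/config'
import express from 'express'
import cors from 'cors'
import multer from 'multer'
import streamifier from 'streamifier'
import path from 'path'
import { fileURLToPath } from 'url'
import { promises as fs } from 'fs'
import { v2 as cloudinary } from 'cloudinary'
import OpenAI from 'openai'

// ESM has no __dirname; reconstruct it from the module URL
const __dirname = path.dirname(fileURLToPath(import.meta.url))

cloudinary.config({
  cloud_name: process.env.CLOUDINARY_CLOUD_NAME,
  api_key: process.env.CLOUDINARY_API_KEY,
  api_secret: process.env.CLOUDINARY_API_SECRET,
})

const app = express()
app.use(cors())
app.use(express.json())
app.use(express.static('public')) // serves the generated speech.mp3

// ...routes from the sections below...

app.listen(6000, () => console.log('API on http://localhost:6000'))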
OpenAI Initialization
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })
Creates a reusable OpenAI client for generating text (blog posts) and audio (text-to-speech).
Multer Upload Handling
const storage = multer.memoryStorage()
const upload = multer({
  storage,
  limits: { fileSize: 8 * 1024 * 1024 },
  fileFilter: (_req, file, cb) => {
    const ok = /image\/(jpeg|png|webp|gif|bmp|tiff)/i.test(file.mimetype)
    cb(ok ? null : new Error('Unsupported file type'), ok)
  },
})
This setup configures Multer to store uploaded images directly in memory rather than on disk, while enforcing a maximum file size of 8MB to prevent oversized uploads. It also filters incoming files to allow only valid image MIME types, ensuring users can't upload unsupported formats. With this configuration, the /api/caption endpoint receives the image as req.file.buffer, ready to be processed or uploaded to Cloudinary without ever touching the filesystem.
Helper Function: Upload Image Buffer to Cloudinary
function uploadBufferToCloudinary(buffer) {
  return new Promise((resolve, reject) => {
    const stream = cloudinary.uploader.upload_stream(
      { detection: 'captioning' },
      (error, result) => (error ? reject(error) : resolve(result))
    )
    streamifier.createReadStream(buffer).pipe(stream)
  })
}
This helper function wraps Cloudinary’s upload_stream in a Promise so it can be used with await, making the upload flow cleaner and fully async. It streams the in-memory file buffer into Cloudinary using streamifier, enabling AI caption detection through the detection: 'captioning' option. Once the upload completes, it returns the full Cloudinary response with the public_id, and the automatically generated caption found in info.detection.captioning.data.caption.
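For reference, here's how the helper reads at a call site (the route below streams directly instead, but the helper gives the same result with await):

// e.g. inside an async route handler
const result = await uploadBufferToCloudinary(req.file.buffer)
const caption = result.info.detection.captioning.data.caption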
Generating the caption
app.post('/api/caption', upload.single('image'), (req, res) => {
  if (!req.file) {
    return res.status(400).json({ error: 'Image file is required' });
  }
  const uploadStream = cloudinary.uploader.upload_stream(
    { detection: 'captioning' },
    async (error, result) => {
      if (error) {
        console.error('Cloudinary error:', error);
        return res.status(500).json({ error: error.message });
      }
      // Guard against plans where the captioning add-on isn't enabled
      const caption = result?.info?.detection?.captioning?.data?.caption;
      if (!caption) {
        return res.status(500).json({ error: 'No caption returned; is the captioning add-on enabled?' });
      }
      const story = await generateBlog(caption);
      res.json({ public_id: result.public_id, caption, story });
    }
  );
  streamifier.createReadStream(req.file.buffer).pipe(uploadStream);
});
This endpoint is responsible for uploading the image to Cloudinary via the Multer buffer stream and using the captioning AI add-on to analyze the image and generate a caption. The caption is then passed to the generateBlog function to create a blog post with the OpenAI API.
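For clarity, the JSON the client receives looks roughly like this (illustrative values, not real output). Note that story is the raw OpenAI message object, which is why App.jsx reads data.story.content:

{
  "public_id": "abc123xyz",
  "caption": "a red sports car parked on a city street",
  "story": {
    "role": "assistant",
    "content": "## Feel the Thrill of the Open Road..."
  }
}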
Generating the blog content
const generateBlog = async (caption) => {
  const message = {
    role: "user",
    content: `Create a 300-word blog post to be used as part of a marketing campaign from a business. The blog must focus on the vertical industry of the image, based on the following caption of the image: ${caption}. This blog is not for the business but for the person interested in the vertical industry of the image.`
  }
  try {
    const response = await openai.chat.completions.create({
      model: "gpt-3.5-turbo",
      messages: [message], // single-turn request: just the one prompt message
    });
    console.log('OpenAI response', response.choices[0].message);
    return response.choices[0].message;
  } catch (error) {
    console.error(error);
    return `error: Internal Server Error`;
  }
};
This function generates the blog post from the instructions passed in the content property of the message object, combined with the caption we got from the image uploaded to Cloudinary. We use the chat completions endpoint of the Node.js OpenAI SDK to produce the post.
Generating audio
app.post("/api/generate-audio", async (req, res) => {
try {
const mp3 = await openai.audio.speech.create({
model: "tts-1",
voice: "alloy",
input: req.body.text,
});
const buffer = Buffer.from(await mp3.arrayBuffer());
const filePath = path.resolve(__dirname, "public", "speech.mp3");
await fs.writeFile(filePath, buffer);
res.json({ audioUrl: `/speech.mp3` });
} catch (error) {
console.error("Error generating audio:", error);
res.status(500).json({ error: "Error generating audio" });
}
});
In this endpoint, we use the OpenAI SDK's text-to-speech to generate audio from the request text, which is the blog text produced by the generateBlog() function. Once the audio has been generated, we write it to public/speech.mp3 so the front end can play it.
package.json (scripts for both dev servers):
{
  "name": "image-to-blog-ai",
  "private": true,
  "type": "module",
  "scripts": {
    "dev": "vite",
    "start": "node server.js",
    "dev:api": "nodemon server.js"
  }
}
7) Run it
# Terminal A (API)
npm run dev:api
# API → http://localhost:6000
# Terminal B (Vite)
npm run dev
# Web → http://localhost:3000
Upload an image → watch the caption + blog appear → click Generate Audio to get an MP3.
8) Production & security notes
- Keep secrets server‑side only; never expose API keys in the client
- Add rate limiting (e.g. express-rate-limit) and basic auth or tokens on /api routes
- Validate file types and size (shown above); consider virus scanning for public apps
- Cache TTS results per post hash to avoid re‑billing (see the sketch after this list)
- Consider the Responses API for future‑proof OpenAI calls; swap out chat.completions when you're ready
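Here's a minimal sketch of that caching idea, reusing the audio route from above. The hashing scheme and file naming are assumptions for illustration, not part of the original app:

import crypto from 'crypto'

app.post('/api/generate-audio', async (req, res) => {
  try {
    // Derive a stable name from the text so identical posts reuse the same MP3
    const hash = crypto.createHash('sha256').update(req.body.text).digest('hex')
    const fileName = `speech-${hash}.mp3`
    const filePath = path.resolve(__dirname, 'public', fileName)

    // Only call OpenAI if this exact text hasn't been narrated before
    const cached = await fs.access(filePath).then(() => true, () => false)
    if (!cached) {
      const mp3 = await openai.audio.speech.create({
        model: 'tts-1',
        voice: 'alloy',
        input: req.body.text,
      })
      await fs.writeFile(filePath, Buffer.from(await mp3.arrayBuffer()))
    }
    res.json({ audioUrl: `/${fileName}` })
  } catch (error) {
    console.error('Error generating audio:', error)
    res.status(500).json({ error: 'Error generating audio' })
  }
})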
Troubleshooting
- CORS in dev: use the Vite proxy as shown (don't call http://localhost:6000 directly from the client)
- Cloudinary caption is undefined: ensure the detection: 'captioning' add‑on is enabled for your account/plan
- MP3 not found: verify public/ exists and the server has write permissions
Wrap‑up
You now have an image‑to‑blog pipeline with Cloudinary + OpenAI: caption → post → audio. Drop it into your content workflow to turn static visuals into dynamic marketing assets.
Resources
- Cloudinary React SDK: @cloudinary/react, @cloudinary/url-gen
- OpenAI Node SDK: openai
- React Markdown: react-markdown
- Dev proxy: Vite server.proxy