Why this is cool
Give your content team a superpower: drop in any image → get a well‑formed, vertical‑specific blog post plus audio playback. We’ll combine Cloudinary AI (image captioning) with OpenAI (blog generation + TTS) inside a Vite + React app with an Express backend.
What you’ll build
- Upload an image → Cloudinary generates a caption describing it
- Send that caption to OpenAI → get a 300‑word marketing blog post tailored to the image’s vertical (auto, travel, fashion, etc.)
- Generate an MP3 narration of the post with OpenAI TTS
Demo idea: a red Ferrari image becomes a short, punchy automotive blog post with a play button for audio.
Prereqs
- Node 18+
- Free accounts: Cloudinary and OpenAI
- Basic React/JS/Node skills
⚠️ OpenAI billing: add a small credit ($5–$10) and set a spending cap to avoid surprises.
1) Cloudinary setup
- Create/login → Settings → Product Environments
- Note your Cloud name (e.g. demo)
- API Keys: Settings → Product Environments → API Keys → Generate New API Key
Keep these handy:
- CLOUDINARY_CLOUD_NAME
- CLOUDINARY_API_KEY
- CLOUDINARY_API_SECRET
We’ll also use the public cloud name on the client (via Vite env).
2) OpenAI setup
- Create/login at platform.openai.com
- Billing → add payment details + monthly limit
- API Keys → Create new secret key
Save your OPENAI_API_KEY in .env (server only).
3) Scaffold the project (Vite + React)
npm create vite@latest image-to-blog-ai -- --template react-swc
cd image-to-blog-ai
npm i
Install deps for client & server:
# client deps
npm i axios react-markdown @cloudinary/react @cloudinary/url-gen
# server deps
npm i express cors cloudinary multer streamifier openai dotenv
# (optional in dev)
npm i -D nodemon
Project layout (single repo, both client + server):
image-to-blog-ai/
├─ index.html
├─ src/
├─ server.js # Express API
├─ public/ # serves speech.mp3
├─ .env # server secrets
├─ vite.config.js
├─ package.json
└─ ...
4) Vite dev proxy (no CORS headaches)
vite.config.js
import { defineConfig } from 'vite'
import react from '@vitejs/plugin-react'
export default defineConfig({
plugins: [react()],
server: {
port: 3000,
proxy: {
'/api': {
target: 'http://localhost:6000', // Express port
changeOrigin: true,
secure: false,
},
},
},
})
5) React UI
Create src/App.jsx (or .tsx if you prefer TS):
import { useState, useEffect } from 'react'
import axios from 'axios'
import { AdvancedImage } from '@cloudinary/react'
import { fill } from '@cloudinary/url-gen/actions/resize'
import { Cloudinary } from '@cloudinary/url-gen'
import ReactMarkdown from 'react-markdown'
import AudioPlayer from './AudioPlayer'
import './App.css'
export default function App() {
const [image, setImage] = useState(null)
const [caption, setCaption] = useState('')
const [story, setStory] = useState('')
const [error, setError] = useState('')
const [loading, setLoading] = useState(false)
const [shouldSubmit, setShouldSubmit] = useState(false)
const cld = new Cloudinary({ cloud: { cloudName: import.meta.env.VITE_CLOUD_NAME } })
useEffect(() => {
if (shouldSubmit && image) handleSubmit()
// eslint-disable-next-line react-hooks/exhaustive-deps
}, [shouldSubmit, image])
const handleImageChange = (e) => {
const file = e.target.files?.[0]
if (!file) return
setImage(file)
setShouldSubmit(true)
}
const handleSubmit = async () => {
if (!image) return
const formData = new FormData()
formData.append('image', image)
try {
setLoading(true)
const { data } = await axios.post('/api/caption', formData, {
headers: { 'Content-Type': 'multipart/form-data' },
})
setCaption(data.caption)
setStory(data.story.content)
const cldImg = cld.image(data.public_id)
cldImg.resize(fill().width(500).height(500))
setImage(cldImg)
setError('')
} catch (err) {
console.error(err)
setError(err?.response?.data?.error || err.message)
} finally {
setShouldSubmit(false)
setLoading(false)
}
}
return (
<div className="app">
<h1>Image → Blog AI</h1>
<label className="custom-file-upload">
<input type="file" accept="image/*" onChange={handleImageChange} />
Choose Image
</label>
{loading && <div className="spinner" />}
{error && <p style={{ color: 'red' }}>{error}</p>}
{image instanceof File && !loading && (
<p>Uploading...</p>
)}
{image?.constructor?.name === 'CloudinaryImage' && (
<AdvancedImage cldImg={image} alt={caption} />
)}
{story && (
<div>
<AudioPlayer text={story} setLoading={setLoading} />
{!loading && <ReactMarkdown>{story}</ReactMarkdown>}
</div>
)}
</div>
)
}
Minimal src/AudioPlayer.jsx:
import { useState } from 'react'
import axios from 'axios'
export default function AudioPlayer({ text, setLoading }) {
const [url, setUrl] = useState('')
const generate = async () => {
try {
setLoading(true)
const { data } = await axios.post('/api/generate-audio', { text })
setUrl(data.audioUrl)
} catch (err) {
console.error('Audio generation failed:', err)
} finally {
setLoading(false)
}
}
return (
<div style={{ margin: '1rem 0' }}>
<button onClick={generate}>🔊 Generate Audio</button>
{url && (
<audio controls src={url} style={{ display: 'block', marginTop: 8 }} />
)}
</div>
)
}
src/App.css: bring your own styles, e.g. a centered column, a spinner, and a .custom-file-upload button.
Tip: store only the cloud name on the client via VITE_CLOUD_NAME. Keep all secrets on the server.
6) Express backend (Cloudinary + OpenAI)
Create .env in project root:
VITE_CLOUD_NAME=YOUR_CLOUD_NAME
CLOUDINARY_CLOUD_NAME=YOUR_CLOUD_NAME
CLOUDINARY_API_KEY=YOUR_CLOUDINARY_API_KEY
CLOUDINARY_API_SECRET=YOUR_CLOUDINARY_API_SECRET
OPENAI_API_KEY=YOUR_OPENAI_API_KEY
Create server.js in project root:
import 'dotenv/config.js'
import express from 'express'
import cors from 'cors'
import { v2 as cloudinary } from 'cloudinary'
import multer from 'multer'
import streamifier from 'streamifier'
import OpenAI from 'openai'
import path from 'path'
import { fileURLToPath } from 'url'
import fs from 'fs/promises'
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })
const __filename = fileURLToPath(import.meta.url)
const __dirname = path.dirname(__filename)
const app = express()
app.use(express.json({ limit: '1mb' }))
app.use(cors())
cloudinary.config({
secure: true,
cloud_name: process.env.CLOUDINARY_CLOUD_NAME,
api_key: process.env.CLOUDINARY_API_KEY,
api_secret: process.env.CLOUDINARY_API_SECRET,
})
// Multer in-memory store with basic filtering
const storage = multer.memoryStorage()
const upload = multer({
storage,
limits: { fileSize: 8 * 1024 * 1024 }, // 8MB
fileFilter: (_req, file, cb) => {
const ok = /image\/(jpeg|png|webp|gif|bmp|tiff)/i.test(file.mimetype)
cb(ok ? null : new Error('Unsupported file type'), ok)
},
})
// Helper: promisify Cloudinary upload_stream
function uploadBufferToCloudinary(buffer) {
return new Promise((resolve, reject) => {
const stream = cloudinary.uploader.upload_stream(
{ detection: 'captioning' },
(error, result) => (error ? reject(error) : resolve(result))
)
streamifier.createReadStream(buffer).pipe(stream)
})
}
app.post('/api/caption', upload.single('image'), async (req, res) => {
try {
if (!req.file) return res.status(400).json({ error: 'Image file is required' })
const result = await uploadBufferToCloudinary(req.file.buffer)
const caption = result?.info?.detection?.captioning?.data?.caption || 'Unknown image'
const story = await generateBlog(caption)
res.json({
public_id: result.public_id,
caption,
story,
})
} catch (err) {
console.error('Caption error:', err)
res.status(500).json({ error: err.message || 'Internal Server Error' })
}
})
app.post('/api/generate-audio', async (req, res) => {
try {
const text = req.body?.text?.slice(0, 6000) || ''
if (!text) return res.status(400).json({ error: 'Text is required' })
const mp3 = await openai.audio.speech.create({
model: 'tts-1',
voice: 'alloy',
input: text,
})
const buffer = Buffer.from(await mp3.arrayBuffer())
const filePath = path.resolve(__dirname, 'public', 'speech.mp3')
await fs.mkdir(path.dirname(filePath), { recursive: true })
await fs.writeFile(filePath, buffer)
res.json({ audioUrl: '/speech.mp3' }) // in dev, Vite also serves the project's public/ at the web root, so this URL resolves from the client
} catch (err) {
console.error('TTS error:', err)
res.status(500).json({ error: 'Error generating audio' })
}
})
async function generateBlog(caption) {
const message = {
role: 'user',
content: `Create a 300-word blog post for a marketing campaign. The post should be tailored to the image's vertical based on this caption: "${caption}". The article is for readers interested in that vertical, not for the business itself. Use an inviting tone, clear subheadings, and a call to action.`,
}
try {
const response = await openai.chat.completions.create({
model: 'gpt-3.5-turbo',
messages: [message],
temperature: 0.8,
})
return response.choices[0].message
} catch (err) {
console.error('OpenAI error:', err)
return { role: 'assistant', content: 'Sorry—could not generate content right now.' }
}
}
app.use(express.static(path.resolve(__dirname, 'public')))
const PORT = 6000
app.listen(PORT, () => console.log(`API listening on http://localhost:${PORT}`))
package.json (scripts for both dev servers):
{
"name": "image-to-blog-ai",
"private": true,
"type": "module",
"scripts": {
"dev": "vite",
"start": "node server.js",
"dev:api": "nodemon server.js"
}
}
7) Run it
# Terminal A (API)
npm run dev:api
# API → http://localhost:6000
# Terminal B (Vite)
npm run dev
# Web → http://localhost:3000
Upload an image → watch the caption + blog appear → click Generate Audio to get an MP3.
8) Production & security notes
- Keep secrets server-side only; never expose API keys in the client
- Add rate limiting (e.g. express-rate-limit) and basic auth or tokens on /api routes
- Validate file types and size (shown above); consider virus scanning for public apps
- Cache TTS results per post hash to avoid re-billing
- Consider the Responses API for future-proof OpenAI calls; swap out chat.completions when you're ready
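The caching note above can be sketched with Node's crypto module. This is a minimal sketch, not part of the tutorial's server.js; ttsCachePath is a hypothetical helper name:

```javascript
import crypto from 'node:crypto'
import path from 'node:path'

// Map a blog post's text to a stable MP3 filename so identical posts
// reuse the cached file instead of re-billing the TTS endpoint.
function ttsCachePath(text, dir = 'public') {
  const hash = crypto.createHash('sha256').update(text).digest('hex').slice(0, 16)
  return path.join(dir, `speech-${hash}.mp3`)
}
```

In the /api/generate-audio handler you would check fs.existsSync(ttsCachePath(text)) before calling openai.audio.speech.create, and return the cached URL on a hit.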
Troubleshooting
- CORS in dev: use the Vite proxy as shown (don't call http://localhost:6000 directly from the client)
- Cloudinary caption is undefined: ensure the detection: 'captioning' add-on is enabled for your account/plan
- MP3 not found: verify public/ exists and the server has write permissions
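For that last item, a quick standalone check (assuming you run it from the project root) confirms that public/ exists and is writable:

```javascript
import fs from 'node:fs'
import path from 'node:path'

// Create public/ if it's missing, then verify the process can write to it.
// accessSync throws if the directory is not writable.
const dir = path.resolve('public')
fs.mkdirSync(dir, { recursive: true })
fs.accessSync(dir, fs.constants.W_OK)
console.log(`${dir} exists and is writable`)
```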
Wrap‑up
You now have an image‑to‑blog pipeline with Cloudinary + OpenAI: caption → post → audio. Drop it into your content workflow to turn static visuals into dynamic marketing assets.
Repo suggestion: name it cloudinary-react-image-to-blog-ai. Add the README sections straight from this post and you're set.
Resources
- Cloudinary React SDK: @cloudinary/react, @cloudinary/url-gen
- OpenAI Node SDK: openai
- React Markdown: react-markdown
- Dev proxy: Vite server.proxy