Why this is cool
Give your content team a superpower: drop in any image → get a well‑formed, vertical‑specific blog post plus audio playback. We’ll combine Cloudinary AI (image captioning) with OpenAI (blog generation + TTS) inside a Vite + React app with an Express backend.
What you’ll build
- Upload an image → Cloudinary generates a caption describing it
- Send that caption to OpenAI → get a 300‑word marketing blog post tailored to the image’s vertical (auto, travel, fashion, etc.)
- Generate an MP3 narration of the post with OpenAI TTS
Demo idea: a red Ferrari image becomes a short, punchy automotive blog post with a play button for audio.
Full repo: Cloudinary-React-Image-to-Blog-AI
Prereqs
- Node 18+
- Free accounts: Cloudinary and OpenAI
- Basic React/JS/Node skills
⚠️ OpenAI billing: add a small credit ($5–$10) and a spending cap to avoid surprises.
1) Cloudinary setup
- Create/login → Settings → Product Environments
- Note your Cloud name (e.g. demo)
- API Keys: Settings → Product Environments → API Keys → Generate New API Key
Keep these handy:
CLOUDINARY_CLOUD_NAME, CLOUDINARY_API_KEY, CLOUDINARY_API_SECRET
We’ll also use the public cloud name on the client (via Vite env).
2) OpenAI setup
- Create/login at platform.openai.com
- Billing → add payment details + monthly limit
- API Keys → Create new secret key
Save your OPENAI_API_KEY in .env (server only).
3) Scaffold the project (Vite + React)
npm create vite@latest image-to-blog-ai -- --template react-swc
cd image-to-blog-ai
npm i
Install deps for client & server:
# client deps
npm i axios react-markdown @cloudinary/react @cloudinary/url-gen
# server deps
npm i express cors cloudinary multer streamifier openai dotenv
# (optional in dev)
npm i -D nodemon
Project layout (single repo, both client + server):
image-to-blog-ai/
├─ index.html
├─ src/
├─ server.js # Express API
├─ public/ # serves speech.mp3
├─ .env # server secrets
├─ vite.config.js
├─ package.json
└─ ...
4) Vite dev proxy (no CORS headaches)
vite.config.js
import { defineConfig } from 'vite'
import react from '@vitejs/plugin-react-swc' // matches the react-swc template

export default defineConfig({
  plugins: [react()],
  server: {
    port: 3000,
    proxy: {
      '/api': {
        target: 'http://localhost:6000', // Express port
        changeOrigin: true,
        secure: false,
      },
    },
  },
})
5) React UI
Create src/App.jsx. The main parts of the code are explained below; the full file is in the repo linked above.
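Before the handler, here's a minimal sketch of the imports and state the snippets in this section assume. Names like cld and shouldSubmit match what you'll see below, but treat this as orientation, not the canonical code (which lives in the repo):

import { useState } from 'react'
import axios from 'axios'
import ReactMarkdown from 'react-markdown'
import { Cloudinary } from '@cloudinary/url-gen'
import { fill } from '@cloudinary/url-gen/actions/resize'
import AudioPlayer from './AudioPlayer'

// Only the public cloud name lives on the client (see the env tip below)
const cld = new Cloudinary({
  cloud: { cloudName: import.meta.env.VITE_CLOUD_NAME },
})

function App() {
  const [image, setImage] = useState(null) // a File, later a CloudinaryImage
  const [caption, setCaption] = useState('')
  const [story, setStory] = useState('')
  const [loading, setLoading] = useState(false)
  const [error, setError] = useState('')
  const [shouldSubmit, setShouldSubmit] = useState(false)
  // ...handleSubmit and JSX, covered next...
}

export default App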
Creating the story
const handleSubmit = async () => {
  if (!image) return
  const formData = new FormData()
  formData.append('image', image)
  try {
    setLoading(true)
    const { data } = await axios.post('/api/caption', formData, {
      headers: { 'Content-Type': 'multipart/form-data' },
    })
    setCaption(data.caption)
    setStory(data.story.content)
    const cldImg = cld.image(data.public_id)
    cldImg.resize(fill().width(500).height(500))
    setImage(cldImg)
    setError('')
  } catch (err) {
    console.error(err)
    setError(err?.response?.data?.error || err.message)
  } finally {
    setShouldSubmit(false)
    setLoading(false)
  }
}
Let's break the snippet above into pieces to get a better idea of what's going on.
const { data } = await axios.post('/api/caption', formData, {
  headers: { 'Content-Type': 'multipart/form-data' },
})
We make a POST call to /api/caption with multipart/form-data. The backend is responsible for uploading the image to Cloudinary, generating a caption of that image using Cloudinary AI technologies, and generating a story via OpenAI.
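If you want to sanity-check the endpoint without the UI, a curl call through the Vite dev proxy works too (ferrari.jpg is a hypothetical local test file):

# responds with JSON: public_id, caption, story
curl -F "image=@ferrari.jpg" http://localhost:3000/api/caption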
We set the caption and the story in the state:
setCaption(data.caption)
setStory(data.story.content)
Now let's work with the image uploaded.
const cldImg = cld.image(data.public_id)
cldImg.resize(fill().width(500).height(500))
setImage(cldImg)
The code snippet above shows cld.image(publicId) which creates a CloudinaryImage instance. .resize(fill().width(500).height(500)) applies a fill resize, cropping to maintain aspect ratio while fitting the 500×500 frame. Then we overwrite the image state with the CloudinaryImage instance. This is key: image now stops being a File and becomes a Cloudinary object.
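To actually display that CloudinaryImage, the render (not shown in the snippet) would use AdvancedImage from @cloudinary/react. A minimal sketch, assuming image holds the instance set above:

import { AdvancedImage } from '@cloudinary/react'

// Inside App's JSX: render the transformed image once a caption exists
{caption && <AdvancedImage cldImg={image} alt={caption} />}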
      {story && (
        <div>
          <AudioPlayer text={story} setLoading={setLoading} />
          {!loading && <ReactMarkdown>{story}</ReactMarkdown>}
        </div>
      )}
    </div>
  )
}
The snippet above closes out App.jsx's render, and the block inside it appears only when a story exists. The AudioPlayer component receives the full story, generates audio via OpenAI's text-to-speech API, and uses setLoading to pause UI updates while processing. Meanwhile, ReactMarkdown converts the story's Markdown into formatted HTML, displaying it only when the app isn't busy (!loading). Together, these pieces let the user upload an image and immediately get a caption, hear the story read aloud, and view it as a polished blog-style post.
The AudioPlayer component
Time to create the AudioPlayer component in src/AudioPlayer.jsx. The complete code of the AudioPlayer.jsx file is in the repo. The most important function in this file is generate.
const generate = async () => {
  try {
    setLoading(true)
    const { data } = await axios.post('/api/generate-audio', { text })
    setUrl(data.audioUrl)
  } finally {
    setLoading(false)
  }
}
This function sends the story text to the backend to generate audio. When generate runs, it first sets loading to true to indicate processing. It then makes a POST request to /api/generate-audio, passing the text payload. The server responds with an audioUrl, which the component stores in state so the audio player can use it. Finally, in a finally block—ensuring it runs whether the request succeeds or fails—it resets loading to false, keeping the UI state consistent.
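Put together, a minimal sketch of the whole component could look like this. The render markup and prop names are assumptions based on how App.jsx uses the component; the repo version is canonical:

import { useState } from 'react'
import axios from 'axios'

export default function AudioPlayer({ text, setLoading }) {
  const [url, setUrl] = useState('')

  const generate = async () => {
    try {
      setLoading(true)
      const { data } = await axios.post('/api/generate-audio', { text })
      setUrl(data.audioUrl)
    } finally {
      setLoading(false)
    }
  }

  // Show a play control once the MP3 exists; otherwise offer to generate it
  return url ? (
    <audio controls src={url} />
  ) : (
    <button onClick={generate}>Generate Audio</button>
  )
}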
src/App.css (grab your own styles, e.g. a centered column, spinner, and a .custom-file-upload button). Here's the complete code of the App.css file.
Tip: Store only the cloud name on the client via VITE_CLOUD_NAME. Keep all secrets on the server.
6) Express backend (Cloudinary + OpenAI)
Create .env in project root:
VITE_CLOUD_NAME=YOUR_CLOUD_NAME
CLOUDINARY_CLOUD_NAME=YOUR_CLOUD_NAME
CLOUDINARY_API_KEY=YOUR_CLOUDINARY_API_KEY
CLOUDINARY_API_SECRET=YOUR_CLOUDINARY_API_SECRET
OPENAI_API_KEY=YOUR_OPENAI_API_KEY
Create server.js in project root:
You can find the full code of the server.js file in the repo. Now, let's dive into its different parts.
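Before the individual parts, here's a sketch of the bootstrap the snippets below assume: imports, Cloudinary config, middleware, and static serving of public/ (the exact code is in the repo). Note that with "type": "module" in package.json, __dirname doesn't exist and has to be derived from import.meta.url:

import 'dotenv/config'
import express from 'express'
import cors from 'cors'
import multer from 'multer'
import streamifier from 'streamifier'
import path from 'path'
import { fileURLToPath } from 'url'
import { promises as fs } from 'fs'
import { v2 as cloudinary } from 'cloudinary'
import OpenAI from 'openai'

// ESM has no __dirname; reconstruct it from the module URL
const __dirname = path.dirname(fileURLToPath(import.meta.url))

cloudinary.config({
  cloud_name: process.env.CLOUDINARY_CLOUD_NAME,
  api_key: process.env.CLOUDINARY_API_KEY,
  api_secret: process.env.CLOUDINARY_API_SECRET,
})

const app = express()
app.use(cors())
app.use(express.json())
app.use(express.static('public')) // serves the generated speech.mp3

// ...routes from the sections below...

app.listen(6000, () => console.log('API on http://localhost:6000'))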
OpenAI Initialization
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })
Creates a reusable OpenAI client for generating text (blog posts) and audio (text-to-speech).
Multer Upload Handling
const storage = multer.memoryStorage()
const upload = multer({
  storage,
  limits: { fileSize: 8 * 1024 * 1024 },
  fileFilter: (_req, file, cb) => {
    const ok = /image\/(jpeg|png|webp|gif|bmp|tiff)/i.test(file.mimetype)
    cb(ok ? null : new Error('Unsupported file type'), ok)
  },
})
This setup configures Multer to store uploaded images directly in memory rather than on disk, while enforcing a maximum file size of 8MB to prevent oversized uploads. It also filters incoming files to allow only valid image MIME types, ensuring users can't upload unsupported formats. With this configuration, the /api/caption endpoint receives the image as req.file.buffer, ready to be processed or uploaded to Cloudinary without ever touching the filesystem.
Helper Function: Upload Image Buffer to Cloudinary
function uploadBufferToCloudinary(buffer) {
  return new Promise((resolve, reject) => {
    const stream = cloudinary.uploader.upload_stream(
      { detection: 'captioning' },
      (error, result) => (error ? reject(error) : resolve(result))
    )
    streamifier.createReadStream(buffer).pipe(stream)
  })
}
This helper function wraps Cloudinary’s upload_stream in a Promise so it can be used with await, making the upload flow cleaner and fully async. It streams the in-memory file buffer into Cloudinary using streamifier, enabling AI caption detection through the detection: 'captioning' option. Once the upload completes, it returns the full Cloudinary response with the public_id, and the automatically generated caption found in info.detection.captioning.data.caption.
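For reference, here's how the helper reads at a call site (the route below streams directly instead, but the helper gives the same result with await):

// e.g. inside an async route handler
const result = await uploadBufferToCloudinary(req.file.buffer)
const caption = result.info.detection.captioning.data.caption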
Generating the caption
app.post('/api/caption', upload.single('image'), (req, res) => {
  if (!req.file) {
    return res.status(400).json({ error: 'Image file is required' });
  }
  const uploadStream = cloudinary.uploader.upload_stream(
    { detection: 'captioning' },
    async (error, result) => {
      if (error) {
        console.error('Cloudinary error:', error);
        return res.status(500).json({ error: error.message });
      }
      // Guard against plans where the captioning add-on isn't enabled
      const caption = result?.info?.detection?.captioning?.data?.caption;
      if (!caption) {
        return res.status(500).json({ error: 'No caption returned; is the captioning add-on enabled?' });
      }
      const story = await generateBlog(caption);
      res.json({ public_id: result.public_id, caption, story });
    }
  );
  streamifier.createReadStream(req.file.buffer).pipe(uploadStream);
});
This endpoint is responsible for uploading the image to Cloudinary via the Multer buffer stream and using the captioning AI add-on to analyze the image and generate a caption. The caption is then passed to the generateBlog function to create a blog post with the OpenAI API.
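For clarity, the JSON the client receives looks roughly like this (illustrative values, not real output). Note that story is the raw OpenAI message object, which is why App.jsx reads data.story.content:

{
  "public_id": "abc123xyz",
  "caption": "a red sports car parked on a city street",
  "story": {
    "role": "assistant",
    "content": "## Feel the Thrill of the Open Road..."
  }
}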
Generating the blog content
const generateBlog = async (caption) => {
  const message = {
    role: "user",
    content: `Create a 300-word blog post to be used as part of a marketing campaign from a business. The blog must focus on the vertical industry of the image, based on the following caption of the image: ${caption}. This blog is not for the business but for the person interested in the vertical industry of the image.`
  }
  try {
    const response = await openai.chat.completions.create({
      model: "gpt-3.5-turbo",
      messages: [message], // single-turn request: just the one prompt message
    });
    console.log('OpenAI response', response.choices[0].message);
    return response.choices[0].message;
  } catch (error) {
    console.error(error);
    return `error: Internal Server Error`;
  }
};
This function generates the blog post from the instructions passed in the content property of the message object, combined with the caption we got from the image uploaded to Cloudinary. We use the chat completions endpoint of the Node.js OpenAI SDK to produce the post.
Generating audio
app.post("/api/generate-audio", async (req, res) => {
try {
const mp3 = await openai.audio.speech.create({
model: "tts-1",
voice: "alloy",
input: req.body.text,
});
const buffer = Buffer.from(await mp3.arrayBuffer());
const filePath = path.resolve(__dirname, "public", "speech.mp3");
await fs.writeFile(filePath, buffer);
res.json({ audioUrl: `/speech.mp3` });
} catch (error) {
console.error("Error generating audio:", error);
res.status(500).json({ error: "Error generating audio" });
}
});
In this endpoint, we use the OpenAI SDK's text-to-speech to generate audio from the request text, which is the blog text produced by the generateBlog() function. Once the audio has been generated, we write it to public/speech.mp3 so the front end can play it.
package.json (scripts for both dev servers):
{
  "name": "image-to-blog-ai",
  "private": true,
  "type": "module",
  "scripts": {
    "dev": "vite",
    "start": "node server.js",
    "dev:api": "nodemon server.js"
  }
}
7) Run it
# Terminal A (API)
npm run dev:api
# API → http://localhost:6000
# Terminal B (Vite)
npm run dev
# Web → http://localhost:3000
Upload an image → watch the caption + blog appear → click Generate Audio to get an MP3.
8) Production & security notes
- Keep secrets server‑side only; never expose API keys in the client
- Add rate limiting (e.g. express-rate-limit) and basic auth or tokens on /api routes
- Validate file types and size (shown above); consider virus scanning for public apps
- Cache TTS results per post hash to avoid re‑billing (see the sketch after this list)
- Consider the Responses API for future‑proof OpenAI calls; swap out chat.completions when you're ready
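Here's a minimal sketch of that caching idea, reusing the audio route from above. The hashing scheme and file naming are assumptions for illustration, not part of the original app:

import crypto from 'crypto'

app.post('/api/generate-audio', async (req, res) => {
  try {
    // Derive a stable name from the text so identical posts reuse the same MP3
    const hash = crypto.createHash('sha256').update(req.body.text).digest('hex')
    const fileName = `speech-${hash}.mp3`
    const filePath = path.resolve(__dirname, 'public', fileName)

    // Only call OpenAI if this exact text hasn't been narrated before
    const cached = await fs.access(filePath).then(() => true, () => false)
    if (!cached) {
      const mp3 = await openai.audio.speech.create({
        model: 'tts-1',
        voice: 'alloy',
        input: req.body.text,
      })
      await fs.writeFile(filePath, Buffer.from(await mp3.arrayBuffer()))
    }
    res.json({ audioUrl: `/${fileName}` })
  } catch (error) {
    console.error('Error generating audio:', error)
    res.status(500).json({ error: 'Error generating audio' })
  }
})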
Troubleshooting
- CORS in dev: use the Vite proxy as shown (don't call http://localhost:6000 directly from the client)
- Cloudinary caption is undefined: ensure the detection: 'captioning' add‑on is enabled for your account/plan
- MP3 not found: verify public/ exists and the server has write permissions
Wrap‑up
You now have an image‑to‑blog pipeline with Cloudinary + OpenAI: caption → post → audio. Drop it into your content workflow to turn static visuals into dynamic marketing assets.
Resources
- Cloudinary React SDK: @cloudinary/react, @cloudinary/url-gen
- OpenAI Node SDK: openai
- React Markdown: react-markdown
- Dev proxy: Vite server.proxy