<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: AKASH Ramesh CB student</title>
    <description>The latest articles on DEV Community by AKASH Ramesh CB student (@akash_rameshcbstudent_7).</description>
    <link>https://dev.to/akash_rameshcbstudent_7</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3600648%2Fe1a1c94e-cd5c-4bf8-8f04-00f3d2e95b10.png</url>
      <title>DEV Community: AKASH Ramesh CB student</title>
      <link>https://dev.to/akash_rameshcbstudent_7</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/akash_rameshcbstudent_7"/>
    <language>en</language>
    <item>
      <title>How to Build an AI Image Captioning App with Azure AI Vision and Streamlit</title>
      <dc:creator>AKASH Ramesh CB student</dc:creator>
      <pubDate>Fri, 07 Nov 2025 08:42:19 +0000</pubDate>
      <link>https://dev.to/akash_rameshcbstudent_7/how-to-build-an-ai-image-captioning-app-with-azure-ai-vision-and-streamlit-4206</link>
      <guid>https://dev.to/akash_rameshcbstudent_7/how-to-build-an-ai-image-captioning-app-with-azure-ai-vision-and-streamlit-4206</guid>
      <description>&lt;p&gt;As a developer, I'm always looking for ways to build impactful projects. One of the most powerful applications of AI is its ability to make the digital world more accessible.&lt;/p&gt;

&lt;p&gt;I was inspired by Microsoft's mission to empower everyone, so I built a simple web app that helps describe the world for those who are visually impaired.&lt;/p&gt;

&lt;p&gt;This application uses Microsoft Azure AI Vision to generate human-readable captions for any image you upload. And the best part? We can build the entire web app in about 30 lines of Python using Streamlit.&lt;/p&gt;

&lt;p&gt;Let's get started!&lt;/p&gt;

&lt;p&gt;What You'll Need&lt;/p&gt;

&lt;p&gt;Python: Make sure you have Python 3.7+ installed.&lt;/p&gt;

&lt;p&gt;An Azure Account: You'll need one to create an "AI Vision" resource. You can get a free account to start.&lt;/p&gt;

&lt;p&gt;A Few Python Libraries: We'll install them with pip.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;pip install streamlit requests pillow&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Step 1: Get Your Azure AI Vision Keys&lt;/p&gt;

&lt;p&gt;Before we can code, we need to tell Azure who we are.&lt;/p&gt;

&lt;p&gt;Go to the Azure portal and click "Create a resource."&lt;/p&gt;

&lt;p&gt;Search for "AI Vision" and create one.&lt;/p&gt;

&lt;p&gt;Once it's deployed, go to the "Keys and Endpoint" blade.&lt;/p&gt;

&lt;p&gt;Copy your VISION_API_KEY (one of the keys) and your VISION_ENDPOINT. We'll need these for our code.&lt;/p&gt;
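&lt;p&gt;Hardcoding keys works for a quick demo, but it is safer to read them from environment variables so they never end up in version control. Here is a minimal sketch (the variable names VISION_API_KEY and VISION_ENDPOINT are just the same names the script below uses):&lt;/p&gt;

```python
import os

# Read the Azure credentials from environment variables instead of
# hardcoding them; the placeholder defaults make a missing key obvious.
VISION_API_KEY = os.environ.get("VISION_API_KEY", "YOUR_API_KEY_HERE")
VISION_ENDPOINT = os.environ.get("VISION_ENDPOINT", "YOUR_ENDPOINT_HERE")

if VISION_API_KEY == "YOUR_API_KEY_HERE":
    print("Warning: VISION_API_KEY is not set")
```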

&lt;p&gt;Step 2: The Code Walkthrough&lt;/p&gt;

&lt;p&gt;Here is the complete Python script. You can save this as app.py. I'll break down what each part does.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import streamlit as st
import requests
from PIL import Image
import io

# --- 1. SET UP AZURE CREDENTIALS ---
# (Paste your key and endpoint here)
VISION_API_KEY = "YOUR_API_KEY_HERE"
VISION_ENDPOINT = "YOUR_ENDPOINT_HERE"

# (This is the specific API endpoint we'll hit)
analyze_url = f"{VISION_ENDPOINT.rstrip('/')}/vision/v3.2/analyze"  # strip any trailing slash from the endpoint


# --- 2. CREATE THE STREAMLIT UI ---
st.title("🖼️ AI Image Captioning with Azure")
st.write("Upload an image and let Azure's AI describe it for you!")

uploaded_file = st.file_uploader("Choose an image...", type=["jpg", "jpeg", "png"])


# --- 3. RUN THE ANALYSIS ---
if uploaded_file:
    # A. Display the uploaded image
    image = Image.open(uploaded_file)
    st.image(image, caption="Uploaded Image", use_container_width=True)

    # B. Convert image to bytes for the API
    image_bytes = io.BytesIO()
    image.convert("RGB").save(image_bytes, format="JPEG")  # convert to RGB so PNGs with transparency can be saved as JPEG
    image_bytes = image_bytes.getvalue() # Get the byte value

    # C. Set up headers and parameters for the API call
    headers = {
        "Ocp-Apim-Subscription-Key": VISION_API_KEY,
        "Content-Type": "application/octet-stream"
    }
    params = {"visualFeatures": "Description"}

    st.write("Analyzing image...")

    # D. Make the API call to Azure
    response = requests.post(analyze_url, headers=headers, params=params, data=image_bytes)
    response.raise_for_status() # Raise an error if the call fails
    analysis = response.json()

    # E. Display the result
    captions = analysis["description"]["captions"]
    if captions:
        caption = captions[0]["text"]
        confidence = captions[0]["confidence"]

        # Display the caption in a green success box!
        st.success(f"**Caption:** {caption} (Confidence: {confidence:.2f})")
    else:
        st.warning("No caption found for this image.")


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Step 3: Run Your App!&lt;/p&gt;

&lt;p&gt;Save the code as app.py. Open your terminal in the same folder and run:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;streamlit run app.py&lt;/code&gt;&lt;/pre&gt;



&lt;p&gt;Your browser will automatically open, and you'll have a working web app!&lt;/p&gt;

&lt;p&gt;Example Output&lt;/p&gt;

&lt;p&gt;When you run the app and upload an image, you'll see a result that looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi1txhy784qyj1wlxen10.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi1txhy784qyj1wlxen10.png" alt=" " width="800" height="391"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;How It Works&lt;/p&gt;

&lt;p&gt;This project is a perfect example of how Microsoft Azure makes complex AI simple. We didn't have to train a single model.&lt;/p&gt;

&lt;p&gt;All the heavy lifting is done by the requests.post call. We send the image bytes to the Azure endpoint, and Azure's pre-trained model analyzes them and sends back a JSON response containing the description.&lt;/p&gt;

&lt;p&gt;We just ask for the "Description" feature in our params, and the API does the rest.&lt;/p&gt;
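&lt;p&gt;To make that concrete, here is roughly the JSON shape the v3.2 Analyze endpoint returns for the Description feature, and the same extraction logic the app uses (the caption and confidence values below are made up for illustration, not real API output):&lt;/p&gt;

```python
# Illustrative sample of the response shape for visualFeatures=Description
# (field values are invented; the structure matches what the app parses).
analysis = {
    "description": {
        "tags": ["outdoor", "dog", "grass"],
        "captions": [
            {"text": "a dog sitting in the grass", "confidence": 0.93}
        ],
    }
}

# Same extraction as the app: take the first caption and its confidence.
captions = analysis["description"]["captions"]
if captions:
    best = captions[0]
    print(f"Caption: {best['text']} (Confidence: {best['confidence']:.2f})")
```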

&lt;p&gt;Final Thoughts&lt;/p&gt;

&lt;p&gt;This simple app shows the power of leveraging cloud-based AI: in just a few minutes, we built an accessibility tool that can have a real impact. Beyond captions, the same Analyze API can also detect objects and tag scenes, and Azure AI Vision's Read API handles extracting text from images (OCR).&lt;/p&gt;
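&lt;p&gt;If you want to experiment with extra features, the call can be wrapped in a small helper. This is a sketch under the same v3.2 endpoint assumptions as the app (it only hits the network when given real credentials; the function names are my own):&lt;/p&gt;

```python
import requests

def build_analyze_url(endpoint: str) -> str:
    """Build the v3.2 Analyze URL, tolerating a trailing slash on the endpoint."""
    return f"{endpoint.rstrip('/')}/vision/v3.2/analyze"

def analyze_image(endpoint: str, key: str, image_bytes: bytes,
                  features: str = "Description,Objects,Tags") -> dict:
    """POST image bytes to Azure AI Vision and return the parsed JSON."""
    response = requests.post(
        build_analyze_url(endpoint),
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/octet-stream",
        },
        params={"visualFeatures": features},
        data=image_bytes,
        timeout=30,
    )
    response.raise_for_status()  # surface HTTP errors instead of parsing bad JSON
    return response.json()
```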

&lt;p&gt;I highly recommend exploring the Azure AI Services. The possibilities are endless.&lt;/p&gt;

&lt;p&gt;Project URL: &lt;a href="https://github.com/akash7ashy/vision" rel="noopener noreferrer"&gt;https://github.com/akash7ashy/vision&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Thanks for reading!&lt;/p&gt;

</description>
      <category>azure</category>
      <category>tutorial</category>
      <category>ai</category>
      <category>python</category>
    </item>
  </channel>
</rss>
