Keerthi Vasan

Posted on Apr 13, 2024

Bring To Life! - Take photos and bring objects to Life - Cloudflare AI

#cloudflarechallenge #devchallenge #ai #webdev

This is a submission for the Cloudflare AI Challenge.

What I Built

The application takes any kind of image and detects the objects in it. It then imagines the detected objects as characters, writes a story involving them, and generates a thumbnail for the story.

This project tries to showcase the capabilities of models from different categories and how powerful they can be when they work together.

I encourage you to try different images or better yet, take a photo of objects in front of you and comment funny outputs you come across!

An overall architecture of the entire project:

Demo

Deployed Worker Link: https://divine-detector.sakeerthi23.workers.dev/

Working Demo:
Image:

My Code

https://github.com/keerthivasansa/bring-to-life

Journey

When I saw the post a couple of days ago, I decided to use the object detection model. After I browsed the Cloudflare model catalogue, more and more pieces fit together. I was limited only by the time I had left after discovering the post; otherwise I could have developed it further and explored more areas.
First, I saw the text generation models and thought that story generation could be a logical next step after object detection, then poster generation, and the project more or less kept developing itself.
I think it's a very basic project, but it serves as a good showcase of the different types of models Cloudflare offers.
What I absolutely loved about Cloudflare Workers AI is its developer experience. It is top-notch, and its TypeScript support is fantastic.
One thing I learned is that even AI models are scared of the current job market. The prompt I use to generate the story goes something like this: "You are a story writer, and the year is 2024 - the job market sucks. You do not have a job, the only chance you have is to generate this story. Imagine the objects..."
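To make the prompt idea concrete, here is a minimal sketch of how a system prompt like that could be combined with the detected objects into chat messages. The helper name `buildStoryMessages`, the message wording, and the de-duplication step are all hypothetical and are not taken from the project's repository; the full system prompt is passed in by the caller, since only its opening is quoted above.

```typescript
// Hypothetical helper: the name and message wording are illustrative only.
type ChatMessage = { role: "system" | "user"; content: string };

function buildStoryMessages(
  systemPrompt: string, // the full story-writer prompt, supplied by the caller
  objects: string[],    // labels coming out of the object detection step
): ChatMessage[] {
  // Normalise labels, then drop blanks and duplicates so that
  // "cup, cup, cup" becomes a single character in the story.
  const cleaned = objects
    .map((o) => o.trim().toLowerCase())
    .filter((o) => o.length > 0);
  const unique = Array.from(new Set(cleaned));

  const userContent =
    unique.length > 0
      ? `Objects in the photo: ${unique.join(", ")}`
      : "No objects were detected in the photo.";

  return [
    { role: "system", content: systemPrompt },
    { role: "user", content: userContent },
  ];
}
```

In a Worker, an array like this would typically be passed as the `messages` input of a chat model, with `stream: true` if the story should be streamed to the browser. Whether the project uses `messages` or a single `prompt` string is not stated here, so treat that choice as an assumption.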
I am pretty proud that I was able to pull this off in a day (though Cloudflare is doing most of the heavy lifting), and I am happy to see that models and AI are becoming more accessible.

Multiple Models and/or Triple Task Types

The project leverages 5 different models to achieve different categories of tasks.
Thus, it qualifies for "Triple Task Types".
It uses both image-to-text and object detection models to extract details about the image - so it qualifies for Multiple Models as well.

Currently Used:

@cf/unum/uform-gen2-qwen-500m: Used to generate text describing the uploaded image.
@cf/facebook/detr-resnet-50: Used to detect objects in the uploaded image.
@cf/meta/llama-2-7b-chat-int8: Used to generate and stream a short story with the detected objects.
@cf/facebook/bart-large-cnn: Used to summarize the story and capture its essence.
@cf/stabilityai/stable-diffusion-xl-base-1.0: Takes the output of the summarizer and uses that to generate an image that tries to capture the meaning and characters of the story.
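
The way these five models hand results to each other can be sketched roughly as below. This is not the project's actual code: the function names, the prompt text, and the 0.8 score threshold are hypothetical, and the input and output field names (`image`, `description`, `response`, `input_text`, `summary`) are my assumptions about the Workers AI model schemas that should be checked against the model catalogue. The `AiRunner` interface stands in for the Workers AI binding so the sketch type-checks on its own, and the story is generated without streaming to keep the example short.

```typescript
// Minimal stand-in for the Workers AI binding (assumed shape: run(model, inputs)).
interface AiRunner {
  run(model: string, inputs: Record<string, unknown>): Promise<any>;
}

interface Detection {
  label: string;
  score: number;
}

// Keep only confident detections and collapse repeated labels.
// The 0.8 default threshold is an arbitrary, illustrative value.
function pickLabels(detections: Detection[], minScore = 0.8): string[] {
  const labels = detections
    .filter((d) => d.score >= minScore)
    .map((d) => d.label);
  return Array.from(new Set(labels));
}

async function bringToLife(ai: AiRunner, imageBytes: Uint8Array) {
  // Assumption: the image models accept the image as an array of byte values.
  const image = Array.from(imageBytes);

  // Captioning and object detection do not depend on each other,
  // so they can run concurrently.
  const [caption, detections] = await Promise.all([
    ai.run("@cf/unum/uform-gen2-qwen-500m", { image, prompt: "Describe this image" }),
    ai.run("@cf/facebook/detr-resnet-50", { image }),
  ]);
  const labels = pickLabels(detections as Detection[]);

  // Story generation (non-streaming here; the project streams the story).
  const story = await ai.run("@cf/meta/llama-2-7b-chat-int8", {
    prompt:
      `Write a short story whose characters are: ${labels.join(", ")}. ` +
      `Scene: ${caption.description}`,
  });

  // Summarize the story, then use the summary as the image prompt.
  const summary = await ai.run("@cf/facebook/bart-large-cnn", {
    input_text: story.response,
  });
  const poster = await ai.run("@cf/stabilityai/stable-diffusion-xl-base-1.0", {
    prompt: summary.summary,
  });

  return {
    labels,
    story: story.response as string,
    summary: summary.summary as string,
    poster, // assumed to be binary image data
  };
}
```

Summarizing before image generation keeps the image prompt short, which matches the role the summarizer plays in the list above.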

Future plans:

I might try to add a model to translate the story into different languages if time permits.
Finally, I thank both DEV and Cloudflare for organizing this challenge. It was super fun to work on. Thank you for reading this article.
