DEV Community

Keerthi Vasan


Bring To Life! - Take photos and bring objects to Life - Cloudflare AI

This is a submission for the Cloudflare AI Challenge.

What I Built

The application takes any image and detects the objects in it. It then imagines the detected objects as characters, writes a story involving them, and generates a thumbnail for that story.

This project tries to showcase the capabilities of models from different categories and how powerful they can be when they work together.

I encourage you to try different images or, better yet, take a photo of the objects in front of you and share any funny outputs you come across in the comments!

An overall architecture of the entire project:

[Image: architecture diagram]

Demo

Deployed Worker Link: https://divine-detector.sakeerthi23.workers.dev/

Working Demo:

[Image: the project generating a story for the given image]

[Image: generation of a thumbnail for the generated story]

My Code

https://github.com/keerthivasansa/bring-to-life

Journey

  • When I saw the post a couple of days ago, I decided to use the object detection model. After I visited the Cloudflare Model Catalogue, more and more pieces fit together. I was limited only by the time I had left after discovering the post; otherwise I could have developed it further and explored more areas.

  • First, I saw the text generation models and thought that story generation would be a logical next step after object detection. Poster generation followed, and the project more or less kept developing itself.

  • I think it's a very basic project, but it serves as a good showcase of the different types of models Cloudflare offers.

  • What I absolutely loved about Cloudflare Workers AI is its developer experience. It was top-notch, and it has great support for TypeScript, which is fantastic.

  • One thing I learned is that even AI models are scared of the current job market. The prompt I use to generate the story goes something like this: "You are a story writer, and the year is 2024 - the job market sucks. You do not have a job, the only chance you have is to generate this story. Imagine the objects..."

  • I am pretty proud that I was able to pull this off in a day (though Cloudflare is doing most of the heavy lifting), and I am happy to see that models and AI are becoming more accessible.
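
As a rough illustration of the prompt idea in the list above, here is a minimal TypeScript sketch of how such a prompt could be assembled from detected objects. Only the opening text is taken from the post; the post elides the rest of the prompt, so the continuation line, the `buildStoryPrompt` helper, and the `QUOTED_OPENING` constant are all hypothetical and not the project's actual code.

```typescript
// Opening of the prompt as quoted in the post. The post cuts the prompt off
// after this point, so nothing beyond this string is the author's wording.
const QUOTED_OPENING =
  "You are a story writer, and the year is 2024 - the job market sucks. " +
  "You do not have a job, the only chance you have is to generate this story.";

// Hypothetical helper: appends an illustrative instruction and the list of
// detected objects to the quoted opening.
function buildStoryPrompt(objects: string[]): string {
  if (objects.length === 0) {
    throw new Error("need at least one detected object");
  }
  // One object per line, so the model sees a clear cast of characters.
  const cast = objects.map((o) => `- ${o}`).join("\n");
  // Assumption: this instruction line is invented for the sketch.
  const instruction =
    "Imagine these objects as characters and write a short story about them:";
  return `${QUOTED_OPENING}\n${instruction}\n${cast}`;
}
```

Calling `buildStoryPrompt(["cup", "laptop"])` would yield the quoted opening followed by the illustrative instruction and a two-line cast list. Keeping the object list separate from the fixed text makes it easy to tweak the wording without touching the detection step.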

Multiple Models and/or Triple Task Types

  • The project uses 5 different models to achieve different categories of tasks.
  • Thus, it qualifies for "Triple Task Types".
  • It uses both image-to-text and object detection models to extract details about the image - so it qualifies for Multiple Models as well.

Currently Used:

  • @cf/unum/uform-gen2-qwen-500m: Used to generate text describing the uploaded image.
  • @cf/facebook/detr-resnet-50: Used to detect objects in the uploaded image.
  • @cf/meta/llama-2-7b-chat-int8: Used to generate and stream a short story featuring the detected objects.
  • @cf/facebook/bart-large-cnn: Used to summarize the story to capture the main essence of the story.
  • @cf/stabilityai/stable-diffusion-xl-base-1.0: Takes the output of the summarizer and uses that to generate an image that tries to capture the meaning and characters of the story.
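
To make the chain of models listed above more concrete, here is a minimal TypeScript sketch of how the five calls might fit together. It is not the project's actual code (see the repository for that). The `AiRunner` interface is a stand-in for the Workers AI binding, and the input and output field names (`image`, `prompt`, `input_text`, `description`, `response`, `summary`), the `uniqueLabels` helper, the 0.5 score threshold, and the prompt wording are all assumptions made for illustration. Streaming of the story is left out to keep the sketch short.

```typescript
// Stand-in for the Workers AI binding. Assumption: the real binding offers a
// compatible run(model, inputs) method; the shapes used below are assumed.
interface AiRunner {
  run(model: string, inputs: Record<string, unknown>): Promise<any>;
}

interface Detection {
  label: string;
  score: number;
}

// Hypothetical helper: keep each sufficiently confident label once,
// in first-seen order.
function uniqueLabels(detections: Detection[], minScore = 0.5): string[] {
  const seen = new Set<string>();
  for (const d of detections) {
    if (d.score >= minScore) seen.add(d.label);
  }
  return Array.from(seen);
}

async function bringToLife(ai: AiRunner, imageBytes: Uint8Array) {
  const image = Array.from(imageBytes);

  // Captioning and object detection do not depend on each other,
  // so they can run in parallel.
  const [caption, detections] = await Promise.all([
    ai.run("@cf/unum/uform-gen2-qwen-500m", { image, prompt: "Describe this image." }),
    ai.run("@cf/facebook/detr-resnet-50", { image }),
  ]);
  const objects = uniqueLabels(detections as Detection[]);

  // Story generation (non-streaming here; the prompt wording is illustrative).
  const story = await ai.run("@cf/meta/llama-2-7b-chat-int8", {
    prompt: `Write a short story. Scene: ${caption.description}. Characters: ${objects.join(", ")}.`,
  });

  // Summarize the story so the image prompt stays short.
  const summary = await ai.run("@cf/facebook/bart-large-cnn", {
    input_text: story.response,
  });

  // Generate the thumbnail from the summary.
  const thumbnail = await ai.run("@cf/stabilityai/stable-diffusion-xl-base-1.0", {
    prompt: summary.summary,
  });

  return {
    objects,
    story: story.response as string,
    summary: summary.summary as string,
    thumbnail,
  };
}
```

Passing the runner in as a parameter, rather than reading it from a global, means the same function can be exercised with a stub object in tests and with the real binding inside a Worker.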

Future plans:

  • I might try to add a model that translates the story into different languages if time permits.

  • Finally, I thank both DEV and Cloudflare for organizing this challenge. It was super fun to work on. Thank you for reading this article.
