DEV Community

Elijah

Frame Fusion

This is a submission for the Cloudflare AI Challenge.

What I Built

For this challenge I built a Video Analysis Tool that uses Cloudflare AI models to analyze frames from a video file. The tool synthesizes information from the video frames to give a deeper understanding of the visual data, and it can be used for purposes such as security surveillance and data analysis.
The frames are first captured and then sent through an API gateway to the models for analysis.
The analysis involves the following steps:

  1. Get a description of each frame using the Image-to-Text model.
  2. Get the embeddings of all frames using the Cloudflare Text Embeddings model.
  3. Fetch a summary of the description using the Cloudflare Summarization model.
  4. Store this data in the database.
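
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the repositories: `run_model` is a hypothetical callable standing in for whatever sends a request to Cloudflare AI through the gateway, the request and response field names (`image`, `prompt`, `description`, `text`, `data`, `input_text`, `summary`) are assumptions about the model payloads, and embedding the description text (rather than the raw image) is also an assumption. `fake_run_model` is a stub so the sketch runs without network access.

```python
from typing import Any, Callable, Dict, List

# Model identifiers as listed in this post.
IMAGE_TO_TEXT_MODEL = "@cf/unum/uform-gen2-qwen-500m"
EMBEDDING_MODEL = "@cf/baai/bge-base-en-v1.5"
SUMMARIZATION_MODEL = "@cf/facebook/bart-large-cnn"

# Hypothetical signature: (model_id, inputs) -> parsed result dict.
RunModel = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def analyze_frame(frame_index: int, image_bytes: bytes, run_model: RunModel) -> Dict[str, Any]:
    """Run one frame through the description, embedding and summary steps."""
    # Step 1: describe the frame (payload and result field names are assumed).
    described = run_model(
        IMAGE_TO_TEXT_MODEL,
        {"image": list(image_bytes), "prompt": "Describe this video frame."},
    )
    description = described["description"]

    # Step 2: embed the description text (assumed to be what gets embedded).
    embedded = run_model(EMBEDDING_MODEL, {"text": [description]})
    embedding = embedded["data"][0]

    # Step 3: summarize the description.
    summarized = run_model(SUMMARIZATION_MODEL, {"input_text": description})
    summary = summarized["summary"]

    # Step 4: return a record ready to store; persistence is left to the caller.
    return {
        "frame_index": frame_index,
        "description": description,
        "summary": summary,
        "embedding": embedding,
    }


def analyze_video(frames: List[bytes], run_model: RunModel) -> List[Dict[str, Any]]:
    """Analyze every captured frame in order."""
    return [analyze_frame(i, frame, run_model) for i, frame in enumerate(frames)]


def fake_run_model(model: str, inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Stub that mimics the assumed response shapes, for local experiments."""
    if model == IMAGE_TO_TEXT_MODEL:
        return {"description": "a stub description"}
    if model == EMBEDDING_MODEL:
        return {"shape": [1, 3], "data": [[0.1, 0.2, 0.3]]}
    if model == SUMMARIZATION_MODEL:
        return {"summary": "a stub summary"}
    raise ValueError(f"unexpected model: {model}")
```

Passing the model runner in as a parameter keeps the pipeline independent of how requests actually reach Cloudflare, so the same function can be exercised with the stub or with a real client.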

A user can then chat with an AI in the context of the analyzed video. The frames' vector embeddings are used as context data for the AI Text Generation model.
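
One way stored embeddings can supply context for a chat question is sketched below. The post does not say how frames are selected, so ranking by cosine similarity and the prompt wording are assumptions, and the record fields (`frame_index`, `description`, `embedding`) and function names are hypothetical.

```python
import math
from typing import Any, Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_frames(
    query_embedding: Sequence[float],
    records: List[Dict[str, Any]],
    k: int = 3,
) -> List[Dict[str, Any]]:
    """Return the k stored frame records most similar to the query embedding."""
    ranked = sorted(
        records,
        key=lambda r: cosine_similarity(query_embedding, r["embedding"]),
        reverse=True,
    )
    return ranked[:k]


def build_chat_prompt(question: str, context_records: List[Dict[str, Any]]) -> str:
    """Combine the selected frame descriptions and the user question into one prompt."""
    context = "\n".join(
        f"Frame {r['frame_index']}: {r['description']}" for r in context_records
    )
    return (
        "Answer using only these video frame descriptions:\n"
        + context
        + "\n\nQuestion: "
        + question
    )
```

In this sketch the question would be embedded with the same Text Embeddings model as the frames, and the resulting prompt would then be passed to the Text Generation model.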

Features

  • Video Upload: Users can upload videos from their local machine for analysis.
  • Frame Analysis: The tool analyzes individual frames of the video to extract and synthesize key information.
  • Scene Analysis: Analyzes scenes to identify different environments or settings in the video.
  • Data Visualization: Provides visualizations of the analysis results for easier interpretation.

Demo

The repositories for this can be found here:
https://github.com/ezecodes/serverless-c3
https://github.com/ezecodes/simple-sockets

My Code

Journey

Initially, I envisioned it as a threat detection tool for CCTV AI surveillance, aiming to enhance security systems. However, as the project evolved, I realized its potential to go beyond security applications and become a versatile video analysis tool.

One of the major challenges I faced was integrating and fine-tuning various ML models to analyze video frames effectively. Understanding and implementing these models required a solid grasp of basic ML concepts, which I had to learn along the way. This learning curve was steep but incredibly rewarding.

As I continue to improve the software, I aim to broaden its scope to encompass various domains. For instance, I envision the tool being used to analyze health video scans, aiding in medical diagnostics and research. This expansion into new domains presents both technical and conceptual challenges, but I am excited about the possibilities it offers.

Multiple Models and/or Triple Task Types

This project uses multiple models and task types:

  • Image-to-Text: @cf/unum/uform-gen2-qwen-500m
  • Text Embeddings: @cf/baai/bge-base-en-v1.5
  • Summarization: @cf/facebook/bart-large-cnn
  • Text Generation

Team member(s): https://dev.to/ezecodes
