AI Pool

Diving into Object Detection Basics

Intro

The prospects of Artificial Intelligence (AI) are not limited to predicting whether a person will get a loan based on their credit history, annual income, annual expenses, criminal record, and so on. Computer Vision is a trending topic for AI enthusiasts of any experience level.

Let me give you a brief idea of what Computer Vision is. A machine identifies an 'object' in an image or video after learning from the data that was fed to it. For example, when we see an object for the first time, we usually do not know what it is called. Once we have learned its name, we recognize it the next time we see it. Computer Vision, and Object Detection in particular, works much like our brain in this respect.

Introduction to Object Detection

Object detection means locating and identifying an object in an image or a video. Locating an object means giving the exact position where the object sits in the frame. (Here a frame can be a single image or one of the sequence of frames that make up a video.) To locate an object, we can use a bounding box or another geometric shape such as a circle. The easiest and most common approach is the bounding box, which is described by its center coordinates (x, y), its width (w), and its height (h).

To identify an object, the network must be trained on data, for example, images of people. This step is called classification, and it is essential for the bounding box to be formed correctly. For the network to train correctly, the training data must be correctly labeled.
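To make the bounding box idea concrete, here is a minimal Python sketch. The class name `BoundingBox` and its methods are made up for illustration and do not come from any particular library. It stores a box as center (x, y), width w, and height h, and converts it to corner coordinates, which is the form you usually need for drawing or cropping.

```python
from dataclasses import dataclass


@dataclass
class BoundingBox:
    """Axis-aligned box stored as center (x, y), width w, and height h.

    Hypothetical helper for illustration only.
    """
    x: float
    y: float
    w: float
    h: float

    def corners(self):
        """Return (x_min, y_min, x_max, y_max)."""
        half_w, half_h = self.w / 2, self.h / 2
        return (self.x - half_w, self.y - half_h,
                self.x + half_w, self.y + half_h)

    def area(self):
        """Return the area of the box in square pixels."""
        return self.w * self.h


# A box centered at (50, 40) that is 20 pixels wide and 10 pixels high.
box = BoundingBox(x=50.0, y=40.0, w=20.0, h=10.0)
print(box.corners())  # (40.0, 35.0, 60.0, 45.0)
print(box.area())     # 200.0
```

Different tools store boxes differently (center plus size, or two corners), so it is worth checking which convention a dataset or model uses before mixing them.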

Anchor Boxes

How does the network predict or identify the box?

The network starts from a random initial guess: it places the center of the box (x, y) at (0, 0) and assigns a value w for the width and h for the height. Of course, this is not the final prediction. After every training step, called an iteration, the network performs regression to move these estimates closer to the correct box.
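The idea of refining a guess over many iterations can be shown with a toy example. This is only a sketch under strong assumptions: a real detector adjusts network weights rather than the box itself, and the function name `refine_box`, the learning rate, and the sample numbers are all invented here. The mechanics are the same in spirit, though: measure the error, then take a small step that reduces it.

```python
def refine_box(guess, target, learning_rate=0.1, iterations=50):
    """Move a box guess toward a target box by gradient descent.

    Both boxes are [x, y, w, h]. The loss is the sum of squared
    differences between the guess and the target.
    """
    box = list(guess)
    for _ in range(iterations):
        # The gradient of (b - t)^2 with respect to b is 2 * (b - t).
        gradients = [2 * (b - t) for b, t in zip(box, target)]
        # Step against the gradient to reduce the error.
        box = [b - learning_rate * g for b, g in zip(box, gradients)]
    return box


# Initial guess: center at (0, 0) with an arbitrary width and height.
initial_guess = [0.0, 0.0, 10.0, 10.0]
# Labeled box from the training data (hypothetical values).
ground_truth = [50.0, 40.0, 20.0, 10.0]

refined = refine_box(initial_guess, ground_truth)
print([round(value, 2) for value in refined])
```

With these settings each iteration shrinks the remaining error by a fixed factor, so after 50 iterations the guess is very close to the labeled box.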

Datasets to start with

There are many datasets you can use to train your first object detection model. These datasets are openly available, so anyone is free to use them, and they cover a large collection of object classes to choose from. Have fun exploring them:

  • COCO Dataset
  • ImageNet
  • Open Images Dataset V6
  • LabelMe
  • CelebFaces
  • 50 other datasets

You can find more in the following article.
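Before training on one of these datasets, it helps to look at how the labels are stored. The sketch below parses a tiny, made-up annotation snippet laid out in the COCO style, where each `bbox` is `[x_min, y_min, width, height]` measured from the top-left corner. The file contents and the helper names `to_center_format` and `load_boxes` are assumptions for illustration; real annotation files are far larger and carry more fields.

```python
import json

# Hypothetical miniature annotation file in the COCO layout.
coco_json = """
{
  "images": [{"id": 1, "file_name": "street.jpg", "width": 640, "height": 480}],
  "categories": [{"id": 1, "name": "person"}, {"id": 3, "name": "car"}],
  "annotations": [
    {"id": 10, "image_id": 1, "category_id": 1, "bbox": [100, 120, 40, 80]},
    {"id": 11, "image_id": 1, "category_id": 3, "bbox": [300, 200, 120, 60]}
  ]
}
"""


def to_center_format(bbox):
    """Convert [x_min, y_min, w, h] to (center_x, center_y, w, h)."""
    x_min, y_min, w, h = bbox
    return (x_min + w / 2, y_min + h / 2, w, h)


def load_boxes(text):
    """Return a list of (class name, center-format box) pairs."""
    data = json.loads(text)
    # Map numeric category ids to readable class names.
    names = {c["id"]: c["name"] for c in data["categories"]}
    return [
        (names[a["category_id"]], to_center_format(a["bbox"]))
        for a in data["annotations"]
    ]


boxes = load_boxes(coco_json)
for label, box in boxes:
    print(label, box)
```

Each dataset in the list has its own annotation layout, so check its documentation before reusing a loader like this one.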

Other Resources
